The root of the problem is that there are just too many papers and too few best-qualified reviewers. In many other CS fields, papers are reviewed by program committee members who are typically tenure-track faculty. These faculty all know each other, and their personal reputations are on the line when they review. They also know the history of the field and have a broad understanding of what is novel and interesting in a space.
In ML, we have PhD students, and sometimes even undergrads, doing reviewing. They have much less experience than faculty. Most of them also have no intention of remaining in the academic community, so they have little incentive to build reviewing skills or a reputation as a reviewer. No wonder the reviews are bad and random.
So the problem isn’t really solvable by changing the way the reviewing process works, because it’s not really a process problem.
Have you considered emailing the authors of these papers and asking them for their datasets?