Hey everyone. I’m a graduate student currently studying machine learning. I’ve had a decent amount of exposure to the field; I’ve already seen many students publish and many students graduate. This is just to say that I have some experience so I hope I won’t be discounted when I say with my whole chest: I hate machine learning conferences.
Everybody puts the conferences on a pedestal
The most popular machine learning conferences are a massive lottery, and everyone knows this and complains about this, right? But for most students, your standing in this field is built off this random system. Professors acknowledge the randomness but (many) still hold up the students who get publications. Internships and jobs depend on your publication count. Who remembers that job posting from NVIDIA that asked for a minimum of 8 publications at top conferences?
Yet the reviewing system is completely broken
Reviewers have no incentive to give coherent reviews. If they post an incoherent review, they still have no incentive to respond to a rebuttal of that review. Reviewers have no incentive to update their score. Reviewers often have an incentive to give negative reviews, since many reviewers are submitting papers in the same area they are reviewing. Reviewers have an incentive to collude, because this can actually help their own papers.
The same goes for ACs: they have no incentive to do anything beyond simply thresholding scores.
I have had decent reviewers, both positive and negative, but (in my experience) they are the minority. Over and over again I see a paper that is more or less as good as many papers before it, but whether it squeaks in, gets an oral, or gets rejected seems to depend on luck. I have seen bad papers get in with faked data or other real faults because the reviewers were positive and inattentive. I have seen good papers get rejected for poor or even straight up incorrect reasons that bad, negative reviewers put forth and ACs follow blindly.
Can we keep talking about it?
We have all seen these complaints many times. I’m sure to the vast majority of users in this sub, nothing I said here is new. But I keep seeing the same things happen year after year, and complaints are always scattered across online spaces and soon forgotten. Can we keep complaining and talking about potential solutions? For example:
- Should reviewers have public statistics tied to their (anonymous) reviewer identity?
- Should reviewers have their identities be made public after reviewing?
- Should institutions give more weight to reviewer awards? After all, being able to review a project well should be a useful skill.
- Should institutions focus less on a small handful of top conferences?
A quick qualification
This is not to discount people who have done well in this system. Certainly it is possible that good work met good reviewers and was rewarded accordingly. This is a great thing when it happens. My complaint is that whether this happens or not seems completely random. I’m getting repetitive, but we’ve all seen good work meet bad reviewers and bad work meet good reviewers…
All my gratitude to people who have been successful with machine learning conferences but are still willing to entertain the notion that the system is broken. Unfortunately, some people take complaints like this as if they were attacks on their own success. This NeurIPS cycle, I remember reading an area chair complain unceasingly about reviewer complaints. According to them, reviews are almost always fair, rebuttals are practically useless, authors are always whining… They are reasonably active on academic Twitter, so there wasn’t too much pushback. I searched their Twitter history and found plenty of author-side complaints about reviewers being dishonest or lazy… go figure.
The root of the problem is that there are just too many papers and too few best-qualified reviewers. In many other CS fields, papers are reviewed by program committee members who are typically tenure-track faculty. These faculty all know each other, and their personal reputations are on the line when they review. They also know the history of the field and have a broad understanding of what is novel and interesting in a space.
In ML, we have PhD students, and sometimes even undergrads, doing reviewing. They have much less experience than faculty. They also for the most part have no intention of remaining in the academic community, so they have little incentive to build reviewing skills or to build a reputation as a reviewer. No wonder the reviews are bad and random.
So the problem isn’t really solvable by changing the way the reviewing process works, because it’s not really a process problem.
Maybe we should make a dataset of top faculty reviewers and train a model on that dataset. Then that model can review papers. Unless there are papers using the same model, in which case you need another model, and this model only reviews papers of the first model. The first model can review papers about the second model. Both models improve akin to stable GAN training. Then someone writes up this overall modeling and we enter a deeper layer of recursion.