Hey everyone. I’m a graduate student currently studying machine learning. I’ve had a decent amount of exposure to the field; I’ve already seen many students publish and many students graduate. This is just to say that I have some experience, so I hope I won’t be discounted when I say, with my whole chest: I hate machine learning conferences.

Everybody puts the conferences on a pedestal

The most popular machine learning conferences are a massive lottery, and everyone knows this and complains about it, right? But for most students, your standing in this field is built on this random system. Professors acknowledge the randomness, but many still hold up the students who land publications. Internships and jobs depend on your publication count. Who remembers that job posting from NVIDIA that asked for a minimum of 8 publications at top conferences?

Yet the reviewing system is completely broken

Reviewers have no incentive to give coherent reviews. If they post an incoherent review, they still have no incentive to respond to a rebuttal of that review. Reviewers have no incentive to update their score. Reviewers often have an incentive to give negative reviews, since many of them are submitting papers in the same area they are reviewing. Reviewers have an incentive to collude, because collusion can actually help their own papers.

The same goes for ACs: they have no incentive to do anything beyond simply thresholding scores.

I have had decent reviewers, both positive and negative, but in my experience they are the minority. Over and over again I see a paper that is more or less as good as many papers before it, and whether it squeaks in, gets an oral, or gets rejected seems to depend on luck. I have seen bad papers get in with faked data or other real faults because the reviewers were positive and inattentive. I have seen good papers get rejected for poor or even outright incorrect reasons that bad, negative reviewers put forth and ACs follow blindly.

Can we keep talking about it?

We have all seen these complaints many times. I’m sure that to the vast majority of users in this sub, nothing I said here is new. But I keep seeing the same things happen year after year, and the complaints are always scattered across online spaces and soon forgotten. Can we keep complaining and talking about potential solutions? For example:

  • Should reviewers have public statistics tied to their (anonymous) reviewer identity?
  • Should reviewers have their identities be made public after reviewing?
  • Should institutions value reviewer awards more? After all, being able to review a paper well should be a useful skill.
  • Should institutions focus less on a small handful of top conferences?

A quick qualification

This is not to discount people who have done well in this system. Certainly it is possible that good work met good reviewers and was rewarded accordingly. That is a great thing when it happens. My complaint is that whether it happens or not seems completely random. I’m getting repetitive, but we’ve all seen good work meet bad reviewers and bad work meet good reviewers…

All my gratitude goes to the people who have been successful at machine learning conferences but are still willing to entertain the notion that the system is broken. Unfortunately, some people take complaints like this as attacks on their own success. This NeurIPS cycle, I remember reading an area chair complaining endlessly about author complaints: reviews are almost always fair, rebuttals are practically useless, authors are always whining… They are fairly active on academic Twitter, so there wasn’t much pushback. I searched their Twitter history and found plenty of author-side complaints about reviewers being dishonest or lazy… go figure.

  • dinkboz@alien.topB

    The number of papers that post-academia job postings demand from PhD students just incentivizes bad papers and falsified data.

  • lexected@alien.topB

    The system is quite broken; one could say that in its present state, it almost discourages genuine novelty of thought.

    But it’s imperfect, first and foremost, because the people involved are imperfect. Reviewing is often a job assigned to the lowest performers in research groups, or traded by the highest performers (constantly away on big-tech internships, building startups or open-source models on the side) to colleagues with a somewhat more laid-back attitude toward research excellence. You can submit a bad review and it will not come back to bite you, but in the age of reproducibility, a messed-up experiment or a poorly written/plainly incorrect paper that slips through the review system could be your end.

    The idea is that you enter the publishing game at the beginning of your PhD and emerge seeing through, and standing above, the game once you’ve graduated. After all, you first have to master the rules of the game to be able to propose meaningful changes. It’s just that, once you’re done, you might have a lot more incentive to switch to industry or consultancy and never care about the paper-citation game again.

    • we_are_mammals@alien.topB

      a messed-up experiment or a poorly written/plainly incorrect paper that slips through the review system could be your end

      Is that true? If your paper is totally wrong, publish a retraction, do not include the paper in your “list of publications”, and move on.

    • bregav@alien.topB

      After all, you first have to master the rules of the game to be able to propose meaningful changes.

      I think this logic is part of what perpetuates the dysfunction that OP is complaining about. There’s a selection bias that occurs in which the people who do the best job of mastering the game are also the people who were least unhappy about it to begin with, and who thus would never have been very interested in changing it. And, moreover, after having invested a lot of time into mastering the game they now have a vested interest in continuing it.

      I don’t have a good or easy solution to that problem, but I wanted to point out that telling someone to buy into the game isn’t really great advice when that person sees systemic flaws in it and wants to change it.

    • MLConfThrowaway@alien.topOPB

      Definitely agree with everything you say. It’s unfortunate… I know the reviewers and the people who want to move on after academia are not at fault, although they are often made to carry the extra burden of making the system fairer.

      • Should institutions value reviewer awards more? After all, being able to review a paper well should be a useful skill.

      What do you think of this suggestion, by the way? I think that if industry (and everyone, really) recognized reviewing as a valuable skill, it might slowly create more incentive to write good reviews.

  • bregav@alien.topB

    I think there are a lot of good reasons to be unhappy about the standards of publishing in CS and ML, but I wanted to highlight this:

    Who remembers that job posting from NVIDIA that asked for a minimum of 8 publications at top conferences?

    In addition to the problems that exist in academia more broadly, I think many issues in ML can be attributed to the amount of money involved in the industry. How often are fields of study awash in staggering amounts of concentrated wealth without being beset by social dysfunction and perverse incentives?

    I sympathize with the young people who want to do ML research just because they’re very intellectually curious. They have the misfortune of having to coexist and compete with a larger cohort of people who also have dollar signs in their eyes.

    And if you too have dollar signs in your eyes, well this is pretty much what you can expect in any environment that attracts many such people: success is determined to a significant degree by luck and personal connections.

    • MLConfThrowaway@alien.topOPB

      On the one hand, everything you said is very true. On the other hand, I think the CS community is unusually open about changing its publishing system and trying things like double-blind review at a scale I don’t usually see in other fields. Maybe with enough momentum, some of the current faults in how we publish can change (despite the misaligned incentives?).

      • bregav@alien.topB

        Oh for sure, there are a lot of people who feel the way you do and who are open to trying new things to mitigate these problems.

        I think it’s important to be clear about the core problem, though, because otherwise you might be tempted to put a lot of work into solutions that are ultimately mostly cosmetic. Like, why is reviewing such a problem to begin with? It’s ultimately because, for authors, there’s a strong incentive to prioritize publishing volume over publishing quality, because that’s what gets you a job at NVIDIA.

        Thus the publishing incentives are fundamentally set up such that reviewing requires a huge amount of labor, because there is such a large number of submissions. Double-blind reviewing and similar measures can shift the incentives a bit in favor of fairness, but they ultimately do nothing to stem the firehose of frivolous garbage research that people try to get published in the first place.

        So a real solution would do at least one of two things:

        1. increase the number/efficiency of reviewers, or
        2. reduce the number of submissions

        This problem exists throughout academia, but I think it’s especially acute in CS and ML because of the weirdly constrained channels for publishing research. For example, I think that using conferences as the primary method of communicating results has unnecessarily hamstrung the entire field. Other fields of study primarily use journals, which are inherently less expensive and more scalable.

        • MLConfThrowaway@alien.topOPB

          I appreciate your laying things out clearly. I think breaking the field up into smaller, more journal-like venues sounds like a step in the right direction, and I’m sure some of that thinking went into creating TMLR. I do wonder, though, whether the same problems would reappear if that became too popular… people and companies would anoint a select number of venues, everyone would end up submitting to those venues, and so on.

    • AuspiciousApple@alien.topB

      Even if you’re doing it out of passion, you’re still being crushed by the insane competition and publish-or-perish on steroids.

    • vicks9880@alien.topB

      Speaking of publishing: everyone and their mom is creating blogs and posting them everywhere possible… These people just read the quick-start page of any new library and flood the internet with mediocre content. I’m tired of looking through hundreds of such articles to find one useful one whenever I want to do something that is just one step beyond hello world.

  • deepneuralnetwork@alien.topB

    For the most part I ignore conferences and pretty much all of academia when it comes to progress in AI.

    Academia and conferences are a joke. Real progress in AI is going to come from industry, and it won’t be published until years after impact, if at all.

  • bbbbbaaaaaxxxxx@alien.topB

    On a related note: where is a good place to share a long ML paper (we were told we’re too dense for ICLR, which I agree with) that doesn’t have a 2-year review process (looking at you, JMLR)? The subject is tabular synthetic data evaluation.

  • vector0x17@alien.topB

    This completely broken review process is probably the single largest frustration I have with the field. Fundamentally, I think the only solution is to somehow incentivize high-quality reviews and potentially punish bad reviewers. Making the identities of the reviewers public afterwards would be one way, but I think it creates other problems (such as breeding animosity). My controversial proposal would be to somehow tie your own submissions to the quality of your reviews. Maybe something along the lines of:

    • Force the authors of every submitted paper to jointly review something like 3-4 other papers.
    • Have meta reviewers who read a given paper and the reviews, scoring the reviews themselves, not the manuscripts. This could be done for some random subset of reviews / manuscripts, not necessarily all.
    • Incentivize good reviews, potentially giving a certification of “good reviewer” for accepted papers, displayed publicly, similar to the TMLR certification.
    • Punish bad reviewers. Either outright reject their submissions based on their review quality (even if they would be accepted otherwise), or, as a less extreme option, mark them with a “bad reviewer” certification on their accepted papers as a public badge of shame.

    What do people think? Could something along these lines work or is it completely unreasonable?
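    To make the incentive mechanics concrete, here is a minimal sketch of how the meta-review scoring and badging could be tallied. It is only an illustration: the 1-5 meta-review scale, the thresholds, and all names are hypothetical, not an existing feature of any conference platform.

        # Hypothetical sketch of the proposed reviewer-quality badging scheme.
        # The thresholds and the 1-5 meta-review scale are made up for illustration.
        from dataclasses import dataclass, field
        from statistics import mean

        GOOD_BADGE_MIN = 4.0   # average meta-review score needed for a "good reviewer" badge
        BAD_BADGE_MAX = 2.0    # an average at or below this earns a "bad reviewer" badge

        @dataclass
        class Reviewer:
            reviewer_id: str
            meta_scores: list[float] = field(default_factory=list)  # scores given by meta-reviewers

            def add_meta_score(self, score: float) -> None:
                """Record how a meta-reviewer rated one of this reviewer's reviews (1-5)."""
                self.meta_scores.append(score)

            def badge(self) -> str:
                """Map average review quality to a public badge shown next to accepted papers."""
                if not self.meta_scores:
                    return "unrated"
                avg = mean(self.meta_scores)
                if avg >= GOOD_BADGE_MIN:
                    return "good reviewer"
                if avg <= BAD_BADGE_MAX:
                    return "bad reviewer"
                return "no badge"

        # Example: only a random subset of reviews gets meta-scored, as proposed above.
        r = Reviewer("anon-1234")
        for s in (4.5, 4.0, 3.5):
            r.add_meta_score(s)
        print(r.badge())  # -> "good reviewer" under these made-up thresholds

    Nothing in the sketch requires deanonymization: the badge is a pure function of the meta-review scores, so it could hang off an anonymous reviewer ID just as easily as a real name.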

    • MLConfThrowaway@alien.topOPB

      I agree with many of these points.

      Making the identities of the reviewers public afterwards would be one way, but I think it creates other problems (such as breeding animosity).

      Along this line, what if we gave reviewers public statistics on OpenReview while keeping everything else anonymous? We would see whether a reviewer tends to reject or accept papers far more often than average. If we add the “good reviewer” or “bad reviewer” badges you suggested, those would follow their reviewer history too. That could be a way of enforcing accountability while preserving privacy (I think?).
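      Purely as an illustration of what those public statistics could look like, here is a small sketch that compares each anonymous reviewer’s scores to the venue-wide average. The data, the field names, and the rating scale are all made up; nothing like this is currently exposed by OpenReview.

          # Hypothetical per-reviewer public statistics tied to an anonymous ID.
          # The scores and the rating scale are invented for illustration; this is
          # not an existing OpenReview feature or API.
          from statistics import mean

          def reviewer_stats(reviewer_scores: dict[str, list[int]]) -> dict[str, dict[str, float]]:
              """For each anonymous reviewer ID, report the mean score given and how far
              it deviates from the venue-wide mean, so systematic harshness or leniency
              becomes visible."""
              all_scores = [s for scores in reviewer_scores.values() for s in scores]
              venue_mean = mean(all_scores)
              return {
                  rid: {
                      "n_reviews": len(scores),
                      "mean_score": round(mean(scores), 2),
                      "deviation_from_venue": round(mean(scores) - venue_mean, 2),
                  }
                  for rid, scores in reviewer_scores.items()
              }

          # Toy example: reviewer "anon-B" scores papers far below the venue average.
          example = {
              "anon-A": [6, 7, 5, 6],
              "anon-B": [2, 3, 2, 3],
              "anon-C": [5, 6, 8, 7],
          }
          for rid, stats in reviewer_stats(example).items():
              print(rid, stats)

      Even something this simple would surface systematic harshness or leniency while keeping the reviewer anonymous.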

  • mr_birkenblatt@alien.topB

    The review process is completely pointless with regard to reproducibility. The reviewers basically have to go off what somebody wrote in the paper. Other than maybe spotting some systematic error in the write-up, there is not much a reviewer can actually detect and criticize (if the model works, what else is there to say?). Most published papers would be better off as GitHub projects with a proper descriptive README that also shows benchmarks. It’s not like papers are written very well to begin with. But that doesn’t get you a PhD.

    In physics there is basically no (or only a minimal) review process, and publications are judged by how much the paper gets cited. There is also a whole secondary track of researchers who take other papers and recreate the experiments to actually confirm reproducibility. In ML right now there is no incentive for anyone to run a published model on different data, or their own data, and confirm that it works correctly. In fact, you’d probably be crucified for doing that.

  • atdlss@alien.topB

    I’m a PhD candidate at a UK university, but I have 6+ years of prior industry experience as an ML engineer. I have zero intention of staying in academia and see my PhD as an investment. I’ve reviewed papers for ECCV, CVPR, and NeurIPS to help others, since I don’t see how doing blind reviews would further my career in any way. I do my best to read each paper carefully, but it’s getting insane lately.

    I volunteer to review 2-3 papers and get assigned 6, plus 2 more urgent last-minute reviews on a Sunday. What I see is that academia is full of toxic people, there is almost no one to complain to, and it thrives on making PhD students feel worthless. It works because most PhD students don’t have any industry experience and feel like they can’t get a job if they quit. I think the solution is to stop relying on voluntary reviewing and stop conferences mooching off early-stage researchers.

    • MLConfThrowaway@alien.topOPB

      That’s really good of you, trying to give thoughtful reviews. I agree: the review process shouldn’t have to depend on PhD students volunteering their time uncomplainingly.

      I think the solution is to stop relying on voluntary reviewing and stop conferences mooching off early-stage researchers.

      I’m curious to hear more about this. Do you think conferences should have editors instead, like journals? Among other things, I am concerned about how that would scale. As you mentioned, there are already too many papers and not enough reviewers. (Or do you think it would scale better with proper incentives, like payment?)

      • atdlss@alien.topB

        Absolutely, I think they should be run more professionally. Since the review cycles are much shorter, full-time editors might not make sense, but they could have contractors who are verified to have no conflicts with the papers they are reviewing.

        Also, they definitely have the budget for it; organizing top conferences is profitable. NeurIPS had a net $3M profit in 2020, and I believe that is the last time they announced their budget and profits:
        https://neurips.cc/Conferences/2020/Budget

  • bougnaderedeter@alien.topB

    Reviewers are doing this for free on top of everything else in their lives. Shaming them publicly is just going to lead to fewer people reviewing

  • lifesthateasy@alien.topB

    Reddit posts are problematic.

    I’ve been an active participant in the Machine Learning subreddit for quite some time now, and lately, I’ve noticed a trend that’s been concerning. While the subreddit serves as an incredible hub for knowledge sharing and discussions around ML, there’s a growing issue with the quality and reliability of some posts.

    Numerous submissions lack proper context, thorough explanations, or credible sources, making it challenging for newcomers and even seasoned practitioners to discern accurate information from misinformation. This trend isn’t just about incomplete explanations; it also extends to the validity of claims made in these posts.

    It’s important to acknowledge that not all content falls into this category—there are incredible insights shared regularly. However, the influx of hasty, ill-explained, or unverified information is diluting the overall value the subreddit offers to the community.

    In a field as intricate as machine learning, accuracy and credibility are paramount. Misleading or incomplete information can misguide newcomers and even experts, leading to misconceptions or wasted efforts in pursuit of understanding or implementing certain techniques.

    Thus, after observing this trend over some time, I firmly believe that there is indeed a problematic issue with the quality and reliability of several Reddit posts within the Machine Learning subreddit. It’s a plea to the community to uphold standards of clarity, depth, and substantiation in discussions and submissions to maintain the subreddit’s integrity and credibility.

    • MLConfThrowaway@alien.topOPB

      These claims are very widespread. You can check previous conference threads in this subreddit and find people saying similar things. Every year there is some drama at some conference… plagiarism that slips past review (CVPR 2022), controversial decisions on papers (the last few ICML Best Paper awards). Complaints about the reviewing process are the reason venues like TMLR exist.

      My point is that there are already years and years of evidence that the reviewing system is broken. How much longer are junior researchers supposed to sit on their hands and act like it isn’t happening?

      • lifesthateasy@alien.topB

        Absolutely, there have been instances of controversy and concerns regarding the reviewing process at various conferences. However, it’s crucial to note that while these incidents do occur, they might not necessarily represent the entire system. Many conferences continuously strive to improve their review processes and address these issues. While acknowledging these problems is essential, it’s also important to engage constructively in efforts to make the system better, perhaps by actively participating in discussions or proposing reforms, rather than solely highlighting the flaws.

        • MLConfThrowaway@alien.topOPB

          Perhaps by actively participating in discussions or proposing reforms

          I proposed some solutions already; please join in the discussion if you want to contribute. I’m guessing from the slightly nonsensical and overly verbose responses that these comments are LLM-generated.

          • lifesthateasy@alien.topB

            The comments might differ in style, but they do address the issue. Engaging in thoughtful discourse can enrich conversations, even if the perspectives expressed aren’t in alignment with one’s own.

          • lifesthateasy@alien.topB

            While dissenting voices may have faced challenges historically, acknowledging their existence doesn’t discount the progress made in recognizing diverse perspectives over time.

  • ohdangggg@alien.topB

    Should institutions value reviewer awards more? After all, being able to review a paper well should be a useful skill.

    While rewarding positive behavior is good, IMO there also need to be negative consequences for bad reviewers. Poor reviews are endemic in the field and there are too many people (often famous people in the field, too) who write shoddy, low-quality reviews and are not punished.

    I think one concrete idea is to have ACs (or some other third party) rate reviewer quality, and if someone has low-quality reviews they should be banned from submitting to the conference for a year.

  • linearmodality@alien.topB

    The root of the problem is that there are just too many papers and too few well-qualified reviewers. In many other CS fields, papers are reviewed by program committee members who are typically tenure-track faculty. These faculty all know each other, and their personal reputations are on the line when they review. They also know the history of the field and have a broad understanding of what is novel and interesting in a space.

    In ML, we have PhD students, and sometimes even undergrads, doing the reviewing. They have much less experience than faculty. For the most part, they also have no intention of remaining in the academic community, so they have little incentive to build reviewing skills or a reputation as a reviewer. No wonder the reviews are bad and random.

    So the problem isn’t really solvable by changing the way the reviewing process works, because it’s not really a process problem.

    • Neighbor5@alien.topB

      Maybe we should make a dataset of top faculty reviewers and train a model on that dataset. Then that model can review papers. Unless there’s papers using the same model, in which case you need another model, and this model only reviews papers of the first model. The first model can review papers about the second model. Both models improve akin to stable GAN training. Then someone writes up this overall modeling and we enter a deeper layer of recursion.

  • RandomUserRU123@alien.topB

    Over and over again I see a paper that is more or less as good as many papers before it, and whether it squeaks in, gets an oral, or gets rejected seems to depend on luck. I have seen bad papers get in with faked data or other real faults because the reviewers were positive and inattentive.

    I agree, but the problem is also that faked data is incredibly hard, or even impossible, to spot with the current system. You would need to standardize the whole process (code requests, exact experiment descriptions, code explanations, a Docker image for reproducibility, computational cost, …). Then the reviewers would also need to run some of the experiments themselves (alongside additional experiments to make sure you are not cherry-picking results). This would take a tremendous amount of time and resources.
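    As a rough sketch of what one piece of that standardization could look like, here is a hypothetical “reproducibility manifest” that a venue might require alongside each submission, pinning the code, config, seed, and declared compute cost a reviewer would need to rerun the main experiment. The field names are invented; no conference currently mandates this format.

        # Hypothetical reproducibility manifest a venue could require per submission.
        # Field names are invented for illustration; this is not any conference's
        # actual requirement.
        import hashlib
        import json
        import platform
        import sys
        from pathlib import Path

        def file_sha256(path: str) -> str:
            """Hash an artifact (code archive, config, data snapshot) so reviewers can
            verify they are running exactly what the authors ran."""
            return hashlib.sha256(Path(path).read_bytes()).hexdigest()

        def build_manifest(code_archive: str, config_file: str, seed: int, gpu_hours: float) -> dict:
            """Collect the pinned details a reviewer would need to rerun the experiment."""
            return {
                "code_sha256": file_sha256(code_archive),
                "config_sha256": file_sha256(config_file),
                "random_seed": seed,
                "python_version": sys.version.split()[0],
                "platform": platform.platform(),
                "reported_gpu_hours": gpu_hours,  # declared compute cost, as suggested above
            }

        if __name__ == "__main__":
            # Toy example: write two tiny placeholder artifacts so the sketch runs end to end.
            Path("code.tar.gz").write_bytes(b"placeholder code archive")
            Path("config.yaml").write_text("lr: 0.001\nepochs: 10\n")
            manifest = build_manifest("code.tar.gz", "config.yaml", seed=0, gpu_hours=120.0)
            print(json.dumps(manifest, indent=2))

    Even with something like this in place, someone would still have to rerun the experiments, which, as you say, is where the real time and resource cost lies.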

  • Terrible_Button_1763@alien.topB

    The problem, from a theoretical perspective, is that many of the things you recommend might have unintended effects, and in fact the opposite of the intended effect.

    • Should reviewers have public statistics tied to their (anonymous) reviewer identity?
      • We do have public statistics tied to our actual profile (our name). You will run into the same reviewers again and again in the peer-review process. You’ll remember the person from a few years ago who was totally unreasonable on the same paper as you. You’ll remember that one reviewer who made a brilliant point that the meta-reviewer overrode, and who turned out to be right. Yes, you’ll also run into your former advisors in the peer-review process. Try raking a paper over the coals, only to find out later that your former advisor was a coauthor. Then some time later you and your advisor are reviewing the same paper side by side, and you have to decide whether you agree or disagree with them.
    • Should reviewers have their identities be made public after reviewing?
      • Not sure. This might be good for senior reviewers who take their job very seriously. But junior reviewers without much experience mess up all the time. Imagine having a social media post from when you were 14 be public, forever, and undeletable. That’s what it’s like to be a junior reviewer who messes up and has it on the public record. Peer review, like much else in academia, is an apprenticeship. You learn by doing, and that process requires an element of psychological safety that anonymity can provide.
    • Should institutions value reviewer awards more? After all, being able to review a paper well should be a useful skill.
      • I’d love this. We always need more people willing to review well and dispassionately.
    • Should institutions focus less on a small handful of top conferences?
    • Institutions do, and so does everyone else. The top-conference echo chamber is only for those who think the world revolves around ICML/NeurIPS/ICLR. I’ll let you in on a little secret: everyone at COLT laughs at your papers… don’t even get me started on what people at STOC think of them.
    • MLConfThrowaway@alien.topOPB

      I appreciate the response! I’m afraid I don’t understand what you mean when you say we have public statistics tied to our profile. Currently, reviews are tied to an anonymized name, and I’m suggesting we should be able to link that name to a history of past review scores, meta-reviews, and so on.

      I’ll let you in on a little secret: everyone at COLT laughs at your papers… don’t even get me started on what people at STOC think of them.

      I’ve heard this before! I never worked on anything that could be submitted to COLT or STOC; are the review processes different?

      • Terrible_Button_1763@alien.topB

        we have public statistics tied to our profile

        I meant that if you’re reviewing for a conference, all of the reviewers know each other’s names. This adds up over time, since you typically stay in the same or similar areas through many years of your career. Your reviewing personality and thinking become something that everyone informally keeps track of. I agree that ideally this should also be public as some sort of statistical measure for all to see… however, that runs into the complication above: it would not keep the reviewing apprenticeship safe for newer, less experienced reviewers.

        I’ve heard this before! I never worked on anything that could be submitted to COLT or STOC; are the review processes different?

        Well, it’s hard for any one person to speak on behalf of two entire conferences and two entire sub-communities. What I can say is that the quality, impact, and rigor of papers at COLT or STOC are far higher. They’re also incredibly challenging to publish in. Even toward the final years of your PhD, you’ll still be mostly supervised and mostly learning the relatively low-level details of doing theory work. It simply takes forever to develop the mental abstractions needed to do theory work at all. Then the real challenge becomes: which theory problems do you want to solve? Which problems are worth solving? What do people in these communities want to see, and what would surprise them?