• 0 Posts
  • 7 Comments
Joined 1 year ago
Cake day: October 27th, 2023

  • It is absolutely valuable. But the mainstream is more interested in beating the next metric than in investigating why such phenomena happen. To be fair, there are quite a few researchers trying to do that; I’ve read a few papers in that direction.

    But the thing is, in order to experiment with it you need 40 GPUs, and the people with 40 GPUs available are more worried about other things. That was the whole gist of my rant…


  • You don’t. The process is broken, but nobody cares anymore.

    1. Big names and labs want to maintain the status quo = churning papers out (and fighting on Twitter…erm, X, of course).
    2. If you’re a Ph.D. student, you just want to get the hell out of there and hopefully ride the wave a bit and make something of it = going along and churning some papers out.
    3. If you’re a researcher in a lab, you don’t really care as long as you try something that works, and eventually you have to prove in the yearly (or bi-yearly, or whatever) review that you actually did some work = churning out whatever papers you can.

    Now, if by any chance, for some absolutely crazy reason, you’re someone who’s actually curious about understanding the foundations of ML, who wants to deeply reason about why ReLU behaves the way it does compared to ELU, or, I don’t know, who questions why a model with 90 billion parameters behaves almost the same as one compressed by a factor of 2000x while losing only 0.5% of accuracy, in brief, the science behind it all, then you’re absolutely doomed.
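    For concreteness, the two activations differ only in how they treat negative inputs, and that tiny difference is exactly the kind of thing nobody bothers to explain. A minimal sketch (the α = 1.0 default is the common convention, assumed here):

    ```python
    import math

    def relu(x: float) -> float:
        # ReLU: identity for positive inputs, hard zero for negative ones
        return max(0.0, x)

    def elu(x: float, alpha: float = 1.0) -> float:
        # ELU: identity for positive inputs, smooth saturation toward -alpha
        return x if x > 0 else alpha * (math.exp(x) - 1.0)

    # Identical on the positive side...
    print(relu(2.0), elu(2.0))    # 2.0 2.0
    # ...but ReLU kills negative inputs (and their gradients) outright,
    # while ELU keeps a small, smooth negative signal.
    print(relu(-1.0), elu(-1.0))  # 0.0 -0.632...
    ```

    Why one of those choices trains better than the other in a given setting is precisely the kind of “why” question that doesn’t move a leaderboard.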

    In ML… (DL really, since you mention NLP), the name of the game is improving some “metric” with an aesthetically appealing name but not much solid development underneath (fairness, perplexity). All of it, of course, using 8 GPUs, 90B parameters, and zero replications of your experiment. OK, let’s be fair, there are indeed some papers that replicate their experiments, a grand total of…10…times. “The boxplot shows our median is higher; I won’t comment on its variance, we’ll leave that for future work.”
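    The median-vs-variance complaint is easy to see with made-up numbers (the accuracy values below are hypothetical, purely for illustration):

    ```python
    import statistics

    # Hypothetical accuracies over 5 runs each (illustrative, not from any paper)
    baseline = [0.70, 0.71, 0.72, 0.70, 0.71]
    proposed = [0.65, 0.80, 0.74, 0.68, 0.76]

    # "Our median is higher" — technically true...
    print(statistics.median(proposed), statistics.median(baseline))  # 0.74 0.71
    # ...but the spread tells the real story: the proposed method's runs
    # swing far more than the gap between the two medians.
    print(statistics.stdev(proposed), statistics.stdev(baseline))
    ```

    With 10 runs and a spread that size, the “improvement” is indistinguishable from noise, which is exactly what leaving the variance “for future work” conveniently avoids saying.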

    So, yes…that’s the current state of affairs right there.