I started my PhD in NLP a year or so before the advent of Transformers, and finished it just as ChatGPT was unveiled (I literally defended a week before). Halfway through, I felt the sudden acceleration of NLP, where there was suddenly so much, everywhere, all at once. Before that, knowing one's domain and the state-of-the-art GCN, CNN or BERT architectures was enough.
Since then, I've been working in a semi-related area (computer-assisted humanities) as a data engineer/software developer/ML engineer (it's a small team, so many hats). I haven't followed much of the latest news, so I recently tried to get back up to speed with the field.
But there are so many developments! Everywhere. Even just in NLP, without considering all the other fields such as reinforcement learning, computer vision, the fundamentals of ML, etc. It is damn near impossible to gather an in-depth understanding of any one model: they are complex, numerous, and each is built on top of others, so you also need to read up on those to understand anything. I follow some people on LinkedIn who just drop new model names every week or so. Digging through papers at top conferences is also daunting, since there is no guarantee that an award-winning paper will translate into an actual system, while companies churn out new architectures without making the research paper or methodology public. It's overwhelming.
So I guess my question is twofold: how does one get up to speed after a year or so away from the field? And how does one keep up after that?
You don’t. The process is broken, but nobody cares anymore.
Now, if by any chance, for any absolutely crazy reason, you're someone who's actually curious about understanding the foundations of ML, who wants to deeply reason about why ReLU behaves differently from ELU, or, I don't know, who questions why some models with 90 billion parameters behave almost the same as a model compressed by a factor of 2000x that only loses 0.5% of accuracy (in brief, the science behind it all), then you're absolutely doomed.
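(For anyone who hasn't looked at these in a while: the ReLU/ELU difference the commenter alludes to is entirely in how negative inputs are treated. A minimal NumPy sketch, just the textbook definitions, nothing from any specific paper:)

```python
import numpy as np

def relu(x):
    # ReLU: hard zero for x < 0 (units can "die"), identity otherwise
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    # ELU: smooth negative saturation toward -alpha instead of a hard zero
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))  # [0.  0.  0.  1.5]
print(elu(x))   # negative inputs map to values in (-1, 0), positives pass through
```

The "why one behaves better than the other" question (gradient flow, bias shift) is exactly the kind of foundational issue the rant says nobody has time for.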
In ML (well, DL, since you mention NLP), the name of the game is improving some "metric" with an aesthetically appealing name but not-so-strong underlying development (fairness, perplexity). All of this, of course, using 8 GPUs, 90B parameters, and zero replications of your experiment. OK, let's be fair: there are indeed some papers that replicate their experiments, a grand total of... 10... times. "The boxplot shows our median is higher; we won't comment on its variance, we'll leave that for future work."
So, yes…that’s the current state of affairs right there.
Hold on, why is it useless to understand why a model that is 2000x smaller has only a 0.5% reduction in accuracy? Isn't that insanely valuable?
It is absolutely valuable. But the mainstream is more interested in beating the next metric than in investigating why such phenomena happen. To be fair, there are quite a few researchers trying to do exactly that; I've read a few papers in that direction.
But the thing is, in order to experiment with it you need 40 GPUs, and the people with 40 GPUs available are more worried about other things. That was the whole gist of my rant...
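(That said, the basic mechanism behind compression with tiny accuracy loss can be poked at without any GPUs. A toy sketch of symmetric int8 weight quantization on a random matrix, illustrative only; real 2000x compression stacks pruning, quantization, and distillation on trained networks, which a random matrix can't reproduce:)

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "layer": one weight matrix and one input vector.
W = rng.normal(0.0, 1.0, (512, 512)).astype(np.float32)
x = rng.normal(0.0, 1.0, 512).astype(np.float32)

def quantize_int8(W):
    # Symmetric 8-bit quantization: store weights as int8 plus one float scale.
    scale = np.abs(W).max() / 127.0
    q = np.round(W / scale).astype(np.int8)
    return q, scale

q, scale = quantize_int8(W)
y_full = W @ x
y_quant = (q.astype(np.float32) * scale) @ x  # dequantize, then apply

# float32 -> int8 is a 4x size reduction; the output barely moves.
rel_err = np.linalg.norm(y_quant - y_full) / np.linalg.norm(y_full)
print(f"4x smaller weights, relative output error: {rel_err:.4f}")
```

Understanding *why* the error stays this small (weights clustering near zero, error averaging out across many terms) is precisely the kind of question that needs careful study rather than another leaderboard entry.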
You managed to put into words what bugs me about the field nowadays. What kills me most is that third paragraph you wrote: no one cares what the model does IRL, only how it improves a metric on a benchmark task and dataset. When the measure becomes the objective, you're not doing proper science anymore.
The doomed student is me :’(