Hi all.
I was researching generative model evaluation and found this post interesting: https://deepsense.ai/evaluation-derangement-syndrome-in-gpu-poor-genai
A lot of it matches what I see happening in the industry, and it feels like a good fit here.
Idk man, I’ve seen some pretty sketchy papers this year.
Like what?
I mean, there are always sketchy papers because of p-hacking. But I doubt there are papers that don’t have a proper evaluation at all.
I mean, the evaluation process itself is an active field of research…
That’s kind of what my original comment was all about.