As the title says, I'm curious whether grokking has been demonstrated in LLMs. Could it be happening with GPT-4?

  • yannbouteiller@alien.topB
    10 months ago

    Am I correct to say that "grokking" is apparently an effect of regularization, i.e. reaching good generalization by pushing the weights to be as small as possible, until the model's effective capacity is smaller than the dataset?
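
    To make the regularization mechanism concrete, here's a minimal toy sketch (an illustration of L2 weight decay only, not a grokking reproduction; the original grokking experiments trained small transformers on modular-arithmetic tasks). It shows that adding an L2 penalty to plain gradient descent pulls the solution toward a smaller weight norm, which is the pressure hypothesized to drive the delayed generalization:

    ```python
    import math
    import random

    def train(weight_decay, steps=500, lr=0.1, seed=0):
        """Toy 1-D logistic regression: label is 1 iff x > 0."""
        rng = random.Random(seed)
        data = [(x, 1.0 if x > 0 else 0.0)
                for x in (rng.uniform(-1, 1) for _ in range(64))]
        w, b = 3.0, 0.0  # deliberately oversized initial weight
        for _ in range(steps):
            gw = gb = 0.0
            for x, y in data:
                p = 1.0 / (1.0 + math.exp(-(w * x + b)))
                gw += (p - y) * x
                gb += (p - y)
            gw = gw / len(data) + weight_decay * w  # L2 penalty gradient term
            gb /= len(data)
            w -= lr * gw
            b -= lr * gb
        return w

    w_plain = train(weight_decay=0.0)
    w_decay = train(weight_decay=0.1)
    # On separable data the unregularized weight keeps growing, while weight
    # decay settles at a smaller norm: |w_decay| < |w_plain|.
    print(abs(w_plain), abs(w_decay))
    ```

    The same tug-of-war happens in the grokking setup, just with many more parameters: the loss gradient wants large weights that memorize, the decay term wants small ones, and the small-norm solution that eventually wins is the one that generalizes.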

  • Alt-Depixelator-777@alien.topB
    10 months ago

    "…grass groks being walked on…" LLMs do not grok, nor do they grok grokking, but mainly, they do not grok "not grokking".