Brudaks@alien.top to Machine Learning@academy.garden • Are LLMs at a practical limit for layer stacking? [D]
1 · 1 year ago

The bottleneck is the total compute budget devoted to training. I'm quite certain that stacking a few more layers can be done and would have some benefit, but spending that same extra compute on a larger context window, 'wider' layers, or simply more passes over the same data might well give a larger benefit than adding depth. And if the people training the very large models think so, they'll do those other things instead of stacking more layers.
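As a back-of-the-envelope illustration of that tradeoff (my own sketch, not part of the comment): using the common approximations that a transformer block has roughly 12·d_model² parameters and that training costs about 6·params·tokens FLOPs, you can compare a deeper-but-narrower configuration against a shallower-but-wider one under the same token budget. The specific layer counts and widths below are made up for illustration.

```python
# Rough sketch: deeper vs. wider under a comparable parameter/compute budget.
# Ignores embeddings, biases, and attention-specific FLOP terms.

def block_params(d_model: int) -> int:
    # ~12 * d_model^2 per block (attention projections + 4x-expansion MLP)
    return 12 * d_model ** 2

def model_params(n_layers: int, d_model: int) -> int:
    return n_layers * block_params(d_model)

def train_flops(params: float, tokens: float) -> float:
    # Common estimate: ~6 FLOPs per parameter per training token
    return 6.0 * params * tokens

deep = model_params(n_layers=48, d_model=4096)  # deeper, narrower
wide = model_params(n_layers=32, d_model=5120)  # shallower, wider

tokens = 300e9  # fixed token budget for both configurations
print(f"deep: {deep/1e9:.1f}B params, {train_flops(deep, tokens):.2e} train FLOPs")
print(f"wide: {wide/1e9:.1f}B params, {train_flops(wide, tokens):.2e} train FLOPs")
```

The two configurations land at roughly the same parameter count and training cost, which is exactly why "more layers" has to compete with every other use of the budget rather than being free extra capacity.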
Cross-validation is a reasonable alternative; however, it either increases your compute cost 5-10x or, more likely, means you train 5-10x smaller models, which are worse than the single model you could have built with the same budget.
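A minimal sketch of the arithmetic behind that 5-10x figure (my own illustration, with a normalized budget): k-fold cross-validation either multiplies training cost by k, or, at fixed total cost, leaves each fold's model with only 1/k of the compute.

```python
# Normalized cost of one full training run.
C = 1.0

def cv_total_cost(single_run_cost: float, k: int) -> float:
    # Keep model and data size fixed and train one model per fold.
    return k * single_run_cost

def per_fold_budget(total_budget: float, k: int) -> float:
    # Keep the total budget fixed and split it across the folds.
    return total_budget / k

for k in (5, 10):
    print(f"k={k}: {cv_total_cost(C, k):.0f}x total cost at full size, "
          f"or {per_fold_budget(C, k):.2f}x compute per model at fixed cost")
```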