  • I think that’s not the whole story. The smaller increments can lead to “course changes” that would not have happened otherwise, letting the optimization slip into other local minima and so on. It’s not just several small steps instead of one big one: the straight line of the big step becomes a curve, capable of bringing you to an entirely different place, because the whole dataset can have its impact before a giant leap commits to a single direction (toy sketch below). As a layman, maybe I’ve got this wrong, but I really don’t see how you can categorically dismiss the possibility of creating a much more robust and effective architecture this way, instead of essentially jumping to conclusions and then somewhat patching them up.
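    Here’s a toy sketch of what I mean (the 1-D “loss” and all the numbers are made up for illustration): one big step along the initial gradient travels in a straight line and can overshoot into a different basin, while many small steps re-evaluate the gradient along the way and curve toward the nearby minimum.

    ```python
    # Gradient of a toy non-convex "loss" f(x) = x**4 - 3*x**2 + x,
    # which has minima near x = -1.30 and x = +1.13 and a local
    # maximum between them near x = 0.17. Purely illustrative.
    def grad(x):
        return 4 * x**3 - 6 * x + 1

    x0, lr, n = 1.6, 0.01, 30

    # One big step: spend the whole budget (n * lr) along the gradient
    # at the start point. The path is a straight line, so it can jump
    # clear over the barrier into the other basin.
    x_big = x0 - (n * lr) * grad(x0)

    # Many small steps: the gradient is re-evaluated after every step,
    # so the path curves and settles into the basin it started in.
    x_small = x0
    for _ in range(n):
        x_small -= lr * grad(x_small)

    print(f"one big step lands at    {x_big:+.3f} (left basin)")
    print(f"many small steps end at  {x_small:+.3f} (right basin)")
    ```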

  • I don’t agree with the assumption that there is pressure for companies like MS to reduce costs via local models. Compute on the gamer’s PC is probably the biggest problem right now, especially since in a game pretty much all of the hardware is already used to the limit. And then you throw a 10 GB LLM on top, maybe even loading different finetunes for different jobs, plus a TTS model? That doesn’t result in reasonable response times any time soon, at least not with somewhat general-purpose models (rough numbers in the sketch below).

    On the other hand, that’s something MS must like a whole lot. What you see as “optimizing costs” is, from their perspective, optimizing their profit away: they can sell that compute to you. That’s great for them, not something to be optimized away. And it’s the best DRM ever, too.
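    Some rough numbers to make that concrete (every figure here is an assumption for illustration, not a benchmark of any real setup):

    ```python
    # Back-of-envelope check: does a local LLM fit next to a running game?
    vram_total_gb = 12   # a typical current gaming GPU (assumed)
    vram_game_gb  = 8    # what a demanding game already occupies (assumed)
    model_gb      = 10   # quantized 7B-class LLM weights (assumed)

    headroom_gb = vram_total_gb - vram_game_gb
    print(f"free VRAM: {headroom_gb} GB, model needs: {model_gb} GB")
    # -> 4 GB free vs. 10 GB needed: the model alone does not fit

    # Even if it fit, generation slows down while the GPU is busy rendering.
    tokens_needed  = 60  # a short NPC reply (assumed)
    tokens_per_sec = 5   # throughput while sharing the GPU (assumed)
    print(f"reply latency: ~{tokens_needed / tokens_per_sec:.0f} s")
    ```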