What I currently see is that many companies try to create a “GPT”, a model that basically competes with OpenAI’s GPT models or Claude. The problem, in my opinion, is that small open source projects with just a few people working on them have very limited resources. Even with 10 A100s with 80 GB of VRAM each, you will never come close to the computing power and manpower you need to actually build such a model. And once you go above 13 billion parameters, you already have the problem that over 99% of humanity cannot run your model.
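To put rough numbers on the size problem, here is a back-of-the-envelope sketch. It assumes memory use is dominated by the weights and ignores the KV cache, activations, and framework overhead, so real usage will be higher. The function name and the chosen bit widths are just my illustration, not anything from a specific library.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same precision and
    nothing else (KV cache, activations) is counted.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # 16-bit is typical for unquantized weights; 8-bit and 4-bit are
    # common quantization levels.
    for bits in (16, 8, 4):
        print(f"13B at {bits}-bit: ~{weight_memory_gb(13, bits):.1f} GB")
```

So a 13B model needs roughly 26 GB for the weights at 16-bit and about 6.5 GB at 4-bit, before counting anything else.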
Yes, you can run it on Colab, but then your users are dependent on Colab, so to speak. If Colab pulls the plug, it stops working. If it’s hosted by another company and that company pulls the plug, it stops working too. So, in my opinion, people should create models that are focused on something. A basic example: Japanese-to-English translation. Or maybe a model that is really good with historical facts. Every additional capability costs additional parameters, which makes it harder and harder to actually load the entire model. If this goes on, I don’t think we will see any development that is really beneficial. This is not me being a doomer and saying “oh, no, it will never work”, but unless new technology is released that makes it possible to actually get something equivalent to roughly 300 billion parameters working, in my opinion, it’s useless.
We need to actually do something with what we have. I think open source projects should pick a focus and spend their 13 billion parameters on one very specific area, so the model performs amazingly at that subject. Let the big general model be Llama 3 from Meta; I think it’s impossible to get something like GPT-3.5 or GPT-4 with open-source methods. Some of the best models right now are Llama and Mistral, both from companies that are now worth hundreds of millions or billions.
You can certainly try to finetune what has been released, like the new Llama models, or try to modify them, but I see so many models being released that basically nobody uses, or that don’t really have a use.
What do you all think about it? After testing out so many different models, I just think that the goals these small teams set for themselves are simply not achievable, and that they should instead try to create something that is amazing at one thing.
TLDR: I think open source projects should focus on being very good at certain tasks instead of being good at “everything”.
Edit: when I say open source, I mean the small teams of just a few people with a few A100s, not the open-source models from Mistral and Meta.
I wouldn’t even mind. Loading a 23B model takes 8 seconds, and it then generates at 28 t/s on a 3090 Ti. Models with fewer parameters go even faster.
That’s heaps faster than loading a 70B model and having it take 6 minutes to generate a single reply at 1.1 t/s, for something that might not even be correct because you didn’t prompt it right.
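For comparison, a quick sketch of the arithmetic behind those numbers. It assumes the 6 minutes is pure generation time (no load time or prompt processing) and that both models would produce a reply of the same length; the helper names are made up for illustration.

```python
def tokens_generated(seconds: float, tokens_per_second: float) -> float:
    """Number of tokens produced in a given time at a steady rate."""
    return seconds * tokens_per_second


def generation_seconds(tokens: float, tokens_per_second: float) -> float:
    """Time needed to produce a given number of tokens at a steady rate."""
    return tokens / tokens_per_second


if __name__ == "__main__":
    # 6 minutes at 1.1 t/s on the 70B model
    reply_tokens = tokens_generated(6 * 60, 1.1)
    # the same reply length at 28 t/s on the 23B model
    fast_seconds = generation_seconds(reply_tokens, 28)

    print(f"Reply length: ~{reply_tokens:.0f} tokens")
    print(f"Same reply at 28 t/s: ~{fast_seconds:.0f} s")
    print(f"Speedup: ~{28 / 1.1:.0f}x")
```

Under those assumptions the reply is about 396 tokens, which the 23B model would finish in roughly 14 seconds instead of 6 minutes.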
What 23B model are you running?