What I currently see is that many companies try to create a “GPT”: a model that is basically competing with OpenAI’s GPT models or Anthropic’s Claude. The problem, in my opinion, is that these open-source projects with just a few people working on them have very limited resources. Even if they have 10 A100s with 80 GB of VRAM each, they will never come close to the computing power and manpower needed to actually build such a model. And if you go above 13 billion parameters, you already have the problem that over 99% of humanity cannot run your model.
While, yes, you can run it on Colab, you then have the problem that people are dependent on Colab, so to speak. If Colab pulls the plug, it doesn’t work. If it’s hosted by another company and that company pulls the plug, it doesn’t work anymore. So, in my opinion, people should focus on creating models that are focused on something. A basic example: Japanese-to-English translation. Or maybe a model that is really good with historical facts. Because every additional capability means additional parameters, which makes it harder and harder to actually load the entire model. If this goes on, in my opinion, we will not see any development that is actually beneficial. And this is not me being a doomer and saying “oh no, it will never work,” but unless new technology is released that specifically makes it possible to get something equal to 300 billion parameters or so working, in my opinion, it’s useless.
We need to actually do something with what we have. I think open-source projects should focus on something and then spend all 13 billion parameters on something hyper-focused on a very specific area, allowing the model to perform amazingly at that subject. Let the big thing be Llama 3 from Meta, but I think it’s impossible to get something like GPT-3.5 or GPT-4 with open-source methods. Some of the best models currently are Llama and Mistral… both from companies now worth billions or hundreds of millions.
You can certainly try to fine-tune what is released, like the new Llama models, and try to modify them, but I see so many models being released that basically nobody uses, or that have no real use.
What do you all think about it? After testing out so many different models, I just think the goals these small teams set for themselves are simply not achievable, and that they should instead try to create something that is amazing at one thing.
TLDR: I think open source projects should focus on being very good at certain tasks instead of being good at “everything”.
Edit: when I say open source, I mean the small teams of just a few people with a few A100s, not the open-source models from Mistral and Meta.
I think you are 100% correct!
I was just thinking of doing something. Gotta still find the compute power to do what I want to though.
This is the right answer. I’ve been building enterprise LLM solutions for the last 9 months, and there’s a ton of use cases in healthcare and finance-related BI.
Lots and lots of classification and labeling work that requires domain specific context.
I’m finding less work in the avenue of generating ‘content’, and a ton in what is effectively workflow solutions or business process automation.
At this point I’m advising my company to avoid chatbot jobs altogether, as they seem to be low value.
So do those enterprises generally build their own small but targeted model? Or is it more fine-tuning with an existing LLM as a base?
Wait until you realize that they can generate amino acid sequences too.
The big company gives you a base model and then it’s up to you to do that.
I’ve seen some agent and medical tunes. Smaller models for image and vision or tts, etc. Anyone doing it for a specific business case is probably not posting the model or advertising it.
Are you asking for people to make specialized 1.3b models from scratch? Because I think even that takes a long time on a “few” A100s.
Say, these finetunes are all merged-LoRA stuff, aren’t they? Is nobody just continuing regular training with their own dataset?
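For context, “merging” a LoRA just means folding the low-rank update back into the base weights: W' = W + scale · (B @ A). Here’s a toy numeric sketch of that arithmetic (pure-Python matrices with made-up shapes and values, just for illustration — real tooling does this on tensors):

```python
# Toy illustration of merging a LoRA update into base weights:
# W' = W + (alpha / r) * (B @ A), where B is d x r and A is r x d with small rank r.

def matmul(X, Y):
    """Naive matrix multiply on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def merge_lora(W, A, B, alpha=16, r=8):
    """Fold the scaled low-rank update B @ A into the base weight matrix W."""
    delta = matmul(B, A)          # d x d low-rank update
    scale = alpha / r             # standard LoRA scaling factor
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]      # base weight (2 x 2, hypothetical)
B = [[0.5], [0.0]]                # 2 x 1 (rank 1)
A = [[0.0, 0.5]]                  # 1 x 2
merged = merge_lora(W, A, B, alpha=2, r=1)
print(merged)  # [[1.0, 0.5], [0.0, 1.0]]
```

After merging, the adapter is gone and you just have a single weight matrix again — which is why a merged-LoRA finetune looks like an ordinary model checkpoint, even though only a tiny number of parameters were actually trained.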
Perhaps we should have, like, a hundred different 7B models for different categories (history, arts, science, etc.), and then above that a new layer with a generic LLM that parses the question into the correct category, and finally the right 7B model loads into your VRAM? :D With the fastest NVMe (not sure if DirectStorage would help, probably not?) the waiting might not be too terrible, unless every question you ask lands in a different category.
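The dispatch layer in that idea can be sketched in a few lines. Here’s a toy version where a cheap keyword classifier picks the category and only that category’s specialist would get loaded — the categories, keywords, and model paths are all made up for illustration (a real router would itself be a small model, not keyword matching):

```python
# Toy sketch of the "category router" idea: a cheap classifier picks a
# category, then only that category's specialist model is loaded into VRAM.

CATEGORY_KEYWORDS = {
    "history": {"war", "empire", "century", "revolution", "dynasty"},
    "science": {"quantum", "cell", "energy", "theorem", "molecule"},
    "arts":    {"painting", "symphony", "novel", "sculpture", "poem"},
}

def route(question: str) -> str:
    """Pick the category whose keyword set overlaps the question the most."""
    words = set(question.lower().split())
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"   # fall back to a generic model

def model_path(category: str) -> str:
    # In the real setup this is where the 7B specialist would be streamed
    # from NVMe into VRAM; here we just return a hypothetical path.
    return f"/models/{category}-7b.gguf"

print(route("Explain quantum tunneling"))                 # science
print(model_path(route("Explain quantum tunneling")))     # /models/science-7b.gguf
```

The swap cost only hurts when consecutive questions hit different categories, which matches the caveat above — a conversation that stays on one topic pays the load time once.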
I wouldn’t even mind, loading in a 23B model takes 8 seconds and then generates at 28t/s using a 3090ti. Lower parameters go even faster.
That’s heaps faster than loading a 70B model and waiting 6 minutes for it to generate a single reply at 1.1 t/s, for something that might not even be correct because you didn’t prompt it right.
What 23B model are you running?
I think the main problem is the GPU resources needed to train a model from scratch. Fine-tuning requires a fraction of the time compared to full training, hence why there are a lot of GPT-like models and almost no specialised ones.
And here we are again: https://en.wikipedia.org/wiki/Single-responsibility_principle
100% correct tho. Individual devs only have so much time and money.
Small specialized LLMs are going to be a thing the same way using frameworks is now.
Makes sense. Similar to Python with its modules: there should be one base, and we can add “modules” with specific knowledge.
I think most targeted models are going to be too targeted for anyone other than the organization that trained them to use. I can’t really see a 7B translation model being worth anything vs. GPT or Google Translate. (And for real translation, you really want something that can answer questions about connotations/puns/whatever, which requires general understanding.)
I’m all for specialized models but it would be super depressing if it’s the year 2025, 2030, 2050, 3100 and ChatGPT circa 2022 is still crushing open source models.
For a year or two that gulf may not be a big deal, but the further we get from 2022, the less useful it will be to train and fine-tune models that still lag behind OpenAI’s circa-2022 models. By 2030 there may be no point at all. Maybe even earlier.
I disagree. I think narrow models are the previous ML generation. This generation is defined by its generalisability; AGI is, after all, just a machine that can solve any problem that can be solved by computers. This is what we are competing with. So if you want to compete, and I think it’s a good thing for the human race if we do compete, then you need to compete on the same level of abstraction.
If you want narrow AI then we already have all the state of the art tools that can do this, just be prepared to know linear algebra inside-out.
I do agree in the sense that if you want to bring real value you need to be practical, but you need to keep your eyes on the prize in the long run.
I think the reason this is the case is benchmarks: there are no benchmarks that verify most specific tasks or knowledge areas for AI models. Most model fine-tuning companies are trying to show they know how to fine-tune models against current benchmarks to get more funding from private equity. Unless more benchmarks are developed, this will not change.
https://github.com/XueFuzhao/OpenMoE
Mixture of Experts may help with getting more performance from smaller models.
Maybe the best way is to have multiple models combined by a “router model” like Medusa.
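The gating behind that kind of setup can be shown with a toy example. In a mixture-of-experts layer, a router scores each expert for the input, softmax turns the scores into weights, and only the top-k experts contribute to the output. Everything below (the scalar "expert outputs" and router scores) is invented just to show the mechanism:

```python
import math

# Toy sketch of mixture-of-experts gating: softmax over router scores gives
# gate weights, and the output is the weighted sum of the top-k experts.
# Experts here are just scalars, for illustration; in a real MoE layer they
# are feed-forward networks and this happens per token.

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_output(expert_outputs, router_scores, top_k=2):
    """Combine the top-k experts' outputs, weighted by renormalized gates."""
    gates = softmax(router_scores)
    top = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)[:top_k]
    norm = sum(gates[i] for i in top)          # renormalize over chosen experts
    return sum(gates[i] / norm * expert_outputs[i] for i in top)

outputs = [1.0, 10.0, 100.0]   # what each "expert" produced (made up)
scores = [0.1, 2.0, 0.1]       # router strongly prefers expert 1
print(moe_output(outputs, scores, top_k=1))   # 10.0 (only the top expert fires)
```

The appeal for the small-team case is that only the selected experts run per token, so total parameters can grow without the per-token compute growing with them.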