In my opinion open-source projects should focus on a very narrow thing, instead of trying to be a "GPT" that can do everything.

GodEmperor23@alien.top · 2 years ago

In my opinion open-source projects should focus on a very narrow thing, instead of trying to be a "GPT" that can do everything.

Tridente@alien.top · 2 years ago

I think you are 100% correct!

nerdyvaroo@alien.top · 2 years ago

I was just thinking of doing something. Gotta still find the compute power to do what I want to though.

Defektivex@alien.top · 2 years ago

This is the right answer. I’ve been building enterprise LLM solutions for the last 9 months, and there are a ton of use cases in healthcare and finance-related BI.

Lots and lots of classification and labeling work that requires domain specific context.

I’m finding less work in the avenue of generating ‘content’, and a ton in what is effectively workflow solutions or business process automation.

At this point I’m advising my company to avoid chatbot jobs altogether as they seem to be low value.

Drited@alien.top · 2 years ago

So do those enterprises generally build their own small but targeted model? Or is it more fine-tuning with an existing LLM as a base?

MaxwellsMilkies@alien.top · 2 years ago

Wait until you realize that they can generate amino acid sequences too.

a_beautiful_rhind@alien.top · 2 years ago

The big company gives you a base model and then it’s up to you to do that.

I’ve seen some agent and medical tunes. Smaller models for image and vision or TTS, etc. Anyone doing it for a specific business case is probably not posting the model or advertising it.

Are you asking for people to make specialized 1.3b models from scratch? Because I think even that takes a long time on a “few” A100s.

involviert@alien.top · 2 years ago

Say, these finetunes are all merged LoRA stuff, aren’t they? Is nobody doing stuff where you just continue regular training with your own dataset?

Dangerous_Injury_101@alien.top · 2 years ago

Perhaps we should have like a hundred different 7B models for different categories like history, arts, science etc., and then above that there’s a new layer where a generic LLM parses the question into the correct category, and then finally the correct 7B model loads into your VRAM? :D Like if you had the fastest NVMe (not sure if DirectStorage would help, probably not?) perhaps the waiting wouldn’t be too terrible, unless every one of your questions is in a different category.
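
A rough sketch of the routing idea described above, in Python. Everything here is hypothetical: the category keywords, the `_load_model` stub (which stands in for actually reading a 7B checkpoint into VRAM), and the keyword-matching `route` function (the comment suggests a generic LLM would do this step). It only shows the shape of the idea: classify the question, then keep the most recently used specialist loaded so that consecutive questions in the same category skip the reload.

```python
from collections import OrderedDict

# Hypothetical category -> keyword map; a real router would be a small LLM or classifier.
CATEGORY_KEYWORDS = {
    "history": {"war", "empire", "century", "revolution"},
    "arts": {"painting", "sculpture", "composer", "novel"},
    "science": {"atom", "protein", "gravity", "enzyme"},
}


def route(question, default="science"):
    """Pick the category whose keywords overlap the question the most."""
    words = set(question.lower().replace("?", " ").split())
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


class ModelCache:
    """Keeps at most `capacity` specialists loaded, evicting the least recently used."""

    def __init__(self, capacity=1):
        self.capacity = capacity
        self.loaded = OrderedDict()
        self.load_count = 0  # how many (slow) loads from disk have happened

    def _load_model(self, category):
        # Stand-in for loading a checkpoint from disk into VRAM.
        self.load_count += 1
        return lambda q: f"[{category}-7B] answer to: {q}"

    def get(self, category):
        if category in self.loaded:
            self.loaded.move_to_end(category)  # mark as most recently used
        else:
            if len(self.loaded) >= self.capacity:
                self.loaded.popitem(last=False)  # evict least recently used
            self.loaded[category] = self._load_model(category)
        return self.loaded[category]


def answer(cache, question):
    """Route the question, fetch (or load) the specialist, and run it."""
    return cache.get(route(question))(question)
```

With `capacity=1`, two history questions in a row trigger a single load, while switching to a science question triggers a second one, which is the "every question in a different category" worst case mentioned above.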

Conflictx@alien.top · 2 years ago

I wouldn’t even mind, loading in a 23B model takes 8 seconds and then generates at 28t/s using a 3090ti. Lower parameters go even faster.

That’s heaps faster than loading a 70B model and having it take 6 minutes to generate a single reply at 1.1t/s, for something that might not even be correct because you didn’t prompt it right.
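
A back-of-the-envelope check of those numbers in Python. It assumes the 6-minute reply was generated at a steady 1.1 t/s and that the smaller model would produce a reply of the same length; the function names are made up for illustration.

```python
def reply_tokens(seconds, tokens_per_second):
    """Tokens produced when generating at a steady rate for `seconds`."""
    return seconds * tokens_per_second


def total_seconds(tokens, tokens_per_second, load_seconds=0.0):
    """Wall-clock time to load a model and then generate `tokens` tokens."""
    return load_seconds + tokens / tokens_per_second


# 70B case from the comment: 6 minutes at 1.1 t/s.
tokens_70b = reply_tokens(6 * 60, 1.1)  # about 396 tokens

# Smaller model from the comment: 8 s load, then 28 t/s for the same reply length.
small_total = total_seconds(tokens_70b, 28, load_seconds=8)  # about 22 seconds

speedup = (6 * 60) / small_total  # roughly 16x, including the load time
```

So even if the model has to be reloaded for every single question, the smaller model finishes the same-length reply in well under half a minute.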

iChrist@alien.top · 2 years ago

What 23B model are you running?

ithkuil@alien.top · 2 years ago

https://github.com/XueFuzhao/OpenMoE

slifeleaf@alien.top · 2 years ago

I think the main problem is the GPU resources needed to train a model from scratch. Finetuning requires a fraction of the time in comparison to training, which is why there are a lot of GPT-like models and almost no specialised models.

blueeyedlion@alien.top · 2 years ago

And here we are again: https://en.wikipedia.org/wiki/Single-responsibility_principle

100% correct tho. Individual devs only have so much time and money.

KeyAdvanced1032@alien.top · 2 years ago

Small specialized LLMs are going to be a thing the same way using frameworks is now.

TestPilot1980@alien.top · 2 years ago

Makes sense. Similar to Python with its modules, there should be one base, and we can add ‘modules’ with specific knowledge bases.

Ansible32@alien.top · 2 years ago

I think most targeted models are going to be too targeted for anyone other than the organization that trained them to use. I can’t really see a 7B translation model being worth anything vs. GPT or Google Translate. (And for real translation, you really want something which can answer questions about connotations/puns/whatever, which requires general understanding.)

sickvisionz@alien.top · 2 years ago

I’m all for specialized models but it would be super depressing if it’s the year 2025, 2030, 2050, 3100 and ChatGPT circa 2022 is still crushing open source models.

For a year or two that gulf may not be a big deal, but the further we get from 2022, the less useful it will be to train and fine-tune models that still suck compared to OpenAI’s circa-2022 models. By 2030 there may be no point at all. Maybe even earlier.

amemingfullife@alien.top · 2 years ago

I disagree. I think narrow models are the previous ML generation. This generation is defined by its generalisability. AGI is, after all, just a machine that can find a solution to any problem that can be solved by computers. This is what we are competing with. So if you want to compete, and I think it’s a good thing for the human race if we do compete, then you need to compete on the same level of abstraction.

If you want narrow AI then we already have all the state of the art tools that can do this, just be prepared to know linear algebra inside-out.

I do agree in the sense that if you want to bring real value you need to be practical, but you need to keep your eyes on the prize in the long run.

Similar-Repair9948@alien.top · 2 years ago

I think the reason for this is benchmarks: there are no benchmarks used to verify most specific tasks or knowledge for AI models. Most model fine-tuning companies are trying to show they know how to fine-tune models based on current benchmarks to get more funding from private equity. Unless more benchmarks are developed, this will not change.

ithkuil@alien.top · 2 years ago

https://github.com/XueFuzhao/OpenMoE

Mixture of Experts may help with getting more performance from smaller models.
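
For readers unfamiliar with the term, here is a toy sketch of top-k Mixture-of-Experts routing in plain Python. It is not taken from OpenMoE; the experts, gate scores, and function names are invented for illustration, and real implementations route per token inside a transformer layer using learned gates.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_logits, experts, k=2):
    """Run only the k highest-scoring experts and mix their outputs.

    gate_logits would come from a learned gating network; here they are
    passed in directly to keep the sketch small.
    """
    # Indices of the k experts with the largest gate scores.
    top = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Renormalise the gate scores over the selected experts only.
    weights = softmax([gate_logits[i] for i in top])
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top


# Four made-up "experts"; in a real model each would be a feed-forward block.
experts = [lambda x: 2 * x, lambda x: x + 10, lambda x: -x, lambda x: x * x]

out, used = moe_forward(3.0, gate_logits=[0.1, 2.0, -1.0, 2.0], experts=experts, k=2)
```

Only `k` of the experts run for a given input, so the compute per input stays close to that of a much smaller model even though the total parameter count is larger.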

Laurdaya@alien.top · 2 years ago

Maybe the best way is to have multiple models combined by a “router model” like Medusa.