What I currently see is that many companies try to create a “GPT”, a model that basically competes with OpenAI’s GPT models or Claude. The problem, in my opinion, is that these open source projects, with just a few people working on them, have very limited resources. Even if they have 10 A100s with 80 GB of VRAM each, they will never come close to the computing power and manpower needed to actually build such a model. And once you go above 13 billion parameters, you already have the problem that over 99% of humanity cannot run your model.

Yes, you can run it on Colab, but then people are dependent on Colab, so to speak. If Colab pulls the plug, it stops working. If it’s hosted by another company and that company pulls the plug, it stops working too. So, in my opinion, people should create models that are focused on something. A basic example: Japanese-to-English translation. Or maybe a model that is really good with historical facts. Every additional capability costs additional parameters, which makes it harder and harder to actually load the entire model. If this goes on, in my opinion, we will not see any development that is really beneficial. This is not me being a doomer and saying “oh, no, it will never work”, but unless new technology is released that specifically makes it possible to run something equivalent to 300 billion parameters or so, in my opinion, it’s useless.
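
To put rough numbers on the loading problem, here is a back-of-the-envelope sketch. It assumes memory is dominated by the weights (parameters times bytes per parameter) and ignores KV cache, activations, and framework overhead, so real usage is higher. The function name and the precision table are my own illustration, not taken from any library.

```python
# Rough weight-only memory estimate: parameters x bytes per parameter.
# Assumption: ignores KV cache, activations, and runtime overhead.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, precision: str = "fp16") -> float:
    """Return the approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for name, n_params in [("7B", 7e9), ("13B", 13e9), ("300B", 300e9)]:
        sizes = ", ".join(
            f"{p}: {weight_memory_gb(n_params, p):.1f} GB" for p in BYTES_PER_PARAM
        )
        print(f"{name:>5} -> {sizes}")
```

By this estimate a 13B model needs about 26 GB for fp16 weights alone, which is already more than a 24 GB consumer card, and a 300B model needs about 600 GB.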

We need to actually do something with what we use. I think open source projects should pick a focus and then spend their 13 billion parameters on something hyper-focused on one very specific area, allowing the model to perform amazingly at that subject. Let the big general model be Llama 3 from Meta; I think it’s impossible to get something like GPT-3.5 or GPT-4 with open-source methods. The best models currently are Llama and Mistral, both from companies that are now worth either billions or hundreds of millions.

You can certainly try to finetune what has been released, like the new Llama models, or try to modify them, but I see so many models being released that basically nobody uses or that have no real use.

What do you all think about it? After testing out so many different models, I just think that the goals these small teams set for themselves are simply not achievable, and that they should instead try to create something that is amazing at one thing.

TLDR: I think open source projects should focus on being very good at certain tasks instead of being good at “everything”.

Edit: when I say open source, I mean the small teams of just a few people with a few A100s, not the open-source models of Mistral and Meta.

  • nerdyvaroo@alien.topB
    1 year ago

    I was just thinking of doing something like this. Still gotta find the compute power to do what I want, though.

  • Defektivex@alien.topB
    1 year ago

    This is the right answer. I’ve been building enterprise LLM solutions for the last 9 months, and there’s a ton of use cases in healthcare and finance-related BI.

    Lots and lots of classification and labeling work that requires domain specific context.

    I’m finding less work in the avenue of generating ‘content’, and a ton in what is effectively workflow solutions or business process automation.

    At this point I’m advising my company to avoid chatbot jobs altogether as they seem to be low value.

    • Drited@alien.topB
      1 year ago

      So do those enterprises generally build their own small but targeted model? Or is it more fine-tuning with an existing LLM as a base?

  • a_beautiful_rhind@alien.topB
    1 year ago

    The big company gives you a base model and then it’s up to you to do that.

    I’ve seen some agent and medical tunes. Smaller models for image and vision or TTS, etc. Anyone doing it for a specific business case is probably not posting the model or advertising it.

    Are you asking for people to make specialized 1.3B models from scratch? Because I think even that takes a long time on a “few” A100s.

    • involviert@alien.topB
      1 year ago

      Say, these finetunes are all merged LoRA stuff, aren’t they? Is nobody doing stuff where you just continue regular training with your own dataset?

  • Dangerous_Injury_101@alien.topB
    1 year ago

    Perhaps we should have like a hundred different 7B models for different categories like history, arts, science etc., and then a new layer above that where a generic LLM routes the question to the correct category, and then finally the correct 7B model loads into your VRAM? :D If you had the fastest NVMe (not sure if DirectStorage would help, probably not?), perhaps the waiting wouldn’t be too terrible, unless every one of your questions is in a different category.
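
    A toy sketch of that idea, under loud assumptions: the keyword table stands in for the generic routing LLM, and `fake_loader` stands in for reading a 7B checkpoint from NVMe into VRAM. Every name here is hypothetical and nothing touches a real model. The point is only that a swap happens when the category changes, not on every question.

```python
from typing import Callable, Dict, Optional, Set

# Hypothetical keyword table; a real router would be a small classifier or LLM.
CATEGORY_KEYWORDS: Dict[str, Set[str]] = {
    "history": {"war", "empire", "century", "revolution"},
    "arts": {"painting", "novel", "composer", "sculpture"},
    "science": {"atom", "gravity", "protein", "planet"},
}


def route(question: str, default: str = "general") -> str:
    """Pick the category whose keywords overlap most with the question."""
    words = set(question.lower().replace("?", " ").split())
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


class SpecialistCache:
    """Keeps one specialist loaded and swaps only when the category changes."""

    def __init__(self, loader: Callable[[str], Callable[[str], str]]):
        self._loader = loader
        self._category: Optional[str] = None
        self._model: Optional[Callable[[str], str]] = None
        self.loads = 0  # counts the expensive disk-to-VRAM swaps

    def answer(self, question: str) -> str:
        category = route(question)
        if category != self._category:
            self._model = self._loader(category)
            self._category = category
            self.loads += 1
        return self._model(question)


def fake_loader(category: str) -> Callable[[str], str]:
    """Stand-in for loading a specialist checkpoint; returns a stub model."""
    return lambda q: f"[{category} model] answer to: {q}"


if __name__ == "__main__":
    cache = SpecialistCache(fake_loader)
    print(cache.answer("Which empire fell in the fifth century?"))
    print(cache.answer("What started the French revolution?"))
    print(cache.answer("How strong is gravity on another planet?"))
    print("model loads:", cache.loads)
```

    Two history questions in a row reuse the loaded specialist, so the three questions above trigger only two loads. The worst case is exactly the one mentioned: every question landing in a different category.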

  • slifeleaf@alien.topB
    1 year ago

    I think the main problem is the GPU resources needed to train a model from scratch. Finetuning requires a fraction of the time compared to training, which is why there are a lot of GPT-like models and almost no specialised models.

  • TestPilot1980@alien.topB
    1 year ago

    Makes sense. Similar to Python with its modules, there should be one base to which we can add ‘modules’ with specific knowledge bases.

  • Ansible32@alien.topB
    1 year ago

    I think most targeted models are going to be too targeted for anyone other than the organization that trained them to use. I can’t really see a 7B translation model being worth anything vs. GPT or Google Translate. (And for real translation, you really want something which can answer questions about connotations/puns/whatever, which requires general understanding.)

  • sickvisionz@alien.topB
    1 year ago

    I’m all for specialized models but it would be super depressing if it’s the year 2025, 2030, 2050, 3100 and ChatGPT circa 2022 is still crushing open source models.

    For a year or two that gulf may not be a big deal, but the further we get from 2022, the less useful it will be to train and fine-tune models that suck compared to OpenAI’s circa-2022 models. By 2030 there may be no point at all. Maybe even earlier.

  • amemingfullife@alien.topB
    1 year ago

    I disagree. I think narrow models are the previous ML generation. This generation is defined by its generalisability; AGI is, after all, just a machine that can find a solution to any problem that can be solved by computers. This is what we are competing with. So if you want to compete, and I think it’s a good thing for the human race if we do compete, then you need to compete on the same level of abstraction.

    If you want narrow AI then we already have all the state of the art tools that can do this, just be prepared to know linear algebra inside-out.

    I do agree in the sense that if you want to bring real value you need to be practical, but you need to keep your eyes on the prize in the long run.

  • Similar-Repair9948@alien.topB
    1 year ago

    I think the reason for this is benchmarks: there are no benchmarks used to verify most specific tasks or knowledge for AI models. Most model fine-tuning companies are trying to show they know how to fine-tune models against current benchmarks to get more funding from private equity. Unless more benchmarks are developed, this will not change.

  • Laurdaya@alien.topB
    1 year ago

    Maybe the best way is to have multiple models combined by a “router model” like Medusa.