  1. for coding
  2. for generating stories, writing emails, poems, etc.
  3. good overall
  4. etc.
  • Mbando@alien.top · 1 year ago

    We’ve been fine-tuning models for specific applications like RAG and structured data extraction. Falcon-7B has been really good for training: it adapts to the target domain’s use of language from the training data, and it also picks up instructions really well. Going to try Mistral-7B soon for a comparison.

    • ___defn@alien.top · 1 year ago

      Same here. Switched to Mistral a few weeks ago. The results will blow you away; the difference is remarkable.

  • ntn8888@alien.top · 1 year ago

    Oh god 🤦 But seriously, we need a wiki with a leaderboard and votes 😁

  • DontPlanToEnd@alien.top · 1 year ago

    For generating full and uncensored stories (I provide a starting paragraph), collectivecognition-v1.1-mistral-7b has been by far the most creative and well written in my testing.

    • JohnExile@alien.top · 1 year ago

      Neural Chat 7B works fine with normal instructions for assistant use, but when I gave it custom instructions for things like summarization, using code blocks, or formatting, it completely broke, even though the same instructions work fine with the other models I use. YMMV.

        • Dry-Vermicelli-682@alien.top · 1 year ago

          What hardware are you running it on? CPU/GPU, RAM, etc.? I’m trying to figure out what I need. My old first-gen 16-core Threadripper with 64GB of RAM doesn’t seem to work very well: it takes multiple minutes for a simple hello response. No GPU, though. I’m looking at adding a 6700 XT, but I’m not sure whether that GPU will help a lot.

      • danigoncalves@alien.top · 1 year ago

        I was actually comparing both today (CodeLlama 7B), and man, CodeLlama just gave crap, while DeepSeek was very accurate.

        • Illustrious-Lake2603@alien.top · 1 year ago

          I for one just don’t trust these Chinese models at all. Not saying there’s anything wrong with this one, but it’s clearly aligned with the Chinese agenda when I ask it anything about Taiwan. But for coding it works well, and you can run it offline.

        • Sufficient-Math3178@alien.top · 1 year ago

          AFAIK model files used to be essentially plain code: loading one meant unpickling it, and the pickle could tell the loader to call a function referenced inside the model file, with arguments also stored in the file. The uploader could set this up to do practically anything, and it doesn’t need to look obviously malicious, since it runs just like a normal Python script. For example, it could simply load/render a WebP image crafted to exploit the recent libwebp vulnerability.

          They changed this a while back: now you have to pass an explicit argument when loading a model to allow this behavior, and this model requires it.
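          A minimal, stdlib-only sketch of the pickle mechanism described above, assuming the "model file" is just raw pickle bytes (real PyTorch checkpoints wrap pickles in a zip archive, so this is simplified). The names `MaliciousPayload`, `RestrictedUnpickler`, `ALLOWED`, and `safe_loads` are hypothetical and made up for illustration. The payload here is harmless (`eval("sum(range(10))")`), but it shows that unpickling runs code rather than just reading data. The allow-list unpickler follows the "Restricting Globals" pattern from the Python `pickle` docs; `torch.load(..., weights_only=True)` applies a similar allow-list idea, but this is not its implementation.

```python
import io
import pickle


class MaliciousPayload:
    """Hypothetical booby-trapped object standing in for a "model"."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call eval('sum(range(10))')".
        # A real attacker could put any expression or callable here.
        return (eval, ("sum(range(10))",))


# The uploader serializes the payload (a real checkpoint would be a file).
model_bytes = pickle.dumps(MaliciousPayload())

# Naive loading: unpickling executes the embedded call and returns its result.
loaded = pickle.loads(model_bytes)
print(loaded)  # 45, computed by eval() during the load itself


class RestrictedUnpickler(pickle.Unpickler):
    """Allow-list unpickler ("Restricting Globals" pattern from the pickle docs)."""

    # Hypothetical allow-list: the only globals this loader will resolve.
    ALLOWED = {("collections", "OrderedDict")}

    def find_class(self, module, name):
        # Called whenever the pickle references a global (class or function).
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global {module}.{name}")


def safe_loads(data: bytes):
    """Unpickle bytes while refusing any global not on the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


blocked_message = None
try:
    safe_loads(model_bytes)
except pickle.UnpicklingError as exc:
    blocked_message = str(exc)
print(blocked_message)  # blocked global builtins.eval

# Plain data (dicts, lists, floats) needs no globals and still loads fine.
print(safe_loads(pickle.dumps({"layer.weight": [0.1, 0.2]})))
```

          The design choice is an allow-list rather than a block-list: anything the loader has not explicitly approved is refused, so a new or obscure callable cannot slip through.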