https://huggingface.co/deepnight-research

I’m not affiliated with this group at all; I was just randomly looking for any new big merges and found these.

100B model: https://huggingface.co/deepnight-research/saily_100B

220B model: https://huggingface.co/deepnight-research/Saily_220B

600B model: https://huggingface.co/deepnight-research/ai1

They have some big claims about the capabilities of their models, but the two best ones are unavailable to download. Maybe we can help convince them to release them publicly?

  • wind_dude@alien.topB · 1 year ago

    So it sounds like, for the 600B, they just finetuned Llama 2 again with the same stuff Llama 2 was trained on, just more of it…

    RefinedWeb

    Open-source code from GitHub

    Common Crawl

    “we fine-tuned the model on a huge dataset (generated manually and with automation) for logical understanding and reasoning. We also trained the model for function calling capabilities.”

  • planetofthemapes15@alien.topB · 1 year ago

    This is fun, I should publish a 1T model called “AGI-QSTAR-1T” and say it’s as good as GPT-5 but no you may not see it.

    “Oh and BTW if you want to hire me, I’m willing to accept $1M/yr jobs.”

  • opi098514@alien.topB · 1 year ago

    It’s the best out there… but no, you can’t try it because it’s too dangerous.

    • VertexMachine@alien.topB · 1 year ago

      I doubt there is any model, really… Follow the trail and you’ll end up at a company founded by a single person from India (who is also the founder of another company whose only product is a collaborative-drawing app)… a company that, at the least, doesn’t have any employees on LinkedIn…

      And the founder looks like a relatively young person who most likely wouldn’t even be able to gather the funding for enough GPU compute to make a model better than GPT-4 (or have the know-how). I think it’s just a front for him to drum up hype or funding.

    • SomeOddCodeGuy@alien.topB · 1 year ago

      Right. This part right here is very suspicious to me, and I’m taking their claims with a grain of salt.

      No! The model is not going to be available publically. APOLOGIES. The model like this can be misused very easily. The model is only going to be provided to already selected organisations.

      • bot-333@alien.topB · 1 year ago

        I think they changed it to say it’s still an experiment and they’re finishing evaluations to better understand the model.

    • LocoMod@alien.topB · 1 year ago

      We need some hero to develop an app that downloads more GPU memory, like those “download more RAM” apps back in the ’90s. /s

    • iCantHack@alien.topB · 1 year ago

      I wonder if there’s enough real demand for even 48GB 4090s to incentivize somebody to do it. I bet the hardware/electronics part of it is trivial, though.

        • BangkokPadang@alien.topB · 1 year ago

          Honestly, a 4-bit quantized version of the 220B model should run on a 192GB M2 Studio, assuming these models would even work with a current transformer/loader.
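For what it’s worth, a back-of-envelope estimate supports that: at 4 bits per weight, 220B parameters come in well under 192GB even with some runtime overhead. (The bits-per-weight and overhead factor below are my own rough assumptions, not measurements.)

```python
def quantized_size_gb(n_params_billion: float, bits: float = 4.0, overhead: float = 1.15) -> float:
    """Rough model memory footprint: params * bits/8 bytes, plus ~15%
    assumed overhead for KV cache, buffers, and runtime bookkeeping."""
    bytes_total = n_params_billion * 1e9 * (bits / 8) * overhead
    return bytes_total / 1e9

print(round(quantized_size_gb(220), 1))  # ~126.5 GB, under a 192 GB M2 Studio
```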

  • FaustBargain@alien.topB · 1 year ago

    How much RAM do you think the 600B would take? I have 512GB and I can fit another 512GB in my box before I run out of slots. I think with 1TB I should be able to run it unquantized, because Falcon 180B used slightly less than half my RAM.
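A weights-only fp16 estimate (a rough sketch; real usage adds KV cache and activation buffers on top) matches the Falcon 180B observation, but suggests 600B would be tight even at 1TB:

```python
def fp16_size_gb(n_params_billion: float) -> float:
    # 2 bytes per parameter at fp16/bf16; weights only, no KV cache or activations.
    return n_params_billion * 1e9 * 2 / 1e9

print(fp16_size_gb(180))  # 360.0 GB -> "slightly less than half" of 512 GB, as observed
print(fp16_size_gb(600))  # 1200.0 GB -> ~1.2 TB of weights alone, over a 1 TB box
```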

    • theyreplayingyou@alien.topB · 1 year ago

      Can you please share a bit more about your setup and experiences?

      I’ve been looking to use some of my idle enterprise gear for LLMs, but everyone tells me not to bother. I’ve got a few dual-Xeon boxes with quad-channel DDR4 in 256 and 384GB capacities, NVMe or RAID10 SSDs, 10GbE, etc., and I guess (having not yet experienced it) I have a hard time imagining the equivalent of 120GHz, 0.5–1TB of RAM, and 7GB/s disk reads “not being fast enough.” I don’t need instant responses from a sex chatbot; rather, I would like to run a model that can help my wife (in the medical field) with work queries, help my school-age kid with math and grammar questions, etc.

      Thank you much!

      • FaustBargain@alien.topB · 1 year ago

        If you have the RAM, don’t worry about disk at all. If you have to drop to any kind of disk, even a Gen 5 SSD, your speeds will tank. Memory bandwidth matters much more than compute for LLMs, but it all depends on your needs. There are probably cheaper ways to go about this if you just need something occasionally, maybe RunPod or something, but if you need a lot of inference then running locally could save you money; renting a big machine with A100s will always be faster, though. So, will a 7B model do what you need, or do you need the accuracy and comprehension of a 70B or one of the new 120B merges? Also, Llama 3 is supposed to be out in Jan/Feb, and if it’s significantly better then everything changes again.
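The bandwidth point can be sketched numerically: each generated token has to stream every weight from memory once, so bandwidth divided by model size is a hard ceiling on tokens per second. (The bandwidth and model-size figures below are rough, assumed numbers for illustration.)

```python
def tokens_per_sec_upper_bound(model_gb: float, mem_bandwidth_gbs: float) -> float:
    # Autoregressive decoding reads all weights per token, so throughput
    # is capped at memory bandwidth / model size, regardless of compute.
    return mem_bandwidth_gbs / model_gb

# ~140 GB of fp16 weights for a 70B model; quad-channel DDR4 ~85 GB/s
# vs. roughly ~2000 GB/s of HBM bandwidth on an A100 (assumed figures).
print(round(tokens_per_sec_upper_bound(140.0, 85.0), 2))    # DDR4 ceiling: under 1 tok/s
print(round(tokens_per_sec_upper_bound(140.0, 2000.0), 2))  # A100 ceiling: ~14 tok/s
```

Same model, same compute budget: the two-orders-of-magnitude bandwidth gap is why the enterprise CPU boxes feel slow despite plenty of cores and RAM.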

  • FaustBargain@alien.topB · 1 year ago

    Wait, the 100B one says it’s based on llama2-chat? Did they take the Llama 2 foundation model, up the parameter count, and just continue training?

  • BalorNG@alien.topB · 1 year ago

    “Prompt Template: Alpeca” Wut?

    Looks like a scam, to be fair. I bet if you apply, you’ll get “Just send us $100 for access!”

  • noeda@alien.topB · 1 year ago

    Some quotes I found on the pages:


    “No! The model is not going to be available publically. APOLOGIES. The model like this can be misused very easily. The model is only going to be provided to already selected organisations.”

    “[SOMETHING SPECIAL]: AIN’T DISCLOSING!🧟”

    “Hallucinations: Reduced Hallucinations 8x compared to ChatGPT 🥳”


    My guess: it’s just another merge like Goliath. At best it’s marginally better than a good 70B.

    I can also “successfully build 220B model” easily with mergekit. Would it be good? Probably not.

    The lab should explain on their model card why I should not think it’s just bullshit. They’re not exactly the first mystery lab making big claims.

  • UnignorableAnomaly@alien.topB · 1 year ago

    Deepnight were the guys that uploaded Upstage’s instruct v2, claimed it was their own, then deleted it with an oopsie whoopsie.
    I am skeptical.