100B, 220B, and 600B models on huggingface!

Illustrious_Sand6784@alien.top · 2 years ago

100B, 220B, and 600B models on huggingface!

wind_dude@alien.top · 2 years ago

so it sounds like for the 600b they just finetuned llama2 again with the same stuff Llama2 was trained with, just more of it…

RefinedWeb

Opensource code from GitHub

Common Crawl we fine-tuned the model on a huge dataset (generated manually and with automation) for logical understanding and reasoning. We also trained the model for function calling capabilities.

planetofthemapes15@alien.top · 2 years ago

This is fun, I should publish a 1T model called “AGI-QSTAR-1T” and say it’s as good as GPT-5 but no you may not see it.

“Oh and BTW if you want to hire me, I’m willing to accept $1M/yr jobs.”

yiyecek@alien.top · 2 years ago

Huggingface should add a dislike button

opi098514@alien.top · 2 years ago

It’s the best out there…. But no you can’t try it because it’s to dangerous.

VertexMachine@alien.top · 2 years ago

I doubt there is any model really… follow the trail, you’ll end up at a company founded by single person from India (who is founder of another company with a single app for collaborative drawing)… that at least doesn’t have any employees on LinkedIn…

And the founder looks like a relatively young person that most likely wouldn’t be even able to gather the required funding to have enough GPU compute for making model that’s better than gpt4 (or know how). I think that’s just a front for him trying to get some hype or funding.

opi098514@alien.top · 2 years ago

Uuummmm no. It’s for sure real. And the best one out there. No questions asked. It’s better that CHATGPT 4 and OpenAI has been trying to hack this new company to get the 600b model because they are scared that it will end OpenAI for good.

Obligatory /s

aurumvexillum@alien.top · 2 years ago

You forgot to mention that your uncle is the CEO of OpenAI! 😉

opi098514@alien.top · 2 years ago

Well that’s because he’s not. Sam is actually my dad.

LetsGoBrandon4256@alien.top · 2 years ago

https://in.linkedin.com/company/deepnight

View 1 employee

Work experience: Google Startup Alumni

lmao

opi098514@alien.top · 2 years ago

Everything on that page is hype for something that doesn’t exist.

ananthasharma@alien.top · 2 years ago

A cursory look at the website makes me think these guys don’t know what they are doing

SomeOddCodeGuy@alien.top · 2 years ago

Right. This part right here is very suspicious to me, and I’m taking their claims with a grain of salt.

No! The model is not going to be available publically. APOLOGIES. The model like this can be misused very easily. The model is only going to be provided to already selected organisations.

bot-333@alien.top · 2 years ago

I think they changed it to it’s still an experiment and they are finishing evaluations to better understand the model.

Illustrious_Sand6784@alien.top · 2 years ago

No they haven’t, on the 220B model it’s always been that message above, while on the 600B model it’s a message similar to the one you stated.

bot-333@alien.top · 2 years ago

I guess they might open source the 600B one? They have different names, so maybe different training approaches.

a_beautiful_rhind@alien.top · 2 years ago

Somebody pilfer this thing and quant it. We can run the 100B for sure. At least at Q3.

You_Wen_AzzHu@alien.top · 2 years ago

We need some 4090s with 500gb VRAM modified in China if possible.

LocoMod@alien.top · 2 years ago

We need some hero to develop an app that downloads more GPU memory like those apps back in the 90’s. /s

iCantHack@alien.top · 2 years ago

I wonder if there’s any real demand for even 48GB 4090s enough to incentives somebody to do it. I bet the hardware/electronics part of it is trivial, tho.

BangkokPadang@alien.top · 2 years ago

If people started doing this with any regularity, nVidia would intentionally bork the drivers.

mpasila@alien.top · 2 years ago

the devs mentioned that the 600B model takes about 1,3TB space alone…

MannowLawn@alien.top · 2 years ago

Give it 5 years with the Mac Studio. Next year 256gb, will go up real quick.

BangkokPadang@alien.top · 2 years ago

Honestly, a 4bit quantized version of the 220B model should run on a 192GB M2 Studio, assuming these models could even work with a current transformer/loader.

9wR8xO@alien.top · 2 years ago

Make it 0.01bpm quantized and you will fit in good ol’ 3090.

OVAWARE@alien.top · 2 years ago

Its private so there is absolutely 0 way to confirm its quality

ninjasaid13@alien.top · 2 years ago

Not just private but closed access.

FaustBargain@alien.top · 2 years ago

how much ram do you think the 600B would take? I have 512gb and I can fit another 512gb in my box before I run out of slots. I think with 1TB I should be able to run it unquantized because falcon 180b used slightly less than half my ram.

theyreplayingyou@alien.top · 2 years ago

Can you please share a bit more about your setup and experiences?

I’ve been looking to use some of my idle enterprise gear for LLM’s but everyone tells me not to bother. I’ve got a few dual xeon boxes with quad channel DDR4 in 256 & 384GB capacities, NVMe or RAID10 SSDs, 10GBe, etc and I guess (having not yet experienced it) I have a hard time imagining that the equivalent of 120Ghz, 1/2 - 1tb of RAM and 7GB/s disk reads “not being fast enough.” I don’t need instant responses from a sex chatbot, rather I would like to run a model that can help my wife (in the medical field) with work queries, to help my school age kid with math and grammar questions, etc.

Thank you much!

FaustBargain@alien.top · 2 years ago

if you have the ram don’t worry about disk at all. if you have to drop to any kind of disk even if it’s gen 5 ssd you speeds will tank. memory bandwidth matters so much more than compute for LLMs, but it all depends on your needs. there are probably cheaper ways to go about this if you just need something occasionally. maybe runpod or something, but if you need a lot of inference then locally could save you money, but renting a big machine with a100s will always be faster. so will a 7B model do what you need or do you need the accuracy and comprehension of a 70b or one of the new 120b merges? also llama3 is supposed to be out in jan/feb and if it’s significantly better then everything changes again.

FaustBargain@alien.top · 2 years ago

wait the 100B one says it’s based on llama2-chat? did they take the llama 2 foundational model, up the parameter count, and just continue training?

FaustBargain@alien.top · 2 years ago

“organisations”…

BalorNG@alien.top · 2 years ago

“Prompt Template: Alpeca” Wut?

Looks like a scam to be fair. I bet if you apply, you’ll get “Just send us 100$ for access!”

LetsGoBrandon4256@alien.top · 2 years ago

Those Microsoft tech support scam calls are reaching new level.

noeda@alien.top · 2 years ago

Some quotes I found on the pages:

“No! The model is not going to be available publically. APOLOGIES. The model like this can be misused very easily. The model is only going to be provided to already selected organisations.”

“[SOMETHING SPECIAL]: AIN’T DISCLOSING!🧟”

“Hallucinations: Reduced Hallucinations 8x compared to ChatGPT 🥳”

My guess: it’s just another merge like Goliath. At best it’s marginally better than a good 70B.

I can also “successfully build 220B model” easily with mergekit. Would it be good? Probably not.

The lab should write on their model card why should I not think it’s just bullshit. Not exactly the first mystery lab making big claims.

VertexMachine@alien.top · 2 years ago

I doubt there’s any model there.

PookaMacPhellimen@alien.top · 2 years ago

Wonder if GPT4 is just a series of merges

swagonflyyyy@alien.top · 2 years ago

Inb4 The Bloke Quantizes it to about 100B size.

BayesMind@alien.top · 2 years ago

We need a different flair for New Models vs New Merge/Finetune

UnignorableAnomaly@alien.top · 2 years ago

Deepnight were the guys that uploaded upstage’s instruct v2, claimed it was their own, then deleted with an oopsie whoopsie.
I am skeptical.