https://higgsfield.ai
We have a massive GPU cluster and have developed our own infrastructure to manage it and train large models.
Here's how it works:
- You upload your dataset in the preconfigured format to Hugging Face [1] (see the sketch after this list).
- Choose your LLM (e.g., LLaMA 70B or Mistral 7B).
- Place your submission into the queue
- Wait for it to get trained.
- Then you get your trained model back on Hugging Face.
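For illustration, here is a minimal sketch of what the first step could look like, assuming a JSONL-style instruction dataset pushed with the Hugging Face `datasets` library; the field names and repository name are hypothetical, and the exact expected format is described in the tutorial [1].

    # Minimal sketch: build a small instruction dataset and push it to the
    # Hugging Face Hub so a training job can pick it up.
    # Assumptions: the "instruction"/"output" schema and the repo name are
    # hypothetical; see the tutorial [1] for the actual preconfigured format.
    from datasets import Dataset

    records = [
        {"instruction": "Summarize the following text.", "output": "A short summary."},
        {"instruction": "Translate to French: Hello", "output": "Bonjour"},
    ]

    ds = Dataset.from_list(records)
    ds.push_to_hub("your-username/my-finetune-dataset")  # requires `huggingface-cli login`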
Essentially, why would we want to do this?
- We already have experience training big LLMs.
- We could achieve near-perfect infrastructure performance for training.
- Sometimes the GPUs simply have nothing to train.
So we thought it would be cool to utilize our GPU cluster at 100% and give back to the open-source community (we've already built an end-to-end distributed training framework [2]).
This is in an early stage, so you can expect some bugs.
Any thoughts, opinions, or ideas are quite welcome!
[1]: https://github.com/higgsfield-ai/higgsfield/blob/main/tutori…
Wow, you guys are the best. Could you also add an estimated start time for my run? I'm wondering whether I'll get something back in a meaningful amount of time. Either way, the mere fact that things like this exist is great.
Giving away their GPUs for free - this is some IQ-200 stuff.
Do you allow training of other sorts of models? I want to train a TTS model.
We support only large models (starting from 7B).
By ‘training’, I assume you mean fine-tuning or LoRA?
We only do full fine-tuning.
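To make the distinction concrete, here is a minimal sketch using the Hugging Face transformers and peft libraries (the model name and LoRA settings are illustrative, not Higgsfield's actual pipeline): a full fine-tune leaves every weight trainable, while LoRA freezes the base model and trains only small adapter matrices.

    # Minimal sketch contrasting full fine-tuning with LoRA.
    # Assumptions: model name and LoRA hyperparameters are illustrative only.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

    # Full fine-tune: all parameters receive gradients.
    full_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)

    # LoRA: base weights are frozen; only the low-rank adapters are trained.
    lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
    lora_model = get_peft_model(model, lora_cfg)
    lora_trainable = sum(p.numel() for p in lora_model.parameters() if p.requires_grad)

    print(f"full fine-tune: {full_trainable:,} trainable params")
    print(f"LoRA:           {lora_trainable:,} trainable params")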
Are you having good luck with adding knowledge to the model? I tried this with LLaMA for a couple of weeks when things were just getting going, and I just could not find good hyperparameters for fine-tuning. I was also using LoRA, so… I don't know.
Same
From our experience, to get very good results you need:
- A high-quality dataset. It's worth spending extra time on data cleaning; a smaller dataset of high-quality examples beats a huge dataset full of garbage (see the cleaning sketch below).
- A full fine-tune.
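For concreteness, a minimal sketch of the kind of cleaning pass meant above, assuming a JSONL instruction dataset; the field names and length threshold are arbitrary, not a prescription.

    # Minimal sketch of a data-cleaning pass: drop near-empty and duplicate
    # examples so a smaller, higher-quality dataset remains.
    # Assumptions: JSONL input with "instruction"/"output" fields; the length
    # threshold is arbitrary.
    import json

    def clean(path_in: str, path_out: str) -> None:
        seen = set()
        with open(path_in) as src, open(path_out, "w") as dst:
            for line in src:
                rec = json.loads(line)
                text = (rec.get("instruction", "") + " " + rec.get("output", "")).strip()
                if len(text) < 20 or text in seen:  # too short or an exact duplicate
                    continue
                seen.add(text)
                dst.write(json.dumps(rec) + "\n")

    clean("raw.jsonl", "clean.jsonl")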
Don’t leave us hanging, what does the cluster look like? (ignore if you’re not allowed to share, but I’m a gigantic hardware nerd)
In terms of capacity, nothing crazy. It's probably a standard H100 or A100 cluster with 32 or 64 GPUs.
Why are you hiding who you are, how many GPUs you have … and whether you have legal access to them?
What's with the tendency for software engineers to name their libraries after fundamental physics? As a physicist, this has always bothered me. I'll search for numerical algorithms for doing real physics… and end up with some garbage blockchain app or a Rust crate that does nothing.