Like many of you, I often need to train LLMs (Large Language Models). Code gets copied from one project to another, it’s easy to lose track, and I end up rewriting the same training pipeline several times.
X—LLM is a solution. It’s a streamlined, user-friendly library designed for efficient model training, offering advanced techniques and customizable options within the Hugging Face ecosystem.
Features:
- LoRA, QLoRA and fusing
- Flash Attention 2
- Gradient checkpointing
- bitsandbytes quantization
- GPTQ (including post-training quantization)
- W&B experiment tracking
- Simple training on multiple GPUs at once using DeepSpeed or FSDP
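To give a sense of why LoRA makes 7B-scale training cheap, here is a back-of-envelope estimate of the trainable parameter count. The layer count, hidden size, rank, and choice of adapted matrices below are illustrative assumptions (a Llama-7B-style model), not X—LLM defaults:

```python
# Back-of-envelope: trainable parameters for LoRA on a Llama-7B-style model.
# Assumptions (illustrative, not X-LLM defaults): 32 layers, hidden size 4096,
# adapters on the q_proj and v_proj attention matrices only, rank 16.

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # LoRA replaces the update to a frozen d_in x d_out weight with two
    # small matrices: A (d_in x rank) and B (rank x d_out).
    return rank * (d_in + d_out)

hidden = 4096
layers = 32
rank = 16

per_layer = 2 * lora_params(hidden, hidden, rank)  # q_proj + v_proj
trainable = layers * per_layer
total = 7_000_000_000

print(f"trainable LoRA params: {trainable:,}")       # 8,388,608
print(f"fraction of 7B model: {trainable / total:.4%}")  # 0.1198%
```

So with these assumptions you update roughly 0.1% of the weights, which is what keeps optimizer memory and checkpoint sizes small.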
Use cases:
- Create production-ready solutions or fast prototypes. X—LLM works in both configurations
- Finetune a 7B model with 334 million tokens (1.1 million dialogues) for just $50
- Automatically save each checkpoint during training to the Hugging Face Hub and don’t lose any progress
- Quantize a model using GPTQ. Reduce a 7B Mistral model from 15 GB to 4.3 GB and increase inference speed
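The GPTQ numbers above line up with a rough size calculation. This is a sketch, not the exact quantizer output: the parameter count, group size, and which layers stay in fp16 are my assumptions:

```python
# Back-of-envelope size check for 4-bit GPTQ quantization of a Mistral-7B-
# class model. Assumptions (mine, not exact model internals): ~7.24B total
# params, vocab 32000 x hidden 4096 embeddings and lm_head kept in fp16,
# 4-bit weights with group size 128 (one fp16 scale + 4-bit zero per group).

GB = 1e9

total_params = 7.24e9
embed_params = 2 * 32_000 * 4_096        # embeddings + lm_head, unquantized
quant_params = total_params - embed_params

fp16_size = total_params * 2 / GB        # 2 bytes per param
gptq_size = (
    quant_params * 0.5                   # 4-bit packed weights
    + quant_params / 128 * 2.5           # per-group scale + zero-point
    + embed_params * 2                   # fp16 embeddings kept as-is
) / GB

print(f"fp16: {fp16_size:.1f} GB")       # 14.5 GB
print(f"GPTQ: {gptq_size:.1f} GB")       # 4.1 GB
```

The small gap to the observed 4.3 GB is bookkeeping overhead in the saved checkpoint; the order of magnitude is the point.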
Github repo: https://github.com/BobaZooba/xllm
You can train a 7B model, fuse the LoRA weights, and upload the ready-to-use model to the Hugging Face Hub. All in a single Colab! Link
The library gained 100 stars in less than a day, and it’s now almost at 200. People are using it, training models both in Colab and in multi-GPU setups. Meanwhile, I’m supporting X—LLM users and currently implementing the most requested feature: DPO.
I suggest that you try training your own models and see for yourself how simple it is.
If you like it, please consider giving the project a star on GitHub.
Any idea what the vram requirements are for locally training a 7b qlora?
I strongly recommend training on a GPU, as it speeds up the training process by an order of magnitude and has become the standard. I can recommend services that offer GPU rentals at the lowest prices.
https://vast.ai
https://www.runpod.io
https://www.tensordock.com
Ah, OK- but what about a setup with dual local 3090s?
What kind of gpu rental would you recommend? An a100 80gb?
I apologize for the confusion. I misread your question as being about RAM and thought you wanted to train on the CPU.
Of course, 2 x 3090 would be more than enough for training. I believe even a 13B model with a large context length could be trained.
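To put rough numbers on the earlier VRAM question: here is a sketch of a QLoRA memory budget for a 7B model. The activation figure and adapter size are my estimates and vary with sequence length, batch size, and implementation:

```python
# Rough VRAM budget for QLoRA fine-tuning a 7B model (my estimates; real
# usage depends on sequence length, batch size, and implementation details).

GB = 1e9
base_params = 7e9
adapter_params = 8.4e6   # e.g. rank-16 LoRA on the attention projections

base_weights = base_params * 0.5 / GB     # 4-bit quantized base model
adapters     = adapter_params * 2 / GB    # fp16 LoRA weights
grads        = adapter_params * 2 / GB    # gradients for adapters only
optimizer    = adapter_params * 8 / GB    # AdamW: two fp32 states per param
activations  = 3.0                        # guess, with gradient checkpointing

total = base_weights + adapters + grads + optimizer + activations
print(f"~{total:.1f} GB")  # ~6.6 GB, well under a single 24 GB RTX 3090
```

So a single 3090 already has headroom for a 7B QLoRA run; two of them mainly buy you larger batches or longer context.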
If you have 2 GPUs, I suggest training through the command line and using DeepSpeed or FSDP (the latter has been tested less).
Here are examples of projects where it’s explained in detail how you can train:
https://github.com/BobaZooba/xllm-demo
https://github.com/BobaZooba/wgpt
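For reference, a minimal DeepSpeed ZeRO stage 2 config of the kind such setups typically use (a sketch with standard DeepSpeed keys; check the demo repos above for the exact file they ship):

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true
  }
}
```

Stage 2 shards gradients and optimizer states across the two GPUs, which is usually the sweet spot for a 7B model on 2 x 3090.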
On Twitter, someone I don’t know posted about how easily they managed to train on multiple GPUs (a super simple and short example):
https://twitter.com/darrenangle/status/1724913070105841806
Awesome thank you.
Last question! Would it be reasonable to train on a single 3090 following that guide as well?
Edit: train a 7B on a single 3090
And feel free to ask! I’m just here to help you
It depends on how deeply you want to immerse yourself. The library is intended for both rapid prototyping and production-ready development. I would recommend starting with the former; it’s very simple and takes about 10-15 minutes to get started, not including training time.
Here is a notebook that allows you to train models on a single GPU:
https://colab.research.google.com/drive/1CNNB_HPhQ8g7piosdehqWlgA30xoLauP
You can download it and train your model locally on your computer.
Thank you so much, this is awesome.