In our benchmark, training LLaMA-7B on 1024-token sequences with n = 5 would use more VRAM than full-parameter fine-tuning.
This is a deal breaker.
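For anyone who wants to sanity-check this kind of VRAM comparison themselves, here is a rough sketch of how I would measure peak memory for a single training step with plain PyTorch and `transformers`. The checkpoint name, batch size, and optimizer are placeholders, not the exact setup from our benchmark:

```python
# Rough sketch: peak VRAM of one training step on a causal LM.
# Assumes a CUDA GPU and the `transformers` library; the model name,
# batch size, and sequence length below are placeholders.
import torch
from transformers import AutoModelForCausalLM

MODEL_NAME = "huggyllama/llama-7b"  # assumption: substitute your checkpoint
SEQ_LEN = 1024
BATCH_SIZE = 1

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16
).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Dummy batch of random token ids; labels = inputs gives the causal LM loss.
input_ids = torch.randint(
    0, model.config.vocab_size, (BATCH_SIZE, SEQ_LEN), device="cuda"
)

torch.cuda.reset_peak_memory_stats()
loss = model(input_ids=input_ids, labels=input_ids).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()

peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak VRAM during one step: {peak_gib:.1f} GiB")
```

Running the same script with the method under test enabled versus vanilla full fine-tuning is enough to see which one actually peaks higher.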
I am hopeful for LoftQ integration into training frameworks; it has more potential: https://arxiv.org/abs/2310.08659
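In case it helps, a minimal sketch of what LoftQ-style initialization looks like with the PEFT library (assuming `peft` is installed; the checkpoint name, rank, and target modules are placeholders, and the base model is loaded in full precision since LoftQ handles quantization during adapter initialization):

```python
# Minimal sketch: LoftQ-initialized LoRA adapters via PEFT.
# Assumptions: full-precision base model, placeholder checkpoint and ranks.
import torch
from transformers import AutoModelForCausalLM
from peft import LoftQConfig, LoraConfig, get_peft_model

MODEL_NAME = "huggyllama/llama-7b"  # placeholder checkpoint

base_model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16
)

loftq_config = LoftQConfig(loftq_bits=4)  # 4-bit backbone, as in the paper
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    init_lora_weights="loftq",
    loftq_config=loftq_config,
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```

The point of LoftQ is that the adapter initialization compensates for the quantization error of the backbone, so you keep the memory savings of a quantized base without the usual accuracy hit at low bit widths.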