• FullOf_Bad_Ideas@alien.topB · 1 year ago

    > In our benchmark, training LLaMA-7B with sequences of 1024 tokens with n = 5 would use more VRAM than full parameter fine-tuning

    This is a deal breaker.

    I am hopeful for LoftQ integration into training frameworks; it has more potential. https://arxiv.org/abs/2310.08659
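
    For anyone curious, here is a rough sketch of what that integration could look like using Hugging Face PEFT's LoftQ initialization (LoftQConfig plus init_lora_weights="loftq"). The model name and LoRA hyperparameters below are placeholders, not values from the paper:

    ```python
    # Minimal sketch: LoftQ-style LoRA initialization via Hugging Face PEFT.
    # Assumes a peft version with LoftQConfig support; model name and
    # hyperparameters are illustrative placeholders.
    from transformers import AutoModelForCausalLM
    from peft import LoftQConfig, LoraConfig, get_peft_model

    # Load the base model in full precision; the LoftQ init itself handles
    # quantizing the backbone while fitting the LoRA adapters to the error.
    base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

    loftq_config = LoftQConfig(loftq_bits=4)  # 4-bit quantization-aware init

    lora_config = LoraConfig(
        r=16,
        lora_alpha=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        init_lora_weights="loftq",  # initialize LoRA A/B to offset quantization error
        loftq_config=loftq_config,
    )

    model = get_peft_model(base_model, lora_config)
    model.print_trainable_parameters()
    ```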