• FullOf_Bad_Ideas@alien.topB · 1 year ago

    > In our benchmark, training LLaMA-7B with sequences of 1024 tokens with n = 5 would use more VRAM than full parameter fine-tuning

    This is a deal breaker.

    I am hopeful for LoftQ integration into training frameworks; it has more potential. https://arxiv.org/abs/2310.08659
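
    For anyone curious, here is a rough sketch of what that integration could look like using Hugging Face PEFT's LoftQ initialization (LoftQConfig plus init_lora_weights="loftq"). The model name and LoRA hyperparameters below are placeholders, not values from the paper:

    ```python
    # Minimal sketch: LoftQ-style LoRA initialization via Hugging Face PEFT.
    # Assumes a peft version with LoftQConfig support; model name and
    # hyperparameters are illustrative placeholders.
    from transformers import AutoModelForCausalLM
    from peft import LoftQConfig, LoraConfig, get_peft_model

    # Load the base model in full precision; the LoftQ init itself handles
    # quantizing the backbone while fitting the LoRA adapters to the error.
    base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

    loftq_config = LoftQConfig(loftq_bits=4)  # 4-bit quantization-aware init

    lora_config = LoraConfig(
        r=16,
        lora_alpha=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        init_lora_weights="loftq",  # initialize LoRA A/B to offset quantization error
        loftq_config=loftq_config,
    )

    model = get_peft_model(base_model, lora_config)
    model.print_trainable_parameters()
    ```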