• llama_in_sunglasses@alien.topB
    1 year ago

    I got okayish results cutting layers out of a 70B model, but touching the first or last 20% of the layers lobotomizes it, and I never snipped more than a couple of layers from any one place. By the time the model was small enough for a q2_K quant to load in 24GB of VRAM, it had fallen apart, so I didn’t consider mergekit all that useful as a distillation/parameter-reduction process.
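
For anyone who wants to try the same thing, here is a rough sketch of how the layer plan could be worked out before writing a merge config. It is not mergekit code and not my exact procedure: `plan_kept_ranges` and its parameters are hypothetical names, the 80-layer count assumes a Llama-2-70B-style model, and the even spacing of the cuts is just one reasonable choice. It only computes which half-open layer ranges to keep, leaving the first and last 20% alone and never cutting more than two adjacent layers.

```python
def plan_kept_ranges(n_layers, n_remove, protect_frac=0.2, max_run=2):
    """Return half-open (start, end) layer ranges to keep.

    Hypothetical helper: protects the first and last `protect_frac` of
    the layers and removes at most `max_run` adjacent layers per cut.
    """
    guard = int(n_layers * protect_frac)
    lo, hi = guard, n_layers - guard           # prunable window is [lo, hi)

    # Split the removal budget into runs of at most max_run layers.
    sizes = [max_run] * (n_remove // max_run)
    if n_remove % max_run:
        sizes.append(n_remove % max_run)
    if not sizes:
        return [(0, n_layers)]

    # One run per equal-sized segment of the window; each segment must
    # leave at least one kept layer so that runs never touch each other.
    seg = (hi - lo) // len(sizes)
    if seg < max_run + 1:
        raise ValueError("too many layers to remove under these constraints")

    removed = set()
    for i, size in enumerate(sizes):
        start = lo + i * seg + (seg - size) // 2   # center the run in its segment
        removed.update(range(start, start + size))

    # Collapse the surviving layers into contiguous ranges.
    kept, run_start = [], None
    for layer in range(n_layers):
        if layer in removed:
            if run_start is not None:
                kept.append((run_start, layer))
                run_start = None
        elif run_start is None:
            run_start = layer
    if run_start is not None:
        kept.append((run_start, n_layers))
    return kept


if __name__ == "__main__":
    # Assumed 80-layer model, dropping 8 layers in total.
    for start, end in plan_kept_ranges(80, 8):
        print(f"keep layers [{start}, {end})")
```

Each kept range would then become one slice in a passthrough-style merge config. Going by my experience above, the constraint is tight: with only the middle 60% of the layers available and small cuts, there is not much room to shrink a 70B before quality collapses.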