People around here talk like extending context is pretty simple these days, but once I hit about 4200-4400 tokens (with my limit pushed to 8k), all I get is gibberish. This is with the LLaMA2-13B-Tiefighter-AWQ model, which seems highly regarded for roleplay/storytelling (my use case).

I also tried OpenHermes-2.5-Mistral-7B and, oddly enough, it was nonsensical from the very start.

I’m using SillyTavern with Oobabooga, sequence length set to 8k in both, and a 3090. I’m pretty new to all of this, and it’s been difficult finding up-to-date information (because things develop so quickly!). The term fine-tuning comes up a lot, and with it comes a whole lot of complicated coding talk I know nothing about.

As a layman, is there a way to achieve 8k (or more) context for a roleplay/storytelling model?

  • BangkokPadang@alien.top · 1 year ago

    For llama2 models, set your alpha (the alpha_value / NTK RoPE scaling setting in the loader) to 2.65 when loading them at 8k.

    The general suggestion is 2.5, but if you plot the scaling formula on a graph, 8192 context lines up with about 2.642, so 2.65 is more accurate than 2.5.
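
    If it helps to see what that alpha knob actually does: as I understand it, ExLlama-style loaders implement alpha_value as NTK-aware RoPE scaling, stretching the rotary base by alpha^(d/(d-2)). Below is a rough Python sketch of just that relationship; the function name and the LLaMA-2 defaults (head_dim 128, base 10000) are my own picks for illustration, not code copied from any loader.

    ```python
    # Rough sketch of NTK-aware RoPE "alpha" scaling (what alpha_value does in
    # ExLlama-style loaders, as far as I know). head_dim=128 and base=10000 are
    # the LLaMA-2 defaults; this is an illustration, not any loader's actual code.

    def scaled_rope_base(alpha: float, head_dim: int = 128, base: float = 10000.0) -> float:
        """Stretch the rotary-embedding base so positions past the trained
        4096-token context still map to reasonable rotation angles."""
        return base * alpha ** (head_dim / (head_dim - 2))

    if __name__ == "__main__":
        for alpha in (1.0, 2.5, 2.65):
            # alpha=1.0 keeps the stock ~10,000 base (plain 4k behaviour);
            # alpha=2.65 pushes it to roughly 26,900 for use at 8k.
            print(f"alpha={alpha:<4} -> RoPE base ~ {scaled_rope_base(alpha):,.0f}")
    ```

    The gap between 2.5 and 2.65 is small (roughly 25,400 vs 26,900 on the base), but running with no scaling at all is what tends to produce gibberish just past the original 4k, which matches what you're describing.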