People talk about it around here like this is pretty simple (these days at least). But once I hit about 4200-4400 tokens (with my limit pushed to 8k) all I get is gibberish. This is with the LLaMA2-13B-Tiefighter-AWQ model, which seems highly regarded for roleplay/storytelling (my use case).
I also tried OpenHermes-2.5-Mistral-7B and, oddly enough, it was nonsensical from the very start.
I’m using SillyTavern with Oobabooga, sequence length set to 8k in both, and a 3090. I’m pretty new to all of this and it’s been difficult finding up-to-date information (because things develop so quickly!). The term fine-tuning comes up a lot, and with it comes a whooooole lot of complicated coding talk I know nothing about.
As a layman, is there a way to achieve 8k (or more) context for a roleplay/storytelling model?
For Llama 2 models, set your alpha to 2.65 when loading them at 8k context.
The general suggestion is 2.5, but if you plot the formula on a graph, 8192 context lines up with roughly 2.642, so 2.65 is more accurate than 2.5.
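If you're curious what the alpha knob actually does: with NTK-aware RoPE scaling (which is what the ExLlama-style loaders in Oobabooga are generally understood to use), the alpha value just inflates the RoPE base frequency, which stretches the positional encoding so the model tolerates longer contexts. Here's a minimal sketch of that relationship; the base of 10000 and head dimension of 128 are assumptions based on typical Llama-family configs, not something stated in this thread:

```python
# Rough sketch of how an "alpha" value adjusts the RoPE base under
# NTK-aware scaling (assumed ExLlama-style behavior, not verified here).

def ntk_rope_base(alpha: float, base: float = 10000.0, head_dim: int = 128) -> float:
    """Return the adjusted RoPE base frequency for a given alpha value."""
    return base * alpha ** (head_dim / (head_dim - 2))

if __name__ == "__main__":
    for alpha in (1.0, 2.5, 2.65):
        print(f"alpha={alpha:<5} -> rope base ~ {ntk_rope_base(alpha):,.0f}")
```

In practice you don't have to touch any code: set alpha_value (and max_seq_len to 8192) in the Model tab of the webui before loading the model, since it only takes effect at load time.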