In no particular order! Don’t forget to use each model’s specific prompt format for the best generations!
AWQ and GGUF versions are also available.
https://huggingface.co/NurtureAI/zephyr-7b-beta-16k
https://huggingface.co/NurtureAI/neural-chat-7b-v3-16k
https://huggingface.co/NurtureAI/neural-chat-7b-v3-1-16k
https://huggingface.co/NurtureAI/SynthIA-7B-v2.0-16k
Have fun LocalLLaMA fam <3 ! Let us know what you find! <3
First, thank you for sharing. However, I was a bit puzzled by these finetunes, since many Mistral-based finetunes can simply support longer context out of the box by using NTK scaling, see here. Alas, I couldn’t find any information in the model cards about what NurtureAI did to extend the context.
I tested NurtureAI’s synthia-7b-v2-16k-q8_0.gguf with koboldcpp v1.49, using the model’s native RoPE configuration (rope base frequency 1000000), in an existing conversation of 14,971 tokens, asking it to generate a stand-up comedy routine about the preceding conversation; it produced incoherent babbling. The original model, synthia-7b-v2.0.Q8_0.gguf (rope base frequency 10000), run with --ropeconfig 1.0 45000, gave me a coherent stand-up routine that makes sense.
How well NTK scaling works on Mistral-based finetunes depends on the finetune; for some it works better than for others. For example, when I ask the original zephyr-7b-beta.Q8_0.gguf, in an existing conversation of 25,872 tokens, to produce a rhyming poem about the preceding conversation, the resulting poem actually mostly rhymes. Other original finetunes, like synthia-7b-v2.0.Q8_0.gguf, still seem coherent at this context size but can no longer produce rhyming poems.
Anyway, based on my experiments, these extended-context models from NurtureAI do not work for me, while simply applying NTK scaling to the original Mistral-based finetunes does.
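For reference, here is a minimal sketch of how the rope base I used above relates to NTK-aware scaling. The formula and the alpha value are my own assumption about what a reasonable setting looks like, not anything documented by NurtureAI:

```python
# NTK-aware RoPE scaling keeps the frequency scale at 1.0 and raises the base instead,
# roughly base' = base * alpha^(head_dim / (head_dim - 2)) for a context stretch factor alpha.
def ntk_rope_base(base: float = 10000.0, alpha: float = 4.0, head_dim: int = 128) -> float:
    """Estimate the RoPE base frequency for an NTK-aware context extension by factor alpha."""
    return base * alpha ** (head_dim / (head_dim - 2))

# alpha ≈ 4 yields roughly 41000, i.e. the same ballpark as `--ropeconfig 1.0 45000` above.
print(round(ntk_rope_base(alpha=4.0)))
```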
I also released an AWQ version of Chupacabra 7B, to get extra crispy.
I’m not sure who told whom that Mistral models are only 8k or 4k. The sliding window is not the context size; the position embeddings define the context size, and that is 32k.
I’m not sure who told whom that Mistral models are only 8k
The official Mistral product information.
Our very first foundational model: 7B parameters, fast-deployed and easily customisable. Small, yet powerful for a variety of use cases. Supports English and code, and a 8k context length. link
Does Mistral itself actually mention 32k anywhere?
It has 32k; they state it in the config: “max_position_embeddings”: 32768. That is the sequence length.
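A quick way to check those fields yourself (assuming the stock mistralai/Mistral-7B-v0.1 config on the Hub):

```python
from transformers import AutoConfig

# Inspect the stock Mistral-7B config fields being discussed here.
cfg = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
print(cfg.max_position_embeddings)  # 32768 -> positional embeddings / sequence length
print(cfg.sliding_window)           # 4096  -> each layer only attends over a 4096-token window
```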
But “true” 16K-32K models like MistralLite seem to perform much better at long context than the default Mistral config.
There is nothing more “true” about MistralLite’s context length. What Amazon (and Yarn) do essentially amounts to removing the sliding window.
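If I understand that approach correctly, “removing the sliding window” boils down to a config change along these lines (the values are illustrative of the MistralLite-style recipe, not NurtureAI’s method, and the output path is hypothetical):

```python
from transformers import AutoConfig

# Sketch: disable sliding-window attention and raise the RoPE base, MistralLite-style.
cfg = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
cfg.sliding_window = None       # attend over the full sequence instead of a 4096-token window
cfg.rope_theta = 1000000.0      # larger RoPE base, as used by long-context Mistral finetunes
cfg.save_pretrained("./mistral-7b-long-context-config")  # hypothetical output path
```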
is this a scam or what? none of the models above are from NurtureAI:
- zephyr-beta is trained by HuggingFace and is 32K by default
- neural-chat is from Intel
- synthia is from migtissera
Original links:
https://huggingface.co/HuggingFaceH4/zephyr-7b-beta
NurtureAI extended the context size to 16k
So, assuming this release does anything at all, the only thing I can think of is that instead of the “hidden size” being 4k (giving a 4k sliding window into the 32k context), it would be a hidden size of 16k, giving a 16k window into the 32k context.
However, that’s just speculation on my part, because otherwise the release means nothing… which would be weird.
That’s not what hidden size does.
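To spell out the distinction (values from the stock Mistral-7B config; the two fields just happen to share the same number):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
print(cfg.hidden_size)     # 4096 -> width of each token's embedding vector, not a token count
print(cfg.sliding_window)  # 4096 -> attention span in tokens; matching hidden_size is coincidence
```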