• mll59@alien.top · 10 months ago

    First, thank you for sharing. However, I was a bit puzzled by these finetunes, since many Mistral-based finetunes can support longer context out of the box simply by using NTK scaling, see here. Alas, I couldn’t find any information in the model cards about what NurtureAI did to extend the context.

    I tested NurtureAI’s synthia-7b-v2-16k-q8_0.gguf with koboldcpp v1.49, using the model’s native rope configuration (rope base frequency 1000000), in an existing conversation of 14971 tokens, asking it to generate a standup comedy routine about the preceding conversation, and it produced incoherent babbling. Using the original model synthia-7b-v2.0.Q8_0.gguf (rope base frequency 10000) with --ropeconfig 1.0 45000 instead gives me a coherent standup routine that makes sense.
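    To make concrete what that --ropeconfig change does, here is a rough sketch of the NTK idea in Python (not koboldcpp’s actual code; the head dimension of 128 is Mistral-7B’s, and the 10000/45000 bases are the values mentioned above):

        import numpy as np

        # RoPE gives each pair of head dimensions an angular frequency
        # theta_i = base**(-2i/d). NTK-style scaling raises the base so these
        # frequencies shrink, stretching the positional encoding over a longer
        # context without retraining.

        def rope_frequencies(base, head_dim=128):
            i = np.arange(0, head_dim, 2)
            return base ** (-i / head_dim)

        orig = rope_frequencies(10000.0)  # the finetune's trained rope base
        ntk = rope_frequencies(45000.0)   # the raised base from --ropeconfig 1.0 45000

        # The lowest frequency is stretched the most, while the highest-frequency
        # pair is unchanged, which is why NTK scaling preserves short-range
        # behaviour better than plain linear position interpolation.
        print("stretch of lowest-frequency pair: ", orig[-1] / ntk[-1])
        print("stretch of highest-frequency pair:", orig[0] / ntk[0])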

    How well this NTK scaling works on Mistral-based finetunes depends on the finetune; for some it works better than for others. For example, when I ask the original zephyr-7b-beta.Q8_0.gguf finetune, in an existing conversation of 25872 tokens, to produce a rhyming poem about the preceding conversation, the resulting poem actually mostly rhymes. Other original finetunes, like synthia-7b-v2.0.Q8_0.gguf, remain coherent at this context size but can no longer produce rhyming poems.

    Anyway, based on my experiments, these extended-context models by NurtureAI do not work for me, while simply applying NTK scaling to the original Mistral-based finetunes does.

  • permalip@alien.top · 10 months ago

    I’m not sure who told whom that Mistral models only support 8k or 4k. The sliding window is not the context size; the context size is set by the position embeddings, which is 32k.
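
    You can check this directly in the model’s config; a quick sketch (assuming the transformers library and the mistralai/Mistral-7B-v0.1 repo, whose config these values come from):

        from transformers import AutoConfig

        cfg = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
        print(cfg.max_position_embeddings)  # 32768 -> the actual context size
        print(cfg.sliding_window)           # 4096  -> the attention window, not the context size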