  • Most of these are (parts of) EOS (end-of-sequence) tokens. The model is supposed to emit an EOS token to signal that inference is done; without one, it would keep generating until the max new tokens limit is hit.

    Unfortunately, some models, especially merges of models with different prompt formats, can get confused and output the wrong token or turn the special token into a regular string. In that case, adding that string (or a part of it) to the custom stopping strings list ensures that inference still concludes properly.

    In addition to that, I put the asterisk followed by the username there to catch the model trying to act as the user, just like the software by default already includes the username followed by a colon to catch the model trying to talk as the user (see the sketch below).
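
    As a minimal sketch of the idea (not the software’s actual implementation; the stop strings below are only examples), matching custom stopping strings against the generated text can look like this:

    ```python
    # Hypothetical sketch: truncate generated text at the earliest custom
    # stopping string, mirroring what inference backends do internally.
    # The stop strings here are illustrative examples, not real defaults.
    STOP_STRINGS = ["</s>", "<|im_end|>", "*username", "username:"]

    def apply_stopping_strings(text: str, stops=STOP_STRINGS):
        """Return (text truncated at the earliest stop string, hit flag)."""
        earliest = None
        for stop in stops:
            idx = text.find(stop)
            if idx != -1 and (earliest is None or idx < earliest):
                earliest = idx
        if earliest is None:
            return text, False          # no stop string found; keep going
        return text[:earliest], True    # conclude inference early

    # Example: the model starts acting as the user.
    print(apply_stopping_strings("Sure, I can help!\n*username waves*"))
    # -> ('Sure, I can help!\n', True)
    ```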


  • My AI Workstation:

    • 2 GPUs (48 GB VRAM): Asus ROG STRIX RTX 3090 O24 Gaming White Edition (24 GB VRAM) + EVGA GeForce RTX 3090 FTW3 ULTRA GAMING (24 GB VRAM)
    • 13th Gen Intel Core i9-13900K (24 Cores, 8 Performance-Cores + 16 Efficient-Cores, 32 Threads, 3.0-5.8 GHz)
    • 128 GB DDR5 RAM (4x 32GB Kingston Fury Beast DDR5-6000 MHz) @ 4800 MHz ☹️
    • ASUS ProArt Z790 Creator WiFi
    • 1650W Thermaltake ToughPower GF3 Gen5
    • Noctua NH-D15 Chromax.Black (super silent)
    • ATX-Midi Fractal Meshify 2 XL
    • Windows 11 Pro 64-bit

    I’m still on NVIDIA driver 531.79. If you have a newer one, did you set it up to crash instead of swapping to system RAM when VRAM is full?
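
    To catch silent spilling (with newer drivers it shows up as a sudden slowdown rather than an out-of-memory error), a quick VRAM check with the nvidia-ml-py (pynvml) bindings might look like the sketch below; it simply polls every visible GPU:

    ```python
    # Hypothetical sketch: poll VRAM usage on all visible GPUs via
    # nvidia-ml-py (pynvml). If "used" sits at the 24 GB ceiling while
    # generation slows down, the driver is likely swapping to system RAM.
    import pynvml

    pynvml.nvmlInit()
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU {i}: {mem.used / 2**30:.1f} / "
              f"{mem.total / 2**30:.1f} GiB VRAM used")
    pynvml.nvmlShutdown()
    ```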


  • Already saw and read your post, saved it, and added Misted-7B to the top of my 7B TODO list. :)

    I’m not sure what causes the misspellings; probably the low quant and the frankenmerging combined.

    I do see misspellings and grammar mistakes when using the English models in German, even the biggest ones, but it’s worse with smaller models. They understand full well what is said but can’t write German as flawlessly as English, and that’s apparently the case at any quant. It’s probably because there’s less high-quality German in the training data compared to English, and the fewer parameters a model has, the less (language) understanding and knowledge it retains, so it makes more mistakes.


  • I did a speed benchmark months ago and picked Q4_0 because of that. Nowadays I’d prefer to use Q4_K_M, but I try to minimize differences between tests for maximum comparability, so I’ve intentionally stayed on this quant level. (I did make some exceptions: EXL2 because it’s so much faster than GGUF, and Airoboros at Q4_K_M because its Q4_0 was broken.) A quick speed-test sketch follows after this comment.

    Now that I’m done with these tests (they go back weeks/months and allow comparisons between different sizes, too, since they were all run the same way and with as similar a setup as possible), I’m free to change the tests and setup. I’d like to expand into harder questions so it’s not as crowded at the top (I’m still convinced GPT-4 is far ahead of our local models, but the gap seems to be narrowing, and more demanding tests could show that more clearly).
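
    For the speed side of such a benchmark, a minimal tokens-per-second check with llama-cpp-python could look like this; the model path, prompt, and settings are placeholders, not my actual test setup:

    ```python
    # Hypothetical sketch: rough tokens/sec for a GGUF quant via
    # llama-cpp-python. Model path, prompt, and settings are placeholders.
    import time
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/model.Q4_K_M.gguf",  # swap in Q4_0 etc. to compare
        n_gpu_layers=-1,   # offload all layers to the GPU(s)
        n_ctx=4096,
        verbose=False,
    )

    start = time.time()
    out = llm("Explain EOS tokens in one paragraph.", max_tokens=200)
    elapsed = time.time() - start

    generated = out["usage"]["completion_tokens"]
    print(f"{generated} tokens in {elapsed:.1f}s "
          f"-> {generated / elapsed:.1f} tokens/sec")
    ```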