Eric Hartford, the author of dolphin models, released dolphin-2.2-yi-34b.
This is one of the earliest community finetunes of the yi-34B.
yi-34B was developed by a Chinese company and they claim sota performance that are on par with gpt-3.5
HF: https://huggingface.co/ehartford/dolphin-2_2-yi-34b
Announcement: https://x.com/erhartford/status/1723940171991663088?s=20
I am getting nice results in webui using exllama 2 loader and llama 2 prompt. Problem is that webui gives me 21 t/s while when using chat.py from exllama directly I get 28.5 t/s. The difference is too big to make me use webui. I tried matching sampler settings, bos, system prompt and repetition penalty but it still has issues there - it either mixes up the prompt, for example outputting <>, prints out a whole-ass comment section to a story, outputs 30 links to YT out of nowhere and generally still acts a bit like a base model. I can’t really blame exllama v2, because my lora works more predictably. I also can’t blame spicyboros, because it works great in webui. It looks the same with raw, llama and chatml prompt formats. It’s not a big deal since it’s still usable, but it bugs me a bit.