@sophosympatheia

sophosympatheia@alien.top · 10 months ago

There are several popular methods, all supported by the lovely mergekit project at https://github.com/cg123/mergekit.

The ties merge method is the newest and most advanced method. It works well because it implements some logic to minimize how much the models step on each other’s toes when you merge them together. Mergekit also makes it easy to do “frankenmerges” using the passthrough method where you interleave layers from different models in a way that extends the resultant model’s size beyond the normal limits. For example, that’s how goliath-120b was made from two 70b models merged together.

sophosympatheia@alien.top · 10 months ago

I’m excited to share what I’ve been working on that builds on this model. It was creative but struggled with following instructions. I was able to correct for that shortcoming with some additional merges at a low weight that seem to have preserved its creativity. The results had me really impressed last night as I did my testing.

sophosympatheia@alien.top · 10 months ago

I’m one of those weirdos merging 70b models together for fun. I mostly use my own merges now as they’ve become quite good. (Link to my Hugging Face page where I share my merges.) I’m mostly interested in roleplaying and storytelling with local LLMs.

sophosympatheia@alien.top · 10 months ago

EXL2 runs fast and the quantization process implements some fancy logic behind the scenes to do something similar to k_m quants for GGUF models. Instead of quantizing every slice of the model to the same bits per weight (bpw), it determines which slices are more important and uses a higher bpw for those slices and a lower bpw for the less-important slices where the effects of quantization won’t matter as much. The result is the average bits per weight across all the layers works out to be what you specified, say 4.0 bits per weight, but the performance hit to the model is less severe than its level of quantization would suggest because the important layers are maybe 5.0 bpw or 5.5 bpw, something like that.

In short, EXL2 quants tend to punch above their weight class due to some fancy logic going on behind the scenes.

sophosympatheia@alien.top · 10 months ago

Another great battery of tests and results, Wolfram! Thanks again for giving one of my models a test drive.

I’ve been busy since sophosynthesis-v1. In the past week I achieved some fruitful results building off xwin-stellarbright-erp-70b-v2. What a stud that model has proven to be. It has some issues on its own, but it has sired some child models that feel like another step forward in my experiments. More to come soon!

sophosympatheia@alien.top · 10 months ago

I can’t speak to the quality of sequelbox/DaringFortitude but I can wholeheartedly recommend sequelbox/StellarBright. I have been using StellarBright in some experimental 70b model merges and it’s phenomenal. I imagine 13b merges using DaringFortitude, or finetunes on top of it, would be quite good.

sophosympatheia@alien.top · 10 months ago

This model kicks ass. I strongly recommend trying it for roleplay. The 4-bit 32g act order GPTQ quant is on par with 70b models, so I can only imagine what higher-bit quants can do.

sophosympatheia@alien.top · 10 months ago

For anyone capable of running 70b models (i.e. 48GB VRAM), you might want to check out my latest merge: sophosympatheia/xwin-stellarbright-erp-v2. It is the latest iteration of some experiments I’ve been conducting and I like how it performs with the provided prompt template and sampler settings on the HF page.

Sorry for the lack of more quantizations right now and full weights. I’m getting additional SSD storage next week that should help.

sophosympatheia@alien.top · 10 months ago

This was an insightful comment. The winnowing effect of market conditions should not be underestimated.

I love the Wild West that is the local LLM scene right now, but I wonder how long the party will last. I predict that the groups with the capacity to produce novel, state-of-the-art LLMs will be seduced by profit to keep those models closed, and as those models that could run on consumer hardware become increasingly capable, the safety concerns (legitimate or not) will eventually smother their open nature. We may continue to get weights for toy versions of those new flagship models, but I suspect their creators will reserve the top-shelf stuff for their subscription customers, and they can easily cite safety as a reason for it. I can’t really blame them, either. Why give it away for free when you can become rich off your invention?

Hopefully I’ll be proven wrong. 🤞 We’ll see…

sophosympatheia@alien.top · 10 months ago

Text Gen Web UI + Silly Tavern for me. Works like a charm.

sophosympatheia@alien.top · 10 months ago

Awesome post! Thanks for investing the time into this, u/kindacognizant.

I have been playing around with your suggested Min-P settings and they kick butt. It feels close to mirostat subjectively, certainly no worse, and you made some convincing arguments for the Min-P approach. I like the simplicity of it too. I think I’ll be using Min-P primarily from now on.

sophosympatheia@alien.top · 10 months ago

Another great contribution, Wolfram! I was pleased to see one of my 70b merges in there and it didn’t suck. More good stuff to come soon! I have a xwin-stellarbright merge I still need to upload that is hands down my new favorite for role play. I’m also excited to see what opus can do in the mix.

sophosympatheia@alien.top · 10 months ago

What you highlighted as problems are the reasons why people fork out money for the compute to run 34b and 70b models. You can tweak sampler settings and prompt templates all day long but you can only squeeze so much smarts out of a 7b - 13b parameter model.

The good news is better 7b and 13b parameter models are coming out all the time. The bad news is even with all that, you’re still not going to do better than a capable 70b parameter model if you want it to follow instructions, remember what’s going on, and stay consistent with the story.