With a 3090 and sufficient system RAM, you can run 70b models but they’ll be slow. About 1.5 tokens/second. Plus quite a bit of time for prompt ingestion. It’s doable but not fun.
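If you want to try it anyway, here's a minimal sketch using llama-cpp-python with partial GPU offload: as many layers as fit go into the 3090's 24GB, the rest stay in system RAM. The model filename and layer count are placeholders; tune them for your quant.

```python
from llama_cpp import Llama

# Placeholder path to a 70b GGUF quant; raise n_gpu_layers until VRAM is nearly full.
llm = Llama(
    model_path="models/llama2-70b.Q4_K_M.gguf",  # hypothetical filename
    n_gpu_layers=40,   # partial offload; the remaining layers run from system RAM
    n_ctx=4096,
)

out = llm("Write a haiku about waiting for tokens.", max_tokens=64)
print(out["choices"][0]["text"])
```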
Because a frightening number of people still think Twitter matters.
The models don’t have memory per se; they reprocess the entire context (i.e. the conversation) with each generation. As the context becomes larger and more complex, models with fewer parameters struggle.
You can try adding certain instructions to the system prompt, such as “advance the story”, but ultimately, more parameters means a better grasp of the conversation. I haven’t come across any model below an 8-bit 13b that could keep a story together, so that’s the minimum I go for when I want to RP.
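To make the “no memory” point concrete, here’s a rough sketch of what a frontend does every turn: it rebuilds one big prompt from the system prompt plus the whole chat so far, so each generation has to chew through everything again. The prompt format and names are made up for illustration.

```python
system_prompt = "You are the narrator. Advance the story; don't stall or repeat yourself."
history = []  # grows every turn

def build_prompt(user_message: str) -> str:
    history.append(("User", user_message))
    # The entire conversation is re-sent with every single generation.
    lines = [system_prompt] + [f"{who}: {text}" for who, text in history]
    lines.append("Narrator:")
    return "\n".join(lines)

prompt = build_prompt("We enter the tavern.")
# send `prompt` to the backend, then append the reply to `history` as ("Narrator", reply)
```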
As for the 70b’s writing being less interesting, I’d say that’s independent of the model’s capabilities and more down to style. Again, giving it instructions on how to write, as well as example messages, can help, but it does somewhat come down to what it was trained on.
It’s a rule of thumb that yes, a higher-parameter model at low quant beats a lower-parameter model at high quant (or no quant), but take it with a grain of salt, as you may still prefer a lower-parameter model that’s better tuned for the task you prefer.
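Rough back-of-the-envelope numbers for why the rule of thumb exists: weight memory scales with parameters times bits per weight, so even a heavily quantized 70b carries far more parameters than an unquantized 13b of similar size. These are approximations that ignore context and overhead.

```python
def weight_gb(params_b: float, bits_per_weight: float) -> float:
    # billions of parameters * bits per weight / 8 -> rough size in GB
    return params_b * bits_per_weight / 8

print(f"70b @ 2.5 bpw ~ {weight_gb(70, 2.5):.1f} GB")   # ~21.9 GB
print(f"13b @ 8.0 bpw ~ {weight_gb(13, 8.0):.1f} GB")   # ~13.0 GB
print(f"13b @ 16  bpw ~ {weight_gb(13, 16.0):.1f} GB")  # ~26.0 GB
```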
The model, called Q* – and pronounced as “Q-Star” – was able to solve basic maths problems it had not seen before, according to the tech news site the Information, which added that the pace of development behind the system had alarmed some safety researchers.
Sounds like a load of bollocks to me. How would anybody working in AI be “alarmed” by a model solving basic maths problems?
Try just exllama2, no HF.
I know but it’s slowing down quite a bit at 32k already so I don’t think it’s worth pushing it further. But hey, even at just 16k it’s four times what we usually get, so I’m not complaining.
With this particular model, I can crank it up to 32k if I enable “Use 8-bit cache to save VRAM”, and that’s as high as it can go in the Oobabooga WebUI.
The base Yi can handle 200k. The version I used can do 48k (though I only tested 16k so far). Larger context size requires more VRAM.
The size that TheBloke lists for a GGUF is the minimum size at 0 context. As the context fills up, VRAM use increases.
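A rough sketch of where that extra VRAM goes: the KV cache grows linearly with context length. The formula and the Llama-2-13b config numbers below are the standard ones; treat the results as ballpark figures, and note that an 8-bit cache roughly halves them. Models with grouped-query attention (like Yi) store far fewer KV elements per token, so their cache is smaller.

```python
def kv_cache_gb(n_layers: int, hidden_size: int, ctx_len: int, bytes_per_elem: int = 2) -> float:
    # K and V: one vector of hidden_size per layer per token
    return 2 * n_layers * hidden_size * ctx_len * bytes_per_elem / 1024**3

# Llama-2-13b: 40 layers, hidden size 5120, no grouped-query attention
print(f"4k ctx, fp16 cache:  {kv_cache_gb(40, 5120, 4096, 2):.1f} GB")   # ~3.1 GB
print(f"8k ctx, fp16 cache:  {kv_cache_gb(40, 5120, 8192, 2):.1f} GB")   # ~6.3 GB
print(f"8k ctx, 8-bit cache: {kv_cache_gb(40, 5120, 8192, 1):.1f} GB")   # ~3.1 GB
```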
I was hoping for a shakeup and all we got was an expensive game of musical chairs? Meh.
It should work with those specs. Not sure what “connection” it means. Perhaps post a screenshot of the console?
Now would be a good time for a disgruntled employee to leak some models and make OpenAI actually open. ;)
What are you looking for?
With a 3090, you can run any 13b model in 8 bit, group size 128, act order true, at decent speed.
Go-tos for the more spicy stuff would be MythoMax and Tiefighter.
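If it helps, here’s a hedged sketch of loading one of those 13b GPTQ quants with transformers (plus optimum and auto-gptq installed). The repo and branch names follow TheBloke’s usual naming convention but are from memory, so double-check the actual model card before copying them.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo/branch names follow TheBloke's convention and may differ; check the model card.
repo = "TheBloke/MythoMax-L2-13B-GPTQ"
branch = "gptq-8bit-128g-actorder_True"  # 8 bit, group size 128, act order true

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, revision=branch, device_map="auto")

inputs = tokenizer("Once upon a time,", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
```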
Under full load, and if thermals allow it, that machine can draw up to 120 W from the wall. Likely the tool isn’t reading the SoC power draw correctly.
My poor liver!
Hadn’t thought of that. I have 24gb so I’ve always used GPTQ and with that, you really need more than 16gb.
With 16gb you could run q8.
Not really though. Any kind of context will push you over 16gb. Or I’m doing something wrong.
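For what it’s worth, the back-of-the-envelope math agrees: Q8_0 works out to roughly 8.5 bits per weight, so the 13b weights alone are close to 14 GB before any context is added.

```python
# Rough numbers, assuming Q8_0 ~ 8.5 bits per weight for a 13b model
weights_gb = 13 * 8.5 / 8          # ~13.8 GB of weights
kv_cache_gb = 3.1                  # fp16 cache at 4k context for a 13b (see earlier estimate)
print(weights_gb + kv_cache_gb)    # ~16.9 GB -> over a 16GB card
```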
Obviously. There aren’t many people in the world with 50k burning a hole in their pockets and of those, even fewer are nerdy enough to want to set up their own AI server in their basement just for themselves to tinker with.
Use 10-second clips of clean audio: no music, no background noise. I like to record samples from audiobooks; free samples on Amazon recorded with Audacity work well for me.
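A minimal sketch of the prep I mean, using pydub (any trim tool works; the filenames are placeholders): take roughly 10 seconds of clean speech, downmix to mono, and export as WAV. The 22.05 kHz sample rate is just a common choice, not a hard requirement.

```python
from pydub import AudioSegment  # needs ffmpeg installed

# Placeholder filenames; use your own recorded audiobook sample.
clip = AudioSegment.from_file("audiobook_sample.mp3")

# Take ~10 seconds of clean speech (times are in milliseconds), mono, 22.05 kHz.
voice = clip[5_000:15_000].set_channels(1).set_frame_rate(22050)
voice.export("voice_sample.wav", format="wav")
```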
One thing to note: my install (an implementation for SillyTavern) somehow got corrupted, no idea how. It still worked but sounded way worse. A reinstall fixed that, so maybe that’s happening to you too.