  • That is absolutely impressive, but:

    1. is light quantization that bad? Couldn’t you run 99% of the same model for half the cost? Is running unquantized just a flex/exercise/bragging right? (some rough numbers sketched after this list)
    2. Would a quantized model run faster? Slower? The same?
    3. Isn’t Falcon-180B kinda… meh? I mean it’s pretty smart from size alone, but the lack of community fine-tuning means it’s kind of like running base LLaMA-70B by itself.
    4. Would one of those new crazy good Threadrippers beat the GPUs? lol
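
    For what it’s worth, here’s some back-of-the-envelope Python on question 1. The bytes-per-weight figures are rough approximations of fp16 and common GGUF quants, not exact numbers:

        # Rough weight-memory estimate for a 180B-parameter model.
        # Bytes-per-weight values are approximate; real quant formats add overhead.
        PARAMS = 180e9  # Falcon-180B parameter count

        precisions = {
            "fp16 (unquantized)": 2.00,    # 2 bytes per weight
            "Q8_0 (light quant)": 1.06,    # ~8.5 bits per weight, roughly
            "Q4_K_M (heavy quant)": 0.56,  # ~4.5 bits per weight, roughly
        }

        for name, bytes_per_weight in precisions.items():
            gib = PARAMS * bytes_per_weight / 1024**3
            print(f"{name}: ~{gib:,.0f} GiB of weights")

    That works out to roughly 335 GiB of weights at fp16 versus ~180 GiB at Q8_0 and ~95 GiB at Q4, and since inference is mostly memory-bandwidth-bound, the smaller quants generally run faster per token as well, which is most of the answer to question 2.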



  • The thing is, as far as I’m aware, “sound generation” is always a separate TTS system cobbled together with the LLM, and even “vision” is a separate model that describes the image to the AI.

    This 13B model is probably still state of the art in the vision department for open models; a few others crop up now and again, but they haven’t surprised me much.
    https://llava-vl.github.io/

    If you need to recognize audio, check out Whisper, Faster-Whisper, or anything derived from them. If you need to generate voice, check out Bark, or maybe Silero, RVC, etc.

    You probably won’t find it all wrapped into one neat package like ChatGPT+ right now, but I’d love to be proven wrong.
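
    If you want to cobble the pieces together yourself, a minimal sketch could look like the Python below. It assumes faster-whisper, Bark, and SciPy are installed; “input.wav”, the Whisper model size, and the stubbed-out LLM call are all placeholders:

        # Cobbled-together pipeline sketch: faster-whisper for speech-to-text,
        # Bark for text-to-speech, with the LLM in the middle left as a stub.
        from faster_whisper import WhisperModel
        from bark import SAMPLE_RATE, generate_audio, preload_models
        from scipy.io.wavfile import write as write_wav

        # 1. Transcribe the incoming audio (Whisper family).
        stt = WhisperModel("base", device="cpu", compute_type="int8")
        segments, _info = stt.transcribe("input.wav")
        user_text = " ".join(seg.text.strip() for seg in segments)

        # 2. Feed the text to whatever LLM you run (stubbed here).
        reply_text = f"You said: {user_text}"  # swap in a real model call

        # 3. Speak the reply with Bark and save it as a WAV file.
        preload_models()
        audio = generate_audio(reply_text)
        write_wav("reply.wav", SAMPLE_RATE, audio)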



  • Actually, GBNF is the re-branding; BNF (Backus-Naur form) is the proper name (the G is Georgi Gerganov’s). There’s also a reason languages compile to assembly, but that doesn’t make assembly user-friendly. Same with Abstract Syntax Trees: plenty of stuff pretty much only applies to compilers, and that doesn’t make it a good general-purpose solution.

    Though I have to imagine implementing BNF is orders of magnitude easier than implementing the monster that is extended regular expressions.
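
    For anyone curious, a GBNF grammar is short enough to read at a glance. This toy example (made up for illustration, not taken from any project) would constrain a llama.cpp model’s output to a yes/no answer plus a short reason:

        root   ::= "{" ws "\"answer\":" ws answer "," ws "\"reason\":" ws string ws "}"
        answer ::= "\"yes\"" | "\"no\""
        string ::= "\"" [a-zA-Z0-9 .,]* "\""
        ws     ::= [ \t\n]*

    Try writing that same constraint as an extended regular expression and the appeal becomes obvious, even if neither notation is exactly end-user-friendly.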