Brainfeed9000 (alien.top) to LocalLLaMA@poweruser.forum • 🐺🐦⬛ **Big** LLM Comparison/Test: 3x 120B, 12x 70B, 2x 34B, GPT-4/3.5
There’s got to be some sort of limit to that rule of thumb, though? I recall from one of your other tests comparing GGUF quants against EXL2 quants that anything below 3 BPW suffers greatly.
I think I can see that anecdotally when comparing a 2.4 BPW EXL2 quant of lzlv 70B with a 4 BPW EXL2 quant of Yi 34B Chat.
Will you be re-running the tests? I’m particularly interested in quants below 3 BPW, because they’re the only option for running EXL2 70B models on my RTX 4090.
But thanks for the pointer on comparing quant effects across models. I realize my past perplexity testing was virtually useless because I was comparing Yi 34B against lzlv 70B rather than different quants of the same model.
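For my own future reference, here’s a rough sketch of the apples-to-apples check I should have been doing instead: same base model, same evaluation text, only the quantization varies. This assumes Hugging Face transformers with a placeholder text file path (EXL2/GGUF checkpoints would need their own loaders, but the comparison logic is the same), so treat it as a sketch, not the methodology used in the test above.

```python
# Sketch: perplexity is only comparable between variants of the SAME base
# model, since each model's tokenizer segments the text differently.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_id: str, text: str, max_len: int = 2048) -> float:
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    ids = tok(text, return_tensors="pt").input_ids[:, :max_len].to(model.device)
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy per token
    return torch.exp(loss).item()

text = open("wiki.test.raw").read()  # placeholder eval text
# Valid comparison: two quants of the same base model on the same text.
# Invalid comparison: lzlv 70B vs Yi 34B -- different tokenizers, so the
# numbers aren't on the same scale.
```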
It’ll be tough, but I guess finding exactly what works for me (third-person RP with an emphasis on dialogue) just means using each model individually for hours to get a feel for it.