First time testing a local text model, so I don't know much yet. I've seen people with 8GB cards complaining that text generation is very slow, so I don't have much hope, but still… I think I need to do some configuration: while generating text my SSD is at 100% usage reading 1-2 GB/s, while my GPU doesn't even reach 15% usage.
Using an RTX 2060 6GB and 16GB RAM.
This is the model I am testing (mythomax-l2-13b.Q8_0.gguf): https://huggingface.co/TheBloke/MythoMax-L2-13B-GGUF/tree/main

  • YearZero@alien.top · 1 year ago

    I think 13B Q8 is cutting it really close with your 6GB VRAM and 16GB RAM. You'd be much better off using the Q6 quant, and anything below that would definitely be fine.

    Look at the model card: TheBloke lists the RAM requirements for each quant (without context). Since this model uses 4096 tokens of context, add another 1-2 GB on top of those requirements.
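
    As a rough sanity check, here is a back-of-the-envelope sketch (approximate bits-per-weight values of my own, not the exact figures from the card) of how much memory the weights alone need per quant for a ~13B model:

    ```python
    # Rough estimate of GGUF weight memory for a ~13B model at different quants.
    # Bits-per-weight figures are approximate; check the model card for exact file sizes.
    PARAMS = 13e9  # Llama-2-13B has roughly 13 billion parameters

    QUANT_BITS = {"Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7, "Q4_K_M": 4.9}

    for quant, bits in QUANT_BITS.items():
        gb = PARAMS * bits / 8 / 1e9
        # Add another 1-2 GB on top of this for the 4096-token context.
        print(f"{quant}: ~{gb:.1f} GB of weights (+1-2 GB for context)")
    ```

    That puts Q8_0 at roughly 14-16 GB total, which is why it spills onto your SSD with 16GB RAM and 6GB VRAM, while Q5/Q6 land comfortably inside it.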

    You might have some luck if you allocate the right number of layers to the GPU in the parameters (right now you're allocating 0 to the GPU), but definitely play with lower quants; you wouldn't even notice the quality loss until you get down to maybe Q3.
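
    For example, with llama-cpp-python the setting is n_gpu_layers (the llama.cpp CLI equivalent is --n-gpu-layers). A minimal sketch, where the filename and layer count are just example values you'd tune for 6 GB of VRAM:

    ```python
    from llama_cpp import Llama

    # n_gpu_layers controls how many transformer layers are offloaded to VRAM;
    # 0 (the default in many UIs) keeps everything on the CPU, which matches the
    # "GPU barely used" symptom above. Requires llama-cpp-python built with CUDA.
    llm = Llama(
        model_path="mythomax-l2-13b.Q5_K_M.gguf",  # example filename; lower quant, per the advice above
        n_gpu_layers=20,  # guess for 6 GB VRAM; raise until you hit out-of-memory errors
        n_ctx=4096,       # context length this model was trained with
    )

    out = llm("### Instruction:\nSay hello.\n\n### Response:\n", max_tokens=32)
    print(out["choices"][0]["text"])
    ```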

    • OverallBit9@alien.top (OP) · 1 year ago

      Testing Q5 seems like the best fit, at least for this GPU, but I've only tried it on MythoMax; I'm not sure whether other models would behave the same.