First time testing a local text model, so I don't know much yet. I've seen people with 8GB cards complaining that text generation is very slow, so I don't have much hope, but still… I think I need to do some configuration: while generating text my SSD is at 100% usage reading 1-2 GB/s, while my GPU doesn't even reach 15% usage.
Using an RTX 2060 6GB and 16GB RAM.
This is the model I am testing (mythomax-l2-13b.Q8_0.gguf): https://huggingface.co/TheBloke/MythoMax-L2-13B-GGUF/tree/main

  • YearZero@alien.top · 1 year ago

    I think 13B Q8 is cutting it really close with your 6GB VRAM and 16GB RAM. You'd be much better off using the Q6 quant, and anything below that would definitely be fine.

    Look at the model card: TheBloke lists the RAM requirements for each quant (without context). Since this model uses 4096 tokens of context, add another 1-2 GB on top of those requirements.
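
    As a rough sanity check, here is a back-of-the-envelope sketch (approximate bits-per-weight values of my own, not the exact figures from the card) of how much memory the weights alone need per quant for a ~13B model:

    ```python
    # Rough estimate of GGUF weight memory for a ~13B model at different quants.
    # Bits-per-weight figures are approximate; check the model card for exact file sizes.
    PARAMS = 13e9  # Llama-2-13B has roughly 13 billion parameters

    QUANT_BITS = {"Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7, "Q4_K_M": 4.9}

    for quant, bits in QUANT_BITS.items():
        gb = PARAMS * bits / 8 / 1e9
        # Add another 1-2 GB on top of this for the 4096-token context.
        print(f"{quant}: ~{gb:.1f} GB of weights (+1-2 GB for context)")
    ```

    That puts Q8_0 at roughly 14-16 GB total, which is why it spills onto your SSD with 16GB RAM and 6GB VRAM, while Q5/Q6 land comfortably inside it.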

    You might have some luck if you allocate the right number of layers to the GPU in the parameters (right now you're allocating 0 to the GPU), but definitely play with lower quants; you wouldn't even notice the quality loss until you get down to maybe Q3.
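
    For example, with llama-cpp-python the setting is n_gpu_layers (the llama.cpp CLI equivalent is --n-gpu-layers). A minimal sketch, where the filename and layer count are just example values you'd tune for 6 GB of VRAM:

    ```python
    from llama_cpp import Llama

    # n_gpu_layers controls how many transformer layers are offloaded to VRAM;
    # 0 (the default in many UIs) keeps everything on the CPU, which matches the
    # "GPU barely used" symptom above. Requires llama-cpp-python built with CUDA.
    llm = Llama(
        model_path="mythomax-l2-13b.Q5_K_M.gguf",  # example filename; lower quant, per the advice above
        n_gpu_layers=20,  # guess for 6 GB VRAM; raise until you hit out-of-memory errors
        n_ctx=4096,       # context length this model was trained with
    )

    out = llm("### Instruction:\nSay hello.\n\n### Response:\n", max_tokens=32)
    print(out["choices"][0]["text"])
    ```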

    • OverallBit9@alien.top (OP) · 1 year ago

      Testing Q5 seems like the best fit, at least for this GPU, but I've only tried it on MythoMax; I'm not sure whether other models would behave the same.