OK, I know this gets asked here a lot, but the last time I checked this sub was around when llama.cpp first came out, and I assume a lot has changed or improved since then. I hear models like Mistral may even be changing the landscape. What is currently the best roleplay and storytelling LLM that can run on my PC with 32 GB of RAM and an 8 GB VRAM card (or both, since I've also heard about the layered hybrid CPU/GPU approach)? Generally, what would you recommend for these specs?

Thanks in advance to this amazing community for improving the open-source LLM ecosystem.

  • zware@alien.top · 1 year ago

    If you want speed, use Mistral-7B-OpenOrca-GPTQ with ExLlamaV2; that should give you around 40-45 tokens per second. If you'd rather trade speed for quality, go with TheBloke/Xwin-MLewd-13B-v0.2-GGUF under llama.cpp, offloading as many layers to the GPU as your 8 GB of VRAM allows.
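
    For the GGUF route, here's a minimal llama-cpp-python sketch of that hybrid CPU/GPU split. The model filename and the layer count are assumptions on my end; adjust n_gpu_layers up or down until the model just fits in your 8 GB of VRAM.

    ```python
    # Minimal sketch using llama-cpp-python (install with CUDA/cuBLAS support enabled).
    # The file name and n_gpu_layers value are assumptions; tune them for your setup.
    from llama_cpp import Llama

    llm = Llama(
        model_path="xwin-mlewd-13b-v0.2.Q4_K_M.gguf",  # assumed local GGUF file from TheBloke's repo
        n_ctx=4096,        # context window
        n_gpu_layers=35,   # layers offloaded to the GPU; the rest stay in system RAM
    )

    out = llm(
        "Write the opening scene of a fantasy story about a wandering cartographer.",
        max_tokens=256,
        temperature=0.8,
    )
    print(out["choices"][0]["text"])
    ```

    The same split works from the llama.cpp CLI via its GPU-layers option; the Python wrapper just makes it easy to drop into SillyTavern-style frontends or your own scripts.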