Will there be a significant difference in speed and quality between Llama 2 GPTQ using ExLlamaV2 and Llama 2 GGUF using llama.cpp with all layers offloaded to the GPU?

  • Maykey@alien.top
    1 year ago

    Yes. ExLlamaV2 is much faster in my experience. It also supports an 8-bit KV cache to save even more VRAM (I don’t know whether llama.cpp has an equivalent).
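    The 8-bit cache saving is easy to estimate: the KV cache stores one key and one value vector per layer per token, so halving the element width halves the cache size. A rough sketch, assuming Llama-2-7B-like dimensions (32 layers, 32 KV heads of dim 128, 4096-token context) — illustrative numbers, not measured VRAM:

    ```python
    def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                       context_len: int, bytes_per_elem: int) -> int:
        """Size of the K and V caches combined, in bytes."""
        # factor of 2 = one key cache + one value cache
        return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

    fp16 = kv_cache_bytes(32, 32, 128, 4096, 2)  # 16-bit cache
    int8 = kv_cache_bytes(32, 32, 128, 4096, 1)  # 8-bit cache

    print(f"fp16 cache: {fp16 / 2**30:.1f} GiB")  # → 2.0 GiB
    print(f"int8 cache: {int8 / 2**30:.1f} GiB")  # → 1.0 GiB
    ```

    So at full context the 8-bit cache frees roughly a gigabyte on a 7B-class model, and proportionally more for larger models or longer contexts.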