WinterUsed1120@alien.topB to

LocalLLaMA@poweruser.forum · English · 1 year ago

ExLlamaV2 vs. llama.cpp (all layers offloaded to GPU)

Will there be a significant difference in speed and quality between Llama 2 GPTQ running on ExLlamaV2 and Llama 2 GGUF running on llama.cpp with all layers offloaded to the GPU?


Maykey@alien.topB
1 year ago
Yes. ExLlamaV2 is much faster IME. It also supports an 8-bit cache to save even more VRAM (I don't know if llama.cpp has an equivalent).
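
For a rough idea of what an 8-bit cache saves, here is a back-of-the-envelope sketch of KV cache size. It is not taken from either library: `kv_cache_bytes` is a hypothetical helper, and the numbers assume the published Llama 2 7B shape (32 layers, 32 KV heads, head dimension 128) at a 4096-token context. It also ignores any per-block scales or allocator overhead, so treat the result as an estimate.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem):
    """Estimate KV cache size: one K and one V tensor per layer.

    Hypothetical helper for illustration only; real backends may add
    overhead that this does not count.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


# Assumed Llama 2 7B shape at full 4096-token context.
LAYERS, KV_HEADS, HEAD_DIM, CONTEXT = 32, 32, 128, 4096

fp16 = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, bytes_per_elem=2)
int8 = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, bytes_per_elem=1)

print(f"FP16 cache:  {fp16 / 2**30:.1f} GiB")  # 2.0 GiB
print(f"8-bit cache: {int8 / 2**30:.1f} GiB")  # 1.0 GiB
```

So under those assumptions the 8-bit cache frees roughly 1 GiB at full context, which can be the difference between fitting a longer context on the card or not.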