Will there be a significant difference in speed and quality between Llama 2 GPTQ running on ExLlamaV2 and Llama 2 GGUF running on llama.cpp with all layers offloaded to the GPU?
Yes. ExLlamaV2 is much faster IME. It also supports an 8-bit cache to save even more VRAM (I don't know if llama.cpp has that).
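For reference, here's a sketch of the llama.cpp side of the comparison. Full GPU offload is done with the `-ngl` (`--n-gpu-layers`) flag; setting it to a number at or above the model's layer count puts every layer on the GPU. The model filename below is just an example placeholder:

```shell
# Hypothetical model path; -ngl 99 offloads all layers to the GPU
# (Llama 2 7B has 32 layers, so any value >= 32 means full offload)
./llama-cli -m llama-2-7b.Q4_K_M.gguf -ngl 99 -p "Hello"
```

Even with full offload, the two stacks use different kernels and quantization formats (GPTQ vs. GGUF k-quants), so speed and perplexity won't match exactly.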