nmcfarl@alien.top to LocalLLaMA@poweruser.forum · English · 10 months ago

Fast Llama 2 on CPUs With Sparse Fine-Tuning and DeepSparse

neuralmagic.com

Key Takeaways: We expanded our Sparse Fine-Tuning research results to include Llama 2. The results include 60% sparsity with INT8 quantization and no drop in accuracy. DeepSparse now supports accelerated inference of sparse-quantized Llama 2 models, with inference speeds 6-8x faster than the baseline at 60-80% sparsity. We used some interesting algorithmic techniques in order
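For readers unfamiliar with the terms, "60% sparsity with INT8 quantization" means 60% of a layer's weights are exactly zero and the remaining weights are stored as 8-bit integers plus a scale factor. The sketch below only illustrates those two ideas on a toy matrix. It uses plain unstructured magnitude pruning and per-tensor symmetric INT8 quantization, which is an assumption for illustration and NOT the Sparse Fine-Tuning method described in the post. It also does not model how DeepSparse exploits the zeros for speed. All function names are hypothetical.

```python
# Toy illustration (hypothetical helpers, not Neural Magic's method):
# unstructured magnitude pruning followed by symmetric per-tensor INT8 quantization.

def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-|w| fraction `sparsity` set to 0."""
    rows, cols = len(weights), len(weights[0])
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    # Rank entries by absolute value; the first k are the ones we zero out.
    coords.sort(key=lambda rc: abs(weights[rc[0]][rc[1]]))
    k = int(round(len(coords) * sparsity))
    pruned = [row[:] for row in weights]
    for r, c in coords[:k]:
        pruned[r][c] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric per-tensor INT8: w ~= q * scale with q in [-127, 127]."""
    max_abs = max(abs(w) for row in weights for w in row)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Zeros map to 0, so the sparsity pattern survives quantization.
    q = [[max(-127, min(127, round(w / scale))) for w in row] for row in weights]
    return q, scale


def dequantize(q, scale):
    """Map INT8 codes back to approximate float weights."""
    return [[v * scale for v in row] for row in q]


def sparsity_of(matrix):
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for v in row if v == 0)
    return zeros / total


# Usage on a made-up 2x5 weight matrix.
w = [[0.9, -0.1, 0.4, -0.05, 0.7],
     [0.02, -0.8, 0.3, 0.15, -0.6]]
pruned = magnitude_prune(w, 0.6)
q, scale = quantize_int8(pruned)
print(sparsity_of(q))  # → 0.6
print(q)               # → [[127, 0, 0, 0, 99], [0, -113, 0, 0, -85]]
```

The "no drop in accuracy" part is what this toy cannot show: naively pruning 60% of a real LLM's weights this way usually hurts quality, which is why the post relies on fine-tuning to recover it.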

metaprotium@alien.top · 10 months ago
60% sparsity with no quality loss is really good. I'll have to look into the methodology, because that's impressive.