I want to use ExLlama because it lets me run the Llama 70B model on my two RTX 4090s. I managed to get it working pretty easily via text-generation-webui, and inference is really fast! So far so good…
However, I need the model in Python to do some large-scale analyses, and I can't seem to find any guide or tutorial explaining how to use ExLlama in the usual Python/Hugging Face setup.
Is this just not possible? If it is, can someone point me to some example code that uses ExLlama in Python?
Much appreciated!
Check out turbo’s project https://github.com/turboderp/exui
He put it up not long ago, and it already has speculative decoding working. I tried it with Goliath 120B (4.85 bpw exl2) and was getting 11-13 t/s vs 6-8 t/s without it. It's barebones, but it works.
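If you'd rather call ExLlama directly from Python instead of going through exui or the webui, the exllamav2 repo also exposes a plain Python API. Here's a minimal sketch, loosely based on the inference examples in that repo; the class names, the `load_autosplit` call, and the `generate_simple` signature are taken from its example scripts and may differ between versions, and the model path is a placeholder:

```python
# Minimal sketch based on the examples in turboderp/exllamav2.
# Assumes: pip install exllamav2, plus an exl2-quantized model on disk.
# Names/signatures below follow that repo's examples and may change
# between versions -- treat this as a starting point, not gospel.

from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Config,
    ExLlamaV2Cache,
    ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/your-exl2-model"  # hypothetical local path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # lazy cache lets autosplit place layers
model.load_autosplit(cache)               # splits weights across both 4090s

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

output = generator.generate_simple("The capital of France is", settings, 64)
print(output)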
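```

For large-scale analyses you can just loop your prompts through `generate_simple`; I believe it also accepts a list of prompts for batched generation, but check the repo's batching example to be sure.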