Acceptable_Can5509@alien.top to LocalLLaMA@poweruser.forum · English · 2 years ago

Llama-2 7b Unquantized Transformers using 26.8 GB of VRAM.


I'm running Llama-2 7b unquantized with Transformers on Google Colab, on a 40 GB A100. However, it's using 26.8 GB of VRAM; is that normal? I also tried the 13b version, but the system ran out of memory. Yes, I know quantized versions are almost as good, but I specifically need unquantized.

https://colab.research.google.com/drive/10KL87N1ZQxSgPmS9eZxPKTXnobUR_pYT?usp=sharing
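
For scale: Transformers materializes checkpoints in float32 unless told otherwise, and 7B parameters at 4 bytes each come to roughly 26 GiB of weights alone, which lines up with the 26.8 GB reading; 13B at 4 bytes is about 48 GiB, more than a 40 GB A100 holds. Below is a minimal sketch of loading in float16 instead, which is still unquantized (half precision, not quantization). The model ID and loading calls are the standard Hugging Face ones and are an assumption here, not taken from the linked notebook.

```python
# Minimal sketch (assumptions: the standard meta-llama/Llama-2-7b-hf checkpoint
# and a plain Transformers + Accelerate setup; the linked notebook may differ).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# from_pretrained() loads weights in float32 by default:
# 7B params * 4 bytes ~= 26 GiB, matching the observed 26.8 GB.
# torch_dtype=torch.float16 keeps the weights unquantized but half precision:
# ~13 GiB for 7B and ~24 GiB for 13B, so 13B should fit on a 40 GB A100.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # needs `accelerate`; places the model on the GPU
)
```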

