alchemist1e9@alien.top to LocalLLaMA@poweruser.forum · English · 1 year ago
ExLlamaV2: The Fastest Library to Run LLMs (towardsdatascience.com)
22 comments
JoseConseco_@alien.top · 1 year ago
So how much VRAM would be required for a 34B model or a 14B model? I assume there is no CPU offloading, right? With my 12 GB of VRAM, I guess I could only fit a 14-billion-parameter model, and maybe not even that.
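As a rough back-of-the-envelope estimate (not from the article or ExLlamaV2's own accounting): VRAM for the weights is roughly parameters × bits-per-weight / 8, plus extra for the KV cache and activations. A minimal Python sketch, where the bit-widths and the flat overhead allowance are assumptions for illustration:

```python
# Rough VRAM estimate for quantized model weights.
# This is an assumption-based sketch, not ExLlamaV2's actual memory model.

def vram_gib(params_billion: float, bits_per_weight: float, overhead_gib: float = 1.5) -> float:
    """Estimate VRAM in GiB: weight storage plus a flat allowance for KV cache/activations."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes / 1024**3 + overhead_gib

for params in (14, 34):
    for bits in (4.0, 5.0):
        print(f"{params}B @ {bits} bpw ≈ {vram_gib(params, bits):.1f} GiB")
```

By this estimate, a 14B model at around 4 bits per weight lands near 8 GiB and should fit in 12 GB of VRAM (with limited room for context), while a 34B model at 4 bits needs roughly 17 GiB and would not fit without offloading.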