Need help estimating if my speed is expected. Llama_index

Noxusequal@alien.top · 1 year ago

I know that CUDA is being used: VRAM is full and I get the message at the beginning. What is your hardware setup?

Do you also use llama_index and then langchain, or did you build it more or less from llama_cpp and langchain without llama_index?

harrro@alien.top · 1 year ago

I’m using langchain with qdrant as the vector store.

> VRAM is full

How is a 7B model maxing out your VRAM? A 7B model at 4-bit and 4k context should not fill the 12GB of VRAM on a 3060.
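
For a rough sanity check, here is a back-of-the-envelope sketch of that estimate. The defaults are assumptions, not measurements from this setup: Llama-2-7B-style dimensions (32 layers, hidden size 4096), an fp16 KV cache, and about 4.5 bits per weight for a typical 4-bit quant. The function name is made up for illustration.

```python
def estimate_vram_gib(
    n_params=7e9,          # assumed parameter count for a "7B" model
    bits_per_weight=4.5,   # assumed average for a 4-bit quant incl. scales
    n_layers=32,           # assumed Llama-2-7B-style depth
    hidden_size=4096,      # assumed Llama-2-7B-style width
    n_ctx=4096,            # 4k context, as in the post
    kv_bytes=2,            # fp16 KV cache entries
):
    """Return (weights_gib, kv_cache_gib) as a rough estimate."""
    gib = 1024 ** 3
    # Quantized weights: parameters * bits per weight, converted to bytes.
    weights = n_params * bits_per_weight / 8
    # KV cache: one K and one V vector per layer per context position.
    kv_cache = 2 * n_layers * n_ctx * hidden_size * kv_bytes
    return weights / gib, kv_cache / gib


weights_gib, kv_gib = estimate_vram_gib()
print(f"weights  ~{weights_gib:.2f} GiB")
print(f"KV cache ~{kv_gib:.2f} GiB")
print(f"total    ~{weights_gib + kv_gib:.2f} GiB (before compute buffers)")
```

With these assumed numbers the total comes out at roughly 5.7 GiB, before compute buffers or a separate embedding model. That is well under 12GB, but it would come close to filling a 6GB card.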

Noxusequal@alien.top · 1 year ago

It's a 3060 laptop, so only 6GB, and the model plus embedding etc. is at like 5.8GB.