I have an 8GB M1 MacBook Air and a 16GB MBP (that I haven’t turned in for repair) that I’d like to run an LLM on, to ask questions and get answers from the notes in my Obsidian vault (100s of markdown files). I’ve been lurking this subreddit, but I’m not sure whether I could run LLMs <7B in 1-4GB of RAM, or whether models that small would be too low quality.

  • gentlecucumber@alien.top · 10 months ago

    I haven’t tried Mistral yet, but RAG with a 7b might not give accurate info from the context you pass it; even larger models can have trouble with accurate Q/A over documents, but there are things you can do to help with that.
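
    The retrieval half is the part worth getting right. Just as a rough sketch of that side of it (the vault path and embedding model below are placeholders, not recommendations):

    ```python
    # Rough sketch: embed the vault's markdown chunks and pull the most relevant
    # ones for a question. Model name and vault path are examples only.
    from pathlib import Path
    import numpy as np
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly

    # Naive chunking: split each note on blank lines. Real setups chunk smarter.
    chunks = []
    for md_file in Path("~/ObsidianVault").expanduser().rglob("*.md"):
        text = md_file.read_text(encoding="utf-8", errors="ignore")
        chunks.extend(p for p in text.split("\n\n") if p.strip())

    chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

    def top_k_chunks(question: str, k: int = 5) -> list[str]:
        """Return the k chunks most similar to the question (cosine similarity)."""
        q_vec = embedder.encode([question], normalize_embeddings=True)[0]
        scores = chunk_vecs @ q_vec
        return [chunks[i] for i in np.argsort(scores)[::-1][:k]]
    ```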

    Why not just make API calls to GPT 3.5T instead of trying to barely run a 7b model at a snail’s pace for sub-par results? It’s fractions of a penny for thousands of tokens.
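
    Something like this is all it takes (just a sketch; assumes `pip install openai` and an OPENAI_API_KEY in the environment, and that you retrieved the relevant chunks first, e.g. with something like the snippet above):

    ```python
    # Sketch: hand the retrieved note chunks to gpt-3.5-turbo as context.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def ask(question: str, context_chunks: list[str]) -> str:
        """Ask gpt-3.5-turbo a question, grounded in the retrieved note chunks."""
        context = "\n\n---\n\n".join(context_chunks)
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system",
                 "content": "Answer using only the provided notes. "
                            "If the notes don't contain the answer, say so."},
                {"role": "user",
                 "content": f"Notes:\n{context}\n\nQuestion: {question}"},
            ],
        )
        return response.choices[0].message.content
    ```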