gentlecucumber@alien.top to LocalLLaMA@poweruser.forum • Can I run an LLM that takes up no more than 1-4GB of RAM / VRAM and have it answer questions using my notes, or is that unrealistic?
10 months ago
I haven't tried Mistral yet, but RAG with a 7B might not give accurate info from the context you pass it; even larger models can have trouble with accurate Q/A over documents, but there are things you can do to help with that.
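For anyone new to the setup being discussed, here's a minimal sketch of the RAG loop in question, assuming the sentence-transformers package; the embedding model name and the notes are illustrative placeholders, not anything from this thread:

```python
# Minimal RAG sketch, assuming the sentence-transformers package.
# The embedding model name and the notes below are illustrative
# placeholders, not from this thread.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly embedder

notes = [
    "2023-05-01 meeting: we agreed to migrate the backlog to the new tracker.",
    "Bread notes: 70% hydration, overnight cold proof in the fridge.",
]
note_embeddings = embedder.encode(notes, convert_to_tensor=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k notes most similar to the question."""
    q_emb = embedder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, note_embeddings, top_k=k)[0]
    return [notes[hit["corpus_id"]] for hit in hits]

# The retrieved notes get pasted into whatever model's prompt;
# this pasted context is what a small model can misread.
question = "What hydration did I use for the bread?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The retrieval step is the same whether the model behind it is a local 7B or an API model; only the quality of the final answer over the pasted context changes.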
Why not just make API calls to GPT-3.5 Turbo instead of trying to barely run a 7B model at a snail's pace for sub-par results? It's fractions of a penny per thousand tokens.
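For reference, the API route is only a few lines; a sketch using the openai Python client (v1-style), where the notes string is a placeholder:

```python
# Sketch of the GPT-3.5 Turbo route, using the openai Python client
# (v1-style). The notes string is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

context = "Bread notes: 70% hydration, overnight cold proof in the fridge."
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Answer strictly from the provided notes."},
        {"role": "user", "content": f"Notes:\n{context}\n\nQuestion: What hydration did I use?"},
    ],
)
print(response.choices[0].message.content)
```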
Probably Airoboros, either the llama2 or Mistral version; you'd have to evaluate which one handled the fine-tuning better. I suspect llama2.
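And if you do go local despite the speed, a quantized GGUF build is how a 7B fits in a few GB of RAM; a sketch with llama-cpp-python, where the model file name is a hypothetical placeholder:

```python
# Sketch: running a quantized 7B locally with llama-cpp-python.
# The GGUF file name is a hypothetical placeholder; a Q4_K_M quant
# of a 7B is roughly 4 GB, at the top of the RAM budget asked about.
from llama_cpp import Llama

llm = Llama(model_path="airoboros-mistral-7b.Q4_K_M.gguf", n_ctx=2048)

out = llm(
    "Answer using only these notes:\n<retrieved notes here>\n\nQuestion: ...",
    max_tokens=256,
)
print(out["choices"][0]["text"])
```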