J_J_Jake@alien.top to LocalLLaMA@poweruser.forum • llama.cpp running the AI models with less RAM
This is because llama.cpp uses mmap() by default, which maps the model file directly into the process's address space instead of reading it all into RAM up front. Pages of the model are loaded from disk on demand and can be evicted again as needed, so available system RAM effectively acts as a cache for the parts of the model currently in use. If you want the whole model held statically in RAM, you can disable this on the command line with --no-mmap (and --mlock will additionally pin it so it can't be swapped out).
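For anyone curious what that looks like under the hood, here's a minimal C sketch of the general mmap() pattern (not llama.cpp's actual loading code; the model filename is made up). It shows why nothing is read from disk until pages are actually touched:

```c
/* Minimal sketch of mmap()-style file loading -- not llama.cpp's real code.
   The file is mapped into virtual memory; the OS faults pages in on first
   access and can evict them again, so the page cache acts like a model cache. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    const char *path = "model.gguf";              /* hypothetical model file */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); close(fd); return 1; }

    /* Map the whole file read-only; no data is pulled from disk yet. */
    void *addr = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* Touching a byte faults that page in from disk on demand; under memory
       pressure the kernel can drop it and transparently re-read it later. */
    unsigned char first = ((unsigned char *)addr)[0];
    printf("first byte: 0x%02x, mapped %lld bytes\n",
           first, (long long)st.st_size);

    munmap(addr, st.st_size);
    close(fd);
    return 0;
}
```

With --no-mmap, llama.cpp instead allocates a buffer and reads the file into it, so the model occupies RAM for the whole run rather than being paged in and out.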
You could try LLaVA or MiniGPT-4, but I'm not sure how well they would perform.