I wonder if there's a way to run an LLM without loading it into RAM

wjohhan@alien.top · 1 year ago

I wonder if there's a way to run an LLM without loading it into RAM.

SlowSmarts@alien.top · 1 year ago

I ran a 13B Q4 model on a Raspberry Pi 4 8GB with llama.cpp with no special settings, and it just automatically cached from disk… It was mega slow and got worse with more tokens, but it worked. I don't know if it was llama.cpp or Raspberry Pi OS that did the caching automatically.
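For anyone curious what "cached from disk" can look like in practice: as far as I know, llama.cpp memory-maps the model file by default, and the OS then pages pieces in and out as they are touched. Here is a minimal Python sketch of that general idea, not llama.cpp's actual code. The helper names `make_demo_file` and `read_slice_mmap` are made up for illustration, and the demo file is just dummy bytes standing in for model weights.

```python
import mmap
import os
import tempfile


def make_demo_file(size_bytes):
    """Create a temporary file of dummy bytes (stand-in for a model file)."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        # Repeating 0..255 pattern so slices are easy to check.
        f.write(bytes(i % 256 for i in range(size_bytes)))
    return path


def read_slice_mmap(path, offset, length):
    """Return `length` bytes at `offset` without reading the whole file.

    The file is mapped into the address space; the OS only pulls in the
    pages that are actually touched, and can drop them again under
    memory pressure.
    """
    with open(path, "rb") as f:
        # Length 0 maps the entire file, read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


if __name__ == "__main__":
    demo_path = make_demo_file(1024 * 1024)  # 1 MiB of dummy data
    try:
        chunk = read_slice_mmap(demo_path, 256, 4)
        print(list(chunk))
    finally:
        os.remove(demo_path)
```

The catch is the same one described above: if the model is bigger than free RAM, pages keep getting evicted and re-read from disk, so every token ends up waiting on storage.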

You can build llama.cpp with CMake on many platforms.