Hi all, I’m wondering if there’s a way to spread the load of a local LLM across multiple hosts instead of adding GPUs to speed up responses. My hosts don’t have GPUs since I want to be power efficient, but they have a decent amount of RAM (128 GB each). Thanks for any ideas.
Check this out: https://github.com/ggerganov/llama.cpp#mpi-build — llama.cpp’s MPI build distributes computation over a cluster of machines. Per that README section, it won’t make a single response faster (prediction is serial), but it lets you run models too large for one machine’s RAM. Rough sketch below.
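Here’s a minimal sketch following that README section: build with an MPI compiler wrapper, then launch with `mpirun` against a hostfile. The hostnames, slot counts, and model path below are placeholders, and exact flags may differ depending on your llama.cpp version and MPI implementation (this assumes Open MPI).

```sh
# Build llama.cpp with MPI support (requires an MPI toolchain, e.g. Open MPI)
make CC=mpicc CXX=mpicxx LLAMA_MPI=1

# Hostfile: one line per machine (placeholder hostnames; Open MPI syntax)
cat > hostfile <<'EOF'
node1 slots=1
node2 slots=1
node3 slots=1
EOF

# Launch one process per host; the model file must be present at the
# same path on every node (path below is a placeholder)
mpirun -hostfile hostfile -n 3 ./main -m ./models/7B/ggml-model-q4_0.gguf -n 128 -p "Hello"
```

Each process holds a slice of the layers, so per-host RAM use drops roughly in proportion to the number of hosts, but tokens still flow through the hosts in sequence over the network.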