Hi all, I’m wondering if there’s a way to spread the load of a local LLM across multiple hosts, instead of adding GPUs, to speed up responses. My hosts don’t have GPUs since I want to be power efficient, but they each have a decent amount of RAM (128 GB). Thanks for any ideas.
If I haven’t misunderstood your question, the answer is Petals:
https://research.yandex.com/blog/petals-decentralized-inference-and-finetuning-of-large-language-models
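To give a rough idea of what the client side looks like, here’s a minimal sketch based on the example in the Petals README. The model name is just an example from their docs, and whether inference is actually fast enough on CPU-only nodes is something you’d have to test yourself:

```python
# Minimal Petals client sketch (adapted from the Petals README).
# Assumes `pip install petals` and a reachable swarm serving this model;
# the model name is an example, swap in whatever your swarm hosts.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "petals-team/StableBeluga2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# The transformer blocks are served by remote peers in the swarm;
# only the small input/output layers run on your local machine.
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("A power-efficient cluster can", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```

To actually contribute your own hosts instead of relying on the public swarm, each machine runs a server process (something like `python -m petals.cli.run_server <model>` per the README), and the docs describe how to set up a private swarm so the layers are split across just your nodes.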