Has anyone tried combining a server that has a moderately powerful GPU with a server that has a lot of RAM to run inference? Especially with llama.cpp, where you can offload just some of the layers to the GPU?

https://github.com/Juice-Labs/Juice-Labs/wiki
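For context on the partial offload I mean: a minimal sketch using llama-cpp-python (assuming it was installed with CUDA support; the model path and layer count here are just placeholders). `n_gpu_layers` puts that many transformer layers on the GPU while the rest stay in system RAM on the CPU:

```python
# Minimal sketch of partial GPU offloading with llama-cpp-python.
# Assumes the package was built with CUDA/cuBLAS support; the model
# path and numbers below are placeholders, not a recommendation.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-70b.Q4_K_M.gguf",  # placeholder GGUF file
    n_gpu_layers=40,  # offload 40 layers to the GPU; remaining layers run on CPU/RAM
    n_ctx=4096,       # context window
)

out = llm("Q: What does partial GPU offloading do? A:", max_tokens=64)
print(out["choices"][0]["text"])
```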

  • Brave-Decision-1944@alien.top
    1 year ago

    I've seen something like that in the LOLLMs UI; it's called Petals, and it basically distributes the processing across computers connected to that network. There were also other remote "bindings" from the same maker as the UI, but I didn't try those.
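
    For reference, a rough sketch of what the Petals client side looks like (assuming `pip install petals` and a reachable swarm; the model name is only an example). The model's layers are served by other machines, so the local box only needs the tokenizer and embeddings:

    ```python
    # Rough sketch of Petals-style distributed inference: transformer blocks
    # run on remote peers in the swarm, the client drives generation.
    # Assumes the petals package is installed and a public swarm is reachable;
    # the model name is an example, not a requirement.
    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    model_name = "petals-team/StableBeluga2"  # example model hosted on the public swarm
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer("Distributed inference means", return_tensors="pt")["input_ids"]
    outputs = model.generate(inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0]))
    ```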