Huh, it's not really faster than Tesla P40s then, for some reason.
There are no new 3090s anymore, so comparing the cost to a new 3090 is pointless; what's left is basically just scalped, overpriced "new" 3090s.
Not sure where they got 694GB/s for the Tesla P40; it only has 347GB/s of memory bandwidth.
What kind of token/s do you get with 2x3090 for the 70B models?
Dual CPUs would have terrible performance. This is because the processor reads the whole model every time it generates a token, and if you spread half the model onto the second CPU's memory, the cores in the first CPU would have to read that part of the model through the slow inter-CPU link (and vice versa for the second CPU's cores). llama.cpp would have to implement a way to spread the workload across multiple CPUs, like it already does across multiple GPUs, for this to work.
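A rough back-of-the-envelope sketch of why this is bandwidth-bound (the bandwidth and model-size numbers below are illustrative assumptions, not measurements):

```python
def tokens_per_second(model_size_gb: float, usable_bandwidth_gbs: float) -> float:
    """Upper bound on tokens/s if each generated token streams all weights once."""
    return usable_bandwidth_gbs / model_size_gb

# Assume a ~70B model quantized to ~4 bits is roughly 40 GB of weights.
model_gb = 40.0

# Assume a single socket with ~100 GB/s of realistically usable memory bandwidth.
print(tokens_per_second(model_gb, 100.0))  # ~2.5 t/s best case

# Splitting the model across two sockets doesn't double this, because the half
# of the weights sitting in the other socket's memory has to come over the
# inter-CPU link (assume ~40 GB/s), which becomes the bottleneck for those reads.
print(tokens_per_second(model_gb / 2, 40.0))  # the cross-socket half caps throughput
```

The point of the sketch: token generation speed tracks how fast the weights can be read, so adding a second CPU mostly adds a slower path to half the model rather than doubling throughput.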
A V100 16GB is like $700 on eBay. An RTX 3090 24GB can be had for a similar amount.
Wait what? I am getting 2-3t/s on 3x P40 running Goliath GGUF Q4KS.
Wonder what card you have that’s 20GB?
Definitely thought this was for his homelab
You don’t NEED 3090/4090s. A 3x Tesla P40 setup still streams at reading speed running 120B models.