I wanted to share some exciting news from the GPU world that could change the game for LLM inference. AMD has been making significant strides in LLM inference, thanks to the porting of vLLM to ROCm 5.6. You can find the implementation on GitHub.
The result? AMD’s MI210 now almost matches Nvidia’s A100 in LLM inference performance. This is a significant development, as it could make AMD a more viable option for LLM inference tasks, which traditionally have been dominated by Nvidia.
For those interested in the technical details, I recommend checking out this EmbeddedLLM Blog Post.
I’m curious to hear your thoughts on this. Anyone manage to run it on RX 7900 XTX?
Will this run on a Ryzen with built-in Radeon graphics?
If so, couldn’t you build a Ryzen machine with, say, 128 GB of RAM and dedicate nearly all of it to video memory?
There are limits on how much RAM you can allocate to the iGPU, and some motherboards allow higher limits than others: https://www.tomshardware.com/news/dollar95-amd-cpu-becomes-16gb-gpu-to-run-ai-software
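To get a rough sense of what a large shared-RAM pool buys you, here's a quick memory-math sketch. The figures are back-of-the-envelope estimates for model weights only (KV cache and activations add more on top), not measurements:

```python
# Rough memory needed just for model weights, by precision.
# Rule of thumb: bytes ~= parameter_count * bytes_per_parameter.

def weight_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bytes_per_param / 1e9

for name, params in [("7B", 7), ("13B", 13), ("70B", 70)]:
    fp16 = weight_gb(params, 2)    # 16-bit weights
    int4 = weight_gb(params, 0.5)  # 4-bit quantized
    print(f"{name}: ~{fp16:.0f} GB fp16, ~{int4:.1f} GB 4-bit")
```

So a 70B model at fp16 (~140 GB) would already overflow 128 GB of RAM, while quantized variants fit with room to spare — assuming the iGPU path can actually address that much memory, which is exactly where the BIOS/motherboard limits above come in.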
Well now I know what I’m doing with my weekend. Thanks for sharing! Hopefully I can report back some XTX performance numbers.
I tried it on this config: Ryzen 9 7950X + MI210. I got Throughput: 129 requests/min, 1028.89 tokens/s on Llama-2-7B, which is even better than the performance they cite in the post.
Will report back on 13b performance ASAP
Is there some good cloud host for getting AMD GPUs?