I wanted to share some exciting news from the GPU world that could change the game for LLM inference. AMD has been making significant strides in LLM inference, thanks to the porting of vLLM to ROCm 5.6. You can find the implementation on GitHub.
The result? AMD’s MI210 now almost matches Nvidia’s A100 in LLM inference performance. This is a significant development, as it could make AMD a more viable option for LLM inference tasks, which traditionally have been dominated by Nvidia.
For those interested in the technical details, I recommend checking out this EmbeddedLLM Blog Post.
I’m curious to hear your thoughts on this. Anyone manage to run it on RX 7900 XTX?
Will this run on a Ryzen with built-in Radeon graphics?
If so, couldn’t you build a Ryzen machine with, say, 128 GB of RAM and dedicate nearly all of it to video memory?
There are limits on how much RAM you can allocate to the iGPU, and some motherboards allow higher limits than others: https://www.tomshardware.com/news/dollar95-amd-cpu-becomes-16gb-gpu-to-run-ai-software
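To get a rough sense of what a large shared-RAM pool buys you, here's a quick memory-math sketch. The figures are back-of-the-envelope estimates for model weights only (KV cache and activations add more on top), not measurements:

```python
# Rough memory needed just for model weights, by precision.
# Rule of thumb: bytes ~= parameter_count * bytes_per_parameter.

def weight_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bytes_per_param / 1e9

for name, params in [("7B", 7), ("13B", 13), ("70B", 70)]:
    fp16 = weight_gb(params, 2)    # 16-bit weights
    int4 = weight_gb(params, 0.5)  # 4-bit quantized
    print(f"{name}: ~{fp16:.0f} GB fp16, ~{int4:.1f} GB 4-bit")
```

So a 70B model at fp16 (~140 GB) would already overflow 128 GB of RAM, while quantized variants fit with room to spare — assuming the iGPU path can actually address that much memory, which is exactly where the BIOS/motherboard limits above come in.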
Well now I know what I’m doing with my weekend. Thanks for sharing! Hopefully I can report back some XTX performance numbers.
I tried it on this config: Ryzen 9 7950X + MI210. I got Throughput: 129 requests/min, 1028.89 tokens/s on Llama-2-7B, which is even better than the performance they cite in the post.
Will report back on 13b performance ASAP
Is there some good cloud host for getting AMD GPUs?