diamond_jackie07@alien.top to Machine Learning@academy.garden • [Project] LLM inference with vLLM and AMD: Achieving LLM inference parity with Nvidia
1 year ago
I tried this on my config (Ryzen 9 7950X + MI210) and got Throughput: 129 requests/min, 1028.89 tokens/s on llama2-7b, which is even better than the performance they cite in the post.
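For anyone who wants to reproduce something similar, here is a minimal sketch of measuring throughput with vLLM's Python API. The model name, prompt set, and sampling settings are my assumptions for illustration; the post's actual benchmark script may differ, so exact numbers will too:

```python
# Minimal throughput sketch with vLLM's Python API.
# Assumes a ROCm build of vLLM and that the meta-llama/Llama-2-7b-hf
# weights are available; this is a toy workload, not the post's benchmark.
import time

from vllm import LLM, SamplingParams

prompts = ["Explain paged attention in one paragraph."] * 128  # toy request batch
sampling_params = SamplingParams(temperature=0.8, max_tokens=128)

llm = LLM(model="meta-llama/Llama-2-7b-hf")

start = time.perf_counter()
outputs = llm.generate(prompts, sampling_params)
elapsed = time.perf_counter() - start

# Count only generated tokens, then report in the same units as above.
total_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"Throughput: {len(prompts) / elapsed * 60:.0f} requests/min, "
      f"{total_tokens / elapsed:.2f} tokens/s")
```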
Will report back on 13b performance ASAP.