I saw an interesting article somewhere that showed you can be a lot more memory efficient doing inference with Rust, since you dont have several GBs of python dependencies.
It depends a lot on the details tbh. Do they share one model? Do they each use a different lora? If its the latter theres some cool recent research on efficiently hosting many loras on one machine
I saw an interesting article somewhere that showed you can be a lot more memory efficient doing inference with Rust, since you dont have several GBs of python dependencies.