I’ve been working on a project with my roommate to make it incredibly simple to run batch inference on LLMs while leveraging a massive amount of cloud resources. We finally got the tool working and created a tutorial on how to use it on Mistral 7B.
Also, if you’re a frequent HuggingFace user you can easily adapt the code to run inference on other LLMs. Please test it out and give feedback; I feel really good about how easy it is to use, but I want to find out if anything is unintuitive. I hope the community is able to get some value out of it! Here is the link to the tutorial: https://docs.burla.dev/Example:%20Massively%20Parallel%20Inference%20with%20Mistral-7B
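Swapping models is mostly a matter of changing the model ID in the function each worker runs. A rough sketch (simplified, not the exact tutorial code; the `run_inference` helper name is just illustrative):

```python
from transformers import pipeline

# Per-worker inference function: adapting this to a different HuggingFace
# model is basically just changing the model ID below.
generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.1",  # swap for any HF causal LM
    device_map="auto",
)

def run_inference(prompt: str) -> str:
    result = generator(prompt, max_new_tokens=128, do_sample=False)
    return result[0]["generated_text"]
```

Point the parallel map call from the tutorial at `run_inference` and the rest stays the same.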
This is actually really cool; it was simple and I didn’t run into any issues! A couple of big questions though, since you’re managing the infrastructure… how long are you going to let people use your compute for? Do I get a certain amount for free? And how much will it cost once I need to start paying?
Unique concept, I like it
Very impressive! Lots of good use cases for this.
You should look into continuous batching: most of your parallel requests run at batch size 1, which heavily underutilises the VRAM and leaves a lot of throughput on the table that would otherwise be easily achievable.
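For example, vLLM does continuous batching under the hood: you hand it the whole list of prompts and its scheduler keeps adding and retiring sequences so the GPU stays saturated. A minimal sketch (model ID and sampling settings are just placeholders, not your setup):

```python
from vllm import LLM, SamplingParams

# vLLM's engine batches at the iteration level: new sequences join the running
# batch as soon as others finish, instead of each request running at batch size 1.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.1")
sampling = SamplingParams(temperature=0.0, max_tokens=128)

prompts = ["Summarize this article: ...", "Translate to French: ..."]  # your real inputs
outputs = llm.generate(prompts, sampling)

for out in outputs:
    print(out.outputs[0].text)
```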
I’m in the middle of building my app on Modal. Guess I’ll adapt it to run on your service and see. Thanks for sharing!
This is really cool! We’re more focused on lengthy workloads, think running 500k inputs through an LLM in one batch, rather than on-demand inference (though we’re starting to support that too). Right now the startup time is pretty long (2-5 minutes), but we’re working on cutting it down.