Background: I'm trying to build an interface on my portal for users to choose an LLM (like Falcon, DeepSeek, etc. from Hugging Face), which will then run a script to download and deploy that particular LLM in Azure.
Once it is deployed, users will use those LLMs to build apps. Deploying the custom LLM in the user/client cloud environment is mandatory, since data security policies are in play.
If anyone has worked on such a script or has an idea, please share your inputs.
While I have not tried this in Azure, my understanding is that you can deploy a Linux VM with an A100 in Azure (a T4 or V100 may not work for all use cases, but would be a cheaper option). Once you have a Linux VM with a GPU, you can choose how you would like to host the model(s). You can write some code and expose the LLM via an API (I like FastChat, but there are other options as well). Heck, you can even use ooba (oobabooga's text-generation-webui) if you like. Just make sure to check the license for whatever you use.
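Since the goal is a script that deploys per-user, the VM provisioning step above could be sketched with the Azure CLI roughly like this. All names (resource group, VM name, region, admin user) are placeholders, and the `Standard_NC24ads_A100_v4` size is just one A100 SKU; check quota availability for the NC A100 v4-series in the target subscription first.

```shell
# Sketch: provision a GPU Linux VM for LLM hosting (names/region are placeholders).
az group create --name llm-rg --location eastus

# NC24ads_A100_v4 = 1x A100 80GB; swap in an NCasT4_v3 size for a cheaper T4 box.
az vm create \
  --resource-group llm-rg \
  --name llm-vm \
  --image Ubuntu2204 \
  --size Standard_NC24ads_A100_v4 \
  --admin-username azureuser \
  --generate-ssh-keys

# Open the port the model API will listen on (8000 here is an assumption).
az vm open-port --resource-group llm-rg --name llm-vm --port 8000
```

You'd still need to install NVIDIA drivers/CUDA on the VM (or start from a GPU-ready image such as an NVIDIA/Data Science VM offer from the marketplace) before the model will run.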
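For the hosting side, FastChat can expose the downloaded model behind an OpenAI-compatible HTTP API, which makes it easy for the users' apps to consume. A rough sketch, run on the VM itself (the Falcon model path is just an example; the model name in the request should match whatever was loaded):

```shell
# Install FastChat with model-worker support.
pip install "fschat[model_worker]"

# Start the three FastChat processes: controller, a worker that loads the
# model from Hugging Face, and the OpenAI-compatible API front end.
python3 -m fastchat.serve.controller &
python3 -m fastchat.serve.model_worker --model-path tiiuae/falcon-7b-instruct &
python3 -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000

# Client apps can then call it like any OpenAI-style endpoint:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "falcon-7b-instruct",
       "messages": [{"role": "user", "content": "Hello"}]}'
```

In production you'd want the processes under systemd (or in containers) rather than backgrounded, and TLS/auth in front of the endpoint, given the data-security requirement.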