XTTSv2 is released. I’d say it’s a big jump in quality.
- Better voice cloning
- Better audio
- Impressive prosody and expressiveness
- More languages added; I believe the total is now 16.
- Non-English languages sound much better
- Streaming latency under 200 ms (on my 3090)
- Finetuning code
You can try it here: https://huggingface.co/spaces/coqui/xtts
incredible
Does anyone know if there is a detailed model description somewhere? They don’t seem to have a full technical report anywhere and the documentation just describes the model API.
What is the best infra to deploy the API?
What is the best way to deploy it as an API?