XTTSv2 has been released. I’d say it’s a big jump in quality.

  • Better voice cloning
  • Better audio
  • Impressive prosody and expressiveness
  • More languages added (16 in total, I believe)
  • Non-EN languages sound way better
  • Streaming latency under 200 ms (on my 3090)
  • Finetuning code

You can try it here: https://huggingface.co/spaces/coqui/xtts

  • m-pana@alien.top · 1 year ago

    Does anyone know if there is a detailed model description somewhere? They don’t seem to have published a full technical report, and the documentation only describes the model API.