StyleTTS 2 - Closes gap further on TTS quality + Voice generation from samples

super-helper@alien.top · 3 years ago

xadiant@alien.top · 3 years ago

Goddammit, I just fine-tuned Tortoise with a custom voice. Can’t wait for web UIs for StyleTTS 2. Hope it’s easy to fine-tune.

AWAS666@alien.top · 3 years ago

Yep, it is. It takes around 4 hours on a 3090.

xadiant@alien.top · 3 years ago

That’s acceptable. Did you do a full training run or a fine-tune, though? And how much data did you use?

AWAS666@alien.top · 3 years ago

Fine-tune, with around an hour’s worth of data.

Traditional-Ice-5790@alien.top · 3 years ago

How do you fine-tune or do a full training run? I wish there was a step-by-step guide. I’ve been trying for hours, but I can’t figure out what I’m supposed to do. The README doesn’t explain much.

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Yinghao Aaron Li, Cong Han, Vinay S. Raghavan, Gavin Mischler, Nima Mesgarani