So I saw the post, “Hugging Face Removes Singing AI Models of Xi Jinping But Not of Biden” and I was curious…

How does one set up a singing model (or a speaking model that can copy other people)?

Is it just TTS plus fine-tuning settings like pitch and tone, or is there a program that takes a description of the voice and uses a model to generate it?

How does one dive into this kind of AI stuff on a home system?

  • a_beautiful_rhind@alien.topB
    1 year ago

    so-vits-svc has an interface that lets you run inference on static files or on live voice. I have only trained RVC models, not a so-vits one yet, but it's very good when given decent audio. I have tried it with other people's models.

    To make a song and put it on YouTube, you will still need to know some audio engineering.

  • LJRE_auteur@alien.topB
    1 year ago

    What you're looking for is called RVC (Retrieval-based Voice Conversion). It's voice conversion software: you give it an audio file containing a voice speaking or singing, plus a model file for another voice, and it makes the second voice say or sing whatever the first voice was saying or singing.

    People make AI covers by separating the vocals from a song, converting the vocals with RVC, and mixing the instrumental track and the new vocal track back together.
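
To make the last step of that workflow concrete, here is a minimal sketch of the mixing stage only. It assumes the vocal separation and the RVC conversion were already done with their own tools, and that both tracks have been decoded to mono float samples in the range -1.0 to 1.0 at the same sample rate. The function name `mix_tracks` and its parameters are made up for illustration and are not part of RVC or so-vits-svc.

```python
def mix_tracks(instrumental, converted_vocals, vocal_gain=1.0, peak=1.0):
    """Sum two mono tracks sample by sample, scaling down if the sum would clip.

    Hypothetical helper for illustration; both inputs are sequences of floats
    in [-1.0, 1.0] that share one sample rate.
    """
    length = max(len(instrumental), len(converted_vocals))
    # Pad the shorter track with silence so both cover the full song.
    inst = list(instrumental) + [0.0] * (length - len(instrumental))
    vox = list(converted_vocals) + [0.0] * (length - len(converted_vocals))

    # Apply the vocal gain and add the tracks together.
    mixed = [i + vocal_gain * v for i, v in zip(inst, vox)]

    # If the loudest sample exceeds the allowed peak, scale the whole mix down.
    loudest = max((abs(s) for s in mixed), default=0.0)
    if loudest > peak:
        scale = peak / loudest
        mixed = [s * scale for s in mixed]
    return mixed


if __name__ == "__main__":
    instrumental = [0.5, 0.5, -0.25]
    vocals = [0.25, -0.5]
    print(mix_tracks(instrumental, vocals))
```

A real cover needs more than a straight sum (EQ, reverb, level matching), which is where the audio engineering mentioned earlier in the thread comes in.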