MegaTTS 3 Voice Cloning

MegaTTS 3 is a text-to-speech model from ByteDance with strong voice cloning capabilities. The original authors did not release the WavVAE encoder, so voice cloning was not publicly available. Thanks to @ACoderPassBy's unofficial WavVAE encoder, we can now clone voices with MegaTTS 3!

Hat tip to MysteryShack on Discord for pointing out the unofficial WavVAE encoder!

Upload a reference audio clip and enter text to generate speech with the cloned voice.

Reference Audio

Text to Generate

Generated Audio