MegaTTS 3 Voice Cloning
MegaTTS 3 is a text-to-speech model trained by ByteDance with strong voice cloning capabilities. The original authors did not release the WavVAE encoder, so voice cloning was not publicly available. Thanks to @ACoderPassBy's WavVAE encoder, we can now clone voices with MegaTTS 3!
H/t to MysteryShack on Discord for the info about the unofficial WavVAE encoder!
Upload a reference audio clip and enter text to generate speech with the cloned voice.