According to reports, Vall-E, a new language model from Microsoft, can mimic any voice using only a three-second sample. The recently released AI tool was trained on 60,000 hours of English speech data.
In a paper posted to arXiv, the preprint server hosted by Cornell University, the researchers claimed that the model can mimic a speaker's emotions and tone. Microsoft has demonstrated this ability on the TTS system's GitHub page. Vall-E recites any given text while preserving the tone of the three-second sample, whatever emotional state it was recorded in (angry, tired, neutral, amused, disgusted, etc.).
Notably, this held true even for words the original speaker had never actually spoken.
According to the researchers, “Vall-E emerges in-context learning capabilities and can be used to synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as an acoustic prompt. Experiment results show that Vall-E significantly outperforms the state-of-the-art zero-shot [text to speech] system in terms of speech naturalness and speaker similarity.” They add: “In addition, we find Vall-E could preserve the speaker’s emotion and acoustic environment of the acoustic prompt in synthesis.”
However, this text-to-speech research comes with a warning.
“Since Vall-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker,” the researchers say. “We conducted the experiments under the assumption that the user agree to be the target speaker in speech synthesis. When the model is generalized to unseen speakers in the real world, it should include a protocol to ensure that the speaker approves the use of their voice and a synthesized speech detection model.”