Microsoft’s new AI can simulate anyone’s voice with 3 seconds of audio

10 Jan 2023


Microsoft calls VALL-E a "neural codec language model," and it builds on a technology called EnCodec, which Meta announced in October 2022. Unlike other text-to-speech methods that typically synthesize speech by manipulating waveforms, VALL-E generates discrete audio codec codes from text and acoustic prompts. It analyzes how a person sounds, uses EnCodec to break that information into discrete components (called "tokens"), and draws on its training data to estimate how that voice would sound speaking phrases outside the three-second sample.
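To make the idea of "discrete tokens" concrete, here is a toy sketch of turning a short signal into token ids by nearest-codebook lookup and then rebuilding an approximation from those ids. It is not EnCodec or VALL-E: the codebook values, the two-sample frame size, and the function names `tokenize` and `detokenize` are all assumptions made up for illustration, and a real neural codec learns its codebooks from data rather than hard-coding them.

```python
# Toy illustration only: this is NOT EnCodec or VALL-E.
# The codebook, frame size, and function names are hypothetical.

def nearest_code(frame, codebook):
    """Return the index of the codebook vector closest to `frame`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(frame, codebook[i]))

def tokenize(samples, codebook, frame_size=2):
    """Split `samples` into fixed-size frames and map each frame to a token id."""
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples) - frame_size + 1, frame_size)]
    return [nearest_code(frame, codebook) for frame in frames]

def detokenize(tokens, codebook):
    """Rebuild an approximate signal by looking each token up in the codebook."""
    return [value for token in tokens for value in codebook[token]]

# Made-up 4-entry codebook; each entry stands for one 2-sample frame.
CODEBOOK = [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0), (-0.5, -1.0)]

audio = [0.1, -0.1, 0.6, 0.4, 0.9, 0.1, -0.4, -0.9]
tokens = tokenize(audio, CODEBOOK)      # a short list of integers
approx = detokenize(tokens, CODEBOOK)   # lossy reconstruction of `audio`
print(tokens)
print(approx)
```

Once audio is a sequence of integers like this, a language model can in principle be trained to predict such sequences from text plus a short acoustic prompt, which is the framing the paragraph above describes.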

https://arstechnica.com/information-technology/2023/01/microsofts-new-ai-can-simulate-anyones-voice-with-3-seconds-of-audio/

Filed under: Non-3D stories


philip lelyveld The world of entertainment technology
