The tech, dubbed VoCo (voice conversion), presents the user with a text box. Initially the text box shows the spoken content of the audio clip. You can then move the words around, delete fragments, or type in entirely new words. When you type in a new word, there's a small pause while the word is constructed—then you can press play and listen to the new clip.

    VoCo works by ingesting a large amount of voice data (about 20 minutes right now, but that'll be improved), breaking it down into phonemes (each of the distinct sounds that make up a spoken language), and then attempting to create a voice model of the speaker—presumably stuff like cadence, stresses, quirks, etc., but Adobe hasn't provided much detail yet.


Not quite sure why these corporate events always have to be so painful to watch, but the technology is impressive. Makes me wonder: if Adobe's close to turning this into a product you can run on your home computer, which other organizations already have similar technology deployed? What could the CIA do with a few choice edits to a leaked recording, for example? You have to suspect that similar technologies may already be in use outside of public view.

posted by rezzeJ: 947 days ago