- The tech, dubbed VoCo (voice conversion), presents the user with a text box. Initially the text box shows the spoken content of the audio clip. You can then move the words around, delete fragments, or type in entirely new words. When you type in a new word, there's a small pause while the word is constructed—then you can press play and listen to the new clip.
VoCo works by ingesting a large amount of voice data (about 20 minutes right now, but that'll be improved), breaking it down into phonemes (each of the distinct sounds that make up a spoken language), and then attempting to create a voice model of the speaker—presumably stuff like cadence, stresses, quirks, etc., but Adobe hasn't provided much detail yet.
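Under the hood that pipeline sounds like a form of concatenative synthesis: map a typed word to phonemes, then stitch together snippets of those phonemes pulled from the speaker's recordings. Here's a toy sketch of the idea; everything in it (the mini pronunciation dictionary, the snippet names, the `synthesize` helper) is invented for illustration and isn't Adobe's actual code:

```python
# Toy sketch of a VoCo-style concatenative step. The dictionary and the
# "audio snippets" are placeholder strings standing in for real data.

# Tiny hand-written grapheme-to-phoneme lookup (ARPAbet-style symbols).
PRONUNCIATIONS = {
    "edit": ["EH", "D", "IH", "T"],
}

# Pretend phoneme inventory extracted from ~20 minutes of the speaker's
# voice: each phoneme maps to one or more recorded snippets.
speaker_units = {
    "EH": ["eh_clip_01"],
    "D": ["d_clip_01", "d_clip_02"],
    "IH": ["ih_clip_01"],
    "T": ["t_clip_01"],
}

def synthesize(word, units, dictionary=PRONUNCIATIONS):
    """Return the sequence of stored snippets needed to 'speak' a new word."""
    phonemes = dictionary[word.lower()]
    # Naively take the first snippet per phoneme; a real system would pick
    # units that match the surrounding cadence, stress, and pitch.
    return [units[p][0] for p in phonemes]

print(synthesize("edit", speaker_units))
# ['eh_clip_01', 'd_clip_01', 'ih_clip_01', 't_clip_01']
```

The hard part VoCo is demoing isn't this lookup, it's making the joins sound natural, which is presumably where the speaker model (cadence, stresses, quirks) comes in.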
Not quite sure why these corporate events always have to be so painful to watch, but the technology is impressive. Makes me wonder: if Adobe's close to turning this into a product you can run on your home computer, then which other organizations already have similar technology deployed? What could the CIA do with a few choice edits to a leaked recording, for example? You have to suspect that similar technologies may be in use outside of the public view.
Up until about 2 years ago it only ran on Windows, which kept most of the music-making world from fucking with it. Up until last year it only spoke Japanese, with no real English port for any of its instructions. That meant if you wanted to do something not in Japanese, you had to use Japanese phonemes and katakana instructions, and it was kinda torturous. Also, there are few things as fuckin' Otaku Japanese as Vocaloid. Now with bonus Archer clip.
We've been doing this the hard way for years. It doesn't take as long as you'd think - it's not that the process is difficult, it's that it's offensive - "if you need that line, go record that guy saying that line, ass." "But we don't have tyyyyyyymmmmmeeeeeeUHHHHHH!" "I will literally take an iPhone recording." "Why are you being so difffffffficulllllllllltUHHHHHHHH!" I think it's indicative of Adobe that they're simplifying a process that nobody needs rather than, oh, coming out with an audio editor that doesn't suck. Because the people that need this are Youtubers that didn't get that tedious c-grade celebrity saying that thing they needed for that tedious Youtube video, and they don't have the skills to edit around their failure. Real tools with a real function that accomplish a real task have been available for a decade.
I don't doubt that there are tools with a lot more utility than Adobe's software when it comes to editing vocal takes and such, like the one you linked. And like you said, it's not really that bad of a process to do manually - at least not from the limited experience I've had with it. The main interest I had in what Adobe showed was the ability to type in new words and phrases that a person hasn't actually said, and the software's ability to produce a somewhat realistic-sounding take. That's the cool part for me, vs. the basic vocal adjustment capability.
That's because Adobe has had this latent voice recognition thing that they've been trying to do for a while so they can tag Flash videos. It's why they came out with Story, so they could increase their corpus. My point is that the underlying technology Adobe is leveraging has been available for years from far better vendors using far better tools; it's just that nobody but Adobe would come up with a "let's fake news clippings" use for it.
Us audio engineers are going to be out of a job soon! ;-P