Adobe demos “photoshop for audio,” lets you edit speech as easily as text

169 points by nerdy over 8 years ago

14 comments

eduren over 8 years ago

I've actually been tossing around the idea of creating a program like this, although for a specific use case.

In Bethesda games (Oblivion, Skyrim, Fallout) there are large modding communities adding new quests, areas and plot lines. But one technical and financial challenge for them has always been voice acting. Not only do they have to worry about voice acting potential new characters, but they have no means of writing new dialogue for existing characters.

In Fallout 4, for example, the protagonist is fully voice acted. That means a distinct change between the way the main game feels and any modding efforts made by the community (barring actually re-hiring the original voice actor for new lines).

I'm envisioning having this tool train on the voice lines already provided in the game (depending on the character in question, that's quite a bit), and then letting mod authors input new dialogue lines to be spit out in something close to the actor's voice.

Lots of problems with the approach of course, not to mention the fact that these are actors and not just voices (a significant amount of emotion would probably be lost). But it would give the modding community such a powerful tool to add new plots for existing characters.

Jugurtha over 8 years ago

I wonder how much better the rendering would be if the audio track were much longer and the software had more to learn from. I don't mean more words for a 1 to 1 match, since it's clearly beyond that (it pronounces words it never saw), but voice features that weren't in that short track.

Hypothetical question: say it had access to all the episodes of Key & Peele, would the rendering be better, to the point that you could basically generate an audio track from a script, with intonation and all?

It would be interesting if they offered "voice packages" either online or offline, so you could just pass text through it and the output would be a Morgan Freeman narration. You'd have a shop for "Cords" the same as iTunes for songs & apps. Maybe game developers will find that interesting, too: having access to way more voices than they'd have in real life, and on a budget.

Someone could also save their voice for posterity. Many people listen to recordings of loved ones who passed away to remember them. Saving the voice for new content would be something to think about.

nerdy over 8 years ago

I thought it sounded badly cut when he moved "wife" within the sentence, but adding new text was pretty amazing.

pavel_lishin over 8 years ago

How long before video and audio evidence will no longer be admissible in court? 2030?

noonespecial over 8 years ago

1) It was mentioned near the end of the video that it actually required around 20 minutes of audio to start synthesis. Not quite as magic as it first seemed. Still cool.

2) The intonation always matched the initial sample. Give us some filters like "vocal fry", "perplexed", "angry", "wonder" etc. and then we'll really have something here.

erikschoster over 8 years ago

Sounds a lot like this: https://youtu.be/xzL-pxcpo-E?t=933

IRCAM has been doing some really cool stuff in this area for a long time. Check out their pages on corpus-based synthesis for example: http://imtr.ircam.fr/imtr/Corpus_Based_Synthesis

echelon over 8 years ago

This sounds waaay better than the Donald Trump text-to-speech system I've been working on: http://jungle.horse

I wish I could chat with their engineering team. I'd love to learn the mathematics and tech. (A lot of it might be patented?)

Is there an equivalent of SIGGRAPH for audio?

jbverschoor over 8 years ago

If Photoshop gave results like these, a lot of industries would go belly up.

Nice marketing line, but it's speech recognition that sets the begin/end frame in the sample.

I was expecting either "painting" away defects or actually reconstructing a real TTS from a small sample.

schoen over 8 years ago

Copied from my comment on an earlier submission on this:

I don't see how the watermarking they talk about is going to succeed in preventing forgeries.

If they're planning to watermark unedited recordings, you have a huge false positive problem, because there are billions of hours of legitimate but unwatermarked audio recordings, and there will probably continue to be. You can also get false negatives by tampering with a watermark-capable device to get it to watermark something that wasn't recorded from analog. Or you can rerecord edited audio from an analog source and simply claim that your "genuine" recording is slightly noisy.

If they're planning to watermark edited recordings, someone else can implement the same kind of technology but without the watermarking.

hammock over 8 years ago

You know this is a good idea when half the commenters already have a half-baked version of this created themselves!

Something1234 over 8 years ago

Just imagine what this will do for dubbing anime or any other TV show. It's still scary how this can be abused.

jwebb99 over 8 years ago

"Photoshop for audio" seems so obvious that I'm surprised we haven't seen this before. (After all, the underlying technology has been around for a while now.)

gallerdude over 8 years ago

Joaquin Phoenix is now going to narrate all of my audio books.

glasz over 8 years ago

if adobe has this working in a demo, rest assured the "security services" developed such a thing 10 years ago. then you can go back and ask yourselves why osama was reported dead as early as like 2001, why the cia released videos in which he always looked different, and why his body was quickly sunk at an unknown location.

go back to sleep, now. everything's alright. great new tech. will help catch terrorists from beneath your bed.
