Synthesia is also the name of a much more established and extremely popular MIDI/piano visualisation program[1]. If you've ever looked up "<song> piano tutorial" on YouTube, you've probably seen it.<p>It's a shame they chose that name, since it was such a great play on words for the MIDI software (synesthesia turns sound into colorful visuals, and MIDI uses synths), whereas this product has basically no relation to it.<p>[1] <a href="https://synthesiagame.com/" rel="nofollow">https://synthesiagame.com/</a>
Avoid getting your video rejected. Please make sure you adhere to our content guidelines.
Please keep your script professional and business related. Political, sexual, personal, criminal and discriminatory content will not be tolerated or approved.<p>Ahh... the Anchor FM problem... I guess I'll need an open-source version.<p>I started toying with libreBot (I think that's what it's called), which lets you do anything you want with these things if you self-host; the license was about a grand, I think.<p>Synthesia didn't even get past the first sentence I tried. It also requires a 'business email' and agreeing to terms that include "I agree to receive occasional product information as per Synthesia Privacy Policy *"<p>Trying hard to keep the genie in the bottle, aren't they?
While the tech is impressive in itself, it still doesn't look like something I'd pay for. The lip sync is annoyingly off, and the bland expressions that come from not understanding context make the communication even worse. If having a visual talking head is that important for a project, it still seems better to just hire someone.<p>(On a side note, I'm not sure I understand the appeal of emotionally bland, fake-smile talking heads in general, even when they're real.)
Can you think of one good use for this product?<p>No, I'm not asking whether you think you can use this to make money; I'm asking whether you personally want to sit through a video of a robot telling you to do things. Are we supposed to believe this is preferable to simply reading the text or hearing recorded audio? This is flat-out consumer hostility, basically telling your customers to talk to a sock puppet instead of a real person. I hope this fails; I would pay money to make this illegal.
Here's the cookie text if you're too lazy to read it... it sounds a bit creepy: <a href="https://share.synthesia.io/a4159eee-f70b-4318-a8bc-ec0fdf6af751" rel="nofollow">https://share.synthesia.io/a4159eee-f70b-4318-a8bc-ec0fdf6af...</a>
Are sales spam emails going to start including personalized videos? I guess I'll look forward to the "Hello dollar sign firstname. I'm dollar sign agentname. My colleague recommended I connect with you, as you both work at dollar sign employer" template misfires.
Impressive. Funnily enough, I've started to see those faces appear on YouTube. The intention may be to create these corporate-style videos, but I'm counting down the minutes until my aunt starts forwarding questionable things on WhatsApp.
<a href="https://share.synthesia.io/d8860a05-2870-4315-9316-b03cbc76a6ad" rel="nofollow">https://share.synthesia.io/d8860a05-2870-4315-9316-b03cbc76a...</a><p>Animations are pretty good. Pronunciation could use some work. There also doesn't seem to be a way to influence the inflection, which is an absolutely crucial component of sales pitches. It's not so much what you say, but how you say it. Also, the right people have to sell the right things. Words about cryptocurrency coming from Elon's mouth have a far greater effect on market behavior than the exact same words coming from this AI person's mouth.
Uncanny valley meets mixed messages and bad delivery.<p>The incoherent facial expressions actually manage to confuse the message more than the dissociated pronunciation... "witch is know small feet".<p>This tech is a neat trick at this stage but is less useful than just leaving the text as text; in fact, it adds negative value to an already fully functional process.<p>Fiverr is a better option, and I would not recommend that.<p>For an interesting and highly unethical experiment, someone should raise a thousand infants on this drivel and see what happens... I’m going to posit that the result is not good. Children’s narration is exactly where this is headed, though; I can see this as a multimillion-view, no-effort YouTube babysitter.<p>Children find a pleasant, smiling female face soothing... so this is going to be another way that the dollar and human laziness will use AI to make the world a slightly worse place.
What awful comments here; you're all criticizing something really exciting. Of course AI can't beat real humans; what do you expect? But it's the closest we've ever been, especially since it's available to consumers. People in sales and marketing know how valuable this is for improving conversion rates. If you're not in those fields, it's not for you, but calling something useless just because you have no knowledge of other domains is highly ignorant.
Wow, this feels like a blast from the past. There used to be a service in the mid-2000s that did exactly this (little help chats with "AI"-generated voices), but instead of human avatars it used animated characters. Seeing the woman speak immediately unlocked a memory in my kid brain.
Fantastic technology, and I love that the videos look and sound super lifelike. The face looks like most Instagram influencers' vanilla, broad-appeal pretty faces, which I guess is the style these days.<p>But what’s the point?<p>If you’re gonna send someone a soulless corporate drone video, is that really better than a soulless corporate email? I thought the goal of doing video was that it’s more personable and human... an AI video doesn’t quite hit those goals, does it?
Here’s a sample video with a custom script produced earlier <a href="https://share.synthesia.io/4b75b584-9b3b-4a96-86c2-6b34b8711d10" rel="nofollow">https://share.synthesia.io/4b75b584-9b3b-4a96-86c2-6b34b8711...</a>
Pretty good... but not quite there yet, in my humble opinion.<p>The lips, eyes, and facial features move in natural ways, but the head remains frozen in a somewhat unnatural manner. It's just inside the uncanny valley, with barely perceptible creepiness.<p>I hope to see improvements over time that make the face/neck movements look more natural and overcome these issues!
There's something quite cyberpunk about smiling AI-generated corporate headshot faces extolling the wonders of <insert product here>. And I don't mean that in a good or bad way. I imagine we'll start seeing these all over the place quite soon.<p>I mean, combine it with GPT-3 and you've got something that's nearly science fiction. Really interested to see where this goes.
The eyes aren't quite right and sometimes the voice is a little off, but I probably wouldn't notice in a real-world setting without prior knowledge.
I want to see her on my wall, every day, bald, with green eyes. Spouting Shakespearean slurs at Alexa, then following up with some Rumi poetry and a dash of Alan Watts... all powered by a Markov chain.
Very close, but not quite human. A textbook example of the uncanny valley: <a href="https://en.m.wikipedia.org/wiki/Uncanny_valley" rel="nofollow">https://en.m.wikipedia.org/wiki/Uncanny_valley</a>
Related: given a script, "generating all aspects of a cinematic scene, including staging, acting, editing, framing and lighting in Assassin's Creed Odyssey."<p><a href="https://youtube.com/watch?v=DFM5zbekZ7c" rel="nofollow">https://youtube.com/watch?v=DFM5zbekZ7c</a> (hour-long GDC dev talk)
Their David Beckham video is pretty good <a href="https://www.synthesia.io/post/david-beckham" rel="nofollow">https://www.synthesia.io/post/david-beckham</a>
What's the point of using AI if it needs to be manually reviewed? I suppose the outputs are manually reviewed as well, to keep the AI from going rogue?
People don't want to talk to computers; that's why chatbots (in their current form) fail one after the other. People also don't want to listen to emotionless robots. Until this technology mimics a human with 100% accuracy, the uncanny valley effect will kick in and just leave an uncomfortable feeling.
Here is an instructional reading of advice I gave my friend over text on how to use enzymatic cleaner should his new kittens have an accident:<p><a href="https://share.synthesia.io/2761933d-4ec7-48c7-b67e-85fc9d6864b9" rel="nofollow">https://share.synthesia.io/2761933d-4ec7-48c7-b67e-85fc9d686...</a>
I know I'll probably sound a bit Luddite by saying this, but the examples alone already make me cringe: a welcome video for a corporation saying "we're looking forward to have you here", narrated by a _bot_, is as dehumanizing as it gets. :(
Interesting. I hope the models were paid adequately, considering that their likenesses can now be used effectively for free, indefinitely.<p>Reminds me of the movie The Congress.<p>Obviously this technology has a long way to go, but it seems that actors should feel less secure about their jobs being resistant to automation.
Impressive, but not quite good enough to avoid the 'uncanny valley': the lips are not perfectly synced to the audio. Also, it should provide a way to <i>stress certain words</i> in the input script.
So, I'm a bit curious how this factors in emotion and depth, which could vary depending on the nature of the video [onboarding vs. launch videos, say]. And how do you avoid running out of options for voice/person selection? It shouldn't end up like stock images (the same face used by multiple brands). How well is brand identity maintained for, say, paying customers?
>> Synthesia lets you create great business videos in minutes. Say goodbye to actors, film crews and expensive equipment.<p>Yay! At last! And when we've automated away everyone's work, also say goodbye to synthesia and every other automation service, because there's no business left to use it. Woo-hoo, future world, here I come!
A really creepy use case for this would be to combine it with one of those IP-to-company name lists. If you visit a vendor it could play a video greeting you by mentioning your business name. “Click here to learn what we can do for Acme Industries!”<p>Again, super creepy and not really clear if it would drive engagement.
Wow, the Portuguese pronunciation, intonation and lip sync are incredibly accurate, 10x more so than the English voice. I wonder if that's true for other Latin-ish languages, and if that means those languages are easier to learn.
I think in general the quality is quite good, but the characters lack personality. I think that is the opportunity: create something with more lively movement. Think the ShamWow guy.<p>Anybody can stand blankly in front of a camera without emotion. But this is an impressive start.
I love it 1000%. I need to create videos for a new crypto. This helps translate the videos into 10 different languages and kick off a global service. It's not perfect, but it's fast and looks very professional.
It would have been interesting to try out, but unfortunately the email prompt ended my evaluation. A lot of people will probably stop there and move on as well.
Aw man, it kind of made it seem like it would be generated fast, but then you find out after putting in your information that it requires manual review.
I'm more stunned by the good speech synthesis than by the already good visuals.<p>Does anyone know what's under the hood for the text-to-speech?
Founder here. AMA :)<p>To answer a few recurring questions in the thread:<p>---> Use case<p>Video is a far more effective way to communicate than text. Not for the HN crowd, but if you're a blue-collar worker, a 2-minute video in your native language is much preferable to a 5-page PDF for training.<p>Anyone who has tried to record a simple corporate video knows the pain of cameras, film crews, 25 takes to get one that works, and post-production. It's cumbersome, slow and multidisciplinary. By the time the video is done, the content is out of date.<p>Synthetic video is not yet at the quality of real video. Eventually it will be. But the mistake many are making here is comparing it to real video; it should be compared with text.<p>In X years we'll be able to make Hollywood films on a laptop without needing anything but time and imagination, just like we can digitally compose music in Ableton, create images in Photoshop and type novels on keyboards rather than with pen and paper.<p>My (obviously biased ;)) belief is that synthetic media will eventually become a foundational technology that moves media production from cameras/microphones to APIs. We'll be able to do all kinds of things we couldn't do before.<p>E.g. personalized and interactive rich media, video-driven chatbots, and eventually Hollywood blockbusters made by your favourite YouTuber from his or her bedroom.<p>---> Uncanny valley<p>Simulating real video is incredibly hard. We're constantly improving and will launch more expressive synthesis soon.<p>In tests with some of our largest clients, 8/10 people don't realise it's a synthetic video (unless they are asked to look for it).<p>---> Tech<p>It has been developed over the last 3 years. Origins/team from Stanford/UCL/TUM.<p>Learning: going from research to a working, scalable product is <i>hard</i> and takes time. But it's very rewarding when it works.<p>[1] <a href="https://www.youtube.com/watch?v=ohmajJTcpNk" rel="nofollow">https://www.youtube.com/watch?v=ohmajJTcpNk</a>
[2] <a href="https://www.youtube.com/watch?v=qc5P2bvfl44" rel="nofollow">https://www.youtube.com/watch?v=qc5P2bvfl44</a><p>---> Bad uses<p>Bad actors will do bad things with synthetic media, as with any other technology, from smartphones to cars. We're moderating all content and building safeguards and verification, plus working with FAANG and others on detection and provenance technology.<p>Recommended read: deepfakes perfectly follow the story arc of any new, powerful technology: <a href="https://journals.sagepub.com/doi/full/10.1177/1745691620919372" rel="nofollow">https://journals.sagepub.com/doi/full/10.1177/17456916209193...</a><p>---> Actors<p>Real actors get rev share plus an upfront fee from every video generated with their likeness, like being a stock photo actor.