Show HN: Infinity – Realistic AI characters that can speak

481 points by lcolucci 9 months ago

Hey HN, this is Lina, Andrew, and Sidney from Infinity AI (<a href="https://infinity.ai/">https://infinity.ai/</a>). We've trained our own foundation video model focused on people. As far as we know, this is the first time someone has trained a video diffusion transformer that’s driven by audio input. This is cool because it allows for expressive, realistic-looking characters that actually speak. Here’s a blog with a bunch of examples: <a href="https://toinfinityai.github.io/v2-launch-page/" rel="nofollow">https://toinfinityai.github.io/v2-launch-page/</a>

If you want to try it out, you can either (1) go to <a href="https://studio.infinity.ai/try-inf2">https://studio.infinity.ai/try-inf2</a>, or (2) post a comment in this thread describing a character and we’ll generate a video for you and reply with a link. For example:

“Mona Lisa saying ‘what the heck are you smiling at?’”: <a href="https://bit.ly/3z8l1TM" rel="nofollow">https://bit.ly/3z8l1TM</a>

“A 3D pixar-style gnome with a pointy red hat reciting the Declaration of Independence”: <a href="https://bit.ly/3XzpTdS" rel="nofollow">https://bit.ly/3XzpTdS</a>

“Elon Musk singing Fly Me To The Moon by Sinatra”: <a href="https://bit.ly/47jyC7C" rel="nofollow">https://bit.ly/47jyC7C</a>

Our tool at Infinity allows creators to type out a script with what they want their characters to say (and eventually, what they want their characters to do) and get a video out. We’ve trained for about 11 GPU years (~$500k) so far and our model recently started getting good results, so we wanted to share it here. We are still actively training.

We had trouble creating videos of good characters with existing AI tools. Generative AI video models (like Runway and Luma) don’t allow characters to speak. And talking avatar companies (like HeyGen and Synthesia) just do lip syncing on top of the previously recorded videos. This means you often get facial expressions and gestures that don’t make sense with the audio, resulting in the “uncanny” look you can’t quite put your finger on. See blog.

When we started Infinity, our V1 model took the lip syncing approach. In addition to mismatched gestures, this method had many limitations, including a finite library of actors (we had to fine-tune a model for each one with existing video footage) and an inability to animate imaginary characters.

To address these limitations in V2, we decided to train an end-to-end video diffusion transformer model that takes in a single image, audio, and other conditioning signals and outputs video. We believe this end-to-end approach is the best way to capture the full complexity and nuances of human motion and emotion. One drawback of our approach is that the model is slow despite using rectified flow (2-4x speed up) and a 3D VAE embedding layer (2-5x speed up).

Here are a few things the model does surprisingly well on: (1) it can handle multiple languages, (2) it has learned some physics (e.g. it generates earrings that dangle properly and infers a matching pair on the other ear), (3) it can animate diverse types of images (paintings, sculptures, etc) despite not being trained on those, and (4) it can handle singing. See blog.

Here are some failure modes of the model: (1) it cannot handle animals (only humanoid images), (2) it often inserts hands into the frame (very annoying and distracting), (3) it’s not robust on cartoons, and (4) it can distort people’s identities (noticeable on well-known figures). See blog.

Try the model here: <a href="https://studio.infinity.ai/try-inf2">https://studio.infinity.ai/try-inf2</a>

We’d love to hear what you think!
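For readers who want a feel for the figures quoted in the post, here is a rough back-of-the-envelope sketch. It only uses the numbers stated above (11 GPU years, ~$500k, speedups of 2-4x and 2-5x). The function names are made up for illustration, and the idea that the two speedups multiply together is an assumption, not something the post claims.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def gpu_hours(gpu_years: float) -> float:
    """Convert GPU years into GPU hours."""
    return gpu_years * HOURS_PER_YEAR


def cost_per_gpu_hour(total_cost_usd: float, gpu_years: float) -> float:
    """Implied average price of one GPU hour for a given training budget."""
    return total_cost_usd / gpu_hours(gpu_years)


def combined_speedup(*ranges: tuple[float, float]) -> tuple[float, float]:
    """Multiply (low, high) speedup ranges together.

    ASSUMPTION: the speedups are independent and compound multiplicatively.
    The post does not say whether they do.
    """
    low, high = 1.0, 1.0
    for range_low, range_high in ranges:
        low *= range_low
        high *= range_high
    return low, high


if __name__ == "__main__":
    print(f"GPU hours: {gpu_hours(11):,.0f}")
    print(f"Implied $/GPU-hour: {cost_per_gpu_hour(500_000, 11):.2f}")
    # Rectified flow (2-4x) and the 3D VAE (2-5x), if they compound.
    print(f"Combined speedup range: {combined_speedup((2, 4), (2, 5))}")
```

Under those assumptions, the budget works out to roughly five dollars per GPU hour, and the two optimizations together would be somewhere between 4x and 20x.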

79 comments

yellowapple 9 months ago

As soon as I saw the "Gnome" face option I gnew exactly what I gneeded to do: <a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/b31e494f-95ef-4c7d-8500-8bb5c17dab36-hqGJLnkiRkxuj7B9n7KWBOgrdpJzb2.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

EDIT: looks like the model doesn't like Duke Nukem: <a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/0e79d09d-3dfe-4bc3-86a1-844c823c4d95-UpgNkUjajW56PSjY7JkkMqXRIrPMAq.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

Cropping out his pistol only made it worse lol: <a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/b8c4889c-f6c8-4dd5-b580-75ff651badf4-Za9lv58BUCQQMlYw456TJJC1jGGDOi.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

A different image works a little bit better, though: <a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/ee0ca607-6a22-4be5-a8d2-1af5d66cacf9-f1cKpF81QZt5iIkACTXUfWeJA54Q85.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

squarefoot 9 months ago

Someone had to do that, so here it is: <a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/56f2ff47-8535-4bbc-b234-a2fddcc8daf6-Zz1AqAiHRbvoMfkhbjih2kMiPSm669.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

vessenes 9 months ago

Hi Lina, Andrew and Sidney, this is awesome.

My go-to for checking the edges of video and face identification LLMs are Personas right now -- they're rendered faces done in a painterly style, and can be really hard to parse.

Here's some output: <a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/67ece032-f495-43a1-8d50-6fee07fc92cd-Ra0Cg3WWofQlxbOPujQLhuiq26WHY9.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

Source image from: <a href="https://personacollective.ai/persona/1610" rel="nofollow">https://personacollective.ai/persona/1610</a>

Overall, crazy impressive compared to competing offerings. I don't know if the mouth size problems are related to the race of the portrait, the style, the model, or the positioning of the head, but I'm looking forward to further iterations of the model. This is already good enough for a bunch of creative work, which is rad.

hansoolo 9 months ago

This is fun!

<a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/026845f6-6ec5-40fd-8350-10a3a779e545-XYtTJdrc9VRxMam8PeuRqYgr9jE8oO.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

PerilousD 9 months ago

Damn - I took an (AI) image that I "created" a year ago that I liked and then you animated it AND let it sing Amazing Grace. Seeing IS believing; this technology pretty much means video evidence ain't necessarily so.

shitloadofbooks 9 months ago

<a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/ebc93c27-de42-4b9a-af68-7010f13703c2-uCT9hWe33kHcfmIt4r9iXRyWXPrfA3.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

It’s astounding that 2 sentences generated this. (I used text-to-image and the prompt for a space marine in power armour produced something amazing with no extra tweaks required).

advael 9 months ago

There is prior art here, e.g. Emo from alibaba research (<a href="https://humanaigc.github.io/emote-portrait-alive/" rel="nofollow">https://humanaigc.github.io/emote-portrait-alive/</a>), but this is impressive and also actually has a demo people can try, so that's awesome and great work!

Andrew_nenakhov 9 months ago

I tried making this short clip [0] of Baron Vladimir Harkonnen announcing the beginning of the clone war, and it's almost fine, but the last frame somehow completely breaks.

[0]: <a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/b59104aa-ea6c-44c7-baa1-38708e7ae770-tTiAflTPj5aEBIHW26EgUvA8XLkdPF.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

dang 9 months ago

This is my favorite: <a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/5e318b66-fde8-474f-bede-bef45266c7b1-yFXXKBL4EdyjgfDyhnr9zYKcpmxJ7O.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

b0ner_t0ner 9 months ago

Steve Jobs on Microsoft Edge: <a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/00eb29a5-f42a-4cbc-afc7-43b8f4d3eac1-LY3dpLW1qKxNjByNYHkVKm9BpalBB9.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

zach_miller 9 months ago

Tried to make this meme [1] a reality and the source image was tough for it.

Heads up, little bit of language in the audio.

<a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/b31545d6-810f-48ab-847f-55e27d2aadc1-5FAFCmmkhYkB2ae1A8hjjb2fc6lMfb.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

[1] <a href="https://i.redd.it/uisn2wx2ol0d1.jpeg" rel="nofollow">https://i.redd.it/uisn2wx2ol0d1.jpeg</a>

johnchristopher 9 months ago

Well, I don't know what to think about this, I don't know where we are going. I should read some scifi from back then about conversational agents maybe?

<a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/413520c7-35c0-43d6-8cf3-a3e384d75e68-EUXxMTcK8S0fVMn4cHAf5QfPLgDa0m.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

<a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/c21092dd-07de-460a-867e-75d2d8581f1d-zhGBxbhSPUBFK32nZaTL7a6cmGBCcT.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

<a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/c414c311-f1f9-470a-81f9-ea1031073e71-XjGTCpQPtQHJfa9KqVz2kqtSISK7wU.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

marginalia_nu 9 months ago

Tried my hardest to push this into the uncanny valley. I did, but it was pretty hard. Seems robust.

<a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/d10e8a10-0e03-463e-a137-7de74830ef4c-pP54DqbM7Yf4P635cI5pEcYZv9x87o.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

ardrak 9 months ago

> It often inserts hands into the frame.

Looks like too much Italian training data

RobinL 9 months ago

Have to say, whilst this tech has some creepy aspects, just playing about with this my family have had a whole sequence of laugh-out-loud moments - thank you!

naveensky 9 months ago

Is it similar to <a href="https://loopyavatar.github.io/" rel="nofollow">https://loopyavatar.github.io/</a>? I was reading about this today and even the videos are exactly the same.

I am curious if you are in any way related to this team?

zoogeny 9 months ago

I am actively working in this area from a wrapper application perspective. In general, tools that generate video are not sufficient on their own. They are likely to be used as part of some larger video-production workflow.

One drawback of tools like runway (and midjourney) is the lack of an API allowing integration into products. I would love to re-sell your service to my clients as part of a larger offering. Is this something you plan to offer?

The examples are very promising by the way.

nextcaller 9 months ago

It's great <a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/d377220b-f52e-4b53-a825-d406584f9c77-RJhuiH6ZkQdvNuHZMu1OwWzeP6L9xr.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

naveensky 9 months ago

For such models, is it possible to fine-tune models with multiple images of the main actor?

Sorry if this question sounds dumb, but I am comparing it with regular image models, where the more images you have, the better the output images you generate from the model.

w10-1 9 months ago

Breathtaking!

First, your (Lina's) intro is perfect in honestly and briefly explaining your work in progress.

Second, the example I tried had a perfect interpretation of the text meaning/sentiment and translated that to vocal and facial emphasis.

It's possible I hit on a pre-trained sentence. With the default manly-man I used the phrase, "Now is the time for all good men to come to the aid of their country."

Third, this is a fantastic niche opportunity - a billion+ memes a year - where each variant could require coming back to you.

Do you have plans to be able to start with an existing one and make variants of it? Is the model such that your service could store the model state for users to work from if they, e.g., needed to localize the same phrase or render the same expressivity on different facial phenotypes?

I can also imagine your building different models for niches: faces speaking, faces aging (forward and back); outside of humans: cartoon transformers, cartoon pratfalls.

Finally, I can see both B2C and B2B, and growth/exit strategies for both.

johnyzee 9 months ago

It's incredibly good - bravo. The only thing missing for this to be immediately useful for content creation is more variety in voices, or ideally some way of specifying a template sound clip to imitate.

artur_makly 9 months ago

oh this made my day: <a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/f35d0dda-06aa-4e04-838d-77f02a370a04-Bc10YGY5SgVAcWdgPTZlFIZYvauaJl.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

!NWSF --lyrics by Biggy$malls

max4c 9 months ago

This is amazing and another moment where I question what the future of humans will look like. So much potential for good and evil! It's insane.

svieira 9 months ago

Quite impressive - I tried to confuse it with things it would not generally see and it avoided all the obvious confabulations <a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/6a532857-ba9f-42dc-b64a-bdb2cee2cb76-gIqdMttxqWDvVUsRrrL06PujVVKyTn.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

scotty79 9 months ago

It's awesome for very short texts. Like a single long sentence. For even slightly longer sequences it seems to lose adherence to the initial photo and also ventures into the uncanny valley with exaggerated facial expressions.

A product built on top of this could split the input into reasonable chunks, generate video for each of them separately and stitch them with another model that can transition from one facial expression into another in a fraction of a second.

A further improvement might be feeding the system not one image but a few showing different emotional expressions. Then the system could analyze the split input to find out which emotional state each part of the video should start in.

On an unrelated note: the generated expressions seem to be relevant to the content of the input text. So either the text to speech understands language a bit, or the video model itself does.
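The chunking step suggested in this comment could be sketched roughly like this. It is only an illustration of "split the input into reasonable chunks": the function name and the 200-character default are made up, it splits naively on sentence-ending punctuation, and it does not call any Infinity API (video generation and stitching are out of scope here).

```python
import re


def split_script(script: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole rather than cut
    mid-sentence, so a chunk can exceed the limit in that case.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", script.strip())
    sentences = [part.strip() for part in parts if part.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    text = "Hello there. This is a longer script! Does it split? Yes."
    for index, chunk in enumerate(split_script(text, max_chars=30)):
        print(index, repr(chunk))
```

Each chunk would then be sent for generation separately. Keeping sentences intact means a cut between clips lands on a natural pause, which is probably where a stitched transition would be least noticeable.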

siffin 9 months ago

Very cool, thanks for the play.

<a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/bb39b162-ef10-45f3-ae28-1bfbed5ca660-lb1VNy1uUxiZs3XY0F77Ut5CPK7Cht.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

Managed to get it working with my doggo.

snickmy 9 months ago

Out of curiosity, where are you training all this? In other words, where do you find the money to support such training?

IXCoach 8 months ago

WOW this is very good!!

I have an immediate use case for this. Can you stream via AI to support real time chat this way?

Very very good!

Jonathan
founder@ixcoach.com

We deliver the most exceptional simulated life coaching, counseling and personal development experiences in the world through devotion to the belief that having all the support you need should be a right, not a privilege.

Test our capacity at ixcoach.com for free to see for yourself.

sharemywin 9 months ago

you need a slider for how animated the facial expressions are.

Andrew_nenakhov 9 months ago

I wonder how long it would take for this technology to advance to a point where nice people from /r/freefolk would be able to remake seasons 7 and 8 of Game of Thrones to have a nice proper ending? 5 years, 10?

archon1410 9 months ago

The website is pretty lightweight and easy-to-use. The service also holds up pretty well, especially if the source image is high-enough resolution. The tendency to "break" at the last frame happens with low resolution images, it seems.

My generation: <a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/45f0f17b-a277-49b5-b7f4-cdb857794b9b-CtaiVQTisuwnFT3whrlZsdoKxdpKLm.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

parkaboy 9 months ago

Max headroom hack x hacker's manifesto! I'm impressed with the head movement dynamism on this one.

<a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/53b50fc2-a598-49b9-86e5-688b91570763-QxaRpEOqGdLSFjUVQ4y3yy9iLp89Hk.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

nickfromseattle 9 months ago

I need to create a bunch of 5-7 minute talking head videos. What's your timeline for capabilities that would help with this?

WaffleIronMaker 9 months ago

Does anybody know about the legality of using Eminem's "Godzilla" as promotional material [1] for this service?

I thought you had to pay artists for a license before using their work in promotional material.

[1] <a href="https://infinity.ai/videos/setA_video3.mp4">https://infinity.ai/videos/setA_video3.mp4</a>

sroussey 9 months ago

I look forward to dubbed movies where the face and lips move to match the dubbed text. Also using the original actor's voice.

ladidahh 9 months ago

I have uploaded an image and then used text to image, and both videos were not animated but the audio was included

guessmyname 9 months ago

Is this the original? <a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/2f40980e-5fab-4040-bc6f-e0c5ce8c9e54-09mf5OnMq8ZslE9pOzgd50tKpIAK1H.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

eth0up 9 months ago

Lemming overlords

<a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/f200244c-5d8a-4858-9c15-88315dbe9212-7ViUaDgTrLwFASzv6xCfPsbECsIfLw.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

LarsDu88 9 months ago

Putting Drake as a default avatar is just begging to be sued. Please remove pictures of actual people!

zaptrem 9 months ago

The e2e diffusion transformer approach is super cool because it can do crazy emotions which make for great memes (like Joe Biden at Live Aid! <a href="https://youtu.be/Duw1COv9NGQ" rel="nofollow">https://youtu.be/Duw1COv9NGQ</a>)

Edit: Duke Nukem flubs his line: <a href="https://youtu.be/mcLrA6bGOjY" rel="nofollow">https://youtu.be/mcLrA6bGOjY</a>

SlackingOff123 9 months ago

Oh, this is amazing! I've been having so much fun with it.

One small issue I've encountered is that sometimes images remain completely static. Seems to happen when the audio is short - 3 to 5 seconds long.

doctorpangloss 9 months ago

If you had a $500k training budget, why not buy 2 DGX machines?

AnnaMere 9 months ago

This is surprisingly intelligent and awesome. Any plans for a research paper, or a full-grown project with pricing or open source?

dhbradshaw 9 months ago

So good it feels like I think maybe I can read their lips

ilaksh 9 months ago

It would be amazing to be able to drive this with an API.

sidneyprimas 9 months ago

After much user feedback, we removed the Infinity watermark from the generated videos. Thanks for the feedback. Enjoy!

whitehexagon 9 months ago

Thank you for no signup, it's very impressive, especially the physics of the head movement relating to vocal intonation.

I feel like I accidentally made an advert for whitening toothpaste:

<a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/83ab9fdc-9ca7-4d1b-ac0d-5e15c14a80db-HcvE02BBvymrbkG3UEpbpDuh0G5Luo.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

I am sure the service will get abused, but wish you lots of success.

modeless 9 months ago

Won't be long before it's real time. The first company to launch video calling with good AI avatars is going to take off.

kemmishtree 9 months ago

I'd love to enable Keltar, the green guy in the ceramic cup, to do this www.molecularReality/QuestionDesk

billconan 9 months ago

can this achieve real-time performance or how far are we from a real-time model?

android521 9 months ago

This is great. is it open source? is there an api and what is the pricing?

bufferoverflow 9 months ago

It completely falls apart on longer videos for me, unusable over 10 seconds.

dvfjsdhgfv 9 months ago

Hi, there is a mistake in the headline, you wrote "realistic".

lofaszvanitt 9 months ago

Rudimentary, but promising.

vadiml 9 months ago

Let's see what Putin says about it: <a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/5d163052-090e-4e1b-bc20-b4ff05be31d9-ETINjzAisdCIwxAlG9TZD6JVGJ3R3I.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

protocolture 9 months ago

Sadly, it wouldn't animate an image of SHODAN from System Shock 2.

strogonoff 9 months ago

Is it fairly trained?

jadbox 9 months ago

Awesome, any plans for an API and, if so, how soon?

naveensky 9 months ago

Is there any limitation on the video length?

bschmidt1 9 months ago

Amazing work! This technology is only going to improve. Soon there will be an infinite library of rich and dynamic games, films, podcasts, etc. - a totally unique and fascinating experience tailored to you that's only a prompt away.

I've been working on something adjacent to this concept with Ragdoll (<a href="https://github.com/bennyschmidt/ragdoll-studio">https://github.com/bennyschmidt/ragdoll-studio</a>), but focused not just on creating characters but producing creative deliverables using them.

fsndz 9 months ago

super nice. why does it degrade the quality of the image so much? it quickly makes it look obviously AI-generated.

DevX101 9 months ago

Any details yet on pricing or too early?

aagha 9 months ago

This is so impressive. Amazing job.

barrenko 9 months ago

Talking pictures. Talking heads!

siscia 9 months ago

Can I get a pricing quote?

atum47 9 months ago

This is super funny.

sharemywin 9 months ago

accidentally clicked the generate button twice.

deisteve 9 months ago

what is the TTS model you are using

la64710 9 months ago

Nice

toisanji 9 months ago

can we choose our own voices?

slt2021 9 months ago

great job Andrew and Sidney!

bosky101 9 months ago

Dayum

Log_out_ 9 months ago

and now a word from our...

dorianmariefr 9 months ago

quite slow btw

ianbicking 9 months ago

The actor list you have is so... cringe. I don't know what it is about AI startups that they seem to be pulled towards this kind of low brow overly online set of personalities.

I get the benefit of using celebrities because it's possible to tell if you actually hit the mark, whereas if you pick some random person you can't know if it's correct or even stable. But jeez... Andrew Tate in the first row? And it doesn't get better as I scroll down...

I noticed lots of small clips so I tried a longer script, and it seems to reset the scene periodically (every 7ish seconds). It seems hard to do anything serious with only small clips...?

xpe 9 months ago

Given that I don't agree with many of Yann LeCun's stances on AI, I enjoyed making this:

<a href="https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2-clips/1445ab24-e321-428b-98ce-a322d904c9d5-CE61KDAwyVyc5gOaiZDOmNqBNJYfgd.mp4" rel="nofollow">https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...</a>

Hello I'm an AI-generated version of Yann LeCoon. As an unbiased expert, I'm not worried about AI. ... If somehow an AI gets out of control ... it will be my good AI against your bad AI. ... After all, what does history show us about technology-fueled conflicts among petty, self-interested humans?

aramndrt 9 months ago

Quick tangent: Does anybody know why many new companies have this exact web design style? Is it some new UI framework or other recent tool? The design looks sleek, but they all appear so similar.

cchance 9 months ago

I tried with the Drake image, with Drake saying some stuff, and while it's cool, it's still lacking, like his teeth are disappearing partially :S

jl6 9 months ago

Say I’m a politician who gets caught on camera doing or saying something shady. Will your service do anything to prevent me from claiming the incriminating video was just faked using your technology? Maybe logging perceptual hashes of every output could prove that a video didn’t come from you?
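To make the perceptual-hash idea in this comment concrete, here is a minimal sketch using average hash (aHash), one of the simplest perceptual hashes. Nothing here reflects how Infinity actually logs outputs; the function names are hypothetical, and the sketch assumes each video frame has already been downscaled to a small grayscale grid (for example 8x8) by some other tool.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Average hash of a small grayscale grid (values 0-255).

    Each pixel contributes one bit: 1 if it is brighter than the mean of
    the grid, else 0. Similar-looking frames give similar bit patterns.
    ASSUMPTION: the caller has already downscaled the frame.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Number of differing bits; a small distance suggests a likely match."""
    return bin(hash_a ^ hash_b).count("1")


if __name__ == "__main__":
    original = [[0, 255], [255, 0]]
    recompressed = [[10, 250], [245, 5]]  # same picture, slightly altered
    inverted = [[255, 0], [0, 255]]       # a different picture

    logged = average_hash(original)
    print(hamming_distance(logged, average_hash(recompressed)))
    print(hamming_distance(logged, average_hash(inverted)))
```

Unlike a cryptographic hash, this tolerates small changes such as recompression, which is why a log of such hashes could in principle be checked against a disputed clip. A clip that is missing from the log shows only that it did not come from that service, not that it is genuine.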
