StoryDiffusion: Long-range image and video generation

233 points by doodlesdev, about 1 year ago

22 comments

schoen, about 1 year ago
I looked very closely at the videos for a while and managed to find some minor continuity errors (like different numbers of buttons on people's button-down shirts at different times, or different sizes or styles of earrings, or arguably different interpretations of which finger is which in an intermittently obscured hand). I also think that the cycling woman's shorts appear to cover more of her left leg than her right leg, although that's not physically impossible, and the bear seemingly has a differently sized canine tooth at different times.

But I guess it took me multiple minutes to find these problems, watching each video clip many times, rather than having any of them jump out at me. So it's not literally full, consistent object persistence, but at a casual viewing it was very persuasive.

Maybe people who shoot or edit video frequently would notice some of these problems more quickly, because they're more attuned to looking for continuity problems?
samspenc, about 1 year ago
Normally I don't mind spelling errors, and there are plenty in the examples, but my question is: did the system really produce "lunch" when the prompt was "they have launch at restraunt" (verbatim from the sample)? I would imagine it got restaurant right, but I would have expected it to produce something like a rocket launch image instead of figuring out the author meant lunch.
hbbio, about 1 year ago
GitHub link is not public yet?

https://github.com/HVision-NKU/StoryDiffusion
smusamashah, about 1 year ago
This is unbelievably good. It seems better than Sora even in terms of natural look and motion in videos.

The video of two girls talking seems so natural. There are some artifacts, but the movement is natural, and the clothes and other things around them are not continuously changing.

I hope it does become open source, which I suspect it won't because it's coming from ByteDance.
forgingahead, about 1 year ago
The GitHub link is broken, and I honestly find it frustrating that the only link to code is the *theme source and credits*?? Is it really that important to give the static page theme that much real estate instead of an actual code release for the project?
speedgoose, about 1 year ago
Is there a video of Will Smith eating spaghetti with this model?
LeoPanthera, about 1 year ago
The rate of progress of generative AI is honestly quite scary.
keikobadthebad, about 1 year ago
It'll be good if the girl and the giant squirrel are ever seen in the same park at the same time.
MisterTea, about 1 year ago
One day we won't have 3D engines or GPUs, but AI chips that generate the scenes without calculating a single triangle or loading a single texture. We just stream in a scene, and IP asset seeds provide the characters, plot, and story. But even those can be generated in real time. Video games, movies, anything will be on demand. No one will act. No one will draw. We will just sit and ask for more. Strange times.
topspin, about 1 year ago
Love how under "Multiple Characters Generation" the white guy is "A Man," whereas someone else is "An Asian Man." Reminds me of Daryl Gates and the "normal people" quote, thence patrol cars being called "black and normals."
pmontra, about 1 year ago
The Moon in the sky seen from the surface of the Moon is wrong? Poetic? Funny? Recursive? A demonstration that these models don't understand anything? Add to the list.
brotherdusk, about 1 year ago
Sorry, I can't access the repo, and the PDF link doesn't have an href attribute. Is that by design?
zhoudaquan21, about 1 year ago
Hi guys, thanks for your interest. The paper and the code are now released: https://github.com/HVision-NKU/StoryDiffusion. Currently, only the comics-related code is public. We are waiting for the company's assessment before releasing the video-related code.
gbickford, about 1 year ago
It's always disappointing when people publish things to GitHub without the intention of collaborating or sharing.
cykkkklz, about 1 year ago
GitHub page: https://github.com/HVision-NKU/StoryDiffusion
Paper: https://arxiv.org/abs/2405.01434
spywaregorilla, about 1 year ago
How is this conceptually different from tracking an embedding for a single character or training a LoRA on it?
jerpint, about 1 year ago
The videos look incredible, but a lot of the captions are riddled with grammar/syntax mistakes that seem odd for a model of that quality to make.
gtoast, about 1 year ago
It's really challenging to think of the positive, constructive uses for this technology without thinking of the myriad life- and society-affecting uses for it. Just interpersonally, the use of this technology is heavily weighted towards destruction and deception. I don't know where this ends or where the researchers who release this technology think it will go, but I can't imagine it's going anywhere good for all of us.
29athrowaway, about 1 year ago
Time for Microsoft Chat 2.0 it seems.
nephanth, about 1 year ago
Um, the GitHub link is a 404, and the paper link points to the webpage itself (the paper is *not* on arXiv). Probably they put the website up too fast?
freefruit, about 1 year ago
So is Amazon flooded with hyper niche e-books yet?
peteradio, about 1 year ago
There is a video of two girls. One girl seems to be sticking out her tongue and then blowing a kiss, but the tongue appears again mid-kiss. Very arousing stuff, I'll say. Keep up the good work microsft or goggle or whoever made it.