科技回声

Emu Video and Emu Edit, our latest generative AI research milestones

201 points · by ot · over 1 year ago

9 comments

fpgaminer · over 1 year ago
Somewhat tangential, but I hadn't heard about the Emu model, which was apparently released (the paper [1] at least) in September. I was curious about the details and read the Emu paper and ... I feel like I'm taking crazy pills reading it.

> To the best of our knowledge, this is the first work highlighting fine-tuning for generically promoting aesthetic alignment for a wide range of visual domains.

... unlike Stable Diffusion, which did aesthetic fine-tuning when it was released? Or like the thousands of aesthetic finetunes released since?

> We show that the original 4-channel autoencoder design [27] is unable to reconstruct fine details. Increasing channel size leads to much better reconstructions.

Is it not expected that decreasing the compression ratio would lead to better reconstructions? The whole point of the latent diffusion architecture is to make a trade-off here. They're more than welcome to do pixel diffusion if they want better quality, or an upscaling architecture.

And then the rest of the paper is a long write-up that can be summed up as "we used industry-standard filtering and then human filtering to build an aesthetic dataset which we finetuned a model with". Which, again, has been done a thousand times already.

I really, really don't mean to knock the researchers' work here. I'm just very confused as to why the work is being represented as new or groundbreaking. Contrast with OAI, which documents using a diffusion-based latent decoder. That's interesting, different, and worth publishing. Scaling up your latent space to get better results is just ... obvious? (As obvious as anything in ML is, anyway.) Facebook's research isn't usually this off the mark. E.g. the Emu Edit paper is very interesting and contributes many new methods to the field.

[1] https://scontent-lax3-1.xx.fbcdn.net/v/t39.2365-6/10000000_1099397624548149_16002132581482810_n.pdf?_nc_cat=110&ccb=1-7&_nc_sid=3c67a6&_nc_ohc=mxfter4gnLgAX_0FnFD&_nc_ht=scontent-lax3-1.xx&oh=00_AfByfMkAByxJGLImPcGtMiBQtMsLU0e1ksDLvyqJW7yaPA&oe=655B1F8F
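The channel-count point above comes down to arithmetic: widening the latent lowers the autoencoder's compression ratio, so it discards less information per image. A minimal sketch of that trade-off, assuming a Stable-Diffusion-style VAE with 8× spatial downsampling (the 512×512 input size and the 4- vs. 16-channel comparison are illustrative, not taken from the paper):

```python
def compression_ratio(height: int, width: int, latent_channels: int,
                      downsample: int = 8, image_channels: int = 3) -> float:
    """Pixels-to-latent compression ratio of a latent-diffusion autoencoder."""
    pixel_values = height * width * image_channels
    latent_values = (height // downsample) * (width // downsample) * latent_channels
    return pixel_values / latent_values

# The original 4-channel latent compresses 48x; widening it to 16 channels
# keeps 4x more information, compressing only 12x.
print(compression_ratio(512, 512, 4))   # -> 48.0
print(compression_ratio(512, 512, 16))  # -> 12.0
```

Less compression predictably means better reconstructions, which is the commenter's objection: it is the expected outcome of moving along the existing trade-off curve, not a new technique.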
dougmwne · over 1 year ago
Emu Edit is awesome. I think we have officially brought this scene from Star Trek to life.

https://m.youtube.com/watch?v=NXX0dKw4SjI&pp=ygUII3Npbm50ZWs%3D
morph123 · over 1 year ago
I wonder how far away we are from "make a movie from a sentence".

2030?

Also, why do these AI people always end with "this does not replace anyone"? Surely they do not believe this?
enonimal · over 1 year ago
Is anyone able to determine how long it takes to generate a video with one of these methods? Can't find it in the paper.
davesque · over 1 year ago
Definitely looks like progress, but they're still firmly in the center of the uncanny valley.
colesantiago · over 1 year ago
Does anyone know where the source code is? I can't seem to find it anywhere.
chasd00 · over 1 year ago
I never would have guessed the artists would be who AI took out first.
scudsworth · over 1 year ago
a huge pile of money on fire forever
tomdell · over 1 year ago
An impressive technical achievement, yes - but the presentation/marketing of this is absurd.

The generated videos are aesthetically horrendous. I don't know what kind of mental gymnastics are going on that they can confidently describe something where the body shapes are nonsensically in flux with every change of frame (look at the eagle's talons, or the dog's leg movements as it runs) as "high-quality video".

Is generative AI hype blinding them to how hideous these videos are, or do they know and just pretend it's something it isn't?