TechEcho
How Imagen Works

142 points by SleekEagle almost 3 years ago

12 comments

skinner_ almost 3 years ago
> The central intuition in using T5 is that extremely large language models, by virtue of their sheer size alone, may still learn useful representations despite the fact that they are not explicitly trained with any text/image task in mind. [...] Therefore, the central question being addressed by this choice is whether or not a massive language model trained on a massive dataset independent of the task of image generation is a worthwhile trade-off for a non-specialized text encoder. The Imagen authors bet on the side of the large language model, and it is a bet that seems to pay off well.

The way out of this dilemma is to fine-tune T5 on the caption dataset instead of keeping it frozen. The paper notes that they don't do fine-tuning, but does not provide any ablation or other justification. I wonder if it would help or not.
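As a rough illustration of the frozen-versus-fine-tuned distinction raised here, the sketch below uses a toy two-parameter model rather than T5. The names `sgd_step`, `encoder`, and `head` are hypothetical and nothing in it comes from the Imagen paper; it only shows that a frozen parameter receives no update while the rest of the model still trains.

```python
# Toy model: prediction = head * (encoder * x). All names are hypothetical.

def sgd_step(params, trainable, x, target, lr=0.1):
    """One SGD step on squared error; only parameters named in `trainable` move."""
    hidden = params["encoder"] * x
    error = params["head"] * hidden - target
    grads = {
        "encoder": 2 * error * params["head"] * x,  # d(error^2)/d(encoder)
        "head": 2 * error * hidden,                 # d(error^2)/d(head)
    }
    return {
        name: (value - lr * grads[name]) if name in trainable else value
        for name, value in params.items()
    }

start = {"encoder": 0.5, "head": 1.0}

# Frozen encoder: only the downstream parameter is updated.
frozen = sgd_step(start, trainable={"head"}, x=1.0, target=2.0)

# Fine-tuned encoder: both parameters are updated.
fine_tuned = sgd_step(start, trainable={"encoder", "head"}, x=1.0, target=2.0)
```

In the frozen case `encoder` keeps its starting value; in the fine-tuned case it moves along with `head`. Whether that extra freedom would help Imagen is exactly the open question in the comment.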
varispeed almost 3 years ago
> is trained on hundreds of millions of images and their associated captions

So how do you get access to hundreds of millions of images and use them to create derivative works? Did they get consent from millions of authors?

Or is something like that only available to the rich with access to lawyers on tap?

I mean, I can imagine that if a nobody wanted to do something like this, they'd get bankrupted by having to deal with all the photographers / artists spotting a tiny sliver of their art in the image produced by the model.

Furthermore, would something like this work with music? For instance, train the model on all Spotify songs and then generate songs based on "Get me a Bach symphony played on sticks with someone rapping like Dr Dre with a lisp." Or does the music industry have enough money to bully anyone into not doing that?
astrange almost 3 years ago
Is there a compare and contrast between Imagen and Parti anywhere? I realize the paper came out yesterday, but maybe other people remember what "autoregressive" means better than I do.
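On what "autoregressive" means: the output is produced one token at a time, and each new token is chosen conditioned on what has already been generated. The sketch below is a toy illustration of that loop only, assuming a made-up successor table (`NEXT`) and a hypothetical `generate` function; it is not how Parti is implemented. Imagen, by contrast, is a diffusion model that refines the whole image from noise over several steps.

```python
import random

# Hypothetical next-token table: each token maps to its possible successors.
NEXT = {
    "<start>": ["a"],
    "a": ["red", "blue"],
    "red": ["cube"],
    "blue": ["cube"],
    "cube": ["<end>"],
}

def generate(seed=0, max_len=10):
    """Sample a sequence token by token until <end> or max_len is reached."""
    rng = random.Random(seed)
    tokens = ["<start>"]
    while tokens[-1] != "<end>" and len(tokens) < max_len:
        # Autoregressive step: the next token depends on the sequence so far
        # (in this toy table, only on the most recent token).
        tokens.append(rng.choice(NEXT[tokens[-1]]))
    return [t for t in tokens if t not in ("<start>", "<end>")]

sample = generate()
```

A real autoregressive image model conditions on the full prefix and on the text prompt, and its tokens stand for image patches rather than words, but the generation loop has the same shape.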
Workaccount2 almost 3 years ago
I have shown Imagen (and DALL-E 2) to a number of people now (non-tech, just everyday friends, family, co-workers) and I have been pretty stunned by the response I get from most people:

"Meh, that's kinda cool? I guess?" or "What am I looking at?"..."Ok? So a computer made it? That seems neat"

I am still trying to get my jaw off the floor from 2 months ago. But the responses have been so muted and shoulder-shrugging that I think either I am missing something or they are missing something. Even when I really drill in, practically shaking them ("DO YOU NOT UNDERSTAND THAT THIS IS AN ORIGINAL IMAGE CONSTRUCTED ENTIRELY BY AN AI?!?!"), people just seem to see it as a party trick at best.
coding123 almost 3 years ago
Is this written by someone who knows, or by someone who is guessing?
sagarpatil almost 3 years ago
I wonder how developers can monetise this? What use cases does it have?
natch almost 3 years ago
> Imagen, released just last month, can generate high-quality, high-resolution images given only a description of a scene

“Released”? What? Papers are published. Websites are published. Tools are “released.”

Where has Imagen been released?
alexccccc almost 3 years ago
Super interesting
dubswithus almost 3 years ago
If Google has something similar or better it definitely makes it look like OpenAI is wasting its time. None of this relates to AGI.
funstuff007 almost 3 years ago
What's the highest price paid for an AI-generated image NFT?
aceon48 almost 3 years ago
AI is now creative
DonHopkins almost 3 years ago
Wait, this isn't about the line of intelligent xeroxographic laser printers developed by Imagen Corporation in 1981, supporting the Impress printer language?

https://tug.org/TUGboat/tb02-2/tb03imagen.pdf

https://www.openprinting.org/driver/imagen