PixArt-α: A New Open-Source Text-to-Image Model Challenging SDXL and DALL·E 3

77 points by liuxiaopai, over 1 year ago

9 comments

Animats, over 1 year ago
This has problems usually not seen with current systems. It's produced human characters with one thick leg and one thin leg. Three legs of different sizes. Three arms.

It can do humans in passive poses, but ask for an action shot and it botches it badly. It needs more training data on how bodies move. Maybe load it up with stills from dance, martial arts, and sports.
GaggiX, over 1 year ago
The most interesting aspect of this model is that it is very training efficient: https://pixart-alpha.github.io/

It also uses the same idea as DALL·E 3 of training the model on synthetic captions.
ShamelessC, over 1 year ago
Why name it PixArt when it covers a broader range of media than simply pixel art? Super confusing.
krasin, over 1 year ago
The source code is licensed under AGPL-3.0. Perfect for these kinds of models: https://github.com/PixArt-alpha/PixArt-alpha
gigel82, over 1 year ago
From their GitHub:

> This integration allows running the pipeline with a batch size of 4 under 11 GBs of GPU VRAM. GPU VRAM consumption under 10 GB will soon be supported, too. Stay tuned.
ilaksh, over 1 year ago
Seems to have pretty good understanding and performance.
camdenlock, over 1 year ago
This appears to be work sponsored by Huawei.
andromeduck, over 1 year ago
Thought this was going to be a new optical sensor series :(
philmitchell47, over 1 year ago
I think it's kind of disingenuous to claim such improvements in training efficiency when they rely on:

- Existing models for data pseudo-labelling
- ImageNet pretraining
- A frozen text encoder
- A frozen image encoder