Show HN: Open Prompts – dataset of 10M Stable Diffusion generations

279 points by vipermu over 2 years ago

Open Prompts is the dataset used to build krea.ai. The data comes from the Stability AI Discord and includes around 10M images from 2M prompts. You can use it for creating semantic search engines of prompts, training LLMs, fine-tuning image-to-text models like BLIP, or extracting insights from the data—like the most common combinations of modifiers.
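As a rough illustration of the "most common combinations of modifiers" idea, here is a minimal sketch. It assumes prompts are plain strings where the first comma-separated chunk is the subject and the remaining chunks are modifiers; that convention, the function name `modifier_pair_counts`, and the sample prompts are all made up for illustration and are not taken from the dataset itself.

```python
from collections import Counter
from itertools import combinations


def modifier_pair_counts(prompts):
    """Count how often each unordered pair of modifiers shares a prompt."""
    counts = Counter()
    for prompt in prompts:
        # Assumption: chunk 0 is the subject, everything after it is a modifier.
        parts = [p.strip().lower() for p in prompt.split(",")]
        # Deduplicate and sort so each pair has one canonical ordering.
        modifiers = sorted({m for m in parts[1:] if m})
        counts.update(combinations(modifiers, 2))
    return counts


# Hypothetical sample prompts, not real rows from the dataset.
sample_prompts = [
    "a castle on a hill, trending on artstation, 4k, octane render",
    "portrait of a cat, 4k, trending on artstation",
    "a forest at dawn, octane render, 4k",
]

pair_counts = modifier_pair_counts(sample_prompts)
for pair, count in pair_counts.most_common(3):
    print(pair, count)
```

On the real data you would swap `sample_prompts` for the prompt column of whatever file format the dataset ships in, and you would probably want a smarter modifier splitter than a plain comma split.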

20 comments

Samin100 over 2 years ago

Great work! If anyone’s planning to use AI-generated artwork in their projects, I’ve added an image search API to Lexica, similar to Unsplash. All the images are licensed CC0, and millions more are being added every few weeks. Docs here: https://lexica.art/docs

password4321 over 2 years ago

Show HN: I made 7k images with DALL-E 2 to create a reference/inspiration table

https://generrated.com

https://news.ycombinator.com/item?id=32824448

Oras over 2 years ago

This is fantastic. A few days ago I was checking out PromptBase [0] and thought it was a really good idea. Yours just took it to the next level by being free and offering a massive amount of data. Great work.

[0] https://promptbase.com/

whalesalad over 2 years ago

This is wild to me. Now we have meta-AI surrounding other forms of AI, analyzing the user-submitted input as well as the image output, using AI to infer intent, identify nouns, etc... and yet all of this is predicated on the initial datasets that these initial text->img robots were trained on, which may or may not be a true representation of our actual culture. So we are lava/magma layering all of these approximations on top of each other and gluing them together with scrambled eggs. I think this is all really cool, for the record; it's just something I have been thinking about. For art, I love it. For a self-driving vehicle, lmao.

nextaccountic over 2 years ago

This is fantastic, thanks for publishing it. I'm glad many players in the Stable Diffusion ecosystem are striving for openness (not only of the model itself; there are also open source frontends and related tooling).

davidkunz over 2 years ago

Hi, just so you know: the "krea.ai" link in the readme gives a 404: https://github.com/krea-ai/open-prompts/blob/main/krea.ai

smusamashah over 2 years ago

I went back to my AI creations on NightCafe, which I made 5 months ago. Almost all of them now look pretty ugly/stupid. It was the same as when I first saw gameplay of NFS Most Wanted: it looked so realistic then, and now it absolutely does not. This effect is amazing; I don't know if it has a name, though.

cercatrova over 2 years ago

How does it compare with https://lexica.art?

maaaaattttt over 2 years ago

Is there any effort being made to rate prompts with regard to the image the model output and/or what the user chose as a satisfactory image? I could (probably naively) imagine that this would be the next step in making these models even more pleasing to humans. Or at least in creating a GPT-based "companion" model that would suggest, from an initial subpar prompt, a prompt yielding better results.

rhacker over 2 years ago

Someone should build a thing that lets strangers critique AI output by allowing them to select options like "A person has a glitchy face in this photo", "A person has a glitchy body in this photo", etc., and then train the AI to do a fixup pass.

mod over 2 years ago

Wish the back button could find my spot on the page. Fun to explore the prompts getting results similar to what I want. Great project.

masterspy7 over 2 years ago

I'm curious, I've seen a few sites like this which grab from the Stability Discord. Is there a way to quickly scrape this amount of data from a Discord server?

nextaccountic over 2 years ago

Hey, was the specific Stable Diffusion version used to generate each image recorded anywhere in the dataset? On krea.ai, it doesn't say which version of the model was used to generate each image. It appears that later versions are better at generating faces or something, like Stable Diffusion 1.5 vs 1.4. (I'm not sure, but there's great variability nonetheless, and I wanted to know whether the model version accounts for it.)

hwers over 2 years ago

What’s interesting about datasets like this is that you can likely use them to distill an even more compressed SD generator.

jononor over 2 years ago

Nice. Another meta thing I would like to do is generate a bunch of prompts around a topic, mashed up with related or unrelated other topics, so that I can get a bunch of images and review/curate them all in one go. Does anyone know of tooling in that direction?
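The prompt-generation half of that idea can be roughed out in a few lines. This is only a sketch under assumptions: the function name `mashup_prompts`, the prompt template, and the example topics and styles are all hypothetical, and it does not call any image model; it just builds the batch of prompt strings you would then feed to one.

```python
from itertools import product


def mashup_prompts(topic, other_topics, styles):
    """Cross one main topic with other topics and styles to build a prompt batch."""
    return [
        f"{topic} combined with {other}, {style}"
        for other, style in product(other_topics, styles)
    ]


# Hypothetical inputs, chosen only to show the shape of the output.
batch = mashup_prompts(
    "lighthouse",
    ["jellyfish", "circuit board"],
    ["watercolor", "isometric 3d render"],
)
for prompt in batch:
    print(prompt)
```

Each resulting string could be sent to a generator and the outputs laid out in a grid for review in one pass.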

XorNot over 2 years ago

I'm very much looking forward to how collections like this influence the second generation AI models, which will include data like this and tend to rank it highly on alt-text/clip embedding alignment.

ipaddr over 2 years ago

When it comes to faces or people, all photos default to horror.

dr_dshiv over 2 years ago

Radical. I’m imagining randomly sampling images and identifying the text attributes associated with human ratings of image beauty.

jaimex2 over 2 years ago

There's also https://lexica.art/

Philomath over 2 years ago

That's amazing, thanks for sharing. For how long have you been gathering this data?
