Create and edit images with Gemini 2.0 in preview

245 points by meetpateltech 1 day ago

24 comments

I've added and tested this multimodal Gemini 2.0 model in my shoot-out of SOTA image generation models (OpenAI 4o, Midjourney 7, Flux, etc.), which contains a collection of increasingly difficult prompts: https://genai-showdown.specr.net

I don't know how much of Google's original Imagen 3.0 is incorporated into this new model, but the overall aesthetic quality unfortunately seems significantly worse.

The big "wins" are:

- The multimodal aspect, in trying to keep parity with OpenAI's offerings.
- It's an order of magnitude faster than OpenAI 4o image gen.

simonw about 22 hours ago

Be a bit careful playing with this one. I tried this:

```shell
curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent?key=$GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Provide a vegetarian recipe for butter chicken but with chickpeas not chicken and include many inline illustrations along the way"}
      ]
    }],
    "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
  }' > /tmp/out.json
```

And got back 41MB of JSON with 28 base64 images in it: https://gist.github.com/simonw/55894032b2c60b35f320b6a166ded493

At about 4 cents per image, that's more than a dollar for that single prompt.

I built a quick tool, https://tools.simonwillison.net/gemini-image-json, that you can paste that JSON into to see it rendered.
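
If you'd rather pull the images out of a saved response such as /tmp/out.json locally, here is a minimal Python sketch. It assumes the REST response nests output as candidates → content → parts, where text parts carry a "text" key and image parts carry an "inlineData" object with "mimeType" and base64 "data". The function names, the extension mapping and the file naming scheme are hypothetical, not part of any Google SDK.

```python
import base64
import json
from pathlib import Path

# Assumed MIME-type-to-extension mapping; unknown types fall back to ".bin".
EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def load_response(path: Path) -> dict:
    """Parse a saved generateContent response (e.g. the curl output above)."""
    return json.loads(path.read_text())


def extract_inline_images(response: dict) -> list[tuple[str, bytes]]:
    """Return (mime_type, decoded_bytes) for every inlineData part, in order.

    Assumes the shape {"candidates": [{"content": {"parts": [...]}}]}.
    """
    images = []
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            inline = part.get("inlineData")
            if inline is None:
                continue  # a text part, e.g. one recipe step
            images.append((inline.get("mimeType", ""), base64.b64decode(inline["data"])))
    return images


def save_images(images: list[tuple[str, bytes]], out_dir: Path) -> list[Path]:
    """Write each decoded image to out_dir as image_000.png, image_001.jpg, ..."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, (mime, data) in enumerate(images):
        path = out_dir / f"image_{i:03d}{EXTENSIONS.get(mime, '.bin')}"
        path.write_bytes(data)
        paths.append(path)
    return paths
```

Usage, under the same assumptions: `save_images(extract_inline_images(load_response(Path("/tmp/out.json"))), Path("/tmp/gemini_images"))`. Counting the returned paths also tells you how many billable images a single prompt produced.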

eminence32 1 day ago

This seems neat, I guess. But whenever I try tools like this, I often run into the limits of what I can describe in words. I might try something like "Add some clutter to the desk, including stacks of paper and notebooks", but when it doesn't quite look like what I want, I'm not sure what else to do except try slightly different wordings until the output happens to land on what I want.

I'm sure part of this is a lack of imagination on my part about how to describe the vague image in my own head. But I guess I have a lot of doubts about using a conversational interface for this kind of stuff.

cush 1 day ago

The doodle demo is super fun: https://aistudio.google.com/apps/bundled/gemini-co-drawing?showPreview=true

minimaxir 1 day ago

Of note is that the per-image pricing for Gemini 2.0 image generation is $0.039 per image, which is more expensive than Imagen 3 ($0.03 per image): https://ai.google.dev/gemini-api/docs/pricing

The main difference is that Gemini lets you incorporate a conversation when generating the image, as demoed here. Imagen 3, by contrast, is strictly text-in/image-out with optional mask-constrained edits, but it likely produces higher-quality images overall if you're skilled at prompt engineering. This nuance is annoying to differentiate.
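
To make the price gap concrete, here is a small Python sketch using only the two per-image figures quoted above. The dictionary keys are illustrative labels, not necessarily exact API model IDs. The sketch also assumes a flat per-image price and ignores any input/output token charges, so treat the results as rough lower bounds and check the pricing page for current numbers.

```python
# Per-image prices in USD as quoted in the comment above (labels are illustrative).
PRICES_USD = {
    "gemini-2.0-flash-preview-image-generation": 0.039,
    "imagen-3": 0.03,
}


def batch_cost(model: str, n_images: int) -> float:
    """Total USD for n_images at the model's flat per-image price (tokens ignored)."""
    if n_images < 0:
        raise ValueError("n_images must be non-negative")
    return round(PRICES_USD[model] * n_images, 4)


def premium_percent(a: str, b: str) -> float:
    """How much more, in percent, model a costs per image than model b."""
    return round((PRICES_USD[a] / PRICES_USD[b] - 1) * 100, 1)
```

For the 28-image recipe response described earlier in the thread, `batch_cost("gemini-2.0-flash-preview-image-generation", 28)` comes to about $1.09, versus about $0.84 at the Imagen 3 price. That works out to a 30% per-image premium for the conversational model.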

mkl about 17 hours ago

> what the lamp from the second image would look like on the desk from the first image

The lamp is put on a different desk in a totally different room, with AI mush in the foreground. Props for not cherry-picking a first example, I guess. The sofa colour one is somehow much better, with a less specific instruction.

Yiling-J about 20 hours ago

I generated 100 recipes with images using gemini-2.0-flash and gemini-2.0-flash-exp-image-generation as a demo of text+image generation in my open-source project: https://github.com/Yiling-J/tablepilot/tree/main/examples/100_recipes

You can see the full table with images here: https://tabulator-ai.notion.site/1df2066c65b580e9ad76dbd12ae7a590?v=1df2066c65b58093bd1a000ccfe702ed

I think the results came out quite well. Be aware that I don't generate a text prompt from the row data for image generation. Instead, the raw row data (ingredients, instructions, ...) and table metadata (column names and descriptions) are sent directly to gemini-2.0-flash-exp-image-generation.
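
Here is a rough Python sketch of that "no intermediate prompt" idea, where the row and column metadata are serialized verbatim into a single request. This is not tablepilot's actual code. The function name and the JSON-with-headings serialization are hypothetical. The body shape (contents/parts plus generationConfig.responseModalities) mirrors the curl example earlier in the thread.

```python
import json


def build_row_image_request(row: dict, columns: dict[str, str]) -> dict:
    """Build a generateContent-style request body for one table row.

    `row` maps column name -> cell value; `columns` maps column name -> description.
    No separate image prompt is written: metadata and row data go in as-is.
    """
    metadata = [{"name": name, "description": desc} for name, desc in columns.items()]
    text = (
        "Table columns:\n"
        + json.dumps(metadata, ensure_ascii=False, indent=2)
        + "\n\nRow data:\n"
        + json.dumps(row, ensure_ascii=False, indent=2)
    )
    return {
        "contents": [{"parts": [{"text": text}]}],
        # Ask for both modalities so the model can return an image alongside text.
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }
```

One likely trade-off of this design is that the model sees every column, which keeps the pipeline generic across tables. However, it also means irrelevant columns can leak into the image.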

thornewolf 1 day ago

Model outputs look good-ish. I think they are neat. I updated my recent hack project, https://lifestyle.photo, to the new model. It's middling-to-good.

There are still a lot of failure modes, but what I want is a very large cookbook of known-good workflows. Since this is so directly downstream of (limited) training data, it might be that I am just prompting in an ever so slightly bad way.

mNovak 1 day ago

I'm getting mixed results with the co-drawing demo in terms of it understanding what stick figures are. That seems pretty important for the 99% of us who can't draw a realistic human. I was hoping to sketch a scene and let the model "inflate" it, but I ended up with 3D-rendered stick figures.

It seems to help if you explicitly describe the scene, but then the drawing-along aspect seems relatively pointless.

voidUpdate about 12 hours ago

Example 1 doesn't really show how the lamp would look in that situation. In the first image it's about the same height as the sofa, so I'd expect it to be at least twice the size it is in the second image. Also, what is going on underneath the table?

pentagrama 1 day ago

I want to take a step back and reflect on what this actually shows us. Look at the examples Google provides: it refers to the generated objects as "products", clearly pointing toward shopping or e-commerce use cases.

It seems like the real goal here, for Google and other AI companies, is a world flooded with endless AI-generated variants of objects that don't even exist yet. These would be crafted to be sold and marketed (probably by AI too) to hyper-targeted audiences. This feels like an incoming wave of "AI slop", mass-produced synthetic content, crashing against the small island of genuine human craftsmanship and real, existing objects.

ohadron 1 day ago

For one thing, it's way faster than the OpenAI equivalent in a way that might unlock additional use cases.

egamirorrim 1 day ago

I don't understand how to use this. I keep trying to edit a photo of myself (changing a jacket to a t-shirt) in the Gemini app with 2.0 Flash selected, and it just generates a new image that's nothing like the original.

cthulberg about 10 hours ago

gemini-2.0-flash-*-image-generation models are not currently supported in a number of countries in Europe, the Middle East & Africa. Source: https://ai.google.dev/gemini-api/docs/models#gemini-2.0-flash and my Google AI Studio.

qq99 1 day ago

Wasn't this already available in AI Studio? It sounds like they also improved the image quality. It's hard to keep up with what's new with all these versions.

simonw about 21 hours ago

I posted some notes from trying this out, including examples of the images it produced and a tool for rendering the JSON: https://simonwillison.net/2025/May/7/gemini-images-preview/

Tsarp about 17 hours ago

There are direct prompt tests, and then there are tests with tooling.

If, for example, you use ControlNets, you can get very close to the style and composition you need with an open model like Flux, and the result will be far better. Flux also has a few successors coming up now.

emporas about 17 hours ago

I use Gemini to create covers for songs/albums I make, with beautiful typography. Something like this [1]. I was dying of curiosity about how Ideogram managed to create such gorgeous images, and I figured it out 2 days ago.

I take an image with some desired colors or typography from an already existing music album or from Ideogram's poster section. I pass it to Gemini and give the command:

"describe the texture of the picture, all the element and their position in the picture, left side, center right side, up and down, the color using rgb, the artistic style and the calligraphy or font of the letters"

Then I take the result and pass it through a different LLM, because I don't like Gemini that much; I find it much less coherent than other models. I usually use qwen-qwq-32b. I take the description Gemini outputs and give it to Qwen:

"write a similar description, but this time i want a surreal painting with several imaginative colors. Follow the example of image description, add several new and beautiful shapes of all elements and give all details, every side which brushstrokes it uses, and rgb colors it uses, the color palette of the elements of the page, i want it to be a pastel painting like the example, and don't put bioluminesence. I want it to be old style retro style mystery sci fi. Also i want to have a title of "Song Title" and describe the artistic font it uses and it's position in the painting, it should be designed as a drum n bass album cover"

Then I take the result and give it back to Gemini with the command: "Create an image with text "Song Title" for an album cover: here is the description of the rest of the album"

If the resulting image is good, it is time to add the font. I take the new image description and pass it through Qwen again, assuming the image description has Title and Typography fields:

"rewrite the description and add full description of the letters and font of text, clean or distressed, jagged or fluid letters or any other property they might have, where they are overlayed, and make some new patterns about the letter appearance and how big they are and the material they are made of, rewrite the Title and Typography."

I replace the Title and Typography sections of the previous description with the new ones and create images with beautiful fonts.

[1] https://imgur.com/a/8TCUJ75
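
The workflow above is essentially a five-step chain. Here is a hedged Python sketch of its control flow and of the "swap only Title and Typography" step. Several parts are assumptions, not details from the comment: the models are passed in as plain callables (`describe`, `rewrite`, `generate`, `approve` are hypothetical stand-ins for Gemini, Qwen and a human check), descriptions are assumed to mark sections with lines like `## Title`, and the final generation is assumed to reuse the same prompt template.

```python
import re
from typing import Callable, Optional

# Assumed section format: a line "## Name" starts a section that runs to the next header.
HEADER = re.compile(r"^## (.+?)\s*$")


def split_sections(text: str) -> dict[str, str]:
    """Split a description into {section name: body}; text before any header goes under ""."""
    sections: dict[str, list[str]] = {"": []}
    current = ""
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        else:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


def join_sections(sections: dict[str, str]) -> str:
    """Inverse of split_sections (modulo whitespace)."""
    parts = []
    for name, body in sections.items():
        if name:
            parts.append(f"## {name}\n{body}")
        elif body:
            parts.append(body)
    return "\n\n".join(parts)


def replace_sections(original: str, rewritten: str, names=("Title", "Typography")) -> str:
    """Keep `original`, but take the named sections from `rewritten` where present."""
    old, new = split_sections(original), split_sections(rewritten)
    for name in names:
        if name in new:
            old[name] = new[name]
    return join_sections(old)


def cover_pipeline(
    reference_image: bytes,
    title: str,
    describe: Callable[[bytes, str], str],  # vision model: image + instruction -> text
    rewrite: Callable[[str], str],  # a different text LLM: prompt -> text
    generate: Callable[[str], bytes],  # image model: prompt -> image bytes
    approve: Callable[[bytes], bool],  # human judgement on the first draft
    describe_prompt: str,
    style_prompt: str,
    font_prompt: str,
) -> Optional[bytes]:
    # 1. Describe the reference cover: layout, RGB colours, style, lettering.
    description = describe(reference_image, describe_prompt)
    # 2. Rewrite that description in the new target style.
    styled = rewrite(f"{style_prompt}\n\n{description}")
    # 3. Generate a first draft from the styled description.
    template = 'Create an image with text "{t}" for an album cover: {d}'
    draft = generate(template.format(t=title, d=styled))
    if not approve(draft):
        return None
    # 4. Ask for a richer Title/Typography rewrite, then splice only those sections back in.
    final = replace_sections(styled, rewrite(f"{font_prompt}\n\n{styled}"))
    # 5. Regenerate with the improved font description.
    return generate(template.format(t=title, d=final))
```

Splicing sections, rather than regenerating the whole description, keeps the approved palette and composition text fixed while only the lettering changes. That seems to be the point of the last step.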

taylorhughes 1 day ago

Image editing/compositing/remixing is not quite as good as gpt-image-1, but the results are really compelling anyway due to the dramatic increase in speed! Playing with it just now, a compositing task between multiple images often takes 5 seconds. That feels totally different from waiting 30s+ for gpt-image-1.

refulgentis 1 day ago

Another release from Google!

Now I can use:

- Gemini 2.0 Flash Image Generation Preview (May) instead of Gemini 2.0 Flash Image Generation Preview (March)
- Or, when I need text, Gemini 2.5 Flash Thinking 04-17 Preview ("natively multimodal" w/o image generation)
- When I need to control thinking budgets, I can do that with Gemini 2.5 Flash Preview 04-17, with not-thinking at a 50% price increase over a month prior
- And when I need realtime, fall back to Gemini 2.0 Flash 001 Live Preview (announced as In Preview on April 9 2025, after the Multimodal Live API was announced as released on December 11 2024)
- I can't control Gemini 2.5 Pro Experimental/Preview/Preview IO Edition's thinking budgets, but good news follows in the next bullet: they'll swap the model out underneath me with one that thinks ~10x less, so at least it's in the same cost ballpark as their competitors
- And we all got auto-upgraded from Gemini 2.5 Pro Preview (03/25, released 4/2) to Gemini 2.5 Pro Preview (IO Edition) yesterday! Yay!

GaggiX 1 day ago

Not available in the EU. The first version was, and then it was removed.

By the way, it's still not as good as ChatGPT but much, much faster. It's nice progress compared to the previous model.

adverbly 1 day ago

Google totally crushing it and stock is down 8% today :|

Is it just me, or is the market absolutely terrible at understanding the implications and speed of progress behind what's happening right now within the walls of Big G?

mvdtnz about 21 hours ago

I gave this a crack this morning, trying something very similar to the examples. I tried to get Gemini 2.0 Preview to add a set of bi-fold doors to a picture of a house in a particular place. It failed completely. It put them in the wrong place, they looked absolutely hideous (like I had pasted them in with MS Paint), and the more I tried to correct it with prompts, the worse it got. At one point, when I re-prompted it, it said:

> Okay, I understand. You want me to replace ONLY the four windows located underneath the arched openings on the right side of the house with bifold doors, leaving all other features of the house unchanged. Here is the edited image:

This was followed by no image. I have seen this behaviour from Gemini many times in the past, so it's frustrating that it's still a problem.

I give this a 0/10 for my first use case.
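
If you call the API directly, you can at least detect the "Here is the edited image:" with no image failure mode in code rather than by eye. Here is a minimal Python sketch. It assumes the same REST response shape as the curl example earlier in the thread (image parts carry "inlineData" with an image/* "mimeType"). `send` is a hypothetical zero-argument function that performs one generateContent call, and each retry is a separately billed request.

```python
from typing import Callable, Optional


def has_image_part(response: dict) -> bool:
    """True if any candidate part carries inlineData with an image/* MIME type."""
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            inline = part.get("inlineData") or {}
            if inline.get("mimeType", "").startswith("image/"):
                return True
    return False


def request_with_image_retry(send: Callable[[], dict], max_attempts: int = 3) -> Optional[dict]:
    """Call `send` until a response actually contains an image, or give up.

    Returns the first response with an image part, or None after max_attempts.
    """
    for _ in range(max_attempts):
        response = send()
        if has_image_part(response):
            return response
    return None
```

Capping attempts matters here because a model that keeps promising an image it never sends would otherwise loop, and bill, indefinitely.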

jansan 1 day ago

Some examples are quite impressive, but the one with the polar bear on the white mug is very underwhelming, and the co-drawing demo looks like it was hacked together by a vibe coder.
