OpenAI releases image generation in the API

487 points by themanmaran 21 days ago

39 comments

When this was up yesterday, I complained that the refusal rate was super high, especially on government- and military-shaped tasks, and that this would only push contractors to use CN-developed open-source models for work that could then be compromised.

Today I'm discovering there is a tier of API access with virtually no content moderation available to companies working in that space. I have no idea how to go about requesting that tier of access, but I have spoken to 4 different defense contractors in the last day who seem to already be using it.

johnyzee 20 days ago

I wanted to try this in the image playground, but I was told I had to add a payment method. When adding it, I was told I would also have to pay a minimum of $5. Did this. Then, when trying to generate an image, I was told I would have to do "verification" of my organization (?). OK, I chose 'personal'. I was then told I had to complete the verification through a third-party partner of OpenAI, which included giving permission to process my biometric information. Yeah, I don't want to try this that badly, but now I've already paid you and have to struggle to figure out how to get my money back. Horrible UX.

tezza 20 days ago

For the curious, I generated the same prompt at each of the quality settings: 'auto', 'low', 'medium', and 'high'.

Prompt: "a cute dog hugs a cute cat"

https://x.com/terrylurie/status/1915161141489136095

I also then showed a couple of DALL-E 3 images for comparison in a comment.
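
A minimal sketch of how the same comparison could be scripted: one request body per quality setting, all sharing the same prompt. The body fields (`model`, `prompt`, `size`, `quality`, `n`) for `POST /v1/images/generations` and the `1024x1024` default size are assumptions to check against OpenAI's API reference, and `buildComparisonRequests` is a hypothetical helper. The code only builds payloads; it sends nothing.

```javascript
// The four quality settings compared above.
const QUALITIES = ["auto", "low", "medium", "high"];

// Build one request body per quality setting for the same prompt.
// Assumption: body fields for POST /v1/images/generations; verify before use.
function buildComparisonRequests(prompt, size = "1024x1024") {
  return QUALITIES.map((quality) => ({
    model: "gpt-image-1",
    prompt,
    size,
    quality,
    n: 1,
  }));
}

const requests = buildComparisonRequests("a cute dog hugs a cute cat");
for (const body of requests) {
  console.log(JSON.stringify(body));
}
```

Keeping everything but `quality` fixed is what makes the side-by-side comparison meaningful.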

film42 20 days ago

I generated 5 images in the playground: one using a text-only prompt and 4 using images from my phone. I spent $0.85, which isn't bad for a fun round of Studio Ghibli portraits for the family group chat, but it's too expensive to use in a customer-facing product.

alasano 20 days ago

I built a local playground for it if anyone is interested (your OpenAI org needs to be verified, btw):

https://github.com/Alasano/gpt-image-1-playground

OpenAI's Playground doesn't expose all the API options. Mine covers all of them, and it has built-in mask creation and cost tracking as well.

Imnimo 20 days ago

I'm curious what the applications are where people need to generate hundreds or thousands of these images. I like making Ghibli-esque versions of family photos as much as the next person, but I don't need to make them in volume. As far as I can recall, every time I've used image generation, it's been one-off things that I'm happy to do in the ChatGPT UI.

minimaxir 20 days ago

Pricing-wise, this API is going to be hard to justify unless you can really get value out of providing references. A generated `medium` 1024x1024 image is $0.04, which is in the same cost class as Imagen 3 and Flux 1.1 Pro. Testing from their new playground (https://platform.openai.com/playground/images), the medium images are indeed lower quality than either of the two competitor models and still take 15+ seconds to generate: https://x.com/minimaxir/status/1915114021466017830

Prompting the model is also substantially different from, and more difficult than, prompting traditional models, which is unsurprising given the way the model works. The traditional image-prompting tricks don't work out of the box, and I'm struggling to get something that works without significant prompt augmentation (which is what I suspect was used for the ChatGPT image generations).

badmonster 20 days ago

Usage of gpt-image-1 is priced per token, with separate pricing for text and image tokens:

- Text input tokens (prompt text): $5 per 1M tokens
- Image input tokens (input images): $10 per 1M tokens
- Image output tokens (generated images): $40 per 1M tokens

In practice, this translates to roughly $0.02, $0.07, and $0.19 per generated image for low-, medium-, and high-quality square images, respectively.

That's a bit pricey for a startup.
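
The three per-token rates can be combined into a per-request estimate. This is a minimal sketch that assumes only the rates quoted in the comment; `estimateCostUSD` and its field names are hypothetical, and the token counts in the example are made up for illustration, not taken from real usage data.

```javascript
// Per-token prices quoted above, in USD per 1M tokens.
const USD_PER_MILLION_TOKENS = {
  textInput: 5,
  imageInput: 10,
  imageOutput: 40,
};

// Estimate the USD cost of one request from its token counts.
function estimateCostUSD({ textInputTokens = 0, imageInputTokens = 0, imageOutputTokens = 0 } = {}) {
  const weighted =
    textInputTokens * USD_PER_MILLION_TOKENS.textInput +
    imageInputTokens * USD_PER_MILLION_TOKENS.imageInput +
    imageOutputTokens * USD_PER_MILLION_TOKENS.imageOutput;
  return weighted / 1_000_000;
}

// Hypothetical token counts, for illustration only.
const example = estimateCostUSD({ textInputTokens: 100, imageOutputTokens: 1000 });
console.log(example.toFixed(4)); // 0.0405
```

Because output tokens cost 8× as much as prompt text, the generated image dominates the bill unless you send large reference images.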

jumploops 20 days ago

This new model is autoregressive (similar to LLMs, generating token by token) rather than diffusion-based, meaning that it adheres to text prompts with much higher accuracy.

As an example, some users (myself included) of a generative image app were trying to make a picture of a person in a kangaroo's pouch. No matter what we prompted, we couldn't get it to work.

GPT-4o did it in one shot!

gervwyk 20 days ago

Great SVG generation would be far more useful! For example, being able to edit SVG images after the AI generates them would make the last mile quick to modify. For our new website, https://resonancy.io, the simple SVG workflow images were still very much created by hand, and trying various AI tools to make such images yielded shockingly bad, off-brand results, even when we provided multiple examples. By far the best tool for this is still Canva for us.

Does anyone know of an AI model for generating SVG images? Please share.

hombre_fatal 20 days ago

I would have expected an API like:

    let imageId = api.generateImage(prompt)
    let {url, isFinished} = api.imageInfo(imageId)

But instead it's:

    let bytes = api.generateImage(prompt)

It's interesting to me how AI APIs let you hold such a persistent, active connection. I'm so used to anything that takes more than a second becoming an async background process, where you notify the recipient when it's ready.

With Netflix, it makes sense that you can open a connection to some static content and receive gigabytes over it. But streaming tokens from a GPU is a much more active process, especially in this case, where you're waiting tens of seconds for an image to generate.
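
For contrast, here is a minimal sketch of the submit-then-poll pattern the comment expected. It is written against a hypothetical in-memory `FakeImageJobApi` whose `generateImage` and `imageInfo` methods mirror the pseudocode. It is not OpenAI's API, which, as described, returns the image data in the response to the generation request itself. The fake job simply reports itself finished after a configurable number of polls.

```javascript
// Hypothetical job-style image API mirroring the pseudocode above.
// NOT OpenAI's API: each job reports itself finished after a few polls.
class FakeImageJobApi {
  constructor(pollsUntilDone = 3) {
    this.pollsUntilDone = pollsUntilDone;
    this.jobs = new Map();
    this.nextId = 1;
  }

  // Submit a prompt and get a job id back immediately.
  async generateImage(prompt) {
    const id = `img_${this.nextId++}`;
    this.jobs.set(id, { prompt, polls: 0 });
    return id;
  }

  // Report job status; the URL is only set once the job has finished.
  async imageInfo(id) {
    const job = this.jobs.get(id);
    if (!job) throw new Error(`unknown image id: ${id}`);
    job.polls += 1;
    const isFinished = job.polls >= this.pollsUntilDone;
    return { url: isFinished ? `https://example.com/${id}.png` : null, isFinished };
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Submit the prompt, then poll until the job finishes or we give up.
async function generateAndWait(api, prompt, { intervalMs = 10, maxPolls = 20 } = {}) {
  const imageId = await api.generateImage(prompt);
  for (let i = 0; i < maxPolls; i++) {
    const { url, isFinished } = await api.imageInfo(imageId);
    if (isFinished) return url;
    await sleep(intervalMs);
  }
  throw new Error(`image ${imageId} not finished after ${maxPolls} polls`);
}

generateAndWait(new FakeImageJobApi(), "a cute dog hugs a cute cat").then((url) => {
  console.log(url); // https://example.com/img_1.png
});
```

The trade-off is that the client now owns a retry loop and a give-up policy (`maxPolls`), whereas the synchronous style just holds one connection open until the bytes arrive.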

sebastiennight 20 days ago

Hmm, seems pricey. What's the current state of the art for API generation of an image from a reference image plus a modifier prompt? Say, in the 1¢ per HD (1920×1080) image range?

qhwudbebd 20 days ago

I hope image support in the Responses API is more competently executed than the mess piling up in the v1/images/generations endpoint.

To pick an example, we have a model parameter and a response_format parameter. The response_format parameter selects whether image data should be returned as a URL (the old method) or directly, base64-encoded. The new model only supports base64, whereas the old models default to returning a URL, which is fine and understandable.

But the endpoint refuses to accept any value for response_format, including b64_json, with the new model, so you can't set-and-forget the new behaviour and parameterise the model without worrying about it. Instead, you have to request the new behaviour with the older models, and not request it (but still get it) with the new one. Sigh.
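
A minimal sketch of the workaround described: always request base64 from the older models, and omit `response_format` entirely for the new one, so callers can pass any model name. It assumes `gpt-image-1` is the only base64-only model. `buildGenerationBody` is a hypothetical helper, not part of any SDK.

```javascript
// Models that always return base64 and reject response_format outright.
// Assumption: gpt-image-1 is the only such model at the time of writing.
const BASE64_ONLY_MODELS = new Set(["gpt-image-1"]);

// Build a /v1/images/generations body that yields base64 image data
// whichever model is passed in.
function buildGenerationBody(model, prompt, extra = {}) {
  const body = { model, prompt, ...extra };
  if (BASE64_ONLY_MODELS.has(model)) {
    // The new model returns base64 anyway and rejects the parameter.
    delete body.response_format;
  } else {
    // Older models default to URLs, so ask for base64 explicitly.
    body.response_format = "b64_json";
  }
  return body;
}

console.log(buildGenerationBody("dall-e-3", "a person in a kangaroo's pouch"));
console.log(buildGenerationBody("gpt-image-1", "a person in a kangaroo's pouch"));
```

Centralising the special case in one helper keeps the rest of the code model-agnostic, which is exactly what the endpoint's inconsistency otherwise prevents.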

_pdp_ 20 days ago

We have integrated it into our platform, and we already have use cases for it, such as helping create ads and other marketing material.

However, while it's better than other models, it is not perfect. The image edit API will make a similar-looking picture (even with masking) rather than exactly the same picture with some modifications.

gitroom 20 days ago

Man, what a pain in the ass just to try an image API, with all these hoops for payment, ID, even biometrics. Stuff like this always makes me wonder: does anyone up top even try their own product? Don't you figure all this extra friction just ends up pushing users somewhere else?

PeterStuer 20 days ago

My number one ask as someone who has used OpenAI in production for almost 2 years: enable tool use in the API so I can evaluate OpenAI models in agentic environments without jumping through hoops.

claiir 20 days ago

> GoDaddy is actively experimenting to integrate image generation so customers can easily create logos that are editable [..]

I remember meeting someone on Discord 1-2 years ago (?) who was working on a GoDaddy effort to let customers generate icons using bespoke foundation image-gen models. I suppose that kind of bespoke model at that scale is ripe for replacement by gpt-image-1, given its instruction-following ability / steerability?

greatgib 20 days ago

Does anyone have an idea what an "image token" represents for the pricing? Is it a block of the image of a given fixed size?

JPKab 20 days ago

As a paying customer, you get completely hosed every time they add a new feature for the non-paying users.

The website is barely responding today, and the desktop client always has massively degraded performance. It's really annoying to have their desire for user growth kill the experience for those of us who are financing it.

verelo 20 days ago

> Editing videos: invideo enables millions of users to transform their ideas into videos using AI. With the integration of gpt-image-1, the platform now offers improved text generation, fine-grain editing controls, and advanced style guidance.

Does this mean it also does video in some manner?

MisterBiggs 20 days ago

Lots of comments about the price being too high; what are the odds this is a subsidized, bare-metal cost?

ChaitanyaSai 20 days ago

Almost every image has a yellow tint. Is there any discussion of why, and of when that will be fixed?

pknerd 20 days ago

I'd like some resources on prompt engineering for OpenAI's image generation model, especially for products related to images or ads.

PS: Does anyone know a good LLM/service to turn images into videos?

hnthrowaway0315 20 days ago

I wonder which model is the best at outputting standard 2D game resources:

- N by N sprite sheets
- Isometric sprite sheets

Basically anything that I can directly drop into my little game engine.

scyzoryk_xyz 20 days ago

Intelligence is fast approaching utility status.

jonplackett 20 days ago

Does anyone know if you can give this endpoint an image as input along with text? Not just an image to mask, but an image as part of a text input description.

I can't see a way to do this currently; you just get a prompt.

This, I think, is the most powerful way to use the new image model, since it actually understands the input image and can make a new one based on it. E.g., you can give it a person sitting at a desk and it can make one of them standing up. Or from another angle. Or on the moon.
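
One possible way to pair an input image with a text prompt is a multipart request to the images edits endpoint. This is a sketch under stated assumptions: Node 18+ (global `fetch`, `FormData`, `Blob`), the `https://api.openai.com/v1/images/edits` URL, the `model`/`prompt`/`image` field names, and a base64 `b64_json` field in the response. `buildEditForm` and `editImage` are hypothetical helpers, so check the field names against OpenAI's API reference before relying on them.

```javascript
// Build a multipart body pairing an input image with a text prompt.
// Assumptions: Node 18+ globals (FormData, Blob) and the model/prompt/image
// field names for the images edits endpoint; check the API reference.
function buildEditForm(imageBytes, prompt, { mimeType = "image/png", filename = "input.png" } = {}) {
  const form = new FormData();
  form.append("model", "gpt-image-1");
  form.append("prompt", prompt);
  form.append("image", new Blob([imageBytes], { type: mimeType }), filename);
  return form;
}

// Send the form (not called here); reads the API key from OPENAI_API_KEY.
async function editImage(imageBytes, prompt) {
  const res = await fetch("https://api.openai.com/v1/images/edits", {
    method: "POST",
    headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
    body: buildEditForm(imageBytes, prompt),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}: ${await res.text()}`);
  const json = await res.json();
  // Assumption: the edited image comes back base64-encoded in data[0].b64_json.
  return Buffer.from(json.data[0].b64_json, "base64");
}
```

For the desk example, you would pass the photo's bytes together with a prompt such as "the same person, standing up", and no mask.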

jeevships 20 days ago

Genuinely curious: why would someone buy from your GPT image wrapper when they can just create the image in GPT themselves?

drakenot 20 days ago

Does the AI have the same content restrictions that the chat service does?

GaggiX 20 days ago

Far too expensive, I think I will wait for an equivalent Gemini model.

gcrfelix 20 days ago

Lesson: never build your moat around optimizing the existing AI capability.

p1dda 20 days ago

How long can OpenAI keep beating the dead horse that is LLMs?

smrt 20 days ago

I don't understand why this API needs organization verification. More paperwork ahead. Facepalm.

    PermissionDeniedError: Error code: 403 - {'error': {'message': 'To access gpt-image-1, please complete organization verification

system2 20 days ago

Jesus, $0.19 for an image you may or may not use. I think it is still way too expensive to be useful. I go through 10 AI images before I find a useful one. This might not work for everyone.
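
The arithmetic behind that complaint, as a tiny sketch: if only one in N generations is usable, the effective price of a usable image is N times the per-image price. `costPerUsableImage` is a hypothetical helper, and the 10-attempt ratio is the commenter's own experience, not a general figure.

```javascript
// Effective cost of one usable image when only one in `attemptsPerKeeper`
// generations is worth keeping.
function costPerUsableImage(pricePerImageUSD, attemptsPerKeeper) {
  if (attemptsPerKeeper < 1) {
    throw new RangeError("attemptsPerKeeper must be at least 1");
  }
  return pricePerImageUSD * attemptsPerKeeper;
}

// $0.19 per image and about 10 tries per keeper, as in the comment above.
console.log(costPerUsableImage(0.19, 10).toFixed(2)); // 1.90
```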

topaz0 20 days ago

Criminally wasteful.

animanoir 20 days ago

Wow, more AI slop.

1oooqooq 20 days ago

Aren't you all embarrassed to see lame press releases about the most uninteresting things at the top of the HN front page? I kinda feel bad.

hexo 20 days ago

Thank you for a great contribution to global warming.

pkulak 20 days ago

I don't get it. I've been using `dall-e-3` over the public API for a couple of years now. Is this just a new model?

EDIT: Oh, yes, that's what it appears to be. Is it better? Why would I switch?

rahulg 20 days ago

Been waiting for this to implement Ghibli, Muppets, etc. in my WhatsApp bot that converts your photos into AI-generated art. Check it out at https://artstudiobot.com. 80% vibe-coded, 20% engineer friend.
