
OpenAI DevDay 2024 live blog

212 points by plurby · 7 months ago

20 comments

qwertox · 7 months ago
> The Realtime API improves this by streaming audio inputs and outputs directly, enabling more natural conversational experiences. It can also handle interruptions automatically, much like Advanced Voice Mode in ChatGPT.

> Under the hood, the Realtime API lets you create a persistent WebSocket connection to exchange messages with GPT-4o. The API supports function calling, which makes it possible for voice assistants to respond to user requests by triggering actions or pulling in new context.

This sounds really interesting, and I see great use cases for it. However, I'm wondering if the API provides a text transcription of both the input and output so that I can store the data directly in a database without needing to transcribe the audio separately.

Edit: Apparently it does. It sends `conversation.item.input_audio_transcription.completed` [0] events when the input transcription is done (I guess a couple of them in real time), and `response.done` [1] with the response text.

[0] https://platform.openai.com/docs/api-reference/realtime-server-events/conversation-item-input-audio-transcription-completed

[1] https://platform.openai.com/docs/api-reference/realtime-server-events/response-done
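A minimal sketch of what consuming those transcription events could look like over the Realtime API WebSocket, assuming the Python `websockets` package. The event names come from the docs linked above; the endpoint, model name, header names, and the `transcript` field are my best reading of the reference and should be double-checked against the current docs:

    # Sketch: persist user/assistant transcripts from Realtime API events.
    # Endpoint, headers, and payload fields are assumptions based on the
    # API reference; verify before relying on them.
    import asyncio
    import json
    import os

    import websockets  # pip install websockets

    URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
    HEADERS = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }

    def save_to_db(role: str, text: str) -> None:
        print(role, text)  # stand-in for a real database INSERT

    async def collect_transcripts() -> None:
        # extra_headers is the pre-v14 websockets kwarg; newer releases
        # call it additional_headers.
        async with websockets.connect(URL, extra_headers=HEADERS) as ws:
            async for raw in ws:
                event = json.loads(raw)
                kind = event.get("type")
                if kind == "conversation.item.input_audio_transcription.completed":
                    save_to_db("user", event.get("transcript", ""))
                elif kind == "response.done":
                    # The final response object carries the output text items.
                    save_to_db("assistant", json.dumps(event.get("response")))

    asyncio.run(collect_transcripts())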
siva7 · 7 months ago
I've never seen a company consistently publish groundbreaking features at such a speed. I really wonder how their teams work. It's unprecedented in what I've seen in 15 years of software.
ponty_rick · 7 months ago
> 11:43 Fields are generated in the same order that you defined them in the schema, even though JSON is supposed to ignore key order. This ensures you can implement things like chain-of-thought by adding those keys in the correct order in your schema design.

Why not use an array of key/value pairs if you want to maintain ordering without breaking traditional JSON rules?

[ {"key1": "value1"}, {"key2": "value2"} ]
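For what it's worth, most parsers do preserve object key order in practice, which is presumably what the schema trick relies on; the array-of-pairs form makes ordering a guarantee of the JSON spec itself (arrays are ordered) at the cost of clumsier access. A small Python illustration of the trade-off:

    import json

    # Object form: key order is formally a serialization detail, but
    # json.loads preserves it (Python dicts keep insertion order).
    obj = json.loads('{"reasoning": "think first", "answer": "42"}')
    assert list(obj) == ["reasoning", "answer"]
    print(obj["answer"])  # direct key access

    # Array-of-pairs form: ordering is guaranteed by the spec, since
    # arrays are ordered, but access requires flattening the pairs.
    pairs = json.loads('[{"reasoning": "think first"}, {"answer": "42"}]')
    flat = {k: v for item in pairs for k, v in item.items()}
    print(flat["answer"])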
serjester · 7 months ago
The eval platform is a game changer.

It's nice to have a solution from OpenAI given how much they use a variant of this internally. I've tried something like 5 YC startups and I don't think anyone's really solved this.

There's the very real risk of vendor lock-in, but from quickly scanning the docs it seems like a pretty portable implementation.
alach11 · 7 months ago
It's pretty amazing that they made prompt caching automatic. It's rare that a company gives a 50% discount without the customer explicitly requesting it! Of course... they might be retaining some margin, judging by their discount being 50% vs. Anthropic's 90%.
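A back-of-the-envelope sketch of what those discount rates mean for a request that is mostly cached prefix; the token counts and the $5/1M price below are made up purely for illustration:

    # Hypothetical numbers: 90k cached + 10k fresh input tokens at an
    # illustrative $5 per 1M input tokens. Real pricing schemes differ
    # (e.g. Anthropic also bills cache writes), so this is shape-of-the-math only.
    PRICE_PER_M = 5.00
    CACHED, FRESH = 90_000, 10_000

    def cost(cache_discount: float) -> float:
        cached_cost = CACHED * PRICE_PER_M * (1 - cache_discount) / 1_000_000
        fresh_cost = FRESH * PRICE_PER_M / 1_000_000
        return cached_cost + fresh_cost

    print(f"no caching:   ${cost(0.0):.4f}")  # $0.5000
    print(f"50% discount: ${cost(0.5):.4f}")  # $0.2750
    print(f"90% discount: ${cost(0.9):.4f}")  # $0.0950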
thenameless7741 · 7 months ago
Blog updates:

- Introducing the Realtime API: https://openai.com/index/introducing-the-realtime-api/
- Introducing vision to the fine-tuning API: https://openai.com/index/introducing-vision-to-the-fine-tuning-api/
- Prompt Caching in the API: https://openai.com/index/api-prompt-caching/
- Model Distillation in the API: https://openai.com/index/api-model-distillation/

Docs updates:

- Realtime API: https://platform.openai.com/docs/guides/realtime
- Vision fine-tuning: https://platform.openai.com/docs/guides/fine-tuning/vision
- Prompt Caching: https://platform.openai.com/docs/guides/prompt-caching
- Model Distillation: https://platform.openai.com/docs/guides/distillation
- Evaluating model performance: https://platform.openai.com/docs/guides/evals

Additional updates from @OpenAIDevs (https://x.com/OpenAIDevs/status/1841175537060102396):

- New prompt generator on https://playground.openai.com
- Access to the o1 model is expanded to developers on usage tier 3, and rate limits are increased (to the same limits as GPT-4o)

Additional updates from @OpenAI (https://x.com/OpenAI/status/1841179938642411582):

- Advanced Voice is rolling out globally to ChatGPT Enterprise, Edu, and Team users. Free users will get a sneak peek of it (except in the EU).
101008 · 7 months ago
I understand the Realtime API voice novelty, and the technological achievement it is, but I don't see it from the product point of view. It looks like one of those startups finding a solution before knowing the problem.

The two examples shown at DevDay are the things I don't really want to do in the future. I don't want to talk to anybody, and I don't want to wait for their answer in a human form. That's why I order my food through an app or WhatsApp, or why I prefer to buy my tickets online. In the rare case I call to order food, it's because I have a weird question or a weird request (can I pick it up in X minutes? Can you prepare it in a different way?).

I hope we don't start seeing apps using conversations as interfaces, because it would be really horrible (leaving aside the fact that a lot of people don't know how to express themselves, different accents, noisy environments, etc.), while clicking or typing works almost the same for everyone (at least much more normalized than talking).
superdisk · 7 months ago
Holy crud, I figured they would guard this for a long time, and I was really salivating to make some stuff with it. The doors are wide open for all sorts of stuff now; Advanced Voice is the first feature since ChatGPT initially came out that really has my jaw on the floor.
minimaxir · 7 months ago
From the Realtime API blog post: https://openai.com/index/introducing-the-realtime-api/

> Audio in the Chat Completions API will be released in the coming weeks, as a new model `gpt-4o-audio-preview`. With `gpt-4o-audio-preview`, developers can input text or audio into GPT-4o and receive responses in text, audio, or both.

> The Realtime API uses both text tokens and audio tokens. Text input tokens are priced at $5 per 1M and $20 per 1M output tokens. Audio input is priced at $100 per 1M tokens and output is $200 per 1M tokens. This equates to approximately $0.06 per minute of audio input and $0.24 per minute of audio output. Audio in the Chat Completions API will be the same price.

As usual, OpenAI failed to emphasize the real game-changer feature at their Dev Day: audio output from the standard generation API.

This has severe implications for text-to-speech apps, particularly if the audio output style is as steerable as the gpt-4o voice demos.
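Working backwards from the quoted prices, a quick sanity check on what the per-minute figures imply about audio token throughput (nothing here beyond arithmetic on the numbers in the post):

    # Implied audio tokens per minute, derived from the quoted prices.
    AUDIO_IN_PER_M, AUDIO_OUT_PER_M = 100.0, 200.0  # $ per 1M audio tokens
    IN_PER_MIN, OUT_PER_MIN = 0.06, 0.24            # $ per minute, per the post

    tokens_in = IN_PER_MIN / AUDIO_IN_PER_M * 1_000_000     # ~600 tokens/min
    tokens_out = OUT_PER_MIN / AUDIO_OUT_PER_M * 1_000_000  # ~1200 tokens/min

    print(f"input:  ~{tokens_in:.0f} audio tokens per minute")
    print(f"output: ~{tokens_out:.0f} audio tokens per minute")

Interestingly, the implied rate is about 600 tokens per minute of input audio but about 1200 per minute of output, so either output audio is tokenized more densely or the "approximately" is doing some work.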
N_A_T_E · 7 months ago
I just need their API to be faster. 15-30 seconds per request using 4o-mini isn't good enough for responsive applications.
simonw · 7 months ago
For anyone who's interested, I've written up details of how the underlying live blog system works here: https://til.simonwillison.net/django/live-blog
modeless · 7 months ago
I didn't expect an API for advanced voice so soon. That's pretty great. Here's the thing I was really wondering about: audio is $0.06/min in, $0.24/min out. Can't wait to try some language-learning apps built with this. It'll also be fun for controlling robots.
sammyteee · 7 months ago
Loving these live updates, keep em coming! Thanks Simon!
nielsole · 7 months ago
> The first big announcement: a realtime API, providing the ability to use WebSockets to implement voice input and output against their models.

I guess this is using their "old" turn-based voice system?
cedws · 7 months ago
WebSockets for realtime? WS is TCP-based; wouldn't it be better to use something UDP-based if you want to optimise for latency?
og_kalu · 7 months ago
Image output for 4o in the API would be very nice, but I'm not sure if that's at all in the cards.

Audio output is in the API now, but you lose image input. Why? That's a shame.
jbaudanza · 7 months ago
Interesting choice of a 24kHz sample rate for PCM audio. I wonder if the model was trained on 24kHz audio, rather than the usual 8/16kHz for ML models.
hidelooktropic · 7 months ago
Any word on increased weekly caps on o1 usage?
lysecret · 7 months ago
Using structured outputs for generative UI is such a cool idea. Does anyone know some cool web demos related to this?
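For anyone wanting to experiment, a minimal sketch of the idea: constrain the model to a recursive UI-component schema and render straight from the JSON. The component names and schema here are invented for illustration; the json_schema response_format shape follows OpenAI's structured-outputs docs but should be checked against the current reference:

    # Sketch: structured outputs as a generative-UI backbone.
    # The ui_tree schema is invented for illustration only.
    import json
    from openai import OpenAI  # pip install openai

    UI_SCHEMA = {
        "name": "ui_tree",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "component": {"type": "string", "enum": ["card", "button", "text"]},
                "label": {"type": "string"},
                "children": {"type": "array", "items": {"$ref": "#"}},  # recursive
            },
            "required": ["component", "label", "children"],
            "additionalProperties": False,
        },
    }

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "A signup card with one button"}],
        response_format={"type": "json_schema", "json_schema": UI_SCHEMA},
    )
    tree = json.loads(resp.choices[0].message.content)
    # Field order matches the schema, which a renderer can walk recursively.
    print(tree["component"], "->", [c["label"] for c in tree["children"]])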
bigcat12345678 · 7 months ago
Seems mostly standard items so far.