
Notes on OpenAI o3-mini

211 points | by dtquad | 3 months ago

8 comments

tkgally, 3 months ago
At the end of his post, Simon mentions translation between human languages. While maybe not directly related to token limits, I just did a test in which both R1 and o3-mini got worse at translation in the latter half of a long text.

I ran the test on Perplexity Pro, which hosts DeepSeek R1 in the U.S. and which has just added o3-mini as well. The text was a speech I translated a month ago from Japanese to English, preceded by a long prompt specifying the speech’s purpose and audience and the sort of style I wanted. (I am a professional Japanese-English translator with nearly four decades of experience. I have been testing and using LLMs for translation since early 2023.)

An initial comparison of the output suggested that, while R1 didn’t seem bad, o3-mini produced a writing style closer to what I asked for in the prompt—smoother and more natural English.

But then I noticed that the output length was 5,855 characters for R1, 9,052 characters for o3-mini, and 11,021 characters for my own polished version. Comparing the three translations side-by-side with the original Japanese, I discovered that R1 had omitted entire paragraphs toward the end of the speech, and that o3-mini had switched to a strange abbreviated style (using slashes instead of “and” between noun phrases, for example) toward the end as well. The vanilla versions of ChatGPT, Claude, and Gemini that I ran the same prompt and text through a month ago had had none of those problems.
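The length comparison the commenter describes can be turned into a quick truncation check: compare each model's output character count against a trusted reference translation and flag large shortfalls. A minimal sketch, using the counts reported above (the 0.9 threshold is an arbitrary assumption for illustration):

```python
# Character counts reported in the comment above.
reference_chars = 11021  # the commenter's own polished translation

outputs = {
    "DeepSeek R1": 5855,
    "o3-mini": 9052,
}

for model, chars in outputs.items():
    ratio = chars / reference_chars
    # A large shortfall suggests omitted paragraphs or an abbreviated style,
    # worth a side-by-side check against the source text.
    flag = "suspicious" if ratio < 0.9 else "ok"
    print(f"{model}: {chars} chars, {ratio:.0%} of reference ({flag})")
```

By this measure R1 delivered only about half the expected length and o3-mini about four fifths, matching the omissions and abbreviated style the commenter found by manual comparison.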
kamikazeturtles, 3 months ago
There's a huge price difference between o3-mini and o1 ($4.40 vs $60 per million output tokens). What trade-offs in performance would justify such a large price gap?

Are there specific use cases where o1's higher cost is still justified?
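The gap works out to roughly 13.6x on output tokens. A minimal sketch of the break-even arithmetic, using the per-million-token prices quoted above (the 10M-token monthly workload is an assumed figure for illustration):

```python
# Output-token pricing quoted in the comment above (USD per million tokens).
O3_MINI_PRICE = 4.40
O1_PRICE = 60.00

def output_cost(price_per_million: float, output_tokens: int) -> float:
    """Cost in USD for a given number of output tokens."""
    return price_per_million * output_tokens / 1_000_000

# For a workload emitting 10M output tokens a month:
tokens = 10_000_000
print(f"o3-mini: ${output_cost(O3_MINI_PRICE, tokens):,.2f}")
print(f"o1:      ${output_cost(O1_PRICE, tokens):,.2f}")
print(f"ratio:   {O1_PRICE / O3_MINI_PRICE:.1f}x")
```

At that volume the difference is $44 vs $600 a month, which is why o1 has to show a clear quality win on the task to be worth it.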
johngalt2600, 3 months ago
So far I've been impressed. It seems to be in the same ballpark as R1 and Claude for coding, though I'll have to gather more samples. In this past week I've changed from using Claude exclusively (since 3.5) to hitting all the big boys: Claude, R1, 4o (o3 now), and Gemini Flash. Then I'll do a new chat that includes all of their generated solutions as additional context for a refactored final solution.

R1 has upped the ante, so I'm hoping we continue to get more updates rapidly... they are getting quite good.
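The workflow described above (collect each model's solution, then open a fresh chat asking for a refactored synthesis) can be sketched as a prompt builder. `call_model` is a hypothetical stand-in for whichever client each provider's API uses, and the model names are just the ones mentioned in the comment:

```python
def build_synthesis_prompt(task: str, solutions: dict[str, str]) -> str:
    """Assemble a prompt that presents each model's solution as context
    and asks for a single refactored final answer."""
    parts = [f"Task: {task}", "", "Candidate solutions from different models:"]
    for model, solution in solutions.items():
        parts.append(f"\n--- {model} ---\n{solution}")
    parts.append(
        "\nUsing the candidates above as context, produce one refactored "
        "final solution that combines their strengths."
    )
    return "\n".join(parts)

# Usage sketch: gather answers from each model first (call_model is
# hypothetical), then send the combined prompt to one model in a new chat.
# solutions = {m: call_model(m, task) for m in ("claude", "r1", "o3", "gemini-flash")}
# final = call_model("claude", build_synthesis_prompt(task, solutions))
```

Keeping the synthesis in a fresh chat, as the commenter does, avoids carrying over any one model's earlier back-and-forth as context.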
xnx, 3 months ago
Hasn't Gemini pricing been lower than this (or even free) for a while? https://ai.google.dev/pricing
lysecret, 3 months ago
I haven't had much luck with o3. One thing that came to mind with these test-time-compute models is that they have a tendency to "overthink" and "overcomplicate" things. This is just a feeling for now, but has anyone done a study on this? E.g., potentially degraded performance on simpler questions for these types of models?
submeta, 3 months ago
> The model accepts up to 200,000 tokens of input, an improvement on GPT-4o’s 128,000.

So ChatGPT finally catches up with Claude, which has had a 200,000-token input limit all along.

Claude, with its Projects feature, is my go-to tool for projects I work on over weeks and months. Now I see a possible alternative.
maxdo, 3 months ago
How would you rate it against Claude? I haven't tested it yet, but o1 pro didn't perform as well.
brianbest101, 3 months ago
OpenAI really needs to work on its naming conventions for these things.