
Notes on OpenAI o3-mini

211 points | by dtquad | 3 months ago

8 comments

tkgally, 3 months ago
At the end of his post, Simon mentions translation between human languages. While maybe not directly related to token limits, I just did a test in which both R1 and o3-mini got worse at translation in the latter half of a long text.

I ran the test on Perplexity Pro, which hosts DeepSeek R1 in the U.S. and which has just added o3-mini as well. The text was a speech I translated a month ago from Japanese to English, preceded by a long prompt specifying the speech’s purpose and audience and the sort of style I wanted. (I am a professional Japanese-English translator with nearly four decades of experience. I have been testing and using LLMs for translation since early 2023.)

An initial comparison of the output suggested that, while R1 didn’t seem bad, o3-mini produced a writing style closer to what I asked for in the prompt—smoother and more natural English.

But then I noticed that the output length was 5,855 characters for R1, 9,052 characters for o3-mini, and 11,021 characters for my own polished version. Comparing the three translations side-by-side with the original Japanese, I discovered that R1 had omitted entire paragraphs toward the end of the speech, and that o3-mini had switched to a strange abbreviated style (using slashes instead of “and” between noun phrases, for example) toward the end as well. The vanilla versions of ChatGPT, Claude, and Gemini that I ran the same prompt and text through a month ago had had none of those problems.
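The length comparison the commenter describes can be turned into a quick truncation check: compare each model's output character count against a trusted reference translation and flag large shortfalls. A minimal sketch, using the counts reported above (the 0.9 threshold is an arbitrary assumption for illustration):

```python
# Character counts reported in the comment above.
reference_chars = 11021  # the commenter's own polished translation

outputs = {
    "DeepSeek R1": 5855,
    "o3-mini": 9052,
}

for model, chars in outputs.items():
    ratio = chars / reference_chars
    # A large shortfall suggests omitted paragraphs or an abbreviated style,
    # worth a side-by-side check against the source text.
    flag = "suspicious" if ratio < 0.9 else "ok"
    print(f"{model}: {chars} chars, {ratio:.0%} of reference ({flag})")
```

By this measure R1 delivered only about half the expected length and o3-mini about four fifths, matching the omissions and abbreviated style the commenter found by manual comparison.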
kamikazeturtles, 3 months ago
There's a huge price difference between o3-mini and o1 ($4.40 vs $60 per million output tokens). What trade-offs in performance would justify such a large price gap?

Are there specific use cases where o1's higher cost is still justified?
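The gap works out to roughly 13.6x on output tokens. A minimal sketch of the break-even arithmetic, using the per-million-token prices quoted above (the 10M-token monthly workload is an assumed figure for illustration):

```python
# Output-token pricing quoted in the comment above (USD per million tokens).
O3_MINI_PRICE = 4.40
O1_PRICE = 60.00

def output_cost(price_per_million: float, output_tokens: int) -> float:
    """Cost in USD for a given number of output tokens."""
    return price_per_million * output_tokens / 1_000_000

# For a workload emitting 10M output tokens a month:
tokens = 10_000_000
print(f"o3-mini: ${output_cost(O3_MINI_PRICE, tokens):,.2f}")
print(f"o1:      ${output_cost(O1_PRICE, tokens):,.2f}")
print(f"ratio:   {O1_PRICE / O3_MINI_PRICE:.1f}x")
```

At that volume the difference is $44 vs $600 a month, which is why o1 has to show a clear quality win on the task to be worth it.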
johngalt2600, 3 months ago
So far I've been impressed. It seems to be in the same ballpark as R1 and Claude for coding, though I'll have to gather more samples. In this past week I've changed from using Claude exclusively (since 3.5) to hitting all the big boys: Claude, R1, 4o (o3 now), and Gemini Flash. Then I'll do a new chat that includes all of their generated solutions as additional context for a refactored final solution.

R1 has upped the ante, so I'm hoping we continue to get more updates rapidly... they are getting quite good.
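The workflow described above (collect each model's solution, then open a fresh chat asking for a refactored synthesis) can be sketched as a prompt builder. `call_model` is a hypothetical stand-in for whichever client each provider's API uses, and the model names are just the ones mentioned in the comment:

```python
def build_synthesis_prompt(task: str, solutions: dict[str, str]) -> str:
    """Assemble a prompt that presents each model's solution as context
    and asks for a single refactored final answer."""
    parts = [f"Task: {task}", "", "Candidate solutions from different models:"]
    for model, solution in solutions.items():
        parts.append(f"\n--- {model} ---\n{solution}")
    parts.append(
        "\nUsing the candidates above as context, produce one refactored "
        "final solution that combines their strengths."
    )
    return "\n".join(parts)

# Usage sketch: gather answers from each model first (call_model is
# hypothetical), then send the combined prompt to one model in a new chat.
# solutions = {m: call_model(m, task) for m in ("claude", "r1", "o3", "gemini-flash")}
# final = call_model("claude", build_synthesis_prompt(task, solutions))
```

Keeping the synthesis in a fresh chat, as the commenter does, avoids carrying over any one model's earlier back-and-forth as context.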
xnx, 3 months ago
Hasn't Gemini pricing been lower than this (or even free) for a while? https://ai.google.dev/pricing
lysecret, 3 months ago
I haven't had much luck with o3. One thing that came to mind with these test-time-compute models is that they have a tendency to "overthink" and "overcomplicate" things. This is just a feeling for now, but has anyone done a study on this? E.g., potentially degraded performance on simpler questions for these types of models?
submeta, 3 months ago
> The model accepts up to 200,000 tokens of input, an improvement on GPT-4o’s 128,000.

So ChatGPT finally catches up with Claude, which has had a 200,000-token input limit all along.

Claude, with its Projects feature, is my go-to tool for projects I work on over weeks and months. Now I see a possible alternative.
maxdo, 3 months ago
How would you rate it against Claude? I haven't tested it yet, but o1 pro didn't perform as well.
brianbest101, 3 months ago
OpenAI really needs to work on its naming conventions for these things.