At the end of his post, Simon mentions translation between human languages. While this may not be directly related to token limits, I just ran a test in which both R1 and o3-mini got worse at translation in the latter half of a long text.

I ran the test on Perplexity Pro, which hosts DeepSeek R1 in the U.S. and has just added o3-mini as well. The text was a speech I translated a month ago from Japanese to English, preceded by a long prompt specifying the speech’s purpose and audience and the style I wanted. (I am a professional Japanese-English translator with nearly four decades of experience, and I have been testing and using LLMs for translation since early 2023.)

An initial comparison of the output suggested that, while R1 didn’t seem bad, o3-mini produced a writing style closer to what I asked for in the prompt: smoother and more natural English.

But then I noticed the output lengths: 5,855 characters for R1, 9,052 for o3-mini, and 11,021 for my own polished version. Comparing the three translations side by side with the original Japanese, I discovered that R1 had omitted entire paragraphs toward the end of the speech, and that o3-mini had likewise switched to a strange abbreviated style toward the end (using slashes instead of “and” between noun phrases, for example). The vanilla versions of ChatGPT, Claude, and Gemini that I ran the same prompt and text through a month ago had none of these problems.