Gemini Ultra is out. Does it beat GPT4? (~10k words of tests/observations)

1 point by COAGULOPATH over 1 year ago

1 comment

COAGULOPATH over 1 year ago

(They patched Gemini an hour after I finished writing this. My complaints of excessive refusals and model deception may no longer apply.)

Witness:

- A chess match between Ultra and GPT4 (the first one ever, as far as I'm aware)
- A Gemini vs GPT4 rap battle
- Tests of general knowledge, recall, abstract reasoning, and code generation
- Head-to-head contests of poetry and prose, plus style imitations of famous authors/bloggers

I also investigate VERY IMPORTANT things such as:

- which model can create a more realistic ASCII cat?
- which model is better at stacking eggs?
- which model plays Wordle better?
- which model SIMULATES Wordle better (with me playing)?

Obviously, a lot of my tests are a bit silly. We already know Ultra's benchmarks; I'm trying to probe the gaps BETWEEN benchmarks and figure out what the models are like "on the ground".

Conventional wisdom holds that Ultra is another GPT4: this was not my experience. Switching from GPT4 to Ultra feels like switching character classes in an RPG; they are quite different, with distinct strengths and weaknesses.