TechEcho

Gemini Ultra is out. Does it beat GPT4? (~10k words of tests/observations)

1 point by COAGULOPATH over 1 year ago

1 comment

COAGULOPATH over 1 year ago
(They patched Gemini an hour after I finished writing this. My complaints of excessive refusals and model deception may no longer apply.)

Witness:

- A chess match between Ultra and GPT4 (the first one ever, as far as I'm aware)
- A Gemini vs GPT4 rap battle
- Tests of general knowledge, recall, abstract reasoning, and code generation
- Head-to-head contests of poetry and prose, plus style imitations of famous authors/bloggers

I also investigate VERY IMPORTANT things such as:

- which model can create a more realistic ASCII cat?
- which model is better at stacking eggs?
- which model plays Wordle better?
- which model SIMULATES Wordle better (with me playing)?

Obviously, a lot of my tests are a bit silly. We already know Ultra's benchmarks; I'm trying to probe the gaps BETWEEN benchmarks and figure out what the models are like "on the ground".

Conventional wisdom holds that Ultra is another GPT4, but this was not my experience. Switching from GPT4 to Ultra feels like switching character classes in an RPG: they are quite different, with distinct strengths and weaknesses.