Claude Plays Pokémon

75 points by LightMachine 3 months ago

9 comments

dang 3 months ago
Related ongoing thread:

Show HN: LLM plays Pokémon (open sourced) - https://news.ycombinator.com/item?id=43187231
Philpax 3 months ago
This is truly tremendous to watch. Eleven years from TPP, and we're watching the current best-in-class AI try its best at the same. Who'll get there first, the historical gestalt of Twitch users or the just-shy-of-10^26 FLOPS [0] AI model?

Now here's a concept for anyone with more money than sense: ClaudePlaysTwitchPlaysPokemon, where it's TPP but every participant is Claude. Would hivemind AI consensus perform better than a single AI? Anthropic's certainly looking into it! [1]

[0]: https://www.oneusefulthing.org/p/a-new-generation-of-ais-claude-37

[1]: https://www.anthropic.com/news/visible-extended-thinking
_--__--__ 3 months ago
This is neat but watching a reasoning model that stops to consider "I have read half of a dialogue block, time to press A to get the rest of the text" gets old really quick. I think I'd rather watch a model try to play pokemon against human opponents on a simulator like pokemon showdown (which I understand is a bit further in an IP rights grey area than emulating a 30 year old game). In that case you would get to see how it handles unknown information and updates its reasoning based on the success/failure of its predictions.
Philpax 3 months ago
It's run by Anthropic! https://x.com/AnthropicAI/status/1894419011569344978
tehsauce 3 months ago
For anyone interested in watching lots of reinforcement learning agents playing Pokémon Red at once, we have a website which streams hundreds of concurrent games from multiple people's training runs to a shared map in real time!

https://pwhiddy.github.io/pokerl-map-viz/

(works best on desktop)
sunaookami 3 months ago
I like that it named the rival "Waclaude" :)
TheAceOfHearts 3 months ago
Watching the moment to moment is pretty boring, but it might be interesting if someone puts together highlights of interesting events and moments. The screenshot where Claude asks for the game to restart is absolutely charming.
meltyness 3 months ago
I can't look at the current state of this without wondering if it's tokenizer-dyslexia. I wonder if AI performance growth has been borrowed from overfitting, pruning the tokenizer of invalid sequences, and leakage of the entire corpus, a cardinal sin of making valid predictions.
j_timberlake 3 months ago
This would be a really cool category of speed-running. "How fast can a model beat a game that it's never played before?"

First get the model to beat a game, then work on better decision-making, then try to speed up the decision-making. Then repeat when better models come out.