Related ongoing thread:<p><i>Show HN: LLM plays Pokémon (open sourced)</i> - <a href="https://news.ycombinator.com/item?id=43187231">https://news.ycombinator.com/item?id=43187231</a>
This is truly tremendous to watch. Eleven years on from TPP, and we're watching the current best-in-class AI try its hand at the same challenge. Who'll get there first, the historical gestalt of Twitch users or the just-shy-of-10^26 FLOPS [0] AI model?<p>Now here's a concept for anyone with more money than sense: ClaudePlaysTwitchPlaysPokemon, where it's TPP but every participant is Claude. Would hivemind AI consensus perform better than a single AI? Anthropic's certainly looking into it! [1]<p>[0]: <a href="https://www.oneusefulthing.org/p/a-new-generation-of-ais-claude-37" rel="nofollow">https://www.oneusefulthing.org/p/a-new-generation-of-ais-cla...</a><p>[1]: <a href="https://www.anthropic.com/news/visible-extended-thinking" rel="nofollow">https://www.anthropic.com/news/visible-extended-thinking</a>
This is neat, but watching a reasoning model stop to consider "I have read half of a dialogue block, time to press A to get the rest of the text" gets old really quickly. I think I'd rather watch a model play Pokémon against human opponents on a simulator like Pokémon Showdown (which I understand is a bit deeper into an IP-rights grey area than emulating a 30-year-old game). In that case you'd get to see how it handles unknown information and updates its reasoning based on the success or failure of its predictions.
It's run by Anthropic! <a href="https://x.com/AnthropicAI/status/1894419011569344978" rel="nofollow">https://x.com/AnthropicAI/status/1894419011569344978</a>
For anyone interested in watching lots of reinforcement learning agents playing Pokémon Red at once: we have a website that streams hundreds of concurrent games from multiple people's training runs to a shared map in real time!<p><a href="https://pwhiddy.github.io/pokerl-map-viz/" rel="nofollow">https://pwhiddy.github.io/pokerl-map-viz/</a><p>(works best on desktop)
Watching the moment-to-moment play is pretty boring, but it might be interesting if someone puts together highlights of notable events and moments. The screenshot where Claude asks for the game to restart is absolutely charming.
I can't look at the current state of this without wondering if it's tokenizer dyslexia. I wonder how much AI performance growth has been borrowed from overfitting: pruning the tokenizer of invalid sequences and leaking the entire corpus into training, a cardinal sin against making valid predictions.
This would be a really cool category of speed-running. "How fast can a model beat a game that it's never played before?"<p>First get the model to beat a game, then work on better decision-making, then try to speed up the decision-making. Then repeat when better models come out.