TechEcho

1 comment

dinp 3 months ago

Source code: https://github.com/don-dp/simulateagents/

Click on 'Play moves' to watch a replay.

I initially planned to run a chess tournament for LLMs, but they are not good at it: besides obvious mistakes, they output incorrect moves, get stuck in loops by repeating the same moves, and the smaller models frequently fail to output valid JSON. I thought reasoning models like o3-mini might be good, but they are only an incremental improvement at chess.

Feedback and suggestions for other games to explore are welcome.
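For anyone who wants to tally these failure modes, here is a minimal sketch of a reply checker. It is not taken from the linked repository. The `{"move": "..."}` reply shape, the `check_reply` name, and the error labels are all hypothetical. It also assumes that "incorrect moves" means moves outside the legal set, and that the caller supplies that set (for example from a chess library). The repetition check is a crude per-player counter, not the chess threefold-repetition rule.

```python
import json
from collections import Counter


def check_reply(raw, legal_moves, history, max_repeats=3):
    """Classify one model reply.

    raw         -- the raw text returned by the model (hypothetical JSON shape)
    legal_moves -- set of legal move strings for the current position
    history     -- this player's earlier moves, oldest first
    Returns (move, error); exactly one of the two is None.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None, "invalid_json"

    # The reply must be an object with a string "move" field.
    move = data.get("move") if isinstance(data, dict) else None
    if not isinstance(move, str):
        return None, "missing_move"

    if move not in legal_moves:
        return None, "illegal_move"

    # Crude loop detection: the same move already played max_repeats times.
    if Counter(history)[move] >= max_repeats:
        return None, "repeated_move"

    return move, None
```

A caller could count the returned error labels per model to compare how often each one fails, and re-prompt or forfeit the game when an error comes back.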

O3 mini vs. Gemini flash 2.0 in chess

1 comment
