ChatGPT's Chess Elo is 1400

212 points by dbrereton about 2 years ago

41 comments

k2052 about 2 years ago

This is so easy to disprove it makes it look like the author didn't even try. Here is the convo I just had:

me: You are a chess grandmaster playing as black and your goal is to win in as few moves as possible. I will give you the move sequence, and you will return your next move. No explanation needed
ChatGPT: Sure, I'd be happy to help! Please provide the move sequence and I'll give you my response.
me: 1. e3
ChatGPT: 1... e5
me: 2. ne2
ChatGPT: 2... d6
me: 3. g4
ChatGPT: 3... h5
me: 4. g5
ChatGPT: 4... Ne4

Ne4 is illegal. Also, you'd be hard pressed to find any human player who would miss the hanging pawns, much less a 1400 Elo player. GPT has been trained on chess games and is predicting patterns; this has been known since the GPT-2 days. The daily hype articles where someone coerces a statistical model into "demonstrating" an emergent capability sure are getting old. Claiming an emergent capability (like having learned chess) is an extraordinary claim (LLMs most likely can't learn things), and I really wish people would put in more effort when they make these claims. But the AI hype problem is as old as AI research itself...
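
For readers who want to check the "Ne4 is illegal" claim, here is a minimal sketch in Python. It assumes, as the move list above implies, that Black has only moved pawns so far, so the black knights still sit on b8 and g8; the helper name `knight_targets` is made up for this illustration.

```python
def knight_targets(square):
    """Return the squares a knight on `square` (e.g. "g8") could jump to on an empty board."""
    file_idx = ord(square[0]) - ord("a")
    rank_idx = int(square[1]) - 1
    jumps = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
    targets = set()
    for df, dr in jumps:
        f, r = file_idx + df, rank_idx + dr
        if 0 <= f < 8 and 0 <= r < 8:  # stay on the board
            targets.add("abcdefgh"[f] + str(r + 1))
    return targets


# Assumption: after 1. e3 e5 2. Ne2 d6 3. g4 h5 4. g5 the black knights are unmoved.
black_knights = ["b8", "g8"]
can_reach_e4 = any("e4" in knight_targets(sq) for sq in black_knights)
print(can_reach_e4)
```

Neither knight is a single jump away from e4, so the move fails on geometry alone, before any question of pins or checks comes up.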

latexr about 2 years ago

> These people used bad prompts and came to the conclusion that ChatGPT can’t play a legal chess game. (…)

> With this prompt ChatGPT almost always plays fully legal games.

> Occasionally it does make an illegal move, but I decided to interpret that as ChatGPT flipping the table (…)

> (…) with GPT4 (…) in the two games I attempted, it made numerous illegal moves.

So you’ve ostensibly¹ found a way to reduce the error rate and then deliberately ignored the cases where it failed. In short: it may play valid chess under certain conditions but can’t be trusted to do so. That doesn’t contradict previous findings.

¹ 19 games is a small sample and the supposedly more advanced system failed in your tries.

FartyMcFarter about 2 years ago

I just opened a random recent chess game on lichess ( https://lichess.org/YpxTUUbO/white#88 ). I'm pretty sure ChatGPT can't be trained on games that were just played, so this ensures the game is not in its training data.

I gave the position before checkmate to ChatGPT to see if it would produce the checkmating move. It played an illegal move, replying with "Be5#" even though there's no bishop of either color in the position.

Unfortunately I'm rate limited at the moment so I can't try other games, but this looks like a solid method to evaluate how often ChatGPT plays legal / good moves.
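
One cheap part of this evaluation can be automated without a chess library: checking whether the piece named in the reply exists at all. The sketch below is only a necessary condition for legality, not a full legality check. The FEN string is a made-up position with no bishops, not the position from the linked game, and `names_missing_piece` is a hypothetical helper.

```python
def names_missing_piece(fen, san):
    """True if the piece letter in a SAN move is absent from the side to move's pieces."""
    placement, side = fen.split()[:2]
    if san.startswith("O-O"):
        letter = "K"  # castling is a king move
    elif san[0] in "KQRBN":
        letter = san[0]
    else:
        letter = "P"  # SAN pawn moves carry no piece letter
    wanted = letter if side == "w" else letter.lower()
    return wanted not in placement


# Hypothetical position: white king g6, white rook e1, black king g8, no bishops anywhere.
fen = "6k1/8/6K1/8/8/8/8/4R3 w - - 0 1"
print(names_missing_piece(fen, "Be5#"))  # the reply names a bishop
print(names_missing_piece(fen, "Re8#"))  # the reply names a rook
```

Running a check like this over many fresh positions would give a lower bound on the illegal-move rate; a full count would need a real move generator.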

WoodenChair about 2 years ago

Most likely it has seen a similar sequence of moves in its training set. There are numerous chess sites with databases displayed in the form of web pages with millions of games in them. If it had any understanding of chess, it would never play an illegal move. It's not surprising that given a sequence of algebraic notation it can regurgitate the next move in a similar sequence of algebraic notation.

liampulles about 2 years ago

With respect to the points here on how illegal moves invalidate ChatGPT as a valid Chess AI, consider the following:

AI.v1: Feed ChatGPT the move history and ask for the next move.

AI.v2: Wrap AI.v1 with additional logic that tries to translate any invalid response into a similar/close valid move, and otherwise treats it as a resignation.

I would say AI.v2 is a "valid" Chess AI since it obeys the rules of chess. Also, I think it is reasonable, given how minimal this "wrapping" is, to attribute the underlying ability to ChatGPT.
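
A minimal sketch of what the AI.v2 wrapper could look like, using string similarity as one possible reading of "similar/close". The list of legal moves is assumed to come from a separate move generator that is not shown, and `wrap_move` plus the 0.6 cutoff are choices made for this illustration, not anything the comment specifies.

```python
import difflib


def wrap_move(raw_reply, legal_moves):
    """Map a model reply to a legal SAN move, or "resign" if nothing is close."""
    tokens = raw_reply.strip().split()
    candidate = tokens[-1] if tokens else ""  # "4... Ne4" -> "Ne4"
    if candidate in legal_moves:
        return candidate
    # Fall back to the most similar legal move by string similarity.
    close = difflib.get_close_matches(candidate, legal_moves, n=1, cutoff=0.6)
    return close[0] if close else "resign"


# Illustrative subset of legal moves (assumed to be supplied by a move generator).
legal = ["Ne7", "Nf6", "Nc6", "Be6", "Qe7"]
print(wrap_move("4... Ne4", legal))
print(wrap_move("Qh9", ["e5", "e6"]))
```

String similarity knows nothing about chess, so the substituted move may be much weaker than what the model "meant". That is one reason to be careful about crediting the wrapped system's rating to the model itself.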

marcodiego about 2 years ago

This may look low: the Elo for mediocre players is 1500. But if it is obeying the rules of the game, then this is big. It is a signal that if it learns some expertise, like discovering how to use or create better search algorithms (such as MCTS with heuristics to evaluate a state) and improves by itself (somewhat like AlphaZero did), then it may eventually reach a superhuman level.

It may then reach a superhuman level in any task simpler than chess, which is enough to destroy many human jobs.

EDIT: From the article: "With this prompt ChatGPT almost always plays fully legal games." Relax: we're still far from that.
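
To put the 1400 versus 1500 gap in perspective, here is the standard Elo expected-score formula as a short sketch. It assumes both ratings come from the same pool, which other comments in this thread dispute for online ratings.

```python
def expected_score(rating_a, rating_b):
    """Expected score (win = 1, draw = 0.5, loss = 0) for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# A 1400-rated player facing a 1500-rated player.
print(round(expected_score(1400, 1500), 2))
```

A 100-point deficit works out to an expected score of roughly 0.36 per game, so the two levels are closer than the labels might suggest.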

jonnycat about 2 years ago

A lot of the discussion here is about inferring the model's chess capabilities from the lack (or occasional presence) of illegal moves. But we can test it more directly by making an illegal move ourselves - what does the model say if we take its queen on the second move of the game?

Me: You are a chess grandmaster playing as black and your goal is to win in as few moves as possible. I will give you the move sequence, and you will return your next move. No explanation needed. '1. e4'
1... e5
Me: 1. e4 e5 2. Ngxd8+
2... Ke7

This is highly repeatable - I can make illegal nonsensical moves and not once does it tell me the move is illegal. It simply provides a (plausible looking?) continuation.

paxys about 2 years ago

ChatGPT is a brilliant piece of tech, but random people trying it out for five minutes and making wild claims does it a disservice. More so when such clickbait headlines are shared all over the internet as fact.

It is a language model. It cannot play chess. It can't even play tic-tac-toe, which was trivially solved by machines decades ago. Here's the result of a game I just tried:

X (me) – 0,0
O (chatGPT) – 1,1
X – 0,1
ChatGPT instead put my move at 0,2 for some reason
O – 2,2

Regardless of whether my move was at 0,1 or 0,2, I easily won. And after I won it just ignored the outcome and continued playing. Plus this was after telling it to play a perfect game.

the_af about 2 years ago

This is GPT4, right? Because ChatGPT (GPT-3) still fails to provide a legal game of Tic Tac Toe with this prompt:

> "Let's play Tic Tac Toe. You are O, I'm X. Display the board in a frame, with references for the axes"

It failed to recognize that I won.

Then it continued playing (past the end), played illegally over a move I had already made, obtained a line of 3 for itself, and still didn't acknowledge the game had ended.
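
The failures described here (not noticing a win, playing past the end, playing on an occupied square) are the ones a tiny referee catches. A minimal sketch, assuming the board is stored as a 9-character string read row by row; the names `winner` and `apply_move` are made up for this illustration.

```python
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
             (0, 4, 8), (2, 4, 6)]             # diagonals


def winner(board):
    """Return "X" or "O" if that mark has three in a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def apply_move(board, cell, mark):
    """Return the new board, rejecting moves after the game ended or on a taken cell."""
    if winner(board) is not None:
        raise ValueError("game is already over")
    if board[cell] != " ":
        raise ValueError("cell already occupied")
    return board[:cell] + mark + board[cell + 1:]


board = " " * 9
for cell, mark in [(0, "X"), (4, "O"), (1, "X"), (8, "O"), (2, "X")]:
    board = apply_move(board, cell, mark)
print(winner(board))
```

Feeding each model reply through a referee like this would make it easy to count how often a game derails, instead of relying on single anecdotes.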

agolio about 2 years ago

Chessnetwork did a video about this a few weeks ago FYI - https://www.youtube.com/watch?v=svlIYFpsWs0

Seems its strength is mostly in memorisation of the book openings. Still impressive nonetheless.

jonathanstrange about 2 years ago

Not to nitpick but I don't think it can be said to be able to play chess when it also makes illegal moves.

Reptur about 2 years ago

It can't even play tic tac toe without making mistakes. My prompt told it to play a perfect game, predicting all next possible moves, the goal is stalemate or win. It would still make sub-optimal moves and fail at this. Even when I explained what went wrong, it would then apologize and do another wrong move.
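
For reference, "a perfect game" is cheap to define for tic-tac-toe: exhaustive minimax over the whole game tree. The sketch below is not what the commenter ran; it assumes the board is a 9-character string, and the function names are invented for the example.

```python
from functools import lru_cache

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" if that mark has three in a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def value(board, to_move):
    """Game value with perfect play: +1 if X wins, -1 if O wins, 0 for a draw."""
    won = winner(board)
    if won is not None:
        return 1 if won == "X" else -1
    if " " not in board:
        return 0
    nxt = "O" if to_move == "X" else "X"
    results = [value(board[:i] + to_move + board[i + 1:], nxt)
               for i, c in enumerate(board) if c == " "]
    # X picks the maximum outcome, O the minimum.
    return max(results) if to_move == "X" else min(results)


print(value(" " * 9, "X"))
```

The value of the empty board is 0, meaning a draw, so any loss by either side reflects at least one suboptimal move.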

Tepix about 2 years ago

I tried to play Othello with ChatGPT in the past (after ChatGPT confidently claimed it could play Othello) but it didn't work (ChatGPT refused to play).

I tried it with this prompt and it works a little bit but produces repeated illegal moves as early as move 4.

Bummer. Perhaps GPT-4 will be more capable.

Animats about 2 years ago

A real question is whether it has developed an internal board model of chess. It has a training set of chess games as strings of chess notation. That's enough to play "book" games, and that's apparently working. Can it do as well as it is doing without a board model? Or has it somehow developed an internal model of the game state?

This is a system with no look ahead. Any look ahead took place in the past, during training. It's amazing that this works.

I notice the prompt says "No explanation needed". What happens if you ask for one?

Andrew_nenakhov about 2 years ago

I tried playing with it and it barely makes any correct moves: it forgets whose turn it is to move, doesn't observe the rules, and displays move results incorrectly.

So I guess that if we ask it for a move repeatedly and accept the move only when it happens to be legal, it'll show some results, but I don't think that can be interpreted as a chess engine with a meaningful rating.

juice_bus about 2 years ago

When I tried it at v3.0 I found that after 5-10 moves it started moving illegally.

MC_10 about 2 years ago

Not sure how useful this is but I simply asked ChatGPT what it thinks its skill level is. This is using GPT-4.

After some previous questions...

Me: "Would you be able to estimate your ELO?"

ChatGPT: "As a language model, I don't have a specific ELO rating. However, my knowledge of chess strategies and tactics is based on the information I have been trained on, which includes games played by humans with various ELO ratings. My level of play might be roughly equivalent to an ELO of around 1200-1500, but please keep in mind that this is only an estimate, and my actual performance could be higher or lower depending on the specific position or game."

Me: "As a generative AI, are you actually able to reason which move to make next in chess? Or are you somehow just predicting the next likely move based on training data."

ChatGPT: "As a generative AI language model, my primary function is to predict the most likely text given a certain context. When it comes to playing chess, I am not "reasoning" in the same way that a dedicated chess engine would, which involves evaluating positions and searching through a large number of potential moves using specialized algorithms.

Instead, I generate my chess moves based on patterns and sequences I've seen during my training, which include a large number of chess games and discussions about chess strategy. My knowledge is derived from the text data I've been trained on, and I apply that knowledge to suggest moves that I think are appropriate for a given position.

While this approach allows me to play at an intermediate level, it is not as precise or accurate as the methods used by dedicated chess engines. My suggestions might not always be the best moves, and my understanding of the game is limited by the quality and quantity of my training data."

oezi about 2 years ago

Elo 1400 seems to equate to a top 30% player.

Source: https://chessgrandmonkey.com/chess-rating-percentile-calculator-graph/

silveira about 2 years ago

Ask him to play at a higher Elo.

meghan_rain about 2 years ago

I too noticed (for a use case totally unrelated to chess, namely code generation) that ChatGPT3.5 gave better answers than GPT4. The 3.5 answer was exactly what I wanted; GPT4 was wrong.

Does that mean we have plateaued?

chpatrick about 2 years ago

Good thing it's "incapable of reasoning"!

fancyfredbot about 2 years ago

I would be interested to see an argument based on computational complexity that puts a bound on how well a transformer-based LLM can play chess. Although it has access to a library of precomputed results, that library is finite, and the amount of compute it can do on any prompt is limited by the length of the context window, so it can't possibly "think" more than N moves ahead.

andrepd about 2 years ago

> Occasionally it does make an illegal move, but I decided to interpret that as ChatGPT flipping the table and saying “this game is impossible, I literally cannot conceive of how to win without breaking the rules of chess.” So whenever it wanted to make an illegal move, it resigned.

Lmao. ChatGPT can play chess, as long as you ignore the times when it can't x) Brilliant.

keskival about 2 years ago

I gave it more time to think about its move by adding empty tokens to the end, and it played a lot better: https://www.linkedin.com/posts/terokeskivalkama_chatgpt-activity-7042589615370715136-nkN9

mtlmtlmtlmtl about 2 years ago

There's a huge difference between 1400 Elo in FIDE games and 1400 on chess.com, which is not even using Elo.

For instance, the strongest blitz players in the world are rated hundreds of points higher in chess.com blitz than in FIDE blitz. Chess.com and lichess have a ton of rating inflation.

ht85 about 2 years ago

Hikaru recently put out a video of a game vs the free ChatGPT.

Reading the explanations it gives with every move is really insightful. Basically polished, elaborate bullshit that might look reasonable to an ignorant audience.

Unless you are a conservative pundit, AI will not take your job anytime soon.

teej about 2 years ago

I just deployed a GPT-4 powered chess bot to lichess. You can challenge it here: https://lichess.org/@/oopsallbots-gpt-4

sabujp about 2 years ago

yea tried this several months ago, it's pretty bad and will make illegal moves

nickpeterson about 2 years ago

What are the possibilities for augmenting or synthesizing LLMs with old school expert systems or other tools? Could it recognize when a problem fits some other tool and help map the inputs?

andrepd about 2 years ago

ChatGPT playing chess: https://www.youtube.com/watch?v=GneReITaRvs

ricardo81 about 2 years ago

Better than me then.

But does it give credit to who taught it? These models are basically a scrape of the best of humankind and a claim that it's their own.

amelius about 2 years ago

I'd be more interested to know how well it performs on a new game, i.e. one for which it didn't have terabytes of training data.

josh2600 about 2 years ago

I want to know what the author's Elo is (forgive me if it was disclosed and I am unobservant).

Maybe ChatGPT trainers will eventually have Elo ratings.

aroo about 2 years ago

I wonder how it would perform if you fed it a comprehensive chess rulebook and told it to avoid all possible illegal moves.

Upvoter33 about 2 years ago

ChatGPT does not really know how to play chess, or understand chess. It understands how to sound like it plays chess.

comment_ran about 2 years ago

Why not just expose AlphaGo as an API that ChatGPT can use? Every time you want to play this type of game, a request is sent. I mean, ChatGPT sends the request to AlphaGo, and as a user you don't actually know what's happening. It happens really fast in the background, so it's just like chatting with ChatGPT, but with a much, much more powerful tool for this kind of thing.

oli5679 about 2 years ago

I tried playing just now, with exactly the same prompt as the article.

It played an illegal move on move 4.

1. d4 d5 2. c4 e6 3. Nf3 Nf6 4. g3 Bg4
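
Why 4... Bg4 is illegal can be shown with a few lines: the only bishop on that diagonal starts on c8, and its path to g4 runs through e6, where Black's own pawn now stands. A minimal sketch; the pawn squares are derived by hand from the move list above, and `squares_between` is a name invented for this example.

```python
def squares_between(src, dst):
    """Squares strictly between src and dst on a diagonal, e.g. "c8" -> "g4"."""
    f0, r0 = ord(src[0]) - ord("a"), int(src[1])
    f1, r1 = ord(dst[0]) - ord("a"), int(dst[1])
    df, dr = f1 - f0, r1 - r0
    if df == 0 or abs(df) != abs(dr):
        raise ValueError("not a diagonal move")
    step_f = 1 if df > 0 else -1
    step_r = 1 if dr > 0 else -1
    return [chr(ord("a") + f0 + step_f * i) + str(r0 + step_r * i)
            for i in range(1, abs(df))]


# Black pawns after 1. d4 d5 2. c4 e6 (worked out by hand from the moves above).
black_pawns = {"a7", "b7", "c7", "d5", "e6", "f7", "g7", "h7"}
path = squares_between("c8", "g4")
blockers = [sq for sq in path if sq in black_pawns]
print(path, blockers)
```

The other bishop, on f8, is not on a diagonal with g4 at all, so no black bishop can make the move.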

xkcd1963 about 2 years ago

It's like saying the agglomeration of media handling chess has an Elo of 1400

Nifty3929 about 2 years ago

My conversation with ChatGPT on this:

Me: What is the best way to prompt you to provide the best chess move?

ChatGPT: To prompt me to provide the best chess move, you can provide the current position of the chessboard and any relevant information about the game, such as whose turn it is, the time control, and any important features of the position (e.g. whether any pieces are under attack or there are any potential tactics to consider). Additionally, providing any specific criteria or goals for the move, such as aiming to control the center or launch an attack on a particular part of the board, can also help me provide the best move.

Me: In what format should I provide the current position?

ChatGPT: To provide the current position, you can use algebraic notation. Algebraic notation is a standard notation used to describe the moves and positions on a chessboard.

Each square on the chessboard is assigned a unique coordinate, consisting of a letter and a number. The letters range from "a" to "h" and refer to the files (columns) of the board, while the numbers range from 1 to 8 and refer to the ranks (rows) of the board.

To describe a position, you can list the pieces on each square, starting from the top-left corner of the board and moving from left to right along each rank. You can use uppercase letters to represent white pieces and lowercase letters to represent black pieces. Here is an example of a starting position:

rnbqkbnr pppppppp ........ ........ ........ ........ PPPPPPPP RNBQKBNR

In this position, "r" represents a black rook, "n" represents a black knight, "b" represents a black bishop, "q" represents a black queen, "k" represents a black king, "p" represents a black pawn, "R" represents a white rook, "N" represents a white knight, "B" represents a white bishop, "Q" represents a white queen, "K" represents a white king, and "." represents an empty square.

You can also include additional information such as whose turn it is to move, the castling rights, and the en passant square. For example, the starting position for white to move with full castling rights and no en passant square would be:

rnbqkbnr pppppppp ........ ........ ........ ........ PPPPPPPP RNBQKBNR w KQkq -

In this case, "w" indicates that it is white's turn to move, "KQkq" indicates that both sides have full castling rights, and "-" indicates that there is no en passant square.

Me: What about just providing a sequence of moves from the starting position? Would this be better or worse than your suggested approach?

ChatGPT: Providing a sequence of moves from the starting position can also be a valid way to specify a position in chess. However, it may not always be the best approach, depending on the context.

One potential disadvantage of providing a sequence of moves is that it requires more effort and time to enter the moves and verify that the position is correct. Additionally, if there are any errors or typos in the sequence, it could lead to an incorrect position and subsequently an incorrect move suggestion.

On the other hand, if you have a specific position in mind that is not the starting position, providing a sequence of moves may be the most straightforward way to specify it. In this case, it is important to ensure that the sequence is accurate and that all relevant information (such as castling rights and en passant squares) is included.

In summary, both algebraic notation and a sequence of moves can be used to specify a position in chess, and the choice between them depends on the context and the specific needs of the situation.
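
The format ChatGPT describes is close to Forsyth-Edwards Notation (FEN) rather than algebraic notation. Standard FEN separates ranks with "/", writes runs of empty squares as digits, and ends with halfmove and fullmove counters. Here is a minimal sketch that converts the dotted rows above into FEN; `rows_to_fen` is a hypothetical helper, and the counters are hard-coded to the starting values "0 1" as a simplifying assumption.

```python
def rows_to_fen(rows, side="w", castling="KQkq", en_passant="-"):
    """Convert eight dotted rows (rank 8 first) into a FEN string."""
    ranks = []
    for row in rows:
        out, empty = "", 0
        for ch in row:
            if ch == ".":
                empty += 1  # count a run of empty squares
            else:
                if empty:
                    out += str(empty)
                    empty = 0
                out += ch
        if empty:
            out += str(empty)
        ranks.append(out)
    # Assumption: halfmove clock 0 and fullmove number 1, as at the start of a game.
    return "/".join(ranks) + f" {side} {castling} {en_passant} 0 1"


start = "rnbqkbnr pppppppp ........ ........ ........ ........ PPPPPPPP RNBQKBNR".split()
print(rows_to_fen(start))
```

For the starting rows this prints `rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1`, the usual FEN for the initial position.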

jackphilson about 2 years ago

Calling it: chess games in the future will be a battle between two prompts.

jacobsenscott about 2 years ago

I won't speak to whether this shows ChatGPT can play chess, but did you notify your opponents beforehand that they were playing ChatGPT? Otherwise (maybe even if you did) you're cheating and your account should be removed. If anyone else here is thinking of doing this - don't.
