TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Counterfactual Regret Minimization with Kuhn Poker

51 pointsby vpjalmost 5 years ago

2 comments

hgibbsalmost 5 years ago
This is cool. I was actually asked for the optimal strategy for player 1 in this game as an interview question for a quant research role; I didn&#x27;t know anything about game&#x2F;decision theory, other than the basics, and so I only got as far as determining what to do with an Ace and a King, and then figuring out that you should bluff with a certain frequency if you hold the ace.<p>Anyhow, I came up with another question while considering this problem. Player 1 should only bluff at the optimal rate if player 2 punishes them for deviating from this rate by changing the frequency that they call when holding the king. Otherwise, Player 1 gains more on average by bluffing with the queen more frequently. So, how should Player 2 figure out if Player 1 is actually bluffing at the optimal rate (and not higher)? It could happen, with exponentially small probability, that Player 1 decides whether to bluff randomly, and bluffs with the first 10,000 queens before ever folding with a queen. How does Player 2 distinguish this from Player 1 only ever bluffing with the queen?<p>But perhaps this is something that the Nash Theorem deals with? I don&#x27;t know. Any comments from people who know more than me on this are welcome!
评论 #23309633 未加载
评论 #23312852 未加载
plaflalmost 5 years ago
For the uninitiated like me it&#x27;s confusing at first the following sentence when deriving the Nash equilibrum:<p>&quot;If first player has a K, because of 1. and 2., he would only lose if he bets. So first player should pass if he has a K.&quot;<p>It actually means that the first player always loses on average if he bets (expected utility -1&#x2F;2 if not mistaken). If he folds however the expected utility is clearly zero since half of the time wins and half of the time loses.<p>I have not reached yet the description of the algorithm.<p>I actually find somewhat paradoxical that the first player should fold with a K and bluff with a Q! I remember that I loved little results like this when years ago I first followed the game theory course in Coursera. It was a little dry but I completed it nevertheless since I found it quite fascinating. I do wonder what possible applications are for my working field, machine learning, to apply these kind of algorithms. I would love to have an excuse to devote some time to it.
评论 #23308547 未加载
评论 #23308807 未加载
评论 #23308652 未加载