I wrote a pokerbot for my university third-year project: <a href="https://github.com/IsaacLewis/FYP" rel="nofollow">https://github.com/IsaacLewis/FYP</a>. I haven't been able to spend any more time on that project since finishing it (though I wanted to), but I still find the space fascinating.<p>Unlike the linked bot, which is an "equilibrium" (or "game-theoretic") player, mine followed an "exploitative" strategy. What's the difference? Equilibrium strategies find (or attempt to find) a Nash equilibrium and follow it. As the OP said, this minimises their losses, but it also prevents them from exploiting weaknesses in an opponent's playing style. An exploitative player, by contrast, adapts its strategy to take advantage of its opponent, but that leaves it open to being exploited itself.<p>The OP used RPS as an example - it's clear that the Nash equilibrium is picking each move with 1/3 probability. No matter what your opponent does, your expected value is 0. But what if your opponent decides that they will always pick rock? The EV of the equilibrium strategy is still 0, but you could switch to an exploitative strategy of always picking paper, in which case your EV is 1. For this reason, exploitative strategies will almost always win multiplayer RPS tournaments, because they can consistently beat the weaker players, whereas the equilibrium players will stay in the middle of the pack. It might seem like a surprising result that playing an exploitative strategy <i>always</i> leaves you open to exploitation yourself, but the maths works out (there's a quick numerical sketch of the RPS case further down).<p>If you want an intuitive grasp of this idea, consider that to exploit your opponent's strategy, your play must be adapted based on observations of their play. But this means they can play with style X, leading you to play style X', which dominates X, before they catch you out by switching to style X'', which dominates X'. If you have experience playing poker against competent humans, you'll have seen them do the same thing.<p>In computer poker, AFAIK equilibrium players generally perform better. I think this is because poker is a more complicated game than RPS, so both humans and bots consistently make mistakes, and just playing solidly gives equilibrium bots the edge. But writing an exploitative bot is still pretty interesting, because it seems closer to human poker, which is more about bluffing and outthinking your opponents than mathematically optimising your play.<p>My bot wasn't especially interesting - it was based on an existing algorithm called Miximix, and I used Weka to try to learn a model of the opponent's strategy. Still, it could do interesting stuff - eg, if it played against an opponent that could be intimidated out of hands by large bets, it would realise that it could bet large without having good hands - ie, it successfully taught itself to bluff (a toy version of that expected-value calculation is at the very end of this comment). What I thought would be really interesting was a bot with multiple-level opponent modelling - "what does my opponent think I have?" or "what does my opponent think I think he has?". Good human players think this way, and "recursively modelling other minds" seems integral to conscious thought, so it'd be cool to explore in more depth.<p>The other thing that would be cool to look into is "explanation-based learning". Normal machine learning approaches require large amounts of data to draw inferences, but human poker players seem capable of forming conclusions about their opponent based on very limited information. Explanation-based learning uses a domain model to help with this.
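<p>To make the RPS numbers above concrete, here's a minimal sketch (illustrative Python, not anything from the linked repo) that computes the expected value of the uniform equilibrium mix and of the always-paper counter-strategy against an always-rock opponent; the last line shows the catch, namely that the moment you deviate from equilibrium to exploit someone, there's a counter-strategy that beats you:

  # Toy RPS expected-value calculation - illustrative only, not from the FYP code.
  MOVES = ["rock", "paper", "scissors"]
  BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

  def payoff(mine, theirs):
      # +1 for a win, 0 for a draw, -1 for a loss, from "mine"'s point of view.
      if mine == theirs:
          return 0
      return 1 if BEATS[mine] == theirs else -1

  def expected_value(my_strategy, opp_strategy):
      # EV of one game given two mixed strategies (dicts of move -> probability).
      return sum(p_m * p_o * payoff(m, o)
                 for m, p_m in my_strategy.items()
                 for o, p_o in opp_strategy.items())

  equilibrium = {m: 1 / 3 for m in MOVES}
  always_rock = {"rock": 1.0, "paper": 0.0, "scissors": 0.0}
  always_paper = {"rock": 0.0, "paper": 1.0, "scissors": 0.0}
  always_scissors = {"rock": 0.0, "paper": 0.0, "scissors": 1.0}

  print(expected_value(equilibrium, always_rock))      # 0.0  - never loses on average, never gains either
  print(expected_value(always_paper, always_rock))     # 1.0  - the exploitative response wins every game
  print(expected_value(always_paper, always_scissors)) # -1.0 - but it is itself maximally exploitable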
<p>Hmm, writing this comment has reignited my interest in this space - I really should dig out my old code and work on this again some time.
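<p>As a footnote to the bluffing point above, here's a rough sketch of the kind of expected-value reasoning involved (toy Python with made-up numbers, not the actual Miximix/Weka code): once the opponent model says a player folds to a large bet often enough, betting big with a worthless hand becomes profitable on its own.

  # Toy bluff-EV calculation with made-up numbers - not the actual FYP/Miximix code.
  # Assumes a single large river bet with a hand that always loses if called.

  def bluff_ev(pot, bet, fold_probability):
      # Win the pot when the opponent folds; lose the bet when they call.
      return fold_probability * pot - (1 - fold_probability) * bet

  pot, bet = 10.0, 8.0
  for fold_probability in (0.2, 0.5, 0.8):
      print(f"P(fold) = {fold_probability:.1f}: bluff EV = {bluff_ev(pot, bet, fold_probability):+.2f}")
  # Prints -4.40, +1.00, +6.40 - against a player the model says folds 80% of the time,
  # the big bet is profitable regardless of the cards, which is the pattern the bot learned.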