I'm a Legend-ranked Dota 2 player and also a machine learning researcher, and I'm <i>fascinated</i> by this result. The main message I take away is that we might already have powerful enough methods (in terms of learning capabilities) and that we're limited by hardware (this also makes me a little sad). My thoughts:<p>1) "At the beginning of each training game, we randomly "assign" each hero to some subset of lanes and penalize it for straying from those lanes until a randomly-chosen time in the game...." Combining this with "team spirit" (a weighted combined reward - net worth, k/d/a), they were able to learn early-game movement for position 4 (by farming priority). For a roaming position, identifying which lane to start in, when to leave the lane for the biggest impact, and how to gank other lanes are all very difficult. I'm very surprised that such complex reasoning can be learned from this simple setup.<p>2) Sacrificing the safe lane to control the enemy's jungle requires overcoming a local minimum (considering the rewards) and successfully assigning credit over a very, very long horizon. I'm very surprised they were able to achieve this with PPO + LSTM. However, one asterisk here is the draft: Sniper, Lich, CM, Viper, Necro. This draft is very versatile, since Viper and Necro can play any lane. It is also very strong in the laning phase and mid game. Whoever wins Sniper's lane, and the laning phase in general, is probably going to win. So this makes it a little bit less of a local optimum (in contrast to having some safe-lane heroes that require a lot of farm).<p>3) "Deviated from current playstyle in a few areas, such as giving support heroes (which usually do not take priority for resources) lots of early experience and gold." Support heroes are strong in the early game and don't require a lot of items to be useful in combat. Especially with this draft, CM with enough exp (or a blink, or good positioning) can solo kill almost any hero. 
So it's not too surprising if CM takes some farm in the early game, especially when Viper and Necro are naturally strong and don't need too much farm (they still need some, but not as much as Sniper). This observation is quite interesting, but maybe not as completely new as it might sound.<p>4) "Pushed the transitions from early- to mid-game faster than its opponents. It did this by: (1) setting up successful ganks (when players move around the map to ambush an enemy hero - see animation) when players overextended in their lane, and (2) by grouping up to take towers before the opponents could organize a counterplay." I'm a little bit skeptical of this observation. I think with this draft, whoever wins the laning phase will be able to take the next objectives much faster. And winning the laning phase really comes down to 1v1 skill, since neither Lich nor CM is really a roaming hero. If you just look at their winning games and draw conclusions, the conclusions will be biased.<p>5) This draft is also very low mobility. All 5 heroes (Sniper, Lich, CM, Necro, Viper) share the weakness of low movement speed (except for maybe Lich). Also, none of these heroes can go at Sniper in the mid/late game, so if you have better positioning + reaction time, you'll probably win.<p>Overall, I think this is a great step and a great achievement (with the caveats I noted above). As for next steps, I would love to see them try a meta-learned agent so they don't have to train from scratch for a new draft. I would love to see them learn item building and courier usage instead of using scripts. I would also love to see them learn drafting (which can be simply phrased as a supervised problem). I'm pretty excited about this project; hopefully they release a white paper with some more details so we can try to replicate it.
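<p>For what it's worth, here is roughly how I picture the reward setup from point 1, as a minimal sketch. The blending formula (own reward mixed with the team average), the penalty size, the time range, and every function name below are my own assumptions for illustration, not something the blog post confirms.

```python
import random


def sample_lane_assignment(rng, lanes=("top", "mid", "bot"), max_time=600.0):
    """Pick a random non-empty subset of lanes and a random cutoff time.

    The lane names and the 600-second upper bound are assumed values.
    """
    k = rng.randint(1, len(lanes))
    return set(rng.sample(lanes, k)), rng.uniform(0.0, max_time)


def lane_penalty(hero_lane, assigned_lanes, game_time, cutoff_time, penalty=0.02):
    """Return a negative reward while the hero strays before the cutoff."""
    if game_time < cutoff_time and hero_lane not in assigned_lanes:
        return -penalty
    return 0.0


def blend_team_spirit(individual_rewards, team_spirit):
    """Mix each hero's own reward with the team average.

    team_spirit = 0.0 means fully selfish, 1.0 means fully shared.
    """
    team_mean = sum(individual_rewards) / len(individual_rewards)
    return [
        (1.0 - team_spirit) * r + team_spirit * team_mean
        for r in individual_rewards
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    assigned, cutoff = sample_lane_assignment(rng)
    print(assigned, cutoff)
    # One hero got a kill (reward 1.0); the other four got nothing.
    print(blend_team_spirit([1.0, 0.0, 0.0, 0.0, 0.0], team_spirit=0.5))
```

With a setup like this, a support that sets up a kill for a teammate still gets part of the reward once team spirit is above zero, which is the part I suspect matters for learning roaming behaviour.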