I'm a bit curious how different opponents are handled. It seems that to prove such a model "optimal", some quite strong assumptions have to be made about the opponent strategies it encounters? Or is it provable that there is some optimal exploration vs. exploitation tradeoff in discovering them?