I recently read about a technique in statistics/ML called "offline policy evaluation" (OPE).

The idea is that you can estimate how a new policy will perform by using historical data generated under a previous policy. For example, rather than testing a new fraud policy in an A/B test, you can use historical logs to estimate whether the new policy would outperform the existing one. This seems like it could be a great step before A/B testing new policies.

I whipped up some example code implementing what could be considered the "hello world" of offline policy evaluation, if anyone is curious:
https://github.com/banditml/offline-policy-evaluation/blob/main/examples/Inverse%20propensity%20scoring.ipynb

My question to you: have any of you tried this, or do any of you currently use OPE at your companies?
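For anyone who wants the gist without opening the notebook, here is a minimal sketch of the inverse propensity scoring estimator. The variable names and toy data below are my own for illustration and are not taken from the notebook:

    import numpy as np

    def ips_estimate(logged_propensities, logged_rewards, new_policy_probs):
        """Inverse propensity scoring (IPS) estimate of a new policy's value.

        logged_propensities: P(logged action | context) under the old (logging) policy
        logged_rewards:      rewards observed for the logged actions
        new_policy_probs:    P(same logged action | context) under the new policy
        """
        weights = new_policy_probs / logged_propensities  # importance weights
        return np.mean(weights * logged_rewards)

    # Toy example: uniform logging policy over two actions (propensity 0.5),
    # and a new policy that deterministically picks action 1.
    logged_propensities = np.array([0.5, 0.5, 0.5, 0.5])
    logged_rewards      = np.array([0.0, 1.0, 1.0, 0.0])  # action 1 earned reward 1
    new_policy_probs    = np.array([0.0, 1.0, 1.0, 0.0])  # new policy's prob of the logged action
    print(ips_estimate(logged_propensities, logged_rewards, new_policy_probs))  # -> 1.0

In words: the rewards the old policy actually collected get reweighted by how likely the new policy is to take those same actions. This gives an unbiased estimate of the new policy's expected reward, provided the logging propensities are recorded correctly and are nonzero wherever the new policy puts probability.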