The insight about starting from the public odds is so important. It's like running from a polar bear: you don't have to outrun the bear, only your buddy. Setting aside the track take, any deviation from the public odds in the correct direction makes you money.<p>(And the public is often reliably biased; this is why they say sports betting is more about betting on gamblers than on games.)<p>A smaller deviation makes you less money on any single bet, of course, but it also saves you lots of money when you're wrong, ensuring you live to bet another day.<p>The same goes for real life: people in aggregate aren't dumb (just a bit biased), so using public opinion as your prior and then updating it in Bayesian fashion with private information gets you a more reliable edge than guessing wildly.
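To make the "deviation from the public odds" point concrete, here is a minimal Python sketch under simplifying assumptions: a win-only parimutuel pool where each $1 on the winner returns (1 - take) / p_public (stake included), and a bet too small to move the odds. `expected_profit` and `blend` are hypothetical helper names. `blend` shrinks a private model's probabilities toward the public's by weighting the two in log space and renormalising. This is similar in spirit to the combined model in Benter's paper, but the weights used below are made up rather than fitted.

```python
import math
from typing import List


def expected_profit(p_true: float, p_public: float, take: float = 0.0) -> float:
    """Expected profit per $1 win bet in a simplified parimutuel pool.

    Assumes $1 on the winner returns (1 - take) / p_public, where p_public is
    the horse's share of the win pool, and that our bet doesn't move the odds.
    """
    payout = (1.0 - take) / p_public
    return p_true * payout - 1.0


def blend(model_probs: List[float], public_probs: List[float],
          alpha: float, beta: float) -> List[float]:
    """Combine model and public win probabilities for one race in log space.

    Each horse gets weight exp(alpha * log(model) + beta * log(public)); the
    weights are renormalised to sum to 1. alpha/beta are hypothetical here and
    would normally be fitted on past races. alpha=0, beta=1 returns the public.
    """
    logits = [alpha * math.log(f) + beta * math.log(p)
              for f, p in zip(model_probs, public_probs)]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical three-horse race.
    public = [0.5, 0.3, 0.2]   # probabilities implied by the tote board
    model = [0.4, 0.35, 0.25]  # a private model's estimates
    combined = blend(model, public, alpha=0.5, beta=0.5)
    for p_pub, p_est in zip(public, combined):
        print(f"public={p_pub:.2f} combined={p_est:.3f} "
              f"EV(no take)={expected_profit(p_est, p_pub):+.3f} "
              f"EV(18% take)={expected_profit(p_est, p_pub, take=0.18):+.3f}")
```

With no take, the two horses the blend rates above the public have positive expected value; with an 18% take all three come out negative in this example, which is the "setting aside the track take" caveat in action. Shrinking toward the public (a small `alpha`) keeps each deviation small, so a wrong model costs less.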
You can find Benter’s paper about the model here: <a href="https://www.gwern.net/docs/statistics/decision/1994-benter.pdf" rel="nofollow">https://www.gwern.net/docs/statistics/decision/1994-benter.p...</a><p>About a year ago I read this and attempted to build a similar model (with more layers ;)) using data I scraped from the Hong Kong Jockey Club’s website. Although I used far fewer features, it still turned a profit on held-out races: <a href="https://teddykoker.com/2019/12/beating-the-odds-machine-learning-for-horse-racing/" rel="nofollow">https://teddykoker.com/2019/12/beating-the-odds-machine-lear...</a>. Obviously there are many caveats when backtesting like this, but I thought it was a fun project!
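Here is one way to picture "turned a profit on held-out races" in code. This is a minimal backtest sketch, not the blog post's method. It assumes a model that emits one raw score per horse, a softmax within each race to turn scores into win probabilities (similar to the multinomial logit Benter describes; the post's network and betting rule may well differ), and a flat $1 stake whenever the expected profit at the public odds exceeds a threshold. `Runner`, `backtest`, and `min_edge` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Runner:
    score: float        # model's raw score for this horse (higher = more likely to win)
    public_prob: float  # win probability implied by the public odds
    won: bool           # actual outcome of the held-out race


def softmax(scores: List[float]) -> List[float]:
    """Turn one race's raw scores into win probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def backtest(races: List[List[Runner]], take: float = 0.0,
             min_edge: float = 0.0, stake: float = 1.0) -> Tuple[float, float]:
    """Flat-stake every runner whose expected profit per $1 exceeds `min_edge`.

    Returns (total_profit, total_staked) summed over all races.
    """
    profit = 0.0
    staked = 0.0
    for race in races:
        probs = softmax([r.score for r in race])
        for runner, p in zip(race, probs):
            payout = (1.0 - take) / runner.public_prob  # return per $1, stake included
            if p * payout - 1.0 > min_edge:
                staked += stake
                profit += stake * (payout - 1.0) if runner.won else -stake
    return profit, staked
```

ROI is then `profit / staked`. This simple version bets at the final public odds, which is one of the usual backtesting caveats: in a real parimutuel pool the odds can still move after you bet.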
His company is <a href="http://www.ave4.com/" rel="nofollow">http://www.ave4.com/</a>
I met him a few times and interviewed to work there (didn't get the job). I mostly talked to a lawyer who was very nice, but who reminded me of Kobayashi from The Usual Suspects. The offices are a bit extravagant, with some amazing original artwork on the walls: <a href="https://www.jendoco.com/portfolio/fourth-avenue-analytics/" rel="nofollow">https://www.jendoco.com/portfolio/fourth-avenue-analytics/</a><p>Pre-COVID, he hosted meetups with Pittsburgh's R Users Group: <a href="https://www.meetup.com/Pittsburgh-useR-Group/events/260701660/" rel="nofollow">https://www.meetup.com/Pittsburgh-useR-Group/events/26070166...</a><p>He is also mentioned in <a href="http://www.fortunesformula.com/" rel="nofollow">http://www.fortunesformula.com/</a>, which I really enjoyed.
I've been spending almost every weekend for the past few months trying to break a crypto crash game (called bustabit) using ML. While I'm also dealing with probabilities, my challenge is the limited data I have to work with, even though their API is public.<p>In horse racing, you have access to so much shit you can feed into the AI: rider statistics, horse performance, horse attributes, etc. In my case, I only have the game/round number and the bust result (red or green). From there, I have to create many other variables (aka feature engineering) like round win/loss streaks, moving averages, MACD, etc. I can also feed in whether other players are betting on the current round, and what they're betting.<p>Currently, the house always wins: the game is rigged so the house has a 1% advantage over you. So far, my AI only reaches a high enough confidence score to win when my bot plays 0.001% of the games, which I can't call a victory, since those black-swan games only come up about once every 3 weeks.<p>There's also another way to beat it without any ML, just a plain old Fibonacci martingale: it requires around 10k in starting money as an error margin, but it always ends up ahead of the house. That's just a math way to beat it, and not very fun :)<p>It's probably crazy of me to even attempt to predict randomness, but there's something intriguing about mixing big data and ML with forced randomness. Again, they have to keep forcing the odds to be 49% vs 51%.
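Here is a minimal sketch of the feature engineering described above. It assumes each round is reduced to a numeric series, for example 1.0 for green and 0.0 for red, or the bust multiplier itself if you record it. `streak`, `moving_average`, `ema`, and `macd` are hypothetical helpers. The 12/26/9 MACD spans are the conventional trading defaults, not values from the comment, and seeding the EMA with the first value is one common choice among several.

```python
from typing import List, Optional, Tuple


def streak(results: List[bool]) -> int:
    """Length of the current run: +n for n greens in a row, -n for n reds."""
    if not results:
        return 0
    last = results[-1]
    n = 0
    for r in reversed(results):
        if r != last:
            break
        n += 1
    return n if last else -n


def moving_average(values: List[float], window: int) -> List[Optional[float]]:
    """Simple moving average; None until `window` values have been seen."""
    out: List[Optional[float]] = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the value that left the window
        out.append(total / window if i >= window - 1 else None)
    return out


def ema(values: List[float], span: int) -> List[float]:
    """Exponential moving average, alpha = 2 / (span + 1), seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out: List[float] = []
    for v in values:
        out.append(v if not out else alpha * v + (1 - alpha) * out[-1])
    return out


def macd(values: List[float], fast: int = 12, slow: int = 26,
         signal: int = 9) -> Tuple[List[float], List[float], List[float]]:
    """MACD line (fast EMA minus slow EMA), its signal line, and their difference."""
    fast_ema = ema(values, fast)
    slow_ema = ema(values, slow)
    line = [f - s for f, s in zip(fast_ema, slow_ema)]
    sig = ema(line, signal)
    hist = [m - s for m, s in zip(line, sig)]
    return line, sig, hist


if __name__ == "__main__":
    rounds = [True, True, False, True, False, False, False]  # hypothetical, True = green
    series = [1.0 if g else 0.0 for g in rounds]
    print(streak(rounds))
    print(moving_average(series, 3))
    print(macd(series)[0][-1])
```

Each feature only looks at past rounds, which matters. A feature that accidentally includes the current round's result would make a backtest look far better than live play.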
> <i>“F---,” Benter said. “We hit it.”</i><p>Pretty sure he didn't say "F---". Why the fuck is Bloomberg misquoting this guy? Why publish a quote you're just going to censor?