The data covers 203,916 races. I have info on trainers, jockeys, individual horses, betting odds, and a multitude of other things. I have a few ideas for what I can do with it, but thought I'd get some input before starting work on it.
Do you have info on the way the lines change from when they're initially set to post time? That shit is totally fixed. If you could figure out what certain line changes mean in terms of predicting a winner, you could perhaps make some money.
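For what it's worth, here's a minimal sketch of how you might quantify a line change, assuming the dataset has both an opening and a post-time price for each horse, quoted as X-to-1. The function names and the 4-1 / 3-1 figures are made up for illustration, and it ignores the track takeout, so the implied probabilities across a field won't sum to 1.

```python
def implied_probability(odds_to_one: float) -> float:
    """Convert odds quoted as X-to-1 into the win probability they imply.

    Assumption: takeout is ignored, so this is only a rough proxy.
    """
    if odds_to_one < 0:
        raise ValueError("odds must be non-negative")
    return 1.0 / (odds_to_one + 1.0)


def line_movement(opening_odds: float, post_odds: float) -> float:
    """Change in implied win probability from the opening line to post time.

    Positive means the odds shortened (money came in on the horse).
    """
    return implied_probability(post_odds) - implied_probability(opening_odds)


# Hypothetical example: a horse opens at 4-1 and goes off at 3-1.
shift = line_movement(4.0, 3.0)
print(f"implied probability moved by {shift:+.3f}")
```

You could compute that shift for every runner and then check whether horses with big positive moves actually win more often than their post-time odds suggest.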
Just off the top of my head, I'd say it's pretty much useless without a betting strategy. I'd recommend starting with "Dr. Z's Beat the Racetrack". It's out of print, but you should be able to find a copy.
How about creating a simple site where you simulate a horse race? I.e., give users part of the history data for the horses, ask them to select the winners, and then reveal the true outcome. People would be betting for boast points.<p>A discussion board where people can discuss their heuristics for selecting winners would be great. For the famous horses, you could link to their Wikipedia pages, Flickr photos, etc., if they exist, to create a more immersive experience.<p>I, for one, would play with such a site for a while.
Are there any free/commercial sites with comparable datasets? I can see the data itself having a not-insignificant market value if it's of sufficient quality.
<i>Share it on BitTorrent!</i> Large, accurate real-world datasets are difficult to find for the purposes of testing and experimenting with various machine learning algos.