With the third-largest crowd ever seen at the Belmont Stakes, it looked as though California Chrome was poised to make history this past Saturday. As the 11 horses raced down the final stretch, though, it became clear that we would not see another Triple Crown winner this year. Given our success at predicting previous races, did the Belmont Stakes catch us flat-footed?
Top 5 Belmont Stakes Horses
|Ride on Curlin||Commissioner|
|Commanding Curve||Medal Count|
|Wicked Strong||California Chrome|
Predicting three of the top five horses across the finish line is no easy feat, especially given how few career races we had data for on many of the contenders. Our Kentucky Derby prediction was only one horse better, and the mile and a half of the Belmont Stakes is a distance that many of these colts have never raced before and will likely never race again. Given these difficulties, how did we come up with our predictions?
(Finish Position) = 5.598233829 + 1.244550122*(Style Rating Standardized) - 1.244550122*(ML Implied Probability Standardized) - 1.244550122*tanh(2*(Racing Speed Average Standardized))
After ingesting Brisnet.com’s unique handicapping data, Eureqa came back to us with the above equation. What does it mean? Because the model predicts finish position and lower is better (#1 = winner), positive terms indicate disadvantages and negative terms indicate advantages for horses running the Belmont. The positive style rating term says that high early speed relative to the competition tends to be a disadvantage. The two negative terms say that horses that were heavy betting favorites (Morning Line implied probability) and that posted high race speeds relative to the competition throughout their careers (racing speed average) tend to be at an advantage. Because the speed term passes through tanh, its benefit levels off for horses far above the field average.
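To make the equation concrete, here is a minimal Python sketch that evaluates it for a few horses and ranks them by predicted finish position. The coefficients are copied from the equation above; everything else is an assumption for illustration. The horse names and input values are invented, the function name `predicted_finish` is hypothetical, and the post does not say how the inputs were standardized, so the sketch simply takes already-standardized numbers.

```python
import math

# Coefficients copied from the equation in the post.
INTERCEPT = 5.598233829
COEF = 1.244550122


def predicted_finish(style_z, ml_prob_z, speed_avg_z):
    """Evaluate the published model for one horse.

    All three inputs are assumed to be already standardized;
    the post does not describe the standardization itself.
    """
    return (
        INTERCEPT
        + COEF * style_z                      # early speed: raises predicted finish
        - COEF * ml_prob_z                    # betting favorite: lowers it
        - COEF * math.tanh(2 * speed_avg_z)   # career speed: lowers it, saturating
    )


# Hypothetical standardized inputs, invented purely for illustration:
# (style rating, ML implied probability, racing speed average)
field = {
    "Horse A": (1.0, -0.5, -0.5),
    "Horse B": (-0.5, 1.5, 1.0),
    "Horse C": (0.0, 0.0, 0.0),
}

scores = {name: predicted_finish(*z) for name, z in field.items()}

# Lower predicted finish position is better, so sort ascending.
ranking = sorted(scores, key=scores.get)

for name in ranking:
    print(f"{name}: predicted finish {scores[name]:.2f}")
```

A horse that is exactly average on all three inputs lands on the intercept, roughly the middle of an 11-horse field. The favorite with strong career speed (Horse B) ranks first, and the early-speed horse with weak odds (Horse A) ranks last.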
So why did we only get 3 of the top 5 horses? California Chrome’s heavy race schedule appeared to catch up with him down the final stretch, as he wasn’t able to shift into a higher gear to put away the competition. The data we gave Eureqa didn’t suggest any strong relationship between race schedule and performance. But armed with this real-world data point, we can apply that domain knowledge in the future and prompt Eureqa to investigate the link between race schedule and race performance more closely.
While we at Nutonian were ready to break out the party hats and celebrate the end of the Triple Crown drought, it was sadly not to be. Where we had too little data on many of these horses, today’s race towards data lakes has left many business teams with the opposite problem: far too much data. Setting Nutonian’s robotic data scientist on critical business processes across retail, insurance, and manufacturing can automatically uncover hidden insights in complex data sets, while still allowing you to apply your own domain expertise to the results.
Did you try running Eureqa on your own Belmont predictions? Let us know your results and thoughts in the comments!