How we beat stats guru Nate Silver – with only 2 days and a $7.56 budget

Posted by Jon Millis

15.04.2014 10:00 AM

April is the pinnacle of excitement for American sports fans. Baseball stadiums lift the tarps off freshly groomed fields, NBA and NHL teams begin their playoff quests for championship trophies, and of course: the college basketball world witnesses the last stretch of thrilling (and seemingly unpredictable) postseason play.

The “unpredictability” of the NCAA tournament is one of the primary reasons it’s so entrancing. While sports networks like ESPN employ college basketball “gurus” who form predictions based on some combination of gut feeling and statistics, their historical picks suggest they aren’t much better than the average American at picking tournament outcomes before the games tip off.

Nate Silver has drawn much fanfare for his purely statistical approach to predicting everything from player performance in Major League Baseball to the 2012 presidential election, where his state-by-state forecasts put most analysts to shame. Silver joined the NCAA tournament fun in 2011, when he unveiled his first stats-powered bracket. He has participated every year since and has not failed to impress, correctly predicting the winners in both 2012 and 2013. Last month, Silver self-deprecatingly quipped that “this year’s NCAA basketball tournament is designed to make me look dumb. There aren’t any favorites.”

But despite the anticipated madness, we decided to see how we’d fare against the best of the best, Mr. Silver himself, using Nutonian’s own Eureqa, a cognitive computing platform that automatically discovers cause-and-effect relationships within complex data. We spent a few hours Googling for publicly available team statistics, expert rankings, and computerized rankings, and fed the data into Eureqa to model the “physics” of what determines the winner of a postseason college basketball game.

As noted in last week’s blog post, Eureqa ran for two hours on nine cloud servers, returning a model with 75% predictive accuracy. After adding a few more parameters, we were up to 80%. Our equation pinpointed six variables with significant predictive power: assist-to-turnover ratio (regular season), average scoring differential (regular season), field goal percentage (regular season), three-point field goal attempts (regular season), active win/loss streak, and tournament seed. For each match-up in the 2014 bracket, we ran a simple “symmetric” simulation: we plugged in the statistics for both schools and deemed the team with the higher output the winner.
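To make the “symmetric” simulation concrete, here is a minimal Python sketch of one way it could work. The six fields mirror the variables listed above, but everything else is an assumption for illustration: the `TeamStats` class, the `WEIGHTS` values, the linear `model_output` function, and the two example teams are all made up, and none of it is the actual equation Eureqa produced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamStats:
    name: str
    assist_to_turnover: float    # regular season
    avg_scoring_diff: float      # regular season
    field_goal_pct: float        # regular season, as a fraction
    three_point_attempts: float  # regular season, per game
    active_streak: int           # positive = wins, negative = losses
    seed: int                    # tournament seed (lower is better)


# Hypothetical weights chosen only for illustration; NOT the Eureqa model.
WEIGHTS = {
    "assist_to_turnover": 1.0,
    "avg_scoring_diff": 0.1,
    "field_goal_pct": 5.0,
    "three_point_attempts": 0.01,
    "active_streak": 0.05,
    "seed": -0.2,  # negative because a lower seed number is better
}


def model_output(team: TeamStats, opponent: TeamStats) -> float:
    """Placeholder model: weighted sum of stat differences vs. the opponent."""
    return sum(
        weight * (getattr(team, field) - getattr(opponent, field))
        for field, weight in WEIGHTS.items()
    )


def pick_winner(a: TeamStats, b: TeamStats) -> TeamStats:
    """Evaluate the model from both sides and pick the higher output."""
    out_a = model_output(a, b)
    out_b = model_output(b, a)
    if out_a == out_b:
        # Assumed tie-break: better seed, then name, so order never matters.
        return min(a, b, key=lambda t: (t.seed, t.name))
    return a if out_a > out_b else b


# Invented example teams, not real 2014 statistics.
team_a = TeamStats("Team A", 1.3, 8.0, 0.47, 18.0, 3, 2)
team_b = TeamStats("Team B", 1.1, 5.0, 0.45, 21.0, -1, 7)

print(pick_winner(team_a, team_b).name)
```

Because the model is evaluated once from each team’s side, the pick does not depend on which school is listed first. Filling out a whole bracket would then be a matter of feeding each round’s winners into the next round’s match-ups.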

How did we do? In spite of the wildest tournament in recent memory, including the lowest-seeded pair of finalists ever to reach the championship game, Eureqa did impressively well. Mr. Silver’s model, which perhaps required weeks of manpower and leveraged hundreds of variables, correctly predicted 4 of the 8 teams to reach the “Elite Eight”, 1 of the 4 teams to reach the “Final Four”, and 0 of the 2 teams to reach the “Championship”. Eureqa, which required two days of casual work by one of our software developers (and for the sake of time excluded many potentially interesting variables included by Silver, such as pre-season rankings, player injuries, and geography), correctly predicted 4 of the 8 “Elite Eight” teams, 2 of the 4 “Final Four” teams, and 0 of the 2 “Championship” teams.
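For anyone who wants to total the scoreboard, the round-by-round tallies above can be added up in a few lines of Python. The counts come straight from the paragraph above; the `results` dictionary and the `total` helper are names made up for this sketch.

```python
# (correct picks, available slots) per round, as reported in this post.
results = {
    "Silver": {"Elite Eight": (4, 8), "Final Four": (1, 4), "Championship": (0, 2)},
    "Eureqa": {"Elite Eight": (4, 8), "Final Four": (2, 4), "Championship": (0, 2)},
}


def total(picks: dict) -> tuple:
    """Sum correct picks and available slots across all rounds."""
    correct = sum(hit for hit, _ in picks.values())
    slots = sum(n for _, n in picks.values())
    return correct, slots


for model, picks in results.items():
    correct, slots = total(picks)
    print(f"{model}: {correct}/{slots} correct")
```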

Without having to know anything about our input variables, their relative importance, or even the sport itself, we became basketball gurus equipped with an informative mathematical model that identified the core “drivers” of tournament wins, and beat out one of the most prominent statisticians in the world. The total cost? $7.56 for compute time in the cloud and a handful of chocolate-covered almonds to keep Dylan happy while he input data.

If Eureqa has this sort of predictive and prescriptive capability for something as volatile as college basketball games, imagine the impact it could have in your business: the ability to understand not only what will happen, but also when it will happen and why. The same approach carries over from “Wisconsin will beat Arizona in the tournament, because it will excel at these specific aspects of the game” to “We should charge Jon this amount for an insurance premium, because here are the 12 variables that truly matter in assessing his risk.” Or, “We should keep these products in stock next month to maximize revenue, because here are our most important sales drivers during unseasonably warm winters.”

Mr. Silver, we’ll see you at the next tournament. In the meantime, download your free trial of Eureqa and let us know what you think.

Topics: Big data, Eureqa, March Madness, Nate Silver
