How we beat stats guru Nate Silver – with only 2 days and a $7.56 budget

Posted by Jon Millis

15.04.2014 10:00 AM

April is the pinnacle of excitement for American sports fans. Baseball stadiums lift the tarps off freshly groomed fields, NBA and NHL teams begin their playoff quests for championship trophies, and, of course, the college basketball world witnesses the last stretch of thrilling (and seemingly unpredictable) postseason play.

The “unpredictability” of the NCAA tournament is one of the primary reasons it’s so entrancing. While sports networks like ESPN employ college basketball “gurus” who form predictions based on some combination of gut feeling and statistics, their historical picks suggest they aren’t much better than the average American in picking tournament outcomes before the games tip off. 

One man who has drawn much fanfare for his pure statistics-based approach to predicting everything from player performance in Major League Baseball to pristine state-by-state forecasts for the 2012 presidential election (which put most analysts to shame) is Nate Silver. Silver began to get in on the fun of the NCAA tournament by unveiling his first stats-powered bracket in 2011. He has participated in each year since and has not failed to impress, correctly predicting the winners in both 2012 and 2013. Last month, Silver self-deprecatingly quipped that “this year’s NCAA basketball tournament is designed to make me look dumb. There aren’t any favorites.”

But despite the anticipated madness, we decided to see how we’d fare against the best of the best, Mr. Silver himself, using Nutonian’s own Eureqa, a cognitive computing platform that automatically discovers cause-and-effect relationships within complex data. We spent a few hours Googling for publicly available team statistics, expert rankings, and computerized rankings, and fed the data into Eureqa to model the “physics” of what determines the winner of a postseason college basketball game.

As noted in last week’s blog post, Eureqa ran for 12 hours on nine cloud servers, returning a model with 75% predictive accuracy. After adding a few additional parameters, we were up to 80%. Our equation pinpointed six variables with significant predictive power: assist-to-turnover ratio (regular season), average scoring differential (regular season), field goal percentage (regular season), three-point field goal attempts (regular season), active win/loss streak, and tournament seed. For each match-up in the 2014 bracket, we ran a simple “symmetric” simulation, plugging in statistics for both schools and deeming the team with the higher output the winner.
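For readers who want to see what a “symmetric” simulation might look like in practice, here is a minimal Python sketch. It assumes the model is a function of a team's statistics and its opponent's statistics, evaluated once from each side. The `toy_model` function, the stat names, and the numbers are all hypothetical stand-ins for illustration; they are not the equation Eureqa produced.

```python
from typing import Callable, Dict

TeamStats = Dict[str, float]
Model = Callable[[TeamStats, TeamStats], float]


def toy_model(team: TeamStats, opponent: TeamStats) -> float:
    """Hypothetical stand-in for the fitted equation (NOT the Eureqa model)."""
    scoring_edge = team["scoring_diff"] - opponent["scoring_diff"]
    seed_edge = opponent["seed"] - team["seed"]  # a lower seed number is better
    return scoring_edge + seed_edge


def pick_winner(model: Model, name_a: str, stats_a: TeamStats,
                name_b: str, stats_b: TeamStats) -> str:
    """Score the match-up from each side and return the higher-scoring team."""
    output_a = model(stats_a, stats_b)  # team A's view of the game
    output_b = model(stats_b, stats_a)  # team B's view of the same game
    return name_a if output_a >= output_b else name_b


# Made-up numbers for two made-up teams.
team_a = {"seed": 2, "scoring_diff": 8.0}
team_b = {"seed": 7, "scoring_diff": 5.5}
print(pick_winner(toy_model, "Team A", team_a, "Team B", team_b))
```

Evaluating the model from both perspectives means the pick does not depend on which school happens to be listed first in the bracket.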

How did we do? In spite of the wildest tournament in recent memory, including the lowest-seeded pair of finalists ever to reach the championship game, Eureqa did impressively well. Mr. Silver’s model, which perhaps required weeks of manpower and leveraged hundreds of variables, correctly predicted 4/8 teams to reach the “Elite Eight”, 1/4 teams to reach the “Final Four”, and 0/2 teams to reach the “Championship”. Eureqa, which required two days of casual work by one of our software developers (and for the sake of time excluded many potentially interesting variables included by Silver, such as pre-season rankings, player injuries, and geography), correctly predicted 4/8 “Elite Eight” teams, 2/4 “Final Four” teams, and 0/2 “Championship” teams.

Without having to know anything about our input variables, their relative importance, or even the sport itself, we became basketball gurus equipped with an informative mathematical model that identified the core “drivers” of tournament wins, and beat out one of the most prominent statisticians in the world. The total cost? $7.56 for compute time in the cloud and a handful of chocolate-covered almonds to keep Dylan happy while he input data.

If Eureqa has this sort of predictive and prescriptive capability for something as volatile as college basketball games, imagine the impact it could have in your business: the ability to understand not only what will happen, but also when and why it will happen. The same approach carries over from “Wisconsin will beat Arizona in the tournament, because it will excel at these specific aspects of the game” to “We should charge Jon this amount for an insurance premium, because here are the 12 variables that truly matter in assessing his risk.” Or, “We should keep these products in stock next month to maximize revenue, because here are our most important sales drivers during unseasonably warm winters.”

Mr. Silver, we’ll see you at the next tournament. In the meantime, download your free trial of Eureqa and let us know what you think.

Topics: Big data, Eureqa, March Madness, Nate Silver

How Nutonian almost won $1B

Posted by Jess Lin

02.04.2014 01:00 PM

We journeyed out to Chicago for the first day of March Madness, armed with Eureqa and our final bracket for the NCAA tournament. While the chaos of this year’s March Madness may have cheated us out of Buffett’s $1B, we have enjoyed a pretty decent bracket (with what appears to be more successful predictions this weekend). It’s not cheating to use Eureqa™ to make confident bracket predictions (and explain them!) without spending hours watching ESPN or poring over analyst reports.

Why March Madness?

March Madness is one of the most popular annual sporting events in the US, and for those of us who are less than basketball savvy, it was an incredibly fun challenge. Quicken Loans offered $1B to anyone who could accurately predict the winners of all 63 games (OK, this may have had an influence on our decision as well). More commonly known as Buffett’s Billion Dollar Challenge, the contest sparked intense excitement as thousands searched for the hidden keys to unlocking this perfect bracket.

At Nutonian, our secret weapon was our own cognitive computing engine, Eureqa™. Eureqa™ is capable of automatically discovering causal relationships within complex data structures – and NCAA stats are quite a complex beast. Any of the basketball pundits who made bracket predictions this year knows far more about basketball than all of us at Nutonian combined – no contest. But once we started pulling some basic data together, we could still quickly create a predictive model with high accuracy.

What data did we choose? We only began pulling data a few days before the tournament began, so we started with the basics from the Kaggle March Machine Learning Mania competition as well as the NCAA’s computerized stats. We were also advised to include distance data (game by game distance from home court) and Ken Pomeroy’s strength of schedule statistics. Even though there were many other important stats we could have included, we still ended up with a dataset that had almost 4,000 rows and 60+ columns. Not something that you or I could sift through to decipher by hand.

The Nutonian Difference

To be completely honest, some of the stats we gathered are still a complete mystery to us. But the beauty of using Eureqa is that it doesn’t matter. We just give Eureqa the data, and it tells us what’s important and why. After running for 864 core hours (12 hours on 9 machines for a total bill of $7.56 from AWS), we reached a model with ~75% accuracy. What did Eureqa give us?

win = logistic(2.33e5
-3.56e5 *
THEN greater(seedoseed, Ratio),
ELSE logistic(
THEN seed + tanh(Ratio) – oseed,
) / Pyth_1

What does this equation mean? If a team faced an opponent with a lower seed, the best determinant of the game winner was whether the seed difference was greater than the team’s assist / turnover ratio. Otherwise, defense was the name of the game, with rebound margin and assist / turnover ratio acting as the best determinants of the game winner, though teams that faced stronger opponents through the season were penalized less heavily.
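One detail worth illustrating is the outer `logistic(...)` wrapper and its very large constants. The Python sketch below assumes `logistic` means the standard sigmoid, 1 / (1 + e^-x), and that the whole inner expression is multiplied by 3.56e5 and subtracted from 2.33e5, as the printed equation suggests. The inner-term values fed in are illustrative only; they are not computed from real team data, and the `outer` helper is a hypothetical name.

```python
import math


def logistic(x: float) -> float:
    """Standard sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # underflows harmlessly to 0.0 for very negative x
    return z / (1.0 + z)


def outer(inner_term: float) -> float:
    """Outer layer of the printed equation (assumed structure)."""
    return logistic(2.33e5 - 3.56e5 * inner_term)


# Illustrative inner-term values only.
for t in (0.0, 0.6, 0.7, 1.0):
    print(t, outer(t))
```

With coefficients this large, the sigmoid saturates almost immediately, so under these assumptions the outer layer behaves like a hard win/loss switch that flips when the inner term crosses roughly 0.65 (2.33 / 3.56), rather than like a smooth probability.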


Does Eureqa know how to play basketball? Does Eureqa know who UCLA is? Does Eureqa care how much time we spent pulling all this data together?? No. But look at the beauty of this simple model that Eureqa was able to discover. Without any prior knowledge of the game of basketball, Eureqa was able to expose the meaningful variables in a model that makes actual basketball sense and can be easily explained to others.

Final Thoughts

Sadly, no one at Nutonian is a billionaire yet. But given a system with less inherent chaos, such as sales forecasting or customer retention, Eureqa can help you become one. We’ve helped our customers pinpoint sales drivers, improve manufacturing processes, stay ahead of market trends, and more. Reaching this level of context and understanding provides actionable outcomes with Eureqa’s vertically focused application modules for retail, telecommunications, financial services, life sciences and utilities.

Watch the recording of our live meet-up for some of the insightful commentary we heard from attendees. Think about how you could apply Eureqa to your own data and let us know some of your ideas in the comments!


Topics: Chicago, Eureqa, March Madness, Meetup

March Madness Meets Moneyball

Posted by Jess Lin

17.03.2014 10:30 AM

This Thursday, 3/20, Nutonian will be travelling to Chicago for an entertaining Meetup where we’ll take on Warren Buffett’s Billion Dollar March Madness Challenge. In case you haven’t heard, Warren Buffett is offering one billion dollars ($1,000,000,000) to anyone who fills out a perfect NCAA basketball championship bracket.

We will be building our NCAA bracket with Eureqa™ using data from Kaggle’s March Machine Learning Mania competition, as well as a number of other sources. Using Eureqa™ to analyze the data allows us to combine domain expertise with the power to automatically derive meaning from the massive quantity of data available. Leave subjective opinions out and uncover the hidden keys for filling out a perfect bracket!

If you’re in the Chicago area, come out and join us! Otherwise, stay tuned for our submitted bracket and let us know what you think.

To join the Chicago meetup, register here.

Topics: Big data, Eureqa, March Madness
