We journeyed out to Chicago for the first day of March Madness, armed with Eureqa and our final bracket for the NCAA tournament. While the chaos of this year’s March Madness may have cheated us out of Buffett’s $1B, we have enjoyed a pretty decent bracket (with what appear to be more successful predictions this weekend). It’s not cheating to use Eureqa™ to make confident bracket predictions (and explain them!) without spending hours watching ESPN or poring over analyst reports.
Why March Madness?
March Madness is one of the most popular annual sporting events in the US, and for those of us who are less than basketball savvy, it was an incredibly fun challenge. Quicken Loans offered $1B to anyone who could accurately predict the winners of all 63 games (OK, this may have had an influence on our decision as well). More commonly known as Buffett’s Billion Dollar Challenge, the contest sparked intense excitement as thousands searched for the hidden keys to unlocking a perfect bracket.
At Nutonian, our secret weapon was our cognitive computing engine, Eureqa™. Eureqa™ is capable of automatically discovering causal relationships within complex data structures, and NCAA stats are quite a complex beast. Any of the basketball pundits who made bracket predictions this year would far outstrip the combined basketball knowledge of all of us at Nutonian, no contest. But once we started pulling some basic data together, we could still quickly create a predictive model with high accuracy.
What data did we choose? We only began pulling data a few days before the tournament began, so we started with the basics from the Kaggle March Machine Learning Mania competition as well as the NCAA’s computerized stats. We were also advised to include distance data (game-by-game distance from home court) and Ken Pomeroy’s strength of schedule statistics. Even though there were many other important stats we could have included, we still ended up with a dataset that had almost 4,000 rows and 60+ columns. That is not something that you or I could sift through and decipher by hand.
The Nutonian Difference
To be completely honest, some of the stats we gathered are still a complete mystery to us. But the beauty of using Eureqa is that it doesn’t matter. We just give Eureqa the data, and it tells us what’s important and why. After running for 864 core hours (12 hours on 9 machines, which implies 8 cores per machine, for a total bill of $7.56 from AWS), we reached a model with ~75% accuracy. What did Eureqa give us?
win = logistic(2.33e5
    IF(seed - oseed)
    THEN greater(seed - oseed, Ratio),
    ELSE seed + tanh(Ratio) - oseed,
) / Pyth_1
What does this equation mean? If a team faced an opponent with a lower seed, the best determinant of the game winner was whether the seed difference was greater than the team’s assist / turnover ratio. Otherwise, defense was the name of the game, with rebound margin and assist / turnover ratio acting as the best determinants of the game winner, though teams that faced stronger opponents through the season were penalized less heavily.
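If the building blocks in that expression are unfamiliar, here is a minimal Python sketch of what logistic, greater and tanh each do on their own. It does not try to reproduce the model itself. The semantics of greater (1 if the first argument is larger, 0 otherwise) are our assumption, the input values are made up for illustration, and we assume the constant 2.33e5 acts as a multiplier, since the expression as printed does not show the operator.

```python
import math

def logistic(x: float) -> float:
    """Map any real score to a value between 0 and 1 (overflow-safe form)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def greater(a: float, b: float) -> float:
    """Assumed semantics: 1.0 if a > b, otherwise 0.0."""
    return 1.0 if a > b else 0.0

# Hypothetical inputs, for illustration only (not tournament data).
seed_diff = 3.0  # stands in for seed - oseed
ratio = 1.4      # stands in for the assist / turnover ratio

indicator = greater(seed_diff, ratio)  # does the seed gap exceed the ratio?
bounded = math.tanh(ratio)             # tanh keeps the ratio's effect within (-1, 1)

# Assumption: 2.33e5 multiplies the score. A multiplier that large turns
# the smooth logistic curve into something close to an on/off switch.
mild = logistic(0.5)
sharp = logistic(2.33e5 * 0.5)

print(indicator, bounded, mild, sharp)
```

The point to notice is the last pair: with a small input the logistic gives a hedged probability, but with a huge multiplier almost any positive score is pushed to a confident "win" and almost any negative score to a confident "loss".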
Does Eureqa know how to play basketball? Does Eureqa know who UCLA is? Does Eureqa care how much time we spent pulling all this data together?? No. But look at the beauty of this simple model that Eureqa was able to discover. Without any prior knowledge of the game of basketball, Eureqa was able to expose the meaningful variables in a model that makes actual basketball sense and can be easily explained to others.
Sadly, no one at Nutonian is a billionaire yet. But given a system with less inherent chaos, such as sales forecasting or customer retention, Eureqa can help you become one. We’ve helped our customers pinpoint sales drivers, improve manufacturing processes, stay ahead of market trends, and more. Reaching this level of context and understanding provides actionable outcomes with Eureqa’s vertically focused application modules for retail, telecommunications, financial services, life sciences and utilities.
Watch the recording of our live meet-up for some of the insightful commentary we heard from attendees. Think about how you could apply Eureqa to your own data and let us know some of your ideas in the comments!