Blog

Nutonian Takes its Shot at NHL Playoff Predictions

Posted by Jon Millis

16.04.2015 05:14 PM

The National Hockey League: where grown men skate after a small rubber disk, whack it with sticks and beat each other to a pulp. God Bless America (and Canada). Hockey is popular for good reason. It’s fast, action-packed and highly skilled. It also produces a decent amount of freely available data for fans – and now software – to crunch and predict the winner of this year’s holy grail, the 2015 Stanley Cup. Here’s how our virtual data scientist, Eureqa, believes this year’s playoffs will shake out:

 

NHL_Bracket_15

We pulled team statistics (about 200 variables) from puckalytics.com and hockey-reference.com for the past five seasons to examine how regular season performance governs playoff performance. For the sake of this exercise, we excluded much potentially valuable data, such as team stats dating back more than five years and advanced metrics that are not publicly available. Our resulting analytical model picked playoff series winners with 77% accuracy and placed particularly high predictive power on a few variables (a simplified sketch of this kind of model follows the list below):

  • Team normalized Corsi (total shot attempts) for 5-on-5 play
  • Late season record
  • Penalty kill %
  • Regular season head-to-head record
  • Regular season rating (“Simple Rating System”; takes into account average goal differential and strength of schedule)
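
For readers curious what this kind of exercise looks like in practice, here is a minimal, purely illustrative sketch: a logistic regression that predicts a playoff series winner from regular-season differentials between the two teams. It runs on synthetic data, the column names merely echo the variables above, and it is not Eureqa’s actual model – just a conventional baseline for intuition.

```python
# Hypothetical sketch only: a plain logistic regression on synthetic data,
# with feature names echoing the regular-season variables listed above.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200  # pretend playoff series, one row per matchup (team A vs. team B)

X = pd.DataFrame({
    "corsi_diff_5v5": rng.normal(0, 3, n),             # 5-on-5 Corsi, A minus B
    "late_season_win_pct_diff": rng.normal(0, 0.1, n),
    "penalty_kill_pct_diff": rng.normal(0, 2, n),
    "head_to_head_wins_diff": rng.integers(-4, 5, n),   # regular season H2H
    "srs_diff": rng.normal(0, 0.5, n),                  # Simple Rating System
})
# Synthetic target loosely tied to the features, just so the script runs end to end.
signal = 0.4 * X["corsi_diff_5v5"] + 2.0 * X["srs_diff"] + rng.normal(0, 1, n)
y = (signal > 0).astype(int)  # 1 = team A wins the series

model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"cross-validated accuracy: {scores.mean():.2f}")
```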

The puck drops at 7pm EST tonight. We’ll see if, once again, regular season play is a strong predictor for achievement in the playoffs, or if this year was meant to be wild.

Topics: Big data, Eureqa, Machine Intelligence, NHL Playoffs

Do Investors Rely on Too Much Information or Not Enough?

Posted by Jon Millis

07.04.2015 10:00 AM

Information is beautiful. It helps us learn about the world around us. It helps us better understand people and subject matters. And in the professional world, it helps us make informed decisions that are likely to help us achieve specific goals.

In finance, those goals frequently come down to success in investing. But as financial reporter Maria Bartiromo pointed out in her recent blog post, this information age comes with a pivotal caveat: it’s remarkably difficult to mine, analyze and digest everything, because there’s a whole lot of it: “The onus is on the individual to not get lost in the noise and analyze and act on portions of it that they believe to be important.”

Investing is particularly tricky. In manufacturing, for example, you’re likely to collect a finite amount of data about a problem. Say you’re trying to improve the fatigue life of a material: you might collect information about applied load, temperature, oxygen content, cooling rate, etc. We know, or at least have a good idea of, the variables in play. Investing, on the other hand, is intertwined with so many external factors – weather patterns, energy shocks, political instability, recent executive changes, interest rates in developing countries – that deciphering what’s driving financial performance, and to what extent, is excruciatingly difficult. As a result, nearly all investment firms focus on financial and economic indicators and disregard other data; it’s simply too time-consuming to isolate its impact.

Yet sometimes this other data is exceptionally important. Firms are, in effect, pretending they operate in a vacuum. Using a subset of purely financial metrics to predict future asset performance is the equivalent of Brian Fantana in “Anchorman” unveiling his Sex Panther cologne and boasting to his colleagues that “60% of the time, it works every time”. It just doesn’t make sense. Acting on wildly incomplete information doesn’t have to be the norm for quantitative financial analysis. A disruptive trend like Machine Intelligence™, pioneered by Eureqa, lets users throw any potentially influential variables into their models; Eureqa churns through the data to identify the factors that most strongly drive financial performance and automatically builds the most accurate models possible. Eureqa can generate incomparably complete predictive models, because including additional data sets is fast, simple, and improves accuracy without increasing complexity for the end user.
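
To make the idea of automated variable screening a bit more concrete, here is a small, hypothetical sketch that sifts a wide table of candidate factors for the handful that actually move the target. It uses a conventional L1-regularized regression on synthetic data; Eureqa’s model search works differently, and every name below is invented for illustration.

```python
# Illustrative only: screen 40 made-up candidate factors with LassoCV and
# report the ones that survive. Not Eureqa's method, just a conventional stand-in.
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n, p = 500, 40
X = pd.DataFrame(rng.normal(size=(n, p)),
                 columns=[f"factor_{i}" for i in range(p)])
# Only a couple of the 40 candidate factors actually drive the target.
y = 1.5 * X["factor_3"] - 2.0 * X["factor_17"] + rng.normal(0, 0.5, n)

lasso = LassoCV(cv=5).fit(StandardScaler().fit_transform(X), y)
coefs = pd.Series(lasso.coef_, index=X.columns)
print(coefs[coefs.abs() > 1e-3].sort_values(key=abs, ascending=False))
```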

As Ms. Bartiromo well knows, information is ubiquitous. But where we believe she’s wrong is in assuming the onus will continue to be on the individual to separate the signal from the noise and analyze and act on the information he or she believes to be important. For too long, we’ve relied on individual pedaling to power financial analyses. It’s time for a hydroelectric dam.

Topics: Big data, Eureqa, Machine Intelligence

Demystifying Data Science, Part 1: My Transition from Infrastructure to Machine Intelligence

Posted by Lakshmikant Shrinivas

06.04.2015 10:00 AM

A lot of people thought I was crazy for leaving one of the hottest, most innovative big data companies of the last 10 years to join another start-up.

I graduated from UW-Madison with a PhD in databases, and worked for several years as a systems software engineer at Vertica, as deep down in the guts as you can imagine: C++, multi-threaded programming, distributed systems, process management. Towards the end of my stint there, I was leading the analytics team.

The analytics team was responsible for creating a whole slew of analytic plugins for the Vertica database engine. These plugins provided functionality like geospatial capabilities and data mining algorithms such as linear regression, SVM, etc. In the early stages, I spoke to several customers to get some feedback to guide development. The conversations usually went like this:

Me: “We’re thinking of building a library of data mining functions – things like linear regression and support vector machines – to provide predictive analytics. We were hoping to get your thoughts on which algorithms you’d find most useful.”

Customer: “Predictive analytics sounds wonderful! However, how do we tell what algorithms could be used for our business problems?”

Me: “That would be something your data scientist would know.”

Customer: “Our data-what?”

After a couple of conversations like that, we got better at targeting customers that had data scientists working for them, from whom we got the feedback we were looking for. However, this made me realize that even though tools like Vertica and Tableau have solved the problems of capturing, processing and visualizing huge quantities of data, predictive modeling is currently a very human-intensive activity. In addition, from what I can tell, data scientists are a pretty scarce resource!

Enter Nutonian. The first time I had a conversation with Michael Schmidt (the founder of Nutonian), I was very impressed with Eureqa’s ability to automatically build predictive models that are easily understandable by a non-data scientist, like me. The Eureqa core engine is able to automatically discover non-linear relationships in data: essentially a set of mathematical equations that hold true over the range of the data. I realized that this technology has the potential to really disrupt the market by making predictive analysis accessible to the masses. That’s when I decided to join Nutonian, so I could work on really exciting and impactful technology.
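
As a rough intuition for what “discovering the equations that hold true over the data” means, here is a toy sketch that searches a tiny, hand-picked space of candidate formula shapes and keeps whichever fits best. Eureqa’s evolutionary search over a vastly larger space of expressions is far more sophisticated; this only illustrates the concept, and the data is synthetic.

```python
# Toy equation search: try a few candidate functional forms, fit each by
# least squares, and keep the one with the lowest error. Illustration only.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.1, 5.0, 300)
y = 2.0 * np.sin(x) + 0.5 * x + rng.normal(0, 0.05, 300)  # hidden "true" law

# Candidate forms, each mapped to its design matrix of basis terms.
candidates = {
    "a*x + b":            lambda x: np.column_stack([x, np.ones_like(x)]),
    "a*x^2 + b*x + c":    lambda x: np.column_stack([x**2, x, np.ones_like(x)]),
    "a*sin(x) + b*x + c": lambda x: np.column_stack([np.sin(x), x, np.ones_like(x)]),
    "a*log(x) + b":       lambda x: np.column_stack([np.log(x), np.ones_like(x)]),
}

best = None
for form, design in candidates.items():
    A = design(x)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    mse = np.mean((A @ coef - y) ** 2)
    if best is None or mse < best[1]:
        best = (form, mse, coef)

print(f"best form: {best[0]}  mse={best[1]:.4f}  coefficients={np.round(best[2], 2)}")
```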

Enabling users without a math background to really understand equations would require some very innovative user interfaces and visualizations. It felt like a great opportunity to learn something new and build a product that can disrupt the market. Besides, there is something very satisfying about being able to visually show what you’ve built. This would be in stark contrast to my prior work at Vertica, which was deep in the core of an analytic database – it’s very difficult to demo a SQL prompt!

Stay tuned for next week, when I share some of the interesting projects we’ve been working on in the advanced analytics team.

Topics: Big data, Demystifying data science, Machine Intelligence

Convert Data to Intelligence, Faster

Posted by Jon Millis

20.03.2015 10:00 AM

Oil_refinery

“Data might be the new oil, but a lot of us just need gasoline,” proclaimed Derrick Harris, one of our favorite technology journalists, in a recent blog post.

With this brilliant analogy, Harris excitedly reflected on the potential implications of CrowdFlower’s Data For Everyone initiative, which makes clean data sets readily available for public use, and appears to be an exciting step in the direction of data democratization. New easy-to-analyze data will naturally yield faster, more holistic intelligence. But if Data For Everyone is just one step forward, when will we take our next giant leap?

Soon. In the analogy of oil and data, businesses are oil companies, constantly striving to churn oil (data) into gasoline (valuable information). Along comes the invention of fracking (CrowdFlower). Fracking enables access to more oil (externally available data sets) – and provides cleaner, more environmentally friendly natural gas to boot (data sets that are clean and analysis-ready).** Now, let’s pipeline that oil/natural gas over to our refineries (analysts, data scientists, statisticians and consultants), so they can grind away for weeks with machines from decades ago to massage the crude oil into gasoline.

Wait, hang on. Weeks? The oil company executives want gasoline today. But two of our most important refinery tools (SPSS and SAS) were built in the late ’60s and ’70s. Tell the execs they’re out of luck. Well, the execs say our ability to produce gasoline could spell the difference between us dominating the energy industry and becoming a niche regional player. Figure it out. (Before you knowingly nod and begin pounding your head against a table, we encourage you to keep reading.)

It’s time for some new equipment. Eureqa is the industry’s first automated, high-yield “oil” refinery that’s easy enough for almost any business or technical user to operate. Feed your data into Eureqa, and watch it be refined into pure, high-grade “gasoline”: predictive analytic models that visualize and explain in plain English what’s causing business outcomes.

Eureqa generates hundreds of millions of potential models per second, only presenting the end user with the best ones, ranging from the simple (fewer explanatory variables, high accuracy) to the complex (many explanatory variables, extreme accuracy). Drill into your favorite model to dig deeper: What are the most important explanatory variables and relationships driving my business/process? What steps should I take to maximize my desired outcome?
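
Here is a hypothetical sketch of that simple-to-complex trade-off: given candidate models scored by size and validation error (all numbers invented), keep only the frontier where no simpler model is also more accurate. That short list is what an analyst would actually review.

```python
# Illustration of a simplicity/accuracy frontier over made-up candidate models.
candidates = [
    # (formula, complexity = number of terms, validation error)
    ("y = a*x1",                   1, 0.42),
    ("y = a*x1 + b*x2",            2, 0.31),
    ("y = a*x1 + b*x2 + c*x3",     3, 0.30),
    ("y = a*x1*x2 + b*x3",         3, 0.22),
    ("y = a*x1*x2 + b*x3 + c*x4",  4, 0.21),
    ("y = nine-term kitchen sink", 9, 0.20),
]

frontier = []
best_error = float("inf")
for name, size, error in sorted(candidates, key=lambda m: (m[1], m[2])):
    if error < best_error:  # strictly better than every simpler candidate
        frontier.append((name, size, error))
        best_error = error

for name, size, error in frontier:
    print(f"size={size}  error={error:.2f}  {name}")
```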

The best part? You don’t even need the skill set of a data scientist to turn your data into business fuel. Request a demo to see for yourself.

 

**For the sake of the analogy, we’ll ignore fracking’s potential negative side-effects like contaminated groundwater…and earthquakes.

 

Topics: Big data, Eureqa

Stop Talking About Big Data

Posted by Jess Lin

26.08.2014 10:00 AM

No buzzwords. Can you really explain what “Big Data” is? If you wade through the hype for long enough, you should see that big data isn’t about the data at all – but you’ll have a long way to wade first. It’s about finally bridging the divide between potential and action. It’s about moving us from a generalized world, where machines and even people are lumped into indistinct buckets that are all supposed to behave exactly the same way, into a personalized world, where everything is recognized for its distinctive traits and unique qualities. But thanks to a highly technical origin combined with rapid enterprise adoption, big data remains an amorphous concept that people struggle to define yet can’t ignore.

With investments in big data infrastructures and data lakes increasing exponentially, everyone is wondering: now what? Forrester analysts recently completed an exhaustive study of the big data phenomenon and found that technology’s fundamental ability to drive businesses into the future is warring with widespread confusion about where to even begin. Success stories do exist, as Forrester found, and those successes are driving innovation in every market and industry. But for every success story, there are hundreds more companies that don’t understand why the new Hadoop project siloed in the IT department is bleeding money and time instead of making money and saving time.

 

“As data explodes, so do old ways of doing business” – Forrester Research

 

To drive real, actionable value, companies need to focus on the true value of big data: competitive advantage. The problem is that, up until now, everyone has tried to define big data as the sum of its properties. This line of thinking leads you down the wrong path. Obsessing over whether your data fits the three V’s, or hoarding it into enormous data lakes, will not magically lead to profitability or innovation. Even good data scientists with no business context can only do so much, not to mention how difficult it is to find those data scientists in the first place.

We rely on people to be domain experts and employ creative thinking, while we rely on our computers to be consistent and blazingly fast. So why does the idea of “big data” make us try to force one to do the job of both? It’s not about big data; it’s about looking at the big picture. If you want to unlock competitive advantage in this hyper-connected world, you need all of your assets working together. Use tools that let you quickly focus on the right data and move past opaque “insights” that don’t help you adjust for an ever-changing future. And most importantly, get the right data into the hands of the people who can actually make a difference.

Tools like Nutonian’s Eureqa® can automate and accelerate data science for business users who aren’t math experts but know their business like the back of their hand. Contact us to learn more.

Topics: Big data
