Business Analytics: Simple is Better. Always.

Posted by Ben Israelite

31.01.2017

[Image: Richard Branson tweet]

Occam’s Razor, a well-known principle stating the simplest solution for a problem is often the best, has been utilized by businesses for decades to solve their most significant and complicated problems. The integration of data analytics – the pursuit of extracting meaning from raw data – into an enterprise’s decision-making process should aid in this effort. Yet, as organizations ramp up their data analytics capabilities, black box algorithms and highly convoluted predictions have been favored over concise and actionable insights.

The process of developing an analytical expression to drive successful business outcomes is very difficult. Traditionally, individuals with high-level degrees (in a STEM field) and a strong knowledge of technical tools (MATLAB, SAS, Stata, etc.) and programming languages (Python or R) spend weeks solving problems with data they spent months collecting, aggregating and transforming. The level of effort needed to implement this approach in big businesses is staggering but, seemingly, necessary in order to extract value from the vast amounts of data large companies are paying millions of dollars a year to store. The output generated through this approach, which ranges from simple linear models to black box machine learning algorithms (neural networks, SVMs, etc.), provides a prediction of what will happen but does not provide increased understanding or insights into decisive actions that can drive business results. Predictive accuracy became the most important metric in analytics because being right was prioritized over providing understanding.

The time has come for this to change. Companies must harness simplicity in order to generate significant business value moving forward. Rather than simply learning what will happen (sales tomorrow will be x), companies need to also understand why it will happen (marketing takes two weeks to influence sales, weather impacts in-person purchases, etc.).

The only way to do the latter is to build simple, interpretable (parsimonious!) models. Simple models deliver results that are as accurate as black box approaches but impact the business much more profoundly. It is time for companies to stop hiding behind their initial approach to predictive modeling and jump head first into the future of machine intelligence.

How to Master Business Planning with Time Series Forecasting

Posted by Jess Lin

17.01.2017

Every analyst report, news article, and business conference has drummed into our collective minds that predictive analytics is the way of the future. If you can predict future sales, you can funnel that knowledge into driving optimal sales and business operations. Smart businesses are leveraging big data insights to leave their competitors in the dust, or at least so they say.

As this new time series forecasting tech makes its way to the enterprise, trumpeting billions of dollars behind it, you feel ready for the challenge. You’ve learned a bewildering array of new vocabulary, sat through endless meetings on empowering a data-driven mindset, invested in top of the line data warehouses, and cracked your allotted share of jokes on the size of your Big Data. Best of all, you’ve somehow even managed to hire a crack data science team. Profit?

Not so fast.

[Image: The Three Stages of Data Science...]

Before those stacks of paper come rolling in and you deliver on all your ROI promises, let’s walk through this step by step.

  • Phase 1: Connect a beautiful data pipeline of well-cleaned and transformed data (yeah, we all wish) over to your data scientists.
  • Phase 2: Convert data into actionable insights through predictive analytics.
  • Phase 3: Feed those insights into well-executed, shareable dashboards that enable data-driven business planning decisions.

Even skipping over the problematic phases 1 and 3, let’s level set on what your data scientists can do. Even a team of rock star data scientists has limits, and working at top efficiency can still only produce a limited number of predictive models. Do you want dependable results at frequent, regular intervals? Best you can get is a single prediction for overall sales. Do you want per-unit forecasts across thousands of SKUs? Be prepared for a wait and infrequent forecast updates. Bringing in consulting firms doesn’t solve the problem either – fees can run into the multi-million dollar range for a years-long engagement that still only addresses part of your needs.

This is the situation that a major car manufacturer found itself in when it came across Eureqa. With ever-growing and increasingly complex data streaming in from all sources, the company had outgrown its existing, manual methods of data analysis and needed a new solution that could handle its data and provide innovative predictive analytics. Nutonian’s AI-powered modeling engine, Eureqa, seemed like the perfect fit.

Can Eureqa automatically ingest and analyze hundreds of data sources at once? Check ✓

Can Eureqa provide hundreds of highly accurate sales forecasts on a monthly basis across all levels of operations, from high-level national forecasts all the way down to individual dealers? Check ✓

Can Eureqa allow end-users to manipulate model results and simulate the impact on the sales forecast given different potential scenarios? Check ✓

Can Eureqa isolate and quantify the impact of individual attributes on the sales forecast? Check ✓

Can Eureqa identify and automatically recommend the highest-impact actions for both a corporate strategist and an individual dealership? Check ✓

So we did all of that, and wrapped it up into a user-friendly application that works across the entire business team. With a marketing budget that runs into the millions, corporate can optimize their ad spend mix by using Eureqa to pinpoint where to increase spend and where spend can be safely decreased. At the same time, managers can use Eureqa to evaluate upcoming sales forecasts for their dealerships and design the best action plan for success, both short-term and long-term. By providing actionable insights and leveraging AI to automate interpretable predictive modeling, Eureqa is the “wheel deal” for this client.

Eureqa doesn’t stop at forecasting sales, either. Sales forecasts can be leveraged into powerful use cases like optimizing staffing, pricing, inventory, supply chain, marketing, and store expansion. Eureqa’s fundamental time series forecasting approach can also be applied to detecting financial fraud, predicting equipment maintenance needs, building a portfolio trading strategy, and more.

What could you do with thousands of accurate, automated, and actionable time series forecasts? Reach out to us to see how we can work together!

Eureqa Hits Wall Street; Automatically Identifies Key Predictive Relationships

Posted by Jason Kutarnia

01.12.2016

As a team of data scientists, analysts and software developers, we didn’t expect to be praised as financial gurus. But in an industry of ever-present uncertainty, with huge financial gains and losses at stake, Eureqa, the dynamic modeling engine, displays a unique competitive advantage in the technology stack: the ability to quickly derive extremely accurate and simple-to-understand models that predict what will happen in the future, and why.

Typically, Wall Street employs elite squadrons of quants and analysts to build models to make forecasts about where individual stocks and other financial instruments are headed. Some firms, such as the consistently elite hedge funds, make delightful profits by “beating the market,” i.e., outperforming an industry-standard index like the S&P 500. Other financial institutions make their money simply from the fees and commissions they charge. The laggards have significant room for improvement: instead of leveraging only industry news and well-known metrics like return on equity, price/earnings ratio and idiosyncratic volatility, they could use stockpiles of data to search for signals and early indicators that an investment is primed to tumble or soar. Hunches and over-simplified metrics should be a thing of the past, and the proof should be in the pudding (the data). Some things, like natural disasters and leadership changes, are not always part of the data, but for everything else…there’s Mastercard. Err, Eureqa.

And for those overachievers – the hedge funds, the private wealth management firms, the day traders – who think they have mastered their own domain, we’re here to tell you, there’s a lot of room for improvement. Financial models are time-consuming to build, often taking weeks or months to refine…and meanwhile, the markets, whether moving up or down, are making people money while you’re on the sidelines crafting your models. In addition to the time sink, models built by hand with tools like R and SAS are not as accurate as they could be, nor are they easy to interpret. The result is that firms are leaving millions on the table, and not understanding why the markets or assets behave as they do. It’s one thing to predict that real estate will beat the market in 2017, based on an algorithm that contains 2,000 variables and mind-numbingly complex transformations of those variables. But what if I could accurately predict that real estate in the Northeast U.S. will appreciate 10-12%, while I should leave the Midwest untouched, and that the “drivers” of this growth will be 4 truly impactful variables: demographic growth of Millennials moving into the cities, wage increases, job growth, and a slowing of new construction permits? I could not only make more money, but I could justify all of my investments beforehand with a comprehensive understanding of “how things work.”

In order to validate Eureqa’s approach to a major investment firm, I built a simple trading strategy using the stocks in the S&P 500. The goal was to forecast whether a stock’s excess monthly return – the difference between the stock’s return and the overall S&P 500 return – would be positive or negative. In our strategy, we bought a stock if Eureqa predicted its excess return would be positive, and we shorted any stocks Eureqa thought would be negative.
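The trading rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, tickers, and return figures are hypothetical, and the boolean predictions stand in for whatever a classification model would output. It is not Eureqa's API or the actual strategy code.

```python
def excess_return(stock_return: float, index_return: float) -> float:
    """Excess return: the stock's return minus the index's return for the same month."""
    return stock_return - index_return


def position(predicted_excess_positive: bool) -> str:
    """Go long when a positive excess return is predicted, otherwise short."""
    return "long" if predicted_excess_positive else "short"


# Hypothetical monthly returns (as fractions) for two made-up tickers.
index_return = 0.010
stock_returns = {"AAA": 0.030, "BBB": -0.005}

# Hypothetical model output: True means "excess return predicted to be positive".
predictions = {"AAA": True, "BBB": False}

for ticker, predicted_positive in predictions.items():
    realized = excess_return(stock_returns[ticker], index_return)
    print(ticker, position(predicted_positive), f"realized excess return: {realized:+.3f}")
```

The target is framed as a sign (positive or negative) rather than a magnitude, which is what makes this a classification problem rather than a regression problem.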

Immediately, the client saw the enormous value of Eureqa. Leveraging publicly available data sets through 2014, in a matter of a few hours Eureqa created classification models unique to each industry (retail, finance, technology, etc.), and we plugged individual companies into the models to predict whether each stock would achieve a positive excess return in 2015. We then hypothetically created a simple, equally weighted portfolio of the predicted “overachievers”. Remarkably, Eureqa’s anticipated winning portfolio achieved a compound excess return of 14.1% for the following year, compared with the S&P 500’s disappointing -0.7%. Not only was our portfolio’s performance exceptional, but so was our fundamental understanding of the causes of its success. We could convey to our hypothetical clients, bosses and others that not only did our strategy work this year, but it’s likely to work again next year, because some of the key drivers of excess returns for stock X are variables Q, R, S, T, U and V, and this is how it’ll move in the context of the current economy. In a matter of hours, with Eureqa at my side, a graduate student in tissue motion modeling transformed into a powerful financial analyst with a theoretical market-beating investment portfolio. Now, imagine what this application could do with even more data, and in the hands of a true industry expert…
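For readers who want to see how a figure like a compound excess return might be computed, here is a rough sketch. It assumes an equally weighted portfolio whose monthly excess return (portfolio return minus index return) is compounded over the period; the post does not spell out the exact calculation, so both that definition and the helper names below are assumptions, and the numbers are made up.

```python
from math import prod


def equal_weight_return(stock_returns):
    """Return of an equally weighted portfolio: the plain average of its members' returns."""
    return sum(stock_returns) / len(stock_returns)


def compound(returns):
    """Compound a sequence of periodic returns into a single cumulative return."""
    return prod(1.0 + r for r in returns) - 1.0


def compound_excess_return(monthly_stock_returns, monthly_index_returns):
    """Compound the monthly gap between the portfolio's return and the index's return."""
    monthly_excess = [
        equal_weight_return(stocks) - index
        for stocks, index in zip(monthly_stock_returns, monthly_index_returns)
    ]
    return compound(monthly_excess)


# Hypothetical data: two months, two stocks in the portfolio.
stocks_by_month = [[0.02, 0.04], [0.01, 0.03]]
index_by_month = [0.01, 0.01]
print(f"{compound_excess_return(stocks_by_month, index_by_month):.4f}")
```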

Machine Intelligence with Michael Schmidt: Searching data for causation

Posted by Michael Schmidt

27.07.2016

The holy grail of data analytics is finding “causation” in data: identifying which variables, inputs, and processes are driving the outcome of a problem. The entire field of econometrics, for example, is dedicated to studying and characterizing where causation exists. Actually proving causation, however, is extremely difficult, typically involving carefully controlled experiments. To even get started, analysts need to know which variables are important to include in the evaluation, which need to be controlled for, and which to ignore. From there, they can build a model, design an experiment to test its causal predictions, and iterate until they arrive at a conclusion.

Proving causation relies heavily on these smart assumptions. What if you forgot to control for age, demographics, or socioeconomic conditions? It’s difficult to figure out how to start framing the problem to analyze causal impact. But this is a task that machines were born to solve.

There are two important steps required to identify causation: 1) among many possible variables, finding the few that are actually relevant, and 2) given a limited set of variables, executing the transformations needed to reveal the extent of each variable’s impact.
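To make the first step concrete, here is a deliberately crude sketch that ranks candidate variables by the strength of their linear correlation with the outcome. This is a stand-in for illustration only: it is not how Eureqa searches for models, correlation alone says nothing about causation, and the variable names and numbers are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / sqrt(var_x * var_y)


def screen_variables(candidates, outcome, top_k=2):
    """Keep the top_k candidate variables with the largest absolute correlation to the outcome."""
    ranked = sorted(
        candidates,
        key=lambda name: abs(pearson(candidates[name], outcome)),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical data: weekly sales and three candidate explanatory variables.
sales = [10, 12, 14, 16, 18]
candidates = {
    "ad_spend": [1, 2, 3, 4, 5],
    "noise": [3, 1, 4, 1, 5],
    "temperature": [5, 4, 3, 2, 1],
}
print(screen_variables(candidates, sales))
```

A screen like this only narrows the field. The second step, finding the transformations that reveal how much each remaining variable matters, and the judgment about whether a relationship is causal still have to follow.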


For the first time, there exists software that helps companies reliably determine causation from raw, seemingly chaotic data.

People often use Eureqa for its ability to start from the ground up and “think like a scientist,” sifting through billions of potential models, structures, and nonlinear operations from scratch to create the ideal analytical model for your unique dataset – without needing to know the important variables or model algorithm ahead of time. Eureqa’s modeling engine effectively generates theories of causation via its process of building analytical models from a dataset. Eureqa doesn’t attempt to prove causality on its own, but instead yields a very special form of model that can be interpreted physically for causal effects.

One of the biggest open problems in machine learning (and analytics in general) is avoiding spurious correlations and similar non-causal effects. In fact, there’s likely no perfect solution despite the advances we’ve made; ultimately a person needs to interpret the findings and provide context not contained in the data alone. Two of the most-used features in Eureqa are the covariates window and the ability to block and replace variables in a model – features we’ve added specifically so users can interact with the software while modeling complex systems.

There is some exciting research taking place, however, connecting Eureqa to live biological experiments to automatically guide experimentation and test predictions. While this research is still ongoing, perhaps a physical robot scientist is around the corner.

The "First Mover's" Analytics Stack, 2015 vs. 2016

Posted by Jon Millis

01.07.2016

The irony of data science is the simultaneously glacial and blazing speed at which the industry seems to move. It’s been more than 10 years since the origin of the phrase “big data”, and yet what we initially set out to accomplish – extracting valuable answers from data – is still a painstaking process. Some of this could be attributed to what Gartner refers to as the “Hype Cycle”, which hypothesizes that emerging technologies experience a predictable wave of hype, trials and tribulations before they hit full-scale market maturity: technology trigger → peak of inflated expectations → trough of disillusionment → slope of enlightenment → plateau of productivity.

The true skeptics call it all a data science bubble. But answer me this. If we’re in the midst of a bubble, how can we explain the sustained, consistent movement of tech luminaries and innovators into the market over the course of years and years? Sure, a healthy economy is full of new competitors competing for market share, creative destruction, and eventual consolidation, but take a look at this diagram and try to explain how so many people could be so wrong about data science? It’s hard to imagine we’re in a bubble when all around us is an indefinitely growing ecosystem of tools, technologies and investment. As we’re well aware, nothing bad happened after heaps of money were piled into mortgage-backed securities in the early 2000s, and oil speculators have made a killing off of $5/gallon gas prices in 2016.

We kid, we kid. Of course there are illogical investments and industries that miss, but we maintain our belief that there is astounding value in data. Not all companies have capitalized on it yet, but the problems, the dollars, and the benefits to society as a whole are real. Data science is here to stay.

With an ecosystem now overflowing with tools, approaches and technologies, how can we understand general market trends? What kinds of tools and technologies make up a typical company’s analytics “stack”? More importantly, where are the “first movers” moving and making investments to capitalize on data? To find out, we share general insights we’ve gleaned from talking with our customer base and clients, a mix of Fortune 500 behemoths and data-driven start-ups.

Here’s what the 2015 analytics stack looked like:


Let’s take an outside-in approach, beginning with the raw data and getting closer and closer to the end user.

Data preparation – The cleansing layer of the ecosystem, where raw streams of data are prepped for storage or analysis. E.g., Informatica, Pentaho, Tamr, Trifacta

Data management – The data storage and management layer of the ecosystem, where data sits either structured, semi-structured or unstructured. E.g., ArcSight, Cloudera, Greenplum, Hortonworks, MapR, Oracle, Splunk, Sumo Logic, Teradata Aster, Vertica

Visualization – The visualization and dashboarding layer of the ecosystem, where business analysts can interact with, and “see”, their data and track KPIs. E.g., MicroStrategy, Qlik, Tableau

Statistical – The statistical layer of the ecosystem, where statisticians and data scientists can build analytical and predictive models to predict future outcomes or dissect how a system/process “works” to make strategic changes. E.g., Cognos, H2O, Python, R, RapidMiner, SAS, SPSS

Simple enough, right? The most data-savvy organizations make it look like a cakewalk. But take a closer look, and you’ll notice there’s a significant difference between the outer two “orbits” and the inner orbit: the inner orbit is fragmented. This does not fit with the smooth flow of the rest of the solar system.

Why are two systems occupying the same space? Because they’re both end-user analyst and data science tools that aim to deliver answers to the business team. Nutonian’s bashfully modest vision is to occupy the entire inner sphere of how people extract answers from data, with the help of “machine intelligence”. While Nutonian’s AI-powered modeling engine, Eureqa, plays nicely with statistical and visualization tools via our API, we’re encouraging companies who are either frustrated by their lack of data science productivity or who have greenfield projects to invest in Eureqa as their one-size-fits-almost-all answers machine.

Our vision is to empower organizations and individual users to make smart data-driven decisions in minutes. Eureqa automates nearly everything accomplished in the statistical layer and the visualization layer of the analytics stack – with the exception of the domain expert, who remains vital to guiding Eureqa in the right direction. The innovative “first movers” in 2016 are putting the data they’ve collected to good use, and consolidating the asteroid belt of tools and technologies banging together in the inner orbit of their solar systems. It’s the simple law of conservation of [data science] energy.

