The irony of data science is that the industry seems to move at glacial and blazing speed at the same time. It’s been more than 10 years since the phrase “big data” was coined, and yet what we initially set out to accomplish – extracting valuable answers from data – is still a painstaking process. Some of this could be attributed to what Gartner refers to as the “Hype Cycle”, which hypothesizes that emerging technologies experience a predictable wave of hype, trials and tribulations before they hit full-scale market maturity: technology trigger → peak of inflated expectations → trough of disillusionment → slope of enlightenment → plateau of productivity.
The true skeptics call it all a data science bubble. But answer me this: if we’re in the midst of a bubble, how can we explain the sustained, consistent movement of tech luminaries and innovators into the market over the course of years and years? Sure, a healthy economy is full of new entrants competing for market share, creative destruction, and eventual consolidation, but take a look at this diagram and try to explain how so many people could be so wrong about data science. It’s hard to imagine we’re in a bubble when all around us is an ever-growing ecosystem of tools, technologies and investment. As we’re well aware, nothing bad happened after heaps of money were piled into mortgage-backed securities in the early 2000s, and oil speculators have made a killing off of $5/gallon gas prices in 2016.
We kid, we kid. Of course there are illogical investments and industries that miss, but we maintain our belief that there is astounding value in data. Not all companies have capitalized on it yet, but the problems, the dollars, and the benefits to society as a whole are real. Data science is here to stay.
With an ecosystem now overflowing with tools, approaches and technologies, how can we understand general market trends? What kinds of tools and technologies make up a typical company’s analytics “stack”? More importantly, where are the “first movers” moving and making investments to capitalize on data? To find out, we share general insights we’ve gleaned from talking with our customers, a mix of Fortune 500 behemoths and data-driven start-ups.
Here’s what the 2015 analytics stack looked like:
Let’s take an outside-in approach, beginning with the raw data and getting closer and closer to the end user.
Data preparation – The cleansing layer of the ecosystem, where raw streams of data are prepped for storage or analysis. E.g., Informatica, Pentaho, Tamr, Trifacta
Data management – The data storage and management layer of the ecosystem, where data sits in structured, semi-structured or unstructured form. E.g., ArcSight, Cloudera, Greenplum, Hortonworks, MapR, Oracle, Splunk, Sumo Logic, Teradata Aster, Vertica
Visualization – The visualization and dashboarding layer of the ecosystem, where business analysts can interact with, and “see”, their data and track KPIs. E.g., MicroStrategy, Qlik, Tableau
Statistical – The statistical layer of the ecosystem, where statisticians and data scientists can build analytical and predictive models to forecast future outcomes or dissect how a system/process “works” to make strategic changes. E.g., Cognos, H2O, Python, R, RapidMiner, SAS, SPSS
Simple enough, right? The most data-savvy organizations make it look like a cakewalk. But take a closer look, and you’ll notice there’s a significant difference between the outer two “orbits” and the inner orbit: the inner orbit is fragmented. This does not fit with the smooth flow of the rest of the solar system.
Why are two systems occupying the same space? Because they’re both end-user tools for analysts and data scientists that aim to deliver answers to the business team. Nutonian’s bashfully modest vision is to occupy the entire inner sphere of how people extract answers from data, with the help of “machine intelligence”. While Nutonian’s AI-powered modeling engine, Eureqa, plays nicely with statistical and visualization tools via our API, we’re encouraging companies that are either frustrated by their lack of data science productivity or have greenfield projects to invest in Eureqa as their one-size-fits-almost-all answers machine.
Our vision is to empower organizations and individual users to make smart data-driven decisions in minutes. Eureqa automates nearly everything accomplished in the statistical layer and the visualization layer of the analytics stack – with the exception of the domain expert, who remains vital to guiding Eureqa in the right direction. The innovative “first movers” in 2016 are putting the data they’ve collected to good use, and consolidating the asteroid belt of tools and technologies banging together in the inner orbit of their solar systems. It’s the simple law of conservation of [data science] energy.