Data Scientists Don’t Scale. Machine Intelligence Does.

Posted by Jon Millis

29.06.2015 12:30 PM

Data scientists are unicorns. We know they exist, we know they’re magical, we know they hold the answers to many of our business intelligence hopes and dreams. Unfortunately, the current market encounters a tri-fold of problems in trying to find, tame and leverage these unicorns.

  1. Data scientists are hard to find and attract.
  2. If you’re lucky enough to have a data scientist, he/she may already be overwhelmed with questions.
  3. Data scientists use pixie dust: complex programming languages and models. If a chef came to your office and cooked you a meal using pixie dust, and he promised it was going to be delicious for the company, wouldn’t you have some questions before you went feeding it to all your colleagues?
    “Chefman. What’s in this?”
    “Powdery white stuff.”
    “And what’s the white stuff made out of?”
    “Ingredients that I collected.”
    “How do I know it’s good for me?”
    “Because I’m a professional chef…”

    headdesk “

Side note: if you are a data scientist, God bless you, and come use us. You’ll be like Superman without a kryptonite (short time-to-answer cycles and transparent models).

If you’re not a data scientist, welcome back to my fantasy world, where I haven’t used the words “unicorn” or “pixie dust” since my parents left my merciless older sister home alone with me in grade school.

Stuart Frankel, a guest columnist for Harvard Business Review, recently published an article explaining that executives are finally becoming frustrated with their big data investments. Despite pumping large budgets into storage, analysis, reporting and visualization technologies, employees are still burning the midnight oil manually creating reports and interpretations of their data. Stuart notes:

To solve this problem and increase utilization of existing solutions, organizations are now contemplating even further investment, often in the form of $250,000 data scientists (if all of these tools we’ve purchased haven’t completely done the trick, surely this guy will!). However valuable these PhDs are, the organizations that have been lucky enough to secure these resources are realizing the limitations in human-powered data science: it’s simply not a scalable solution. The great irony is of course that we have more data and more ways to access that data than we’ve ever had; yet we know we’re only scratching the surface with these tools.

Instead of throwing in the towel on their big data initiatives, execs are doubling down and going data scientist-hunting. They know there’s gold in the data, but they’re having triple-bypasses trying to find it. They could wait on the sidelines and risk their competition making a breakthrough before them, or they could attempt the annexation of Puerto Rico to win the game. Naturally, as Little Giants fans, many execs are attempting the annexation of Puerto Rico.

So, where does this leave the market? Short.

Data scientists are rare commodities. The U.S. alone faces a shortage of 140,000-190,000 of them. What does that mean? An arms race for talent…all while data generation is increasing exponentially and the tools (R, SAS, Python, etc.) are remaining similar. So while the Googles, the Amazons, the ExxonMobils and the Walmarts may be able to shovel out the dough to attract data science talent, most can’t, even many companies as well-off as the Fortune 500s. And this is not a market that will quickly “clear” to match supply with demand; massive salaries have not been enough to lure more data scientists into training. One of the largest financial institutions in the world, now evaluating Eureqa, said it’s footing the bill to send 30 analysts to receive their Masters in Data Science, a program that won’t return them back to the company for years, because they were pessimistic they could fill the void in the open market. What data scientists do – curate data, ask the right questions, build explanatory analytical models, implement the models into various applications – is simply not scaling at the pace of demand.

Oh, is that all?

If we can’t address the problem from the demand side, let’s address it from the supply side. Give people the tools they want, that they need, to be successful information archaeologists and unearth transparent analytical models that communicate the patterns and relationships hidden within the data. Eureqa is automated machinery that does all the heavy lifting. We’ll help an analyst unearth dinosaurs, with museum-ready biographies attached. And with the speed and talents that data scientists already have, we’ll help them fill an entire museum.

Analytical models are the bridge between raw data and meaning. They impose structure on chaos, isolating the important factors driving a system. They explain how something works: how everything is connected to form the whole, how each individual input blends to drive an outcome. The engine of big data, of big insights, fundamentally is the analytical model. But models have shortcomings, too. They typically take weeks to hypothesize and to build. They’re inherently manual, technical. Eureqa is entirely automated, leveraging an evolutionary algorithm that searches an infinite equation space to bring the user the simplest, most accurate models that explain the data’s behavior. Models are typically difficult to interpret. Eureqa has an “explainer” section that translates the models into plain English, and even features an “interactive explainer” module that allows users to toggle changes in variable values to simulate “what-if” scenarios in real time. These are things that data scientists years ago could only dream about.

It may be true that data scientists don’t scale. But answers do. Evaluate Eureqa to see for yourself.

Topics: Eureqa, Scaling data science

Demystifying Data Science, Part 3: Scaling data science

Posted by Lakshmikant Shrinivas

15.05.2015 09:45 AM

In my last post in this series, I spoke about what goes into a data science workflow. The current state of the art in data science is not ideal; the value of data is limited by our understanding of it, and the current process to go from data to understanding is pretty tedious. The right tools make all the difference. Imagine cutting a tree with an axe instead of a chainsaw. If you were cutting trees for a living, wouldn’t you prefer the chainsaw? Even if you only had to cut trees occasionally, wouldn’t you prefer a chainsaw, because, well, chainsaw! The key here is automation. Ideally you want as much of a process automated as you can, for the sake of productivity.

With data science, the two major bottlenecks are wrangling with data and wrangling with models. Wrangling with data involves gathering and transforming data into a form suitable for modeling. There are several companies that deal with data wrangling – for example, the entire ETL industry. Wrangling with models involves creating and testing hypotheses, building, testing and refining features and models. Eureqa can help with the model wrangling. It is the chainsaw that completely automates the process of creating, testing, refining and sharing models.

As I mentioned in last post, the goals of modeling are pretty simple to express. We want to figure out if a) all terms in our model are important, and b) we’ve missed any term that would improve the accuracy significantly. Eureqa uses evolutionary algorithms to automatically create and test linear as well as non-linear models, sort of like the infinite monkey theorem. Except in our case, with the advances in computation, and of course our secret sauce, the “eventually find models” practically translates to a few minutes or hours – much faster than any human could do it.

If you pause for a moment to think about it, it’s pretty powerful and liberating. As a would-be data scientist, using such a tool frees up your time to focus on the more creative aspects of data science. For example, what other data could we pull in that might affect our problem? What other types of problems could we model with our data? As a non-data scientist, using such a tool lowers the barrier to entry for modeling. Imagine having a personal army of robotic data scientists at your beck and call.

For me, this is one of the most exciting aspects of Nutonian’s technology. While most of the world is still talking about scaling analytics to ever-growing amounts of data, Eureqa can scale analytics to the most precious resource of all: people.

Topics: Big data, Demystifying data science, Scaling data science

Follow Me

Posts by Topic

see all