Data Scientists Don’t Scale. Machine Intelligence Does.

Posted by Jon Millis

29.06.2015 12:30 PM

Data scientists are unicorns. We know they exist, we know they’re magical, we know they hold the answers to many of our business intelligence hopes and dreams. Unfortunately, the current market runs into a trio of problems in trying to find, tame and leverage these unicorns.

  1. Data scientists are hard to find and attract.
  2. If you’re lucky enough to have a data scientist, he or she may already be overwhelmed with questions.
  3. Data scientists use pixie dust: complex programming languages and models. If a chef came to your office, cooked you a meal using pixie dust, and promised it was going to be delicious for the company, wouldn’t you have some questions before you fed it to all your colleagues?
    “Chefman. What’s in this?”
    “Powdery white stuff.”
    “And what’s the white stuff made out of?”
    “Ingredients that I collected.”
    “How do I know it’s good for me?”
    “Because I’m a professional chef…”

    Headdesk.

Side note: if you are a data scientist, God bless you, and come use us. With short time-to-answer cycles and transparent models, you’ll be like Superman without kryptonite.

If you’re not a data scientist, welcome back to my fantasy world, where I haven’t used the words “unicorn” or “pixie dust” since my parents left my merciless older sister home alone with me in grade school.

Stuart Frankel, a guest columnist for Harvard Business Review, recently published an article explaining that executives are finally becoming frustrated with their big data investments. Despite pumping large budgets into storage, analysis, reporting and visualization technologies, employees are still burning the midnight oil manually creating reports and interpretations of their data. Stuart notes:

“To solve this problem and increase utilization of existing solutions, organizations are now contemplating even further investment, often in the form of $250,000 data scientists (if all of these tools we’ve purchased haven’t completely done the trick, surely this guy will!). However valuable these PhDs are, the organizations that have been lucky enough to secure these resources are realizing the limitations in human-powered data science: it’s simply not a scalable solution. The great irony is of course that we have more data and more ways to access that data than we’ve ever had; yet we know we’re only scratching the surface with these tools.”

Instead of throwing in the towel on their big data initiatives, execs are doubling down and going data scientist-hunting. They know there’s gold in the data, but they’re having triple-bypasses trying to find it. They could wait on the sidelines and risk their competition making a breakthrough before them, or they could attempt the annexation of Puerto Rico to win the game. Naturally, as Little Giants fans, many execs are attempting the annexation of Puerto Rico.

So, where does this leave the market? Short.

Data scientists are rare commodities. The U.S. alone faces a shortage of 140,000-190,000 of them. What does that mean? An arms race for talent, all while data generation is increasing exponentially and the tools (R, SAS, Python, etc.) remain largely the same. So while the Googles, the Amazons, the ExxonMobils and the Walmarts may be able to shovel out the dough to attract data science talent, most companies can’t, including many in the Fortune 500. And this is not a market that will quickly “clear” to match supply with demand; massive salaries have not been enough to lure enough people into data science training. One of the largest financial institutions in the world, now evaluating Eureqa, said it’s footing the bill to send 30 analysts to earn master’s degrees in data science, a program that won’t return them to the company for years, because it was pessimistic it could fill the void in the open market. What data scientists do (curate data, ask the right questions, build explanatory analytical models, implement the models in various applications) is simply not scaling at the pace of demand.

Oh, is that all?

If we can’t address the problem from the demand side, let’s address it from the supply side. Give people the tools they want, that they need, to be successful information archaeologists and unearth transparent analytical models that communicate the patterns and relationships hidden within the data. Eureqa is automated machinery that does all the heavy lifting. We’ll help an analyst unearth dinosaurs, with museum-ready biographies attached. And with the speed and talents that data scientists already have, we’ll help them fill an entire museum.

Analytical models are the bridge between raw data and meaning. They impose structure on chaos, isolating the important factors driving a system. They explain how something works: how everything is connected to form the whole, how each individual input blends to drive an outcome. The engine of big data, of big insights, is fundamentally the analytical model. But models have shortcomings, too. They typically take weeks to hypothesize and build, and the work is inherently manual and technical. Eureqa, by contrast, is entirely automated, leveraging an evolutionary algorithm that searches an infinite equation space to bring the user the simplest, most accurate models that explain the data’s behavior. Models are also typically difficult to interpret. Eureqa has an “explainer” section that translates the models into plain English, and even features an “interactive explainer” module that lets users toggle variable values to simulate “what-if” scenarios in real time. These are things that data scientists could only dream about years ago.
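For the curious, here is a toy sketch of what an evolutionary search over equations can look like. It is not Eureqa’s actual algorithm or API. Every name in it (random_expr, mutate, score, evolve), the three-operator vocabulary, and the size penalty are assumptions made purely for illustration. The idea it shows: generate random equations, keep the ones that fit the data with the fewest parts, mutate the survivors, and repeat.

```python
import math
import random

# Hypothetical, deliberately tiny operator vocabulary for candidate equations.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def random_expr(rng, depth=2):
    """Return a random expression tree: 'x', a constant, or (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else float(rng.randint(-3, 3))
    return (rng.choice(list(OPS)),
            random_expr(rng, depth - 1),
            random_expr(rng, depth - 1))

def evaluate(expr, x):
    """Compute the value of an expression tree at input x."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if expr == "x" else expr

def size(expr):
    """Count the nodes in a tree; used as a crude measure of complexity."""
    if not isinstance(expr, tuple):
        return 1
    return 1 + size(expr[1]) + size(expr[2])

def mutate(rng, expr):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if not isinstance(expr, tuple) or rng.random() < 0.3:
        return random_expr(rng)
    op, left, right = expr
    if rng.random() < 0.5:
        return (op, mutate(rng, left), right)
    return (op, left, mutate(rng, right))

def score(expr, data, penalty=0.01):
    """Mean squared error plus a size penalty (lower is better)."""
    err = 0.0
    for x, y in data:
        diff = evaluate(expr, x) - y
        err += diff * diff
    err /= len(data)
    # Overflowing or undefined candidates are treated as infinitely bad.
    return err + penalty * size(expr) if math.isfinite(err) else math.inf

def evolve(data, generations=60, pop_size=40, seed=0):
    """Keep the better half each generation and refill with mutated copies."""
    rng = random.Random(seed)
    population = [random_expr(rng) for _ in range(pop_size)]
    history = []  # best score seen at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda e: score(e, data))
        history.append(score(population[0], data))
        survivors = population[: pop_size // 2]
        children = [mutate(rng, rng.choice(survivors)) for _ in survivors]
        population = survivors + children
    best = min(population, key=lambda e: score(e, data))
    return best, history + [score(best, data)]

if __name__ == "__main__":
    # Made-up example data: points sampled from y = x*x + x.
    data = [(float(x), float(x * x + x)) for x in range(-5, 6)]
    best, history = evolve(data)
    print("best equation found:", best)
    print("score went from", history[0], "to", history[-1])
```

Because the better half of each generation is carried over unchanged, the best score never gets worse from one generation to the next. Whether the search lands on the exact generating equation depends on the random seed and the settings, which is one reason a production tool needs far more machinery than this.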

It may be true that data scientists don’t scale. But answers do. Evaluate Eureqa to see for yourself.

Topics: Eureqa, Scaling data science
