Blog

Data Scientists Don’t Scale. Machine Intelligence Does.

Posted by Jon Millis

29.06.2015 12:30 PM

Data scientists are unicorns. We know they exist, we know they’re magical, we know they hold the answers to many of our business intelligence hopes and dreams. Unfortunately, the market runs into three problems when it tries to find, tame and leverage these unicorns.

  1. Data scientists are hard to find and attract.
  2. If you’re lucky enough to have a data scientist, he/she may already be overwhelmed with questions.
  3. Data scientists use pixie dust: complex programming languages and models. If a chef came to your office and cooked you a meal using pixie dust, and he promised it was going to be delicious for the company, wouldn’t you have some questions before you went feeding it to all your colleagues?
    “Chefman. What’s in this?”
    “Powdery white stuff.”
    “And what’s the white stuff made out of?”
    “Ingredients that I collected.”
    “How do I know it’s good for me?”
    “Because I’m a professional chef…”

    *headdesk*

Side note: if you are a data scientist, God bless you, and come use us. You’ll be like Superman without kryptonite (short time-to-answer cycles and transparent models).

If you’re not a data scientist, welcome back to my fantasy world, where I haven’t used the words “unicorn” or “pixie dust” since my parents left my merciless older sister home alone with me in grade school.

Stuart Frankel, a guest columnist for Harvard Business Review, recently published an article explaining that executives are finally becoming frustrated with their big data investments. Despite pumping large budgets into storage, analysis, reporting and visualization technologies, employees are still burning the midnight oil manually creating reports and interpretations of their data. Stuart notes:

To solve this problem and increase utilization of existing solutions, organizations are now contemplating even further investment, often in the form of $250,000 data scientists (if all of these tools we’ve purchased haven’t completely done the trick, surely this guy will!). However valuable these PhDs are, the organizations that have been lucky enough to secure these resources are realizing the limitations in human-powered data science: it’s simply not a scalable solution. The great irony is of course that we have more data and more ways to access that data than we’ve ever had; yet we know we’re only scratching the surface with these tools.

Instead of throwing in the towel on their big data initiatives, execs are doubling down and going data scientist-hunting. They know there’s gold in the data, but they’re having triple-bypasses trying to find it. They could wait on the sidelines and risk their competition making a breakthrough before them, or they could attempt the annexation of Puerto Rico to win the game. Naturally, as Little Giants fans, many execs are attempting the annexation of Puerto Rico.

So, where does this leave the market? Short.

Data scientists are rare commodities. The U.S. alone faces a shortage of 140,000-190,000 of them. What does that mean? An arms race for talent…all while data generation is increasing exponentially and the tools (R, SAS, Python, etc.) are remaining similar. So while the Googles, the Amazons, the ExxonMobils and the Walmarts may be able to shovel out the dough to attract data science talent, most can’t, including many companies as well-off as those in the Fortune 500. And this is not a market that will quickly “clear” to match supply with demand; massive salaries have not been enough to lure more data scientists into training. One of the largest financial institutions in the world, now evaluating Eureqa, said it’s footing the bill to send 30 analysts to earn master’s degrees in data science, a program that won’t return them to the company for years, because it was pessimistic that it could fill the void on the open market. What data scientists do – curate data, ask the right questions, build explanatory analytical models, implement the models into various applications – is simply not scaling at the pace of demand.

[Image: data scientist qualifications]
Oh, is that all?

If we can’t address the problem from the demand side, let’s address it from the supply side. Give people the tools they want, that they need, to be successful information archaeologists and unearth transparent analytical models that communicate the patterns and relationships hidden within the data. Eureqa is automated machinery that does all the heavy lifting. We’ll help an analyst unearth dinosaurs, with museum-ready biographies attached. And with the speed and talents that data scientists already have, we’ll help them fill an entire museum.

Analytical models are the bridge between raw data and meaning. They impose structure on chaos, isolating the important factors driving a system. They explain how something works: how everything is connected to form the whole, how each individual input blends to drive an outcome. The engine of big data, of big insights, fundamentally is the analytical model. But models have shortcomings, too. They typically take weeks to hypothesize and to build. They’re inherently manual, technical. Eureqa is entirely automated, leveraging an evolutionary algorithm that searches an infinite equation space to bring the user the simplest, most accurate models that explain the data’s behavior. Models are typically difficult to interpret. Eureqa has an “explainer” section that translates the models into plain English, and even features an “interactive explainer” module that allows users to toggle changes in variable values to simulate “what-if” scenarios in real time. These are things that data scientists years ago could only dream about.
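To make “simplest, most accurate” concrete, here is a minimal Python sketch of one common way to frame that trade-off: score each candidate model on error and on complexity, then keep only the candidates that no other candidate beats on both. This is an illustration, not Eureqa’s actual implementation. The `Candidate` type, the hand-written candidate formulas, the complexity counts and the toy data are all assumptions made up for the example.

```python
from typing import Callable, List, NamedTuple, Tuple


class Candidate(NamedTuple):
    formula: str                       # human-readable form of the model
    predict: Callable[[float], float]  # the model itself
    complexity: int                    # hypothetical size measure (rough count of terms/operators)


def mean_squared_error(model: Callable[[float], float], xs: List[float], ys: List[float]) -> float:
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def pareto_front(candidates: List[Candidate], xs: List[float], ys: List[float]) -> List[Tuple[Candidate, float]]:
    """Keep the models that no other model matches or beats on both error and complexity."""
    scored = [(c, mean_squared_error(c.predict, xs, ys)) for c in candidates]
    front = []
    for c, err in scored:
        dominated = any(
            o.complexity <= c.complexity and o_err <= err
            and (o.complexity < c.complexity or o_err < err)
            for o, o_err in scored
        )
        if not dominated:
            front.append((c, err))
    return sorted(front, key=lambda pair: pair[0].complexity)


# Toy data generated from y = 2x + 1 (invented for this example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

candidates = [
    Candidate("y = 5", lambda x: 5.0, 1),
    Candidate("y = 2x", lambda x: 2 * x, 2),
    Candidate("y = x^2", lambda x: x ** 2, 2),
    Candidate("y = 2x + 1", lambda x: 2 * x + 1, 3),
    Candidate("y = 2x + 1 + 0*x^2", lambda x: 2 * x + 1 + 0 * x ** 2, 6),
]

front = pareto_front(candidates, xs, ys)
for candidate, error in front:
    print(f"complexity={candidate.complexity}  error={error:.2f}  {candidate.formula}")
```

The bloated six-unit formula fits the data perfectly but is dropped, because a three-unit formula fits just as well. That is the sense in which a search can prefer the simplest model among equally accurate ones.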

It may be true that data scientists don’t scale. But answers do. Evaluate Eureqa to see for yourself.

Topics: Eureqa, Scaling data science

Letter from a Grateful Hobbyist Who’s Predicting the Financial Markets with Eureqa

Posted by Jon Millis

22.06.2015 02:00 PM

Nutonian users aren’t just large corporations. They’re also hobbyist data modelers leveraging Eureqa to predict the popularity of songs, analyze home science experiments, and even determine what makes some Porsches faster than others. The letter below was sent to us by a former real estate investor and manager named Bill Russell, who’s been using Eureqa to anticipate relatively short-term movements in stock prices. Hopefully Bill’s note will not only shed light on Eureqa’s potential, but will encourage our non-commercial fans to start thinking about how they might apply Eureqa to some cool personal projects outside the office.

 

To Michael Schmidt and the team at Nutonian:

Michael, I want to express my deep appreciation for what you have created and shared. I first started following Eureqa in early 2010 when my mathematician brother alerted me to your double pendulum demo and beta download when you were at Cornell.

By way of background, I’m 70 years old, retired from a career in real estate finance and management. My degree was in economics, but I always loved numbers and the numerical analysis side of that business. My serious hobby over the years has been an attempt to predict short-term moves in the financial markets. I never had an impressive level of success, but always a lot of enjoyment with the puzzle of it all. In retrospect, I am sobered by how much time and the many resources I’ve previously put into this hobby.

My attempts in market prediction began with Fourier analysis (thanks to my brother’s programming and math skills) on an HP-85 desktop computer that had 16 kB of RAM. Next, things got more serious with the IBM XT, Lotus, and very large worksheets of pre-processed data obtained from Pinnacle Data and TradeStation. Over the years, I went to seminars given by John Hill, Larry Williams, Bill Williams, Tom DeMark and others. I purchased the Market Logic course for Peter Steidlmayer’s market profile approach and the trading course Relevance III from Maynard Holt in Nashville. There were many ideas here and there for indicators and inter-market relationships, but choosing which to use, and how to use them together, was daunting. Eureqa has changed that. Along the way, I used some programs that were impressive at the time. Brainmaker Professional, a neural network program, took plenty of my time in searching for usable predictions. HNET Professional, a holographic neural network program, was fast and impressive. AbTech’s Statnet was excellent, as was Neuralware’s Neuralsim. Yet despite the prolonged, multi-year and serious approach, I could never find an integrated, consistent pathway to success.

Because Eureqa incorporates so much analytical power in one place and finds relationships that were simply impossible to find previously, I am encouraged as never before. With the opportunity to utilize Eureqa, so much of my past approach is obsolete and elementary. I have left most of my previous analytical programs behind and many of my technical market books have now been donated to the public library. Of great significance for individual traders is that you have diminished the gap between the professional and nonprofessional in approaching the markets. Each group can utilize Eureqa, and Eureqa is equally powerful for each.

In the past, my best insights into what data might be useful came from hundreds of tedious runs of Pearson correlations and trial-and-error runs in the neural networks. I looked for ways to recast and understand the data in S-Plus and now the R language, but I am not a programmer. Trying to smooth data with splines in R was almost an insurmountable task for me. Eureqa is enabling me now to pursue options that were previously impossible. Here are some of the reasons:

1) Power and Speed: I’m able to pursue so many more alternatives than were previously within my reach. Because Eureqa is so fast, I am now able to compare runs with a) raw data; b) the same data recast to binary form; c) the data uniformly redistributed; d) the data in a de-noised wave-shrunk form. There was simply not enough time to do this before I found Eureqa.

2) Fast Data Processing and Visualization in Eureqa: I had previously done smoothing, normalizing, and rescaling in S-Plus or R. Here Eureqa saves significant time and I have complete confidence that it is being done correctly. I was often uncertain if I was getting it right on my own with the R language.

3) Tighter Selection of Input Variables: I had previously looked for any correlated relationships among a bar’s open, high, low, close, and volume, and relationships with each of those inputs delayed four periods back. I likewise did this for inter-market correlations. There was lots of manual work with Excel. All this has become moot since Eureqa does this in a flash. I have been able to substantially reduce the number of input variables.

4) Most importantly, Eureqa is finding predictive relationships that had simply been impossible to find.

Michael, it is a delight to be alive at 70, and see the breathtaking leaps in technology. I programmed a little in college, utilizing punched cards; I bought a cutting-edge four-function electronic calculator before finals in 1971 for $345 (a Sharp EL-8) and thought it was a bargain. And now there is Eureqa…….Wow!!! I can appreciate some of the incredible differences this product will continue to make in so many areas. Thank you so much for what you and your team have created, for sharing it in beta form in the past, and for still keeping it within reach for individuals.

With much appreciation,

Bill Russell
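
For readers curious about the manual screening Bill describes in point 3, correlating each input with a target at delays of up to four bars, here is a rough Python sketch of that idea. It illustrates the old spreadsheet-style approach, not anything Eureqa does internally. The function names and the tiny price and volume series are invented for the example; the target is deliberately built to echo volume from two bars earlier so the result is easy to check.

```python
import math


def pearson(a, b):
    """Plain Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # undefined for a constant series; treated as 0 for screening purposes
    return cov / math.sqrt(var_a * var_b)


def lagged_correlations(inputs, target, max_lag=4):
    """Correlate every input, delayed by 0..max_lag bars, with the target."""
    results = []
    for name, series in inputs.items():
        for lag in range(max_lag + 1):
            delayed = series[: len(series) - lag]  # input value `lag` bars earlier
            aligned_target = target[lag:]          # target value at the current bar
            results.append((name, lag, pearson(delayed, aligned_target)))
    # Strongest relationships first, regardless of sign.
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


# Invented toy bars; not real market data.
inputs = {
    "close": [10, 11, 12, 11, 13, 14, 13, 15, 16, 15],
    "volume": [5, 3, 8, 6, 2, 9, 4, 7, 1, 6],
}
# Target constructed to equal volume from two bars earlier.
target = [0, 0] + inputs["volume"][:8]

ranked = lagged_correlations(inputs, target)
for name, lag, r in ranked[:3]:
    print(f"{name} delayed {lag} bars: r = {r:+.2f}")
```

Even with two inputs and four delays this produces ten candidate relationships to inspect, and it only detects linear ones, which gives some feel for why doing this by hand across many markets was so tedious.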

Topics: Big data, Eureqa, Financial Services

Trading Necessitates Speed Along Every Step of the Data Pipeline

Posted by Jon Millis

10.06.2015 01:43 PM

We just returned from Terrapinn’s The Trading Show, a data-driven financial services conference that brings together thought leadership in quant, automated trading, exchange technology, big data and derivatives. With more than 1,000 attendees and 60 exhibitors gathering at the Navy Pier in Chicago, this year’s event was an excellent way not only for us to educate the market about using AI to scale data science initiatives, but for us to learn about the most pressing needs faced by financial services companies.

The first day, Jay Schuren, our Field CTO, presented to an audience of 50 executives. His demo used publicly available data from Yahoo Finance – such as cash flow, valuation metrics and stock prices – to predict which NYSE companies were the most over- and undervalued compared to the rest of the market. To say the least, Jay’s discoveries, as well as the seamless and automated way in which he created his financial models, spurred heavy booth traffic for the rest of our trip.*

Finance is an interesting animal. Many industries have relatively straightforward applications for machine intelligence. Utility companies are often interested in daily demand forecasting. Manufacturing companies look to optimize processes and design new materials. Retailers want to determine the best locations to build new stores, while healthcare providers want to preemptively detect and treat diseases. But finance is a bit different.

Let’s take a timely analogy. As I was walking home last Friday, I saw probably half a dozen limos of Boston high-schoolers posing for photos and heading to prom. Most of our customers purchase Eureqa and just can’t help but gush to us how excited they are to go to prom with us. Leading up to the big day, we show off our dance moves (give them a live demo), and take them out for a few dates (send them a free two-week trial), and by the end of our brief tryout, they’re bursting with energy and telling us all about their plans for the big dance with us. Trading firms, on the other hand, are the stunning mystery girls.** They’re smart, they’re confident, and you don’t think they should be shy, but when you ask them to prom, they shrug their shoulders indifferently and say, yeah, I guess that sounds cool. You raise an eyebrow, unsure if you just got a date or got slapped in the face with a frozen ham. But then she sees you drag racing around the neighborhood, and all of a sudden, you’re the biggest heartthrob on the planet. What in the world just happened?

In the trading world, everything is about speed. It’s not only about the speed at which a company can execute a trade (though there were plenty of vendors there offering to shave off fractions of a second to do this), but it’s also about the time it takes for a firm to arrive at an answer about how their market works, whether that’s determining when a currency is undervalued, an asset is likely to significantly appreciate, or a large loan is too risky. Everything in the trading game revolves around timing. And everyone. Loves. Speed. Where Eureqa instantly became interesting to attendees was the automation from raw data to accurate analytical/predictive model, a process that Eureqa consolidates and accelerates by orders of magnitude.

A majority of trading technology on display was new hardware and software that incrementally improves time-to-execution. Milliseconds are important, but implementing a trading strategy that no one else has thought of or discovered could be game-changing. Nutonian will never compete with these other products and services directly. But we’re bringing more than one date to prom.

 

* Email us at contact@nutonian.com for a live demo of this particular application. We’d love to share our current use cases in financial services and explore how we might be a fit for others. 

** We’ll ignore the fact that, in reality, it seems like a “trading” prom would be about 95% guys. Woof.

Topics: Big data, Eureqa, Financial Services, Machine Intelligence, The Trading Show

Demystifying Data Science, Part 3: Scaling data science

Posted by Lakshmikant Shrinivas

15.05.2015 09:45 AM

In my last post in this series, I spoke about what goes into a data science workflow. The current state of the art in data science is not ideal; the value of data is limited by our understanding of it, and the current process to go from data to understanding is pretty tedious. The right tools make all the difference. Imagine cutting a tree with an axe instead of a chainsaw. If you were cutting trees for a living, wouldn’t you prefer the chainsaw? Even if you only had to cut trees occasionally, wouldn’t you prefer a chainsaw, because, well, chainsaw! The key here is automation. Ideally you want as much of a process automated as you can, for the sake of productivity.

With data science, the two major bottlenecks are wrangling with data and wrangling with models. Wrangling with data involves gathering and transforming data into a form suitable for modeling. There are several companies that deal with data wrangling – for example, the entire ETL industry. Wrangling with models involves creating and testing hypotheses, building, testing and refining features and models. Eureqa can help with the model wrangling. It is the chainsaw that completely automates the process of creating, testing, refining and sharing models.

As I mentioned in my last post, the goals of modeling are pretty simple to express. We want to figure out if a) all terms in our model are important, and b) we’ve missed any term that would improve the accuracy significantly. Eureqa uses evolutionary algorithms to automatically create and test linear as well as non-linear models, sort of like the infinite monkey theorem. Except in our case, with the advances in computation, and of course our secret sauce, the “eventually find models” practically translates to a few minutes or hours – much faster than any human could do it.
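
To give a feel for the general idea, here is a deliberately tiny, generic evolutionary search over formulas, written in Python. It is a sketch of the textbook technique only and makes no claim about how Eureqa works. The operator set, mutation rule, size penalty, population settings and toy data are all assumptions chosen for brevity. Each generation keeps the best-scoring formulas and fills the rest of the population with randomly mutated copies of them.

```python
import math
import random

# Building blocks the search may combine (a hypothetical, intentionally small set).
BINARY = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
UNARY = {"sin": math.sin}


def random_expr(rng, depth=2):
    """Build a random formula as a nested tuple, e.g. ("+", "x", ("*", 2.0, "x"))."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else round(rng.uniform(-3, 3), 1)
    if rng.random() < 0.2:
        return ("sin", random_expr(rng, depth - 1))
    op = rng.choice(list(BINARY))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))


def evaluate(expr, x):
    if expr == "x":
        return x
    if isinstance(expr, (int, float)):
        return expr
    if expr[0] in UNARY:
        return UNARY[expr[0]](evaluate(expr[1], x))
    return BINARY[expr[0]](evaluate(expr[1], x), evaluate(expr[2], x))


def size(expr):
    """Number of nodes in the formula; used as a crude complexity measure."""
    if isinstance(expr, tuple):
        return 1 + sum(size(child) for child in expr[1:])
    return 1


def mutate(expr, rng):
    """Replace one randomly chosen subtree with a fresh random one."""
    if not isinstance(expr, tuple) or rng.random() < 0.3:
        return random_expr(rng, depth=2)
    children = list(expr[1:])
    i = rng.randrange(len(children))
    children[i] = mutate(children[i], rng)
    return (expr[0], *children)


def score(expr, data, size_penalty=0.01):
    """Lower is better: mean squared error plus a small charge per node."""
    try:
        mse = sum((evaluate(expr, x) - y) ** 2 for x, y in data) / len(data)
    except (ValueError, OverflowError):
        return math.inf
    return mse + size_penalty * size(expr) if math.isfinite(mse) else math.inf


def evolve(data, generations=100, pop_size=50, seed=0):
    rng = random.Random(seed)
    population = [random_expr(rng) for _ in range(pop_size)]
    history = []  # best score seen at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda e: score(e, data))
        history.append(score(population[0], data))
        survivors = population[: pop_size // 5]  # keep the best 20% unchanged
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    best = min(population, key=lambda e: score(e, data))
    return best, history


# Toy data from y = x*x + x (invented for the example).
data = [(x / 2, (x / 2) ** 2 + x / 2) for x in range(-4, 5)]
best, history = evolve(data)
print("best formula:", best)
print("score:", score(best, data))
```

The size penalty is what ties the search back to the two modeling goals above: an extra term survives only if it improves accuracy by more than it costs, and a missing term gets added whenever a mutation that includes it scores better.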

If you pause for a moment to think about it, it’s pretty powerful and liberating. As a would-be data scientist, using such a tool frees up your time to focus on the more creative aspects of data science. For example, what other data could we pull in that might affect our problem? What other types of problems could we model with our data? As a non-data scientist, using such a tool lowers the barrier to entry for modeling. Imagine having a personal army of robotic data scientists at your beck and call.

For me, this is one of the most exciting aspects of Nutonian’s technology. While most of the world is still talking about scaling analytics to ever-growing amounts of data, Eureqa can scale analytics to the most precious resource of all: people.

Topics: Big data, Demystifying data science, Scaling data science

In the Winner’s Circle with Nutonian – Kentucky Derby Recap

Posted by Jess Lin

07.05.2015 09:30 AM

The roses have been picked, the hats have been thrown, and just like that, the most exciting two minutes in sports are over. The 141st Kentucky Derby had one of the most talented fields of all time, leading experts to chime in with opinions ranging all across the spectrum of contenders. So with no horse racing experience of our own, a few days for analysis, and our virtual data scientist Eureqa® – how did our predictions do at the 141st Kentucky Derby?

As it turns out, Eureqa did great! Not only did Eureqa correctly predict American Pharoah as the winner, Eureqa also correctly identified three of the top five horses across the wire:

Place  Predicted         Actual
1      American Pharoah  American Pharoah
2      Dortmund          Firing Line
3      Materiality       Dortmund
4      Danzig Moon       Frosted
5      Tencendur         Danzig Moon
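
For anyone who wants to check the tally, the comparison can be reproduced in a few lines of Python. The lists are copied from the table above; the variable names are just for illustration.

```python
predicted = ["American Pharoah", "Dortmund", "Materiality", "Danzig Moon", "Tencendur"]
actual = ["American Pharoah", "Firing Line", "Dortmund", "Frosted", "Danzig Moon"]

# Did the predicted winner actually win?
winner_correct = predicted[0] == actual[0]

# Predicted top-five horses that finished in the actual top five, in any position.
in_both = [horse for horse in predicted if horse in actual]

# Horses predicted in exactly the position they finished.
exact_position = [p for p, a in zip(predicted, actual) if p == a]

print("winner correct:", winner_correct)
print("predicted horses in the actual top five:", in_both)
print("exact positions matched:", exact_position)
```

Note that “three of the top five” counts horses that landed anywhere in the top five; only the winner was picked in its exact finishing position.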


Victor Espinoza rides American Pharoah to victory. THE ASSOCIATED PRESS

With American Pharoah expected to run in the Preakness Stakes on May 16th, the world is ready for the Triple Crown drought to be broken. While American Pharoah will face new horses and challenges on his way there, Eureqa will help us discover the underlying factors that determine whether this audacious colt has what it takes to become only the 12th winner of the Triple Crown.

Want to join us on the ride? Try out Eureqa yourself and let us know what you find!

Topics: Eureqa, Kentucky Derby
