How does Eureqa Compare to Other Machine Learning Methods?

Posted by Hod Lipson

27.08.2013 09:57 AM

How does Eureqa’s performance, in terms of predictive accuracy and simplicity, compare to other machine learning methods, such as Neural Networks, Support Vector Machines, Decision Trees, and simple Linear Regression?

To answer this question, we did a simple comparison. We ran Eureqa on seven test cases for which data is publicly available, and compared its performance to four standard machine learning methods. The implementations used were the WEKA codes, with settings optimized for best performance:

  1. Linear Regression: Fit a linear equation of the form y = a1x1 + a2x2 + a3x3 + … using the least squares method. This approach is the traditional regression method used in many statistical regression software packages.
  2. Decision trees (DT): This process tries to find multiple linear regression models, each for a different part of the dataset. The dataset is partitioned using conditions on the input variables.
  3. Neural Networks (NN): A classic multi-layer perceptron network attempts to learn to predict the output from the input using the back-propagation learning method. Early stopping on a validation set is used, with a single hidden layer whose size is optimized automatically.
  4. Support vector machines (SVM): Model the data as a combination of a few selected data points (called support vectors).
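To make the first method in the list concrete, here is a minimal sketch of a least squares fit for the two-variable case y = a1x1 + a2x2. It is not the WEKA implementation used in the comparison; the function name and the sample data are invented for illustration, and the data is noise-free so the recovered coefficients are easy to check.

```python
def fit_two_variable_least_squares(rows, targets):
    """Fit y = a1*x1 + a2*x2 (no intercept) by solving the 2x2 normal equations."""
    s11 = sum(x1 * x1 for x1, _ in rows)
    s12 = sum(x1 * x2 for x1, x2 in rows)
    s22 = sum(x2 * x2 for _, x2 in rows)
    t1 = sum(x1 * y for (x1, _), y in zip(rows, targets))
    t2 = sum(x2 * y for (_, x2), y in zip(rows, targets))

    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("inputs are collinear; coefficients are not unique")

    # Cramer's rule on [[s11, s12], [s12, s22]] @ [a1, a2] = [t1, t2]
    a1 = (t1 * s22 - t2 * s12) / det
    a2 = (s11 * t2 - s12 * t1) / det
    return a1, a2


# Hypothetical data generated from y = 2*x1 - 0.5*x2
rows = [(1, 2), (2, 1), (3, 4), (4, 3)]
targets = [2 * x1 - 0.5 * x2 for x1, x2 in rows]

a1, a2 = fit_two_variable_least_squares(rows, targets)
print(a1, a2)
```

With more input variables the same idea applies, but the normal equations become a larger linear system that is usually handed to a linear algebra library.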

We ran the tests on five datasets, obtained from the UCI Dataset repository. They included the Auto MPG Benchmark, the Challenger O-Ring Benchmark, the Concrete Compressive Strength Benchmark, the Solar Flare Benchmark, and the Coil 2000 Benchmark.

Each algorithm produced a result in a different format: Linear regression produced a hyperplane, while a neural network produced a connectivity weight matrix and Eureqa produced an analytical expression. One example result can be seen to the right. It is clear that some solutions are more complex than others. The more complex solutions involve more free parameters, or just take more ink to write down. Some solutions were more accurate than others: They produced less error when tested on a separate test dataset. Of course, we’d like to have a machine learning algorithm that produces models that are both accurate and simple, but that isn’t always the case.

We plotted the average performance of all five algorithms at a location corresponding to the average complexity and accuracy of the models they produced. In a complexity versus accuracy chart, we can see several regions. The top left region is where we would see algorithms that produced models that are fairly accurate, but have many free parameters. The bottom right region is where we would see algorithms that produce very simple solutions, even if they are somewhat less accurate. The top right region of the chart is the worst region to be in, where models are both complicated and not so accurate. And the bottom left region is where we find algorithms that produce models that are at the same time both simple and accurate.
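One way to read such a chart programmatically is to keep only the models that no other model beats on both axes. The sketch below does that for a handful of made-up (complexity, error) scores; the model names and numbers are placeholders invented for illustration, not the results of this comparison.

```python
def pareto_front(models):
    """Return names of models that no other model beats on both axes.

    Each model is (name, complexity, error); lower is better for both.
    """
    front = []
    for name, complexity, error in models:
        dominated = any(
            c <= complexity and e <= error and (c < complexity or e < error)
            for _, c, e in models
        )
        if not dominated:
            front.append(name)
    return front


# Made-up scores, purely for illustration
candidates = [
    ("model_a", 3, 0.30),   # very simple, less accurate
    ("model_b", 40, 0.10),  # accurate, many free parameters
    ("model_c", 45, 0.35),  # complicated and inaccurate
    ("model_d", 8, 0.12),   # fairly simple and accurate
]

front = pareto_front(candidates)
print(front)
```

Here `model_c` drops out because `model_a` is both simpler and more accurate; the remaining models each represent a different trade-off.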

[Figure: comparison chart of model complexity versus accuracy]

It appears that Eureqa’s use of symbolic regression produces models that are both more accurate and simpler than other machine learning methods, but what’s the catch?

There is no free lunch. Symbolic regression is substantially more computationally intensive than neural networks, SVMs, and linear regression. Luckily, however, while accuracy and simplicity are priceless, computational power can be bought on-demand with platforms like Amazon EC2.

Topics: hod lipson, symbolic regression

What is Symbolic Regression (and how does Eureqa use it)?

Posted by Hod Lipson

18.07.2013 04:00 PM

You may be familiar with the term “regression” – the ability of a computer to fit a mathematical equation to data. There are many types of regression techniques and tools out there. The most common method is called “linear regression”, where a computer fits a straight line (or a flat plane) to data. This works well if your data generally follows a straight trend and you want to know what the slope is. Another method is nonlinear regression, where a computer fits the coefficients of some arbitrary mathematical equation that you provide. This is good when you already know how your data behaves qualitatively, and all you want is to get quantitative predictions. But what if your data does not seem to be following a linear trend, but you do not know what that trend is – even qualitatively?
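As a rough illustration of nonlinear regression, the sketch below assumes you have already decided the data follows y = a * exp(b * x) and only searches for the coefficients a and b. The exponential form, the function name, the search range for b, and the sample data are all assumptions chosen for the example; a real tool would normally use a general-purpose optimizer instead of this simple search.

```python
import math


def fit_exponential(xs, ys, b_low=-5.0, b_high=5.0, steps=60):
    """Fit y = a * exp(b * x); the form is given, only a and b are searched."""

    def best_a_and_error(b):
        # For a fixed b the model is linear in a, so a has a closed form.
        basis = [math.exp(b * x) for x in xs]
        a = sum(f * y for f, y in zip(basis, ys)) / sum(f * f for f in basis)
        err = sum((a * f - y) ** 2 for f, y in zip(basis, ys))
        return a, err

    # Ternary search over b, assuming the error has a single minimum in range
    for _ in range(steps):
        m1 = b_low + (b_high - b_low) / 3
        m2 = b_high - (b_high - b_low) / 3
        if best_a_and_error(m1)[1] < best_a_and_error(m2)[1]:
            b_high = m2
        else:
            b_low = m1

    b = (b_low + b_high) / 2
    a, _ = best_a_and_error(b)
    return a, b


# Hypothetical noise-free data generated from y = 2 * exp(0.7 * x)
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.7 * x) for x in xs]

a, b = fit_exponential(xs, ys)
print(a, b)
```

Note what the code cannot do: if the data actually followed a different kind of curve, it would still return the best exponential, because the form of the equation was fixed in advance.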

Eureqa uses a new technique, called Symbolic Regression. Symbolic Regression does not assume a linear trend, nor does it require you to provide a model. Instead, symbolic regression searches for the best model for your data, including linear and nonlinear models. Since some models might be simple but inaccurate, and other models may be very accurate but complex, symbolic regression does not try to give you just a single answer – it gives you a handful of possible models that you can choose from. You can use the model to make predictions, to gain insight, and to find optimal points.

How does symbolic regression work? We start with a bunch of simple, linear models. If these models fit perfectly, that’s great. If they don’t, we produce small variations to these models, and try again. These variations can include changing the form of the models by adding, removing, and changing mathematical terms. We then keep testing – at a rate of 10 million equations per second – until we gradually converge. In test cases, we watched this simple algorithm find models that have taken human experts decades to discover.
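The loop described above can be sketched as a toy program. This is a deliberately simplified stand-in, not Eureqa's actual algorithm: it keeps a single model and hill-climbs, uses only addition and multiplication, and every name in it (`START_MODEL`, `mutate`, `search`, the size limit) is invented for illustration.

```python
import random

# A model is a tree: ("x",), ("const", value), or (op, left, right) with op "+" or "*".
START_MODEL = ("+", ("*", ("const", 1.0), ("x",)), ("const", 0.0))  # linear: 1*x + 0


def evaluate(expr, x):
    kind = expr[0]
    if kind == "x":
        return x
    if kind == "const":
        return expr[1]
    left, right = evaluate(expr[1], x), evaluate(expr[2], x)
    return left + right if kind == "+" else left * right


def size(expr):
    return 1 if expr[0] in ("x", "const") else 1 + size(expr[1]) + size(expr[2])


def error(expr, data):
    return sum((evaluate(expr, x) - y) ** 2 for x, y in data) / len(data)


def random_leaf(rng):
    return ("x",) if rng.random() < 0.5 else ("const", rng.uniform(-2, 2))


def mutate(expr, rng):
    """Return a small variation: change, add, or remove a term."""
    roll = rng.random()
    if expr[0] in ("x", "const"):
        if roll < 0.5:
            return random_leaf(rng)                        # change a term
        return (rng.choice("+*"), expr, random_leaf(rng))  # add a term
    if roll < 0.2:
        return expr[rng.choice((1, 2))]                    # remove a term
    if roll < 0.6:
        return (expr[0], mutate(expr[1], rng), expr[2])    # vary the left side
    return (expr[0], expr[1], mutate(expr[2], rng))        # vary the right side


def search(data, generations=2000, seed=0, max_size=15):
    rng = random.Random(seed)
    best, best_err = START_MODEL, error(START_MODEL, data)
    for _ in range(generations):
        candidate = mutate(best, rng)
        if size(candidate) > max_size:
            continue  # keep models small enough to read
        candidate_err = error(candidate, data)
        if candidate_err < best_err:  # keep a variation only if it fits better
            best, best_err = candidate, candidate_err
    return best, best_err


# Hypothetical data from y = x*x + x, which no straight line fits well
data = [(x / 4, (x / 4) ** 2 + x / 4) for x in range(-8, 9)]
best, best_err = search(data)
print(best, best_err)
```

Because a variation is only accepted when it reduces the error, the final model is never worse than the starting linear one. A real symbolic regression system would keep many candidate models of different complexity rather than a single best one.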

Topics: hod lipson, linear regression, symbolic regression
