Hod Lipson

How does Eureqa’s performance, in terms of predictive accuracy and simplicity, compare to other machine learning methods, such as Neural Networks, Support Vector Machines, Decision Trees, and simple Linear Regression?

To answer this question we did a simple comparison. We ran Eureqa on seven test cases for which data is publicly available, and compared its performance to four standard machine learning methods. We used the WEKA implementations, with settings tuned for best performance:

- **Linear Regression**: Fits a linear equation of the form y = a_{1}x_{1} + a_{2}x_{2} + a_{3}x_{3} + … using the least-squares method. This is the traditional regression approach used in many statistical software packages.
- **Decision Trees (DT)**: Finds multiple linear regression models, each for a different part of the dataset. The dataset is partitioned using conditions on the input variables.
- **Neural Networks (NN)**: A classic multi-layer perceptron learns to predict the output from the inputs using back-propagation. Early stopping on a validation set is used, with a single hidden layer whose size is optimized automatically.
- **Support Vector Machines (SVM)**: Model the data as a combination of a few selected data points (called support vectors).
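As a minimal sketch of the Linear Regression baseline described above, here is an ordinary least-squares fit of y = a_{1}x_{1} + a_{2}x_{2} + … (plus an intercept) using NumPy. The data is synthetic and the settings do not reproduce the WEKA configuration used in the comparison.

```python
# Least-squares fit of y = a1*x1 + a2*x2 + a3*x3 + intercept.
# Synthetic data for illustration only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 samples, 3 input variables
true_coef = np.array([2.0, -1.0, 0.5])
y = X @ true_coef + 0.01 * rng.normal(size=100)

# Append a column of ones so the intercept is fit alongside the slopes.
A = np.hstack([X, np.ones((100, 1))])
coef, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)

print(coef[:3])   # slope estimates should recover [2.0, -1.0, 0.5] closely
```

The recovered model is a single hyperplane, which is exactly why its complexity is low compared to the other methods in the comparison.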

We ran the tests on five datasets, obtained from the UCI Dataset repository. They included the Auto MPG Benchmark, the Challenger O-Ring Benchmark, the Concrete Compressive Strength Benchmark, the Solar Flare Benchmark, and the Coil 2000 Benchmark.

Each algorithm produced a result in a different format: Linear regression produced a hyperplane, a neural network produced a connectivity weight matrix, and Eureqa produced an analytical expression. One example result can be seen to the right. It is clear that some solutions are more complex than others: the more complex solutions involve more free parameters, or simply take more ink to write down. Some solutions were also more accurate than others, producing less error when tested on a separate test dataset. Of course, we’d like a machine learning algorithm that produces models that are both accurate and simple, but that isn’t always the case.
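The accuracy measurement mentioned above, i.e. error on a dataset held out from fitting, can be sketched as follows. The data, the split ratio, and the error metric (RMSE) here are illustrative choices, not the ones used in the benchmark.

```python
# Fit on a training split, then report RMSE on a held-out test split.
# Synthetic data; 100/50 split is an arbitrary choice for illustration.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 2))
y = X @ np.array([1.5, -0.7]) + 0.1 * rng.normal(size=150)

train, test = slice(0, 100), slice(100, 150)   # 100 train / 50 test rows
A = np.hstack([X[train], np.ones((100, 1))])
coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)

pred = np.hstack([X[test], np.ones((50, 1))]) @ coef
rmse = np.sqrt(np.mean((y[test] - pred) ** 2))
print(rmse)   # should be close to the noise level of 0.1
```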

We plotted the average performance of all five algorithms at a location corresponding to the average complexity and accuracy of the models they produced. In a complexity versus accuracy chart, we can see several regions. The top left region is where we would see algorithms that produce models that are fairly accurate but have many free parameters. The bottom right region is where we would see algorithms that produce very simple solutions, even if they are somewhat less accurate. The top right region of the chart is the worst region to be in, where models are both complicated and not so accurate. And the bottom left region is where we find algorithms that produce models that are at the same time both simple and accurate.
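One way to make this trade-off concrete is to compute which models are Pareto-optimal: no other model is at least as good on both axes and strictly better on one. The (complexity, error) numbers below are made up purely to illustrate the idea; they are not the benchmark results.

```python
# Given hypothetical (complexity, test_error) pairs, keep only the
# Pareto-optimal models. All numbers here are invented for illustration.
models = {
    "linear": (3, 0.40),    # (complexity, error)
    "tree":   (25, 0.30),
    "nn":     (120, 0.22),
    "svm":    (80, 0.25),
    "eureqa": (8, 0.20),
}

def pareto_front(models):
    front = {}
    for name, (c, e) in models.items():
        dominated = any(
            (c2 <= c and e2 <= e) and (c2 < c or e2 < e)
            for n2, (c2, e2) in models.items() if n2 != name
        )
        if not dominated:
            front[name] = (c, e)
    return front

print(sorted(pareto_front(models)))   # -> ['eureqa', 'linear']
```

In this invented example, the Pareto front (the bottom-left edge of the chart) contains only the simplest model and the model that is both simple and accurate.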

It appears that Eureqa’s use of symbolic regression produces models that are both more accurate and simpler than other machine learning methods, but what’s the catch?

There is no free lunch. Symbolic regression is substantially more computationally intensive when compared to neural networks, SVMs and Linear regression. Luckily, however, while accuracy and simplicity are priceless, computational power can be bought on-demand with platforms like Amazon EC2.


Great comparison! And quite believable. My issue with Eureqa after several passes at it was that the resulting equation wasn’t generalizable to anything other than the data used in model calibration. In other words I couldn’t simply take the function it produced and validate the results. That’s a big problem.

Thanks for your comment. You can always “copy and paste” the equation to try it on new data, or use the “Evaluate and predict” tool on the “Report/Analyze” pane to run it on new data.

Note that Eureqa automatically validates the results on a portion of your data which it does not use for training. So when Eureqa reports an accuracy metric, it was measured on the validation set.

Very important to know that it’s self-validating on data held out from the original training data. Can you say what % is held out?

The visual dynamic error rate and color coded best fit are cool. Very intuitive!!

Hi Jim,

By default, the split between validation and training data is dependent on the amount of input data entered.

Important to note: you can adjust the split from within Eureqa.

-Matt

Here is actually a separate blog post which details data-splitting in Eureqa:

http://blog.nutonian.com/bid/312708/Setting-and-using-validation-data-with-Eureqa

Unless you are using Best Subsets Regression, this doesn’t seem to be a fair comparison for Linear Regression. It is comparing a single model versus the choice among possible models available to other methods.

I agree that it is a useful benchmark, but in practice you would not stop looking for a better model after a single linear regression.

Dear Hod,

Is there any scientific publication with your performance results of Eureqa?

Thank you!

The linear regression example is very badly misrepresented. If one were to use an adequate set of primitive functions for an OLS, and then used a standardized dimension-reduction procedure such as AIC/BIC-based stepwise regression, the solution found would perform much better than Eureqa’s, and much faster. Where Eureqa is most valuable is where the solution space is not C1 to C3, or is in some sense discontinuous. Neural nets are much more powerful, but also more susceptible to misuse. Eureqa is golden with respect to being able to interpret the results by inspection and perform analysis as with linear regression. Neural nets with more than one hidden layer range from a nightmare to completely impenetrable with respect to interpretation.
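A minimal sketch of what this comment suggests: expand the input with a set of primitive-function features, then run AIC-based forward stepwise OLS selection. The data, the feature set, and the stopping rule are illustrative assumptions; this is not Eureqa’s algorithm, nor a tuned stepwise implementation.

```python
# Forward stepwise OLS over primitive-function features, scored by AIC.
# Synthetic data generated from 2*sin(x) + 0.5*x^2 plus small noise.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.1, 3.0, size=200)
y = 2.0 * np.sin(x) + 0.5 * x**2 + 0.05 * rng.normal(size=200)

# Candidate primitive-function features (an arbitrary illustrative set).
features = {
    "x": x, "x^2": x**2, "sin(x)": np.sin(x),
    "cos(x)": np.cos(x), "log(x)": np.log(x),
}
cols = {"1": np.ones_like(x), **features}

def aic(X, y):
    """AIC for an OLS fit: n * ln(RSS / n) + 2k."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ coef) ** 2)
    n, k = X.shape
    return n * np.log(rss / n) + 2 * k

selected = ["1"]                                  # start from intercept only
best = aic(np.column_stack([cols[s] for s in selected]), y)
while True:
    remaining = [f for f in features if f not in selected]
    if not remaining:
        break
    scores = {f: aic(np.column_stack([cols[s] for s in selected + [f]]), y)
              for f in remaining}
    winner = min(scores, key=scores.get)
    if scores[winner] >= best:                    # no AIC improvement: stop
        break
    best, selected = scores[winner], selected + [winner]

print(sorted(selected))   # the true terms sin(x) and x^2 should be picked up
```

On data that really is a short formula in the chosen primitives, this kind of search recovers the terms quickly; the commenter’s point is that Eureqa’s advantage shows up when no such well-behaved basis is known in advance.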