
# Blog

Hod Lipson

You may be familiar with the term “regression”: the ability of a computer to fit a mathematical equation to data. There are many regression techniques and tools available. The most common is called “linear regression”, where a computer fits a straight line (or a flat plane) to data. This works well if your data generally follows a straight trend and you want to know its slope. Another method is nonlinear regression, where a computer fits the coefficients of an arbitrary mathematical equation that you provide. This is useful when you already know how your data behaves qualitatively and just want quantitative predictions. But what if your data does not follow a linear trend, and you do not know what that trend is, even qualitatively?
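To make the contrast concrete, here is a minimal sketch in plain Python. `linear_fit` fits a straight line by ordinary least squares. `nonlinear_fit` fits the two coefficients of a form the user supplies, assumed here to be y = a·exp(b·x). Both function names and the model form are hypothetical illustrations. The nonlinear fit scans a grid of b values and solves for a in closed form at each one; practical tools usually use an iterative solver (for example Levenberg-Marquardt) instead.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least-squares straight line: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def nonlinear_fit(xs, ys, b_grid):
    """Fit the user-supplied form y = a * exp(b * x); returns (a, b, sse).

    For a fixed b the best a has a closed form, so only b is scanned.
    """
    best = None
    for b in b_grid:
        f = [math.exp(b * x) for x in xs]
        a = sum(y * fi for y, fi in zip(ys, f)) / sum(fi * fi for fi in f)
        sse = sum((y - a * fi) ** 2 for y, fi in zip(ys, f))
        if best is None or sse < best[2]:
            best = (a, b, sse)
    return best

xs = [i / 10 for i in range(11)]
slope, intercept = linear_fit(xs, [2 * x + 1 for x in xs])
b_grid = [i / 100 for i in range(-200, 201)]  # assumed search range for b
a, b, sse = nonlinear_fit(xs, [2 * math.exp(0.5 * x) for x in xs], b_grid)
print(slope, intercept, a, b, sse)
```

Note that `nonlinear_fit` can only tune a and b: if the data actually followed, say, a sine wave, no choice of coefficients would rescue the exponential form. That is the gap the question above points at.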

Eureqa uses a new technique called symbolic regression. Symbolic regression does not assume a linear trend, nor does it require you to provide a model. Instead, it searches for both the form of the model and its coefficients, considering linear and nonlinear models alike. Since some models may be simple but inaccurate, while others are very accurate but complex, symbolic regression does not give you just a single answer: it gives you a handful of candidate models, ranging from simple to accurate, that you can choose from. You can use these models to make predictions, to gain insight, and to find optimal points.
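One natural reading of “a handful of candidate models” is the set of models that no other candidate beats on both simplicity and accuracy (a Pareto front). The sketch below shows one way to extract that set from a list of candidates. The `Candidate` type, the complexity measure, and the error values are hypothetical and made up purely for illustration; this is not Eureqa's actual selection code.

```python
from typing import List, NamedTuple

class Candidate(NamedTuple):
    expr: str          # the model, written as text
    complexity: int    # assumed size measure, e.g. number of terms or tree nodes
    error: float       # assumed fit measure, e.g. mean squared error on the data

def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only models that no other model beats on both complexity and error."""
    front: List[Candidate] = []
    # Visit models from simplest to most complex; ties broken by lower error.
    for cand in sorted(candidates, key=lambda c: (c.complexity, c.error)):
        # A more complex model is only worth keeping if it is strictly more
        # accurate than every model kept so far (front[-1] has the lowest error).
        if not front or cand.error < front[-1].error:
            front.append(cand)
    return front

# Hypothetical candidates with made-up scores.
candidates = [
    Candidate("c", 1, 10.0),
    Candidate("x", 3, 4.0),
    Candidate("2*x", 5, 4.5),
    Candidate("2*x + 1", 5, 1.0),
    Candidate("x**2 + 2*x + 1", 9, 0.0),
    Candidate("x**2 + 2*x + 1 + 0*x", 13, 0.0),
]
for c in pareto_front(candidates):
    print(c.complexity, c.error, c.expr)
```

Here `2*x` is dropped because `2*x + 1` is equally complex and more accurate, and the last model is dropped because it adds complexity without reducing error.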

How does symbolic regression work? We start with a bunch of simple, linear models. If these models fit perfectly, that’s great. If they don’t, we produce small variations of these models and try again. These variations can change the form of the models: adding, removing, and changing mathematical terms. We then keep testing, at a rate of 10 million equations per second, until the search gradually converges on models that fit the data. In test cases, we watched this simple algorithm find models that took human experts decades to discover.
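The loop described above can be illustrated with a deliberately tiny search. This is an assumption-laden toy, not Eureqa's algorithm: a model is a sum of terms drawn from a small, hypothetical library (`LIBRARY`), each variation adds, removes, or swaps a single term, coefficients are refit by least squares, and a hypothetical per-term penalty (`PENALTY`) discourages needlessly large models. It starts from the linear model y = c0 + c1·x and, at each step, keeps the best variation, stopping when none improves the score. Real symbolic regression systems typically build expressions from operators rather than picking from a fixed list and explore many models at once; here a single model is improved greedily so the run is deterministic.

```python
import math

# Hypothetical term library; a real system would build terms from operators.
LIBRARY = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "x^3": lambda x: x ** 3,
    "sin(x)": math.sin,
    "cos(x)": math.cos,
    "exp(x)": math.exp,
}
PENALTY = 1e-6  # assumed cost per term, trading accuracy against size

def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [v] for row, v in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - tail) / m[r][r]
    return sol

def fit(terms, xs, ys):
    """Least-squares coefficients (intercept first) and sum of squared errors."""
    rows = [[1.0] + [LIBRARY[t](x) for t in terms] for x in xs]
    k = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    coef = solve(gram, rhs)
    sse = sum((sum(c * v for c, v in zip(coef, r)) - y) ** 2 for r, y in zip(rows, ys))
    return coef, sse

def score(terms, xs, ys):
    """Lower is better: fit error plus a small charge for every term."""
    return fit(terms, xs, ys)[1] + PENALTY * len(terms)

def variations(terms):
    """All models one edit away: add, remove, or change a single term."""
    unused = [t for t in LIBRARY if t not in terms]
    for t in unused:
        yield terms + [t]
    for i in range(len(terms)):
        yield terms[:i] + terms[i + 1:]
        for t in unused:
            yield terms[:i] + [t] + terms[i + 1:]

def search(xs, ys, start=("x",)):
    """Greedy local search over model structure, starting from a linear model."""
    current = list(start)
    best = score(current, xs, ys)
    while True:
        cand = min(variations(current), key=lambda m: score(m, xs, ys))
        cand_score = score(cand, xs, ys)
        if cand_score >= best:  # no variation helps: converged
            return current, fit(current, xs, ys)
        current, best = cand, cand_score

xs = [i / 10 for i in range(-20, 21)]                # 41 points on [-2, 2]
ys = [3 * x * x - 2 * math.sin(x) + 1 for x in xs]   # hidden law to rediscover
terms, (coef, sse) = search(xs, ys)
print(dict(zip(["1"] + terms, coef)), sse)
```

On this synthetic data the search should move from the starting linear model to one containing exactly the `x^2` and `sin(x)` terms, with coefficients close to 3 and -2 and an intercept close to 1. Swapping the greedy step for random variations and a population of models would bring it closer to an evolutionary search, at the cost of a non-deterministic run.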