﻿

# Blog

Eureqa can automatically estimate numerical derivatives in order to model the rates of change of variables in your data. Derivatives are often a more natural and simpler way to model certain types of phenomena, particularly in physics. This post discusses the basics of entering derivatives into the Eureqa search relationship.

The Derivative Operator:

Eureqa provides the derivative operator D(x, y, n), which takes the derivative of x with respect to y. Here x and y can be arbitrary expressions, and n is an integer giving the order of the derivative. This operator can be used in the Search Relationship setting. For example, consider the search relationship:

D(x,t,1) = f(x,t)

This relation tells Eureqa to find a function of x and t that models the first derivative (e.g. a velocity or slope) of x with respect to t. The shorthand for the first derivative is D(x,t). The derivative operator can also appear inside the formula as an input variable, for example:

D(x,t,2) = f( x, D(x,t) )

This relation tells Eureqa to find a model of the second derivative (e.g. an acceleration or curvature) of x with respect to t, as a function of x and the first derivative of x. In Eureqa, this relation will appear as:

Eureqa displays the derivatives in mathematical format after the relationship text is entered.
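
Written in standard calculus notation, the relation above corresponds to the following. This is a hand-written rendering of the same relation, not a copy of Eureqa's display:

```latex
\frac{d^{2}x}{dt^{2}} = f\!\left(x, \frac{dx}{dt}\right)
```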

Alternatively, you could estimate the numerical derivatives ahead of time using another program, and enter these values as a new variable in the data set rather than using Eureqa’s derivative operator.

Starting the Search:

Eureqa will calculate the numerical derivatives that appear in your search relation when you start the search. The following screen will appear after you click start:

Eureqa estimates the numerical derivative using a spline fit to the data. When the data contains noise, this gives more accurate derivative estimates than simpler methods such as point-to-point differences.

Advanced:

Estimating numerical derivatives accurately is a challenging task when the data is sparse or contains noise. Eureqa’s derivative estimation is an improvement over the most basic methods like Newton’s difference quotient. However, it does not work well in all cases.
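
To illustrate why noise makes the basic difference quotient unreliable, here is a minimal Python sketch. It does not reproduce Eureqa's spline method. Instead it swaps in a simpler smoothing technique, the slope of a least-squares line fitted over a sliding window, and compares it with the plain difference quotient. The test signal (a noisy sine wave), the noise level, and the window size are all assumptions chosen for illustration.

```python
import math
import random


def difference_quotient(t, x):
    """Forward difference (x[i+1] - x[i]) / (t[i+1] - t[i]), one value per interval."""
    return [(x[i + 1] - x[i]) / (t[i + 1] - t[i]) for i in range(len(x) - 1)]


def local_linear_slope(t, x, half_window=5):
    """Slope of a least-squares line fitted to the samples around each point."""
    n = len(x)
    slopes = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        ts, xs = t[lo:hi], x[lo:hi]
        t_mean = sum(ts) / len(ts)
        x_mean = sum(xs) / len(xs)
        num = sum((a - t_mean) * (b - x_mean) for a, b in zip(ts, xs))
        den = sum((a - t_mean) ** 2 for a in ts)
        slopes.append(num / den)
    return slopes


def rms_error(estimates, truth):
    """Root-mean-square difference between estimates and the true values."""
    pairs = list(zip(estimates, truth))
    return math.sqrt(sum((e - v) ** 2 for e, v in pairs) / len(pairs))


# Assumed test data: x = sin(t) plus a small amount of Gaussian noise.
random.seed(0)
t = [0.05 * i for i in range(200)]
x = [math.sin(ti) + random.gauss(0.0, 0.01) for ti in t]
true_dx = [math.cos(ti) for ti in t]  # exact derivative of sin(t)

err_quotient = rms_error(difference_quotient(t, x), true_dx)
err_local = rms_error(local_linear_slope(t, x), true_dx)
print(f"difference quotient RMS error: {err_quotient:.3f}")
print(f"local linear fit RMS error:   {err_local:.3f}")
```

The difference quotient divides the noise by a small time step, so even a little noise produces large errors in the slope. Fitting over several neighboring points averages much of that noise away, which is the same general idea behind using a smooth fit rather than raw differences.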

One particular problem with spline curves is their accuracy at the head and tail of the data – these points are “surrounded” by fewer data points and thus have higher estimation error. If you can, you might want to ignore these points entirely using a weight variable. Simply add a new column to your data, and set the weight to 1 for all data points but near zero for the first and last 5 to 10 data points.
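
As a sketch of how such a weight column could be generated before pasting it into your data, the following Python snippet builds one weight per row. The function name `edge_weights`, the edge width of 5 rows, and the near-zero value 0.001 are illustrative assumptions, not Eureqa settings.

```python
def edge_weights(n_rows, n_edge=5, edge_weight=0.001):
    """Return one weight per row: edge_weight for the first and last
    n_edge rows, and 1.0 for every row in between."""
    if 2 * n_edge >= n_rows:
        raise ValueError("n_edge is too large for this many rows")
    return [
        edge_weight if i < n_edge or i >= n_rows - n_edge else 1.0
        for i in range(n_rows)
    ]


# Example: a data set with 20 rows, down-weighting 5 rows at each end.
weights = edge_weights(20, n_edge=5)
for w in weights:
    print(w)  # one value per line, ready to paste as a new column
```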

It may also be worth the effort to estimate the numerical derivatives outside of Eureqa using more specialized tools. For example, you may want to compute the derivative values in R or using Matlab’s spline toolbox, and then paste these into Eureqa as a new column variable.

While normalizing your data variables (rescaling the numeric values) is completely optional, it can greatly improve the performance of Eureqa, and numerical stability of solutions. This post discusses when and how to normalize variables in your data.

When to Normalize:

Eureqa works best when all variables in your data have small to medium magnitudes, on the order of 1 to 100. For example, if a variable has values in the millions, it is best to rescale it to larger units (for instance, millions instead of ones) so that the numeric values become smaller.

Additionally, the range over which a variable varies should be similar in magnitude to its mean or offset. For example, if you have a variable that only varies between 100.0 and 100.5, it would be best to subtract off 100 so that it ranges between 0 and 0.5.

For example, consider the following two variables in some data set:

Notice that both variables look rather flat. You can’t see any interesting variation because the variable a has such a large offset. Do variables in your data look like this? Let’s try subtracting off an offset of 10,000 from a:

Now we can see some interesting variation in the variable a, but the variable b still looks flat because a still has a large magnitude relative to b. Next, let’s try dividing the values of a by 50:

Now we can see the interesting variation in both variables, as they now have the same relative scale and magnitudes. This is ideally how we want our data to look before entering it into Eureqa. When the variables are reasonably scaled, Eureqa is most likely to utilize their variation to build accurate solutions.

How to Normalize a Variable:

First, consider changing the units of the data you enter into Eureqa. Could you measure values in meters instead of centimeters? Could you measure currency in millions-of-dollars instead of dollars? Pick units such that the numeric values have a range of approximately 1 to 100.

Second, consider measuring values from an offset. Could you measure time since the time of your first data point, instead of since the beginning of the year or century?

Third, check over your data; look for outliers. Are there any values that are drastically out of proportion with the rest of the values? If so, consider removing this entire row in your data set or giving it a very low weight.

The general formula for normalizing a variable y is:

y_normalized = (y - offset) / scale

where offset and scale are the normalization parameters. It’s recommended that you pick offset and scale manually, so that the numeric values still have an intuitive meaning. However, if you truly don’t care what the numeric values mean, a common approach is to set offset equal to the mean of the variable and scale to the standard deviation of the variable.
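
The formula can be applied outside Eureqa with a few lines of Python. This is a minimal sketch: the function name `normalize` and the sample readings are hypothetical, and the automatic fallback uses the sample standard deviation from Python's `statistics` module as one reasonable reading of "standard deviation".

```python
import statistics


def normalize(values, offset=None, scale=None):
    """Apply y_normalized = (y - offset) / scale to every value.

    If offset or scale is omitted, fall back to the mean and the
    sample standard deviation of the values.
    """
    if offset is None:
        offset = statistics.mean(values)
    if scale is None:
        scale = statistics.stdev(values)
    if scale == 0:
        raise ValueError("scale must be non-zero")
    return [(v - offset) / scale for v in values]


# Hypothetical readings that only vary between 100.0 and 100.5.
readings = [100.0, 100.1, 100.25, 100.4, 100.5]

# Manual choice: subtract 100 and keep the units, so values stay meaningful.
manual = normalize(readings, offset=100.0, scale=1.0)

# Automatic choice: mean and standard deviation of the data.
auto = normalize(readings)

print(manual)
print(auto)
```

The manual version keeps the numbers interpretable (they are still in the original units, measured from 100), which is why it is the recommended default. The automatic version produces values centered on zero with unit spread, at the cost of that intuitive meaning.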

It’s also recommended that you apply normalization before entering your data into Eureqa. However, you can specify the normalization in the Eureqa Search Relation. For example, consider the search relation:

y = f( x/1000 )

This tells Eureqa to find a model of y as a function of values of x that are divided by 1000.

Automatic Normalization Checks:

By default, Eureqa will check your data for extreme cases of variables that need to be normalized. When entering or modifying values in the Eureqa data view, you may encounter a message like this:

Here, Eureqa is telling you that the variable y has a large offset. It has a mean value of about 1000, but it only varies by +/- 1.38. Eureqa suggests subtracting 996 from each y value in your data set, but leaving the scale unchanged.

You can also modify this suggestion and specify which values to apply. Pick a scale and offset that make sense and preserve the meaning of the values.
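
A similar check can be run on your own data before loading it. The Python sketch below is an assumption-laden illustration, not Eureqa's actual rule, which is not described here: it flags a variable when the absolute mean exceeds a hypothetical threshold of 10 times the standard deviation, and suggests the rounded mean as the offset. It will not necessarily suggest the same offset Eureqa does.

```python
import statistics


def suggest_offset(values, max_ratio=10.0):
    """Return a suggested offset if the mean is large relative to the
    spread of the values, or None if no offset seems necessary.

    max_ratio is a hypothetical threshold, not a Eureqa parameter.
    """
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0 or abs(mean) <= max_ratio * spread:
        return None
    return float(round(mean))


# Hypothetical variable with a mean near 1000 and only small variation.
y = [998.6, 999.3, 1000.0, 1000.7, 1001.4]
print(suggest_offset(y))

# Hypothetical variable that is already reasonably scaled.
z = [-1.0, 0.5, 2.0, 3.5]
print(suggest_offset(z))
```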
