Often you might want to specify that the output of a model should fall within a certain range rather than match an exact numerical value. This post shows one way to do this with Eureqa. The goal is to find the simplest equations whose outputs always lie between a given min and max value for each data point.
Enter Min and Max Values for each Data Point:
For each data point where you only know a range of output values (the min and max values), add two rows for that data point: one with the minimum value and one with the maximum value, keeping all other variables in the row the same.
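If your ranges start out in a script or spreadsheet export, the row duplication can be done before importing the data. The following is a minimal Python sketch, assuming a single input `x` and that you prepare the data as CSV text; the names `expand_ranges` and `to_csv` are hypothetical helpers, not part of Eureqa.

```python
import csv
import io
from typing import List, Tuple

# (x, y_min, y_max) for one data point whose output is only known as a range
RangeRow = Tuple[float, float, float]


def expand_ranges(rows: List[RangeRow]) -> List[Tuple[float, float]]:
    """Turn each (x, y_min, y_max) record into two (x, y) rows.

    The input value is copied unchanged; only the output column differs
    between the two rows (first the minimum, then the maximum).
    """
    expanded = []
    for x, y_min, y_max in rows:
        if y_min > y_max:
            raise ValueError(f"y_min {y_min} exceeds y_max {y_max} for x={x}")
        expanded.append((x, y_min))  # row carrying the lower bound
        expanded.append((x, y_max))  # same input, upper bound
    return expanded


def to_csv(expanded: List[Tuple[float, float]]) -> str:
    """Serialize the expanded rows as CSV text with an x,y header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["x", "y"])
    writer.writerows(expanded)
    return buf.getvalue()


if __name__ == "__main__":
    ranges = [(0.0, 1.0, 2.0), (1.0, 2.5, 3.0)]
    print(to_csv(expand_ranges(ranges)), end="")
```

Run as a script, this prints a header followed by the rows `0.0,1.0`, `0.0,2.0`, `1.0,2.5` and `1.0,3.0`. With more input variables, each tuple would carry all of them, copied identically into both rows.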
Next, set the fitness metric to the "Mean Absolute Error" option.
Start the Eureqa search as usual. Every solution whose output falls between the min and max values of a data point has the same total absolute error on that pair of rows.
This is because, when a model output lies between the min and max values, the sum of the two absolute errors does not depend on where exactly the output lies: it always equals the width of the range (max minus min). If the output moves closer to the max value, the error on the max-value row decreases linearly, but the error on the min-value row increases linearly by the same amount.
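For a single data point with bounds y_min ≤ y_max and model output f, adding up the absolute errors of its two rows gives the following (a short derivation added for illustration; f and E are notation introduced here):

```latex
E(f) = \lvert f - y_{\min} \rvert + \lvert f - y_{\max} \rvert =
\begin{cases}
  (y_{\max} - y_{\min}) + 2\,(y_{\min} - f), & f < y_{\min} \\
  y_{\max} - y_{\min}, & y_{\min} \le f \le y_{\max} \\
  (y_{\max} - y_{\min}) + 2\,(f - y_{\max}), & f > y_{\max}
\end{cases}
```

E is flat at its smallest value, the range width, across the whole interval, and grows at twice the rate for every unit the output moves outside it. Mean Absolute Error therefore favors in-range solutions without preferring any particular point inside the range.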
In Eureqa, your data view should look similar to:
Where each input x appears twice, once with the minimum y value and again with the maximum y value.
We can then start the search using the Mean Absolute Error fitness metric and get various solutions that fall within the min/max ranges:
These solutions may have slightly different fitness values, because the min row and the max row of some data point pairs can end up in different sets, one in training and one in validation. One way to avoid this is to set the training and validation sets to use all of the data, or to turn off shuffling of their points, in the Advanced Genetic Program Settings menu.
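To see why a split matters, the following Python sketch computes mean absolute error by hand for two hypothetical models that both stay inside every range. The split is simulated by simply dropping one row; it is not how Eureqa actually partitions data, and the function and model names are illustrative only.

```python
from typing import Callable, List, Tuple

Row = Tuple[float, float]  # (x, y) after duplicating each range into two rows


def mean_absolute_error(model: Callable[[float], float], rows: List[Row]) -> float:
    """Average |model(x) - y| over the given rows."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)


def model_a(x: float) -> float:
    return 2.0 + x  # outputs 2.0 and 3.0 on the data below


def model_b(x: float) -> float:
    return 1.5 + 2.0 * x  # outputs 1.5 and 3.5 on the data below


# Two ranges, x=0: y in [1, 3] and x=1: y in [2, 4], already expanded.
rows = [(0.0, 1.0), (0.0, 3.0), (1.0, 2.0), (1.0, 4.0)]

# With every pair intact, both in-range models score the same.
full_a = mean_absolute_error(model_a, rows)
full_b = mean_absolute_error(model_b, rows)

# Simulated split: the max row of the x=1 pair goes to validation.
train = rows[:3]
split_a = mean_absolute_error(model_a, train)
split_b = mean_absolute_error(model_b, train)

print(full_a, full_b)    # 1.0 1.0
print(split_a, split_b)  # 1.0 1.1666666666666667
```

Once the x=1 pair is broken, only its min row is scored, so the model whose output sits farther from that bound looks worse even though both models respect every range.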
Using Separate Min and Max Values in a Custom Error Metric
Another option is to specify a custom error metric in the Search Relationship. This allows you to enter your min and max range values in separate columns. For example, consider the following search relationship:
abs(y_min - f(x)) + abs(y_max - f(x)) = 0
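where x is the input, and y_min and y_max are two different variables holding the min and max values of the range of outputs for each input x. For each x, this custom error adds up the absolute errors to the min and max values, which is exactly what the pair of duplicated rows contributes under Mean Absolute Error, so it is equivalent to the previous method.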
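The equivalence can be checked numerically. The following Python sketch assumes the error is averaged over rows in both cases (an assumption, not a description of Eureqa's internals); under that assumption the separate-column error is exactly twice the duplicated-row mean absolute error, so both rank models identically. `range_error`, `duplicated_mae`, `inside` and `outside` are hypothetical names.

```python
from typing import Callable, List, Tuple

Model = Callable[[float], float]
RangeRow = Tuple[float, float, float]  # (x, y_min, y_max) in separate columns


def range_error(model: Model, data: List[RangeRow]) -> float:
    """Mean over rows of abs(y_min - f(x)) + abs(y_max - f(x))."""
    total = sum(abs(y_min - model(x)) + abs(y_max - model(x))
                for x, y_min, y_max in data)
    return total / len(data)


def duplicated_mae(model: Model, data: List[RangeRow]) -> float:
    """Mean absolute error after writing each range as two (x, y) rows."""
    rows = [(x, y) for x, y_min, y_max in data for y in (y_min, y_max)]
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)


def inside(x: float) -> float:
    return 2.0 + 0.5 * x  # 2.0, 2.5, 3.0 on the data below: inside every range


def outside(x: float) -> float:
    return 4.0  # above every range


data = [(0.0, 1.0, 3.0), (1.0, 2.0, 4.0), (2.0, 2.5, 3.0)]

for model in (inside, outside):
    # Each pair of numbers differs by exactly a factor of two.
    print(model.__name__, range_error(model, data), duplicated_mae(model, data))
```

For the in-range model this prints `inside 1.5 0.75`: 1.5 is the mean range width, the smallest value the separate-column error can take, while the out-of-range model scores higher under both formulations.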