In my previous post, we walked through the process of using Eureqa to predict the next price a bond will trade at. Starting with a massive spreadsheet of more than 760,000 rows and 61 columns, we were able to generate 11 equations to describe the data in 20 minutes. While I focused on just one of the equations, there is still more we can learn from Eureqa.
So let’s review the last page we looked at – the View Results tab of Eureqa:
I walked you through my thought process of how I chose a single equation out of the 11 that Eureqa generated. This equation had a size of 14, with only 4 parameters and 3 terms. Of all the equations, it seemed to best balance both accuracy and complexity, being able to predict the next price a bond will trade at within $0.55 based on only 3 variables. However, there’s far more information here in this tab – what else can we learn?
First, let’s talk a little more about the equation we chose:
trade_price = 0.6964*trade_price_last1 + 0.3026*curve_based_price + 0.1059/(trade_type - 2.759)
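To make the equation concrete, here is a minimal Python sketch that evaluates it for a single row. The function name and the input values are made up for illustration; in particular, the value passed for trade_type is only a placeholder, since how that column is coded is not covered here.

```python
def predict_trade_price(trade_price_last1, curve_based_price, trade_type):
    """Evaluate the chosen solution for one row of data."""
    # The last term is undefined when trade_type equals 2.759, so this
    # sketch assumes trade_type never takes that value.
    return (0.6964 * trade_price_last1
            + 0.3026 * curve_based_price
            + 0.1059 / (trade_type - 2.759))


# Hypothetical inputs, not taken from the bond dataset.
example = predict_trade_price(
    trade_price_last1=100.0,
    curve_based_price=101.0,
    trade_type=2.0,
)
print(round(example, 4))
```

Note how the first two coefficients sum to almost exactly 1, so the prediction is close to a weighted average of the last trade price and the curve-based price, nudged slightly by the trade type term.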
When you click on that specific solution, you will see details about that solution directly below. Eureqa provides details on 8 different error metrics for each solution, ranging from Mean Absolute Error to Hybrid Correlation/Error. I used MAE to judge accuracy in this case, but different data sets may require different error metrics.
While I didn’t touch upon R^2 Goodness of Fit in the previous tutorial, it can provide a meaningful way to evaluate your overall search. This metric tells you how much of the variance in your target variable is captured by each of your solutions. In this case, the R^2 value tells us that our solution captures 98.9% of the variance in trade_price. With this equation under our belt, let’s dig a little deeper.
For more details on all 8 error metrics, please see our tutorial on error metrics.
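As a rough illustration of the two metrics used in this post, the sketch below computes MAE and R^2 with their common textbook definitions. This is an assumption about the formulas, not a copy of how Eureqa computes them internally, and the numbers are toy values rather than bond data.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only.
actual = [100.0, 102.0, 98.0, 101.0]
predicted = [100.5, 101.5, 98.5, 100.5]

mae = mean_absolute_error(actual, predicted)
r2 = r_squared(actual, predicted)
print(mae, r2)
```

MAE is expressed in the same units as the target (dollars here), while R^2 is unitless, which is why the two can tell slightly different stories about the same solution.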
Even though we chose this specific equation as the best for now, what can the other equations tell us about this data? There are three different ways of ranking solutions – by size only, by fit only, or by a combination of size and fit. The third is what Eureqa defaults to, but you can still find valuable data by ranking by the other two methods.
Specifically, let’s look at what happens when you rank by size, looking at the simplest solutions first. By doing this, you can see which single variable Eureqa believes to be the most crucial to understanding the target variable. Then going through each successively more complicated solution, you can see which other variables begin appearing in what order. The simplest solution here is just:
trade_price = trade_price_last1.
When you look at the R^2 value for this solution, it actually shows us that this one variable captures 98.4% of the variance of the target variable. What does this mean for us? While we can (and did) find more sophisticated models that get us closer to modeling the future trade price, the last price that the bond traded at is by far the best indicator of the future price.
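The size-only and fit-only rankings described above amount to sorting the same list of solutions by different keys. Here is a small sketch using the two solutions discussed so far. The R^2 values are the ones quoted in this post, but the size of 1 for the simplest solution is my assumption, and the dictionary layout is hypothetical rather than anything Eureqa exports. Eureqa's combined size-and-fit ranking is not reproduced here.

```python
solutions = [
    {
        "expression": "0.6964*trade_price_last1 + 0.3026*curve_based_price"
                      " + 0.1059/(trade_type - 2.759)",
        "size": 14,
        "r_squared": 0.989,
    },
    {
        "expression": "trade_price_last1",
        "size": 1,  # assumed; the size of this solution is not stated above
        "r_squared": 0.984,
    },
]

# Rank by size only: simplest solutions first.
by_size = sorted(solutions, key=lambda s: s["size"])

# Rank by fit only: highest R^2 first.
by_fit = sorted(solutions, key=lambda s: s["r_squared"], reverse=True)

print(by_size[0]["expression"])
print(by_fit[0]["expression"])
```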
Finally, let’s focus on this trade_price_last1 variable. As we just discovered, it captures roughly 98% of the variance in our target variable – trade_price. It could be interesting to look at what drives the differences between the two variables – and Eureqa lets us do that extremely easily. All we need to do is set up a new search and modify the target expression to find the difference between trade_price and trade_price_last1, as modeled by the rest of the variables:
trade_price - trade_price_last1 = f(weight, current_coupon, time_to_maturity, …, curve_based_price_last10)
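If you want to inspect this difference outside of Eureqa, it is just a derived column. The sketch below computes it for a couple of made-up rows; the column name price_change and the row values are hypothetical. One useful side effect: the mean absolute value of this column is exactly the MAE you would get from the simplest model, trade_price = trade_price_last1.

```python
# Made-up rows for illustration; the real dataset has many more columns.
rows = [
    {"trade_price": 101.2, "trade_price_last1": 100.9},
    {"trade_price": 99.5, "trade_price_last1": 100.1},
]

# Derive the new target: how far the price moved since the last trade.
for row in rows:
    row["price_change"] = row["trade_price"] - row["trade_price_last1"]

# Mean absolute change, i.e. the MAE of always predicting the last price.
mean_abs_change = sum(abs(r["price_change"]) for r in rows) / len(rows)
print(mean_abs_change)
```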
After running this for almost 7 hours on 72 cores, the most accurate solution I could generate was:
trade_price - trade_price_last1 = (trade_type_last3 + 1.342*time_to_maturity)/(2.819*curve_based_price_last1 - curve_based_price*trade_type_last1) + (trade_type_last3 + 1.342*time_to_maturity)/(trade_type*curve_based_price - 2.819*curve_based_price)
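Written out in code, the structure of this solution is easier to see: both fractions share the same numerator and differ only in their denominators. The following is a minimal sketch of the equation as a Python function. The function name and the input values are invented for illustration, and the sketch assumes neither denominator is zero.

```python
def predict_price_change(trade_type, trade_type_last1, trade_type_last3,
                         time_to_maturity, curve_based_price,
                         curve_based_price_last1):
    """Evaluate the difference model (trade_price - trade_price_last1)."""
    # Both terms of the solution share this numerator.
    numerator = trade_type_last3 + 1.342 * time_to_maturity
    first_denominator = (2.819 * curve_based_price_last1
                         - curve_based_price * trade_type_last1)
    second_denominator = (trade_type * curve_based_price
                          - 2.819 * curve_based_price)
    # Assumes both denominators are non-zero for the given inputs.
    return numerator / first_denominator + numerator / second_denominator


# Hypothetical inputs, not taken from the bond dataset.
change = predict_price_change(
    trade_type=2.0,
    trade_type_last1=3.0,
    trade_type_last3=2.0,
    time_to_maturity=5.0,
    curve_based_price=100.0,
    curve_based_price_last1=100.0,
)
print(change)
```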
As you can see from the Pareto front display, much more complex solutions are being introduced. Keep in mind that the average difference between trade_price and the last price is 0.607, while our most accurate equation here has an MAE of 0.52. While this solution is the most accurate, you can choose for yourself which solution has a better balance of accuracy and complexity, such as the one with an equation size of 13 that uses only 2 parameters. Additionally, more pre-processing of the dataset or a different choice of building blocks may lead you to improved searches.
Last week, it was all about showing you how easy and intuitive it is to use Eureqa to quickly come up with incredibly accurate results. Today, I hope I was able to show you some of the hidden power behind Eureqa that allows you to accomplish far more.
Of course, this is still only the tip of the iceberg of Eureqa’s abilities. Using the fxp file I posted last time, go ahead and try it yourself! If you have any questions, check out our user guides and tutorials, or come visit our forums and see what questions others have asked!