
Machine Intelligence with Michael Schmidt: Searching data for causation

Posted by Michael Schmidt

27.07.2016 10:03 AM

The holy grail of data analytics is finding “causation” in data: identifying which variables, inputs, and processes are driving the outcome of a problem. Much of econometrics, for example, is dedicated to studying and characterizing where causation exists. Actually proving causation, however, is extremely difficult and typically requires carefully controlled experiments. To even get started, analysts need to know which variables are important to include in the evaluation, which need to be controlled for, and which to ignore. From there, they can build a model, design an experiment to test its causal predictions, and iterate until they arrive at a conclusion.

Proving causation relies heavily on getting these up-front assumptions right. What if you forgot to control for age, demographics, or socioeconomic conditions? It’s difficult to know how to start framing the problem to analyze causal impact. But this is a task that machines were born to solve.

There are two important steps required to identify causation: 1) among many possible variables, finding the few that are actually relevant, and 2) given a limited set of variables, executing the transformations needed to reveal the extent of each variable’s impact.
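To make the two steps concrete, here is a minimal sketch in Python. It is not how Eureqa works internally; it swaps in plain Pearson correlation as a stand-in for a real model search. The data is synthetic, and the variable names (x1 through x6), the candidate transformations, and the helper functions are all assumptions made up for illustration. A high correlation here only flags a candidate relationship; it does not establish causation.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def rank_variables(columns, outcome, keep=2):
    """Step 1: keep the `keep` variables most correlated with the outcome."""
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], outcome)),
        reverse=True,
    )
    return ranked[:keep]


def best_transform(xs, outcome, transforms):
    """Step 2: pick the transformation of xs that tracks the outcome best."""
    return max(
        transforms,
        key=lambda name: abs(pearson([transforms[name](x) for x in xs], outcome)),
    )


# Synthetic data (hypothetical): six candidate variables, but the outcome
# depends only on 1/x1 and on x2. x3..x6 are pure distractors.
rng = random.Random(0)
n = 500
columns = {
    f"x{i}": [rng.uniform(0.2, 5.0) for _ in range(n)] for i in range(1, 7)
}
outcome = [
    10.0 / a + 2.0 * b + rng.gauss(0, 0.5)
    for a, b in zip(columns["x1"], columns["x2"])
]

# Step 1: narrow six candidates down to the two that matter.
relevant = rank_variables(columns, outcome, keep=2)

# Step 2: for one selected variable, find the shape of its effect.
transforms = {
    "identity": lambda x: x,
    "log": math.log,
    "reciprocal": lambda x: 1.0 / x,
}
shape = best_transform(columns["x1"], outcome, transforms)

print(relevant, shape)
```

A real search would explore far more structures than three hand-picked transformations, and a purely linear screen like this one can miss variables whose effect is not monotonic. The point is only to show how the two steps divide the work.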

[Image: Determining causation with machine intelligence]

For the first time, there exists software that helps companies reliably determine causation from raw, seemingly chaotic data.

People often use Eureqa for its ability to start from the ground up and “think like a scientist,” sifting through billions of potential models, structures, and nonlinear operations from scratch to create the ideal analytical model for your unique dataset, without needing to know the important variables or model algorithm ahead of time. Eureqa’s modeling engine effectively generates theories of causation through its process of building analytical models from a dataset. Eureqa doesn’t attempt to prove causality on its own, but instead yields a very special form of model that can be interpreted physically for causal effects.

One of the biggest open problems in machine learning (and analytics in general) is avoiding spurious correlations and similar non-causal effects. In fact, there’s likely no perfect solution despite the advances we’ve made; ultimately a person needs to interpret the findings and provide context not contained in the data alone. Two of the most-used features in Eureqa are the covariates window and the ability to block and replace variables in a model, both of which we added specifically so that users can interact with the software while modeling complex systems.
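A small sketch of why that human context matters: in the synthetic data below, two variables look strongly related only because a third variable drives both. The names (`confounder`, `a`, `b`) and the data are hypothetical, and the check used here is the standard first-order partial correlation, not a Eureqa feature. Controlling for the confounder makes the apparent relationship vanish, but only a person who knows the confounder exists can ask for it to be measured and included.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after controlling for zs (first-order)."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic data (hypothetical): `a` and `b` never influence each other;
# both are driven by the same hidden confounder plus independent noise.
rng = random.Random(1)
n = 2000
confounder = [rng.gauss(0, 1) for _ in range(n)]
a = [c + rng.gauss(0, 0.5) for c in confounder]
b = [c + rng.gauss(0, 0.5) for c in confounder]

raw = pearson(a, b)                                # strong, but spurious
controlled = partial_correlation(a, b, confounder)  # close to zero

print(f"raw correlation: {raw:.2f}")
print(f"after controlling for the confounder: {controlled:.2f}")
```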

There is some exciting research taking place, however, connecting Eureqa to live biological experiments to automatically guide experimentation and test predictions. While this research is still ongoing, perhaps a physical robot scientist is around the corner.

Topics: Causation, Eureqa, Machine Intelligence
