James's Blog

Sharing random thoughts, stories and ideas.

The Ladder of Causality

Posted: Aug 31, 2019
◷ 6 minute read

The recent launch of ClearBrain caught my eye with their product Causal Analytics. It is rare for a data analytics company to claim the ability to produce causal insights, and in this case, they are not merely claiming it, it's their raison d’être. Extraordinary claims require extraordinary evidence, and since their technology white paper left me unsatisfied, I went to the source of their knowledge. It turns out that ClearBrain’s framework is in large part based on the causal theory of Judea Pearl, the Turing Award-winning computer scientist. So I picked up his latest book, The Book of Why.


The Ladder of Causality is a central concept in the book, and I want to examine it here. Pearl claims that there are three rungs on the Ladder of Causality, each qualitatively different from the previous, and each bringing us closer to the Holy Grail of data analysis: causality. They are:

  1. Association
  2. Intervention
  3. Counterfactual

The first rung of the Ladder, association, is the realm of most ordinary statistics. It is about finding patterns in the data we have, fitting lines to points so to speak. We can learn about which things are more closely related to one another and which are not. The questions that can be answered at this level are correlational only, such as “what’s the probability of a customer buying floss if he has purchased toothpaste?” Almost all current machine learning techniques, including deep learning, sit here, as they are no more sophisticated than simple “line fitting”, albeit in (much) higher dimensions. This level can be characterized as “the seen world”.
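The toothpaste/floss question above is just counting, which is part of why rung 1 needs no model of the world. A minimal sketch, with purchase records invented purely for illustration:

```python
# A rung-1 (association) question: estimating P(floss | toothpaste)
# by simple counting over observed purchase records.
purchases = [
    {"toothpaste": True,  "floss": True},
    {"toothpaste": True,  "floss": False},
    {"toothpaste": True,  "floss": True},
    {"toothpaste": False, "floss": False},
    {"toothpaste": False, "floss": True},
]

toothpaste_buyers = [p for p in purchases if p["toothpaste"]]
p_floss_given_toothpaste = (
    sum(p["floss"] for p in toothpaste_buyers) / len(toothpaste_buyers)
)
print(p_floss_given_toothpaste)  # 2 of the 3 toothpaste buyers bought floss
```

Nothing here knows or cares what toothpaste and floss are; the answer is entirely "in the data".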

At the second rung of the Ladder we have intervention, which is already ascending into the realm of causality. This level moves past the patterns in the existing data, and asks how the dynamic world would change if we intervened and made some modifications. Here we seek to answer questions like “what would happen to our sales if we doubled the price of toothpaste?” Of course, the answers to these questions are not completely contained in the data from rung 1, which is why we need new types of data, as well as models to frame the intervention, at this level. Note that regardless of our ability to predict, experiments can always give us the answer here, since we can simply perform the intervention and see what happens. This can be characterized as “the new world that is seeable”.
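To make the rung-2 question concrete: rather than reading the effect of a price change off observational data, we can encode an assumed demand model and simulate the intervention. The demand function and all numbers below are invented for illustration; a real analysis would have to justify the model.

```python
import random

random.seed(0)

def weekly_sales(price, n_customers=1000):
    # Assumed model of the world: each customer buys with a probability
    # that falls linearly as the price rises.
    buy_prob = max(0.0, 0.9 - 0.15 * price)
    return sum(random.random() < buy_prob for _ in range(n_customers))

baseline = weekly_sales(price=2.0)   # sales at the current price
doubled = weekly_sales(price=4.0)    # sales under the intervention do(price doubled)
print(baseline, doubled)
```

The key point is that `weekly_sales` is a model we brought to the table, not something extracted from rung-1 correlations; an experiment would test whether the model's prediction holds.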

Then we get to the third and top rung of the Ladder, the counterfactual. We begin to ask how the world would have been had something in the past been different (i.e. intervention in the past), the “what if” questions. “What’s the probability that a customer who bought our toothpaste would still have bought it had we doubled the price?” Unlike in the second rung, we cannot get definitive answers by directly experimenting, since we cannot go back in time and make the modifications that we want. The toughest causal questions are examined here at the top level, requiring strong frameworks and formulations to answer. This is essentially “the world that cannot be seen”.
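A toy structural model can show why counterfactuals need more machinery: to answer "would this buyer still have bought?", the model must pin down something about each individual, not just population frequencies. In the sketch below, each customer has a latent willingness-to-pay `u` and buys iff `u >= price`; all numbers are invented.

```python
import random

random.seed(1)
price = 2.0

# Latent willingness-to-pay for 100k hypothetical customers.
willingness = [random.uniform(0, 10) for _ in range(100_000)]

actual_buyers = [u for u in willingness if u >= price]
# Counterfactual: replay the SAME individuals, but had the price been doubled.
still_buyers = [u for u in actual_buyers if u >= 2 * price]

still_rate = len(still_buyers) / len(actual_buyers)
print(still_rate)  # analytically 0.75 for uniform(0, 10) willingness
```

We can compute this only because the model fixes each customer's `u` and lets us rerun history; no experiment on the real world could replay those same individuals under a different price.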


So what do we need to climb this Ladder of Causality? A key insight that Pearl brings up is that to get into the realm of the causal (rungs 2 and 3 of the Ladder), we need more than just the plain data, we need frameworks and models of how the world works. This makes many, especially the more “purist” statisticians, uncomfortable, because it means introducing elements of subjectivity (i.e. the models, how we think the world works) into the otherwise completely objective sea of data. Pearl actually goes as far as to cite this mental discomfort as one of the reasons that the early pioneers of the field of statistics, such as Galton and Pearson, did not push into causal theory, despite coming close to it.

But why do we need subjective models of the world to answer causal questions? I have to admit that there is even a purist part of me that finds this irksome.

First, we need to recognize that mere association between variables does not encapsulate the concept of causality. Degree of association is more formally quantified as probability, while causality can be thought of as “probability raising”. The statement “A causes B” is saying that A is actively increasing the probability of B. Causality tells us whether and how probabilities change when the world changes. This by nature is a higher order concept than probability itself and thus outside of the world of mere associations.
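"Probability raising" can be checked directly in a toy model. Below, rain causes wet grass (with a sprinkler as another independent cause), so intervening to make it rain pushes P(wet) above its baseline; the model and its probabilities are invented for illustration.

```python
import random

random.seed(3)

def grass_is_wet(rain=None):
    if rain is None:
        rain = random.random() < 0.3     # nature decides whether it rains
    sprinkler = random.random() < 0.2    # sprinkler runs independently
    return rain or sprinkler

n = 100_000
p_wet = sum(grass_is_wet() for _ in range(n)) / n
p_wet_if_rain_forced = sum(grass_is_wet(rain=True) for _ in range(n)) / n
print(p_wet, p_wet_if_rain_forced)  # forcing rain raises P(wet)
```

Note that the comparison only makes sense because we could intervene (pass `rain=True`), which is exactly the higher-order, rung-2 notion that plain probabilities lack.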

We can also look to the natural sciences for evidence. Fields like physics and chemistry make some of the strongest statements (i.e. the Laws) about causality that we are confident about, and despite the abundance of data, they still must rely on models to operate. For example, how do we know that the thermometer gauge rising does not cause the temperature to go up? This is almost impossible to answer from observational data alone, because all the data will tell us is that the two are (almost) perfectly correlated1. So we need to make some assumptions, come up with a model of how temperature and fluid volume interact, then experiment (commonly via RCTs, randomized controlled trials) to confirm or invalidate our model.
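The thermometer example can be simulated under an assumed model in which temperature causes the reading and never the reverse; intervening then exposes the asymmetry that correlation alone hides. All values below are invented.

```python
import random

random.seed(2)

def gauge_reading(temperature):
    # The reading tracks the temperature with a little measurement noise.
    return temperature + random.gauss(0, 0.3)

# Intervene on the cause: heat the room to 30, and the reading follows.
reading_after_heating = gauge_reading(30.0)

# Intervene on the effect: clamp the gauge at 30; the temperature is
# untouched, because nothing in this model depends on the reading.
room_temperature = 15.0
clamped_reading = 30.0
print(reading_after_heating, room_temperature)
```

In observational data the two variables would look interchangeable; only the interventions (which require the model to define what "setting" each variable means) distinguish cause from effect.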

It seems, then, that while associative statistical analysis can occur purely in the data, without much regard to the domain - after all, a correlation is a correlation regardless of whether the variables are toothbrush sales or nuclear forces - the same is not true for causal analysis. To climb the Ladder of Causality, we need at least some domain knowledge in order to frame a model, then see if the data agrees with it.


Coming back to what dragged me into this world of causal analysis in the first place: the ClearBrain Causal Analytics technology. Their white paper recognizes two key problems that they needed to solve: automating the selection of confounding variables and building a system that could perform the analysis fast. The latter is mostly an architecture and engineering problem, and less relevant for the topic of causality. But the former is quite on point, and is essentially trying to automate the model framing part of causal analysis. They say that traditionally “the statistician is able to use her domain knowledge to select the appropriate confounding variables for the observational study she is creating”, but don’t really provide much detail on how they are able to automate this seemingly advanced usage of domain knowledge. Perhaps this is part of their secret sauce, and why they had to file a patent before launch. Regardless, it seems that at least the problem formulation of their solution is backed up by the theory; now I’m just curious how they actually solved it and how effective it is.


P.S. Here I have mostly talked about what we need to climb the Ladder of Causality, and not about the how. There are several techniques to perform causal analysis, and the one that Pearl explains in great detail in his books is the one he pioneered: causal diagrams, which are special forms of Bayesian belief networks. I might write about them in more detail some other time. A relevant point to note here is that these causal diagrams are based on our models of how things work, and so can be drawn differently depending on the model we choose.
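At its simplest, a causal diagram is just a directed acyclic graph with edges pointing from cause to effect. A minimal sketch, with variable names invented; a different model of the world would give a different graph:

```python
# Nodes are variables; each maps to the variables it directly causes.
causal_diagram = {
    "marketing": ["price", "sales"],  # marketing -> price, marketing -> sales
    "price":     ["sales"],           # price -> sales
    "sales":     [],
}

def direct_causes(node, graph):
    """The parents of `node`, i.e. its direct causes under this model."""
    return [cause for cause, effects in graph.items() if node in effects]

print(direct_causes("sales", causal_diagram))  # ['marketing', 'price']
```

Swapping the direction of a single edge encodes a different theory of how the world works, which is exactly why the diagram is a modeling choice rather than something read off the data.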


  1. One of the misconceptions that some earlier statisticians and mathematicians had was that causation is just perfect correlation. This is of course not true. ↩︎