Virtually everyone has heard the adage ‘correlation does not imply causation’ – but whilst it may be one of the most frequently quoted phrases in analytics, it may also be one of the most vacuous, because the very same people who say it fall foul of it time and time again. This is a guide on how not to do that.
What does it mean?
To put the whole thing into more extreme, but perhaps more understandable terms: there is no such thing as proven causation in digital analysis. No matter how detailed your analysis is, you are never at liberty to say categorically that X happened because of Y. Controlled experimentation gets you far closer to being able to say this, but even then it is not categorical fact, only probability. So, even more simply: never, ever say that something definitely happened because of something else.
Here are a few classic examples:
- Visitors who use site search convert at a higher rate than those who don’t, therefore having site search causes higher conversion. NO. How do you know that visitors who were more likely to convert anyway are not simply more likely to use site search? This is hugely plausible, because if someone knows what they want they will want to get to it faster. Split testing the removal of the site search function will get you a closer estimate of its probable effect on conversion.
- Mobile visitors convert at a lower rate than desktop visitors, therefore there is something wrong with the mobile experience (it is causing lower conversion). NO. Mobile visitors might literally be crossing a road whilst trying to buy from you; they are a different type of visitor altogether. There is no way to prove the claim either way, because you cannot create a controlled situation in which the same person could be alternately served desktop vs. mobile.
- Visitors who log in are more valuable to us, therefore logging in causes more loyal behaviour and we need to get more people to log in. NO. Customers who log in could simply be more loyal to you in the first place, which causes them to log in.
- The conversion rate for visitors who land on the homepage is worse than for visitors who land on a product category page, therefore the homepage is not as effective at driving/causing conversion. NO. Visitors who land on the homepage may simply have less pre-existing intent before landing, because they have likely not searched for something specific.
The point is that correlations might be causally connected in the way you think they are, but they are more likely to be causally connected to something you don’t have any data on and therefore can’t see.
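To see how an unseen factor can manufacture a convincing correlation, here is a minimal simulation sketch of the site search example. Everything in it is an assumption made for illustration: the `simulate` function, the hidden `high_intent` flag and all of the probabilities are invented, not taken from any real site. The key detail is that conversion is deliberately made independent of search use, so any gap between the two groups comes entirely from intent.

```python
import random


def simulate(n_visitors=100_000, seed=42):
    """Simulate visitors whose hidden intent drives both search use and conversion.

    Returns a dict mapping used_search (True/False) to conversion rate.
    All probabilities below are made up for illustration.
    """
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # used_search -> [visitors, conversions]

    for _ in range(n_visitors):
        # Hidden factor: never recorded in the analytics data.
        high_intent = rng.random() < 0.3

        # Intent makes a visitor more likely to use site search...
        p_search = 0.6 if high_intent else 0.1
        # ...and intent alone decides how likely they are to convert.
        p_convert = 0.10 if high_intent else 0.01

        used_search = rng.random() < p_search
        converted = rng.random() < p_convert  # does NOT depend on used_search

        counts[used_search][0] += 1
        counts[used_search][1] += converted

    return {group: conv / total for group, (total, conv) in counts.items()}


rates = simulate()
print(f"searchers: {rates[True]:.2%}, non-searchers: {rates[False]:.2%}")
```

With these invented probabilities the expected conversion rates work out at roughly 7.5% for searchers and 2.4% for non-searchers, even though search has no effect on conversion at all in the simulation. An analyst who could only see the search and conversion columns would have no way of telling this apart from a world in which search really does drive conversion.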
How to avoid falling foul of this:
Follow these simple principles:
- Hypotheses for experimentation are the only valid output of data analytics. The only categorical fact in data analysis is the data itself, not what you think it is telling you. ‘Use of site search is correlated with higher conversion’ is a categorical fact of the data – what you think that means about customer behaviour is not.
- Verbal communication is a fundamental part of data analysis – data, in and of itself, is not understandable by humans. We make sense of that data by explaining it in real language. The semantics and sentence structures used to form judgements based on data are as important as the data itself. Focus on this.
- A/B testing is about high confidence in probability – most people respond to the inherent lack of causal fact in data by saying that A/B testing is the only way to prove the cause. Never speak of A/B testing as categorical proof – it is only a statistical representation of probability.
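As a rough illustration of that last principle, the sketch below runs a two-proportion z-test on a made-up A/B result. It is one common way of summarising a split test, not the only one, and it assumes samples large enough for the normal approximation to hold. The function name `two_proportion_test` and the visitor and conversion counts are hypothetical.

```python
import math


def two_proportion_test(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test (normal approximation).

    Returns (difference in conversion rate, p-value, 95% confidence interval).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_null = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_null

    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))

    # Unpooled standard error for the interval around the observed difference.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    ci = (diff - 1.96 * se, diff + 1.96 * se)
    return diff, p_value, ci


# Hypothetical counts: control (A) vs. variant (B), 10,000 visitors each.
diff, p_value, ci = two_proportion_test(conv_a=480, n_a=10_000, conv_b=540, n_b=10_000)
print(f"uplift: {diff:.4f}, p-value: {p_value:.3f}, 95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```

For these invented numbers the p-value lands a little above 0.05 and the interval only just includes zero. Nothing in the output is a statement of fact about the cause; it is a statement about how surprising the observed difference would be if there were no real effect, which is exactly why the result should be described in terms of probability rather than proof.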
Enjoy the freedom of creative hypothesising:
The real tragedy of thinking that data analysis can prove causal fact is not that it leads to incorrect assumptions (although this is what happens), it’s that it takes the fun out of hypothesis generation. If you believe that you will find behavioural causal fact in data, you are not using the creative part of your mind.
Data is only data – what it is telling you and what it means is the subjective story that you create for yourself about why the data is the way it is. Purist scientists often have a hard time accepting this but it is the truth. When you also consider that all you are doing is generating hypotheses for experimentation, then you can embrace that creativity.