ξ correlation: A new tool for your causality checks

correlation
xi correlation
causality
Author

Michael Green

Published

September 20, 2024

Modified

April 22, 2024

Introduction

There’s a new candidate for checking for relationships between two variables out on the streets. It’s been around since 2020, but I only just heard about it. It’s marketed as a drop-in replacement for the Pearson correlation coefficient (Pearson 1895). It’s called the ξ correlation and was introduced in Chatterjee (2021). In this post I’ll run through the strengths and weaknesses of this new correlation coefficient from my point of view. Finally, as the title suggests, I’ll spend some time looking into how ξ correlation can be used for causal discovery. Though I will not touch upon them in this post, you can look at Additive Noise Models (Kap, Aleksandrova, and Engel 2021), the standard workhorse in the field of causal discovery, for more context.

This is also a follow-up to Sumner’s (2024) introduction to the new metric. I recommend reading it before mine, as he goes into more depth and intuition behind the math of the ξ correlation.

Some background

The Pearson correlation coefficient has been, and continues to be, a versatile tool in every data scientist’s analytical toolbox. It is primarily used to indicate relationships between variables of interest. Its strength lies in its simple equation and its low computational cost.

\[\text{COR}(x, y)=\frac{\mathbb{E}\left[(x-\mu_x)(y-\mu_y)\right]}{\sigma_x \sigma_y}\]

In this equation \(\mathbb{E}\) refers to the expectation operator, \(\sigma\) is the standard deviation of the respective variable and \(\mu\) is its mean.

To visualize this, let’s say you have two vectors \(x\) and \(y\), holding three samples each from the variables \(x\) and \(y\).

(X = [-1.0, 1.0, 0.5], Y = [1.0, -1.0, 0.9])

In Figure 1 you can see a scatterplot of x vs y. Since we only have three data points in each variable it won’t be a very impressive scatterplot, but you’ll see in a minute why I chose only three.

Figure 1: Visualization of interpretation of correlation as vector similarity.

In the illustration above you can see three plots. The first plot is a scatterplot of y against x. The second plot is a vector space plot where dimension 1 corresponds to the first element of x and y respectively, while dimension 2 refers to the second element. A good way to think of this is that each new sample from the variables is a new dimension; three samples, as we have here, equals three dimensions. The last plot is also a vector space, but where we have subtracted the average of x and y respectively.

Now, what is the correlation between x and y? Let’s apply the math from above and do the calculation. We end up with -0.72, which indicates a negative correlation. OK, but what does this have to do with the dimensions, you ask? Well, it turns out that there’s an inherent link between the correlation coefficient and the angle between the vectors.

The correlation coefficient is equivalent to the dot product of the centered vectors x and y divided by the product of their norms.

We can express this mathematically as

\[\text{cor}(\mathbf{x},\mathbf{y}) = \frac{\mathbf{x}_c \cdot \mathbf{y}_c}{|\mathbf{x}_c ||\mathbf{y}_c |} = \cos \theta\]

where \(\mathbf{x}_c = \mathbf{x} - \mu_x\) and \(\theta\) is the angle between the centered vectors \(\mathbf{x}_c\) and \(\mathbf{y}_c\). This gives us a nice geometrical interpretation of the correlation coefficient, which I personally find extremely satisfying and simple to think about. It also makes intuitive sense: imagine that the two vectors were pointing in the same direction such that the angle \(\theta\) is close to 0. This would mean that \(\cos\theta\approx\cos 0=1\), which is also what the correlation coefficient would be, since the only way the vectors can point in the same direction is if the numbers are moving in similar ways. This is irrespective of any offset between the numbers.
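As a quick sanity check, here is a minimal sketch (using only the standard-library packages LinearAlgebra and Statistics) verifying this identity on the three-point example from above.

using LinearAlgebra, Statistics

x = [-1.0, 1.0, 0.5]
y = [1.0, -1.0, 0.9]

# Center the vectors by subtracting their means
xc = x .- mean(x)
yc = y .- mean(y)

# The cosine of the angle between the centered vectors...
cosθ = dot(xc, yc) / (norm(xc) * norm(yc))

# ...matches the Pearson correlation coefficient, ≈ -0.72
cosθ ≈ cor(x, y)  # true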

ξ correlation

Needless to say, correlation is extremely powerful in data analysis since we can readily assess the linear relationship between any two variables. I emphasize linear because any non-linear relationship between the variables will be poorly represented by the correlation coefficient.

Figure 2: Different relationships between variables where the correlation coefficient is approximately 0.

If you take a look at Figure 2 you will quickly agree with me that the variables x and y in each scatterplot have a relationship. In fact they are all simple mathematical relationships, yet they all have an effective correlation coefficient of 0. This means that scanning for relationships between variables with correlation would miss a lot of non-linear relationships. Thus, for all the wonderful abilities we get out of the correlation coefficient, it still fails to capture relationships as simple and common as \(y=\sin(x)\), which is of course an issue when trying to discover unknown relationships. This is one of the reasons a lot of research is being conducted on causal discovery and independence tests between random variables. Josse and Holmes (2016) provide a thorough overview of recent attempts.
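The exact data behind Figure 2 isn’t shown, so here is my own minimal construction of two of the shapes, demonstrating how Pearson correlation comes out as essentially zero despite perfect structure.

using Statistics

# A unit circle sampled at evenly spaced angles: clear structure, yet r ≈ 0
t = range(0, 2π; length=200)
cor(cos.(t), sin.(t))  # ≈ 0

# A sine wave sampled over many periods: again r ≈ 0
x = range(0, 100π; length=2000)
cor(x, sin.(x))  # ≈ 0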

Chatterjee (2021) introduced ξₙ, a new rank-based correlation coefficient, which has nice properties if the samples of the two continuous variables \(X\) and \(Y\) are i.i.d.

\[\xi_n(X,Y)=1-\frac{3\sum_{i=1}^{n-1}|R_{i+1}-R_i|}{n^2-1}\]

where the pairs are first sorted such that \(X_{(1)}\leq\cdots\leq X_{(n)}\), and \(R_i=\sum_{k=1}^{n}\delta_{Y_{(k)} \leq Y_{(i)}}\) is the rank of \(Y_{(i)}\), i.e., the number of \(Y\) values smaller than or equal to it.
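To make the definition concrete, here is a minimal naive transcription of the formula (my own illustration, not the implementation used later; it assumes no ties in \(Y\)), applied to the three-point example from earlier.

# A direct, naive transcription of the no-ties definition
function xi_naive(X, Y)
    n = length(X)
    Ysorted = Y[sortperm(X)]                       # order the pairs by X
    R = [sum(Ysorted .<= Ysorted[i]) for i in 1:n] # rank of each Y value
    return 1 - 3 * sum(abs.(diff(R))) / (n^2 - 1)
end

xi_naive([-1.0, 1.0, 0.5], [1.0, -1.0, 0.9])  # 0.25; Y ranks sorted by X are [3, 2, 1]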

So it is natural to ask: if all the plots in Figure 2 have a correlation of 0, what does the ξ correlation yield? In this post I follow the implementation of Taelman (2023). The function looks like this in Julia code.

using StatsBase # provides the `denserank` ranking function

function xicor(X::AbstractVector, Y::AbstractVector; rank=denserank, noties=false)
    n = length(Y)
    # Order the observations by X, then rank the Y values in that order
    xsortperm = sortperm(X)
    r = rank(xsortperm, by=i -> Y[i])
    if !noties
        # Ranks of Y in descending order, used in the tie-corrected denominator
        l = rank(xsortperm, by=i -> Y[i], rev=true)
    end
    # Accumulate the absolute jumps between consecutive Y ranks
    ξn = 0.0
    for i in 1:n-1
        ξn += abs(r[i+1] - r[i])
    end
    # Normalize: 3/(n² - 1) when ties are excluded, the general form otherwise
    ξn *= noties ? 3 / (n^2 - 1) : n / 2sum(li -> li * (n - li), l)
    ξn = 1 - ξn
    return ξn
end

If you’re not that into Julia you can easily port this code to R, Python or whatever language you prefer. Back to our question: what is the ξ correlation compared to the Pearson correlation for our examples in Figure 2? In Table 1 we show the ξ, Kendall, Spearman and Pearson correlation coefficients for each scenario. It’s immediately clear that the ξ correlation is much more informative, and correct, regarding the relationship between x and y in all examples except the Unit Circle and the 4 Gaussians. It’s also strictly different from 0 in the Unit Circle example, but this comes with a problem that we will get back to a little later. For now let’s focus on the expected results.

Table 1: Correlation and ξ correlation comparison for the examples in Figure 2. Since ξₙ is not symmetric we denote ξₙ(X,Y) as ξₙxy while ξₙ(Y,X) is denoted ξₙyx. The Pearson, Spearman and Kendall correlation coefficients are denoted by r, ρ and τ respectively.
Data         ξₙxy    ξₙyx    τ       r       ρ
Unit Circle  -0.5    -0.5    -0.02   -0.0    -0.02
Sine          0.8    -0.0    -0.01    0.0     0.0
4 Gaussians  -0.08   -0.05   -0.02   -0.02   -0.03
A Smile       0.97    0.3    -0.07   -0.12   -0.1
Noisy smile   0.45    0.18   -0.08   -0.15   -0.13
3 Gaussians   0.42    0.19   -0.04   -0.02   -0.07

The Sine example has a very strong ξ coefficient of \(0.8\), close to \(1.0\), the maximum value the metric can take. As opposed to the Pearson correlation, the ξ correlation is supposed to lie between 0 and 1 (Chatterjee 2021), where 0 indicates no relationship and 1 indicates a strong relationship. Thus, the Sine example would be classified as a strong relationship using ξ correlation, while the Pearson correlation would find no relationship at all. This points to a larger issue: ξ correlation has no notion of “direction”, i.e., whether the association is positive or negative; it just quantifies the strength of the relationship. But it’s also not a symmetric function, which means that \(\xi_n(X,Y)\neq\xi_n(Y,X)\). So in order to find the “direction” you have to compute both, and the larger value shows you the direction of the relationship. This can be seen in Table 1, where the true Sine relationship is \(Y=\sin(X)\), which you can view as \(X\) causes \(Y\). Since \(\xi_n(X,Y)>\xi_n(Y,X)\), ξ correlation does a perfect job of identifying not only that there is indeed a relationship, but also its causal direction. Thus, in order to detect a relationship between two variables you need to calculate ξₙ twice.
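As a sketch of that workflow, using the xicor function from above on my own noiseless sine data (sampled over several periods, so that the Pearson correlation is near 0):

x = collect(range(0, 20π; length=500))
y = sin.(x)

# Y is a (noiseless) function of X: strong relationship in this direction
xicor(x, y)  # high, close to 1

# X is not a function of Y, so the other direction is much weaker
xicor(y, x)  # close to 0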

Looking at the “A Smile” example, the true relationship is \(Y = X^2\), meaning \(Y\) is a function of \(X\). Again the ξ correlation correctly identifies this direction. The same goes for the “Noisy smile” example.

Focusing on the 3 Gaussians, we can see that the relationship identified by ξ correlation is indeed non-zero and fairly robust, while Pearson correlation is much closer to 0, again showing the capability of ξ correlation to detect relationships between two variables. The same pattern holds across the Sine, A Smile, Noisy smile and 3 Gaussians examples: in all of these cases Pearson correlation indicates a weak to non-existent relationship while ξ correlation clearly shows that a relationship exists.

But what about the Unit Circle and the 4 Gaussians? Well, that’s where the fun stops. While you and I would definitely agree that there’s significant structure in this data, ξ correlation fails to identify it. To be fair, so does Pearson correlation, but for different reasons. Now, these failures are interesting. For the Unit Circle we get a negative \(\xi_n=-0.5\), which we stated above was not possible since ξ correlation is supposed to be a scalar between 0 and 1. So what’s going on here? Well, I don’t know, other than that the author of the paper acknowledges that the value can indeed be negative but offers no interpretation of what that would mean for the relationship at hand. Thus, it’s hard to say anything about relationships showing up as negative when using ξ correlation. For me, when I get negative values of ξₙ I tend to mark them as “manually inspect”. Zhang (2024) elaborates a lot more on issues with this new metric.
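For what it’s worth, the negative value is easy to reproduce with the xicor function from above. A minimal sketch with my own circle data (note that the exact value depends on how the circle is sampled; an evenly spaced grid of angles pairs each x with both the upper and lower branch):

# Unit circle on an evenly spaced grid of angles
t = range(0, 2π; length=200)
x, y = cos.(t), sin.(t)

# Sorted by X, consecutive points alternate between the upper and lower
# branch, so the Y ranks jump across a large part of their range each step
xicor(x, y)  # roughly -0.5, as in Table 1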

ξ correlation, almost always, quantifies the strength of the relationship between two variables as a scalar between 0 and 1.

Correlation plots

When it comes to early exploratory data analysis, scatterplots are extremely useful. Often they come in the flavor known as pairplots, where you can see several variables scattered against each other at the same time. It’s common practice to also print the correlation coefficient for each of these scatterplots. We use the PIMA dataset from Venables and Ripley (2002) to illustrate a common pairplot using Kendall correlation in Figure 3. As you can see, we only show the part below the diagonal of the matrix of scatterplots, because correlation is in general symmetric, i.e., \(\text{COR}(X,Y)=\text{COR}(Y,X)\). Thus, there is no information in the top half that is not already shown in the bottom half.

Figure 3: A pair plot of all variables in the PIMA dataset from the R MASS package where the Kendall correlation coefficient is given.

This is of course no longer true for ξ correlation. Therefore we actually want to show both halves of the matrix in a correlation plot when using the ξₙ coefficient to evaluate potential relationships. In Figure 4, look especially at the two variables Ped and Age, where the ξₙ coefficient is \(0.1\) in one direction and effectively \(0\) in the other.

Figure 4: A pair plot of all variables in the PIMA dataset from the R MASS package where the ξₙ correlation coefficient is shown.
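To build such a plot you need the full, non-symmetric matrix of ξₙ coefficients. A minimal sketch, with placeholder random data standing in for the actual PIMA columns:

# Placeholder data: 100 observations of 4 variables (stand-ins for PIMA columns)
data = randn(100, 4)
p = size(data, 2)

# Entry (i, j) holds ξₙ(column i, column j); in general ξmat ≠ ξmat'
ξmat = [xicor(data[:, i], data[:, j]) for i in 1:p, j in 1:p]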

Conclusion

I have introduced the new correlation coefficient ξₙ and shown that it indeed captures a wider range of relationships than the traditional correlation coefficients of Spearman, Kendall and Pearson. I have also illustrated some issues unique to this new metric which are not present in the traditional coefficients. So while ξₙ is definitely useful, it’s not a silver bullet by any means when looking for relationships between variables. We also clearly saw that you need to calculate both directions, i.e., \(\xi_n(X,Y)\) and \(\xi_n(Y,X)\), before concluding anything about the potential relationship between the two variables X and Y.

References

Chatterjee, Sourav. 2021. “A New Coefficient of Correlation.” Journal of the American Statistical Association 116 (536): 2009–22. https://doi.org/10.1080/01621459.2020.1758115.
Josse, Julie, and Susan Holmes. 2016. “Measuring Multivariate Association and Beyond.” Statistics Surveys 10: 132–67. https://doi.org/10.1214/16-SS116.
Kap, Benjamin, Marharyta Aleksandrova, and Thomas Engel. 2021. “Causal Identification with Additive Noise Models: Quantifying the Effect of Noise.” https://arxiv.org/abs/2110.08087.
Pearson, Karl. 1895. “Note on Regression and Inheritance in the Case of Two Parents.” Proceedings of the Royal Society of London Series I 58 (January): 240–42.
Sumner, Tim. 2024. “A New Coefficient of Correlation.” Medium. https://towardsdatascience.com/a-new-coefficient-of-correlation-64ae4f260310.
Taelman, Steff. 2023. “Xicor.jl.” GitHub repository. https://github.com/stefftaelman/Xicor.jl.
Venables, W. N., and B. D. Ripley. 2002. Modern Applied Statistics with S. Fourth edition. New York: Springer. https://www.stats.ox.ac.uk/pub/MASS4/.
Zhang, Qingyang. 2024. “On Relationships Between Chatterjee’s and Spearman’s Correlation Coefficients.” Communications in Statistics - Theory and Methods 0 (0): 1–0. https://doi.org/10.1080/03610926.2024.2309971.