1. Introduction
In this tutorial, we’ll review two concepts: Pearson and Spearman Correlation. We’ll also discuss how to choose between the two.
2. Correlations
Before we delve into the intricacies of Pearson and Spearman correlations, let’s first define correlation. Correlation describes how variables are related by quantifying the degree to which they move together. The relationship can be linear, where a change in one variable is accompanied by a proportional change in the other. Alternatively, it can be non-linear, where the variables are still related but the change in one is not proportional to the change in the other.
For example, suppose we observe that the amount of money spent in winter increases with the age of a person. We can then say that these two variables are correlated, and that the correlation is positive because spending rises as age rises.
3. Pearson Correlation
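Pearson correlation measures the strength and direction of the linear relationship between two variables. It is calculated as the ratio of the covariance of the two variables to the product of their standard deviations. Suppose we have variables $X$ and $Y$; the Pearson correlation is calculated using this formula:

$$r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

Here, $\operatorname{cov}(X, Y)$ is the covariance between $X$ and $Y$, $\sigma_X$ and $\sigma_Y$ are their standard deviations, and $\bar{x}$ and $\bar{y}$ are the means of $X$ and $Y$ respectively. The value of the Pearson correlation between two variables is bounded in $[-1, 1]$, where $1$ indicates a strong, positive correlation. This means that when $X$ increases, $Y$ increases. In contrast, $-1$ indicates a strong, negative correlation; this means that when $X$ decreases, $Y$ increases. Finally, a correlation close to $0$ means that no linear correlation exists.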
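To make the formula concrete, here is a minimal sketch of the computation in plain Python. The function name `pearson` and the sample lists are our own illustrative choices, not part of any library.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (n times the covariance).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (n times each variance).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    # The factors of n cancel between numerator and denominator.
    return sxy / math.sqrt(sxx * syy)

# A perfectly linear increasing relationship, then a perfectly linear decreasing one.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))
```

The two calls sit at the extremes of the range: the first relationship is perfectly linear and increasing, the second perfectly linear and decreasing.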
4. Spearman Correlation
Spearman correlation measures the strength and direction of monotonic relationships between variables by considering the ranking of these variables. A monotonic relationship is one where, as one variable increases, the other consistently moves in a single direction (it either never decreases or never increases), though not necessarily at a constant rate. It is computed similarly to Pearson’s correlation but uses the ordinal ranks of the variables instead of their raw values:

$$r_s = \frac{\operatorname{cov}(R(X), R(Y))}{\sigma_{R(X)} \, \sigma_{R(Y)}}$$

Here, $R(X)$ and $R(Y)$ are the ranks of variables $X$ and $Y$. Similarly, the Spearman correlation between two variables is bounded in $[-1, 1]$, where $-1$ indicates a strong, negative monotonic correlation. This means that a decrease in $X$ increases $Y$. In contrast, $1$ indicates a strong, positive monotonic correlation, while $0$ indicates no monotonic relationship.
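As a rough sketch of the rank-based computation, the snippet below first converts values to ranks and then applies the Pearson formula to those ranks. The helper names `average_ranks` and `spearman` are hypothetical, and giving tied values the average of their positions is a common convention that we assume here; the text above doesn't specify how ties are handled.

```python
import math

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation: the Pearson formula applied to ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_rx, mean_ry = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_rx) * (b - mean_ry) for a, b in zip(rx, ry))
    sxx = sum((a - mean_rx) ** 2 for a in rx)
    syy = sum((b - mean_ry) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

print(average_ranks([10, 20, 20, 30]))                     # the two 20s share rank 2.5
print(spearman([1, 2, 3, 4, 5], [50, 40, 30, 20, 10]))     # strictly decreasing
```

Because only the ordering matters, rescaling either variable by any increasing function leaves the result unchanged.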
5. An Example
Suppose we have two variables, $X$ and $Y$, where $X$ is the age of participants and $Y$ is the amount of money spent:

| X  | Y   |
|----|-----|
| 26 | 500 |
| 56 | 780 |
| 78 | 200 |
| 18 | 20  |
| 50 | 300 |
Computing the Pearson correlation over these two variables gives $r \approx 0.20$.
The Spearman correlation, computed on the ranks of the data points, gives $r_s = 0.30$. This computation suggests that there is only a weak linear correlation between $X$ and $Y$. The positive monotonic relationship between the two variables is somewhat stronger, although with only five data points both values should be read with caution.
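We can check these numbers with a short script. This is only a sketch: the variable names are ours, and for Spearman we use the well-known rank-difference shortcut $1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$, which is equivalent to the rank-based formula only when there are no ties (as is the case here).

```python
import statistics

ages = [26, 56, 78, 18, 50]           # X
spending = [500, 780, 200, 20, 300]   # Y
n = len(ages)

# Pearson: population covariance over the product of population standard deviations.
mean_x = statistics.mean(ages)
mean_y = statistics.mean(spending)
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, spending)) / n
r = cov / (statistics.pstdev(ages) * statistics.pstdev(spending))

# Spearman: with no ties, a value's rank is its 1-based position in sorted order.
rank_x = [sorted(ages).index(x) + 1 for x in ages]
rank_y = [sorted(spending).index(y) + 1 for y in spending]
d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
r_s = 1 - 6 * d_squared / (n * (n ** 2 - 1))

print(rank_x, rank_y)                # [2, 4, 5, 1, 3] [4, 5, 2, 1, 3]
print(round(r, 3), round(r_s, 3))    # 0.198 0.3
```

The rank differences are $-2, -1, 3, 0, 0$, so $\sum d_i^2 = 14$ and the shortcut gives $1 - 84/120 = 0.3$.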
6. Which One Should You Choose?
The choice between Pearson and Spearman correlation depends on the characteristics of the data and the task at hand. For instance, choose:
- Pearson’s correlation for linear relationships, Spearman’s correlation for monotonic relationships that may not be linear
- Spearman’s correlation for ranked (ordinal) data, Pearson’s correlation for continuous data
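To illustrate the first guideline, the sketch below compares the two coefficients on data that is monotonic but not linear ($y = x^3$). The helper names are hypothetical, and the simple ranking helper assumes there are no tied values.

```python
import math

def pearson(xs, ys):
    """Pearson correlation from sums of deviations around the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks_no_ties(values):
    """1-based ranks; assumes all values are distinct."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

xs = list(range(1, 11))
ys = [x ** 3 for x in xs]  # increasing, but not at a constant rate

linear = pearson(xs, ys)                                   # Pearson on raw values
monotonic = pearson(ranks_no_ties(xs), ranks_no_ties(ys))  # Spearman: Pearson on ranks

print(f"Pearson:  {linear:.3f}")
print(f"Spearman: {monotonic:.3f}")
```

Pearson falls short of 1 because the points don't lie on a straight line, while Spearman reaches 1 because the ordering of the two variables agrees perfectly.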
7. Conclusions
In this article, we provided an overview of Pearson and Spearman Correlation.
Pearson correlation quantifies linear relationships, while Spearman correlation measures the degree of monotonic correlation between variables. Which one to choose depends on the data’s characteristics and the goal of the analysis.