How to Choose Between Pearson and Spearman Correlation?

1. Introduction

In this tutorial, we’ll review two concepts: Pearson and Spearman Correlation. We’ll also discuss how to choose between the two.

2. Correlations

Before we delve into the intricacies of Pearson and Spearman correlations, let’s first define correlation. Correlation refers to how variables are related and quantifies the degree to which they move together. The relationship between variables can be linear, where a change in one variable is associated with a proportional change in the other. Alternatively, it can be non-linear, where the variables are still related, but a change in one doesn’t correspond to a change in the other at a constant rate.

For example, suppose we observe that the amount of money a person spends in winter increases with their age. We can then say that these two variables are positively correlated.

3. Pearson Correlation

Pearson correlation measures the strength and direction of the linear correlation between two variables. It is calculated as the ratio of the covariance of the two variables to the product of their standard deviations. Suppose we have variables $X$ and $Y$. Their Pearson correlation is calculated using this formula:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}$$

Here, $\operatorname{cov}(X,Y)$ is the covariance between $X$ and $Y$, and $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$, respectively. The value of the Pearson correlation between two variables lies in the range $[-1, 1]$, where values close to $1$ indicate a strong, positive correlation. This means that when $X$ increases, $Y$ increases as well. In contrast, values close to $-1$ indicate a strong, negative correlation: when $X$ decreases, $Y$ increases. Finally, a correlation close to $0$ means that no linear correlation exists.
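To make the formula concrete, here is a minimal sketch in plain Python that computes the covariance and the two standard deviations and then takes their ratio. The function name `pearson` and the sample data are our own illustrative choices, not part of any library:

```python
import math


def pearson(xs, ys):
    """Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Population covariance and standard deviations. The 1/n factors
    # cancel in the ratio, but we keep them to mirror the formula.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)

    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sigma_x * sigma_y)


# Hypothetical data, chosen only to illustrate the two extremes.
xs = [1, 2, 3, 4, 5]
print(pearson(xs, [2 * x + 1 for x in xs]))  # increasing line: approximately 1
print(pearson(xs, [-3 * x for x in xs]))     # decreasing line: approximately -1
```

Because of floating-point rounding, the printed values may differ from $1$ and $-1$ in the last decimal places.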

4. Spearman Correlation

Spearman correlation measures the strength and direction of the monotonic correlation between two variables by considering the ranks of their values. In a monotonic relationship, as one variable increases, the other consistently increases or consistently decreases, though not necessarily at a constant rate. Spearman correlation is computed like Pearson’s correlation, but on the ordinal ranks of the variables rather than on their raw values:

$$r_s = \rho_{\text{R}(X),\text{R}(Y)} = \frac{\operatorname{cov}(\text{R}(X),\text{R}(Y))}{\sigma_{\text{R}(X)} \sigma_{\text{R}(Y)}}$$

Here, $\text{R}(X)$ and $\text{R}(Y)$ are the ranks of the variables $X$ and $Y$. Like the Pearson correlation, the Spearman correlation between two variables lies in the range $[-1, 1]$, where $-1$ indicates a negative monotonic correlation. This means that a decrease in $X$ goes with an increase in $Y$. In contrast, $1$ indicates a strong, positive correlation, while $0$ indicates no monotonic relationship.
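As a rough sketch of the rank-based computation, the snippet below replaces each value with its rank and then applies the Pearson formula to the ranks. The helper names (`average_ranks`, `pearson`, `spearman`) and the data are illustrative assumptions. Giving tied values the average of their positions is a common convention that we adopt here, not something the formula itself prescribes:

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; assumes neither variable is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: Y grows exponentially with X, so the relationship
# is monotonic but not linear.
xs = [1, 2, 3, 4, 5]
ys = [math.exp(x) for x in xs]
print(spearman(xs, ys))  # 1.0, because the ranks agree perfectly
print(pearson(xs, ys))   # below 1, because the points don't lie on a line
```

On this data the ranks of $X$ and $Y$ are identical, so the Spearman correlation reaches its maximum even though the Pearson correlation does not.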

5. An Example

Suppose we have two variables, $X$ and $Y$, where $X$ is the age of the participants and $Y$ is the amount of money spent:

Amount spent ($Y$): 500, 780, 200, 300

Computing the Pearson correlation over these two variables gives:

$$\rho_{X,Y} = 0.20$$

The Spearman correlation, which uses the ordering of the data points as ranks, gives:

$$r_s = \rho_{\text{R}(X),\text{R}(Y)} = 0.73$$

This computation suggests that there is only a weak linear correlation between $X$ and $Y$. However, a fairly strong positive monotonic relationship exists between the two variables.

6. Which One Should You Choose?

The choice between Pearson and Spearman correlation depends on the characteristics of the data and the task at hand. For instance, choose:

Pearson’s correlation when the relationship between the variables is linear
Spearman’s correlation when the relationship is monotonic but not linear, or when the data is ranked (ordinal)

7. Conclusions

In this article, we provided an overview of Pearson and Spearman Correlation.

Pearson correlation quantifies linear relationships, while Spearman correlation measures the degree of monotonic correlation between variables. Which one to choose depends on the data’s characteristics and the goal of the analysis.
