1. Introduction
In this tutorial, we’ll delve into the intricacies of Linear Discriminant Analysis (LDA).
2. Dimensionality Reduction Techniques
Dimensionality reduction simplifies datasets by reducing the number of dimensions, and its techniques fall into unsupervised and supervised categories.
Unsupervised methods like principal component analysis (PCA) and independent component analysis (ICA) don’t require class labels, offering versatility. Supervised approaches like mixture discriminant analysis (MDA), neural networks (NN), and linear discriminant analysis (LDA) integrate class labels. Our focus is on LDA.
3. What Is Linear Discriminant Analysis (LDA)?
LDA is a powerful dimensionality reduction technique. It seeks to transform our data into a lower-dimensional space, enhancing class separability while minimizing within-class variance:
We observe three data clusters: class A, class B, and class C. Projecting onto dimensional space 1 reduces the within-class variance but destroys the between-class separation of classes A and B, which overlap after the projection. Dimensional space 2 keeps the classes separated while still keeping the within-class variance small. Thus, we discard space 1 and use space 2 for LDA.
3.1. Steps to Performing LDA
In tackling dimensionality reduction with LDA, we divide the process into three main steps.
- First, we maximize class separability by measuring the distance between their means (between-class variance)
- Second, we minimize the spread of the data points within each class around their class mean (within-class variance)
- Lastly, we use these parameters to construct an appropriate lower-dimensional space
The figure below illustrates these steps visually:
This newly constructed space simplifies our data and strengthens its power to distinguish between classes.
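To make these three steps concrete, here is a minimal sketch that uses LDA purely as a dimensionality reduction step, assuming scikit-learn is available; the toy data and the choice of a single component are illustrative and not part of any example in this article:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Illustrative toy data: six samples, two features, two classes
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 7.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# With two classes, LDA yields at most one discriminant direction
lda = LinearDiscriminantAnalysis(n_components=1)
Y = lda.fit_transform(X, y)
print(Y)  # each sample reduced from two features to one
```

With c classes, LDA can produce at most c - 1 discriminant components, which is why a single component is requested here.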
3.2. Approaches to LDA
Exploring LDA, we encounter class-dependent and class-independent approaches. In class-dependent LDA, each class gets its own lower-dimensional space. Conversely, class-independent LDA computes a single lower-dimensional space that all classes share.
4. Mathematical Foundations of LDA
In this section, we discuss the mathematical process for performing LDA.
4.1. Calculating the Between-class Variance
To calculate the between-class variance $S_B$, we follow these steps:

We first calculate the separation distance between the projected class mean and the projected total mean as follows:

$(m_i - m)^2 = (W^T\mu_i - W^T\mu)^2 = W^T(\mu_i - \mu)(\mu_i - \mu)^T W$    (1)

where $m_i$ represents the projection of the mean of the $i$-th class, which we calculate as follows:

$m_i = W^T\mu_i$.

$m$ is the projection of the total mean of all classes, which we calculate as follows:

$m = W^T\mu$,

where $W$ represents the transformation matrix of LDA, and $\mu_i$ represents the mean of the $i$-th class $\omega_i$, which we calculate as follows:

$\mu_i = \frac{1}{n_i}\sum_{x \in \omega_i} x$,

with $n_i$ the number of samples in $\omega_i$. $\mu$ is the total mean of all classes, which we calculate as follows:

$\mu = \frac{1}{N}\sum_{i=1}^{c} n_i \mu_i$,

where $c$ is the total number of classes and $N$ is the total number of samples.

The term $(\mu_i - \mu)(\mu_i - \mu)^T$ in equation 1 represents the between-class variance $S_{B_i}$ of the $i$-th class.

We substitute $S_{B_i}$ into equation 1 as follows:

$(m_i - m)^2 = W^T S_{B_i} W$    (2)

We now obtain the total between-class variance $S_B$ as follows:

$S_B = \sum_{i=1}^{c} n_i S_{B_i} = \sum_{i=1}^{c} n_i (\mu_i - \mu)(\mu_i - \mu)^T$    (3)
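As a concrete illustration of equations 1 to 3, the following sketch computes the class means, the total mean, and the total between-class scatter $S_B$ with NumPy; the function name and the toy data are assumptions made for this sketch:

```python
import numpy as np

def between_class_scatter(X, y):
    """Compute S_B = sum_i n_i (mu_i - mu)(mu_i - mu)^T (equation 3)."""
    M = X.shape[1]
    mu = X.mean(axis=0)                      # total mean of all samples
    S_B = np.zeros((M, M))
    for c in np.unique(y):
        X_c = X[y == c]
        mu_c = X_c.mean(axis=0)              # mean of the i-th class
        diff = (mu_c - mu).reshape(-1, 1)    # column vector (mu_i - mu)
        S_B += X_c.shape[0] * diff @ diff.T  # weight by the class size n_i
    return S_B

# Illustrative usage on tiny two-class data
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 7.0]])
y = np.array([0, 0, 0, 1, 1, 1])
print(between_class_scatter(X, y))
```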
4.2. Calculating the Within-class Variance
To calculate the within-class variance $S_W$, we take the following steps:

The within-class variance $S_{W_i}$ of the $i$-th class captures the difference between the samples of that class and their mean $\mu_i$.

We then search for a lower-dimensional space that minimizes the difference between the projected mean $m_i$ and the projected samples $W^T x_j$ of each class $\omega_i$, which is the within-class variance.

We capture these steps with the following equations:

$\sum_{x_j \in \omega_i} (W^T x_j - m_i)^2 = \sum_{x_j \in \omega_i} (W^T x_j - W^T \mu_i)^2 = \sum_{x_j \in \omega_i} W^T (x_j - \mu_i)(x_j - \mu_i)^T W$    (4)

From equation 4, we calculate the within-class variance $S_{W_i}$ as follows:

$S_{W_i} = \sum_{x_j \in \omega_i} (x_j - \mu_i)(x_j - \mu_i)^T = d_i^T d_i$,

where $x_j$ represents the $j$-th sample in the $i$-th class and $d_i$ is the centred data of the $i$-th class, i.e. the matrix whose rows are the differences $x_j - \mu_i$ for $x_j \in \omega_i$.

To calculate the total within-class variance $S_W$ of our data, we sum up the within-class variances of all classes present in our data as follows:

$S_W = \sum_{i=1}^{c} S_{W_i} = S_{W_1} + S_{W_2} + \dots + S_{W_c}$.
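A matching sketch for equation 4 and the sum above: the centred data $d_i$ of each class is formed by subtracting the class mean, and $S_{W_i} = d_i^T d_i$ is accumulated into the total within-class scatter (again, names and data are illustrative):

```python
import numpy as np

def within_class_scatter(X, y):
    """Compute S_W = sum_i S_Wi, where S_Wi = d_i^T d_i and
    d_i is the data of class i centred around its class mean."""
    M = X.shape[1]
    S_W = np.zeros((M, M))
    for c in np.unique(y):
        X_c = X[y == c]
        d_c = X_c - X_c.mean(axis=0)   # centred data d_i of the i-th class
        S_W += d_c.T @ d_c             # S_Wi = d_i^T d_i
    return S_W

# Illustrative usage on tiny two-class data
X = np.array([[1.0, 2.0], [2.0, 3.0], [6.0, 5.0], [7.0, 8.0]])
y = np.array([0, 0, 1, 1])
print(within_class_scatter(X, y))
```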
4.3. Constructing the Lower Dimensional Space: Fisher’s Criterion
To transform the higher-dimensional features of our data into a lower-dimensional space, we maximize the ratio of $S_B$ to $S_W$ to guarantee maximum class separability by using Fisher's criterion $J(W)$, which we write this way:

$\arg\max_W J(W) = \arg\max_W \dfrac{W^T S_B W}{W^T S_W W}$    (5)

To calculate the transformation matrix $W$, we transform equation 5 into equation 6 as follows:

$S_B W = \lambda S_W W$    (6)

where $\lambda$ represents the eigenvalues of the transformation matrix $W$.

Next, we transform equation 6 into equation 7 as follows:

$S_W^{-1} S_B W = \lambda W$    (7)

We can then calculate the eigenvalues $\lambda$ and their corresponding eigenvectors of $S_W^{-1} S_B$ given in equation 7.
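Assuming $S_W$ is invertible (in practice it may need a pseudo-inverse or regularisation when it is singular), a minimal NumPy sketch of equation 7 looks as follows; the scatter matrices here are illustrative placeholders:

```python
import numpy as np

# Illustrative scatter matrices; in practice they come from the data
S_W = np.array([[13.2, 7.0], [7.0, 11.3]])
S_B = np.array([[5.4, -2.2], [-2.2, 6.8]])

# Equation 7: eigen-decomposition of S_W^{-1} S_B
W = np.linalg.inv(S_W) @ S_B           # np.linalg.pinv(S_W) if S_W is singular
eigvals, eigvecs = np.linalg.eig(W)    # columns of eigvecs are the eigenvectors
print(eigvals)
```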
4.4. The Role of the Eigenvalues and Eigenvectors
In LDA, the eigenvectors determine the directions of the new space, while the eigenvalues indicate their magnitude, i.e. their discriminative power. We select the $k$ eigenvectors with the highest eigenvalues to form the lower-dimensional space $V_k$, neglecting the others. This reduces our original data matrix $X$ from dimension $M$ to dimension $k$, where $k < M$. We write the data in this new $k$-dimensional space as:

$Y = X V_k$    (8)
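Continuing from the eigen-decomposition, the following sketch sorts the eigenvectors by eigenvalue, keeps the top $k$ to form $V_k$, and applies equation 8 as a single matrix product; the data and eigenpairs are illustrative:

```python
import numpy as np

def project_lda(X, eigvals, eigvecs, k):
    """Keep the k eigenvectors with the largest eigenvalues (V_k)
    and project the data: Y = X @ V_k (equation 8)."""
    order = np.argsort(eigvals.real)[::-1]   # eigenvalue indices, largest first
    V_k = eigvecs[:, order[:k]].real         # M x k matrix of selected eigenvectors
    return X @ V_k                           # N x k projected data

# Illustrative 2-D data and eigenpairs, reduced to one dimension
X = np.array([[1.0, 2.0], [2.0, 3.0], [6.0, 5.0], [7.0, 8.0]])
eigvals = np.array([0.9, 2.4])
eigvecs = np.array([[0.8, 0.6], [-0.6, 0.8]])
Y = project_lda(X, eigvals, eigvecs, k=1)
print(Y.shape)  # (4, 1)
```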
5. Class-independent and Class-dependent LDA Approach
Here, we describe the slight difference between the class-independent and class-dependent LDA approaches and also provide the algorithms for both.
5.1. Class-independent LDA Approach
In the class-independent method, we use only one lower-dimensional space for all classes. Thus, we calculate a single transformation matrix $W$ for all classes, and we project the samples of all classes onto the selected eigenvectors.
Here is the algorithm for the class-independent LDA approach:
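As a stand-in for that algorithm, here is a minimal Python sketch of the class-independent procedure assembled from the formulas in section 4: one shared pair of scatter matrices, one eigen-decomposition, and one projection for all classes. The function and variable names are assumptions of this sketch:

```python
import numpy as np

def lda_class_independent(X, y, k):
    """Class-independent LDA: one lower-dimensional space shared by all classes.

    X: (N, M) data matrix, y: (N,) class labels, k: target dimension.
    Returns the (N, k) projected data and the (M, k) selected eigenvectors.
    """
    M = X.shape[1]
    mu = X.mean(axis=0)                        # total mean
    S_B = np.zeros((M, M))
    S_W = np.zeros((M, M))
    for c in np.unique(y):
        X_c = X[y == c]
        mu_c = X_c.mean(axis=0)
        diff = (mu_c - mu).reshape(-1, 1)
        S_B += X_c.shape[0] * diff @ diff.T    # between-class scatter
        d_c = X_c - mu_c                       # centred class data
        S_W += d_c.T @ d_c                     # within-class scatter
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]     # largest eigenvalues first
    V_k = eigvecs[:, order[:k]].real           # selected eigenvectors
    return X @ V_k, V_k
```

Using the pseudo-inverse keeps the sketch usable even when $S_W$ is singular, which can happen when classes have few samples relative to the number of features.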
5.2. Class-dependent LDA Approach
In class-dependent LDA, we compute a distinct lower-dimensional space for each class using $W_i = S_{W_i}^{-1} S_B$. We find the eigenvalues and eigenvectors of each $W_i$ separately, then project the samples of each class onto their respective selected eigenvectors.
We provide the algorithm for the class-dependent LDA approach as follows:
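Likewise, a hedged sketch of the class-dependent procedure: the between-class scatter is shared, but each class gets its own $W_i = S_{W_i}^{-1} S_B$, its own eigen-decomposition, and its own projection (names are again illustrative):

```python
import numpy as np

def lda_class_dependent(X, y, k):
    """Class-dependent LDA: a separate lower-dimensional space per class.

    Returns a dict mapping each class label to its (n_i, k) projected samples.
    """
    M = X.shape[1]
    mu = X.mean(axis=0)
    S_B = np.zeros((M, M))
    for c in np.unique(y):
        X_c = X[y == c]
        diff = (X_c.mean(axis=0) - mu).reshape(-1, 1)
        S_B += X_c.shape[0] * diff @ diff.T        # shared between-class scatter
    projected = {}
    for c in np.unique(y):
        X_c = X[y == c]
        d_c = X_c - X_c.mean(axis=0)
        S_Wc = d_c.T @ d_c                         # within-class scatter of class c
        eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_Wc) @ S_B)
        order = np.argsort(eigvals.real)[::-1]
        V_k = eigvecs[:, order[:k]].real           # eigenvectors selected for this class
        projected[c] = X_c @ V_k                   # project class samples onto its own space
    return projected
```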
6. Computational Complexity of LDA
In this section, we provide the computational complexities of the two LDA approaches.
6.1. Computational Complexity for Class-independent LDA
In analyzing the computational complexity of class-independent LDA, we consider the various steps. First, computing the mean of each class has complexity $O(NM)$. Second, calculating the total mean has complexity $O(NM)$. Third, computing the within-class scatter involves $O(NM^2)$. Fourth, calculating the between-class scatter has complexity $O(cM^2)$. Fifth, finding the eigenvalues and eigenvectors involves $O(M^3)$. Sorting the eigenvectors is $O(M \log M)$. Selecting $k$ eigenvectors is $O(Mk)$. Finally, projecting the data onto the selected eigenvectors is $O(NMk)$. Overall, class-independent LDA has complexity $O(NM^2)$ for $N > M$, otherwise $O(M^3)$.
6.2. Computational Complexity for Class-dependent LDA
In analyzing computational complexity, both class-dependent and class-independent LDA follow similar steps. However, class-dependent LDA repeats the eigen-decomposition and projection for each class, increasing the complexity. Overall, class-dependent LDA has complexity $O(cNM^2)$ for $N > M$, otherwise $O(cM^3)$, surpassing class-independent LDA in complexity.
7. Numerical Example
In this section, we present a numerical example that explains how to calculate the LDA space step by step, and how LDA discriminates between two classes of data using the class-independent and class-dependent approaches.
The first class $\omega_1$ is an $n_1 \times M$ matrix $X_1$, and the second class $\omega_2$ is an $n_2 \times M$ matrix $X_2$, where $n_1$ and $n_2$ are the numbers of samples in the two classes. Each sample in both classes has two features (i.e., $M = 2$).
First, we calculate the mean $\mu_1$ of the first class and the mean $\mu_2$ of the second class, and then the total mean $\mu$ of both classes, using the formulas from section 4.1.
Next, we calculate the between-class variance $S_{B_i}$ of each class and the total between-class variance $S_B$ of the given data as follows:

$S_{B_1} = n_1 (\mu_1 - \mu)(\mu_1 - \mu)^T$,

and, similarly,

$S_{B_2} = n_2 (\mu_2 - \mu)(\mu_2 - \mu)^T$,

so that the total between-class variance becomes

$S_B = S_{B_1} + S_{B_2}$.
Next, we prepare to calculate the within-class variance $S_W$. To do this, we perform the mean-centring process by subtracting the mean of each class from each sample in that class: $d_i = x_j - \mu_i$ for every sample $x_j \in \omega_i$, where $d_i$ is the mean-centred data of the class $\omega_i$. This gives the centred matrices $d_1$ and $d_2$.
In the next two sub-sections, we’ll present how the two LDA approaches solve this numerical example.
7.1. Class-independent LDA Approach
In this approach, after the data-centring process, we calculate the within-class variance of each class as follows: $S_{W_i} = d_i^T d_i$. This gives $S_{W_1}$ and $S_{W_2}$, and the total within-class variance becomes

$S_W = S_{W_1} + S_{W_2}$.

Next, we calculate the transformation matrix $W$ as follows:

$W = S_W^{-1} S_B$.
We then calculate the eigenvalues and eigenvectors of $W$. We denote the eigenvalues by $\lambda_1$ and $\lambda_2$ and the corresponding eigenvectors by $v_1$ and $v_2$.

Next, we project $\omega_1$ and $\omega_2$ onto the first column eigenvector $v_1$ and the second column eigenvector $v_2$ of the matrix $W$ to investigate which of the columns is more robust in discriminating the data.
Using $v_1$, we project $\omega_1$ onto the lower-dimensional space as follows:

$Y_1 = X_1 v_1$.

Using the same eigenvector $v_1$, we project $\omega_2$ onto the lower-dimensional space as follows:

$Y_2 = X_2 v_1$.

Next, we project $\omega_1$ and $\omega_2$ onto the eigenvector $v_2$ in the same way:

$Y_1 = X_1 v_2$ and $Y_2 = X_2 v_2$.
From our results, we observe that $v_2$ is more robust in discriminating the classes $\omega_1$ and $\omega_2$ than $v_1$, because its eigenvalue $\lambda_2$ is larger than $\lambda_1$. Hence, we choose $v_2$ to discriminate $\omega_1$ and $\omega_2$ instead of $v_1$.
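Since the idea here is simply to compare the two eigenvectors of $W$, the following sketch shows how this comparison can be carried out on any small two-class dataset: both classes are projected, and the eigenvector with the larger eigenvalue is the one kept. The data below is illustrative and not the data of this example:

```python
import numpy as np

# Illustrative two-class data (not the data of this example), two features each
X1 = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [4.0, 5.0], [5.0, 5.0]])
X2 = np.array([[4.0, 2.0], [5.0, 0.0], [5.0, 2.0], [3.0, 2.0], [6.0, 3.0]])

X = np.vstack([X1, X2])
mu, mu1, mu2 = X.mean(axis=0), X1.mean(axis=0), X2.mean(axis=0)

# Scatter matrices, as in section 4
S_B = len(X1) * np.outer(mu1 - mu, mu1 - mu) + len(X2) * np.outer(mu2 - mu, mu2 - mu)
S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)

# Eigen-decomposition of W = S_W^{-1} S_B and choice of the stronger eigenvector
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
best = np.argmax(eigvals.real)
print("projection of class 1:", X1 @ eigvecs[:, best].real)
print("projection of class 2:", X2 @ eigvecs[:, best].real)
```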
7.2. Class-dependent LDA Approach
In this approach, we aim to calculate a separate transformation matrix $W_i$ for each class $\omega_i$. The within-class variance $S_{W_i}$ of each class is the same as we calculated in the class-independent approach; we then calculate the transformation matrix of each class as follows:

$W_i = S_{W_i}^{-1} S_B$.

This gives the two matrices $W_1$ and $W_2$.
Next, we calculate the eigenvalues and the corresponding eigenvectors of each transformation matrix $W_i$.

For the first class $\omega_1$, the eigendecomposition of $W_1$ gives two eigenvalues and their corresponding eigenvectors. Similarly, the eigendecomposition of $W_2$ gives the eigenvalues and corresponding eigenvectors for the second class data.
From the results above, we observe that one eigenvector of $W_1$ has a bigger corresponding eigenvalue than the other. This implies that it will discriminate the data of $\omega_1$ better than the other eigenvector; thus, we choose it as the lower-dimensional space and project the samples of $\omega_1$ onto it.
For the second class $\omega_2$, we likewise see that the second column eigenvector of $W_2$ has a higher eigenvalue than the first column eigenvector, which implies that it will discriminate the data of $\omega_2$ better. We then obtain the projected data for $\omega_2$ by projecting its samples onto this eigenvector.
In the next sub-section, we present a visual illustration of the performance of LDA on the two classes of data $\omega_1$ and $\omega_2$ and then draw our conclusions.
7.3. Performance Analysis of Class-independent and Class-dependent LDA Approach
First, let us consider the class-independent LDA approach.
The next figures illustrate the probability density function (pdf) graphs of the projected data of $\omega_1$ and $\omega_2$ on the two eigenvectors $v_1$ and $v_2$ of the matrix $W$:
A comparison of the two figures reveals the following:
- $v_2$ (second figure) showed a better discrimination of the data of each class than $v_1$ (first figure). This means that $v_2$ maximizes the between-class variance better than $v_1$
- $v_2$ minimized the within-class variance of the data of both classes more than $v_1$ did
- By using some additional techniques, we can easily handle the outlier visible in the second figure
Let us investigate the class-dependent LDA approach:
We observe that by projecting the data of $\omega_1$ onto the selected eigenvector of $W_1$ and the data of $\omega_2$ onto the selected eigenvector of $W_2$, we were able to discriminate the data efficiently. The corresponding eigenvectors achieved this by maximizing the between-class variance $S_B$ and minimizing the within-class variance $S_W$ of the data.
8. Conclusion
In this article, we explored linear discriminant analysis (LDA) through its class-independent and class-dependent approaches. Our analysis revealed that while the class-dependent method discriminates the classes well, it demands more computational time than the class-independent approach. Consequently, the class-independent approach is commonly favoured as the standard method for LDA.