1. Introduction

Many physical systems and scientific problems involve, at their core, a system of linear algebraic equations. For instance, fitting a curve to a set of data points, optimizing a cost function, and analyzing electrical networks all require solving linear equations.

Some of the fundamental tasks in computational linear algebra are:

  • Solving a system of linear equations A\mathbf{x} = \mathbf{b}
  • Finding the inverse of a square matrix A
  • Computing the determinant of a matrix

An applied introduction to the algorithms for performing such fundamental tasks follows.

In this tutorial, we’ll learn how a matrix can be decomposed into simpler Lego-like blocks. Operating on these smaller blocks is often easier than working with the original matrix.

2. Gaussian Elimination With Partial Pivoting

Consider an elementary system of three linear equations:

    \begin{align*} x + 2y + z &=2\\ 2x+6y+z &=7\\ x + y + 4z &=3\\ \end{align*}

in three unknowns x, y, z. We can write this system in the matrix form A \mathbf{x} = \mathbf{b} as:

    [\begin{bmatrix}1 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4\end{bmatrix}\begin{bmatrix}x \\ y \\ z\end{bmatrix} = \begin{bmatrix}2 \\ 7 \\ 3 \end{bmatrix}]

The high-school algorithm to solve a system of linear equations consists of performing elementary row operations on the equations. Since performing operations on the equations also affects their right-hand sides, keeping track of everything is most easily done using the augmented matrix M=[A|\mathbf{b}].

We shall call the (1,1) entry of the coefficient matrix A the first pivot, the (2,2) entry the second pivot, and so on. In regular Gaussian elimination, the basic idea is to make all entries in column j below the jth pivot a_{jj} equal to 0.

A key requirement, therefore, is that a pivot element must always be non-zero. We sometimes need to interchange rows j and k, with k > j, to ensure that the pivot is non-zero. The most common practice is to choose k such that |a_{kj}| is maximal. This is called partial pivoting.

    \begin{align*} &\begin{bmatrix} 1 & 2 & 1 &| & 2 \\ 2 & 6 & 1 &| &7\\ 1 & 1 & 4 &| & 3 \end{bmatrix} \xrightarrow {R_2-2R_1} \begin{bmatrix} 1 & 2 & 1 &| & 2 \\ 0 & 2 & -1 &| &3\\ 1 & 1 & 4 &| & 3 \end{bmatrix}\\ \xrightarrow {R_3 - R_1} &\begin{bmatrix} 1 & 2 & 1 &| & 2 \\ 0 & 2 & -1 &| &3\\ 0 & -1 & 3 &| & 1 \end{bmatrix} \xrightarrow {2R_3+R_2} \begin{bmatrix} 1 & 2 & 1 &| & 2 \\ 0 & 2 & -1 &| &3\\ 0 & 0 & 5 &| & 5 \end{bmatrix} \end{align*}

Since we have performed elementary operations on both the left- and right-hand sides of the system, the resulting system of equations U\mathbf{x}=\mathbf{c} is equivalent to the original system A\mathbf{x}=\mathbf{b}: both have the same solution vector \mathbf{x} = (x,y,z)^T.
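
Reading the equivalent triangular system U\mathbf{x}=\mathbf{c} from the bottom up makes this solution explicit:

    \begin{align*} 5z &= 5 &&\Rightarrow\quad z = 1\\ 2y - z &= 3 &&\Rightarrow\quad y = \tfrac{1}{2}(3 + z) = 2\\ x + 2y + z &= 2 &&\Rightarrow\quad x = 2 - 2y - z = -3 \end{align*}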

Note that U is an upper-triangular matrix whose diagonal entries, the pivots, are all non-zero. A square matrix A is called non-singular if it can be reduced to such an upper triangular form U, with non-zero pivots on the diagonal, by elementary row operations.

The Gaussian elimination algorithm with partial pivoting in the form of pseudocode is given below:

algorithm GaussianElim(A, b):
    // INPUT
    //    A = a non-singular matrix of size n x n
    //    b = the RHS vector
    // OUTPUT
    //    U = upper triangular form of A
    //    c = vector resulting from transforming b corresponding to U
    
    for j <- 0 to n - 1:
        // partial pivoting: find the row kRow >= j with the largest |A[k,j]|
        big <- abs(A[j,j])
        kRow <- j
        for k <- j + 1 to n - 1:
            // if a larger candidate pivot is found, remember its row
            if abs(A[k,j]) > big:
                big <- abs(A[k,j])
                kRow <- k

        if kRow != j:
            // swap rows j and kRow of A
            for l <- j to n - 1:
                dum <- A[j,l]
                A[j,l] <- A[kRow,l]
                A[kRow,l] <- dum

            // swap the j-th and kRow-th elements of the b vector
            dum <- b[j]
            b[j] <- b[kRow]
            b[kRow] <- dum

        pivot <- A[j,j]

        if pivot = 0:
            raise an exception, as the matrix A is singular

        // next, we reduce the rows below the pivot
        for i <- j + 1 to n - 1:
            mult <- A[i,j] / pivot
            for l <- j to n - 1:
                A[i,l] <- A[i,l] - mult * A[j,l]
            b[i] <- b[i] - mult * b[j]

    return A, b

The resulting system of equations U\mathbf{x} = \mathbf{c} is easy to solve by Back Substitution:

algorithm BackSubstitution(U, c):
    // INPUT
    //    U = upper triangular form
    //    c = vector
    // OUTPUT
    //    the solution vector x

    x <- a zero vector of length n

    for i <- n - 1 down to 0:
        sum <- 0

        for j <- i+1 to n-1:
            sum <- sum + x[j] * U[i,j]

        x[i] <- 1 / U[i,i] * (c[i] - sum)

    return x

The solution vector of our example system is \mathbf{x}=(-3, 2, 1)^T.
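
For concreteness, here is a minimal NumPy sketch of the two routines above, applied to our example system; the function names and structure are our own illustration, not a library API:

import numpy as np

def gaussian_elim(A, b):
    """Reduce A x = b to an upper triangular system U x = c using partial pivoting."""
    A, b = A.astype(float), b.astype(float)
    n = len(b)
    for j in range(n):
        # partial pivoting: bring the row with the largest |A[k, j]|, k >= j, to row j
        k = j + int(np.argmax(np.abs(A[j:, j])))
        if A[k, j] == 0:
            raise ValueError("matrix is singular")
        A[[j, k]], b[[j, k]] = A[[k, j]], b[[k, j]]
        # eliminate the entries below the pivot
        for i in range(j + 1, n):
            mult = A[i, j] / A[j, j]
            A[i, j:] -= mult * A[j, j:]
            b[i] -= mult * b[j]
    return A, b

def back_substitution(U, c):
    """Solve the upper triangular system U x = c from the bottom up."""
    n = len(c)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

A = np.array([[1, 2, 1], [2, 6, 1], [1, 1, 4]])
b = np.array([2, 7, 3])
U, c = gaussian_elim(A, b)
print(back_substitution(U, c))   # [-3.  2.  1.]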

3. LU Factorization

Suppose we’re able to write the matrix \mathbf{A} as the product of two matrices:

    [\mathbf{A} = \mathbf{L} \cdot \mathbf{U}]

where \mathbf{L} is a lower triangular matrix and \mathbf{U} is an upper triangular matrix. We can use such a decomposition to solve the system of linear equations:

    [\mathbf{A}\cdot \mathbf{x} = (\mathbf{L}\cdot \mathbf{U} ) \cdot \mathbf{x} = \mathbf{L}\cdot (\mathbf{U}  \cdot \mathbf{x}) =  \mathbf{b}]

We first solve for the vector \mathbf{y} such that:

    [\mathbf{L} \cdot \mathbf{y} = \mathbf{b}]

and then solve for \mathbf{x}:

    [\mathbf{U} \cdot \mathbf{x} = \mathbf{y}]

Solving the lower-triangular system \mathbf{L} \cdot \mathbf{y} = \mathbf{b} can be done by forward-substitution:

algorithm FwdSubstitution(L, b):
    // INPUT
    //    L = Lower triangular form
    //    b = vector b
    // OUTPUT
    //    The solution vector y

    y <- a zero vector of length n

    for i <- 0 to n - 1:
        sum <- 0
        
        for j <- 0 to i - 1:
            sum <- sum + y[j] * L[i, j]
        
        y[i] <- (b[i] - sum) / L[i, i]

    return y

Solving the upper-triangular system \mathbf{U} \cdot \mathbf{x} = \mathbf{y} can be done by backward-substitution. In this way, \mathbf{LU} factorization can be used to find the solution of a system of equations.

What is the advantage of breaking one linear system of equations into two successive ones? The advantage is that the solution of a triangular set of equations is easy to find. Once we know the \mathbf{LU}-factorization of the coefficient matrix \mathbf{A}, it is relatively cheap to solve for a large number of right-hand side vectors \mathbf{b}.
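
To make this reuse concrete, here is a small sketch using SciPy (assuming scipy.linalg is available): the factorization of A is computed once with lu_factor and then applied to several right-hand sides with lu_solve:

import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[1., 2., 1.], [2., 6., 1.], [1., 1., 4.]])

# factor A once (a pivoted LU decomposition under the hood)
lu, piv = lu_factor(A)

# reuse the same factorization for many right-hand side vectors b
for b in (np.array([2., 7., 3.]), np.array([1., 0., 0.]), np.array([0., 1., 0.])):
    print(lu_solve((lu, piv), b))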

3.1. LU Decomposition Algorithm

Let \mathbf{A} be an n \times n non-singular square matrix. Let’s write the matrix factorization \mathbf{A} = \mathbf{LU} in the block form:

    \begin{align*} \begin{bmatrix}a_{11} & \boldsymbol{a_{12}}\\ \boldsymbol{a_{21}} &\mathbf{A_{22}}\end{bmatrix} &= \begin{bmatrix}1 & \boldsymbol{0} \\ \boldsymbol{l_{21}} & \mathbf{L_{22}}\end{bmatrix}\begin{bmatrix}u_{11} & \boldsymbol{u_{12}} \\ \boldsymbol{0} & \mathbf{U_{22}}\end{bmatrix}\\ &= \begin{bmatrix}u_{11} & \boldsymbol{u_{12}}\\u_{11}\boldsymbol{l_{21}} &\boldsymbol{l_{21} u_{12}} + \mathbf{L_{22}U_{22}}\end{bmatrix} \end{align*}

In the above block form of the n \times n matrix A, the entry a_{11} is a scalar, \boldsymbol{a_{12}} is a 1 \times (n-1) row vector, \boldsymbol{a_{21}} is an (n-1)\times 1 column vector, and \mathbf{A_{22}} is an (n-1) \times (n-1) matrix.

Comparing the left- and right-hand sides of the above block matrix equation, we see that:

    \begin{align*} a_{11} &= u_{11}\\ \boldsymbol{a_{12}} &= \boldsymbol{u_{12}}\\ \boldsymbol{a_{21}} &= u_{11}\boldsymbol{l_{21}}\\ \mathbf{A_{22}} &= \boldsymbol{l_{21}}\cdot\boldsymbol{u_{12}} + \mathbf{L_{22}}\mathbf{U_{22}} \end{align*}

Rearranging these equations to solve for the components of the \mathbf{L} and \mathbf{U} matrices, we have:

    \begin{align*} u_{11} &= a_{11}\\ \boldsymbol{u_{12}} &= \boldsymbol{a_{12}}\\ \boldsymbol{l_{21}} &= \frac{1}{u_{11}}\boldsymbol{a_{21}}\\ \mathbf{L_{22}U_{22}} &= \mathbf{A_{22}} - \boldsymbol{l_{21}}\cdot\boldsymbol{u_{12}} \end{align*}

We can quickly evaluate the first three equations to get the first row and column of the matrices \mathbf{L} and \mathbf{U}. We can then evaluate the right-hand side of the last equation, which gives the Schur complement \mathbf{S_{22}} of \mathbf{A}. We’re thus left with the equation \mathbf{S_{22}} = \mathbf{L_{22}U_{22}}, which is a smaller sub-problem of size (n-1) \times (n-1) that we can solve recursively.

We conceive a simple algorithm based on this intuitive idea:

algorithm LUdcmp(A):
    // INPUT
    //    A = a non-singular square matrix
    // OUTPUT
    //    Lower unitriangular matrix L 
    //    Upper triangular form U

    N <- the order of the square matrix A

    // If N = 1, we are done
    if N = 1:
        L <- [1]
        U <- A
        return L, U

    // Partition A into the 2x2 block form above
    a11 <- A[0,0]
    a12 <- A[0,1:N-1]
    a21 <- A[1:N-1,0]
    A22 <- A[1:N-1,1:N-1]

    // Compute first row and column of L, U matrices
    l11 <- 1
    l12 <- [0 for _ in 1 to N-1]

    u11 <- a11
    u12 <- a12

    if u11 = 0:
        raise an exception, as a zero pivot was encountered (this simple variant does not pivot)

    l21 <- (1 / u11) * a21

    S22 <- A22 - l21 * u12

    // Recursively solve subproblem of small size
    L22, U22 <- LUdcmp(S22)

    // Put the L, U matrices together
    L <- [ [l11, l12], [l21, L22] ]
    U <- [ [u11, u12], [ zeros(N-1), U22] ]

    return L, U
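
The recursive scheme translates almost line by line into NumPy. The following is a sketch for illustration only; like the pseudocode, it does not pivot, so it assumes every leading entry u_{11} it encounters is non-zero:

import numpy as np

def lu_dcmp(A):
    """Recursive LU factorization A = L U of a square matrix (no pivoting)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return np.array([[1.0]]), A.copy()

    a11, a12 = A[0, 0], A[0:1, 1:]      # scalar and 1 x (n-1) row vector
    a21, A22 = A[1:, 0:1], A[1:, 1:]    # (n-1) x 1 column vector and (n-1) x (n-1) block

    u11, u12 = a11, a12
    l21 = a21 / u11
    S22 = A22 - l21 @ u12               # Schur complement

    L22, U22 = lu_dcmp(S22)             # recursively factor the smaller sub-problem

    # put the L, U matrices together from their blocks
    L = np.block([[np.ones((1, 1)), np.zeros((1, n - 1))], [l21, L22]])
    U = np.block([[np.full((1, 1), u11), u12], [np.zeros((n - 1, 1)), U22]])
    return L, U

A = np.array([[1., 2., 1.], [2., 6., 1.], [1., 1., 4.]])
L, U = lu_dcmp(A)
print(np.allclose(L @ U, A))   # True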

4. QR Factorization

Besides the LU-decomposition, there is another matrix factorization that is often very useful: the so-called QR-decomposition.

4.1. The Gram-Schmidt Process

Suppose we’re working with a basis \mathcal{B}=\{\mathbf{w}_1,\mathbf{w}_2,\ldots,\mathbf{w}_n\} for \mathbf{R}^n and we’re interested in building an orthonormal basis from \mathcal{B}. A practical algorithm for constructing such a basis is the Gram-Schmidt process, one of the premier algorithms of applied and computational linear algebra.

We assume that we already know some basis \mathbf{w}_1,\ldots,\mathbf{w}_n of an n-dimensional space. Our goal is to use this information to construct an orthogonal basis \mathbf{v}_1,\ldots,\mathbf{v}_n.

We’ll construct the orthogonal basis elements one by one. Since we’re not initially worrying about normality, there are no conditions on the first orthogonal basis element \mathbf{v}_1, and so there is no harm in choosing

    [\mathbf{v}_1 = \mathbf{w}_1]

Note that \mathbf{v}_1 \neq \mathbf{0}, since \mathbf{w}_1 appears in the original basis. Starting with \mathbf{w}_2, the second basis vector \mathbf{v}_2 must be orthogonal to the first: \mathbf{v}_2 \cdot \mathbf{v}_1 = 0. Suppose that it is possible to write \mathbf{w}_2 as the sum of a suitable scalar multiple of \mathbf{v}_1 and the orthogonal vector \mathbf{v}_2. Set:

    [\mathbf{v}_2 = \mathbf{w}_2 - c\mathbf{v}_1]

where c is a scalar to be determined.

The orthogonality condition \mathbf{v}_2 \cdot \mathbf{v}_1 = 0 yields:

    [\mathbf{w}_2 \cdot \mathbf{v}_1 - c\mathbf{v}_1 \cdot \mathbf{v}_1 = \mathbf{w}_2 \cdot \mathbf{v}_1 - c |\mathbf{v}_1|^2 = 0]

Consequently:

    [c = \frac{\mathbf{w}_2 \cdot \mathbf{v}_1}{|\mathbf{v}_1|^2}]

and therefore:

    [\mathbf{v}_2 = \mathbf{w}_2 - \left(\frac{\mathbf{w}_2 \cdot \mathbf{v}_1}{|\mathbf{v}_1|^2}\right)\mathbf{v}_1]

Next, we construct:

    [\mathbf{v}_3 = \mathbf{w}_3 - c_1 \mathbf{v}_1 - c_2 \mathbf{v}_2]

by subtracting multiples of the first two orthogonal basis elements from \mathbf{w}_3. We want \mathbf{v}_3 to be orthogonal to both \mathbf{v}_1 and \mathbf{v}_2. This gives us two equations:

\begin{align*} 0 &= \mathbf{v}_3 \cdot \mathbf{v}_1 = \mathbf{w}_3 \cdot \mathbf{v}_1 - c_1 \mathbf{v}_1\cdot \mathbf{v}_1\\ 0 &= \mathbf{v}_3 \cdot \mathbf{v}_2 = \mathbf{w}_3 \cdot \mathbf{v}_2 - c_2 \mathbf{v}_2\cdot \mathbf{v}_2 \end{align*}

and hence

    \begin{align*} c_1 &= \frac{\mathbf{w}_3 \cdot \mathbf{v}_1}{|\mathbf{v}_1|^2}\\ c_2 &= \frac{\mathbf{w}_3 \cdot \mathbf{v}_2}{|\mathbf{v}_2|^2} \end{align*}

In this fashion, we can establish the general Gram-Schmidt formula

    [\mathbf{v}_k = \mathbf{w}_k - \sum_{j=1}^{k-1}\frac{\mathbf{w}_k\cdot \mathbf{v}_j}{|\mathbf{v}_j|^2}\mathbf{v}_j, \quad k=1,\ldots,n]
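
A direct NumPy transcription of this formula might look as follows; it’s a sketch that assumes the input vectors are linearly independent (so no \mathbf{v}_k comes out as the zero vector):

import numpy as np

def gram_schmidt(W):
    """Orthogonalize the rows of W using the classical Gram-Schmidt formula."""
    W = np.asarray(W, dtype=float)
    V = []
    for w in W:
        # subtract the projections of w onto all previously constructed v_j
        v = w - sum(((w @ v_j) / (v_j @ v_j)) * v_j for v_j in V)
        V.append(v)
    return np.array(V)

W = np.array([[1., 1., 0.], [1., 0., 1.], [0., 1., 1.]])
V = gram_schmidt(W)
print(np.round(V @ V.T, 10))   # the off-diagonal entries are 0: the rows are mutually orthogonal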

4.2. Modifications to the Gram-Schmidt Process

With the basic Gram-Schmidt algorithm in hand, let’s look at a reformulation that has both practical and theoretical advantages. This can be used to construct the orthonormal basis vectors \mathbf{u}_1,\ldots,\mathbf{u}_n directly from the basis \mathbf{w}_1,\ldots,\mathbf{w}_n.

We begin by replacing each orthogonal basis vector in the basic Gram-Schmidt formula by its normalized version \mathbf{u}_j = \mathbf{v}_j/|\mathbf{v}_j|.

Now, the original basis vectors \mathbf{w}_1,\ldots,\mathbf{w}_n can also be expressed in terms of the orthonormal basis vectors \mathbf{u}_1,\ldots,\mathbf{u}_n, via the lower-triangular system:

    \begin{align*} \mathbf{w}_1 &= r_{11} \mathbf{u}_1 \\ \mathbf{w}_2 &= r_{12} \mathbf{u}_1 + r_{22} \mathbf{u}_2\\ \mathbf{w}_3 &= r_{13} \mathbf{u}_1 + r_{23} \mathbf{u}_2 + r_{33}\mathbf{u}_3\\ & \vdots  \\ \mathbf{w}_n &= r_{1n} \mathbf{u}_1 + r_{2n} \mathbf{u}_2 + \ldots+ r_{nn}\mathbf{u}_n\\ \end{align*}

The coefficients r_{ij} can be directly computed from these formulas. Taking the dot product of the equation for \mathbf{w}_j with the orthonormal basis vector \mathbf{u}_i for i < j, we get:

    [\mathbf{w}_j \cdot \mathbf{u}_i = r_{1j}\mathbf{u}_1 \cdot \mathbf{u}_i + \ldots + r_{jj}\mathbf{u}_j \cdot \mathbf{u}_i = r_{ij}|\mathbf{u}_{i}|^2 = r_{ij}]

and hence:

    [r_{ij} = \mathbf{w}_j \cdot \mathbf{u}_i, \quad i=1,\ldots,j-1]

On the other hand, taking the dot product of the jth equation with itself, we have:

    \begin{align*} |\mathbf{w}_j|^2 &= |r_{1j} \mathbf{u}_1 + \ldots + r_{jj} \mathbf{u}_j|^2 \\ &= r_{1j}^2 + r_{2j}^2 + \ldots + r_{jj}^2 \end{align*}

Thus, we can obtain r_{jj} by computing:

    [r_{jj} = \sqrt{|\mathbf{w}_j|^2-r_{1j}^2-\ldots - r_{j-1,j}^2}]

Once we have all the coefficients r_{1j},\ldots,r_{jj}, the lower triangular system is easy to solve for the orthonormal vectors \mathbf{u}_j using forward substitution.

4.3. Orthogonal Matrices

Matrices whose columns form an orthonormal basis of the n-dimensional Euclidean space \mathbf{R}^n are called orthogonal matrices. Orthogonal matrices have a plethora of applications ranging from geometry and physics to crystallography, quantum mechanics, and PDEs. Rotational motions of bodies in three-dimensional space are described by orthogonal matrices.

Mathematically, a square matrix \mathbf{Q} is called orthogonal if it satisfies:

    [\mathbf{Q}^T \mathbf{Q} = \mathbf{Q} \mathbf{Q}^T = \mathbf{I}]

Orthogonality of the matrix implies that we can easily invert the matrix:

    [\mathbf{Q}^{-1} = \mathbf{Q}^T]
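
For instance, a rotation matrix in the plane is orthogonal, which we can verify numerically with a small sketch:

import numpy as np

theta = 0.7   # an arbitrary rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q.T @ Q, np.eye(2)))      # True: Q^T Q = I
print(np.allclose(np.linalg.inv(Q), Q.T))   # True: Q^{-1} = Q^T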

4.4. The QR Factorization

Let \mathbf{w}_1,\ldots,\mathbf{w}_n be a basis of \mathbf{R}^n and let \mathbf{u}_1,\ldots,\mathbf{u}_n be the corresponding orthonormal basis that results from the Gram-Schmidt process. We assemble both sets of column vectors to form non-singular n \times n matrices:

[\mathbf{A} = \begin{pmatrix} \mathbf{w}_1 & \mathbf{w}_2 & \ldots & \mathbf{w}_n\end{pmatrix},\quad \mathbf{Q}=\begin{pmatrix}\mathbf{u}_1 & \mathbf{u}_2 & \ldots & \mathbf{u}_n\end{pmatrix}]

Since the \mathbf{u}_i form an orthonormal basis for \mathbf{R}^n, \mathbf{Q} is an orthogonal matrix. In view of the matrix multiplication formula, the Gram-Schmidt equations can be recast into the matrix form:

    \begin{align*}\underbrace{\begin{pmatrix} \mathbf{w}_1 & \mathbf{w}_2 & \ldots & \mathbf{w}_n\end{pmatrix}}_{\mathbf{A}} &= \underbrace{\begin{pmatrix}\mathbf{u}_1 & \mathbf{u}_2 & \ldots & \mathbf{u}_n\end{pmatrix}}_{\mathbf{Q}}\underbrace{\begin{pmatrix}r_{11} & r_{12} & \ldots & r_{1n}\\ 0 & r_{22} & \ldots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 &\ldots & r_{nn}\end{pmatrix}}_{\mathbf{R}} \end{align*}

We can easily implement the modified Gram-Schmidt process:

algorithm QRFact(A):
    // INPUT
    //    A = a non-singular square matrix of size n x n 
    // OUTPUT
    //    Q = orthogonal matrix
    //    R = upper triangular form of A

    N <- the order of the square matrix A
    u <- a zero matrix of size N x N
    R <- a zero matrix of size N x N

    for j <- 0 to N - 1:
        w_j <- A[:, j]

        // Squared norm of w_j
        normSq <- 0
        for k <- 0 to N - 1:
            normSq <- normSq + w_j[k] * w_j[k]

        // Compute the coefficients R[i, j]
        sum <- 0
        for i <- 0 to j - 1:
            R[i, j] <- dot product of w_j and u[i]
            sum <- sum + R[i, j] * R[i, j]
        R[j, j] <- sqrt(normSq - sum)

        // Compute vector u[j] using fwd substitution
        sumvec <- a zero vector of length N
        for i <- 0 to j - 1:
            sumvec <- sumvec + R[i, j] * u[i]
        u[j] <- (w_j - sumvec) / R[j, j]

    Q <- transpose of u
    return Q, R
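
The pseudocode above maps directly onto NumPy. The following sketch is our own transcription, with a quick sanity check that Q is orthogonal and that the product QR reproduces A:

import numpy as np

def qr_fact(A):
    """QR factorization via the modified Gram-Schmidt recurrence above."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    Q = np.zeros((n, n))
    R = np.zeros((n, n))
    for j in range(n):
        w = A[:, j]
        for i in range(j):
            R[i, j] = w @ Q[:, i]                       # r_ij = w_j . u_i
        R[j, j] = np.sqrt(w @ w - R[:j, j] @ R[:j, j])  # r_jj from the norm formula
        Q[:, j] = (w - Q[:, :j] @ R[:j, j]) / R[j, j]   # forward substitution step
    return Q, R

A = np.array([[1., 2., 1.], [2., 6., 1.], [1., 1., 4.]])
Q, R = qr_fact(A)
print(np.allclose(Q @ R, A))            # True
print(np.allclose(Q.T @ Q, np.eye(3)))  # True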

4.5. Solving Linear Systems Using QR Factorization

Once the QR-decomposition of a matrix A is known, it is fairly efficient to solve the linear system of equations A \mathbf{x} = \mathbf{b}, since we have:

    \begin{align*} A\mathbf{x}&=\mathbf{b}\\ QR\mathbf{x}&=\mathbf{b}\\ R\mathbf{x}&=Q^{-1}\mathbf{b}=Q^T \mathbf{b}=\hat{\mathbf{b}}\\ R\mathbf{x}&=\hat{\mathbf{b}} \end{align*}

The matrix R is upper-triangular, so the system R\mathbf{x}=\hat{\mathbf{b}} is very easy to solve using the back substitution algorithm.
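
Putting the pieces together, here is a sketch that solves our example system using an off-the-shelf QR factorization (numpy.linalg.qr) followed by back substitution on R:

import numpy as np

A = np.array([[1., 2., 1.], [2., 6., 1.], [1., 1., 4.]])
b = np.array([2., 7., 3.])

Q, R = np.linalg.qr(A)   # A = Q R with Q orthogonal and R upper triangular
b_hat = Q.T @ b          # transform the right-hand side: R x = Q^T b

# back substitution on the upper triangular system R x = b_hat
n = len(b_hat)
x = np.zeros(n)
for i in range(n - 1, -1, -1):
    x[i] = (b_hat[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]

print(x)   # [-3.  2.  1.] (up to rounding)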

5. Conclusion

In this article, we reviewed the high-school method of solving linear equations through the elimination of variables. We learned that matrix factorizations are building blocks of more complex algorithms.

Today, implementations of the A = LU and A = QR factorizations for solving very large systems of equations are commonplace in standard numerical software libraries such as MATLAB, Eigen, LAPACK, LINPACK, NAG, and so on.