1. Introduction
Many physical systems and scientific problems involve, at their core, a system of linear algebraic equations. For instance, fitting a curve to a set of data points, optimizing a cost function, or analyzing an electrical network all require solving linear equations.
Some of the fundamental tasks in computational linear algebra are:
- Solving a system of linear equations
- Finding the inverse of a square matrix
- Computing the determinant of a matrix
An applied introduction to the algorithms for performing such fundamental tasks follows.
In this tutorial, we'll learn how a matrix can be decomposed into simpler Lego-like blocks. Operating on these smaller blocks is often easier than working with the original matrix.
2. Gaussian Elimination With Partial Pivoting
Consider an elementary system of three linear equations:
$$a_{11}x_1 + a_{12}x_2 + a_{13}x_3 = b_1$$
$$a_{21}x_1 + a_{22}x_2 + a_{23}x_3 = b_2$$
$$a_{31}x_1 + a_{32}x_2 + a_{33}x_3 = b_3$$
in three unknowns $x_1, x_2, x_3$. We can write this system in the matrix form $A\mathbf{x} = \mathbf{b}$:
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$
The high-school algorithm to solve a system of linear equations consists of performing elementary row operations on the equations. Since performing operations on the equations also affects their right-hand sides, keeping track of everything is most easily done using the augmented matrix $M = \big[\, A \mid \mathbf{b} \,\big]$.
We shall call the $(1,1)$ entry of the coefficient matrix $A$ the first pivot, the $(2,2)$ entry the second pivot, and so on. In regular Gaussian elimination, the basic idea is to make all entries in the column below the $j$-th pivot equal to $0$.
A key requirement, therefore, is that a pivot element must always be non-zero. We sometimes need to interchange rows $j$ and $k$ to ensure that the pivot is non-zero. The most common practice is to choose $k$ such that $|a_{kj}|$ is maximal. This is called partial pivoting.
Since we have performed elementary operations on both the left- and right-hand sides of the system, the resulting system of equations $U\mathbf{x} = \mathbf{c}$ is equivalent to the original system $A\mathbf{x} = \mathbf{b}$: both have the same solution vector $\mathbf{x}$.
Note that $U$ is an upper-triangular matrix with all non-zero elements, the pivots, on the diagonal. A square matrix is called non-singular if it can be reduced to such an upper-triangular form by elementary row operations.
The Gaussian elimination algorithm with partial pivoting, in the form of pseudocode, is given below:
algorithm GaussianElim(A, b):
    // INPUT
    //   A = a non-singular matrix of size n x n
    //   b = the RHS vector
    // OUTPUT
    //   U = upper triangular form of A
    //   c = vector resulting from transforming b corresponding to U

    n <- the order of the square matrix A
    for j <- 0 to n - 1:
        if A[j,j] = 0:
            big <- 0.0
            kRow <- j
            // search the rest of column j for a pivot
            for k <- j + 1 to n - 1:
                // if a suitable element is found
                if abs(A[k,j]) > big:
                    big <- abs(A[k,j])
                    kRow <- k
            // swap rows j and kRow
            for l <- j to n - 1:
                dum <- A[j,l]
                A[j,l] <- A[kRow,l]
                A[kRow,l] <- dum
            // swap the j-th and kRow-th elements of the b vector
            dum <- b[j]
            b[j] <- b[kRow]
            b[kRow] <- dum
        pivot <- A[j,j]
        if pivot = 0:
            raise an exception, as the matrix A is singular
        // next, we reduce the rows below row j
        for i <- j + 1 to n - 1:
            mult <- A[i,j] / pivot
            for l <- j to n - 1:
                A[i,l] <- A[i,l] - mult * A[j,l]
            b[i] <- b[i] - mult * b[j]
    return A, b
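To make this concrete, here's a minimal NumPy sketch of the routine above. The function name `gaussian_elim` and the use of NumPy arrays are our own choices, and, unlike the pseudocode, this sketch always swaps in the largest available pivot:

```python
import numpy as np

def gaussian_elim(A, b):
    """Reduce A x = b to an equivalent upper-triangular system U x = c (a sketch)."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    for j in range(n):
        # Partial pivoting: pick the row with the largest entry in column j
        k = j + np.argmax(np.abs(A[j:, j]))
        if A[k, j] == 0.0:
            raise ValueError("matrix is singular")
        if k != j:
            A[[j, k]] = A[[k, j]]   # swap rows j and k of A
            b[[j, k]] = b[[k, j]]   # and the matching entries of b
        # Eliminate all entries below the pivot A[j, j]
        for i in range(j + 1, n):
            mult = A[i, j] / A[j, j]
            A[i, j:] -= mult * A[j, j:]
            b[i] -= mult * b[j]
    return A, b
```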
The resulting system of equations $U\mathbf{x} = \mathbf{c}$ is easy to solve by back substitution:
algorithm BackSubstitution(U, c):
    // INPUT
    //   U = the upper triangular matrix of size n x n
    //   c = the transformed RHS vector
    // OUTPUT
    //   the solution vector x

    x <- a zero vector of length n
    for i <- n - 1 down to 0:
        sum <- 0
        for j <- i + 1 to n - 1:
            sum <- sum + x[j] * U[i,j]
        x[i] <- (c[i] - sum) / U[i,i]
    return x
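Continuing the sketch from the previous block, a NumPy back substitution routine and a small worked example might look like this (the 3x3 system below is ours, chosen only for illustration):

```python
import numpy as np

def back_substitution(U, c):
    """Solve the upper-triangular system U x = c from the bottom row up."""
    n = len(c)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

# Example: solve a small 3x3 system and check against NumPy's own solver
A = np.array([[ 2.0,  1.0, -1.0],
              [-3.0, -1.0,  2.0],
              [-2.0,  1.0,  2.0]])
b = np.array([8.0, -11.0, -3.0])

U, c = gaussian_elim(A, b)      # the sketch from the previous block
x = back_substitution(U, c)
print(x)                                        # [ 2.  3. -1.]
print(np.allclose(x, np.linalg.solve(A, b)))    # True
```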
Applying Gaussian elimination followed by back substitution in this way yields the solution vector $\mathbf{x}$; for an electrical network, for instance, this vector would contain the unknown currents.
3. LU Factorization
Suppose we're able to write the matrix $A$ as the product of two matrices:
$$A = LU$$
where $L$ is a lower-triangular matrix and $U$ is an upper-triangular matrix. We can use such a decomposition to solve the system of linear equations $A\mathbf{x} = \mathbf{b}$:
$$A\mathbf{x} = (LU)\mathbf{x} = L(U\mathbf{x}) = \mathbf{b}$$
We first solve for the vector $\mathbf{y}$ such that
$$L\mathbf{y} = \mathbf{b}$$
and then solve
$$U\mathbf{x} = \mathbf{y}$$
Solving the lower-triangular system $L\mathbf{y} = \mathbf{b}$ can be done by forward substitution:
algorithm FwdSubstitution(L, b):
    // INPUT
    //   L = the lower triangular matrix of size n x n
    //   b = the RHS vector
    // OUTPUT
    //   the solution vector y

    y <- a zero vector of length n
    for i <- 0 to n - 1:
        sum <- 0
        for j <- 0 to i - 1:
            sum <- sum + y[j] * L[i,j]
        y[i] <- (b[i] - sum) / L[i,i]
    return y
Solving the upper-triangular system $U\mathbf{x} = \mathbf{y}$ can be done by back substitution. In this way, the $LU$ factorization can be used to find the solution of a system of equations.
What is the advantage of breaking one linear system of equations into two successive ones? The advantage is that the solution of a triangular set of equations is easy to find. Once we know the $LU$ factorization of the coefficient matrix $A$, it is relatively cheap to solve for a large number of right-hand side vectors $\mathbf{b}$.
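For instance, with SciPy's LAPACK-backed helpers `scipy.linalg.lu_factor` and `lu_solve` (the sizes and random data below are arbitrary), the factorization is computed once and then reused for each new right-hand side:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(seed=42)
A = rng.standard_normal((100, 100))

lu, piv = lu_factor(A)          # O(n^3) work, done once

for _ in range(1000):           # roughly O(n^2) work per right-hand side
    b = rng.standard_normal(100)
    x = lu_solve((lu, piv), b)
```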
3.1. LU Decomposition Algorithm
Let $A$ be an $N \times N$ non-singular square matrix. Let's write the matrix factorization $A = LU$ in the block form:
$$\begin{bmatrix} a_{11} & \mathbf{a}_{12} \\ \mathbf{a}_{21} & A_{22} \end{bmatrix} = \begin{bmatrix} 1 & \mathbf{0} \\ \mathbf{l}_{21} & L_{22} \end{bmatrix} \begin{bmatrix} u_{11} & \mathbf{u}_{12} \\ \mathbf{0} & U_{22} \end{bmatrix}$$
In the above block form of the matrix $A$, the entry $a_{11}$ is a scalar, $\mathbf{a}_{12}$ is a $1 \times (N-1)$ row vector, $\mathbf{a}_{21}$ is an $(N-1) \times 1$ column vector, and $A_{22}$ is an $(N-1) \times (N-1)$ matrix.
Comparing the left- and right-hand sides of the above block matrix equation, we see that:
$$a_{11} = u_{11}, \qquad \mathbf{a}_{12} = \mathbf{u}_{12}, \qquad \mathbf{a}_{21} = u_{11}\,\mathbf{l}_{21}, \qquad A_{22} = \mathbf{l}_{21}\mathbf{u}_{12} + L_{22}U_{22}$$
Rearranging these equations to solve for the components of the $L$ and $U$ matrices, we have:
$$u_{11} = a_{11}, \qquad \mathbf{u}_{12} = \mathbf{a}_{12}, \qquad \mathbf{l}_{21} = \frac{1}{u_{11}}\,\mathbf{a}_{21}, \qquad L_{22}U_{22} = A_{22} - \mathbf{l}_{21}\mathbf{u}_{12}$$
We can quickly evaluate the first three equations to get the first row and column of the matrices $L$ and $U$. We can then evaluate the right-hand side of the last equation, which gives the Schur complement $S_{22} = A_{22} - \mathbf{l}_{21}\mathbf{u}_{12}$ of $a_{11}$ in $A$. We're thus left with the equation $L_{22}U_{22} = S_{22}$, which is a smaller sub-problem of size $(N-1) \times (N-1)$ that we can solve recursively.
Based on this intuitive idea, we can devise a simple recursive algorithm:
algorithm LUdcmp(A):
    // INPUT
    //   A = a non-singular square matrix
    // OUTPUT
    //   L = lower unitriangular matrix
    //   U = upper triangular matrix

    N <- the order of the square matrix A
    // base case: a 1 x 1 matrix
    if N = 1:
        L <- [1]
        U <- A
        return L, U
    // write A as a 2 x 2 block matrix
    a11 <- A[0,0]
    a12 <- A[0, 1:N-1]
    a21 <- A[1:N-1, 0]
    A22 <- A[1:N-1, 1:N-1]
    // compute the first row and column of the L, U matrices
    l11 <- 1
    l12 <- a zero row vector of length N-1
    u11 <- a11
    u12 <- a12
    l21 <- (1 / u11) * a21
    // Schur complement of a11
    S22 <- A22 - l21 * u12
    // recursively solve the (N-1) x (N-1) subproblem
    L22, U22 <- LUdcmp(S22)
    // assemble the L, U matrices from the blocks
    L <- [ [l11, l12], [l21, L22] ]
    U <- [ [u11, u12], [zeros(N-1), U22] ]
    return L, U
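A direct NumPy transcription of this recursion might look as follows. It is a sketch only: like the pseudocode, it does no pivoting, so it assumes every pivot it encounters is non-zero, and the test matrix at the end is arbitrary:

```python
import numpy as np

def lu_dcmp(A):
    """Recursive LU factorization of a non-singular square matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:                       # base case: 1 x 1 matrix
        return np.ones((1, 1)), A.copy()
    a11 = A[0:1, 0:1]                # the first pivot, as a 1 x 1 block
    a12 = A[0:1, 1:]                 # first row of U (without the pivot)
    a21 = A[1:, 0:1]                 # first column of A below the pivot
    A22 = A[1:, 1:]
    l21 = a21 / a11[0, 0]            # first column of L below the unit diagonal
    S22 = A22 - l21 @ a12            # Schur complement of a11
    L22, U22 = lu_dcmp(S22)          # recurse on the (n-1) x (n-1) subproblem
    L = np.block([[np.ones((1, 1)),      np.zeros((1, n - 1))],
                  [l21,                  L22]])
    U = np.block([[a11,                  a12],
                  [np.zeros((n - 1, 1)), U22]])
    return L, U

A = np.array([[4.0, 3.0], [6.0, 3.0]])
L, U = lu_dcmp(A)
print(np.allclose(L @ U, A))   # True
```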
4. QR Factorization
Like the $LU$ decomposition, there is another matrix factorization that is sometimes very useful, the so-called $QR$ decomposition.
4.1. The Gram-Schmidt Process
Suppose we're working with a basis $\mathbf{w}_1, \dots, \mathbf{w}_n$ of $\mathbb{R}^n$. We want to build an orthonormal basis from it. A practical algorithm for constructing an orthonormal basis is the Gram-Schmidt process, one of the premier algorithms of applied and computational linear algebra.
We assume that we already know some basis $\mathbf{w}_1, \dots, \mathbf{w}_n$ of an $n$-dimensional space. Our goal is to use this information to construct an orthogonal basis $\mathbf{v}_1, \dots, \mathbf{v}_n$.
We'll construct the orthogonal basis elements one by one. Since, initially, we're not worrying about normality, there are no conditions on the first orthogonal basis element, and so there is no harm in choosing
$$\mathbf{v}_1 = \mathbf{w}_1$$
Note that $\mathbf{v}_1 \neq \mathbf{0}$, since $\mathbf{w}_1$ appears in the original basis. The second basis vector must be orthogonal to the first: $\mathbf{v}_2 \cdot \mathbf{v}_1 = 0$. Suppose it is possible to write $\mathbf{w}_2$ as the sum of a suitable scalar multiple of $\mathbf{v}_1$ and the sought orthogonal vector $\mathbf{v}_2$. Set:
$$\mathbf{v}_2 = \mathbf{w}_2 - c\,\mathbf{v}_1$$
where $c$ is a scalar to be determined. The orthogonality condition yields:
$$0 = \mathbf{v}_2 \cdot \mathbf{v}_1 = \mathbf{w}_2 \cdot \mathbf{v}_1 - c\,(\mathbf{v}_1 \cdot \mathbf{v}_1) = \mathbf{w}_2 \cdot \mathbf{v}_1 - c\,\|\mathbf{v}_1\|^2$$
Consequently:
$$c = \frac{\mathbf{w}_2 \cdot \mathbf{v}_1}{\|\mathbf{v}_1\|^2}$$
and therefore:
$$\mathbf{v}_2 = \mathbf{w}_2 - \frac{\mathbf{w}_2 \cdot \mathbf{v}_1}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1$$
Next, we construct:
$$\mathbf{v}_3 = \mathbf{w}_3 - c_1\,\mathbf{v}_1 - c_2\,\mathbf{v}_2$$
by subtracting multiples of the first two orthogonal basis elements from $\mathbf{w}_3$. We want $\mathbf{v}_3$ to be orthogonal to both $\mathbf{v}_1$ and $\mathbf{v}_2$. This gives us two equations:
$$0 = \mathbf{v}_3 \cdot \mathbf{v}_1 = \mathbf{w}_3 \cdot \mathbf{v}_1 - c_1\,\|\mathbf{v}_1\|^2, \qquad 0 = \mathbf{v}_3 \cdot \mathbf{v}_2 = \mathbf{w}_3 \cdot \mathbf{v}_2 - c_2\,\|\mathbf{v}_2\|^2$$
and hence:
$$\mathbf{v}_3 = \mathbf{w}_3 - \frac{\mathbf{w}_3 \cdot \mathbf{v}_1}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1 - \frac{\mathbf{w}_3 \cdot \mathbf{v}_2}{\|\mathbf{v}_2\|^2}\,\mathbf{v}_2$$
In this fashion, we can establish the general Gram-Schmidt formula:
$$\mathbf{v}_k = \mathbf{w}_k - \sum_{j=1}^{k-1} \frac{\mathbf{w}_k \cdot \mathbf{v}_j}{\|\mathbf{v}_j\|^2}\,\mathbf{v}_j, \qquad k = 1, \dots, n$$
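The formula translates almost line by line into NumPy. The sketch below is ours (the column convention, the function name, and the example vectors are illustrative choices):

```python
import numpy as np

def gram_schmidt(W):
    """Columns of W are the basis w_1..w_n; columns of the result are the
    orthogonal (not yet normalized) basis v_1..v_n."""
    W = np.asarray(W, dtype=float)
    V = np.zeros_like(W)
    for k in range(W.shape[1]):
        v = W[:, k].copy()
        for j in range(k):
            # subtract the component of w_k along the previously built v_j
            v -= (W[:, k] @ V[:, j]) / (V[:, j] @ V[:, j]) * V[:, j]
        V[:, k] = v
    return V

W = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
V = gram_schmidt(W)
G = V.T @ V
print(np.allclose(G, np.diag(np.diag(G))))   # True: the v_k are mutually orthogonal
```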
4.2. Modifications to the Gram-Schmidt Process
With the basic Gram-Schmidt algorithm in hand, let's look at a reformulation that has both practical and theoretical advantages. It can be used to construct the orthonormal basis vectors $\mathbf{u}_1, \dots, \mathbf{u}_n$ directly from the basis $\mathbf{w}_1, \dots, \mathbf{w}_n$.
We begin by replacing each orthogonal basis vector $\mathbf{v}_k$ in the basic Gram-Schmidt formula by its normalized version $\mathbf{u}_k = \mathbf{v}_k / \|\mathbf{v}_k\|$.
Now, the original basis vectors $\mathbf{w}_1, \dots, \mathbf{w}_n$ can also be expressed in terms of the orthonormal basis vectors $\mathbf{u}_1, \dots, \mathbf{u}_n$ via the lower-triangular system:
$$\mathbf{w}_1 = r_{11}\mathbf{u}_1$$
$$\mathbf{w}_2 = r_{12}\mathbf{u}_1 + r_{22}\mathbf{u}_2$$
$$\mathbf{w}_3 = r_{13}\mathbf{u}_1 + r_{23}\mathbf{u}_2 + r_{33}\mathbf{u}_3$$
$$\vdots$$
$$\mathbf{w}_n = r_{1n}\mathbf{u}_1 + r_{2n}\mathbf{u}_2 + \cdots + r_{nn}\mathbf{u}_n$$
The coefficients $r_{ij}$ can be directly computed from these formulas. Taking the dot product of the equation for $\mathbf{w}_j$ with the orthonormal basis vector $\mathbf{u}_i$ for $i < j$, we get:
$$\mathbf{w}_j \cdot \mathbf{u}_i = r_{1j}\,(\mathbf{u}_1 \cdot \mathbf{u}_i) + \cdots + r_{jj}\,(\mathbf{u}_j \cdot \mathbf{u}_i) = r_{ij}$$
and hence:
$$r_{ij} = \mathbf{w}_j \cdot \mathbf{u}_i, \qquad i < j$$
On the other hand, taking the dot product of the $j$-th equation with itself, we have:
$$\|\mathbf{w}_j\|^2 = r_{1j}^2 + r_{2j}^2 + \cdots + r_{jj}^2$$
Thus, we can obtain $r_{jj}$ by computing:
$$r_{jj} = \sqrt{\|\mathbf{w}_j\|^2 - r_{1j}^2 - \cdots - r_{j-1,j}^2}$$
Once we have all these coefficients $r_{ij}$, the lower-triangular system is easy to solve for the vectors $\mathbf{u}_j$ using forward substitution:
$$\mathbf{u}_j = \frac{\mathbf{w}_j - r_{1j}\mathbf{u}_1 - \cdots - r_{j-1,j}\mathbf{u}_{j-1}}{r_{jj}}$$
4.3. Orthogonal Matrices
Matrices whose columns form an orthonormal basis of the $n$-dimensional Euclidean space $\mathbb{R}^n$ are called orthogonal matrices. Orthogonal matrices have a plethora of applications ranging from geometry and physics to crystallography, quantum mechanics, and PDEs. Rotational motions of bodies in three-dimensional space are described by orthogonal matrices.
Mathematically, a square matrix $Q$ is called orthogonal if it satisfies:
$$Q^T Q = Q Q^T = I$$
Orthogonality of the matrix implies that we can easily invert it:
$$Q^{-1} = Q^T$$
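A plane rotation gives a quick sanity check of both properties (the angle below is arbitrary):

```python
import numpy as np

theta = 0.7                                  # an arbitrary rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q.T @ Q, np.eye(2)))       # True: Q^T Q = I
print(np.allclose(np.linalg.inv(Q), Q.T))    # True: Q^{-1} = Q^T
```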
4.4. The QR Factorization
Let $\mathbf{w}_1, \dots, \mathbf{w}_n$ be a basis of $\mathbb{R}^n$, and let $\mathbf{u}_1, \dots, \mathbf{u}_n$ be the corresponding orthonormal basis that results from the Gram-Schmidt process. We assemble both sets of column vectors to form non-singular $n \times n$ matrices:
$$A = \begin{bmatrix} \mathbf{w}_1 & \mathbf{w}_2 & \cdots & \mathbf{w}_n \end{bmatrix}, \qquad Q = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \cdots & \mathbf{u}_n \end{bmatrix}$$
Since the $\mathbf{u}_i$ form an orthonormal basis of $\mathbb{R}^n$, $Q$ is an orthogonal matrix. In view of the matrix multiplication formula, the Gram-Schmidt equations above can be recast into the matrix form:
$$A = QR$$
where $R$ is the upper-triangular matrix of the coefficients $r_{ij}$.
We can easily implement the modified Gram-Schmidt process:
algorithm QRFact(A):
    // INPUT
    //   A = a non-singular square matrix of size n x n
    // OUTPUT
    //   Q = orthogonal matrix
    //   R = upper triangular matrix such that A = QR

    N <- the order of the square matrix A
    u <- a zero matrix of size N x N   // row i will hold the orthonormal vector u_i
    R <- a zero matrix of size N x N
    for j <- 0 to N - 1:
        w_j <- A[:, j]
        // squared norm of w_j
        normSq <- 0
        for k <- 0 to N - 1:
            normSq <- normSq + w_j[k] * w_j[k]
        // compute the coefficients R[i,j] = w_j . u_i
        sum <- 0
        for i <- 0 to j - 1:
            R[i,j] <- dot product of w_j and u[i]
            sum <- sum + R[i,j] * R[i,j]
        R[j,j] <- sqrt(normSq - sum)
        // compute the vector u[j] using forward substitution
        sumvec <- a zero vector of length N
        for i <- 0 to j - 1:
            sumvec <- sumvec + R[i,j] * u[i]
        u[j] <- (w_j - sumvec) / R[j,j]
    Q <- transpose of u
    return Q, R
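A library routine is handy for sanity-checking such an implementation. For instance, NumPy's `numpy.linalg.qr` computes the same factorization (up to the signs of the columns of $Q$ and rows of $R$); the test matrix below is arbitrary:

```python
import numpy as np

A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])

Q, R = np.linalg.qr(A)
print(np.allclose(Q @ R, A))             # True: the factors reproduce A
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: Q is orthogonal
print(np.allclose(R, np.triu(R)))        # True: R is upper-triangular
```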
4.5. Solving Linear Systems Using QR Factorization
Once the $QR$ decomposition of a matrix $A$ is known, it is fairly efficient to solve the linear system of equations $A\mathbf{x} = \mathbf{b}$. Since $A = QR$ and $Q^{-1} = Q^T$, we have:
$$QR\mathbf{x} = \mathbf{b} \quad \Longrightarrow \quad R\mathbf{x} = Q^T\mathbf{b}$$
The matrix $R$ is upper-triangular, so the system $R\mathbf{x} = Q^T\mathbf{b}$ is very easy to solve using the back substitution algorithm.
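Here is a minimal sketch of the whole solve, reusing NumPy's QR routine and SciPy's triangular solver (`scipy.linalg.solve_triangular` performs the back substitution; the system below is arbitrary):

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
b = np.array([1.0, 0.0, 1.0])

Q, R = np.linalg.qr(A)
x = solve_triangular(R, Q.T @ b)   # back substitution on R x = Q^T b
print(np.allclose(A @ x, b))       # True
```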
5. Conclusion
In this article, we reviewed the high-school method of solving linear equations through the elimination of variables. We learned that matrix factorizations are building blocks of more complex algorithms.
Today, implementations of the $LU$ and $QR$ factorizations for solving systems involving large square matrices are commonplace in standard numerical software libraries such as MATLAB, Eigen, LAPACK, LINPACK, NAG, and so on.