1. Introduction
In this tutorial, we’ll show how to multiply a matrix chain using dynamic programming. This problem frequently arises in image processing and computer graphics, e.g., animations and projections.
2. Matrix Chains
Let’s start with an example. We have four matrices: and
. Their shapes are
,
,
, and
, respectively.
Since each matrix’s number of columns corresponds to the next one’s number of rows, their product is defined.
We can calculate it sequentially. First, we multiply and
to get an intermediate
-matrix
. Then, we multiply
with
and get a
-matrix
. Finally, we get the result by multiplying
with
:
How many operations do we perform this way?
To multiply two matrices and
of shapes
and
, we must compute
elements. For each, we perform
scalar multiplications and
additions. So, the total number of operations,
, is
.
In our example:
That’s 460 operations. However, since matrix multiplication is associative, we can compute the chain in a different order:
This multiplication order involves 236 operations:
The reduction is almost 50%.
2.1. Formulation
So, in problems of this type, we have matrices
of the shapes
such that
for all
.
The goal is to find the multiplication order that minimizes , the number of scalar multiplications and additions in computing the product
.
3. Dynamic Programming
We can use dynamic programming to do this. So, let’s start with the recursive relation.
3.1. Recursion
Let (where
). No matter how we order multiplications, the final product will always result from multiplying two matrices.
So, we compute by splitting it into the “left” and “right” products
(of shape
) and
(of shape
):
The choice of determines the cost:
Therefore, we should choose the that minimizes the number of operations:
The base case is , and its cost is zero because we don’t perform any operation to compute
.
3.2. Tabular Algorithm
Let’s check out the tabular approach to computing .
In this method, we start from the base cases and work up to one diagonal at a time. So, after the base cases on the main diagonal, we compute the costs
for
of multiplying pairs:
Then, we compute the for
:
Continuing this way, we calculate in the end:
We keep track of the splitting indices by storing them in a separate data structure as we compute the costs.
We can use matrices for
and
, but half of the reserved space will be unused. Another option is to use a hash map.
3.3. Example
Let’s apply these algorithms to our initial example with matrices and
of shapes
,
,
, and
, respectively.
First, we set the costs on the main diagonal to zero. Then, we compute the costs of ,
, and
.
There will be only one way to split each pair. For example:
The same goes for and
:
We used matrix names instead of indices for readability.
Now, we focus on and
:
So:
Finally, we compute :
4. Reconstructing the Solution
A way to reconstruct the solution is to follow the indices in recursively.
If , where
, that means that
should be computed as
. So, we need
and
to split
and
optimally:
The result of is the solution for the entire input chain
.
4.1. Example
In our example, is a
matrix with the empty lower triangle:
Since , the first split comes after
, so we have:
Since the left product is a single matrix, we focus on the right one. , so we get:
5. Memoization
Memoization is another approach. It computes recursively following its definition. However, it avoids repeated calculations by memorizing the intermediate results:
For each pair , we first check in
if we’ve already computed
. If so, we return
. Otherwise, we calculate
.
This approach is usually easier to implement than the tabular method. However, the excessive depth of the recursion tree for large can cause stack overflow.
6. Conclusion
In this article, we showed how to multiply a chain of matrices using dynamic programming.
The main difference between the tabular approach and memoization is the order in which the sub-problems are solved. The former is a bottom-up iterative approach that starts from the base cases, whereas the latter is a top-down recursive approach.