1. Introduction

In this tutorial, we’ll show how to multiply a matrix chain using dynamic programming. This problem frequently arises in image processing and computer graphics, e.g., animations and projections.

2. Matrix Chains

Let’s start with an example. We have four matrices: A, B, C and D. Their shapes are (4,5), (5,3), (3,10), and (10,2), respectively.

Since each matrix’s number of columns corresponds to the next one’s number of rows, their product A \times B \times C \times D is defined.

We can calculate it sequentially. First, we multiply A and B to get an intermediate (4, 3)-matrix M_1. Then, we multiply M_1 with C and get a (4, 10)-matrix M_2. Finally, we get the result by multiplying M_2 with D:

    [\overbrace{(\underbrace{\text(A \times B)}_{M_1} \times C)}^{M_2} \times D]

How many operations do we perform this way?

To multiply two matrices U and V of shapes (p, q) and (q, r), we must compute p r elements. For each, we perform q scalar multiplications and q-1 additions. So, the total number of operations, cost(U \times V),  is p r  (2q -1).

In our example:

    [\begin{aligned} cost(A \times B) & = 4 \cdot 3 \cdot (2 \cdot 5 - 1) = 108 \\ cost(M_1 \times C)  &= 4 \cdot 10 \cdot (2 \cdot 3 - 1) = 200 \\ cost(M_2 \times D)  &= 4 \cdot 2 \cdot (2 \cdot 10 - 1) = 152 \end{aligned}]

That’s 460 operations. However, since matrix multiplication is associative, we can compute the chain in a different order:

    [A \times \overbrace{(B \times \underbrace{(C \times D)}_{M_1'})}^{M_2'}]

This multiplication order involves 236 operations:

    [\begin{aligned} cost(C \times D) & = 3 \cdot 2 \cdot (2 \cdot 10 - 1) = 114 \\ cost(B \times M_1')  &= 5 \cdot 2 \cdot (2 \cdot 3 - 1) = 50 \\ cost(A \times M_2')  &= 4 \cdot 2 \cdot (2 \cdot 5 - 1) = 72 \end{aligned}]

The reduction is almost 50%.

2.1. Formulation

So, in problems of this type, we have n matrices A_1, A_2, \ldots, A_n of the shapes (p_1, q_1), (p_2, q_2), \ldots, (p_n, q_n) such that q_i = p_{i+1} for all i=1,2,\ldots, n-1.

The goal is to find the multiplication order that minimizes \boldsymbol{cost(A_1 \times A_2 \times \ldots \times A_n)}, the number of scalar multiplications and additions in computing the product \boldsymbol{A_1 \times A_2 \times \ldots \times A_n}.

3. Dynamic Programming

We can use dynamic programming to do this. So, let’s start with the recursive relation.

3.1. Recursion

Let cost(i, j) = cost(A_i \times A_{i+1} \times \ldots \times A_{j}) (where 1 \leq i \leq j \leq n). No matter how we order multiplications, the final product will always result from multiplying two matrices.

So, we compute \boldsymbol{A_i \times \ldots \times A_j} by splitting it into the “left” and “right” products A_i \times \ldots \times A_k (of shape (p_i, q_k)) and A_{k+1} \times \ldots \times A_j (of shape (p_{k+1}, q_j)):

    [(A_i  \times \ldots \times A_k ) \times ( A_{k+1} \times \ldots \times A_j)]

The choice of k determines the cost:

    [cost(i, k) + cosr(k+1, j) + p_i  q_j (2 p_{k+1} - 1)]

Therefore, we should choose the k that minimizes the number of operations:

    [cost(i, j) = \begin{cases} \min_{k=i}^{j-1} \left\{cost(i, k) + cost(k+1, j) + p_i  q_j (2 p_{k+1} - 1)\right\}, & j > i \\ 0, & j=i \end{cases}]

The base case is j=i, and its cost is zero because we don’t perform any operation to compute A_i.

3.2. Tabular Algorithm

Let’s check out the tabular approach to computing cost(1, n).

In this method, we start from the base cases and work up to cost(1, n) one diagonal at a time. So, after the base cases on the main diagonal, we compute the costs cost(i, i+1) for i=1,2,\ldots, n-1 of multiplying pairs:

The first iteration of the tabular approach.

Then, we compute the cost(i, i+2) for i=1,2,\ldots, n-2:

The second iteration of the tabular approach

Continuing this way, we calculate cost(1, n) in the end:

Rendered by QuickLaTeX.com

We keep track of the splitting indices by storing them in a separate data structure T as we compute the costs.

We can use n \times n matrices for cost and T, but half of the reserved space will be unused. Another option is to use a hash map.

3.3. Example

Let’s apply these algorithms to our initial example with matrices A, B, C and D of shapes (4,5), (5,3), (3,10), and (10,2), respectively.

First, we set the costs on the main diagonal to zero. Then, we compute the costs of A\times B, B \times C, and C \times D.

There will be only one way to split each pair. For example:

    [cost(A \times B) = cost(A) + cost(B) + 4 \cdot 3 \cdot (2 \cdot 5 - 1) = 0 +0 + 12 \cdot 9 = 108]

The same goes for B \times C and C \times D:

Computing the first diagonal after the base cases

We used matrix names instead of indices for readability.

Now, we focus on A \times B \times C and B \times C \times D:

    [\begin{aligned} cost(A \times B \times C) &= \min \begin{Bmatrix} cost(A) + cost(B \times C) + 4 \cdot 10 \cdot (2 \cdot 5 - 1) = 0 +250+40\cdot 9 = 610 \\ cost(A \times B) + cost(C) + 4 \cdot 10 \cdot (2 \cdot 3 - 1) = 108 +0 + 40 \cdot 5 = 308 \end{Bmatrix} = 308 \\ cost(B \times C \times D) &= \min \begin{Bmatrix} cost(B) + cost(C \times D) + 5 \cdot 2 \cdot (2 \cdot 10 - 1) = 0 +250+10\cdot 19 = 440 \\ cost(B \times C) + cost(D) + 5 \cdot 2 \cdot (2 \cdot 3 - 1) = 114 +0 + 10 \cdot 5 = 164 \end{Bmatrix} = 164 \end{aligned}]

So:

Example: the second iteration

Finally, we compute cost(A \times B \times C \times D):

    [cost(A, B, C, D) = \min \begin{Bmatrix} cost(A) + cost(B \times C \times D) + 72 =236\\ cost(A \times B) + cost(C \times D) + 40 = 398\\ cost(A \times B \times C) + cost(D) + 152 =460 \end{Bmatrix} = 236]

4. Reconstructing the Solution

A way to reconstruct the solution is to follow the indices in \boldsymbol{T} recursively.

If T[i, j]=k, where i \leq k < j, that means that A_i \times \ldots \times A_j should be computed as (A_i \times \ldots A_k) \times (A_{k+1} \times \ldots \times A_j). So, we need T[i, k] and T[k+1, j] to split A_i \times \ldots A_k and A_{k+1} \times \ldots \times A_j optimally:

Rendered by QuickLaTeX.com

The result of reconstruct(T, 1, n) is the solution for the entire input chain A_1 \times A_2 \times \ldots \times A_n.

4.1. Example

In our example, T is a 4 \times 4 matrix with the empty lower triangle:

The optimal splits

Since T[A, D] = A, the first split comes after A, so we have:

    [A \times ( B \times C \times D)]

Since the left product is a single matrix, we focus on the right one. T[B, D] = B, so we get:

    [A \times \left( B \times (C \times D) \right)]

5. Memoization

Memoization is another approach. It computes t(1, n) recursively following its definition. However, it avoids repeated calculations by memorizing the intermediate results:

Rendered by QuickLaTeX.com

For each pair (i, j), we first check in memory if we’ve already computed cost(i, j). If so, we return memory[i, j]. Otherwise, we calculate cost(i, j).

This approach is usually easier to implement than the tabular method. However, the excessive depth of the recursion tree for large n can cause stack overflow.

6. Conclusion

In this article, we showed how to multiply a chain of matrices using dynamic programming.

The main difference between the tabular approach and memoization is the order in which the sub-problems are solved. The former is a bottom-up iterative approach that starts from the base cases, whereas the latter is a top-down recursive approach.


« 上一篇: 什么是分片?