1. Overview
One of the well-known problems in computer science is finding the subset of numbers that adds up as close as possible to a target number without exceeding it.
In this tutorial, we’ll discuss different versions of the problem, provide several solutions, and compare the solutions of each version.
2. Defining the Problem
In this problem, we're given an array of integers of size n and a target number k. We need to find a subset of numbers from the array that adds up as close as possible to the target number k, without exceeding it. For example, if the array is [3, 5, 7] and k = 9, the best subset is {3, 5}, with a sum of 8.
We have two versions of this problem. The first version doesn't specify the number of items we can choose. Hence, we can select as many items as we want, as long as their sum is as large as possible without exceeding k.
However, in the second version, the problem specifies the exact number of values we can take; let's call it m. Therefore, we need to choose exactly m values from the array, and their sum must be as large as possible without exceeding k as well.
We’ll discuss three solutions for each of the two versions.
3. Choosing Any Number of Elements
In the first version, we can choose any number of items that we want. The only condition is that their sum must be as large as possible without exceeding k.
To do this, we have three solutions. The first solution is a backtracking solution that tries all the possible options to choose the numbers. On the other hand, the second solution is a dynamic programming approach that is based on the backtracking solution.
Finally, the third solution is a meet-in-the-middle approach that is an improvement on the backtracking solution.
3.1. Backtracking Approach
Let’s take a look at the backtracking solution for the problem when we can choose any number of items:
algorithm Backtrack(A, k, i, sum):
    // INPUT
    //    A = the input array
    //    k = the target sum
    //    i = the current index in the array A
    //    sum = the current sum of the chosen elements
    // OUTPUT
    //    The maximum sum of numbers that doesn't exceed k
    if sum > k:
        return 0
    if i = A.length:
        return sum
    pick <- Backtrack(A, k, i + 1, sum + A[i])
    leave <- Backtrack(A, k, i + 1, sum)
    return max(pick, leave)
The backtracking function takes as parameters the input array, the target sum k, the current index, and the current sum. We call it with both the index and the sum set to zero.
First, if the current sum exceeds k, we return zero, indicating an invalid choice. On the other hand, if the current index goes out of the array's range, it means we've finished an entire walk over the array, and we simply return the sum we managed to achieve.
Secondly, we try both options we have for the current number. The first option is to pick the current number. Therefore, we make a recursive call for the next number and update the sum we have.
The second option is to leave the current item. Hence, we make a recursive call for the next one without updating the sum we have.
Each of these calls will return the best answer it managed to find. As a result, we must return the maximum between these two values.
Since from each item we make two recursive calls, the complexity of the backtracking approach is O(2^n), where n is the number of items.
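To make this concrete, here's a minimal Python sketch of the same recursion. The function and variable names are our own, chosen only for illustration:

```python
def backtrack(A, k, i=0, current_sum=0):
    """Best achievable total from A[i:], given current_sum, without exceeding k."""
    if current_sum > k:
        return 0  # invalid branch: the sum already exceeds the target
    if i == len(A):
        return current_sum  # finished an entire walk over the array
    pick = backtrack(A, k, i + 1, current_sum + A[i])  # take A[i]
    leave = backtrack(A, k, i + 1, current_sum)        # skip A[i]
    return max(pick, leave)

print(backtrack([3, 5, 7], 9))  # prints 8, from the subset {3, 5}
```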
3.2. Dynamic Programming Approach
The dynamic programming approach is memoization over the backtracking approach. Take a look at the implementation of the dynamic programming approach:
algorithm DynamicProgramming(A, k):
    // INPUT
    //    A = the input array
    //    k = the target sum
    // OUTPUT
    //    The maximum sum of numbers that doesn't exceed k
    n <- A.length
    for sum <- 0 to k:
        dp[n, sum] <- sum
    for i <- n - 1 down to 0:
        for sum <- k down to 0:
            pick <- 0
            if sum + A[i] <= k:
                pick <- dp[i + 1, sum + A[i]]
            leave <- dp[i + 1, sum]
            dp[i, sum] <- max(pick, leave)
    return dp[0, 0]
First of all, let's define our dp array. We'll assume that dp[i, sum] stores the best answer for the range [i, n - 1] when the sum we've already taken equals sum. Based on that, we can build our dynamic programming solution.
In the beginning, we'll discuss the base case. If the current index reaches the end of the array, then we should store the sum we have.
Next, we'll iterate over all possible indices and all the sums we can achieve. Similar to the backtracking approach, the state where we're at index i and have a sum equal to sum has two choices.
The first choice is to pick the current number, if possible. Therefore, we get the best answer from the state dp[i + 1, sum + A[i]]. The second choice is to leave the current number. Hence, we get the best answer from the state dp[i + 1, sum].
Note that the two nested for loops iterate backward. The reason for this is that the state dp[i, sum] depends on the values of the states dp[i + 1, ...]. Therefore, the states for index i + 1 have to be calculated before the states for index i. The same ordering is applied to the sum loop.
From both of these options, we store the maximum in the current state. As a result, the final answer will be inside dp[0, 0] because it contains the answer to the range [0, n - 1] with a sum equal to zero, which is the answer to the entire problem.
The complexity of the dynamic programming approach is O(n * k), where n is the number of elements and k is the target number.
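Here's a short Python sketch of the same bottom-up recurrence. The names mirror the pseudocode and are our own:

```python
def dynamic_programming(A, k):
    """Maximum subset sum that doesn't exceed k, via the dp[i][sum] table."""
    n = len(A)
    # dp[i][s]: best reachable total, given we're at index i with current sum s
    dp = [[0] * (k + 1) for _ in range(n + 1)]
    for s in range(k + 1):
        dp[n][s] = s  # base case: no elements left, keep the sum we have
    for i in range(n - 1, -1, -1):
        for s in range(k, -1, -1):
            pick = dp[i + 1][s + A[i]] if s + A[i] <= k else 0
            leave = dp[i + 1][s]
            dp[i][s] = max(pick, leave)
    return dp[0][0]

print(dynamic_programming([3, 5, 7], 9))  # prints 8
```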
3.3. Meet in the Middle Approach
We can come up with a new solution called the meet-in-the-middle approach that reduces the complexity of the backtracking approach. Let’s take a look at the implementation of this approach:
algorithm Generate(A, k, i, end, sum):
    // INPUT
    //    A = the input array
    //    k = the target sum
    //    i = the current index
    //    end = the ending index
    //    sum = the current sum
    // OUTPUT
    //    A list of all reachable sums not exceeding k
    if i = end:
        result <- create a new list
        if sum <= k:
            result.add(sum)
        return result
    pick <- Generate(A, k, i + 1, end, sum + A[i])
    leave <- Generate(A, k, i + 1, end, sum)
    return merge(pick, leave)
algorithm MeetInTheMiddle(A, k, n):
    // INPUT
    //    A = the input array
    //    k = the target sum
    //    n = the size of the array
    // OUTPUT
    //    The maximum sum of numbers that doesn't exceed k
    first <- Generate(A, k, 0, n / 2, 0)
    second <- Generate(A, k, n / 2, n, 0)
    sort(second)
    result <- 0
    for i <- 0 to first.size - 1:
        rem <- binarySearch(second, k - first[i])
        result <- max(result, first[i] + rem)
    return result
The Generate function produces all the possible sums we can form from the array A, starting at index i, going up to (but not including) index end, and starting from the given sum.
The idea of this function is similar to the backtracking approach. For each item, we can either pick it or leave it. When we reach the end of the range, we add the sum we achieved to the result. The function returns a list of all the sums not exceeding k.
Next, we call this function to generate all the possible sums of the first half of the array, stored in first, and all the possible sums of the second half, stored in second.
After that, we sort the list of sums of the second half.
Now, the main idea is to iterate over the sums of the first half. Suppose the current sum is first[i]. In this case, the best value we can add without exceeding k is k - first[i]. Hence, we perform a binary search on the second list to find the maximum number that is not larger than k - first[i]. We'll assume that if none of the values is small enough, the binary search returns zero.
From all the possible answers, we choose the maximum one, because we guarantee that all candidate answers don't exceed k.
The complexity of the meet-in-the-middle approach is O(n * 2^(n/2)), where n is the number of items. The reason for this complexity is that each half produces at most 2^(n/2) sums, and sorting them and binary searching each one costs log(2^(n/2)) = n/2 per element.
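Below is a compact Python sketch of this approach. It uses the standard bisect module for the binary search, and the function names are our own:

```python
from bisect import bisect_right

def generate(A, k, start, end, current_sum=0):
    """Return all subset sums of A[start:end] that don't exceed k."""
    if start == end:
        return [current_sum] if current_sum <= k else []
    pick = generate(A, k, start + 1, end, current_sum + A[start])
    leave = generate(A, k, start + 1, end, current_sum)
    return pick + leave

def meet_in_the_middle(A, k):
    """Maximum subset sum that doesn't exceed k."""
    n = len(A)
    first = generate(A, k, 0, n // 2)
    second = sorted(generate(A, k, n // 2, n))
    result = 0
    for s in first:
        # largest sum in the second half that is <= k - s (0 if none fits)
        pos = bisect_right(second, k - s)
        rem = second[pos - 1] if pos > 0 else 0
        result = max(result, s + rem)
    return result

print(meet_in_the_middle([3, 5, 7, 2], 11))  # prints 10, e.g. from {3, 7}
```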
3.4. Choosing Any Number of Elements With Repetition
Another version of the problem might ask for the maximum sum of a subset of numbers that adds up as close as possible to a target number, with repetition allowed. This means that we're allowed to choose the same item more than once.
In this case, we need to perform one small update to the previous approaches. In the backtracking and dynamic programming approaches, when we perform the pick option, instead of moving to the next number with index i + 1, we stay at the current number i. Hence, we allow the repetition of the current value.
In the meet-in-the-middle approach, no change is needed for the searching technique. However, we need to update the Generate function: when performing the pick option, instead of moving to the next number, we stay at the current one.
As a result, we allow the current value to be repeated more than once.
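As an illustration, only the pick call changes in the backtracking sketch above. This hypothetical variant assumes all values are positive, so the recursion terminates:

```python
def backtrack_with_repetition(A, k, i=0, current_sum=0):
    """Maximum sum not exceeding k when each element may be reused (A positive)."""
    if current_sum > k:
        return 0
    if i == len(A):
        return current_sum
    pick = backtrack_with_repetition(A, k, i, current_sum + A[i])  # stay at index i
    leave = backtrack_with_repetition(A, k, i + 1, current_sum)    # move to the next index
    return max(pick, leave)

print(backtrack_with_repetition([3, 5], 10))  # prints 10, e.g. 5 + 5
```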
3.5. Comparison
Let’s provide a comparison between the discussed approaches:
| | Backtracking | Dynamic Programming | Meet in the Middle |
| --- | --- | --- | --- |
| Main Idea | Pick or leave each element | Memoization over the backtracking approach | Solve each half of the array. Then, merge both solutions |
| Repetition | With or without repetition | With or without repetition | With or without repetition |
| Complexity | O(2^n) | O(n * k) | O(n * 2^(n/2)) |
The dynamic programming approach may sound like the best solution, with the lowest complexity. However, keep in mind that the complexity of the dynamic programming solution depends on k, while the backtracking and the meet-in-the-middle approaches don't depend on k at all.
Therefore, the backtracking and the meet-in-the-middle approaches are better when the value of k is large. On the contrary, if the value of k is small, the dynamic programming approach is the better option.
4. Choosing a Specific Number of Elements
In the second version of the problem, we're given a number m and asked to choose exactly m numbers whose sum is as close to k as possible without exceeding it.
For this version, we have three possible solutions. These solutions are an update over the approaches in sections 3.1, 3.2, and 3.3.
4.1. Backtracking Approach
Take a look at the implementation of the backtracking approach:
algorithm BacktrackM(A, k, n, m, i, sum, taken):
    // INPUT
    //    A = the input array
    //    k = the target sum
    //    n = the size of the array
    //    m = the number of elements to choose
    //    i = the current index in the array A
    //    sum = the current sum of the chosen elements
    //    taken = the number of elements chosen so far
    // OUTPUT
    //    The maximum sum of m numbers that doesn't exceed k
    if sum > k or taken > m:
        return 0
    if i = n:
        if taken = m:
            return sum
        else:
            return 0
    pick <- BacktrackM(A, k, n, m, i + 1, sum + A[i], taken + 1)
    leave <- BacktrackM(A, k, n, m, i + 1, sum, taken)
    return max(pick, leave)
This backtracking approach is similar to the one provided in section 3.1. The only difference is that now we also need to pass taken, which represents the number of items taken so far.
In the beginning, we discuss the stop conditions. If the sum exceeds k, or the number of items taken exceeds m, then we return zero, indicating an invalid answer.
In contrast, if we reach the end of the array, we check the value of taken. If we managed to take exactly m numbers during our walk over the array, then we return the sum we achieved. On the other hand, if the number of items taken is less than m, then we return zero.
After that, we have two options. Either we pick the current number and perform a recursive call for the next one, after adding the current number to the sum and incrementing the count of taken items by one. The other choice is to leave the current element and perform a recursive call for the next one without changing the values of sum and taken.
From both these options, we return the maximum value we achieve.
The complexity of the backtracking approach is O(2^n), where n is the number of items.
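Here's a minimal Python sketch of this variant; the signature mirrors the pseudocode, and the names are our own:

```python
def backtrack_m(A, k, m, i=0, current_sum=0, taken=0):
    """Maximum sum of exactly m elements that doesn't exceed k (0 if impossible)."""
    if current_sum > k or taken > m:
        return 0  # invalid branch
    if i == len(A):
        return current_sum if taken == m else 0
    pick = backtrack_m(A, k, m, i + 1, current_sum + A[i], taken + 1)
    leave = backtrack_m(A, k, m, i + 1, current_sum, taken)
    return max(pick, leave)

print(backtrack_m([3, 5, 7, 2], 11, 2))  # prints 10, from the pair {3, 7}
```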
4.2. Dynamic Programming Approach
The dynamic programming approach is also similar to the one provided in section 3.2. Let’s take a look at its implementation:
algorithm DynamicProgrammingM(A, k, n, m):
    // INPUT
    //    A = the input array
    //    k = the target sum
    //    n = the size of the array
    //    m = the number of elements to choose
    // OUTPUT
    //    The maximum sum of m numbers that doesn't exceed k
    for sum <- 0 to k:
        for taken <- 0 to m:
            if taken = m:
                dp[n, sum, taken] <- sum
            else:
                dp[n, sum, taken] <- 0
    for i <- n - 1 down to 0:
        for sum <- k down to 0:
            for taken <- m down to 0:
                pick <- 0
                if sum + A[i] <= k and taken < m:
                    pick <- dp[i + 1, sum + A[i], taken + 1]
                leave <- dp[i + 1, sum, taken]
                dp[i, sum, taken] <- max(pick, leave)
    return dp[0, 0, 0]
We'll add a new dimension to the dp array that represents the number of items taken so far. Therefore, dp[i, sum, taken] stores the best answer for the range [i, n - 1] when the sum we've already taken equals sum and the number of values taken so far equals taken.
The base case in this approach is reaching the end of the array. Hence, if the number of elements taken equals m, then the answer equals the sum accumulated so far. Otherwise, the answer equals zero.
After that, we iterate over all possible combinations of i, sum, and taken. The first option is to pick the current value, if possible. Thus, we take the answer of the state dp[i + 1, sum + A[i], taken + 1], which corresponds to the next element after adding the current one to the sum and increasing the number of items taken by one.
Similarly, the second option is to leave the current value. Hence, we take the answer of the state dp[i + 1, sum, taken], without changing the values of sum and taken.
In the end, we return the value dp[0, 0, 0] because it corresponds to the range [0, n - 1] with the sum and the number of values taken both equal to zero.
The complexity of the dynamic programming approach is O(n * k * m), where n is the number of items, k is the target number, and m is the number of items to choose.
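A short Python sketch of this three-dimensional table, with names mirroring the pseudocode, could look like this:

```python
def dynamic_programming_m(A, k, m):
    """Maximum sum of exactly m elements that doesn't exceed k (0 if impossible)."""
    n = len(A)
    # dp[i][s][t]: best total, given index i, current sum s, and t elements taken
    dp = [[[0] * (m + 1) for _ in range(k + 1)] for _ in range(n + 1)]
    for s in range(k + 1):
        dp[n][s][m] = s  # base case: took exactly m elements
    for i in range(n - 1, -1, -1):
        for s in range(k, -1, -1):
            for t in range(m, -1, -1):
                pick = 0
                if s + A[i] <= k and t < m:
                    pick = dp[i + 1][s + A[i]][t + 1]
                leave = dp[i + 1][s][t]
                dp[i][s][t] = max(pick, leave)
    return dp[0][0][0]

print(dynamic_programming_m([3, 5, 7, 2], 11, 2))  # prints 10
```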
4.3. Meet-in-the-Middle Approach
Let’s take a look at the meet-in-the-middle approach when choosing a specific number of elements:
algorithm GenerateM(A, k, i, end, sum, taken):
    // INPUT
    //    A = the input array
    //    k = the target sum
    //    i = the current index
    //    end = the ending index
    //    sum = the current sum
    //    taken = the number of elements chosen so far
    // OUTPUT
    //    A map of lists storing, for each taken count, the reachable sums not exceeding k
    if i = end:
        result <- create a new map of lists
        if sum <= k:
            result[taken].add(sum)
        return result
    pick <- GenerateM(A, k, i + 1, end, sum + A[i], taken + 1)
    leave <- GenerateM(A, k, i + 1, end, sum, taken)
    return merge(pick, leave)
algorithm MeetInTheMiddleM(A, k, n, m):
    // INPUT
    //    A = the input array
    //    k = the target sum
    //    n = the size of the array
    //    m = the number of elements to choose
    // OUTPUT
    //    The maximum sum of m numbers that doesn't exceed k
    first <- GenerateM(A, k, 0, n / 2, 0, 0)
    second <- GenerateM(A, k, n / 2, n, 0, 0)
    for taken <- 0 to m:
        sort(second[taken])
    result <- 0
    for taken <- 0 to m:
        for i <- 0 to first[taken].size - 1:
            rem <- search(second[m - taken], k - first[taken][i])
            if rem was found:
                result <- max(result, first[taken][i] + rem)
    return result
The idea is similar to the one presented in section 3.3. In this case, the GenerateM function takes the number of items taken so far as well. When returning the result, it produces a map (or a 2D array) that stores each reachable sum inside the cell corresponding to the number of chosen elements taken.
The other update to the approach in section 3.3 is that we need to iterate over all the values of taken. When searching for the complementary value, we search for it inside the cell second[m - taken]. The reason is that we've chosen taken items from the first half of the array. Hence, we need to choose m - taken values from the second half. If that cell contains no value small enough, we skip this combination because it can't be completed to exactly m elements.
The complexity of the meet-in-the-middle approach is O(m + n * 2^(n/2)), where n is the number of items and m is the number of items to choose. The reason is that we iterate over all possible taken values, the generated lists hold at most 2^(n/2) sums in total, and each binary search costs log(2^(n/2)) = n/2.
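Here's a Python sketch of this variant, again using the standard bisect module. It groups the sums of each half by subset size in a dictionary and skips combinations that can't be completed to exactly m elements:

```python
from bisect import bisect_right
from collections import defaultdict

def generate_m(A, k, start, end, current_sum=0, taken=0, result=None):
    """Group all subset sums of A[start:end] not exceeding k by subset size."""
    if result is None:
        result = defaultdict(list)
    if start == end:
        if current_sum <= k:
            result[taken].append(current_sum)
        return result
    generate_m(A, k, start + 1, end, current_sum + A[start], taken + 1, result)  # pick
    generate_m(A, k, start + 1, end, current_sum, taken, result)                 # leave
    return result

def meet_in_the_middle_m(A, k, m):
    """Maximum sum of exactly m elements that doesn't exceed k (0 if impossible)."""
    n = len(A)
    first = generate_m(A, k, 0, n // 2)
    second = generate_m(A, k, n // 2, n)
    for sums in second.values():
        sums.sort()
    result = 0
    for taken, sums in first.items():
        if taken > m:
            continue
        candidates = second.get(m - taken, [])  # sums using exactly m - taken elements
        for s in sums:
            pos = bisect_right(candidates, k - s)
            if pos > 0:  # a valid complement exists in the second half
                result = max(result, s + candidates[pos - 1])
    return result

print(meet_in_the_middle_m([3, 5, 7, 2], 11, 2))  # prints 10, from the pair {3, 7}
```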
4.4. Choosing a Specific Number of Elements With Repetition
In the case of allowing the repetition of numbers, the updates are similar to the ones discussed in section 3.4. The pick option in the backtracking, dynamic programming, and meet-in-the-middle approaches should stay at the current number i, allowing it to repeat, rather than moving to the next one.
4.5. Comparison
Take a look at the comparison between the discussed approaches:
| | Backtracking | Dynamic Programming | Meet in the Middle |
| --- | --- | --- | --- |
| Main Idea | Pick or leave each element | Memoization over the backtracking approach | Solve each half of the array. Then, merge both solutions |
| Repetition | With or without repetition | With or without repetition | With or without repetition |
| Complexity | O(2^n) | O(n * k * m) | O(m + n * 2^(n/2)) |
5. Conclusion
In this tutorial, we discussed the problem of choosing a subset of numbers that adds up as close as possible to a target number without exceeding it. We discussed three solutions for each of the two versions of the problem.
Also, we presented a comparison between the three approaches of each version.