1. Introduction
In this tutorial, we’ll show how to find the pair of 2D points with the minimal Manhattan distance.
2. The Minimal Manhattan Distance
Let’s say that we have two-dimensional points . We want to find the two closest per the Manhattan distance. It measures the length of the shortest rectilinear path between two points that contains only vertical and horizontal segments:
So, the formula for computing the Manhattan distance is:
(1)
3. Algorithm
We can follow the brute-force approach and iterate over all pairs of points, calculating the distance between each two and returning the closest pair. However, that algorithm’s time complexity is quadratic.
Instead, we could design a more efficient algorithm using the divide-and-conquer strategy. Finding the closest pair is a fundamental problem of computational geometry, so it’s necessary to have a fast method for solving it.
3.1. The Divide-and-Conquer Approach
Let’s draw a line through to split it into the left and right halves, and :
Now, the closest pair in the whole is either the closest pair in , the closest pair in , or the closest pair whose one point is in and the other in :
Therefore, to find the closest pair in , we first need to determine the closest pair in its left and right halves. That’s the step where we break the original problem into two smaller sub-problems. Let’s say that is the minimal distance in and is the minimal distance in . With , the closest two points in are at a distance shorter than only if they reside in the -wide strip along the dividing line:
So, the combination step consists of finding the strip’s minimal distance and comparing it to .
That gives a recursive algorithm. As the base case, we’ll take that contains three points and solve it by inspecting all three pairs, but there are other choices.
3.2. The Worst-Case Running Time Analysis
Let be the worst-case running time for the input with points, and let denote the cost of finding and dividing into and . Then, we have the following recurrence:
(2)
Scanning through and to remove all the points not in the -wide strip takes time, so . In the worst case, all the pairs can be in the strip, so an enumerative approach performs steps and the recurrence becomes:
Using the Master Theorem, we get . But, that isn’t any better than the inefficient brute-force algorithm, and we didn’t even consider the cost of splitting . Can we do anything to make the algorithm faster? Luckily, the points in the strip have a special structure that will enable us to find quickly. Indeed, if we implemented the algorithm so that , the recurrence (2) would evaluate to an solution.
3.3. The Vertical Manhattan Strip Has a Special Structure
Let’s assume that the closest two points in are in the strip (). Because , we can be sure that vertically, they can’t be more than apart:
How many other points from and can be in the rectangle containing and ? Since all the points are at distances , at most five points from can reside in the left half of the rectangle, and the same goes for :
So, if we sorted the points in the strip by the -coordinates, the indices of and would differ at most by 9. Therefore, for each point in the strip, we need to check only the nine that follow it in the -sorted array. As a result, we can find in a linear time. But, if we sort the points in each recursive call, we’ll have , and we need to be linear in for the whole algorithm to be log-linear.
3.4. Sorting
We achieve that by presorting. If we sorted by the -axis beforehand, we could split the points into the -sorted and in each call in a linear time. Let’s denote the -sorted copy of as .
Since we partition by the points’ -coordinates, we should also keep a copy of sorted by (let’s denote it as ). Then, we can use the middle copy’s middle index to split into the left and right halves in time. Before making the recursive calls, we need to split and into their and parts just as we split into and . We do that in linear time by reversing the merge step of Merge sort:
algorithm SplitSortedArrayIntoTwoSortedArrays(A):
// INPUT
// A = [a1, a2, ..., an] - a copy of Z sorted by x or y where ai = (xi, yi)
// OUTPUT
// A_P = the points of A that belong to P, keeping the relative order they have in A
// A_Q = the points of A that belong to Q, keeping the relative order they have in A
A_P, A_Q <- initialize two empty arrays
for i <- 1 to n:
if a[i] in P:
Append a[i] to A_P
else:
Append a[i] to A_Q
return A_P, A_Q
Presorting can be done in the worst-case time by Merge sort. With , our algorithm also takes time, so the total complexity doesn’t suffer by presorting.
3.5. The Pseudocode
Finally, here’s the pseudocode of the whole algorithm:
algorithm FindClosest(Z, X, Y):
// INPUT
// Z = a collection of n two-dimensional points (z_i = (x_i, y_i))
// X = Z sorted by the x-coordinates
// Y = Z sorted by the y-coordinates
// OUTPUT
// delta, (z_min`, z_min``) = the pair of points from Z with the minimal Manhattan distance δ
if n in {0, 1}:
// No pair exists
return infinity, (NULL, NULL)
if n <= 3:
// Find the closest two points z_min` and z_min``, and compute their Manhattan distance δ
delta, (z_min`, z_min``) <- manually find the closest points and compute the distance
return delta, (z_min`, z_min``)
P, Q <- split Z into the left and right halves
X_P, X_Q, Y_P, Y_Q <- split X and Y into their P and Q parts
delta_P, (z_P`, z_P``) <- FindClosest(P, X_P, Y_P)
delta_Q, (z_Q`, z_Q``) <- FindClosest(Q, X_Q, Y_Q)
delta <- min{delta_P, delta_Q}
if dekta = delta_P:
(z_min`, z_min``) <- (z_P`, z_P``)
else:
(z_min`, z_min``) <- (z_Q`, z_Q``)
[z_i1, z_i2, ..., z_im] <- remove from Y the points out of the middle strip
for j <- 1 to m-1:
for k <- j+1 to min{j+9, m}:
d <- compute the Manhattan distance between z_ij and z_ik
if d < delta:
delta <- d
z_min`, z_min`` <- (z_ij, z_ik)
return delta, (z_min`, z_min``)
3.6. What If We Didn’t Use the Manhattan Distance?
The only part where we use the properties of the Manhattan distance is the analysis of the maximal number of points we can fit into a rectangle centered at the vertical line so that the distance of each two from the same half is at most .
If we use other distance functions, the rectangle’s structure will change. For example, the Euclidean distance admits at most 8 points:
So, we wouldn’t need to check more than seven points following each one in the filtered .
4. Conclusion
In this article, we showed how to find the closest two points in a two-dimensional space endowed with the Manhattan distance.