1. Introduction
In this tutorial, we’ll show how to find the pair of 2D points with the minimal Manhattan distance.
2. The Minimal Manhattan Distance
Let’s say that we have two-dimensional points
. We want to find the two closest per the Manhattan distance. It measures the length of the shortest rectilinear path between two points that contains only vertical and horizontal segments:
So, the formula for computing the Manhattan distance is:
(1)
3. Algorithm
We can follow the brute-force approach and iterate over all pairs of points, calculating the distance between each two and returning the closest pair. However, that algorithm’s time complexity is quadratic.
Instead, we could design a more efficient algorithm using the divide-and-conquer strategy. Finding the closest pair is a fundamental problem of computational geometry, so it’s necessary to have a fast method for solving it.
3.1. The Divide-and-Conquer Approach
Let’s draw a line through to split it into the left and right halves,
and
:
Now, the closest pair in the whole is either the closest pair in
, the closest pair in
, or the closest pair whose one point is in
and the other in
:
Therefore, to find the closest pair in , we first need to determine the closest pair in its left and right halves. That’s the step where we break the original problem into two smaller sub-problems. Let’s say that
is the minimal distance in
and
is the minimal distance in
. With
, the closest two points in
are at a distance shorter than
only if they reside in the
-wide strip along the dividing line:
So, the combination step consists of finding the strip’s minimal distance and comparing it to
.
That gives a recursive algorithm. As the base case, we’ll take that contains three points and solve it by inspecting all three pairs, but there are other choices.
3.2. The Worst-Case Running Time Analysis
Let be the worst-case running time for the input with
points, and let
denote the cost of finding
and dividing
into
and
. Then, we have the following recurrence:
(2)
Scanning through and
to remove all the points not in the
-wide strip takes
time, so
. In the worst case, all the pairs
can be in the strip, so an enumerative approach performs
steps and the recurrence becomes:
Using the Master Theorem, we get . But, that isn’t any better than the inefficient brute-force algorithm, and we didn’t even consider the cost of splitting
. Can we do anything to make the algorithm faster? Luckily, the points in the strip have a special structure that will enable us to find
quickly. Indeed, if we implemented the algorithm so that
, the recurrence (2) would evaluate to an
solution.
3.3. The Vertical Manhattan Strip Has a Special Structure
Let’s assume that the closest two points in
are in the strip (
). Because
, we can be sure that vertically, they can’t be more than
apart:
How many other points from and
can be in the
rectangle containing
and
? Since all the points
are at distances
, at most five points from
can reside in the left half of the rectangle, and the same goes for
:
So, if we sorted the points in the strip by the -coordinates, the indices of
and
would differ at most by 9. Therefore, for each point in the strip, we need to check only the nine that follow it in the
-sorted array. As a result, we can find
in a linear time. But, if we sort the points in each recursive call, we’ll have
, and we need
to be linear in
for the whole algorithm to be log-linear.
3.4. Sorting
We achieve that by presorting. If we sorted by the
-axis beforehand, we could split the points into the
-sorted
and
in each call in a linear time. Let’s denote the
-sorted copy of
as
.
Since we partition by the points’
-coordinates, we should also keep a copy of
sorted by
(let’s denote it as
). Then, we can use the middle copy’s middle index to split
into the left and right halves in
time. Before making the recursive calls, we need to split
and
into their
and
parts just as we split
into
and
. We do that in linear time by reversing the merge step of Merge sort:
algorithm SplitSortedArrayIntoTwoSortedArrays(A):
// INPUT
// A = [a1, a2, ..., an] - a copy of Z sorted by x or y where ai = (xi, yi)
// OUTPUT
// A_P = the points of A that belong to P, keeping the relative order they have in A
// A_Q = the points of A that belong to Q, keeping the relative order they have in A
A_P, A_Q <- initialize two empty arrays
for i <- 1 to n:
if a[i] in P:
Append a[i] to A_P
else:
Append a[i] to A_Q
return A_P, A_Q
Presorting can be done in the worst-case time by Merge sort. With
, our algorithm also takes
time, so the total complexity doesn’t suffer by presorting.
3.5. The Pseudocode
Finally, here’s the pseudocode of the whole algorithm:
algorithm FindClosest(Z, X, Y):
// INPUT
// Z = a collection of n two-dimensional points (z_i = (x_i, y_i))
// X = Z sorted by the x-coordinates
// Y = Z sorted by the y-coordinates
// OUTPUT
// delta, (z_min`, z_min``) = the pair of points from Z with the minimal Manhattan distance δ
if n in {0, 1}:
// No pair exists
return infinity, (NULL, NULL)
if n <= 3:
// Find the closest two points z_min` and z_min``, and compute their Manhattan distance δ
delta, (z_min`, z_min``) <- manually find the closest points and compute the distance
return delta, (z_min`, z_min``)
P, Q <- split Z into the left and right halves
X_P, X_Q, Y_P, Y_Q <- split X and Y into their P and Q parts
delta_P, (z_P`, z_P``) <- FindClosest(P, X_P, Y_P)
delta_Q, (z_Q`, z_Q``) <- FindClosest(Q, X_Q, Y_Q)
delta <- min{delta_P, delta_Q}
if dekta = delta_P:
(z_min`, z_min``) <- (z_P`, z_P``)
else:
(z_min`, z_min``) <- (z_Q`, z_Q``)
[z_i1, z_i2, ..., z_im] <- remove from Y the points out of the middle strip
for j <- 1 to m-1:
for k <- j+1 to min{j+9, m}:
d <- compute the Manhattan distance between z_ij and z_ik
if d < delta:
delta <- d
z_min`, z_min`` <- (z_ij, z_ik)
return delta, (z_min`, z_min``)
3.6. What If We Didn’t Use the Manhattan Distance?
The only part where we use the properties of the Manhattan distance is the analysis of the maximal number of points we can fit into a rectangle centered at the vertical line so that the distance of each two from the same half is at most
.
If we use other distance functions, the rectangle’s structure will change. For example, the Euclidean distance admits at most 8 points:
So, we wouldn’t need to check more than seven points following each one in the filtered .
4. Conclusion
In this article, we showed how to find the closest two points in a two-dimensional space endowed with the Manhattan distance.