1. Introduction
In this tutorial, we’ll implement two versions of the randomized QuickSort in Python.
2. Random Pivot Selection
Randomized pivot selection is a strategy for avoiding the worst-case complexity scenario of QuickSort in practice, which occurs if an array is already sorted and we pick the last element as a pivot in each iteration.
We can implement randomization by randomly selecting the pivot’s index or shuffling the array. In both cases, the input will consist of:
- a – the input list
- low and high – the lower and upper ends of the active slice
We’ll use Python’s random module for randomization and assume the goal is to sort the input in a non-decreasing order.
3. Random Pivot Selection
Let’s check out the solution with the random pivot index:
def quick_sort_random_index(a, low, high):
if high <= low:
return
# select the pivot randomly
pivot_index = random.randint(low, high)
pivot = a[pivot_index]
# place it at the end
# and proceed as usual
a[pivot_index], a[high] = a[high], a[pivot_index]
# partition the input
split_index = low
for i in range(low, high):
if a[i] <= a[high]:
a[split_index], a[i] = a[i], a[split_index]
split_index = split_index + 1
# place the pivot between the lower and greater elements
a[split_index], a[high] = a[high], a[split_index]
# sort the left and right parts
quick_sort_random_index(a, low, split_index - 1)
quick_sort_random_index(a, split_index + 1, high)
The key step is pivot_index = random.randint(low, high). Here, we randomly select the index of the element we’ll use as the pivot. The index is one of low, low + 1, …, high.
After selection, the only thing to do is swap it with the last element and proceed with partitioning as in the non-randomized formulation of QuickSort.
4. Shuffling
Another way is to shuffle the slice a[low], a[low + 1], …, a[high] before taking a[high] as the pivot:
def shuffle_slice(a, start, end):
for i in range(end, start - 1, -1):
j = random.randint(i, end)
a[i], a[j] = a[j], a[i]
def quick_sort_random_shuffle(a, low, high):
if high <= low:
return
# shuffle the active slice
shuffle_slice(a, low, high)
# and proceed with partitioning
split_index = low
for i in range(low, high):
if a[i] <= a[high]:
a[split_index], a[i] = a[i], a[split_index]
split_index = split_index + 1
# place the pivot between the lower and greater elements
a[split_index], a[high] = a[high], a[split_index]
# sort the left and right parts
quick_sort_random_index(a, low, split_index - 1)
quick_sort_random_index(a, split_index + 1, high)
The step shuffle_slice(a, low, high) replaces a[low], a[low + 1], …, a[high] with its random permutation. So, in addition to randomly selecting the pivot, we also randomize the order of the elements before partitioning.
5. Example
Both approaches sort the input in place, which saves memory. We call the functions with low set to 0 and high to the input’s last index:
a = [1, 2, 5, 1 ,2, 3, 6, 3, 7, 9, 0, 1]
b = list(a)
print('Before sorting:', a)
quick_sort_random_index(a, 0, len(a) - 1)
print('After sorting (random index):', a)
quick_sort_random_shuffle(b, 0, len(a) - 1)
print('After sorting (random shuffle):', a)
Here’s the output showing that both methods work:
Before sorting: [1, 2, 5, 1, 2, 3, 6, 3, 7, 9, 0, 1]
After sorting (random index): [0, 1, 1, 1, 2, 2, 3, 3, 5, 6, 7, 9]
After sorting (random shuffle): [0, 1, 1, 1, 2, 2, 3, 3, 5, 6, 7, 9]
6. Discussion
Let’s check if these approaches reduce the expected complexity. We can calculate the average number of swaps over several runs of quick_sort_random_index and quick_sort_random_shuffle.
We ran the functions on a worst-case input array with 500 elements: [500, 499, …, 2, 1]. The average swap count of the random-index strategy over 100 runs was 3085.49, with a standard deviation of 319.36. The random-shuffle approach made 3047.26 swaps on average, with a standard deviation of 283.49. In contrast, QuickSort’s deterministic formulation does 62749 swaps for this input.
The random-shuffle approach will work slower on large input arrays than the random-index method because the former shuffles the entire slice in each iteration. However, both approaches will exceed the recursion stack capacity for very large inputs.
7. Conclusion
In this article, we showed how to implement two randomized versions of QuickSort. The random-index approach randomly selects the pivot element, while the random-shuffle approach shuffles the entire input slice in each iteration.
Both approaches can reduce the number of swaps for the worst-case input compared to the deterministic QuickSort. However, the random-shuffle strategy is slower, and if implemented recursively, the randomized versions can exhaust the recursion stack.