1. Overview

In this tutorial, we’ll discuss sorting a linked list using the merge sort algorithm.

Firstly, we’ll define the problem and provide an example that explains it.

Secondly, we’ll discuss two approaches to this problem.

2. Defining the Problem

Suppose we have a linked list L consisting of multiple nodes, each node stores two values:

  1. data: which represents the value stored in the node.
  2. next: which represents a pointer to the next node that comes after the current node in the list.

Our task is to sort this linked list using the merge sort algorithm so that each node of the list has a value greater than all the previous nodes’ values.

Let’s check an example for better understanding. Suppose we have the following linked list L:

LinkedList 1

After sorting the given linked list, we’ll get the following linked list:

LinkedList 2

As we can see, each node has a value greater than all the previous nodes’ values.

3. Recursive Approach

3.1. Main Idea

In this approach, we’ll split the given linked list into two halves by making the next value of the middle node of the list equal to NULL, then we’ll call our recursive sort function on each half separately, at the end of each call we’ll merge the two sorted halves to get our linked list sorted.

3.2. Implementation

We’re going to divide our implementation into three functions. First, we’ll implement a function that returns the middle element in a list. Next, we’ll see who to merge two sorted lists. Finally, we’ll explain the complete algorithm.

3.3. Get Middle of Linked List

Let’s take a look at the implementation:

algorithm Get_Middle(head):
    // INPUT
    //    head = the first node of the linked list
    // OUTPUT
    //    the middle node of the linked list
    slow <- head
    fast <- head
    while fast != null and fast.next != null:
        slow <- slow.next
        fast <- fast.next.next
    
    return slow

The Get \_ Middle function will return the middle node of a linked list.

We defined two pointers, slow and fast. Each time, the slow pointer moves one step forward, and the fast one moves two steps forward. So, the fast pointer will move twice the distance that the slow pointer did, which means when the fast pointer reaches the end of the linked list, the slow one will be located at the middle of the linked list.

3.4. Merge Two Sorted Linked Lists

Let’s take a look at the implementation:

algorithm Merge(L1, L2):
    // INPUT
    //    L1 = the first sorted linked list
    //    L2 = the second sorted linked list
    // OUTPUT
    //    A new linked list that merges L1 and L2 in sorted order
    
    new_list <- null
    tail <- null
    
    while L1 != null and L2 != null:
        next_node <- null
        
        if L1.data <= L2.data:
            next_node <- L1
            L1 <- L1.next
        else:
            next_node <- L2
            L2 <- L2.next
        
        if tail != null:
            tail.next <- next_node
        else:
            newList <- next_node
            tail <- next_node
        
        tail <- tail.next
    
    if L1 != null:
        tail.next <- L1
    else:
        tail.next <- L2
    
    return new_list

The Merge function will merge two sorted lists into a single sorted linked list.

First, we declared a linked list new \_ list and tail, which is a pointer to the end of the new \_ list.

Next, while the current pointers L_1 and L_2 not equal to NULL we have to maintain two situations:

  1. \boldsymbol{ L_1. data \leq L_2. data}: which means the current node of the L_1 list should go before the node of the L_2 list because it has a value smaller than or equal to the value of L_2 node. Therefore, the next \_ node should be the current L_1 node, and we move L_1 pointer to the next node.
  2. \boldsymbol{ L_1. data > L_2. data}: then the next \_ node should be the current L_2 node, and we move L_2 pointer to the next node.

After that, we’ll append the next \_ node to the tail of the new \_ list.

When one of the pointers reaches the end of its list, we break out of the loop. Then, we check if one of the pointers still not equal to NULL, we append the rest of its list to the end of new \_ list.

Finally, we’ll return the new \_ list, which is the head pointer of our merged list.

3.5. Merge Sort a Linked List

Let’s take a look at the implementation:

algorithm Merge_Sort(L):
    // INPUT
    //    L = the linked list to sort
    // OUTPUT
    //    The sorted linked list
    if head = null or head.next = null:
        return head
    middle <- Get_Middle(head)
    left_list <- head
    right_list <- middle.next
    middle.next <- null
    left_list <- MergeSort(left_list)
    right_list <- MergeSort(right_list)
    return Merge(left_list, right_list)

First, we check if head equals NULL, then the list is empty. Similarly, if head. next equals NULL, which means there is a single node in the given list. In both case scenarios, the sorted version of the list would still be the same as the initial one, so we return the head.

Next, we’ll get the middle node of the list using the Get \_ middle function. Then, we’ll split the given list into two parts:

  1. left \_ list: which starts at the head pointer.
  2. right \_ list: which starts after the middle node (middle. next).

After that, we’ll make middle. next equal to NULL to cut the list from the middle node into two halves.

Eventually, we sort each part separately, then we’ll merge them to get a single sorted list.

3.6. Complexity

The time and space complexity here is \boldsymbol{ O(N.Log_2(N)) }. Let’s examine the reason behind this complexity.

First, the Get \_ Middle function has a complexity O(N) cause we iterate over each node of the list at most once, where N is the length of the list.

Second, the Merge function has a complexity O(N + M) cause we iterate over the node of each list at most once, where N is the length of the first list and M the length of the second one.

Finally, the Merge \_ Sort function has a complexity O(N.Log_2(N)) cause in each call, we merge two lists with the complexity of O(N) and the depth of recursive tree will be Log_2(N) because in each call we split our list into two halves, where N is the length of the list.

which means the total complexity is \boldsymbol{ O(N.Log_2(N)) }.

4. Iterative Approach

4.1. Main Idea

In this approach, we’ll divide the given linked list into sorted blocks of size powers of two, then we’ll merge every two consecutive blocks together.

First, we divide the list into blocks of size one, then we’ll merge every two consecutive blocks together, so we get blocks of size two sorted.

Second, we divide the list into blocks of size two, then we’ll merge every two consecutive blocks together, so we get blocks of size four sorted.

We keep doing these steps until the block size reaches the first power of two that is greater than or equal to the length of the given linked list, in that moment, our linked list will be sorted.

4.2. Implementation

We’re going to divide our implementation into two functions. The first one is responsible for merging two blocks, while the second is the complete algorithm itself.

4.3. Merge Two Blocks

Let’s take a look at the implementation:

algorithm Merge_Blocks(left, leftSize, right, rightSize, L, tail):
    // INPUT
    //    left = the start of the left block
    //    left_size = size of the left block
    //    right = the start of the right block
    //    right_size = size of the right block
    //    L = the linked list being sorted
    //    tail = the current tail of the sorted linked list
    // OUTPUT
    //    the merged blocks as part of the linked list L
    while not Empty(left) or not Empty(right):
        next_node <- null
        if Compare(left, right):
            next_node <- left
            left <- left.next
            left_size <- left_size - 1
        else:
            next_node <- right
            right <- right.next
            rightSize <- right_size - 1
        nextNode.next <- null
        if tail != null:
            tail.next <- next_node
        else:
            L <- next_node
            tail <- next_node
        tail <- tail.next

The Merge \_ Blocks function will merge two sorted blocks of a specific size. This function’s parameters are a pointer to the beginning of the first block and its size. In addition, we pass the same for the second block. Finally, we pass the linked list L and a pointer to its end.

As long as one of the blocks is not empty, it means there are still some untaken nodes in these blocks, so we perform multiple steps.

First, we declare next \_ node, which will store the node that should come after the last node in the list L.

Second, we call the Compare( left,\ right ) function which checks if the right block is empty or there still some untaken nodes in the left block. In addition, it checks whether the value of the current left node is less than or equal to the value of the current right node, which means the left node should come before the right one.

If so, we assign the left node to the next \_ node and decrease the size of the left block by one and move the left pointer to the next node. Otherwise, the next \_ node will be the right one; we decrease the size of the right block by one and move the right pointer to the next node.

Finally, we make next \_ node. next equal to NULL to cut the node out of the block, and append it to the end of the linked list L.

4.4. Merge Sort a Linked List

Let’s take a look at the implementation:

algorithm Iterative_Merge_Sort(L):
    // INPUT
    //    L = the linked list to sort
    // OUTPUT
    //    the sorted linked list
    block_size <- 1
    block_count <- infinity
    while block_count > 1:
        block_count <- 0
        left <- L
        right <- L
        L <- null
        tail <- null
        while left != null:
            block_count <- block_count + 1
            left_size <- 0
            right_size <- block_size
            while left_size < block_size and right != null:
                right <- right.next
                left_size <- left_size + 1
            Merge_Blocks(left, left_size, right, right_size, L, tail)
            left <- right
    return L

Initially, we declare block \_ size which represents the size of each block of the list after dividing it, and block \_ count which represents the number of merged blocks after merging every two consecutive blocks.

Next, as long as we didn’t reach the first power of two that is greater than or equal to the length of the given linked list, we have to do the following steps:

  1. Reset the value of block \_ count to zero.
  2. Declare left and right pointers that point to the start of the linked list initially.
  3. Clear the list L.
  4. While the left block size is smaller than block \_ size and the right pointer didn’t reach the end of the list, we move the right pointer to the next node of the list and increase the size of the left block by one. Now, the left information is at the beginning of the first block and the right pointer is at the beginning of the second block.
  5. Merge the first and the second blocks.
  6. Make left = right since the right pointer after merging points to the beginning of the block that comes after the second block.
  7. Repeat all these steps until we merge the whole list meaning that the linked list is sorted.

4.5. Complexity

Space complexity here is \boldsymbol{O(N)} since we’re modifying the same list and didn’t create any additional list. Let’s examine the reason behind this complexity.

Regarding time complexity, we keep dividing the given linked list into blocks Log_2(N) times, until we reach the first power of two that is greater than or equal to N the length of the linked list. Next, in each iteration, we merge every two consecutive blocks which the sum of their length in total equal to N.

which means the total complexity is \boldsymbol{ O(N.Log_2(N)) }.

5. Conclusion

In this tutorial, we presented the problem of sorting a linked list using the merge sort algorithm. We explained the general idea and discussed two approaches to solve it.