1. Overview
In this tutorial, we’ll discuss the problem of removing duplicates from a linked list. This problem can have many versions. Thus, we’ll explain the general idea of the problem and then discuss the naive approach that solves it.
After that, we’ll improve the naive approach to get a faster solution. Also, we’ll present some of its other versions and describe how to solve them.
In the end, we’ll present a comparison between all the provided approaches.
2. Defining the Problem
Suppose we have a linked list A, which has n elements. We want to remove the duplicates from it to get a new linked list that contains only the unique elements.
In other words, no element may appear more than once in the resulting linked list.
3. Naive Approach
First of all, we’ll describe the naive approach.
Let’s take a look at its implementation:
algorithm removeDuplicatesNaive(A):
    // INPUT
    //   A = a linked list
    // OUTPUT
    //   Returns the linked list after removing duplicates
    answer <- an empty linked list
    iIter <- A.head
    while iIter != null:
        jIter <- answer.head
        isAdded <- false
        while jIter != null:
            if iIter.value = jIter.value:
                isAdded <- true
                break
            jIter <- jIter.next
        if not isAdded:
            answer.add(iIter.value)
        iIter <- iIter.next
    return answer
In this approach, we compare each value against every element we've already added. Hence, we call this the brute-force approach.
Firstly, we define answer as an empty linked list, which we'll use to store the result.
Next, we use the iterator iIter to iterate over all values in A. For each value, we use the iterator jIter to iterate over all the previously taken values and check whether any of them equals the current one.
If so, we don't add the element to answer. Otherwise, it means this is the first time we've seen this element. Therefore, we add it to answer.
In the end, we return the linked list answer, which doesn't contain any duplicates.
The complexity of the naive approach is O(n^2), where n is the size of the linked list. The reason is that, in the worst case, all elements are different. Thus, we'll have to iterate over all the taken elements for each element of A.
However, in the special case when all elements are equal, the complexity is reduced to O(n), thanks to the break statement that stops the inner loop as soon as it finds a matching element.
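As a minimal Python sketch of the naive approach, using a plain Python list in place of a linked list (the function name is ours):

```python
def remove_duplicates_naive(values):
    """Return a new list without duplicates, keeping first occurrences.

    Brute force: for each element, scan everything already kept.
    """
    answer = []
    for v in values:
        is_added = False
        # check every previously kept element for a match
        for kept in answer:
            if kept == v:
                is_added = True
                break
        if not is_added:
            answer.append(v)
    return answer
```
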
4. Faster Approach
Let’s try to improve our naive approach.
4.1. General Idea
In the naive approach, we iterated over all previously taken values to check whether there are any duplicates. However, let’s think about how to do this check faster.
In other words, we need a data structure to hold the taken elements. Next, we’ll use this data structure to check whether we have duplicates or not.
For example, the TreeSet data structure is a good choice because it lets us add an element, and check whether it already contains an element, in O(log n) time.
4.2. Implementation
Take a look at the implementation of the faster approach:
algorithm removeDuplicatesFaster(A):
    // INPUT
    //   A = a linked list
    // OUTPUT
    //   Returns the linked list after removing duplicates
    answer <- an empty linked list
    set <- an empty set
    iIter <- A.head
    while iIter != null:
        if not set.contains(iIter.value):
            answer.add(iIter.value)
            set.add(iIter.value)
        iIter <- iIter.next
    return answer
Similarly to the naive approach, we define answer as an empty linked list that will hold the final answer. Also, we define set as a new empty TreeSet that will hold the added elements, so we can check for duplicates later.
As before, we use the iterator iIter to iterate over all values in A. However, in this approach, we use the set data structure to check whether any previously added value equals the current element.
If we don't find the element inside set, then we add it to the answer list. Also, we add it to set so that it won't be added again if it appears later.
Finally, we return the linked list answer, which doesn't contain any duplicates.
The complexity of the faster approach is O(n·log n), where n is the size of the linked list.
The reason is that, in the worst case, when all elements are different, we'll need to add all n elements to set and perform a membership check for each value, where each set operation costs O(log n).
However, in the special case when all elements are equal, the complexity is reduced to O(n). This is because the cost of the set operations depends on the number of distinct elements stored in set, and in this case there'd be just one.
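As an illustrative sketch, the faster approach could look like this in Python. Note that Python has no built-in TreeSet, so this sketch uses the built-in hash-based set, whose membership checks are O(1) on average rather than O(log n); the structure of the algorithm is otherwise the same:

```python
def remove_duplicates_faster(values):
    """Return a new list without duplicates, keeping first occurrences.

    Uses a set for fast membership checks instead of rescanning the answer.
    """
    answer = []
    seen = set()  # hash set; a TreeSet would give O(log n) per operation
    for v in values:
        if v not in seen:
            answer.append(v)
            seen.add(v)
    return answer
```
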
5. Small Integer Values Approach
In this special case, we have a linked list named A that contains n elements, where each element has an integer value in the range [0, MaxValue].
In a particular case, when MaxValue is relatively small, we can solve this problem using the following approach:
- Define a new boolean array of size MaxValue + 1. Let's call it isAdded and initialize all values in this array with false.
- If we want to check whether the value x was added before, we need only to check whether isAdded[x] is true.
- When we add the value x to answer, we have to assign true to isAdded[x].
Take a look at what the implementation looks like:
algorithm removeDuplicatesSmallIntegers(A):
    // INPUT
    //   A = a linked list
    // OUTPUT
    //   Returns the linked list after removing duplicates
    answer <- an empty linked list
    isAdded <- an array of MaxValue + 1 false values
    iIter <- A.head
    while iIter != null:
        if not isAdded[iIter.value]:
            answer.add(iIter.value)
            isAdded[iIter.value] <- true
        iIter <- iIter.next
    return answer
Firstly, let's define answer as an empty linked list that will hold the resulting answer. Also, let's define isAdded as the array used to check the existence of a certain element. The initial value of this array should be false.
Secondly, we'll iterate over all elements in A. For each element, we'll check whether we added the current value before by looking it up in the isAdded array. If we didn't add this value before, we add it to answer and assign true to its location in isAdded.
Finally, we return the linked list answer.
Note that, in case we have negative values, such that the elements' range is [-MaxValue, MaxValue], we can still use this approach after shifting all values by MaxValue. Hence, the array's range will be [0, 2·MaxValue], and the position of an element x will be x + MaxValue.
The complexity of this approach is O(n + MaxValue), where n is the size of the linked list and MaxValue is the maximum possible value of the elements. The MaxValue term comes from initializing the isAdded array.
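A possible Python sketch of the small-integer-values approach, including the shift for negative values described above (the function name and parameters are ours):

```python
def remove_duplicates_small_ints(values, max_value):
    """Remove duplicates assuming every value lies in [-max_value, max_value].

    Shifting each value by max_value maps it into [0, 2 * max_value],
    so it can index the boolean presence array directly.
    """
    is_added = [False] * (2 * max_value + 1)
    answer = []
    for v in values:
        if not is_added[v + max_value]:
            answer.append(v)
            is_added[v + max_value] = True
    return answer
```
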
6. Sorted Linked List Approach
Suppose the linked list is sorted. Let's take advantage of the sorted order to get an even faster solution.
6.1. General Idea
When the linked list is sorted, all equal elements are adjacent to each other. So, when checking for duplicates, it's enough to compare the current element with the last element added to the answer list.
6.2. Implementation
Let’s take a look at the implementation of the sorted linked list approach:
algorithm removeDuplicatesSorted(A):
    // INPUT
    //   A = a sorted linked list
    // OUTPUT
    //   Returns the linked list after removing duplicates
    answer <- an empty linked list
    iIter <- A.head
    while iIter != null:
        if answer.isEmpty() or answer.getLast() != iIter.value:
            answer.add(iIter.value)
        iIter <- iIter.next
    return answer
At first, we define answer as an empty linked list, similar to before. Then, we start iterating over all the elements and checking whether the current value was added before.
To do this, we compare the current value with the last element added to the answer list. We also pay attention to the case when we're at the first element of A; in this case, the answer list is still empty.
As a result, we add the value to answer if we haven't added any elements yet, or if it's not equal to the last added element.
The complexity of this approach is O(n), where n is the size of the linked list, since we perform only a constant amount of work per element.
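For a sorted input, the same logic can be sketched in Python as follows (again using a plain list as a stand-in for the linked list):

```python
def remove_duplicates_sorted(values):
    """Remove duplicates from already-sorted values by comparing each
    element only with the last one kept."""
    answer = []
    for v in values:
        # equal elements are adjacent, so checking the last kept element suffices
        if not answer or answer[-1] != v:
            answer.append(v)
    return answer
```
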
7. Comparison
Take a look at the differences between the previous approaches:

| | Naive Approach | Faster Approach | Small Integer Values Approach | Sorted Linked List Approach |
|---|---|---|---|---|
| Worst time complexity | O(n^2) | O(n·log n) | O(n + MaxValue) | O(n) |
| Best time complexity | O(n) | O(n) | O(n + MaxValue) | O(n) |
| Memory complexity | O(n) | O(n) | O(n + MaxValue) | O(n) |
| Use cases | General approach | General approach | Only when the values are relatively small | Only for sorted linked lists |
As we can see, the naive approach is simpler to implement and easier to understand. However, when considering the time complexity, the faster approach generally has a lower complexity.
However, in the special case when the list is sorted, it’s preferred to use the sorted linked list approach because it has a better complexity.
Also, in the case when we have small integer values inside the list, it’s better to use the small integers approach.
8. Conclusion
In this tutorial, we explained the problem of removing duplicates from a linked list.
Firstly, we presented the naive approach and improved it to obtain a faster approach. Then, we discussed two special cases and showed how to solve the problem in these cases.
Finally, we presented a summary comparison between all the presented approaches and showed when to use each one.