1. Overview
In this tutorial, we’ll discuss the differences between Set and List in Java with the help of a simple example. Also, we’ll compare the two data structures in terms of performance and memory allocation.
2. Conceptual Difference
Both List and Set are part of the Java Collections Framework. However, there are a few important differences:
- A List can contain duplicates, but a Set can’t
- A List will preserve the order of insertion, but a Set may or may not
- Since insertion order may not be maintained in a Set, it doesn’t allow index-based access as in the List
Please note that there are a few implementations of the Set interface which maintain order, for example, LinkedHashSet.
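To make these differences concrete, here’s a minimal sketch that uses only java.util. The class SetOrderingSketch and its method names are ours, chosen for illustration. TreeSet isn’t discussed in this tutorial; we include it only as a second example of an ordered Set (it sorts its elements rather than keeping insertion order):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class SetOrderingSketch {

    // Illustrative input: contains a duplicate (3) and isn't sorted
    static List<Integer> input() {
        return Arrays.asList(3, 1, 2, 3);
    }

    // A List keeps every element, duplicates included, in insertion order
    static List<Integer> asList() {
        return new ArrayList<>(input());
    }

    // A LinkedHashSet drops the duplicate but keeps insertion order
    static Set<Integer> asLinkedHashSet() {
        return new LinkedHashSet<>(input());
    }

    // A TreeSet drops the duplicate and iterates in sorted order
    static Set<Integer> asTreeSet() {
        return new TreeSet<>(input());
    }

    public static void main(String[] args) {
        System.out.println(asList());          // [3, 1, 2, 3]
        System.out.println(asLinkedHashSet()); // [3, 1, 2]
        System.out.println(asTreeSet());       // [1, 2, 3]
        System.out.println(asList().get(0));   // 3 (index-based access exists only on List)
    }
}
```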
3. Code Example
3.1. Allowing Duplicates
Adding a duplicate item is allowed for a List. However, it isn’t for a Set:
@Test
public void givenList_whenDuplicates_thenAllowed() {
    List<Integer> integerList = new ArrayList<>();
    integerList.add(2);
    integerList.add(3);
    integerList.add(4);
    integerList.add(4);
    // JUnit takes the expected value first, then the actual value
    assertEquals(4, integerList.size());
}

@Test
public void givenSet_whenDuplicates_thenNotAllowed() {
    Set<Integer> integerSet = new HashSet<>();
    integerSet.add(2);
    integerSet.add(3);
    integerSet.add(4);
    integerSet.add(4); // ignored, since 4 is already present
    assertEquals(3, integerSet.size());
}
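A Set doesn’t throw an exception when we try to add a duplicate; instead, add() returns false and leaves the Set unchanged. As a small sketch of how that return value can be put to use, the hypothetical helper findDuplicates() below (our own name, not part of the tutorial’s code) collects the values that occur more than once in a List:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DuplicateDetectionSketch {

    // Returns each value that appears more than once, in the order its first repeat is seen
    static List<Integer> findDuplicates(List<Integer> values) {
        Set<Integer> seen = new HashSet<>();
        List<Integer> duplicates = new ArrayList<>();
        for (Integer value : values) {
            // Set.add() returns false when the element is already present
            if (!seen.add(value) && !duplicates.contains(value)) {
                duplicates.add(value);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println(findDuplicates(Arrays.asList(2, 3, 4, 4))); // [4]
    }
}
```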
3.2. Maintaining Insertion Order
A Set maintains order depending on the implementation. For example, a HashSet is not guaranteed to preserve order, but a LinkedHashSet is. Let’s see an example of ordering with LinkedHashSet:
@Test
public void givenSet_whenOrdering_thenMayBeAllowed() {
    Set<Integer> set1 = new LinkedHashSet<>();
    set1.add(2);
    set1.add(3);
    set1.add(4);
    Set<Integer> set2 = new LinkedHashSet<>();
    set2.add(2);
    set2.add(3);
    set2.add(4);
    Assert.assertArrayEquals(set1.toArray(), set2.toArray());
}
Since a Set isn’t guaranteed to maintain order, the Set interface doesn’t offer index-based methods such as get(int index).
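If we do need positional access to the elements of a Set, one common workaround is to copy it into a List first. The sketch below assumes a hypothetical helper named elementAt(). The result is only meaningful for a Set with a defined iteration order, such as LinkedHashSet, and each call copies the whole Set:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

final class SetIndexingSketch {

    // Copies the set into a list so that the element at the given position can be read
    static <T> T elementAt(Set<T> set, int index) {
        return new ArrayList<>(set).get(index);
    }

    public static void main(String[] args) {
        Set<String> letters = new LinkedHashSet<>(Arrays.asList("b", "a", "c"));
        System.out.println(elementAt(letters, 0)); // b
        System.out.println(elementAt(letters, 2)); // c
    }
}
```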
4. Performance Comparison Between List and Set
Let’s compare the performance of the List and Set data structures using the Java Microbenchmark Harness (JMH). First, we’ll create two classes: ListAndSetAddBenchmark and ListAndSetContainBenchmark. Then, we’ll measure the execution time of the add() and contains() methods for the List and Set data structures.
4.1. JMH Parameters
We’ll execute the benchmark tests with the following parameters:
@BenchmarkMode(Mode.SingleShotTime)
@Warmup(iterations = 3, time = 10, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 3, time = 10, timeUnit = TimeUnit.MILLISECONDS)
public class ListAndSetAddBenchmark {
}
In the class above, the @BenchmarkMode(Mode.SingleShotTime) annotation sets the mode in which the benchmark runs. In our example, the mode is SingleShotTime, which means that each iteration invokes the benchmark method once and measures the time that single invocation takes.
The @Warmup annotation specifies the number of iterations and the time to run each iteration during the warm-up phase. In our case, the warm-up phase consists of three iterations. Note that SingleShotTime is work-based: each iteration is a single invocation of the benchmark method, so the 10-millisecond time value doesn’t determine how long an iteration runs.
Furthermore, the @Measurement annotation specifies the same settings for the measurement phase. In our example class, the measurement phase also consists of three single-invocation iterations.
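To build some intuition for what a single-shot measurement is, here’s a rough hand-rolled analogue in plain Java. This isn’t how JMH is implemented, and the helper timeOnce() is a name we made up for this sketch. It has no forking, no warm-up handling, and no protection against dead-code elimination, which is exactly why we rely on JMH for the real measurements:

```java
import java.util.ArrayList;
import java.util.List;

final class SingleShotTimingSketch {

    // Hypothetical helper: runs the task exactly once and returns the elapsed nanoseconds
    static long timeOnce(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        long nanos = timeOnce(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                list.add(i);
            }
        });
        // The measured time varies from run to run and from machine to machine
        System.out.println("Added " + list.size() + " elements in " + nanos + " ns");
    }
}
```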
4.2. add()
First, let’s create an inner class to declare variables that the benchmark methods will use:
@State(Scope.Benchmark)
public static class Params {
    public int addNumber = 10000000;
    public List<Integer> arrayList = new ArrayList<>();
    public Set<Integer> hashSet = new HashSet<>();
}
The @State annotation helps to make the class a state class. The state class holds data that’s being used by the benchmark method for computation.
Next, let’s test the add() operation for an ArrayList:
@Benchmark
public void addElementToArrayList(Params param, Blackhole blackhole) {
    param.arrayList.clear();
    for (int i = 0; i < param.addNumber; i++) {
        // Consume the result so the JIT compiler can't discard the call
        blackhole.consume(param.arrayList.add(i));
    }
}
The method above measures the time it takes to add 10,000,000 elements to an ArrayList. The @Benchmark annotation marks it as a benchmark method, and the Blackhole parameter consumes the result of each add() call so that the JIT compiler can’t eliminate it as dead code.
Furthermore, let’s test adding elements to a HashSet:
@Benchmark
public void addElementToHashSet(Params param, Blackhole blackhole) {
    param.hashSet.clear();
    for (int i = 0; i < param.addNumber; i++) {
        // Consume the result so the JIT compiler can't discard the call
        blackhole.consume(param.hashSet.add(i));
    }
}
Here, we measure the time it takes to add 10,000,000 elements to a HashSet. As before, the @Benchmark annotation marks the method as a benchmark method. When JMH encounters it, it generates code to measure the method’s performance.
Finally, let’s compare the test result:
Benchmark              Mode  Cnt  Score   Error  Units
addElementToArrayList    ss   15  0.386 ± 1.266   s/op
addElementToHashSet      ss   15  0.419 ± 2.535   s/op
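The scores suggest that adding elements to an ArrayList is slightly faster than adding them to a HashSet. However, the error margins are larger than the difference between the two scores, so this run alone isn’t conclusive. In a scenario where we need to add elements to a collection as fast as possible, an ArrayList is likely the more efficient choice.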
4.3. contains()
First, let’s define an inner class to fill up the ArrayList and HashSet:
@State(Scope.Benchmark)
public static class Params {
    @Param({"5000000"})
    public int searchElement;

    @Param({"10000000"})
    public int collectionSize;

    public List<Integer> arrayList;
    public Set<Integer> hashSet;

    @Setup(Level.Iteration)
    public void setup() {
        arrayList = new ArrayList<>();
        hashSet = new HashSet<>();
        for (int i = 0; i < collectionSize; i++) {
            arrayList.add(i);
            hashSet.add(i);
        }
    }
}
The @Param annotation specifies the values a benchmark parameter can take. In this case, we define two parameters, searchElement and collectionSize, each with a single value. JMH injects these values into the fields before the benchmark runs.
Also, the @Setup annotation marks the method that should be executed before each iteration.
Next, let’s test the contains() operation using an ArrayList:
@Benchmark
public void searchElementInArrayList(Params param, Blackhole blackhole) {
    // A single lookup of searchElement in the pre-filled list
    blackhole.consume(param.arrayList.contains(param.searchElement));
}
The searchElementInArrayList() method searches for 5000000 in the ArrayList.
Finally, let’s implement the contains() operation using a HashSet:
@Benchmark
public void searchElementInHashSet(Params param, Blackhole blackhole) {
    // A single lookup of searchElement in the pre-filled set
    blackhole.consume(param.hashSet.contains(param.searchElement));
}
As in the searchElementInArrayList() method, we search for 5000000, this time in the HashSet.
Here’s the result:
Benchmark                 Mode  Cnt   Score    Error  Units
searchElementInArrayList    ss   15   0.014 ±  0.015   s/op
searchElementInHashSet      ss   15  ≈ 10⁻⁵            s/op
The result shows that searching for an element in a HashSet is faster than searching for it in an ArrayList. This is expected: ArrayList.contains() scans the elements one by one, whereas HashSet.contains() uses the element’s hash code to go straight to the matching bucket. So, a HashSet is the more efficient choice when we need fast lookups in a collection.
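The gap comes from how much work each lookup has to do. As a rough illustration that’s separate from the JMH benchmark, the sketch below counts how many times equals() is invoked during a single contains() call. The CountingKey class and both helper methods are hypothetical names that we introduce only for this example:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ContainsCostSketch {

    // Hypothetical key type that counts how often equals() is invoked
    static final class CountingKey {
        static int equalsCalls = 0;
        final int id;

        CountingKey(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object other) {
            equalsCalls++;
            return other instanceof CountingKey && ((CountingKey) other).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    // equals() calls made while searching an ArrayList of `size` keys for `target`
    static int equalsCallsForList(int size, int target) {
        List<CountingKey> list = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            list.add(new CountingKey(i));
        }
        CountingKey.equalsCalls = 0; // count only the lookup itself
        list.contains(new CountingKey(target));
        return CountingKey.equalsCalls;
    }

    // equals() calls made while searching a HashSet of `size` keys for `target`
    static int equalsCallsForSet(int size, int target) {
        Set<CountingKey> set = new HashSet<>();
        for (int i = 0; i < size; i++) {
            set.add(new CountingKey(i));
        }
        CountingKey.equalsCalls = 0; // count only the lookup itself
        set.contains(new CountingKey(target));
        return CountingKey.equalsCalls;
    }

    public static void main(String[] args) {
        // The list compares against every element up to and including the match
        System.out.println("ArrayList equals() calls: " + equalsCallsForList(1000, 500)); // 501
        // The set only compares against keys whose hash matches the probe
        System.out.println("HashSet equals() calls: " + equalsCallsForSet(1000, 500));
    }
}
```

The number of comparisons for the list grows with the position of the element, while for the set it stays small as long as the hash codes are well distributed.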
5. Memory Allocation Comparison Between List and Set
In the previous section, we measured the performance of List and Set with respect to time. Now, let’s measure the memory allocation of the benchmark methods by enabling the GC profiler, which corresponds to the -prof gc command-line option, while running the benchmark.
Let’s modify the main() method and configure the JMH run options for the two benchmark classes:
public static void main(String[] args) throws RunnerException {
    Options opt = new OptionsBuilder()
      .include(ListAndSetAddBenchmark.class.getSimpleName())
      .forks(1)
      .addProfiler("gc")
      .build();
    new Runner(opt).run();
}
In the method above, we create a new Options object to configure JMH. First, we use the include() method to specify the benchmark that should be run. Next, we use the forks() method to specify the number of forks, that is, how many separate JVM processes the benchmark runs in.
Furthermore, we specify the profiler to use with the addProfiler() method. In this case, we are using the gc profiler.
This configuration works for the ListAndSetAddBenchmark class. Also, we need to modify the main() method of ListAndSetContainBenchmark to add the gc profiler:
public static void main(String[] args) throws RunnerException {
    Options opt = new OptionsBuilder()
      .include(ListAndSetContainBenchmark.class.getSimpleName())
      .forks(1)
      .addProfiler("gc")
      .build();
    new Runner(opt).run();
}
Here’s the result of the test:
Benchmark                               Mode  Cnt    Score      Error   Units
addElementToArrayList:·gc.alloc.rate      ss    3  172.685 ±  254.719  MB/sec
addElementToHashSet:·gc.alloc.rate        ss    3  504.746 ± 1323.322  MB/sec
searchElementInArrayList:·gc.alloc.rate   ss    3  248.628 ±  395.569  MB/sec
searchElementInHashSet:·gc.alloc.rate     ss    3  254.192 ±  235.294  MB/sec
The result shows that for the add() operation, addElementToHashSet() has a higher gc.alloc.rate of 504.746 MB/sec compared to addElementToArrayList() with a value of 172.685 MB/sec. This suggests that a HashSet allocates memory at a higher rate during execution than an ArrayList.
Furthermore, the allocation rates for the search benchmarks are nearly identical: 254.192 MB/sec for the HashSet versus 248.628 MB/sec for the ArrayList. This difference is well within the error margins.
The large error values indicate considerable variability in the results, which may depend on factors such as JVM warm-up and code optimization.
6. Conclusion
In this article, we learned the difference between a List and a Set in Java. Additionally, we ran benchmark tests to compare the performance of List and Set with respect to time and memory allocation. Depending on the use case, either a List or a Set can be the better choice for a specific operation.
As always, the source code for the examples is available over on GitHub.