1. Introduction
In this tutorial, we’ll be looking at the FastUtil library.
First, we’ll code some examples of its type-specific collections.
Then, we’ll analyze the performance that gives FastUtil its name.
Finally, let’s take a peek at FastUtil‘s BigArray utilities.
2. Features
The FastUtil Java library seeks to extend the Java Collections Framework. It provides type-specific maps, sets, lists and queues with a smaller memory footprint and fast access and insertion. FastUtil also provides a set of utilities for working with and manipulating large (64-bit) arrays, sets and lists.
The library also includes a multitude of practical Input/Output classes for binary and text files.
Its latest release, FastUtil 8, also released a host of type-specific functions, extending the JDK’s Functional Interfaces.
2.1. Speed
In many cases, the FastUtil implementations are the fastest available. The authors have even provided their own in-depth benchmark report, comparing it against similar libraries include HPPC and Trove.
In this tutorial, we’ll look to define our own benchmarks using the Java Microbench Harness (JMH).
3. Full Sized Dependency
On top of the usual JUnit dependency, we’ll be using the FastUtils and JMH dependencies in this tutorial.
We’ll need the following dependencies in our pom.xml file:
<dependency>
<groupId>it.unimi.dsi</groupId>
<artifactId>fastutil</artifactId>
<version>8.2.2</version>
</dependency>
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-core</artifactId>
<version>1.37</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-generator-annprocess</artifactId>
<version>1.37</version>
<scope>test</scope>
</dependency>
Or for Gradle users:
testCompile group: 'org.openjdk.jmh', name: 'jmh-core', version: '1.37'
testCompile group: 'org.openjdk.jmh', name: 'jmh-generator-annprocess', version: '1.37'
compile group: 'it.unimi.dsi', name: 'fastutil', version: '8.2.2'
3.1. Customized Jar File
Due to the lack of generics, FastUtils generates a large number of type-specific classes. And unfortunately, this leads to a huge jar file.
However, luckily for us, FastUtils includes a find-deps.sh script which allows generation of smaller, more focused jars comprising of only the classes we want to use in our application.
4. Type-Specific Collections
Before we begin, let’s take a quick peek at the simple process of instantiating a type-specific collection. Let’s pick a HashMap that stores keys and values using doubles.
For this purpose, FastUtils provides a Double2DoubleMap interface and a Double2DoubleOpenHashMap implementation:
Double2DoubleMap d2dMap = new Double2DoubleOpenHashMap();
Now that we’ve instantiated our class, we can simply populate data as we would with any Map from the Java Collections API:
d2dMap.put(2.0, 5.5);
d2dMap.put(3.0, 6.6);
Finally, we can check that the data has been added correctly:
assertEquals(5.5, d2dMap.get(2.0));
4.1. Performance
FastUtils focuses on its performant implementations. In this section, we’ll make use of the JMH to verify that fact. Let’s compare the Java Collections HashSet
First, let’s see how to implement the IntOpenHashSet:
@Param({"100", "1000", "10000", "100000"})
public int setSize;
@Benchmark
public IntSet givenFastUtilsIntSetWithInitialSizeSet_whenPopulated_checkTimeTaken() {
IntSet intSet = new IntOpenHashSet(setSize);
for(int i = 0; i < setSize; i++) {
intSet.add(i);
}
return intSet;
}
Above, we’ve simply declared the IntOpenHashSet implementation of the IntSet interface. We’ve also declared the initial size setSize with the @Param annotation.
Put simply, these numbers are fed into JMH to produce a series of benchmark tests with different set sizes.
Next, let’s do the same thing using the Java Collections implementation:
@Benchmark
public Set<Integer> givenCollectionsHashSetWithInitialSizeSet_whenPopulated_checkTimeTaken() {
Set<Integer> intSet = new HashSet<>(setSize);
for(int i = 0; i < setSize; i++) {
intSet.add(i);
}
return intSet;
}
Finally, let’s run the benchmark and compare the two implementations:
Benchmark (setSize) Mode Cnt Score Units
givenCollectionsHashSetWithInitialSizeSet... 100 avgt 2 1.460 us/op
givenCollectionsHashSetWithInitialSizeSet... 1000 avgt 2 12.740 us/op
givenCollectionsHashSetWithInitialSizeSet... 10000 avgt 2 109.803 us/op
givenCollectionsHashSetWithInitialSizeSet... 100000 avgt 2 1870.696 us/op
givenFastUtilsIntSetWithInitialSizeSet... 100 avgt 2 0.369 us/op
givenFastUtilsIntSetWithInitialSizeSet... 1000 avgt 2 2.351 us/op
givenFastUtilsIntSetWithInitialSizeSet... 10000 avgt 2 37.789 us/op
givenFastUtilsIntSetWithInitialSizeSet... 100000 avgt 2 896.467 us/op
These results make it clear the FastUtils implementation is much more performant than the Java Collections alternative.
5. Big Collections
Another important feature of FastUtils is the ability to use 64-bit arrays. Arrays in Java, by default, are limited to 32 bits.
To get started, let’s take a look at the BigArrays class for Integer types. IntBigArrays provides static methods for working with 2-dimensional Integer arrays. By using these provided methods, we can essentially wrap our array into a more user-friendly 1-dimensional array.
Let’s take a look at how this works.
First, we’ll start by initializing a 1-dimensional array, and converting it into a 2-dimensional array using IntBigArray’s wrap method:
int[] oneDArray = new int[] { 2, 1, 5, 2, 1, 7 };
int[][] twoDArray = IntBigArrays.wrap(oneDArray.clone());
We should make sure to use the clone method to ensure a deep copy of the array.
Now, as we’d do with a List or a Map, we can gain access to the elements using the get method:
int firstIndex = IntBigArrays.get(twoDArray, 0);
int lastIndex = IntBigArrays.get(twoDArray, IntBigArrays.length(twoDArray)-1);
Finally, let’s add some checks to ensure our IntBigArray returns the correct values:
assertEquals(2, firstIndex);
assertEquals(7, lastIndex);
6. Conclusion
In this article, we’ve taken a dive into FastUtils core features.
We looked at some of the type-specific collections that FastUtil offers, before playing around with some BigCollections.
As always, the code can be found over on GitHub