1. Introduction

In this article, we’re going to take a look at Caffeine — a high-performance caching library for Java.

One fundamental difference between a cache and a Map is that a cache evicts stored items.

An eviction policy decides which objects should be deleted at any given time. This policy directly affects the cache’s hit rate — a crucial characteristic of caching libraries.

Caffeine uses the Window TinyLfu eviction policy, which provides a near-optimal hit rate.

2. Dependency

We need to add the caffeine dependency to our pom.xml:

<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
    <version>3.1.8</version>
</dependency>

You can find the latest version of caffeine on Maven Central.

3. Populating Cache

Let’s focus on Caffeine’s three strategies for cache population: manual, synchronous loading, and asynchronous loading.

First, let’s write a class for the types of values that we’ll store in our cache:

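class DataObject {
    private final String data;

    // counts how many instances the factory method has created
    private static int objectCounter = 0;

    private DataObject(String data) {
        this.data = data;
    }

    public String getData() {
        return data;
    }

    public static DataObject get(String data) {
        objectCounter++;
        return new DataObject(data);
    }
}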

3.1. Manual Populating

In this strategy, we manually put values into the cache and retrieve them later.

Let’s initialize our cache:

Cache<String, DataObject> cache = Caffeine.newBuilder()
  .expireAfterWrite(1, TimeUnit.MINUTES)
  .maximumSize(100)
  .build();

Now, we can get some value from the cache using the getIfPresent method. This method will return null if the value is not present in the cache:

String key = "A";
DataObject dataObject = cache.getIfPresent(key);

assertNull(dataObject);

We can populate the cache manually using the put method:

// put rejects null values, so we store a real object here
cache.put(key, DataObject.get("Data for A"));
dataObject = cache.getIfPresent(key);

assertNotNull(dataObject);

We can also get the value using the get method, which takes a key and a Function. If the key is not present in the cache, the function computes a fallback value, which is then inserted into the cache:

dataObject = cache
  .get(key, k -> DataObject.get("Data for A"));

assertNotNull(dataObject);
assertEquals("Data for A", dataObject.getData());

The get method performs the computation atomically. This means that the computation is made only once, even if several threads ask for the value simultaneously. That’s why using get is preferable to calling getIfPresent followed by put.
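The compute-once behavior can be illustrated with JDK classes alone. The following is a minimal sketch, not Caffeine code: it uses ConcurrentHashMap.computeIfAbsent as a stand-in for cache.get(key, mappingFunction), since it offers a comparable per-key atomicity guarantee. The class name ComputeOnceSketch and its method are hypothetical and exist only for this illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; ConcurrentHashMap stands in for the cache.
class ComputeOnceSketch {

    // Starts `threads` threads that all request key "A" at the same moment
    // and returns how many times the mapping function actually ran.
    static int loaderInvocations(int threads) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        AtomicInteger loads = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // line all threads up before the lookup
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                map.computeIfAbsent("A", k -> {
                    loads.incrementAndGet();
                    return "Data for " + k;
                });
            });
        }

        start.countDown(); // release all threads at once
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return loads.get();
    }
}
```

Calling ComputeOnceSketch.loaderInvocations(8) returns 1: eight threads ask for the same key, but the mapping function runs a single time. A separate getIfPresent check followed by put would give no such guarantee.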

Sometimes we need to invalidate some cached values manually:

cache.invalidate(key);
dataObject = cache.getIfPresent(key);

assertNull(dataObject);

3.2. Synchronous Loading

This strategy takes a CacheLoader at build time (written here as a lambda), which the cache uses to initialize missing values, similar to the function passed to the get method in the manual strategy. Let’s see how we can use that.

First of all, we need to initialize our cache:

LoadingCache<String, DataObject> cache = Caffeine.newBuilder()
  .maximumSize(100)
  .expireAfterWrite(1, TimeUnit.MINUTES)
  .build(k -> DataObject.get("Data for " + k));

Now we can retrieve the values using the get method:

DataObject dataObject = cache.get(key);

assertNotNull(dataObject);
assertEquals("Data for " + key, dataObject.getData());

We can also get a set of values using the getAll method:

Map<String, DataObject> dataObjectMap 
  = cache.getAll(Arrays.asList("A", "B", "C"));

assertEquals(3, dataObjectMap.size());

Missing values are loaded by the function that was passed to the build method. This makes it possible to use the cache as the main facade for accessing values.

3.3. Asynchronous Loading

This strategy works the same as the previous one but performs operations asynchronously and returns a CompletableFuture holding the actual value:

AsyncLoadingCache<String, DataObject> cache = Caffeine.newBuilder()
  .maximumSize(100)
  .expireAfterWrite(1, TimeUnit.MINUTES)
  .buildAsync(k -> DataObject.get("Data for " + k));

We can use the get and getAll methods in the same manner, keeping in mind that they return a CompletableFuture:

String key = "A";

cache.get(key).thenAccept(dataObject -> {
    assertNotNull(dataObject);
    assertEquals("Data for " + key, dataObject.getData());
});

cache.getAll(Arrays.asList("A", "B", "C"))
  .thenAccept(dataObjectMap -> assertEquals(3, dataObjectMap.size()));

CompletableFuture has a rich and useful API, which you can read more about in this article.
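One way to picture asynchronous loading is as a map from keys to futures: a lookup returns a future immediately, and the value is computed in the background. The sketch below follows that idea using only the JDK. It is an assumption-laden illustration, not Caffeine's implementation; AsyncLoadingSketch is a hypothetical name, and it has no eviction or error handling.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: each key maps to a future of its value.
class AsyncLoadingSketch<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> futures =
        new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    AsyncLoadingSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns without waiting for the value; the loader runs on the
    // common ForkJoinPool. Repeated calls for a key share one future.
    CompletableFuture<V> get(K key) {
        return futures.computeIfAbsent(key,
            k -> CompletableFuture.supplyAsync(() -> loader.apply(k)));
    }
}
```

A caller can chain work with thenAccept, as in the examples above, or block with join() when the value is needed right away.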

4. Eviction of Values

Caffeine has three strategies for value eviction: size-based, time-based, and reference-based.

4.1. Size-Based Eviction

With this strategy, entries are evicted when the configured size limit of the cache is exceeded. There are two ways of measuring the size: counting the entries in the cache, or summing their weights.

Let’s see how we could count objects in the cache. When the cache is initialized, its size is equal to zero:

LoadingCache<String, DataObject> cache = Caffeine.newBuilder()
  .maximumSize(1)
  .build(k -> DataObject.get("Data for " + k));

assertEquals(0, cache.estimatedSize());

When we add a value, the size obviously increases:

cache.get("A");

assertEquals(1, cache.estimatedSize());

We can add the second value to the cache, which leads to the removal of the first value:

cache.get("B");
cache.cleanUp();

assertEquals(1, cache.estimatedSize());

It is worth mentioning that we call the cleanUp method before getting the cache size. Eviction is executed asynchronously, and cleanUp runs any pending maintenance so that the eviction has completed before we check the size.

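We can also bound the cache by weight by passing a weigher function together with maximumWeight. Note that estimatedSize still reports the number of entries, not their total weight: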

LoadingCache<String, DataObject> cache = Caffeine.newBuilder()
  .maximumWeight(10)
  .weigher((k,v) -> 5)
  .build(k -> DataObject.get("Data for " + k));

assertEquals(0, cache.estimatedSize());

cache.get("A");
assertEquals(1, cache.estimatedSize());

cache.get("B");
assertEquals(2, cache.estimatedSize());

Entries are removed from the cache once the total weight exceeds 10:

cache.get("C");
cache.cleanUp();

assertEquals(2, cache.estimatedSize());
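The arithmetic behind the weighted example can be sketched with the JDK alone. This is a simplified illustration under stated assumptions: WeightedSketch is a hypothetical class, and it evicts the oldest inserted entry first, which is not the policy Caffeine uses.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToIntBiFunction;

// Hypothetical sketch of weight-bounded storage. Eviction order here is
// plain insertion order (oldest first), not Caffeine's policy.
class WeightedSketch<K, V> {
    private final LinkedHashMap<K, V> store = new LinkedHashMap<>();
    private final long maximumWeight;
    private final ToIntBiFunction<K, V> weigher;
    private long totalWeight;

    WeightedSketch(long maximumWeight, ToIntBiFunction<K, V> weigher) {
        this.maximumWeight = maximumWeight;
        this.weigher = weigher;
    }

    void put(K key, V value) {
        // Remove any previous mapping so its weight is not counted twice
        V previous = store.remove(key);
        if (previous != null) {
            totalWeight -= weigher.applyAsInt(key, previous);
        }
        store.put(key, value);
        totalWeight += weigher.applyAsInt(key, value);

        // Evict the oldest entries until the total weight fits the budget
        Iterator<Map.Entry<K, V>> it = store.entrySet().iterator();
        while (totalWeight > maximumWeight && it.hasNext()) {
            Map.Entry<K, V> eldest = it.next();
            totalWeight -= weigher.applyAsInt(eldest.getKey(), eldest.getValue());
            it.remove();
        }
    }

    boolean contains(K key) { return store.containsKey(key); }
    int entryCount() { return store.size(); }
    long totalWeight() { return totalWeight; }
}
```

With a maximum weight of 10 and a constant weight of 5 per entry, two entries fit exactly. Adding a third raises the total to 15, so one entry is evicted and the count stays at 2, matching the assertions above.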

4.2. Time-Based Eviction

This eviction strategy is based on the expiration time of the entry and has three types:

  • Expire after access: an entry expires once the given period has passed since its last read or write
  • Expire after write: an entry expires once the given period has passed since its last write
  • Custom policy: the expiration time is calculated for each entry individually by an Expiry implementation
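The difference between the first two types comes down to whether a read pushes the deadline forward. The sketch below shows this with a hand-rolled map and an injectable clock. It is only an illustration: ExpiringSketch, its resetOnRead flag, and the ticker parameter are hypothetical names, and Caffeine's real bookkeeping is considerably more sophisticated.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical, single-threaded sketch of deadline-based expiration.
class ExpiringSketch<K, V> {
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Long> deadlines = new HashMap<>();
    private final long durationNanos;
    private final boolean resetOnRead; // true = expire after access
    private final LongSupplier ticker; // current time in nanoseconds

    ExpiringSketch(long durationNanos, boolean resetOnRead, LongSupplier ticker) {
        this.durationNanos = durationNanos;
        this.resetOnRead = resetOnRead;
        this.ticker = ticker;
    }

    void put(K key, V value) {
        // Every write starts a fresh lifetime
        values.put(key, value);
        deadlines.put(key, ticker.getAsLong() + durationNanos);
    }

    V getIfPresent(K key) {
        Long deadline = deadlines.get(key);
        if (deadline == null) {
            return null;
        }
        long now = ticker.getAsLong();
        if (now >= deadline) {
            // Expired: drop the entry and report a miss
            values.remove(key);
            deadlines.remove(key);
            return null;
        }
        if (resetOnRead) {
            // Expire-after-access: a read also extends the lifetime
            deadlines.put(key, now + durationNanos);
        }
        return values.get(key);
    }
}
```

Passing the clock as a LongSupplier lets a test advance time by hand instead of sleeping. In production code, System::nanoTime would be a natural choice.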

Let’s configure the expire-after-access strategy using the expireAfterAccess method:

LoadingCache<String, DataObject> cache = Caffeine.newBuilder()
  .expireAfterAccess(5, TimeUnit.MINUTES)
  .build(k -> DataObject.get("Data for " + k));

To configure the expire-after-write strategy, we use the expireAfterWrite method:

cache = Caffeine.newBuilder()
  .expireAfterWrite(10, TimeUnit.SECONDS)
  .build(k -> DataObject.get("Data for " + k));

To initialize a custom policy, we need to implement the Expiry interface:

cache = Caffeine.newBuilder().expireAfter(new Expiry<String, DataObject>() {
    // All durations returned by Expiry methods are in nanoseconds
    @Override
    public long expireAfterCreate(
      String key, DataObject value, long currentTime) {
        // Longer data lives longer: 1000 ns per character
        return value.getData().length() * 1000;
    }
    @Override
    public long expireAfterUpdate(
      String key, DataObject value, long currentTime, long currentDuration) {
        // An update leaves the remaining lifetime unchanged
        return currentDuration;
    }
    @Override
    public long expireAfterRead(
      String key, DataObject value, long currentTime, long currentDuration) {
        // A read leaves the remaining lifetime unchanged
        return currentDuration;
    }
}).build(k -> DataObject.get("Data for " + k));

4.3. Reference-Based Eviction

We can configure our cache to allow garbage collection of cache keys and/or values. To do this, we can configure the use of WeakReference for both keys and values, and the use of SoftReference for values only.

WeakReference allows objects to be garbage-collected when there are no strong references to them. SoftReference allows objects to be garbage-collected based on the global least-recently-used strategy of the JVM. More details about references in Java can be found here.

We should use Caffeine.weakKeys(), Caffeine.weakValues(), and Caffeine.softValues() to enable each option:

LoadingCache<String, DataObject> cache = Caffeine.newBuilder()
  .expireAfterWrite(10, TimeUnit.SECONDS)
  .weakKeys()
  .weakValues()
  .build(k -> DataObject.get("Data for " + k));

cache = Caffeine.newBuilder()
  .expireAfterWrite(10, TimeUnit.SECONDS)
  .softValues()
  .build(k -> DataObject.get("Data for " + k));

5. Refreshing

It’s possible to configure the cache to refresh entries after a defined period automatically. Let’s see how to do this using the refreshAfterWrite method:

Caffeine.newBuilder()
  .refreshAfterWrite(1, TimeUnit.MINUTES)
  .build(k -> DataObject.get("Data for " + k));

Here, we should understand the difference between expireAfter and refreshAfter. When an expired entry is requested, execution blocks until the new value has been calculated by the function passed to build.

But if the entry is only eligible for refresh, the cache returns the old value and reloads the value asynchronously.
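The refresh behavior can be sketched as "serve the old value, reload in the background". The code below is a simplified JDK-only illustration, not Caffeine's implementation: RefreshSketch and its constructor parameters are hypothetical, and it makes no attempt to deduplicate concurrent reloads of the same key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch of refresh-after-write semantics.
class RefreshSketch<K, V> {
    private final Map<K, V> values = new ConcurrentHashMap<>();
    private final Map<K, Long> writeTimes = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final long refreshAfterNanos;
    private final LongSupplier ticker; // current time in nanoseconds
    private final Executor executor;   // runs background reloads

    RefreshSketch(Function<K, V> loader, long refreshAfterNanos,
                  LongSupplier ticker, Executor executor) {
        this.loader = loader;
        this.refreshAfterNanos = refreshAfterNanos;
        this.ticker = ticker;
        this.executor = executor;
    }

    V get(K key) {
        V current = values.get(key);
        if (current == null) {
            // Miss: nothing to serve, so the caller waits for the load
            V loaded = loader.apply(key);
            store(key, loaded);
            return loaded;
        }
        if (ticker.getAsLong() - writeTimes.get(key) >= refreshAfterNanos) {
            // Stale: schedule a reload but do not wait for it
            executor.execute(() -> store(key, loader.apply(key)));
        }
        return current; // the old value is returned immediately
    }

    private void store(K key, V value) {
        // Record the write time first so readers never see a value without one
        writeTimes.put(key, ticker.getAsLong());
        values.put(key, value);
    }
}
```

In this sketch, a caller that hits a stale entry still gets an answer right away, and only a later call observes the reloaded value. With expiration instead of refresh, that first caller would have to wait for the load.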

6. Statistics

Caffeine has a means of recording statistics about cache usage:

LoadingCache<String, DataObject> cache = Caffeine.newBuilder()
  .maximumSize(100)
  .recordStats()
  .build(k -> DataObject.get("Data for " + k));
cache.get("A");
cache.get("A");

assertEquals(1, cache.stats().hitCount());
assertEquals(1, cache.stats().missCount());

We can also pass a Supplier to recordStats that creates an implementation of StatsCounter. This object is notified of every statistics-related change.
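To show what such counters track, here is a minimal JDK-only sketch of hit and miss recording around a loading lookup. It does not implement Caffeine's StatsCounter interface. CountingLookup is a hypothetical, single-threaded class, and the choice to report a hit rate of 1.0 before any request is made is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, single-threaded sketch of hit/miss bookkeeping.
class CountingLookup<K, V> {
    private final Map<K, V> store = new HashMap<>();
    private final Function<K, V> loader;
    private long hits;
    private long misses;

    CountingLookup(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V value = store.get(key);
        if (value != null) {
            hits++; // served from the store
            return value;
        }
        misses++; // had to call the loader
        value = loader.apply(key);
        store.put(key, value);
        return value;
    }

    long hitCount() { return hits; }
    long missCount() { return misses; }

    // Fraction of requests answered without loading
    double hitRate() {
        long total = hits + misses;
        return total == 0 ? 1.0 : (double) hits / total;
    }
}
```

Requesting the same key twice produces one miss followed by one hit, which mirrors the two assertions in the statistics example above and gives a hit rate of 0.5.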

7. Conclusion

In this article, we got acquainted with the Caffeine caching library for Java. We saw how to configure and populate a cache, as well as how to choose an appropriate expiration or refresh policy according to our needs.

The source code shown here is available over on GitHub.

