1. Introduction

In this tutorial, we’re going to learn how to achieve fine-grained synchronization, also known as Lock Striping, a pattern for handling concurrent access to data structures while keeping up a good performance.

2. The Problem

HashMap is not a thread-safe data structure due to its non-synchronized nature. That means commands from a multi-threaded environment might result in data inconsistency.

To overcome that issue, we can either convert the original map with Collections#synchronizedMap method or use the HashTable data structure. Both will return a thread-safe implementation of the Map interface, but they come at the cost of performance.

The approach of defining exclusive access over data structures with a single lock object is called coarse-grained synchronization.

In a coarse-grained synchronization implementation, every access to the object must be made at a time by one thread. We end up having sequential accesses.

Our goal is to allow concurrent threads to work on the data structure while ensuring thread-safety.

3. Lock Striping

To reach our goal, we’ll use the Lock Striping pattern. Lock striping is a technique where the locking occurs on several buckets or stripes, meaning that accessing a bucket only locks that bucket and not the entire data structure.

There are a couple of ways to do this:

  • First, we could use a lock per task, thus maximizing concurrency between tasks – this has a higher memory footprint, though
  • Or, we could use a single lock for every task, which makes use of less memory but also compromises performance in concurrency

To help us manage this performance-memory tradeoff, Guava ships with a class called Striped. It’s similar to logic found in ConcurrentHashMap, but the Striped class goes even further by reducing the synchronization of distinct tasks using semaphores or reentrant locks.

4. A Quick Example

Let’s do a quick example to help us understand the benefits of this pattern.

We’ll compare HashMap vs. ConcurrentHashMap and a single lock vs. a striped lock resulting in four experiments.

For each experiment, we’ll perform concurrent reads and writes on the underlying Map. What will vary is how we access each bucket.

And for that, we’ll create two classes – SingleLock and StripedLock. These are concrete implementations of an abstract class ConcurrentAccessExperiment that does the work.

4.1. Dependencies

Since we’re going to use Guava’s Striped class, we’ll add the guava dependency:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>31.0.1-jre</version>
</dependency>

4.2. Main Process

Our ConcurrentAccessExperiment class implements the behavior previously described:

public abstract class ConcurrentAccessExperiment {

    public final Map<String,String> doWork(Map<String,String> map, int tasks, int slots) {
        CompletableFuture<?>[] requests = new CompletableFuture<?>[tasks * slots];

        for (int i = 0; i < tasks; i++) {
            requests[slots * i + 0] = CompletableFuture.supplyAsync(putSupplier(map, i));
            requests[slots * i + 1] = CompletableFuture.supplyAsync(getSupplier(map, i));
            requests[slots * i + 2] = CompletableFuture.supplyAsync(getSupplier(map, i));
            requests[slots * i + 3] = CompletableFuture.supplyAsync(getSupplier(map, i));
        }
        CompletableFuture.allOf(requests).join();

        return map;
    }

    protected abstract Supplier<?> putSupplier(Map<String,String> map, int key);
    protected abstract Supplier<?> getSupplier(Map<String,String> map, int key);
}

It’s important to note that, as our test is CPU-bound, we have limited the number of buckets to some multiple of the available processors.

4.3. Concurrent Access with ReentrantLock

Now we’ll implement the methods for our asynchronous tasks.

Our SingleLock class defines a single lock for the entire data structure using a ReentrantLock:

public class SingleLock extends ConcurrentAccessExperiment {
    ReentrantLock lock;

    public SingleLock() {
        lock = new ReentrantLock();
    }

    protected Supplier<?> putSupplier(Map<String,String> map, int key) {
        return (()-> {
            lock.lock();
            try {
                return map.put("key" + key, "value" + key);
            } finally {
                lock.unlock();
            }
        });
    }

    protected Supplier<?> getSupplier(Map<String,String> map, int key) {
        return (()-> {
            lock.lock();
            try {
                return map.get("key" + key);
            } finally {
                lock.unlock();
            }
        });
    }
}

4.4. Concurrent Access with Striped

Then, the StripedLock class defines a striped lock for each bucket:

public class StripedLock extends ConcurrentAccessExperiment {
    Striped lock;

    public StripedLock(int buckets) {
        lock = Striped.lock(buckets);
    }

    protected Supplier<?> putSupplier(Map<String,String> map, int key) {
        return (()-> {
            int bucket = key % stripedLock.size();
            Lock lock = stripedLock.get(bucket);
            lock.lock();
            try {
                return map.put("key" + key, "value" + key);
            } finally {
                lock.unlock();
            }
        });
    }

    protected Supplier<?> getSupplier(Map<String,String> map, int key) {
        return (()-> {
            int bucket = key % stripedLock.size();
            Lock lock = stripedLock.get(bucket);
            lock.lock(); 
            try {
                return map.get("key" + key);
            } finally {
                lock.unlock();
            }
        });
    }
}

So, which strategy performs better?

5. Results

Let’s use JMH (the Java Microbenchmark Harness) to find out. The benchmarks can be found through the source code link at the end of the tutorial.

Running our benchmark, we’re able to see something similar to the following (note that higher throughput is better):

Benchmark                                                Mode  Cnt  Score   Error   Units
ConcurrentAccessBenchmark.singleLockConcurrentHashMap   thrpt   10  0,059 ± 0,006  ops/ms
ConcurrentAccessBenchmark.singleLockHashMap             thrpt   10  0,061 ± 0,005  ops/ms
ConcurrentAccessBenchmark.stripedLockConcurrentHashMap  thrpt   10  0,065 ± 0,009  ops/ms
ConcurrentAccessBenchmark.stripedLockHashMap            thrpt   10  0,068 ± 0,008  ops/ms

6. Conclusions

In this tutorial, we explored different ways of how we can achieve better performance using Lock Striping in Map-like structures. We created a benchmark to compare the results with several implementations.

From our benchmark results, we can understand how different concurrent strategies could significantly affect the overall process. Striped Lock pattern grants quite an improvement as it scores ~10% extra with both HashMap and ConcurrentHashMap.

As usual, the source code for this tutorial is available over on GitHub.