1. Introduction
In this tutorial, we’ll cover the basic problems with Java memory management and the need to constantly find better ways to achieve it. This will primarily cover the new experimental garbage collector introduced in Java called Shenandoah and how it compares against other garbage collectors.
2. Understanding Challenges in Garbage Collection
A garbage collector is a form of automatic memory management where a runtime like JVM manages allocation and reclamation of memory for the user programs running on it. There are several algorithms to implement a garbage collector. These include reference counting, mark-sweep, mark-compact, and copying.
2.1. Considerations for a Garbage Collector
Depending upon the algorithm we use for garbage collection, it can either run while the user program is suspended or run concurrently with the user program. The former achieves higher throughput at the cost of high latency due to long pauses, also known as stop-the-world pauses. The latter aims for better latency but compromises on throughput.
In fact, most modern-day collectors use a hybrid strategy, where they apply both stop-the-world and concurrent approaches. It usually works by dividing the heap space into young and old generations. Generational collectors then use the stop-the-world collection in the young generation and concurrent collection in the old generation, possibly in increments to reduce pauses.
Nevertheless, the sweet spot really is to find a garbage collector that runs with minimal pauses and provides high throughput — all this with a predictable behavior on heap size that can vary from small to very large! This is a constant struggle that has kept the pace of innovation in the Java garbage collection alive since the early days.
2.2. Existing Garbage Collectors in Java
Some of the traditional garbage collectors include serial and parallel collectors. They are generational collectors and use copying in the young and mark-compact in the old generation:
While providing good throughput, they suffer from the problem of long stop-the-world pauses.
The Concurrent Mark Sweep (CMS) collector introduced in Java 1.4 is a generational, concurrent, low-pause collector. It works with copying in the young generation and mark-sweep in the old generation:
It tries to minimize the pause time by doing most of the work concurrently with the user program. Nevertheless, it still has problems leading to unpredictable pauses, requires more CPU time, and is not suitable for a heap larger than 4 GB in size.
As a long-term replacement for CMS, the Garbage First (G1) collector was introduced in Java 7. G1 is a generational, parallel, concurrent, and incrementally compacting low-pause collector. It works with copying in the young generation and mark-compact in the old generation:
G1, however, is also a regionalized collector and structures the heap area into smaller regions. This gives it the benefit of more predictable pauses. Targeted for multiprocessor machines with a large amount of memory, G1 is also not free from the pauses.
So, the race to find a better garbage collector continues, especially one that reduces the pause time further. There’s a series of experimental collectors that JVM has introduced lately, like Z, Epsilon, and Shenandoah. Apart from that, G1 continues to get more improvements.
The objective is really to get as close as possible to a pauseless Java!
3. Shenandoah Garbage Collector
Shenandoah is an experimental collector that has been introduced in Java 12 and is being positioned as a latency specialist. It tries to reduce pause times by doing more of its garbage collection work concurrently with the user program.
For instance, Shenendoah attempts to perform object relocation and compaction concurrently. This essentially means that the pause time in Shenandoah is no longer directly proportional to the heap size. Hence, it can provide consistent low-pause behavior, irrespective of the heap size.
3.1. Heap Structure
Shenandoah, like G1, is a regionalized collector. This means that it divides the heap area into a collection of equal-sized regions. A region is basically a unit of memory allocation or reclamation:
But, unlike G1 and other generational collectors, Shenandoah doesn’t divide the heap area into generations. Therefore, it has to mark most of the live objects every cycle, which generational collectors can avoid.
3.2. Object Layout
In Java, objects in memory don’t only include data fields — they carry some extra information as well. This extra information consists of the header, which contains a pointer to the object’s class, and the mark word. There are several uses for the mark word, like forwarding pointers, age bits, locking, and hashing:
Shenandoah adds an additional word to this object layout. This serves as the indirection pointer and allows Shenandoah to move objects without updating all of the references to them. This is also known as the Brooks pointer.
3.3. Barriers
Performing a collection cycle in the stop-the-world mode is simpler, but the complexity just shoots up when we do that concurrently with the user program. It presents different challenges to the collection phases like concurrent marking and compaction.
The solution lies in intercepting all heap accesses through what we call barriers. Shenandoah and other concurrent collectors like G1 make use of barriers to ensure heap consistency. However, barriers are costly operations and generally tend to reduce the throughput of a collector.
For instance, the read and write operations to an object may be intercepted by the collector using barriers:
Shenandoah makes use of multiple barriers in different phases, like the SATB barrier, read barrier, and write barrier. We’ll see where these are used in later sections.
3.4. Modes, Heuristics, and Failure Modes
Modes define the way Shenandoah runs, like which barriers it uses, and they also define its performance characteristics. There are three modes available: normal/SATB, iu, and passive. The normal/SATB mode is the default.
Heuristics determine when a collection should start and which regions it should include. These include adaptive, static, compact, and aggressive, with adaptive as the default heuristic. For instance, it may choose to select regions with 60 percent or more garbage and start a collection cycle when 75 percent of regions have been allocated.
Shenandoah needs to collect heap faster than the user program that allocates it. But, at times, it may fall behind, leading to one of the failure modes. These failure modes include pacing, degenerated collection, and in the worst case, a full collection.
4. Shenandoah Collection Phases
Shenandoah’s collection cycle consists primarily of three phases: mark, evacuate, and update references. Although most of the work in these phases happens concurrently with the user program, there are still small parts that must happen in a stop-the-world mode.
4.1. Marking
Marking is the process of identifying all objects in the heap or parts of it that are unreachable. We can do this by starting from the root objects and traversing the object graph to find reachable objects. While traversing, we also assign each object one of three colors: white, grey, or black:
Marking in the stop-the-world mode is simpler, but it gets complicated in concurrent mode. This is because the user program concurrently mutates the object graph while marking is in progress. Shenandoah solves this by using the Snapshot At the Beginning (SATB) algorithm.
This means that any object that was alive at the beginning of the marking or that has been allocated since the beginning of marking is considered live. Shenandoah makes use of the SATB barrier to maintain the SATB view of the heap.
While most of the marking is done concurrently, there are still some parts that are done in stop-the-world mode. The parts that happen in the stop-the-world mode are the init-mark to scan the root set and the final-mark to drain all pending queues and re-scan the root set. The final-mark also prepares the collection set that indicates the regions to be evacuated.
4.2. Cleanup and Evacuation
Once the marking is complete, the garbage regions are ready to be reclaimed. The garbage regions are the regions where no live objects are present. The cleanup happens concurrently.
Now, the next step is to move the live objects in the collection set to other regions. This is done to reduce the fragmentation in memory allocation and, hence, is also known as compact. Evacuation or compacting happens entirely concurrently.
Now, this is where Shenandoah is different from other collectors. A concurrent relocation of objects is tricky as the user program continues to read and write them. Shenandoah manages to achieve this by performing a compare-and-swap operation on the Brooks pointer of an object to point to its to-space version:
Further, Shenandoah uses the read and write barriers to ensure that a strict “to-space” invariant is maintained during the concurrent evacuation. What this means is that the read and write must happen from the to-space that is guaranteed to survive the evacuation.
4.3. Reference Update
This phase in the collection cycle is to traverse through the heap and update the references to objects that were moved during the evacuation:
The update reference phase is, again, mostly done concurrently. There are brief periods of init-update-refs that initialize the update reference phase and final-update-refs that re-update the root set and recycle the regions from the collection set. Only these require the stop-the-world mode.
5. Comparision With Other Experimental Collectors
Shenandoah is not the only experimental garbage collector that has been introduced recently in Java. Others include Z and Epsilon. Let’s understand how they compare against Shenandoah.
5.1. Z Collector
Introduced in Java 11, the Z collector is a single-generation, low-latency collector designed for very large heap sizes — we’re talking multi-terabyte territory. The Z collector does most of its work concurrently with the user program and leverages the load barrier for heap references.
Further, the Z collector takes advantage of 64-bit pointers with a technique called pointer coloring. Here, the colored pointers store extra information about objects on the heap. The Z collector remaps objects using the extra information stored in the pointer to reduce memory fragmentation.
Broadly speaking, the Z collector’s goals are similar to those of Shenandoah. They both aim to achieve low pause times that are not directly proportional to the heap size. However, there are more tuning options available with Shenandoah than with the Z collector.
5.2. Epsilon Collector
Epsilon, also introduced in Java 11, has a very different approach to garbage collection. It’s basically a passive or “no-op” collector, which means that it handles memory allocation but doesn’t recycle it! So, when the heap runs out of memory, the JVM simply shuts down.
But why would we ever want to use a collector like that? Basically, any garbage collector has an indirect impact on the performance of the user program. It’s very difficult to benchmark an application and understand the impact of garbage collection on it.
Epsilon serves exactly that purpose. It simply removes the impact of a garbage collector and lets us run the application in isolation. But, this expects us to have a very clear understanding of the memory requirements of our application. Consequently, we can achieve better performance from the application.
Clearly, Epsilon has a very different goal from that of Shenandoah.
6. Conclusion
In this article, we went through the basics of garbage collection in Java and the need to constantly improve it. We discussed in detail the most recent experimental collector introduced in Java — Shenandoah. We also went through how it fares against the other experimental collectors available in Java.
The pursuit of a universal garbage collector isn’t going to be realized anytime soon! So, while G1 remains the default collector, these new additions provide us with options to use Java in low-latency situations. However, we shouldn’t consider them as a drop-ship replacement of other high-throughput collectors.