1. Overview
Ever wondered why Java applications consume much more memory than the amount specified via the well-known -Xms and -Xmx tuning flags? For a variety of reasons and possible optimizations, the JVM may allocate extra native memory. These extra allocations can eventually push the consumed memory beyond the -Xmx limit.
In this tutorial, we’re going to enumerate a few common sources of native memory allocations in the JVM, along with their sizing tuning flags, and then learn how to use Native Memory Tracking to monitor them.
2. Native Allocations
The heap is usually the largest consumer of memory in Java applications, but there are others. Besides the heap, the JVM allocates a fairly large chunk of native memory to maintain its class metadata, application code, the code generated by the JIT, internal data structures, etc. In the following sections, we’ll explore some of those allocations.
2.1. Metaspace
In order to maintain some metadata about the loaded classes, the JVM uses a dedicated non-heap area called Metaspace. Before Java 8, the equivalent was called PermGen or Permanent Generation. Metaspace or PermGen contains the metadata about the loaded classes rather than their instances, which are kept inside the heap.
The important thing here is that the heap sizing configurations won’t affect the Metaspace size since the Metaspace is an off-heap data area. In order to limit the Metaspace size, we use other tuning flags:
- -XX:MetaspaceSize and -XX:MaxMetaspaceSize to set the minimum and maximum Metaspace size
- Before Java 8, -XX:PermSize and -XX:MaxPermSize to set the minimum and maximum PermGen size
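For example, to start the Metaspace at 64 MB and cap it at 256 MB (values chosen purely for illustration):
$ java -XX:MetaspaceSize=64m -XX:MaxMetaspaceSize=256m -jar app.jar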
2.2. Threads
One of the most memory-consuming data areas in the JVM is the thread stack, created at the same time as each thread. The stack stores local variables and partial results, playing an important role in method invocations.
The default thread stack size is platform-dependent, but in most modern 64-bit operating systems, it’s around 1 MB. This size is configurable via the -Xss tuning flag.
In contrast with other data areas, the total memory allocated to stacks is practically unbounded when there is no limitation on the number of threads. It’s also worth mentioning that the JVM itself needs a few threads to perform its internal operations like GC or just-in-time compilations.
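We can also request a stack size for a particular thread through the Thread constructor. Here’s a minimal sketch; note that, per the Javadoc, the JVM treats this value only as a hint and may ignore it entirely on some platforms:

// Request a 256 KB stack for this thread; the effect is highly platform-dependent
Thread worker = new Thread(null, () -> System.out.println("running"), "worker", 256 * 1024);
worker.start();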
2.3. Code Cache
In order to run JVM bytecode on different platforms, it needs to be converted to machine instructions. The JIT compiler is responsible for this compilation as the program is executed.
When the JVM compiles bytecode to assembly instructions, it stores those instructions in a special non-heap data area called Code Cache. The code cache can be managed just like other data areas in the JVM. The -XX:InitialCodeCacheSize and -XX:ReservedCodeCacheSize tuning flags determine the initial and maximum possible size for the code cache.
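For instance, to cap the code cache at 128 MB (an illustrative value):
$ java -XX:ReservedCodeCacheSize=128m -jar app.jar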
2.4. Garbage Collection
The JVM is shipped with a handful of GC algorithms, each suitable for different use cases. All those GC algorithms share one common trait: they need to use some off-heap data structures to perform their tasks. These internal data structures consume more native memory.
2.5. Symbols
Let’s start with Strings, one of the most commonly used data types in application and library code. Because of their ubiquity, they usually occupy a large portion of the heap. If a large number of those strings contain the same content, then a significant part of the heap will be wasted.
In order to save some heap space, we can store just one version of each String and make the others refer to that stored version. This process is called String Interning. The JVM automatically interns only compile-time string constants, so we can manually call the intern() method on any other strings we intend to intern.
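Here’s a minimal sketch of both cases:

String a = new String("hello"); // a fresh heap instance, not interned automatically
String b = a.intern();          // returns the canonical copy from the string table
System.out.println(b == "hello"); // true: the literal "hello" was already interned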
The JVM stores interned strings in a special native fixed-sized hashtable called the String Table, also known as the String Pool. We can configure the table size (i.e. the number of buckets) via the -XX:StringTableSize tuning flag.
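For example, to use a larger bucket count (the value here is purely illustrative):
$ java -XX:StringTableSize=1000003 -jar app.jar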
In addition to the string table, there’s another native data area called the Runtime Constant Pool. JVM uses this pool to store constants like compile-time numeric literals or method and field references that must be resolved at runtime.
2.6. Native Byte Buffers
The JVM is the usual suspect for a significant number of native allocations, but sometimes developers can directly allocate native memory, too. The most common approaches are malloc calls via JNI and NIO’s direct ByteBuffers.
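As a minimal sketch, a direct buffer allocates its backing storage outside the heap; the -XX:MaxDirectMemorySize flag caps how much native memory such buffers may consume in total:

import java.nio.ByteBuffer;

// Allocates 1 MB of native memory; it's released only after the buffer object is garbage collected
ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024);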
2.7. Additional Tuning Flags
In this section, we used a handful of JVM tuning flags for different optimization scenarios. Using the following tip, we can find almost all tuning flags related to a particular concept:
$ java -XX:+PrintFlagsFinal -version | grep <concept>
The PrintFlagsFinal flag prints all the -XX options in the JVM. For example, to find all Metaspace-related flags:
$ java -XX:+PrintFlagsFinal -version | grep Metaspace
// truncated
uintx MaxMetaspaceSize = 18446744073709547520 {product}
uintx MetaspaceSize = 21807104 {pd product}
// truncated
3. Native Memory Tracking (NMT)
Now that we know the common sources of native memory allocations in the JVM, it’s time to find out how to monitor them. First, we should enable native memory tracking using yet another JVM tuning flag: -XX:NativeMemoryTracking=off|summary|detail. By default, NMT is off, but we can enable it to see either a summary or a detailed view of its observations.
Let’s suppose we want to track native allocations for a typical Spring Boot application:
$ java -XX:NativeMemoryTracking=summary -Xms300m -Xmx300m -XX:+UseG1GC -jar app.jar
Here, we’re enabling the NMT while allocating 300 MB of heap space, with G1 as our GC algorithm.
3.1. Instant Snapshots
When NMT is enabled, we can get the native memory information at any time using the jcmd command:
$ jcmd <pid> VM.native_memory
In order to find the PID for a JVM application, we can use the jps command:
$ jps -l
7858 app.jar // This is our app
7899 sun.tools.jps.Jps
Now, if we use jcmd with the appropriate pid, the VM.native_memory command makes the JVM print out information about its native allocations:
$ jcmd 7858 VM.native_memory
Let’s analyze the NMT output section by section.
3.2. Total Allocations
NMT reports the total reserved and committed memory as follows:
Native Memory Tracking:
Total: reserved=1731124KB, committed=448152KB
Reserved memory represents the total amount of memory our app can potentially use. On the other hand, the committed memory is equal to the amount of memory our app is using right now.
Despite allocating 300 MB of heap, the total reserved memory for our app is almost 1.7 GB. Similarly, the committed memory is around 440 MB, again far more than the 300 MB we asked for.
After the total section, NMT reports memory allocations per allocation source. So, let’s explore each source in depth.
3.3. Heap
NMT reports our heap allocations as we expected:
Java Heap (reserved=307200KB, committed=307200KB)
(mmap: reserved=307200KB, committed=307200KB)
300 MB of both reserved and committed memory, which matches our heap size settings.
3.4. Metaspace
Here’s what the NMT says about the class metadata for loaded classes:
Class (reserved=1091407KB, committed=45815KB)
(classes #6566)
(malloc=10063KB #8519)
(mmap: reserved=1081344KB, committed=35752KB)
Almost 1 GB is reserved and 45 MB committed for loading 6,566 classes.
3.5. Thread
And here’s the NMT report on thread allocations:
Thread (reserved=37018KB, committed=37018KB)
(thread #37)
(stack: reserved=36864KB, committed=36864KB)
(malloc=112KB #190)
(arena=42KB #72)
In total, 36 MB of memory is allocated to stacks for 37 threads, which is almost 1 MB per stack. The JVM allocates the memory to threads at the time of creation, so the reserved and committed allocations are equal.
3.6. Code Cache
Let’s see what NMT says about the assembly instructions generated and cached by the JIT:
Code (reserved=251549KB, committed=14169KB)
(malloc=1949KB #3424)
(mmap: reserved=249600KB, committed=12220KB)
Currently, almost 13 MB of code is being cached, and this amount can potentially go up to approximately 245 MB.
3.7. GC
Here’s the NMT report about G1 GC’s memory usage:
GC (reserved=61771KB, committed=61771KB)
(malloc=17603KB #4501)
(mmap: reserved=44168KB, committed=44168KB)
As we can see, almost 60 MB is reserved and committed to support G1’s internal data structures.
Let’s see what the memory usage looks like for a much simpler GC, say the Serial GC:
$ java -XX:NativeMemoryTracking=summary -Xms300m -Xmx300m -XX:+UseSerialGC -jar app.jar
The Serial GC barely uses 1 MB:
GC (reserved=1034KB, committed=1034KB)
(malloc=26KB #158)
(mmap: reserved=1008KB, committed=1008KB)
Obviously, we shouldn’t pick a GC algorithm just because of its memory usage, as the stop-the-world nature of the Serial GC may cause performance degradation. There are, however, several GCs to choose from, and they each balance memory and performance differently.
3.8. Symbol
Here is the NMT report about the symbol allocations, such as the string table and constant pool:
Symbol (reserved=10148KB, committed=10148KB)
(malloc=7295KB #66194)
(arena=2853KB #1)
Almost 10 MB is allocated to symbols.
3.9. NMT Over Time
The NMT allows us to track how memory allocations change over time. First, we should mark the current state of our application as a baseline:
$ jcmd <pid> VM.native_memory baseline
Baseline succeeded
Then, after a while, we can compare the current memory usage with that baseline:
$ jcmd <pid> VM.native_memory summary.diff
NMT, using + and - signs, tells us how the memory usage changed over that period:
Total: reserved=1771487KB +3373KB, committed=491491KB +6873KB
- Java Heap (reserved=307200KB, committed=307200KB)
(mmap: reserved=307200KB, committed=307200KB)
- Class (reserved=1084300KB +2103KB, committed=39356KB +2871KB)
// Truncated
The total reserved and committed memory increased by 3 MB and 6 MB, respectively. Other fluctuations in memory allocations can be spotted as easily.
3.10. Detailed NMT
NMT can also provide a very detailed map of the entire memory space. To enable this detailed report, we should use the -XX:NativeMemoryTracking=detail tuning flag.
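Once the JVM is started with that flag, the same jcmd command can produce the detailed view:
$ jcmd <pid> VM.native_memory detail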
4. Conclusion
In this article, we enumerated different contributors to native memory allocations in the JVM. Then, we learned how to inspect a running application to monitor its native allocations. With these insights, we can more effectively tune our applications and size our runtime environments.