1. Overview
Java objects reside on the heap. However, this can occasionally lead to problems such as inefficient memory usage, low performance, and garbage collection issues. Native memory can be more efficient in these cases, but using it has been traditionally very difficult and error-prone.
Java 14 introduced the foreign memory access API as an incubating feature to access native memory more safely and efficiently. Over subsequent releases, it was unified with the foreign linker API, renamed the foreign function and memory API, and finalized in Java 22.
In this tutorial, we’ll look at this API.
2. Motivation
Efficient memory use has always been challenging. This is mainly due to an inadequate understanding of memory, its organization, and complex memory-addressing techniques.
For instance, an improperly implemented memory cache could cause frequent garbage collection, which would drastically degrade application performance.
Before the foreign function and memory API, there were two main ways to access native memory in Java: the java.nio.ByteBuffer and sun.misc.Unsafe classes. Native code, in turn, could only be invoked through the Java Native Interface (JNI).
Let’s quickly look at these APIs’ advantages and disadvantages.
2.1. ByteBuffer API
The ByteBuffer API allows the creation of direct, off-heap byte buffers. These buffers can be directly accessed from a Java program. However, there are some limitations:
- The buffer size can’t be more than two gigabytes
- The garbage collector is responsible for memory deallocation
Furthermore, incorrect use of a ByteBuffer can cause a memory leak and an OutOfMemoryError. This is because a lingering reference to an unused buffer prevents the garbage collector from deallocating its memory.
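As a point of comparison, here is a minimal sketch of the ByteBuffer route. The class and method names are made up for illustration. Note that allocateDirect takes an int capacity, which is where the two-gigabyte ceiling comes from, and that there is no call to free the buffer explicitly:

```java
import java.nio.ByteBuffer;

public class DirectBufferSketch {

    // Allocates a direct (off-heap) buffer, writes one long, and reads it back.
    static long roundTrip(long value) {
        // The capacity parameter is an int, so a single buffer cannot exceed ~2 GB.
        ByteBuffer buffer = ByteBuffer.allocateDirect(Long.BYTES);
        buffer.putLong(0, value);
        return buffer.getLong(0);
        // No explicit free: the memory is released only after the buffer
        // object becomes unreachable and is garbage collected.
    }

    // Reports whether allocateDirect really produced an off-heap buffer.
    static boolean isOffHeap() {
        return ByteBuffer.allocateDirect(16).isDirect();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42L));
        System.out.println(isOffHeap());
    }
}
```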
2.2. Unsafe API
The Unsafe API is extremely efficient due to its addressing model. However, as the name suggests, this API is unsafe and has several drawbacks:
- It allows a Java program to crash the JVM through illegal memory access
- It’s a non-standard Java API
2.3. JNI API
Java has supported the invocation of native code through the Java Native Interface (JNI) since version 1.1. However, JNI suffers from the following complexities and limitations:
- The need for multiple toolchains
- Difficulties in interoperability with different calling conventions
- Laborious data unpacking
2.4. The Need for a New API
In summary, accessing foreign memory poses a dilemma. Should we use a safe but limited path (ByteBuffer), or should we risk using the unsupported and dangerous Unsafe API?
The new foreign function and memory access API aims to resolve these issues.
3. Foreign Function and Memory API
The foreign function and memory API provides a supported, safe, and efficient way to access both heap and native memory and to invoke native code. It provides several key components:
- Arena: controls the lifecycle of native memory segments
- MemorySegment: represents a contiguous region of memory, either on-heap or off-heap
- MemoryLayout: describes the structure of memory segments
- FunctionDescriptor: models the signature of foreign functions
- Linker: facilitates linking Java code with native functions
- SymbolLookup: looks up native symbols (functions, variables) by name
Let’s discuss these in detail.
4. Allocating Native Memory
4.1. Arena
An arena is responsible for managing the lifecycle of native memory segments, offering flexible memory allocation, and ensuring proper deallocation when no longer needed. Java 22 provides various types of arenas with varying lifetimes and access restrictions. Let’s create some arenas with different types:
Arena globalArena = Arena.global();
MemorySegment segment = globalArena.allocate(100);
The global arena has an unbounded lifetime and cannot be closed manually.
Arena arena = Arena.ofAuto();
MemorySegment segment = arena.allocate(100);
An automatic arena has a bounded lifetime managed by the garbage collector. Its segments are deallocated at some point after the arena and all its allocated segments become unreachable.
Arena arena = Arena.ofConfined();
MemorySegment segment = arena.allocate(100);
A confined arena has a bounded lifetime and restricts access to the creating thread.
Arena arena = Arena.ofShared();
MemorySegment segment = arena.allocate(100);
A shared arena has a bounded lifetime and allows access from multiple threads.
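The difference between confined and shared arenas can be observed directly. The following is a minimal sketch, with hypothetical class and method names, that reads a segment from a second thread and reports what happened. It assumes Java 22 or later:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.concurrent.atomic.AtomicReference;

public class ArenaConfinementSketch {

    // Reads the first int of the segment on a fresh thread and returns the
    // simple name of the exception raised there, or "ok" if the read worked.
    static String readOnOtherThread(MemorySegment segment) {
        AtomicReference<String> outcome = new AtomicReference<>("ok");
        Thread reader = new Thread(() -> {
            try {
                segment.get(ValueLayout.JAVA_INT, 0);
            } catch (RuntimeException e) {
                outcome.set(e.getClass().getSimpleName());
            }
        });
        reader.start();
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return outcome.get();
    }

    // Segment from a confined arena: only the creating thread may touch it.
    static String confinedOutcome() {
        try (Arena arena = Arena.ofConfined()) {
            return readOnOtherThread(arena.allocate(ValueLayout.JAVA_INT));
        }
    }

    // Segment from a shared arena: any thread may touch it.
    static String sharedOutcome() {
        try (Arena arena = Arena.ofShared()) {
            return readOnOtherThread(arena.allocate(ValueLayout.JAVA_INT));
        }
    }

    public static void main(String[] args) {
        System.out.println("confined: " + confinedOutcome());
        System.out.println("shared:   " + sharedOutcome());
    }
}
```

Both arenas are opened in a try-with-resources block, so their memory is released deterministically when the block exits. The global and automatic arenas, by contrast, reject close().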
4.2. MemorySegment
A memory segment is a contiguous region of memory. This can be either heap or off-heap memory. And, there are several ways to obtain a memory segment.
A memory segment backed by native memory is known as a native memory segment. Let’s create a native memory segment of 200 bytes:
MemorySegment memorySegment = Arena.ofAuto().allocate(200);
A memory segment can also be backed by an existing heap-allocated Java array. For example, we can create an array memory segment from an array of long:
MemorySegment memorySegment = MemorySegment.ofArray(new long[100]);
Additionally, a memory segment can be backed by an existing Java ByteBuffer. This is known as a buffer memory segment:
MemorySegment memorySegment = MemorySegment.ofBuffer(ByteBuffer.allocateDirect(200));
Alternatively, we can use a memory-mapped file. This is known as a mapped memory segment. Let’s define a 200-byte memory segment using a file path with read-write access:
Arena arena = Arena.ofConfined();
RandomAccessFile file = new RandomAccessFile("/tmp/memory.txt", "rw");
FileChannel fc = file.getChannel();
MemorySegment memorySegment = fc.map(FileChannel.MapMode.READ_WRITE, 0, 200, arena);
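A slightly fuller sketch of the same idea closes both the channel and the arena with try-with-resources and checks that a value written through the mapping reaches the file. The temporary file and all names here are illustrative assumptions, not part of the original example:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedSegmentSketch {

    // Maps the first 200 bytes of a scratch file, writes an int through the
    // mapping, then reads the same bytes back with ordinary file I/O.
    static int writeThroughMapping(int value) {
        try {
            Path path = Files.createTempFile("memory", ".bin"); // hypothetical scratch file
            try {
                try (Arena arena = Arena.ofConfined();
                     FileChannel channel = FileChannel.open(path,
                         StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                    MemorySegment mapped =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, 200, arena);
                    mapped.set(ValueLayout.JAVA_INT, 0, value);
                    mapped.force(); // flush the change to the underlying file
                } // the channel closes first, then the arena unmaps the segment
                byte[] bytes = Files.readAllBytes(path);
                // JAVA_INT uses the platform's native byte order, so read it back the same way.
                return ByteBuffer.wrap(bytes).order(ByteOrder.nativeOrder()).getInt(0);
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThroughMapping(1234));
    }
}
```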
Heap segments are accessible from any thread without restrictions. On the other hand, access to native segments is limited based on the confinement characteristics of the arena from which they were obtained.
Also, a memory segment has spatial and temporal boundaries in terms of memory access:
- Spatial boundary — the memory segment has lower and upper limits
- Temporal boundary — governs creating, using, and closing a memory segment
Together, spatial and temporal checks ensure the safety of the JVM.
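Both kinds of check surface as ordinary Java exceptions rather than as a JVM crash. The sketch below, with hypothetical names, triggers each one on purpose:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class SegmentBoundsSketch {

    // Spatial check: reading 4 bytes at offset 8 of an 8-byte segment.
    static String readPastEnd() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(8);
            try {
                segment.get(ValueLayout.JAVA_INT, 8); // bytes 8..11 lie outside the segment
                return "ok";
            } catch (IndexOutOfBoundsException e) {
                return "IndexOutOfBoundsException";
            }
        }
    }

    // Temporal check: reading a segment after its arena has been closed.
    static String readAfterClose() {
        MemorySegment segment;
        try (Arena arena = Arena.ofConfined()) {
            segment = arena.allocate(8);
        }
        try {
            segment.get(ValueLayout.JAVA_INT, 0);
            return "ok";
        } catch (IllegalStateException e) {
            return "IllegalStateException";
        }
    }

    public static void main(String[] args) {
        System.out.println(readPastEnd());
        System.out.println(readAfterClose());
    }
}
```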
4.3. Slicing A Memory Segment
We can slice a memory segment into multiple smaller blocks. This avoids allocating multiple blocks if we want to store values with different layouts.
Let’s try using asSlice:
Arena arena = Arena.ofAuto();
MemorySegment memorySegment = arena.allocate(12);
MemorySegment segment1 = memorySegment.asSlice(0, 4);
MemorySegment segment2 = memorySegment.asSlice(4, 4);
MemorySegment segment3 = memorySegment.asSlice(8, 4);
VarHandle intHandle = ValueLayout.JAVA_INT.varHandle();
intHandle.set(segment1, 0, Integer.MIN_VALUE);
intHandle.set(segment2, 0, 0);
intHandle.set(segment3, 0, Integer.MAX_VALUE);
assertEquals(Integer.MIN_VALUE, (int) intHandle.get(segment1, 0));
assertEquals(0, (int) intHandle.get(segment2, 0));
assertEquals(Integer.MAX_VALUE, (int) intHandle.get(segment3, 0));
5. Using Native Memory
5.1. MemoryLayout
The MemoryLayout class lets us describe the contents of a memory segment. Specifically, it lets us define how the memory is broken up into elements, where the size of each element is provided.
This is a bit like describing the memory layout as a concrete type, but without providing a Java class. It’s similar to how languages like C++ map their structures to memory.
Let’s take an example of a cartesian coordinate point defined with the coordinates x and y:
int numberOfPoints = 10;
MemoryLayout pointLayout = MemoryLayout.structLayout(ValueLayout.JAVA_INT.withName("x"),
ValueLayout.JAVA_INT.withName("y"));
SequenceLayout pointsLayout = MemoryLayout.sequenceLayout(numberOfPoints, pointLayout);
Here, we’ve defined a layout made of two integer values named x and y. This layout can be used with a SequenceLayout to make something similar to an array, in this case with 10 indices.
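A layout is only a description, so we can query it without allocating any memory. The following sketch (class and method names are illustrative) asks the point layouts for their sizes and for the byte offset of the y field:

```java
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.SequenceLayout;
import java.lang.foreign.ValueLayout;

public class PointLayoutSketch {

    static final MemoryLayout POINT = MemoryLayout.structLayout(
        ValueLayout.JAVA_INT.withName("x"),
        ValueLayout.JAVA_INT.withName("y"));

    static final SequenceLayout POINTS = MemoryLayout.sequenceLayout(10, POINT);

    // Size of one point: two 4-byte ints.
    static long pointSize() {
        return POINT.byteSize();
    }

    // Offset of y inside a point: it follows the 4-byte x field.
    static long offsetOfY() {
        return POINT.byteOffset(PathElement.groupElement("y"));
    }

    // Size of the whole sequence: element count times element size.
    static long totalSize() {
        return POINTS.byteSize();
    }

    public static void main(String[] args) {
        System.out.println(pointSize());            // 8
        System.out.println(offsetOfY());            // 4
        System.out.println(POINTS.elementCount());  // 10
        System.out.println(totalSize());            // 80
    }
}
```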
5.2. ValueLayout
A ValueLayout models a memory layout for basic data types such as integer and floating-point types. Each ValueLayout has a size and a byte order. Some predefined ValueLayout constants include ValueLayout.JAVA_INT, ValueLayout.JAVA_CHAR, ValueLayout.JAVA_DOUBLE, and so on.
ValueLayout intLayout = ValueLayout.JAVA_INT;
ValueLayout charLayout = ValueLayout.JAVA_CHAR;
assertEquals(4, intLayout.byteSize());
assertEquals(2, charLayout.byteSize());
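The byte order is part of the layout as well. As a rough illustration, the sketch below derives big-endian and little-endian variants of JAVA_INT with withOrder and checks where the low byte of the value 1 ends up. The class and method names are hypothetical:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteOrder;

public class ByteOrderSketch {

    // Writes the int 1 using the given byte order and returns the index (0..3)
    // of the byte that holds the value 1, or -1 if none does.
    static int indexOfLowByte(ByteOrder order) {
        ValueLayout.OfInt layout = ValueLayout.JAVA_INT.withOrder(order);
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(layout);
            segment.set(layout, 0, 1);
            for (int i = 0; i < 4; i++) {
                if (segment.get(ValueLayout.JAVA_BYTE, i) == 1) {
                    return i;
                }
            }
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(indexOfLowByte(ByteOrder.BIG_ENDIAN));    // 3
        System.out.println(indexOfLowByte(ByteOrder.LITTLE_ENDIAN)); // 0
    }
}
```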
5.3. SequenceLayout
A SequenceLayout denotes the repetition of a given layout. In other words, this can be thought of as a sequence of elements similar to an array with the defined element layout.
For example, we can create a sequence layout for 10 elements of type int:
SequenceLayout sequenceLayout = MemoryLayout.sequenceLayout(10, ValueLayout.JAVA_INT);
5.4. GroupLayout
A GroupLayout can combine multiple member layouts, which can be similar or a combination of different types.
There are two possible ways to define a group layout. For instance, when the member layouts are organized one after another, it is defined as a struct. On the other hand, if the member layouts are laid out from the same starting offset, then it is called a union.
Let’s create a GroupLayout of struct type with two integers:
GroupLayout groupLayout = MemoryLayout.structLayout(ValueLayout.JAVA_INT, ValueLayout.JAVA_INT);
We can also create a GroupLayout of union type using the unionLayout method:
GroupLayout groupLayout = MemoryLayout.unionLayout(ValueLayout.JAVA_INT, ValueLayout.JAVA_LONG);
The first of these is a struct that holds all of its members, laid out one after another. The second is a union whose members share the same starting offset, so it can hold one member or the other at any given time.
A group layout allows us to create a complex memory layout consisting of multiple elements. For example:
MemoryLayout memoryLayout1 = ValueLayout.JAVA_INT;
MemoryLayout memoryLayout2 = MemoryLayout.structLayout(ValueLayout.JAVA_LONG);
MemoryLayout complexLayout = MemoryLayout.structLayout(memoryLayout1, MemoryLayout.paddingLayout(4), memoryLayout2);
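The padding in the last layout matters. A struct layout does not insert padding on its own, and it rejects a member placed at an offset that violates its alignment. The sketch below shows the resulting sizes. It assumes a 64-bit JVM, where JAVA_LONG is 8-byte aligned, and its names are illustrative:

```java
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.ValueLayout;

public class GroupLayoutSketch {

    // Two ints placed back to back: 4 + 4 bytes.
    static long structSize() {
        return MemoryLayout.structLayout(ValueLayout.JAVA_INT, ValueLayout.JAVA_INT).byteSize();
    }

    // A union is as large as its largest member: max(4, 8) bytes.
    static long unionSize() {
        return MemoryLayout.unionLayout(ValueLayout.JAVA_INT, ValueLayout.JAVA_LONG).byteSize();
    }

    // int + 4 bytes of explicit padding + long: 4 + 4 + 8 bytes.
    static long paddedStructSize() {
        return MemoryLayout.structLayout(
            ValueLayout.JAVA_INT,
            MemoryLayout.paddingLayout(4),
            ValueLayout.JAVA_LONG).byteSize();
    }

    // Without padding, the long would start at offset 4, which is not a
    // multiple of its 8-byte alignment (assumption: 64-bit platform).
    static boolean unpaddedStructIsRejected() {
        try {
            MemoryLayout.structLayout(ValueLayout.JAVA_INT, ValueLayout.JAVA_LONG);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(structSize());
        System.out.println(unionSize());
        System.out.println(paddedStructSize());
        System.out.println(unpaddedStructIsRejected());
    }
}
```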
5.5. VarHandle
A VarHandle is an immutable, typed reference to a variable. It supports different access modes and allows read/write operations on various kinds of variables, including static fields, non-static fields, array elements, and components of off-heap data structures.
In this API, a VarHandle gives us access to the contents of a memory segment, and we can derive one from the different subtypes of MemoryLayout. Let’s try this out:
int value = 10;
MemoryLayout pointLayout = MemoryLayout.structLayout(
ValueLayout.JAVA_INT.withName("x"),
ValueLayout.JAVA_INT.withName("y")
);
VarHandle xHandle = pointLayout.varHandle(MemoryLayout.PathElement.groupElement("x"));
Arena arena = Arena.ofAuto();
MemorySegment segment = arena.allocate(pointLayout);
xHandle.set(segment, 0, (int) value);
int xValue = (int) xHandle.get(segment, 0);
assertEquals(value, xValue);
In the example above, we create a MemorySegment to store a point structure consisting of two integers, x and y, using a MemoryLayout of eight bytes in total. A VarHandle is then used to set and retrieve the value of the x field, ensuring type-safe and correct memory operations.
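To illustrate the "different access modes" mentioned earlier, here is a small sketch with hypothetical names. It writes y through its handle and reads the raw int at byte offset 4 to confirm where the field lives, then uses the atomic getAndAdd access mode on x:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

public class PointHandleSketch {

    static final MemoryLayout POINT = MemoryLayout.structLayout(
        ValueLayout.JAVA_INT.withName("x"),
        ValueLayout.JAVA_INT.withName("y"));

    static final VarHandle X = POINT.varHandle(PathElement.groupElement("x"));
    static final VarHandle Y = POINT.varHandle(PathElement.groupElement("y"));

    // Stores y through its handle, then reads the raw int at byte offset 4.
    static int rawReadOfY(int y) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment point = arena.allocate(POINT);
            Y.set(point, 0L, y); // coordinates: segment, base offset
            return point.get(ValueLayout.JAVA_INT, 4);
        }
    }

    // Atomically adds delta to x; returns {value before, value after}.
    static int[] addToX(int start, int delta) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment point = arena.allocate(POINT);
            X.set(point, 0L, start);
            int before = (int) X.getAndAdd(point, 0L, delta);
            int after = (int) X.get(point, 0L);
            return new int[] {before, after};
        }
    }

    public static void main(String[] args) {
        System.out.println(rawReadOfY(7));
        int[] result = addToX(10, 5);
        System.out.println(result[0] + " -> " + result[1]);
    }
}
```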
5.6. Using VarHandle With Offset
We can also use an offset in conjunction with a MemorySegment to access specific elements, similar to using an index in an array:
int numberOfPoints = 10;
MemoryLayout pointLayout = MemoryLayout.structLayout(ValueLayout.JAVA_INT.withName("x"),
ValueLayout.JAVA_INT.withName("y"));
SequenceLayout pointsLayout = MemoryLayout.sequenceLayout(numberOfPoints, pointLayout);
VarHandle xHandle = pointsLayout.varHandle(MemoryLayout.PathElement.sequenceElement(),
MemoryLayout.PathElement.groupElement("x"));
Arena arena = Arena.ofAuto();
MemorySegment segment = arena.allocate(pointsLayout);
for (int i = 0; i < numberOfPoints; i++) {
xHandle.set(segment, 0, i, i);
}
for (int i = 0; i < numberOfPoints; i++) {
assertEquals(i, xHandle.get(segment, 0, i));
}
In the above example, we store the integers 0 to 9 in the x field of 10 consecutive points. First, we allocate a MemorySegment of 80 bytes: each point structure consists of two integers (8 bytes), so 10 point structures need 80 bytes (8 * 10). To access each element, we pass the VarHandle a base offset of 0 and the element index i, and it computes the corresponding byte offset within the MemorySegment.
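The offset arithmetic the handle performs can be made explicit. In the sketch below (names are illustrative), the layout computes the byte offset of the y field of a given point, and a raw read at that offset returns what the handle wrote:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SequenceLayout;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

public class PointIndexSketch {

    static final MemoryLayout POINT = MemoryLayout.structLayout(
        ValueLayout.JAVA_INT.withName("x"),
        ValueLayout.JAVA_INT.withName("y"));

    static final SequenceLayout POINTS = MemoryLayout.sequenceLayout(10, POINT);

    static final VarHandle Y = POINTS.varHandle(
        PathElement.sequenceElement(), PathElement.groupElement("y"));

    // Byte offset of points[index].y as computed by the layout: index * 8 + 4.
    static long offsetOfY(long index) {
        return POINTS.byteOffset(
            PathElement.sequenceElement(index), PathElement.groupElement("y"));
    }

    // Writes points[index].y through the handle, then reads the raw int
    // back at the byte offset computed above.
    static int writeThenRawRead(long index, int value) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment points = arena.allocate(POINTS);
            Y.set(points, 0L, index, value); // coordinates: segment, base offset, index
            return points.get(ValueLayout.JAVA_INT, offsetOfY(index));
        }
    }

    public static void main(String[] args) {
        System.out.println(offsetOfY(3));
        System.out.println(writeThenRawRead(3, 99));
    }
}
```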
6. Invoking Native Functions
The foreign function and memory API provides a simple, streamlined approach to integrating native code, offering a safer and more efficient alternative for invoking foreign functions from Java applications.
6.1. FunctionDescriptor
The FunctionDescriptor class describes the signature of a foreign function by specifying the layouts of its return value and its parameters.
Let’s define a FunctionDescriptor for a foreign function that takes an address as a parameter and returns a long:
FunctionDescriptor fd = FunctionDescriptor.of(JAVA_LONG, ADDRESS);
ADDRESS is a ValueLayout that models a native pointer, that is, the location of a specific region of memory. On the Java side, it is carried as a MemorySegment.
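A descriptor can be inspected before it is ever linked. The sketch below, with hypothetical names, describes two C functions and checks the Java method type that the first one maps to. Modeling size_t as JAVA_LONG is an assumption that holds on common 64-bit platforms:

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodType;

public class DescriptorSketch {

    // size_t strlen(const char *s)  (size_t assumed to be 64 bits wide)
    static final FunctionDescriptor STRLEN =
        FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS);

    // void free(void *ptr)  (no return value, hence ofVoid)
    static final FunctionDescriptor FREE =
        FunctionDescriptor.ofVoid(ValueLayout.ADDRESS);

    // True if the Java-side signature for strlen is (MemorySegment) -> long.
    static boolean strlenMapsToLongOfSegment() {
        MethodType expected = MethodType.methodType(long.class, MemorySegment.class);
        return STRLEN.toMethodType().equals(expected);
    }

    public static void main(String[] args) {
        System.out.println(STRLEN.returnLayout().isPresent()); // has a return layout
        System.out.println(FREE.returnLayout().isPresent());   // void: no return layout
        System.out.println(STRLEN.argumentLayouts().size());
        System.out.println(strlenMapsToLongOfSegment());
    }
}
```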
6.2. Linker and SymbolLookup
The Linker class provides access to native functions that follow the platform’s calling conventions by creating the method handles used to call them. The SymbolLookup class resolves the addresses of symbols, such as functions, within loaded libraries. Together, they establish the connection between Java code and foreign functions defined in native libraries.
For example, we can obtain the native Linker and use its default SymbolLookup to find the strlen function from the standard C library:
Linker linker = Linker.nativeLinker();
var symbol = linker.defaultLookup().find("strlen").orElseThrow();
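Since find returns an Optional, a missing symbol can be handled without an exception. Here is a minimal sketch, where the class name and the made-up symbol name are purely illustrative:

```java
import java.lang.foreign.Linker;
import java.lang.foreign.SymbolLookup;

public class SymbolLookupSketch {

    // Reports whether the default lookup of the native linker knows the symbol.
    static boolean isAvailable(String name) {
        SymbolLookup defaultLookup = Linker.nativeLinker().defaultLookup();
        return defaultLookup.find(name).isPresent();
    }

    public static void main(String[] args) {
        System.out.println(isAvailable("strlen"));
        // A deliberately made-up name that no C library is expected to export.
        System.out.println(isAvailable("definitely_not_a_c_function_42"));
    }
}
```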
6.3. MethodHandle
The MethodHandle class serves as a bridge between Java and foreign functions, representing a reference to a function that can be invoked from Java code. It allows dynamic binding and invocation of foreign functions with appropriate arguments and return values.
We can create a MethodHandle for the strlen function and use it to calculate the length of a string:
MethodHandle strlenHandle = linker.downcallHandle(symbol,
FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
Let’s wrap the code above into one cohesive example, demonstrating how to invoke the strlen function from Java:
Linker linker = Linker.nativeLinker();
var symbol = linker.defaultLookup().find("strlen").orElseThrow();
MethodHandle strlen = linker.downcallHandle(symbol,
FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
Arena arena = Arena.ofAuto();
MemorySegment str = arena.allocateFrom("Hello");
long len = (long) strlen.invoke(str);
assertEquals(5, len);
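In application code, we would probably hide these steps behind a regular Java method. The sketch below is one possible shape, not a definitive implementation. The class and method names are hypothetical, size_t is assumed to be 64 bits wide, and a confined arena frees the C string as soon as the call returns:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

public class StrlenSketch {

    // Looked up and linked once, then reused for every call.
    private static final MethodHandle STRLEN = Linker.nativeLinker().downcallHandle(
        Linker.nativeLinker().defaultLookup().find("strlen").orElseThrow(),
        FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));

    // Copies the text into native memory as a NUL-terminated UTF-8 string
    // and returns the byte count reported by the C strlen function.
    static long nativeLength(String text) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cString = arena.allocateFrom(text);
            return (long) STRLEN.invokeExact(cString);
        } catch (Throwable t) {
            // MethodHandle invocations declare Throwable, so wrap it here.
            throw new IllegalStateException("strlen call failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(nativeLength("Hello"));
    }
}
```

Note that strlen counts bytes, not characters, so a string with non-ASCII characters reports its UTF-8 encoded length.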
7. Conclusion
In this article, we learned about the foreign function and memory API in Java 22.
First, we looked at the need for foreign function and memory access and the limitations of the APIs that predate Java 14. Then, we saw how the foreign function and memory API is a safe abstraction for accessing both heap and non-heap memory.
Finally, we explored the use of the API to read and write data both on and off the heap, while also harnessing its capability to invoke native code seamlessly.
As always, the source code of the examples is available over on GitHub.