1. Overview
In this article, we’ll learn about Apache Fury, an incubating project under the Apache Software Foundation. This library promises blazing-fast performance, robust capabilities, and multi-language support.
We’ll examine some of the project’s basic features and compare its performance against other frameworks.
2. Serialization With Apache Fury
Serialization is a critical process in software development that enables efficient data exchange between systems. It lets applications capture their state and share it with one another.
Apache Fury is a serialization library designed to address the limitations of existing libraries and frameworks. It offers a high-performance, easy-to-use API for serializing and deserializing data across various programming languages, and it’s built to handle complex data structures and large data volumes efficiently. The key features offered by Apache Fury are:
- High Performance: Apache Fury is optimized for speed, ensuring minimal overhead during serialization and deserialization processes.
- Cross-Language Support: Supports multiple programming languages, making it versatile for different development environments (Java/Python/C++/Golang/JavaScript/Rust/Scala/TypeScript).
- Complex Data Structures: Capable of handling intricate data models with ease.
- Compact Serialization: Produces compact serialized data, reducing storage and transmission costs.
- GraalVM Native Image Support: AOT-generated serializers are supported for GraalVM native images, so no reflection or serialization JSON config is necessary.
3. Code Sample
First, we need to add the required dependency to our project so we can start interacting with the Fury library APIs:
<dependency>
    <groupId>org.apache.fury</groupId>
    <artifactId>fury-core</artifactId>
    <version>0.5.0</version>
</dependency>
To try Fury for the first time, let’s create a simple structure using different data types and at least one nested object so we can simulate an everyday use case in an actual application. To do that, we’ll need to create a UserEvent class to represent the state of our user event which later will be serialized:
public class UserEvent implements Serializable {
    private final String userId;
    private final String eventType;
    private final long timestamp;
    private final Address address;

    // Constructor and getters
}
To introduce a bit more complexity to our event object, let’s define a nested structure for the address using a Java POJO named Address:
public class Address implements Serializable {
    private final String street;
    private final String city;
    private final String zipCode;

    // Constructor and getters
}
Note that Fury doesn’t require classes to implement the Serializable interface. However, later, we’ll use the Java native serializer, which does need it. Next, we’ll initialize Fury.
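Before moving on to the setup, here is a minimal sketch of the JDK’s built-in serialization, which is the reason both classes implement Serializable. It uses only java.io; the class name NativeSerializationSketch, the trimmed-down nested classes, and the sample values are assumptions made for illustration and aren’t taken from the article’s repository:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class NativeSerializationSketch {

    // Trimmed-down stand-ins for the article's Address and UserEvent classes
    static class Address implements Serializable {
        final String street;
        final String city;
        final String zipCode;

        Address(String street, String city, String zipCode) {
            this.street = street;
            this.city = city;
            this.zipCode = zipCode;
        }
    }

    static class UserEvent implements Serializable {
        final String userId;
        final String eventType;
        final long timestamp;
        final Address address;

        UserEvent(String userId, String eventType, long timestamp, Address address) {
            this.userId = userId;
            this.eventType = eventType;
            this.timestamp = timestamp;
            this.address = address;
        }
    }

    // Writes the whole object graph into an in-memory byte array
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Rebuilds the object graph from the byte array
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        UserEvent event = new UserEvent("user-1", "LOGIN", 1_700_000_000_000L,
          new Address("1 Main St", "Springfield", "12345"));
        byte[] data = serialize(event);
        UserEvent copy = (UserEvent) deserialize(data);
        System.out.println(data.length + " bytes, city = " + copy.address.city);
    }
}
```

If either class didn’t implement Serializable, writeObject() would fail with a NotSerializableException, which is why the marker interface is kept even though Fury itself doesn’t need it.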
3.1. Fury Setup
Now, we’ll see how to set up Fury so we can start using it:
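import org.apache.fury.Fury;
import org.apache.fury.config.Language;
import org.junit.jupiter.api.Test;

class FurySerializationUnitTest {

    @Test
    void whenUsingFurySerialization_thenGenerateByteOutput() {
        // Build a Fury instance that uses the Java-only protocol
        Fury fury = Fury.builder()
          .withLanguage(Language.JAVA)
          .withAsyncCompilation(true)
          .build();

        // Register every class that will be serialized
        fury.register(UserEvent.class);
        fury.register(Address.class);
        // ...
    }
}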
In this code snippet, we create the Fury object and select Java as the protocol, as it’s optimal for this case. However, as mentioned before, Fury also supports cross-language serialization (using Language.XLANG, for example). Moreover, we set the withAsyncCompilation option to true, which compiles serializers in the background using JIT (just-in-time) code generation. Because this compilation is non-blocking, our application can continue processing other tasks without waiting for it to complete.
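The general idea behind non-blocking compilation can be illustrated with a small, library-independent sketch. This is not Fury’s implementation: the class AsyncSwapSketch and its fallback and compiler parameters are hypothetical names, and both serializers are simple UTF-8 stand-ins. The sketch only shows the pattern of serving requests with a ready-to-use implementation while a faster one is prepared on a background thread:

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;
import java.util.function.Supplier;

class AsyncSwapSketch<T> {

    // The serializer currently in use; starts as the fallback
    private final AtomicReference<Function<T, byte[]>> active;
    private final CompletableFuture<Void> compilation;

    AsyncSwapSketch(Function<T, byte[]> fallback, Supplier<Function<T, byte[]>> compiler) {
        this.active = new AtomicReference<>(fallback);
        // Build the optimized serializer on a background thread, then publish it
        this.compilation = CompletableFuture.supplyAsync(compiler).thenAccept(active::set);
    }

    // Never blocks on compilation: uses whichever serializer is active right now
    byte[] serialize(T value) {
        return active.get().apply(value);
    }

    // Only needed by callers that want to wait for the optimized serializer
    void awaitCompilation() {
        compilation.join();
    }

    public static void main(String[] args) {
        AsyncSwapSketch<String> sketch = new AsyncSwapSketch<String>(
          value -> value.getBytes(StandardCharsets.UTF_8),        // slow-path stand-in
          () -> value -> value.getBytes(StandardCharsets.UTF_8)); // "compiled" stand-in

        byte[] early = sketch.serialize("event"); // served by whichever serializer is active
        sketch.awaitCompilation();
        byte[] late = sketch.serialize("event");  // served by the optimized serializer
        System.out.println(early.length + " / " + late.length);
    }
}
```

The trade-off in this pattern is that the first calls may run on the slower path, in exchange for never stalling the calling thread while the optimized path is being prepared.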
Once Fury is set up, we need to register the classes that may be serialized. This is important as Fury can use a pre-generated schema or metadata to streamline the serialization and deserialization process. That eliminates the need for runtime reflection, which can be slow and resource-intensive.
Registering classes also reduces the overhead of dynamically determining the class structure during serialization and deserialization, which can lead to faster processing times. Finally, registration matters from a security perspective, as it creates a safelist of classes that are allowed to be serialized and deserialized.
Fury’s registry prevents unintentional or malicious serialization of unexpected classes. Deserializing arbitrary classes is a well-known source of security issues, including code execution vulnerabilities, so restricting the accepted types mitigates the risk of deserialization attacks that exploit the serialization mechanism or the classes themselves.
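To make the safelist idea concrete without relying on Fury internals, the following sketch applies the same principle to JDK serialization using the standard ObjectInputFilter API (Java 9+). It is a swapped-in technique, not Fury’s registry: the class AllowlistSketch and the nested Allowed and Unexpected types are hypothetical and exist only for this illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Set;

class AllowlistSketch {

    static class Allowed implements Serializable {
        final int id;

        Allowed(int id) {
            this.id = id;
        }
    }

    static class Unexpected implements Serializable {
        final int code = 1;
    }

    // Only classes in this set may be deserialized
    static final Set<Class<?>> SAFELIST = Set.of(Allowed.class);

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.setObjectInputFilter(info -> {
                Class<?> type = info.serialClass();
                if (type == null) {
                    // Size and depth checks carry no class; leave them undecided
                    return ObjectInputFilter.Status.UNDECIDED;
                }
                return SAFELIST.contains(type)
                  ? ObjectInputFilter.Status.ALLOWED
                  : ObjectInputFilter.Status.REJECTED;
            });
            return in.readObject();
        }
    }
}
```

With this filter in place, deserializing an Allowed instance succeeds, while a stream containing an Unexpected instance is rejected with an InvalidClassException before the object is created.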
3.2. Using Fury
Now that Fury is configured, we can use this object to perform multiple serialization and deserialization operations. It offers many APIs, both low-level and high-level, for controlling the details of the serialization process, but in our case, we only need to call the following methods:
@Test
void whenUsingFurySerialization_thenGenerateByteOutput() {
    //... setup
    byte[] serializedData = fury.serialize(event);
    UserEvent temp = (UserEvent) fury.deserialize(serializedData);
    //...
}
These two calls are all we need to perform the basic serialization and deserialization operations with the library. But how does it compare to other well-known serialization frameworks used in Java? Next, we’ll run some experiments to find out.
4. Comparing Apache Fury
First of all, this tutorial doesn’t intend to perform an extensive benchmark between Apache Fury and other frameworks. Having said that, to contextualize the kind of performance the project aims to achieve, let’s see how different libraries and frameworks perform against our sample use case. For our comparison, we used Java Native Serialization, Avro Serialization, and Protocol Buffers.
To compare each framework, our test measures the time it takes each of them to serialize and deserialize 100K of our events:
As observed, both Fury and Protobuf performed exceptionally well in our experiment. In the beginning, Protobuf outperformed Fury, but later, Fury seemed to perform better, most likely due to the nature of the JIT compiler. Finally, let’s have a look at the size of the output generated by each framework:
When it comes to output size, Protobuf does slightly better, producing a smaller payload. However, the difference between it and Fury is small, so we can say the two are comparable here as well.
Once again, that may not be true for all cases. This isn’t an extensive benchmark but rather a comparison based on our use case. Nonetheless, Apache Fury offers great performance and simple-to-use capabilities, which is the project’s aim.
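For readers who want to repeat a measurement of this kind, the following is a minimal, library-agnostic timing sketch. It is not the harness used for the results above: the class RoundTripTimer, its method names, and the UTF-8 placeholder codec are assumptions made for illustration. Any serializer can be plugged in as a pair of functions:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class RoundTripTimer {

    // Serializes and deserializes every item once and returns the elapsed nanoseconds
    static <T> long timeRoundTrips(List<T> items,
      Function<T, byte[]> serializer, Function<byte[], T> deserializer) {
        long start = System.nanoTime();
        for (T item : items) {
            byte[] data = serializer.apply(item);
            T copy = deserializer.apply(data);
            if (copy == null) {
                throw new IllegalStateException("deserializer returned null");
            }
        }
        return System.nanoTime() - start;
    }

    // Sums the serialized size of all items, in bytes
    static <T> long totalBytes(List<T> items, Function<T, byte[]> serializer) {
        long total = 0;
        for (T item : items) {
            total += serializer.apply(item).length;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            events.add("user-" + i + ":LOGIN");
        }

        // Placeholder codec (plain UTF-8 strings); swap in the library under test
        Function<String, byte[]> serializer = s -> s.getBytes(StandardCharsets.UTF_8);
        Function<byte[], String> deserializer = b -> new String(b, StandardCharsets.UTF_8);

        timeRoundTrips(events, serializer, deserializer); // warm-up pass
        long nanos = timeRoundTrips(events, serializer, deserializer);

        System.out.println("round trips: " + (nanos / 1_000_000) + " ms, output: "
          + totalBytes(events, serializer) + " bytes");
    }
}
```

A single timed loop like this is sensitive to JIT warm-up and garbage collection, so it’s only suitable for rough comparisons; a dedicated benchmarking tool such as JMH is a better fit for reliable numbers.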
5. Conclusion
In this tutorial, we looked at Fury, a blazing-fast, cross-language serialization library powered by JIT (just-in-time) compilation and zero-copy serialization and deserialization. Moreover, we saw how it performs compared to other well-known serialization frameworks used in the Java ecosystem.
Regardless of which library or framework is faster/more efficient, Fury’s ability to handle complex data structures and provide cross-language support makes it an excellent choice for modern applications requiring high-speed data processing. By incorporating Apache Fury, developers can ensure their applications perform serialization and deserialization tasks with minimal overhead, enhancing overall efficiency and performance.
As usual, all code samples used in this article are available over on GitHub.