1. Overview

Whether we’re querying a remote database, reading a file from local storage, or sending a message to a queue, our data travels in different encodings and formats, and it may also be encrypted. Serialization (and deserialization) is a standard technique that operating systems and application libraries use to transfer data over the network or across processes while preserving its content. In some specific cases, we also refer to serialization as marshaling.

In this tutorial, we’ll see how serialization and marshaling work and differ. We’ll also see the most common use cases.

2. Serialization and Deserialization

Serialization is the process of converting an object into a form that is independent of its execution environment. During serialization, the data is saved (in memory or on disk) in a raw format, such as a byte array or other binary data. Deserialization is the reconstruction of the original object from the serialized data.

Let’s draw what a serialization/deserialization process looks like:

[Figure: Serialization]

We can see that the byte stream is created before or after whatever operation we need to perform, whether fetching from a database or saving a file.

2.1. Why Do We Need Serialization?

If we can generate data in a primitive form like binary, why do we need a serialization process?

Depending on the environment, the data can be represented in different ways, for example, using different architectures, memory layouts, or programming languages.

For instance, two machines may disagree on endianness (byte order), or the same kind of data may be laid out differently, as with color bitmap formats.
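To make the endianness issue concrete, here is a minimal sketch using the standard java.nio.ByteBuffer class. The class name EndiannessDemo and the sample value are only illustrative:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class EndiannessDemo {

    // Encodes an int into 4 bytes using the given byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    // Decodes 4 bytes back into an int, assuming the given byte order.
    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        int value = 0x0A0B0C0D;

        byte[] big = encode(value, ByteOrder.BIG_ENDIAN);       // bytes 0A 0B 0C 0D
        byte[] little = encode(value, ByteOrder.LITTLE_ENDIAN); // bytes 0D 0C 0B 0A

        System.out.println(Arrays.toString(big));
        System.out.println(Arrays.toString(little));

        // Reading with the wrong byte order silently yields a different number.
        System.out.println(Integer.toHexString(decode(big, ByteOrder.LITTLE_ENDIAN)));
    }
}
```

A serialization format avoids this ambiguity by fixing the byte order (or recording it), so that both sides interpret the same bytes the same way.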

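Programming languages add their own constraints. In Java, for example, not every object can be serialized as-is, because some objects hold state that is bound to the running JVM (for example, threads or open sockets). By contrast, in C, a plain data structure can often be written out as raw bytes with little effort, although the result is then tied to the memory layout of the platform that produced it.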
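As a rough illustration of JVM-bound state, the sketch below tries to serialize two hypothetical classes (BrokenSession and Session are made-up names, not part of any library). One holds a Thread in a regular field, and the other marks that field as transient so that Java serialization skips it:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class JvmBoundStateDemo {

    // Hypothetical class: holds a Thread, which only makes sense inside the running JVM.
    static class BrokenSession implements Serializable {
        String user = "eric";
        Thread worker = new Thread(() -> {});
    }

    // Same shape, but the JVM-bound field is transient, so serialization skips it.
    static class Session implements Serializable {
        String user = "eric";
        transient Thread worker = new Thread(() -> {});
    }

    // Returns true if Java serialization can write the object, false otherwise.
    static boolean canSerialize(Object object) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(object);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new BrokenSession())); // false: Thread is not serializable
        System.out.println(canSerialize(new Session()));       // true: the transient field is skipped
    }
}
```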

Let’s see how serialization applies to different use cases and what formats we can have.

2.2. Use Cases

There are many use cases for serialization and deserialization:

  • Data buffers. During a file transfer, our system might serialize the data, for example, while holding it in a memory buffer.
  • Data stream. We can send or receive packets of information like video or sound while accessing a website or a web application.
  • Database data persistence and fetch. Before or after a database transaction, the data might go through serialization.
  • Messaging systems that have a publisher-subscriber mechanism, for example, queues.
  • Client-server communication, whether via web services like REST or SOAP or remote calls like RPC.

Although the result is always the same (data moves from a source to a destination), the implementation can be specific to an operating system or a programming language.

So, a serializer implementation for Java will differ from the one in C#. For example, in Java, we do serialization using the Serializable interface.

Let’s define the class for a Person:

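import java.io.Serializable;

// Serializable is a marker interface: it declares no methods to implement.
public class Person implements Serializable {

    private long id;
    private String name;

    // getters and setters
}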

Serialization and deserialization then occur whenever required, for example, when fetching a Person list from the database or sending a Person object to a queue.
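A round trip through Java’s built-in serialization could look like the following sketch. It repeats the Person class (with a constructor added for brevity) so that it compiles on its own. The PersonRoundTrip class and its helper methods are illustrative names, and an in-memory byte array stands in for the database or the queue:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class PersonRoundTrip {

    // Same shape as the Person class above, repeated so the sketch compiles on its own.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;

        private long id;
        private String name;

        Person(long id, String name) {
            this.id = id;
            this.name = name;
        }

        long getId() { return id; }
        String getName() { return name; }
    }

    // Serialization: object -> byte array.
    static byte[] serialize(Person person) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(person);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    // Deserialization: byte array -> a new, equivalent object.
    static Person deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Person) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Person(12345L, "Eric"));
        Person copy = deserialize(data);
        System.out.println(copy.getId() + " " + copy.getName()); // prints "12345 Eric"
    }
}
```

Note that the deserialized object is a new instance with the same field values, not the original object itself.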

2.3. Formats

Before and after a serialization process, the data can have different formats. Although there are many, the best known are probably XML and JSON.

XML is one of the oldest human-readable serialization formats. We can find it in word processor documents, HTTP payloads, and AJAX message exchange.

Let’s see an XML example to track persons:

<?xml version="1.0" encoding="UTF-8"?>
<persons>
    <person id="12345">
        <name>Eric</name>
    </person>
    <person id="67890">
        <name>John</name>
    </person>
    ...
</persons>

Although JSON has largely replaced it in REST APIs, XML is still widely adopted.
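To show the deserialization side for a text format, the following sketch reads a small XML document with the DOM parser that ships with the JDK. The persons root element and the XmlPersonsDemo class are assumptions made for this example:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class XmlPersonsDemo {

    // Parses the XML text and returns each person's name keyed by the id attribute.
    static Map<String, String> parse(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            Map<String, String> namesById = new LinkedHashMap<>();
            NodeList persons = document.getElementsByTagName("person");
            for (int i = 0; i < persons.getLength(); i++) {
                Element person = (Element) persons.item(i);
                String name = person.getElementsByTagName("name").item(0).getTextContent();
                namesById.put(person.getAttribute("id"), name);
            }
            return namesById;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<persons>"
            + "<person id=\"12345\"><name>Eric</name></person>"
            + "<person id=\"67890\"><name>John</name></person>"
            + "</persons>";

        System.out.println(parse(xml)); // prints {12345=Eric, 67890=John}
    }
}
```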

Likewise, let’s see a JSON example:

[
    {
        "id": "12345",
        "name": "John"
    },
    {
        "id": "67890",
        "name": "Eric"
    }
]

3. Marshaling

Marshaling is moving an object or a method call into another execution context. It’s more about the interoperability of objects between programs or threads. It can involve serialization along the way, so serialization is usually a part of marshaling.

3.1. What Is Marshaling?

We commonly talk about marshaling in the context of remote procedure calls (RPC). We can see its usage in programming languages but also in the operating system at the kernel level.

For example, let’s draw what a marshaling process looks like in an RPC:

[Figure: Marshaling]

In this diagram, a client uses a proxy to invoke a remote server’s stub definition. We can see how marshaling works by moving an object (for example, a method parameter or invocation) to another environment.
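The flow in the diagram can be approximated in a single JVM. In the sketch below, a dynamic proxy plays the client side: it marshals the method name and arguments into a byte array, and a dispatcher on the "server" side unmarshals them and invokes the real object. The Greeter interface and all other names are hypothetical, and the network hop is replaced by passing the byte array directly:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

class MarshalingSketch {

    // Hypothetical remote interface shared by client and server.
    interface Greeter {
        String greet(String name);
    }

    // Server-side implementation of the interface.
    static class GreeterImpl implements Greeter {
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    // Marshals a call: the method name and its arguments become a byte array.
    static byte[] marshal(String methodName, Object[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeUTF(methodName);
            out.writeObject(args == null ? new Object[0] : args);
        }
        return bytes.toByteArray();
    }

    // Server side: unmarshals the request and dispatches it to the real object.
    // Overloads are matched by name and argument count only, to keep the sketch short.
    static Object dispatch(Object target, byte[] request) throws Exception {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(request))) {
            String methodName = in.readUTF();
            Object[] args = (Object[]) in.readObject();
            for (Method method : target.getClass().getMethods()) {
                if (method.getName().equals(methodName) && method.getParameterCount() == args.length) {
                    return method.invoke(target, args);
                }
            }
            throw new NoSuchMethodException(methodName);
        }
    }

    // Client side: the proxy looks like a local Greeter but forwards every call as bytes.
    static Greeter clientProxy(Greeter serverObject) {
        return (Greeter) Proxy.newProxyInstance(
            Greeter.class.getClassLoader(),
            new Class<?>[] {Greeter.class},
            (proxy, method, args) -> dispatch(serverObject, marshal(method.getName(), args)));
    }

    public static void main(String[] args) {
        Greeter greeter = clientProxy(new GreeterImpl());
        System.out.println(greeter.greet("Eric")); // prints "Hello, Eric"
    }
}
```

Here, serialization is only one step: the marshaling part is the packaging of the call itself (which method, with which arguments) so that it can be replayed in another environment.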

3.2. Use Cases

For standalone applications, this can happen, for example, in Java with RMI. It allows communication between different JVMs through remote interfaces, as if the remote objects were part of our local application. RMI marshals method arguments and return values using Java object serialization. The stub for the remote service is generated automatically, similar to a SOAP web service. RMI isn’t much in use anymore, as HTTP-based communication has largely replaced it as a more flexible approach.

In .NET, marshaling still applies to remote calls. However, it also has the more general meaning of transforming a value to or from another type, for example, when passing data between managed and native code.

Another example of marshaling is Protocol Buffers, developed by Google. It lets us exchange serialized data whose structure is defined in schema files. Furthermore, its compiler generates the code for reading and writing that data.

At the operating system level, we can see marshaling in inter-process communication. For example, COM is a Windows standard that uses interfaces for communication between different components. For instance, DirectX libraries use marshaling to optimize the communication between the user’s rendering request and the CPU or the graphics processor.

4. How Do Serialization and Marshaling Differ?

Let’s summarize how serialization and marshaling differ:

| Serialization/Deserialization | Marshaling |
|-------------------------------|------------|
| Convert an object from and to a byte stream | Move objects from one thread or program to another. Serialization can be used during this process |
| Apply to any context where serialization is required | Usually refers to remote procedure call or IPC |
| Store in memory or physically a copy of the original object | Pass by-value or by-reference a copy of the object |
| No code generation | Service implementation or template is generated |

5. Conclusion

In this tutorial, we have seen how serialization works, along with its most common use cases. We saw serialization (and its inverse, deserialization) as a conversion to and from a byte stream.

We have also seen how marshaling instead is more about moving objects to different execution threads or environments. It might use serialization as part of the process. Finally, we have seen how serialization and marshaling differ. We usually refer to marshaling in remote calls. However, marshaling is often used interchangeably with serialization.