1. Overview
Apache Avro is a widely used data serialization system, especially popular in big data applications due to its efficiency and schema evolution capabilities. In this tutorial, we’ll walk through converting a Java object to JSON via Avro, and converting an entire Avro file to a JSON file. This can be particularly useful for data inspection and debugging.
In today’s data-driven world, the ability to work with different data formats is crucial. Apache Avro is often used in systems that require high performance and storage efficiency, such as Apache Hadoop.
2. Configuration
To get started, let’s add the Apache Avro dependency to our pom.xml file. Avro’s JSON encoding works out of the box, so no separate JSON dependency is required.
We’ve added version 1.11.1 of Apache Avro for this tutorial:
<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.11.1</version>
</dependency>
3. Converting Avro Object to JSON
Converting a Java object to JSON via Avro involves a few steps:
- Inferring (or building) the Avro schema
- Converting the Java object to an Avro GenericRecord
- Encoding the record as JSON
We’ll utilize Avro’s Reflect API to dynamically infer the schema from Java objects, instead of manually defining the schema.
To demonstrate this, let’s create a Point class with two integer fields, x and y:
public class Point {
    private int x;
    private int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Getters and setters
}
Let’s proceed to infer the schema:
public Schema inferSchema(Point p) {
    return ReflectData.get().getSchema(p.getClass());
}
We defined an inferSchema method that uses the getSchema method of the ReflectData class to infer the schema from the Point object’s class. The schema describes the fields x and y along with their data types.
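To see what the Reflect API derives, we can print the inferred schema. The output below is a sketch; the exact namespace depends on the package our Point class lives in:
Schema schema = ReflectData.get().getSchema(Point.class);
System.out.println(schema.toString(true));
The pretty-printed schema looks roughly like this:
{
  "type" : "record",
  "name" : "Point",
  "namespace" : "com.baeldung.avro",
  "fields" : [ {
    "name" : "x",
    "type" : "int"
  }, {
    "name" : "y",
    "type" : "int"
  } ]
}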
Next, let’s create a GenericRecord object from a Point object and convert it to JSON:
public String convertObjectToJson(Point p, Schema schema) {
    try {
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        GenericDatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
        GenericRecord genericRecord = new GenericData.Record(schema);
        genericRecord.put("x", p.getX());
        genericRecord.put("y", p.getY());

        Encoder encoder = EncoderFactory.get().jsonEncoder(schema, outputStream);
        datumWriter.write(genericRecord, encoder);
        encoder.flush();
        outputStream.close();

        return outputStream.toString();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
The convertObjectToJson method converts a Point object into a JSON string using the provided schema. First, we created a GenericRecord based on the schema and populated it with the Point object’s data. We then used the GenericDatumWriter to write the record to the ByteArrayOutputStream through the JsonEncoder, and finally called toString on the output stream to get the JSON string.
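Since the schema already comes from the Reflect API, we could skip the manual GenericRecord population entirely. Here’s a minimal sketch of that variant (the method name convertObjectToJsonReflect is ours), which writes the Point directly with a ReflectDatumWriter:
public String convertObjectToJsonReflect(Point p, Schema schema) {
    try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
        // ReflectDatumWriter extracts the fields via reflection, so no GenericRecord is needed
        DatumWriter<Point> datumWriter = new ReflectDatumWriter<>(schema);
        Encoder encoder = EncoderFactory.get().jsonEncoder(schema, outputStream);
        datumWriter.write(p, encoder);
        encoder.flush();
        return outputStream.toString();
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}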
Let’s verify the content of the JSON produced:
private AvroFileToJsonFile avroFileToJsonFile;
private Point p;
private String expectedOutput;

@BeforeEach
public void setup() {
    avroFileToJsonFile = new AvroFileToJsonFile();
    p = new Point(2, 4);
    expectedOutput = "{\"x\":2,\"y\":4}";
}

@Test
public void whenConvertedToJson_ThenEquals() {
    String response = avroFileToJsonFile.convertObjectToJson(p, avroFileToJsonFile.inferSchema(p));
    assertEquals(expectedOutput, response);
}
4. Converting Avro File to JSON File
Converting an entire Avro file to a JSON file follows a similar process but involves reading from a file. This is common when we have data stored in Avro format on disk and need to convert it to a more accessible format, such as JSON.
Let’s begin by defining a method, writeAvroToFile, which will be used to write some Avro data to a file:
public void writeAvroToFile(Schema schema, List<Point> records, File writeLocation) {
    try {
        if (writeLocation.exists() && !writeLocation.delete()) {
            System.err.println("Failed to delete existing file.");
            return;
        }

        GenericDatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
        DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<>(datumWriter);
        dataFileWriter.create(schema, writeLocation);
        for (Point record : records) {
            GenericRecord genericRecord = new GenericData.Record(schema);
            genericRecord.put("x", record.getX());
            genericRecord.put("y", record.getY());
            dataFileWriter.append(genericRecord);
        }
        dataFileWriter.close();
    } catch (IOException e) {
        e.printStackTrace();
        System.out.println("Error writing Avro file.");
    }
}
The method converts Point objects into Avro format by structuring them as GenericRecord instances according to the provided Schema. The GenericDatumWriter serializes these records, which are then written to an Avro file using DataFileWriter.
Let’s verify that the Avro content is written and the file exists:
private File dataLocation;
private File jsonDataLocation;
...

@BeforeEach
public void setup() {
    // Load files from the resources folder
    ClassLoader classLoader = getClass().getClassLoader();
    dataLocation = new File(classLoader.getResource("").getFile(), "data.avro");
    jsonDataLocation = new File(classLoader.getResource("").getFile(), "data.json");
    ...
}
...

@Test
public void whenAvroContentWrittenToFile_ThenExist() {
    Schema schema = avroFileToJsonFile.inferSchema(p);
    avroFileToJsonFile.writeAvroToFile(schema, List.of(p), dataLocation);
    assertTrue(dataLocation.exists());
}
Next, we’ll read the file from the stored location and write it back to another file in JSON format.
Let’s create a method called readAvroFromFileToJsonFile to handle that:
public void readAvroFromFileToJsonFile(File readLocation, File jsonFilePath) {
    DatumReader<GenericRecord> reader = new GenericDatumReader<>();
    try {
        DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(readLocation, reader);
        Schema schema = dataFileReader.getSchema();
        DatumWriter<GenericRecord> jsonWriter = new GenericDatumWriter<>(schema);
        OutputStream fos = new FileOutputStream(jsonFilePath);
        JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(schema, fos);
        while (dataFileReader.hasNext()) {
            GenericRecord record = dataFileReader.next();
            jsonWriter.write(record, jsonEncoder);
            jsonEncoder.flush();
        }
        dataFileReader.close();
        fos.close();
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
We read the Avro data from readLocation and write it as JSON to jsonFilePath. We use the DataFileReader to read GenericRecord instances from the Avro file, then serialize these records into JSON format using JsonEncoder and GenericDatumWriter.
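If we want the output file to be human-readable, EncoderFactory also offers a pretty-printing overload of jsonEncoder, and swapping it in is a one-line change:
// Passing true as the third argument pretty-prints the JSON output
JsonEncoder jsonEncoder = EncoderFactory.get().jsonEncoder(schema, fos, true);
Note that the test below compares against the compact single-line form, so it assumes the default encoder.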
Let’s proceed to confirm the content of the JSON file produced:
@Test
public void whenAvroFileWrittenToJsonFile_ThenJsonContentEquals() throws IOException {
    avroFileToJsonFile.readAvroFromFileToJsonFile(dataLocation, jsonDataLocation);
    String text = Files.readString(jsonDataLocation.toPath());
    assertEquals(expectedOutput, text);
}
5. Conclusion
In this article, we explored how to convert an Avro object to JSON, write Avro content to a file, read it back, and store it in a JSON-formatted file, using examples to illustrate the process. Additionally, it’s worth noting that schemas can also be stored in a separate file instead of being included with the data.
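For example, a schema kept in its own .avsc file can be loaded with Schema.Parser before reading or writing data (the point.avsc path here is purely illustrative):
// Load a schema stored separately from the data; point.avsc is a hypothetical file
Schema schema = new Schema.Parser().parse(new File("src/main/resources/point.avsc"));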
The implementation of the examples and code snippets can be found over on GitHub.