How to create LogicalType in Apache Avro

anilkulkarni87
A seasoned QA engineer transitioned to a Data engineer!
Originally published at anilkulkarni.com on ・4 min read

I am writing this post to explain how to create a custom LogicalType in Apache Avro. A custom LogicalType helps enforce strict schema validation for data in transit as well as at rest. The post is divided into the following sections:

  1. Define a schema using Avro IDL
  2. Generate Java classes using the Gradle Avro plugin
  3. Define Custom Logical Type
  4. Test the Custom LogicalType
  5. Next Steps (There is always a better way)

Let's go over in detail how to create a custom LogicalType in Apache Avro.

  1. Define a schema using Avro IDL
     Any schema, from simple to complex, can be defined easily using Avro IDL. Maven and Gradle plugins help generate the .avsc files and Java classes for the defined schema. Below is a sample schema defined by me. Read more about Avro IDL here. As a standard practice, we define default values for our fields so that the schema stays backward compatible whenever it changes.
  2. Generate Java classes and schema file
     The Gradle Avro plugin comes to our rescue. The plugin has been updated many times, and you might notice that I am using an older version. The first task creates the Avro protocol files, the next task creates the schema files (.avsc files), and the final task creates the required Java classes. The complete build file can be found here: Gradle Avro Plugin
  3. Define Custom LogicalType
     A logical type is an Avro primitive or complex type with extra attributes to represent a derived type. It is essentially a custom-defined type and can be leveraged for many use cases. Notice the "reversed" keyword in the .avdl file in Step 1. We need to define the LogicalType and the custom conversion for it. In this example, I append the word 'reversed' to the data ingested in the 'queryAuthor' field.
     Use cases:
     - Create a type for encryption that encrypts the data on ingestion and decrypts the field later.
     - Create a definition for an address (Line1, Line2, Line3, State, etc.) and use it as a custom logical type.
     - Apply a one-way hash to the ingested data.
     You can find the LogicalType and Conversion classes here: Logical Type: Click here. Conversion: Click here.
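To make the one-way hash use case above a little more concrete, here is a minimal sketch of the transformation such a conversion might apply when a record is written. The class name `OneWayHashSketch`, the helper `sha256Hex`, and the choice of SHA-256 are all assumptions for illustration; they are not part of Avro or of the repository linked above, and the sketch uses only the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class OneWayHashSketch {

    // Hypothetical helper: the value a hashing conversion could store
    // in place of the raw field. Not part of Avro or the author's code.
    static String sha256Hex(String value) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The same input always yields the same 64-character digest.
        System.out.println(sha256Hex("some author"));
    }
}
```

Because the hash cannot be reversed, the read side of such a conversion would have to return the stored digest unchanged, unlike the encryption use case where the original value can be recovered.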

Another important step is to register this LogicalType, which I do in the main method.
    import org.apache.avro.LogicalType;
    import org.apache.avro.LogicalTypes;
    import org.apache.avro.Schema;

    public class AvroExample {
        public static void main(String[] args) {
            // Register a factory so Avro can resolve the custom logical type name when it parses a schema.
            LogicalTypes.register(ReversedLogicalType.REVERSED_LOGICAL_TYPE_NAME, new LogicalTypes.LogicalTypeFactory() {
                private final LogicalType reversedLogicalType = new ReversedLogicalType();

                @Override
                public LogicalType fromSchema(Schema schema) {
                    return reversedLogicalType;
                }
            });
            // ... the rest of main is omitted in this excerpt
        }
    }
What I am doing in the custom conversion is self-explanatory if you look at the code below.
Image of Custom conversion class
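As a rough illustration of the idea behind such a conversion, here is a minimal, JDK-only sketch. The interface `StringConversion` is a hypothetical stand-in for the string hooks of Avro's `Conversion` class (`toCharSequence` and `fromCharSequence`); it is not Avro's API and not the author's class. Appending the word 'reversed' on write follows the description in Step 3, while stripping it again on read is purely an assumption made for this example.

```java
public class ReversedConversionSketch {

    // Hypothetical stand-in for the string hooks of Avro's Conversion class.
    interface StringConversion {
        String toStored(String value);    // applied when a record is written
        String fromStored(String stored); // applied when a record is read
    }

    static final String SUFFIX = "reversed";

    static final class ReversedConversion implements StringConversion {
        @Override
        public String toStored(String value) {
            // Write side: append the marker word to the ingested value.
            return value + SUFFIX;
        }

        @Override
        public String fromStored(String stored) {
            // Read side (assumption): strip the marker word if it is present.
            return stored.endsWith(SUFFIX)
                    ? stored.substring(0, stored.length() - SUFFIX.length())
                    : stored;
        }
    }

    public static void main(String[] args) {
        StringConversion conversion = new ReversedConversion();
        String stored = conversion.toStored("author");
        System.out.println(stored);                        // prints "authorreversed"
        System.out.println(conversion.fromStored(stored)); // prints "author"
    }
}
```

In a real Avro project the equivalent class would extend `Conversion<String>` and also report the logical type name it handles, so that Avro can match it to the fields annotated with that name.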

  4. Test the Custom LogicalType
     My approach to testing the LogicalType is:
     a) Write data to an .avro file.
     b) Read the data in the file.
     c) Read the file using avro-tools.
     This is better explained with some screenshots. In the main method I create an .avro file, and we can see the value before the data is saved to it. Then we read the data from the file and print it before the conversion happens.

Testing Logical Type
We can also see the data in the .avro file using avro-tools.
Avro-tools

  5. Next Steps or TODO
     There is always a better way to do it. I am going to work on these and will probably come up with another post. The GitHub link is here.
     - Leverage the latest version of the plugin to register the LogicalType as part of the build.
     - Notice the TODO in QueryRecord.java.
     - Explore modifying the Velocity templates to include the LogicalTypes in the auto-generated code.
     - Some of my other learnings.

The post How to create LogicalType in Apache Avro appeared first on Anil Kulkarni | Blog.
