danielAsaboro

Beyond JSON: A Beginner's Guide to Data Serialization

For every thirty-minute flight you take in my home country, Nigeria, you are exposed to different languages, cultures, and ways of doing things.

Alongside the languages, you are immersed in a variety of customs, traditions, and ways of life showcasing the beautiful diversity that thrives within the country.

For the most part, they live in peace and harmony.

However, the past was different. There were challenges, rivalries, and fragmented communication. This stunted their growth until the arrival of British colonialists.

They brought a universal language — English — which encouraged collaboration and united efforts towards shared goals.

Web programming languages have the same dynamics

From HTML to CSS, JavaScript, Python, Ruby, and PHP, each language operates with its own syntax, frameworks, and paradigms. Each brings its own set of strengths, serving different purposes and catering to varying needs:

  • HTML provides the structure and content of web pages.
  • CSS brings style and visual appeal.
  • JavaScript adds interactivity and dynamic behavior.
  • Python offers simplicity and versatility.
  • Ruby emphasizes elegant code readability.
  • PHP excels in server-side scripting.

Here is the tricky question: English was necessary for collaboration in my country. What's the equivalent for web programming languages?

What prevents the chaos that would follow if there were no way to communicate and share data between the diverse parts of different systems?

Data Serialization: The unsung hero of Seamless Communication

Data serialization is the process of transforming data into a standardized representation (format) that can be understood by different systems or programming languages for easy storage and transmission.
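As a minimal sketch of that definition, the snippet below uses Python's standard `json` module to turn an in-memory object into text and back again. The `user` record and its field names are made up for illustration.

```python
import json

# Hypothetical in-memory record; the field names are illustrative only.
user = {"name": "Ada", "age": 36, "languages": ["Python", "Ruby"], "active": True}

# Serialize: Python object -> JSON text that can be stored or sent over a network.
payload = json.dumps(user)

# Deserialize: JSON text -> Python object on the receiving side.
restored = json.loads(payload)

print(payload)
print(restored == user)
```

Any program that can parse JSON, whatever language it is written in, could read `payload` and rebuild an equivalent structure.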

It enables persistence, interoperability, and compatibility across an intricate web of interconnected software systems. It doesn't matter whether it's a microservice architecture, a distributed computing environment, or a simple data transfer: serialization is the glue that binds them together.

Think of data serialization as a universal translator, breaking down the barriers of programming languages, enabling seamless communication between disparate systems, paving the way for collaboration, and allowing components built with different technologies to effortlessly exchange data.

For a long time, JSON has been the default standard

Initially derived from JavaScript, JSON was quickly adopted across the board, displacing XML (another data serialization format that had early popularity and looked like HTML, the web's foundation).

How did JSON do it?

For one, XML was clunky and confusing. But JSON? It was simple to use. It's just a collection of key-value pairs. Besides that, values such as strings, numbers, arrays, objects, and the literals true, false, and null can be nested in it.

It was also human-friendly and readable.

XML:

[Image: XML code]

JSON equivalent:

[Image: JSON equivalent of the XML code above]
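To make the comparison concrete, here is a hypothetical record (not the one from the original images) written in both formats and parsed with Python's standard library. The element and key names are assumptions chosen for illustration.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record, first as XML...
xml_text = """
<user>
  <name>Ada</name>
  <age>36</age>
  <languages>
    <language>Python</language>
    <language>Ruby</language>
  </languages>
</user>
"""

# ...and then as JSON.
json_text = """
{
  "name": "Ada",
  "age": 36,
  "languages": ["Python", "Ruby"]
}
"""

# XML: walk the element tree and convert text to the types we expect.
root = ET.fromstring(xml_text.strip())
from_xml = {
    "name": root.findtext("name"),
    "age": int(root.findtext("age")),  # XML text is always a string
    "languages": [el.text for el in root.find("languages")],
}

# JSON: one call, and numbers and lists arrive already typed.
from_json = json.loads(json_text)

print(from_xml == from_json)
```

The JSON version is shorter on the page and needs less code to read, which is a large part of why it felt simpler.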

Both XML and JSON are open standards that anyone can use, and both compress well, but JSON was able to ride on JavaScript's growing popularity.

But for all its strengths, JSON has its own limitations

They include lack of support for binary data, limited data typing, and verbosity compared to other formats like Protocol Buffers or MessagePack.

Let's dive into a few of them:

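1. Limited built-in data types:

JSON supports only a handful of types (strings, numbers, booleans, null, arrays, and objects) and carries no schema. There is no native date type, for example, and no distinction between integers and floating-point numbers. This makes data integrity and type safety harder to ensure. Without explicit type information, developers must rely on conventions or additional mechanisms to validate and handle data appropriately.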
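The sketch below shows this with Python's `json` module: a `datetime` value cannot be serialized directly, and a tuple silently comes back as a list. The `event` record is hypothetical, and encoding the date as an ISO 8601 string is one common convention, not something JSON itself prescribes.

```python
import json
from datetime import datetime, timezone

# Hypothetical record containing a type JSON has no native equivalent for.
event = {"id": 1, "created_at": datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)}

try:
    json.dumps(event)
    error = None
except TypeError as exc:
    # json.dumps rejects datetime objects
    error = str(exc)
print("error:", error)

# Convention-based workaround: encode the date as an ISO 8601 string.
encoded = json.dumps({**event, "created_at": event["created_at"].isoformat()})
decoded = json.loads(encoded)

# The receiver gets a plain string and must know, by convention, to parse it.
restored_time = datetime.fromisoformat(decoded["created_at"])
print(type(decoded["created_at"]).__name__, restored_time == event["created_at"])

# Type information can also be lost quietly: a tuple comes back as a list.
round_tripped = json.loads(json.dumps({"point": (1, 2)}))
print(round_tripped)
```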

2. Schema evolution complexity

Similarly, modifying the structure of JSON data can be complex, particularly in distributed systems where different versions of schemas may coexist. Managing schema evolution requires careful consideration to maintain compatibility and prevent data inconsistencies.
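One common way to cope, sketched here under assumptions, is a "tolerant reader": the consumer supplies defaults for fields that older producers do not send and ignores fields it does not recognize. The payloads, the `read_user` helper, and the field names are all hypothetical.

```python
import json

# Hypothetical payloads from two versions of the same producer.
v1_payload = '{"name": "Ada"}'
v2_payload = '{"name": "Ada", "email": "ada@example.com", "nickname": "countess"}'


def read_user(payload: str) -> dict:
    """Read a user record while tolerating older and newer schema versions."""
    data = json.loads(payload)
    return {
        "name": data["name"],        # required in every version
        "email": data.get("email"),  # added in v2; None when an older producer omits it
        # Unknown keys such as "nickname" are simply ignored.
    }


print(read_user(v1_payload))
print(read_user(v2_payload))
```

Nothing in JSON enforces this discipline. Every consumer has to implement it by hand, which is where the complexity comes from.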

3. No support for Binary data and Complex Data Structures

JSON is primarily designed for representing structured data using plain text, which limits its suitability for efficiently handling binary data like images or audio files.
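A small sketch of the usual workaround, Base64 encoding, and what it costs. The `raw` bytes below are a stand-in for a real image or audio file.

```python
import base64
import json

# 768 arbitrary bytes standing in for binary content such as an image.
raw = bytes(range(256)) * 3

try:
    json.dumps({"image": raw})
    error = None
except TypeError as exc:
    # json.dumps cannot serialize bytes directly
    error = str(exc)
print("error:", error)

# Workaround: turn the bytes into Base64 text before embedding them.
encoded = base64.b64encode(raw).decode("ascii")
payload = json.dumps({"image": encoded})

# The receiver has to know the field is Base64 and decode it again.
restored = base64.b64decode(json.loads(payload)["image"])

print(len(raw), len(encoded))  # Base64 emits 4 characters for every 3 bytes
print(restored == raw)
```

The data survives the round trip, but it grows by about a third and needs an extra encode and decode step on each side.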

Representing complex data structures or nested relationships in JSON can also be cumbersome and less concise compared to other formats specifically designed for such scenarios. This limitation can impact performance and increase the complexity of handling and manipulating data with intricate relationships.

4. No inherent support for comments

JSON lacks a standardized way to include comments within the data itself. This absence of comment support can make it challenging to add explanatory or contextual information directly within the serialized data.

Comments can be helpful for providing documentation or additional details about the data, aiding in its comprehension and maintenance. Without this feature, developers may need to rely on external documentation or conventions to provide necessary context for the JSON data. More on that here
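As a quick illustration, Python's standard `json` parser rejects a document containing a `//` comment. The `_comment` key in the second half is a hypothetical convention some teams use, not part of JSON.

```python
import json

# A JSON-like document with a JavaScript-style comment.
with_comment = """
{
  // maximum number of retries
  "retries": 3
}
"""

try:
    json.loads(with_comment)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False
print("parsed:", parsed_ok)

# Convention-based workaround: store the note as ordinary data.
workaround = '{"_comment": "maximum number of retries", "retries": 3}'
config = json.loads(workaround)
print(config["retries"], config["_comment"])
```

The workaround parses, but the "comment" is now part of the data and every consumer has to know to skip it.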

All hope is not lost

Fortunately, over the years, alternative data serialization formats have emerged to address these limitations and cater to specific needs.

One of them is Protocol Buffers (protobuf). It offers a compact binary representation, efficient parsing, and built-in schema evolution support.

Apache Avro is another example. It combines a compact binary format with a flexible schema system and data typing. Additionally, MessagePack provides a compact binary format with support for multiple programming languages.

These alternative formats provide enhanced performance, smaller data sizes, strong data typing, and schema evolution capabilities, depending on the specific requirements of your project.
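To give a feel for where the size savings come from, the sketch below compares JSON with a hand-rolled fixed binary layout built with Python's `struct` module. This is not Protocol Buffers, Avro, or MessagePack, which have their own encodings and libraries. It only illustrates the shared idea that when both sides agree on a schema, field names do not need to travel with every message. The `reading` record and `LAYOUT` are assumptions made for the example.

```python
import json
import struct

# Hypothetical sensor reading.
reading = {"sensor_id": 12345, "temperature": 21.5, "active": True}

# Text encoding: field names and punctuation are repeated in every message.
as_json = json.dumps(reading).encode("utf-8")

# Binary encoding with an agreed layout (the "schema"):
# "<" = little-endian, no padding; I = unsigned 32-bit int; d = 64-bit float; ? = bool
LAYOUT = "<Id?"
as_binary = struct.pack(
    LAYOUT, reading["sensor_id"], reading["temperature"], reading["active"]
)

# The receiver uses the same layout to recover typed values.
sensor_id, temperature, active = struct.unpack(LAYOUT, as_binary)

print(len(as_json), len(as_binary))
print(sensor_id, temperature, active)
```

The trade-off is readability: the binary bytes mean nothing without the layout, whereas the JSON text explains itself.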

Here are other examples, listed without explanation:

  1. CBOR
  2. CSV
  3. XML-RPC
  4. YAML
  5. TSV
  6. S-Expressions
  7. BIN
  8. BSON
  9. PICKLE and so on.

Each has its specialization:

  • JSON for REST APIs
  • YAML for container configuration (for example, Docker Compose files and Kubernetes manifests)
  • Protobuf for gRPC

It's important to know this as a developer in order to make informed choices based on your specific project needs and to optimize the way data is stored, shared, and transmitted across diverse systems.

I'm not saying JSON is dead. All I'm saying is there might be better alternatives to JSON for your specific scenarios. But first, you must know about them, which is the goal of this post.

Bye.
