yactouat
protobuf-related: intro + write a proto and use it

proto whuut?

Protocol buffers are Google's interface description language.

They provide an efficient serialization technique that outperforms the JSON format in terms of payload size and deserialization speed.

According to this excellent definition from the Microsoft docs, serialization is the process of converting an object into a stream of bytes to store the object or transmit it to memory, a database, or a file. Its main purpose is to save the state of an object in order to be able to recreate it when needed. The reverse process is called deserialization....

Like Monsieur Jourdain was making prose without knowing it (🇫🇷 reference to Molière), we all serialize/deserialize stuff without even thinking about it:

  • serialization when we JSON.stringify(myObj) an object; for instance, when saving an object at a given state in the local storage of the browser
  • deserialization when we JSON.parse(someENCODEDString); for instance, when we retrieve the same object we stored in the local storage; here I insist on the term ENCODED as it will help you understand what protocol buffers do
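To make that round trip concrete, here is a minimal PHP sketch of the same serialize/deserialize cycle using json_encode and json_decode (the PHP equivalents of JSON.stringify and JSON.parse; the book data is just a made-up example):

```php
<?php

// serialization: turn a PHP array (our "object") into an encoded string
$book = ["title" => "Moby Dick", "author" => "Herman Melville"];
$encoded = json_encode($book);

// deserialization: rebuild the original structure from the encoded string
$decoded = json_decode($encoded, true);

var_dump($decoded["title"]); // string(9) "Moby Dick"
```

The encoded string is what actually travels over the network or sits in storage; protocol buffers do the same job, just with a denser binary encoding instead of JSON text.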

So, to summarize: protocol buffers perform better than JSON because the payload of the encoded transferred data is lighter with the former, and deserializing this format is faster than deserializing JSON.

ok, cool, but why use this? I have a decent Internet connection and the JSON format is really handy

If you don't care about the size of the data you transfer and how fast it's decoded (shame on you), consider this: protocol buffers provide an easy way to write common data access objects across platforms and languages; moreover, you can define entire RPC services with them.

Imagine that you have multiple web apps/APIs that need to talk together at some point but are written in different languages (PHP, JS, Python, you name it); somehow these apps are built within the same domain logic and will share objects that always represent the same thing: for instance, an ecosystem of apps built around libraries and books will likely share the same definition of what a book is.

What, besides the good will of the developers working with you, guarantees that your definitions of these common objects won't diverge at some point? Don't you need some kind of guarantee that all apps in your ecosystem will always share the exact same definition of what x and y are?

That's where protocol buffers come to the rescue: you write what a book (the shared data access object) should look like once in a language-agnostic syntax and then you can import that definition in any codebase using the right tools for that: the protoc compiler and the protocol buffers runtime for your language.

These tools will help you generate code in your target language.

let's get to it: installing the whole shebang

I will be quite short in this section: just follow the guidelines provided by Google.

This means:

  • installing the compiler for your platform (Mac, Windows, Linux) from the latest releases
  • installing the runtime(s) that will allow you to use protocol buffers in your target language(s)

I'll go for the PHP runtime. The readme gives you 2 choices: either install the C extension or install a Composer package (the C extension being more performant and the Composer package more portable across platforms).

Installing the PHP runtime as a package is as simple as composer require google/protobuf in your project.

now what ? now let's write a proto

Protocol buffers or, if you prefer, your apps' shared definitions of what an object or a service should look like, are stored in .proto files.

Let's write a simplistic one for our book example, in a directory called protos at the root of the project in which you installed the google/protobuf package:

book.proto =>

syntax = "proto3";

package protos.models;

/**
    a book is what represents the main object shared within our ecosystem
 */
message Book {
  // the title of the book
  string title = 1;
  // the person who wrote the book
  string author = 2;
  // the year when the book was first published
  int32 year_of_publication = 3;
}

Let's break this down:

  • syntax = "proto3"; declares the protocol buffers syntax version we're using; currently we're at 3
  • package protos.models; will be translated into a PHP namespace when you generate the PHP code based on that definition with protoc; this means that, when we run the command to do that, a Protos/Models/Book.php file will be created; you can name your packages however you see fit
  • message Book {: a message is the object you define; it is called so because it is destined to be transferred; a message can contain 0 to n fields
  • string title = 1;: this is how you define a field: you give it its type (the full list of available types is here), then its name, and finally its number; field numbers identify the field in the message binary format and are meant to be stable, meaning they should not change over time as your proto definitions evolve
  • you may have noticed that I wrote a lot of comments that seem redundant, but this is important, as protoc will generate doc blocks for you when you compile that definition into a regular PHP class (or any other language); besides, since these protos are meant to represent the core business logic of your organization, being as explicit and clear as one can be can't hurt
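To give an idea of where definitions can go from here, this is a hedged sketch (the genres and format fields are hypothetical additions, not part of our running example) showing two other common field kinds, repeated fields and enums:

```proto
syntax = "proto3";

package protos.models;

message Book {
  string title = 1;
  string author = 2;
  int32 year_of_publication = 3;
  // a repeated field holds 0 to n values of the same type
  repeated string genres = 4;
  // an enum constrains a field to a known set of values
  Format format = 5;

  // in proto3, the first enum value must be 0
  enum Format {
    FORMAT_UNSPECIFIED = 0;
    PAPERBACK = 1;
    HARDCOVER = 2;
    EBOOK = 3;
  }
}
```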

yay I wrote a proto, now I'm lazy and I don't want to write PHP to reflect that

So far, we have at the root of our project:

  • a composer.json (I assume you installed the dependencies)
  • a protos/book.proto file containing the definition of what a book should look like

Let's add to that a src folder for our human-written PHP code and let's also configure our autoloading settings in the composer.json:

    //
    "autoload": {
        "psr-4": {
            "App\\": "src/",
            "Protos\\": "Protos/",
            "GPBMetadata\\": "GPBMetadata/",
            "Tests\\": "tests/"
        }
    },
    //

While you're at it, you can require phpunit/phpunit in your dev dependencies and create a tests folder at the root of the project... but wait! I did not create a Protos or a GPBMetadata folder in my project 🤔. Let's see why.

When I run protoc --proto_path=./protos --php_out=./ protos/*.proto, it creates a Protos/Models folder structure containing our ready-to-use auto-generated Book.php class! It also creates a GPBMetadata folder, which contains a meta counterpart of the generated class that allows the google/protobuf package to interact with this generated code when you use it.

As you can see, our Book.php is a full-fledged class that inherits from Google's Message class. You should never modify the generated classes or the metadata around them in any way other than re-running protoc after you have modified your protos; doing so would defeat the purpose of having unified data access objects across languages and codebases...
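To make that concrete, here is a simplified, hand-written stand-in sketching the kind of class shape protoc produces: the real generated file extends Google's Message class and carries the binary encoding logic; this plain-PHP stand-in (BookSketch is a made-up name) only mimics the accessors:

```php
<?php

// hypothetical stand-in for the generated Protos\Models\Book class;
// the actual generated class extends \Google\Protobuf\Internal\Message
class BookSketch
{
    private string $title = '';
    private string $author = '';

    public function getTitle(): string
    {
        return $this->title;
    }

    // generated setters return $this, which allows chained calls
    public function setTitle(string $var): self
    {
        $this->title = $var;
        return $this;
    }

    public function getAuthor(): string
    {
        return $this->author;
    }

    public function setAuthor(string $var): self
    {
        $this->author = $var;
        return $this;
    }
}

$book = (new BookSketch())->setTitle("Moby Dick")->setAuthor("Herman Melville");
echo $book->getTitle(); // Moby Dick
```

Nothing exotic: one private property, one getter, and one fluent setter per field in the proto, with doc blocks generated from your proto comments.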

I want to test this !

In a real-world app, you would send data with this generated code over Pub/Sub or gRPC to benefit from the performance of this format. Here, we are just going to write a test that shows how to encode a data access object created with protocol buffers and how to decode it (from, supposedly, another app that is waiting for that message):

<?php

namespace Tests;

use Protos\Models\Book;
use PHPUnit\Framework\TestCase;

class BookTest extends TestCase
{
    /**
     * @test
     */
    public function encodedTitleGetsProperlyDecoded()
    {
        $expected = "Moby Dick";
        // we create a book instance
        $book = new Book();
        // we set the title of the book using an auto-generated setter already documented with doc blocks
        $book->setTitle($expected);
        // now let's prep our object for transfer by encoding it
        $bookSerialized = $book->serializeToString();
        // let's create an empty book message that will serve as a placeholder when we decode the data
        $actual = new Book();
        try {
            // this method, present on all classes that extend Google's Message, hydrates our empty book with the contents of the serialized one (this is what happens when apps communicate in the real world)
            $actual->mergeFromString($bookSerialized);
        } catch (\Exception $e) {
            $this->fail("failed decoding the book payload");
        }
        $this->assertSame($expected, $actual->getTitle());
    }
}

While this whole thing may seem like a lot of effort just to write models in a PHP app, trust me, it:

  • will make your life easier when different codebases need to talk together using the same objects
  • will improve the network and processing performance of your apps, especially when dealing with a lot of data transfers
  • will open the door to exciting technologies like Pub/Sub and gRPC, to name a few

When learning, we tend to dive into the cool stuff first, like how Pub/Sub works, and leave aside technical underlying foundations that look more boring, like protocol buffers. Don't do that.

Instead, play around with protoc and the various field types in a proto; create protos and use them in a PHP project, then do the same thing in NodeJS and try to get the PHP and NodeJS apps to communicate by sharing the same proto-generated object data over the network...

Next, you'll be able to get acquainted with schema-driven development and learn how to centralize all of your proto definitions in one place with buf (look it up, it's fascinating), alongside the good practices for implementing that kind of workflow.

I hope you enjoyed learning about protocol buffers, see you then!
