DEV Community

nadirbasalamah

Introduction to Protocol Buffers

Many data formats, such as XML and JSON, can be used for communication between applications. These two formats are common in application development, but they have disadvantages in performance and data size. One alternative that addresses these disadvantages is protocol buffers.

Protocol Buffers

Protocol Buffers is a mechanism developed by Google for serializing structured data. Using Protocol Buffers has several advantages:

  • The data structure is tidy and more manageable.
  • Can be used for RPC communication.
  • Fields are typed, so the schema itself provides basic validation of the data structure.
  • Better performance compared to XML and JSON.
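
To make the data size claim more concrete, here is a minimal sketch in Go that writes one record by hand in the protocol buffers wire format and compares its size with the JSON form of the same record. The car struct, its sample values, and the helper names are assumptions made for this illustration; real applications use generated code instead of encoding by hand.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// car is a hypothetical record used only for this size comparison.
type car struct {
	ID           int32  `json:"id"`
	Manufacturer string `json:"manufacturer"`
	Name         string `json:"name"`
	IsNew        bool   `json:"is_new"`
}

// appendVarint appends v as a base-128 varint (7 bits per byte, low bits first).
func appendVarint(b []byte, v uint64) []byte {
	for v >= 0x80 {
		b = append(b, byte(v)|0x80)
		v >>= 7
	}
	return append(b, byte(v))
}

// appendKey appends a field key: (tag number << 3) | wire type.
func appendKey(b []byte, tag, wireType uint64) []byte {
	return appendVarint(b, tag<<3|wireType)
}

// appendString appends a length-delimited (wire type 2) string field.
func appendString(b []byte, tag uint64, s string) []byte {
	b = appendKey(b, tag, 2)
	b = appendVarint(b, uint64(len(s)))
	return append(b, s...)
}

// encodeCar writes every field by hand. Real proto3 encoders also skip
// fields that hold their default value; this sketch does not.
func encodeCar(c car) []byte {
	var b []byte
	b = appendKey(b, 1, 0) // tag 1, wire type 0 (varint)
	b = appendVarint(b, uint64(c.ID))
	b = appendString(b, 2, c.Manufacturer)
	b = appendString(b, 3, c.Name)
	b = appendKey(b, 5, 0) // tag 5, wire type 0 (varint)
	if c.IsNew {
		return appendVarint(b, 1)
	}
	return appendVarint(b, 0)
}

// encodeJSON returns the JSON form of the same record.
func encodeJSON(c car) ([]byte, error) {
	return json.Marshal(c)
}

func main() {
	c := car{ID: 150, Manufacturer: "Toyota", Name: "Supra", IsNew: true}
	pb := encodeCar(c)
	js, err := encodeJSON(c)
	if err != nil {
		fmt.Println("JSON encoding failed:", err)
		return
	}
	fmt.Printf("protobuf: %d bytes (% x)\n", len(pb), pb)
	fmt.Printf("JSON:     %d bytes (%s)\n", len(js), js)
}
```

The binary form is smaller mainly because field names are replaced by small tag numbers and numbers are stored in a compact binary form instead of text.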

Protocol Buffers also has a disadvantage:

  • Only some programming languages are supported, such as Java, JavaScript, Go, and C++. Support for other programming languages may be added in the future.

Setup

To use protocol buffers in a programming language, the protocol buffers compiler (protoc) is needed. Download the compiler from the official Protocol Buffers releases page, choosing the archive that matches your operating system. For example, on 64-bit Windows the file to download is protoc-3.17.3-win64.zip.

For Windows users, follow these steps:

  1. Download the protocol buffers compiler for Windows (example: protoc-3.17.3-win64.zip).

  2. Extract the file into a folder called proto3. This folder can be placed in any location.

  3. Open the Start menu, search for "environment variables", and choose Edit the system environment variables.

  4. Click the Environment Variables... button.

  5. Select Path under System variables.

  6. Click New, then add the location of the bin folder inside the proto3 folder (example: D:\proto3\bin).

  7. Click OK.

After the path variable is added, check the installation with the protoc --version command. If the version is printed, the compiler is installed correctly.
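
The same check can be done from a program. This is a small sketch in Go that looks for protoc on the PATH and runs protoc --version; the function name toolVersion is an assumption made for this example.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// toolVersion finds a command on the PATH and returns the trimmed output
// of running it with the --version flag.
func toolVersion(name string) (string, error) {
	path, err := exec.LookPath(name)
	if err != nil {
		return "", fmt.Errorf("%s not found on PATH: %w", name, err)
	}
	out, err := exec.Command(path, "--version").Output()
	if err != nil {
		return "", fmt.Errorf("running %s --version: %w", name, err)
	}
	return strings.TrimSpace(string(out)), nil
}

func main() {
	version, err := toolVersion("protoc")
	if err != nil {
		fmt.Println("protoc check failed:", err)
		return
	}
	fmt.Println("found:", version)
}
```

If the first error is reported, the bin folder is most likely missing from the Path variable, or the terminal was opened before the variable was changed.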

Creating a Protocol Buffers Definition

This tutorial uses Protocol Buffers version 3 (proto3).

A message definition has the following general form:

message message_name {
    data_type field_name = tag;
    ..
}

In this example, protocol buffers are used to define a data structure for a car entity.

// define the syntax type of protocol buffers
// the syntax type is proto3
syntax = "proto3";

// create a message
message Car {
  int32 id = 1;
  string manufacturer = 2;
  string name = 3;
  float mileage = 4;
  bool is_new = 5;
}

By convention, message, enum, and service names use CamelCase with an initial capital letter, such as SearchRequest, while field names use lowercase words separated by underscores, such as zip_code.

These are the basic data types that can be used in protocol buffers.

Data Type    Description
int32        32-bit signed integer
string       Text (UTF-8 or 7-bit ASCII)
bool         Boolean value: true or false
float        32-bit floating-point number
uint32       32-bit unsigned integer (zero or positive)

All supported data types are listed in the official proto3 language guide.
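
As an illustration of what these types mean in practice, here is a short sketch in Go, where each of the scalar types above has a directly corresponding built-in type. The scalars struct and the two range helpers are hand-written assumptions for this example, not generated code.

```go
package main

import (
	"fmt"
	"math"
)

// scalars is a hand-written struct whose fields use the Go types that
// correspond to the proto scalar types in the table above.
type scalars struct {
	Count   int32   // int32
	Label   string  // string
	Enabled bool    // bool
	Ratio   float32 // float
	Total   uint32  // uint32
}

// fitsInt32 reports whether v can be stored in an int32 field.
func fitsInt32(v int64) bool {
	return v >= math.MinInt32 && v <= math.MaxInt32
}

// fitsUint32 reports whether v can be stored in a uint32 field.
func fitsUint32(v int64) bool {
	return v >= 0 && v <= math.MaxUint32
}

func main() {
	var s scalars
	fmt.Printf("zero values: %+v\n", s)
	fmt.Println("-1 fits int32: ", fitsInt32(-1))
	fmt.Println("-1 fits uint32:", fitsUint32(-1))
}
```

The second check shows the difference between the two integer types: a negative number is valid for int32 but not for uint32.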

Enums can also be defined in protocol buffers when needed. This is the basic syntax for creating an enum.

enum enum_name {
  enum_value = tag;
}

In the following definition, an enum called CarType is created.

// define the syntax type of protocol buffers
// the syntax type is proto3
syntax = "proto3";

// create a message
message Car {
  int32 id = 1;
  string manufacturer = 2;
  string name = 3;
  float mileage = 4;
  bool is_new = 5;
  uint32 vin = 6;

  // create an enum
  enum CarType {
    UNKNOWN = 0;
    RACE_CAR = 1;
    ROAD_CAR = 2;
  }
  // using enum
  CarType car_type = 7;
}

Another supported data type is map, which stores key-value pairs. This is an example of map usage.

syntax = "proto3";

message Student {
  string name = 1;
  uint32 student_id = 2;
  // using map
  // key: string
  // value: string
  map<string, string> courses = 3;
}

A message that has already been defined can also be used as a data type. This is an example of using another message as a data type.

syntax = "proto3";

message Student {
  string name = 1;
  uint32 student_id = 2;

  // using Course message as a data type
  repeated Course courses = 3;
}

message Course {
  string code = 1;
  string name = 2;
}

To import another protocol buffers file, use its path relative to the root folder that stores the protocol buffers files. The following example shows the import mechanism; the imported file is person.proto.

The person.proto file in the models folder:

syntax = "proto3";

message Person {
  string name = 1;
  string address = 2;
  int32 age = 3;
}

The car.proto file uses the imported Person message:

// using proto3 syntax
syntax = "proto3";

// import person.proto
import "models/person.proto";

// create a message
message Car {
  int32 id = 1;
  string manufacturer = 2;
  string name = 3;
  float mileage = 4;
  bool is_new = 5;
  uint32 vin = 6;

  // create an enum
  enum CarType {
    UNKNOWN = 0;
    RACE_CAR = 1;
    ROAD_CAR = 2;
  }
  CarType car_type = 7;

  // using person that is imported
  Person owner = 8;
}

Rules in Protocol Buffers

Several field rules can be used in protocol buffers.

Rule        Description
repeated    The field can hold zero or more values
oneof       At most one of the fields in the group can be set at a time

By default in proto3, a field without a rule is singular: it holds at most one value, and an unset field takes the default value of its type.

The repeated rule creates a field that can store many values, like a list or an array. For example, the field repeated int32 numbers = 1; means that numbers can store many values of type int32.

This is an example of rule usage in protocol buffers.

syntax = "proto3";

message Student {
  string name = 1;
  string student_id = 2;
  repeated Course courses = 3;
  uint32 member_id = 4;
  oneof card_number {
    string student_card_number = 5;
    string id_card_number = 6;
  }
}

message Course {
  string code = 1;
  string name = 2;
}

Based on the code above, the courses field can contain many values of type Course. The oneof rule is applied to card_number, which means that at most one of student_card_number and id_card_number can be set at a time; setting one clears the other.
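
One way to picture these two rules is to model them by hand in Go, the language used later in this article. In the sketch below, the repeated field becomes a slice and the oneof group becomes an interface that only two wrapper types implement. All type and function names here are assumptions made for illustration; the code that protoc generates is similar in spirit but not identical.

```go
package main

import "fmt"

// Course mirrors the Course message.
type Course struct {
	Code, Name string
}

// cardNumber is implemented only by the two oneof members below,
// so a Student can hold at most one of them at a time.
type cardNumber interface{ isCardNumber() }

// StudentCardNumber and IDCardNumber wrap the two oneof members.
type StudentCardNumber struct{ Value string }
type IDCardNumber struct{ Value string }

func (StudentCardNumber) isCardNumber() {}
func (IDCardNumber) isCardNumber()      {}

// Student mirrors the Student message.
type Student struct {
	Name       string
	Courses    []Course   // repeated Course courses = 3;
	CardNumber cardNumber // oneof card_number { ... }
}

// describeCard reports which oneof member is set, if any.
func describeCard(s Student) string {
	switch c := s.CardNumber.(type) {
	case StudentCardNumber:
		return "student card " + c.Value
	case IDCardNumber:
		return "ID card " + c.Value
	default:
		return "no card number set"
	}
}

func main() {
	s := Student{Name: "Nathan Mckane"}
	fmt.Println(describeCard(s))

	s.CardNumber = IDCardNumber{Value: "12345321"}
	fmt.Println(describeCard(s))

	// Assigning the other member replaces the previous one.
	s.CardNumber = StudentCardNumber{Value: "S-001"}
	fmt.Println(describeCard(s))

	s.Courses = append(s.Courses, Course{Code: "C001", Name: "Algorithm"})
	fmt.Println("courses:", len(s.Courses))
}
```

Because CardNumber is a single interface value, it is impossible to store both card numbers at once, which matches the behaviour of oneof.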

Changes in Protocol Buffers

When a protocol buffers definition that is already used by an application needs to change, several rules apply.

  • Changing a field's tag number is not allowed.
  • The field name can be changed, because the binary format identifies fields by tag number rather than by name.
  • When a new field is added, data written without it is read with the field set to the default value of its data type. The default values for each data type are listed in the official proto3 language guide.

In this example, a new field is added to a message called Blog.

Before the new field is added:

syntax = "proto3";

message Blog {
  string title = 1;
  string author = 2;
  string content = 3;
}

After the new field is added:

syntax = "proto3";

message Blog {
  string title = 1;
  string author = 2;
  string content = 3;
  // add new field
  string category = 4;
}
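
The following sketch in Go shows why adding a field is a safe change. It encodes a Blog by hand with only the three original fields, then decodes the bytes with a reader that also knows about category. The helper names and sample values are assumptions made for this illustration, and the decoder handles string fields only; real applications use the generated code.

```go
package main

import (
	"errors"
	"fmt"
)

// Blog mirrors the updated message, where category (tag 4) is the new field.
type Blog struct {
	Title, Author, Content, Category string
}

// appendVarint appends v as a base-128 varint.
func appendVarint(b []byte, v uint64) []byte {
	for v >= 0x80 {
		b = append(b, byte(v)|0x80)
		v >>= 7
	}
	return append(b, byte(v))
}

// appendString appends a string field: key, length, then the bytes.
func appendString(b []byte, tag uint64, s string) []byte {
	b = appendVarint(b, tag<<3|2) // wire type 2 = length-delimited
	b = appendVarint(b, uint64(len(s)))
	return append(b, s...)
}

// encodeOldBlog writes only the fields that existed before the change.
func encodeOldBlog(title, author, content string) []byte {
	b := appendString(nil, 1, title)
	b = appendString(b, 2, author)
	return appendString(b, 3, content)
}

// readVarint decodes one varint and reports how many bytes it used.
func readVarint(b []byte) (v uint64, n int, err error) {
	for i := 0; i < len(b) && i < 10; i++ {
		v |= uint64(b[i]&0x7f) << (7 * uint(i))
		if b[i] < 0x80 {
			return v, i + 1, nil
		}
	}
	return 0, 0, errors.New("truncated or oversized varint")
}

// decodeBlog reads string fields by tag number. Unknown tags are skipped,
// and fields missing from the input keep Go's zero value "".
func decodeBlog(b []byte) (Blog, error) {
	var blog Blog
	for len(b) > 0 {
		key, n, err := readVarint(b)
		if err != nil {
			return blog, err
		}
		b = b[n:]
		if key&7 != 2 {
			return blog, errors.New("this sketch only handles string fields")
		}
		size, n, err := readVarint(b)
		if err != nil {
			return blog, err
		}
		b = b[n:]
		if size > uint64(len(b)) {
			return blog, errors.New("truncated field")
		}
		value := string(b[:size])
		b = b[size:]
		switch key >> 3 {
		case 1:
			blog.Title = value
		case 2:
			blog.Author = value
		case 3:
			blog.Content = value
		case 4:
			blog.Category = value
		}
	}
	return blog, nil
}

func main() {
	old := encodeOldBlog("Intro", "nadir", "Hello")
	blog, err := decodeBlog(old)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("title=%q category=%q\n", blog.Title, blog.Category)
}
```

The old data contains no field with tag 4, so category ends up as the empty string, which is the default value for the string type.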

  • If a field is no longer used, either rename it with an OBSOLETE_ prefix or reserve its tag number with the reserved keyword. Using reserved is recommended, because it prevents the tag number from being reused by accident.

This is an example of the reserved keyword applied to the removed student_id and lecturer_name fields.

syntax = "proto3";

message Course {
  string code = 1;
  string name = 2;
  //   string student_id = 3;
  //   string lecturer_name = 4;

  // student_id field with tag number 3 is not used
  reserved 3;
  // field called lecturer_name is not used
  reserved "lecturer_name";
}
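
The effect of reserved can be imitated with a small check in Go. The real check is done by the protocol buffers compiler when the file is compiled; the types, function names, and the credits field below are assumptions made only to illustrate the idea.

```go
package main

import "fmt"

// reserved holds the tag numbers and field names that must not be reused.
type reserved struct {
	numbers map[int]bool
	names   map[string]bool
}

// courseReserved matches the Course message: reserved 3; reserved "lecturer_name";
func courseReserved() reserved {
	return reserved{
		numbers: map[int]bool{3: true},
		names:   map[string]bool{"lecturer_name": true},
	}
}

// check returns an error when a new field would reuse a reserved tag or name.
func (r reserved) check(name string, tag int) error {
	if r.numbers[tag] {
		return fmt.Errorf("field %q uses reserved tag number %d", name, tag)
	}
	if r.names[name] {
		return fmt.Errorf("field name %q is reserved", name)
	}
	return nil
}

func main() {
	course := courseReserved()
	fmt.Println(course.check("credits", 3))       // rejected: tag 3 is reserved
	fmt.Println(course.check("lecturer_name", 5)) // rejected: the name is reserved
	fmt.Println(course.check("credits", 5))       // accepted: prints <nil>
}
```

Rejecting the reuse matters because old data may still contain values stored under tag 3, and a new field with the same tag would read them with the wrong meaning.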

  • Enum values can be added, changed, and removed.

In this example, the enum is changed.

Before the change:

syntax = "proto3";

message User {
  string name = 1;

  // using enum
  enum Role {
    UNKNOWN = 0;
    USER = 1;
  }
}

After the change:

syntax = "proto3";

message User {
  string name = 1;

  // using enum
  enum Role {
    // default value for enum
    UNKNOWN = 0;
    // USER = 1;
    STAFF = 2;
    // add new value inside enum
    ADMIN = 3;

    // remove "USER" value from enum
    reserved "USER";
    reserved 1;
  }
  Role role = 2;
}
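
An enum is stored as a number, so old data can still contain a value that was later removed. This hand-written sketch in Go models the changed Role enum and shows one way to handle such a value. The Go names are assumptions made for this example, not the output of the compiler.

```go
package main

import (
	"fmt"
	"strconv"
)

// Role mirrors the changed enum: USER = 1 was removed, STAFF and ADMIN were added.
type Role int32

const (
	RoleUnknown Role = 0
	RoleStaff   Role = 2
	RoleAdmin   Role = 3
)

var roleNames = map[Role]string{
	RoleUnknown: "UNKNOWN",
	RoleStaff:   "STAFF",
	RoleAdmin:   "ADMIN",
}

// String returns the enum name, or the plain number for values that
// this version of the definition does not know.
func (r Role) String() string {
	if name, ok := roleNames[r]; ok {
		return name
	}
	return strconv.Itoa(int(r))
}

func main() {
	var unset Role
	fmt.Println(unset)   // prints UNKNOWN, the default value
	fmt.Println(Role(3)) // prints ADMIN
	fmt.Println(Role(1)) // prints 1: old data written when USER = 1 existed
}
```

Reserving the number 1 means that no later value can take it over, so old data is never mistaken for a new role.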

Using Protocol Buffers

Protocol buffers can be used together with programming languages such as Java, JavaScript, Go, C++ and Dart. In this example, protocol buffers are used with the Go programming language.

The Go project is created with the following command. Make sure the module path matches your own repository.

go mod init github.com/nadirbasalamah/protodemo

Add the dependencies needed to use protocol buffers.

go get github.com/golang/protobuf
go get google.golang.org/protobuf

The protoc command used later also needs the Go code generator plugin (protoc-gen-go) on the PATH. With a recent Go toolchain, it can be installed with go install google.golang.org/protobuf/cmd/protoc-gen-go@latest.

Create a new protocol buffers file called student.proto in the src/model directory. The go_package option is added so that the generated code belongs to the student package.

syntax = "proto3";
package tutorial;

// only for Golang
option go_package = "model;student";

message Student {
  string name = 1;
  string student_id = 2;
  repeated Course courses = 3;
  uint32 member_id = 4;
  oneof card_number {
    string student_card_number = 5;
    string id_card_number = 6;
  }
}

message Course {
  string code = 1;
  string name = 2;
}

Generate Go code from the protocol buffers file. In the command below, -I=src/ sets src/ as the root directory for resolving protocol buffers files, --go_out=src/ stores the generated code under src/, and src/model/student.proto is the file to compile.

protoc -I=src/ --go_out=src/ src/model/student.proto

If a file called student.pb.go exists in src/model, the code generation succeeded.

The generated code is then used in the main.go file.

package main

import (
    "fmt"

    student "github.com/nadirbasalamah/protodemo/src/model"
)

func main() {
    // create a student entity
    var newStudent student.Student = student.Student{
        Name:      "Nathan Mckane",
        StudentId: "RVN2021",
        CardNumber: &student.Student_IdCardNumber{
            IdCardNumber: "12345321",
        },
    }

    // print some value from fields
    fmt.Println("Student Data")
    fmt.Println("Name: ", newStudent.GetName())
    fmt.Println("Student ID: ", newStudent.GetStudentId())
    fmt.Println("Card Number: ", newStudent.GetCardNumber())
}

Output

Student Data
Name:  Nathan Mckane
Student ID:  RVN2021
Card Number:  &{12345321}

Based on the code above, the code generated from the protocol buffers file is used through the student package. A value of the Student struct is created, and its field values are read through the generated getter methods.
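
The names in the generated code follow from the field names in the protocol buffers file: student_id becomes StudentId in Go, and the JSON form of a message uses a lowerCamelCase key such as studentId. The sketch below is a simplified approximation of that mapping for plain lowercase names with underscores. The function names are assumptions made for this example, and the real code generator handles more cases.

```go
package main

import (
	"fmt"
	"strings"
)

// goName converts a snake_case field name to CamelCase, e.g. student_id -> StudentId.
func goName(field string) string {
	var sb strings.Builder
	for _, part := range strings.Split(field, "_") {
		if part == "" {
			continue
		}
		sb.WriteString(strings.ToUpper(part[:1]) + part[1:])
	}
	return sb.String()
}

// jsonName converts a snake_case field name to lowerCamelCase, e.g. student_id -> studentId.
func jsonName(field string) string {
	name := goName(field)
	if name == "" {
		return name
	}
	return strings.ToLower(name[:1]) + name[1:]
}

func main() {
	for _, field := range []string{"name", "student_id", "id_card_number"} {
		fmt.Printf("%-16s Go: %-14s JSON: %s\n", field, goName(field), jsonName(field))
	}
}
```

The getter methods add a Get prefix to the Go name, which is why the example above calls GetStudentId for the student_id field.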

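Protocol buffers messages can also be converted into other data formats such as JSON, and back again. The example below uses the jsonpb package from github.com/golang/protobuf; note that this package is deprecated in favor of google.golang.org/protobuf/encoding/protojson.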

package main

import (
    "fmt"
    "log"

    "github.com/golang/protobuf/jsonpb"
    "github.com/golang/protobuf/proto"
    student "github.com/nadirbasalamah/protodemo/src/model"
)

func main() {
    // create a student entity
    var newStudent student.Student = student.Student{
        Name:      "Nathan Mckane",
        StudentId: "RVN2021",
        CardNumber: &student.Student_IdCardNumber{
            IdCardNumber: "12345321",
        },
    }

    // add some courses
    newStudent.Courses = []*student.Course{
        {
            Code: "C001",
            Name: "Algorithm",
        },
        {
            Code: "C002",
            Name: "Data Structure",
        },
    }

    // print courses
    fmt.Println("Courses")
    for _, v := range newStudent.Courses {
        fmt.Println(v)
    }

    // convert to JSON
    var result string = convertToJSON(&newStudent)
    fmt.Println("to JSON: ", result)

    // convert to protocol buffers from JSON data
    var jsonData string = `{"name":"Ryan Cooper","studentId":"RPC2007","idCardNumber":"98753443"}`
    var protoResult student.Student = student.Student{}

    // call the function
    convertToProto(jsonData, &protoResult)
    fmt.Println("to proto: ", &protoResult)

}

func convertToJSON(pb proto.Message) string {
    var marshaler jsonpb.Marshaler = jsonpb.Marshaler{}
    result, err := marshaler.MarshalToString(pb)

    if err != nil {
        // log.Fatalln exits the program, so no return is needed here
        log.Fatalln("Can't convert to JSON:", err)
    }
    return result
}

func convertToProto(data string, pb proto.Message) {
    err := jsonpb.UnmarshalString(data, pb)
    if err != nil {
        log.Fatalln("Can't convert to proto:", err)
    }
}


Output

Courses
code:"C001"  name:"Algorithm"
code:"C002"  name:"Data Structure"
to JSON:  {"name":"Nathan Mckane","studentId":"RVN2021","courses":[{"code":"C001","name":"Algorithm"},{"code":"C002","name":"Data Structure"}],"idCardNumber":"12345321"}
to proto:  name:"Ryan Cooper"  student_id:"RPC2007"  id_card_number:"98753443"

Notes

Protocol buffers can be used by following these steps:

  1. Create a protocol buffers file.

  2. Generate code from the protocol buffers file for the programming language that is used.

  3. Use the generated code in the application.

The code example of using protocol buffers in Go can be checked here.

I hope this article helps you learn protocol buffers. If you have any thoughts, feel free to share them in the discussion section.
