Pigeon Codeur

Posted on Nov 6, 2024

Building a Custom C++ Serializer for Efficient Data Handling

#cpp #performance #gamedev #data

Serialization is fundamental to engine development and data-driven applications, enabling complex objects to be saved, transferred, and loaded easily. In my custom game engine, I needed a serializer that could manage various types of data structures while ensuring efficient and readable output. This post explores the design and functionality of the C++ serializer I built, capable of handling basic and custom data types with ease.

This guide will help you understand how to create a flexible serializer, including practical examples and test cases to ensure reliability.

Converting Complex Data Structures to a Readable Format

Serialization allows developers to convert complex C++ structs into a format that’s both human-readable and easy to restore. This is particularly useful in game engines where game objects, configurations, and states need to be saved and reloaded frequently.

For example, consider the following Texture2DComponent struct in a game engine:

struct Texture2DComponent : public Ctor
{
    std::string textureName;
    float opacity = 1.0f;
    constant::Vector3D overlappingColor = {0.0f, 0.0f, 0.0f};
    float overlappingColorRatio = 0.0f;
};

When serialized, this struct might look like this in YAML:

Texture2DComponent:
    textureName: "exampleTexture"
    opacity: 0.8
    overlappingColor: [1.0, 0.0, 0.5]
    overlappingRatio: 0.75

This transformation makes the data easy to read and edit manually, which can be a huge advantage during development. It also helps when transferring data over a network. In this post, we’ll walk through how to build this serializer, exploring its design and extensibility.

Core Design and Structure of the Serializer

In this chapter, I’ll walk through the core design of the serializer, which includes three main classes:

Archive: The core class responsible for constructing serialized data.
UnserializedObject: A class for handling deserialized data, allowing easy access to serialized attributes and nested structures.
Serializer: The main manager that coordinates file handling, saving, and loading operations.

Each of these classes plays a critical role in ensuring that data can be serialized and deserialized in a structured, reliable manner. Let’s dive into each component.

The Archive Class: Building the Serialized Data

The Archive class acts as a container for the serialized string. It manages formatting, indentation, and data flow, ensuring that the serialized output is both readable and parsable.

Key Design Features of `Archive`:

Indentation Control:
- To keep the serialized output clean and readable, the Archive class uses an indentation level (indentLevel) that increases or decreases depending on the depth of the data structure. Each new line starts with the appropriate number of tabs based on indentLevel.
End of Line Struct:
- The Archive class includes an EndOfLine helper struct that handles line breaks and resets formatting flags (requestNewline and requestComma). This struct ensures that each data entry is properly formatted, with commas and line breaks applied where necessary.
Operator Overloading:
- Overloading the << operator allows Archive to handle multiple data types seamlessly. The template-based operator<< ensures that any type of data can be serialized, provided it has a matching serialize function or template specialization. This feature gives the serializer the flexibility to handle simple data types, custom components, and complex structures alike.

Here’s a snippet of the Archive class in action:

Archive& operator<<(const EndOfLine&)
{
    requestComma = true;
    requestNewline = true;
    return *this;
}

template <typename Type>
Archive& operator<<(const Type& rhs)
{
    if (requestNewline)
    {
        if (requestComma)
            container << ",";

        container << std::endl;
        container << std::string(*endOfLine.indentLevel, '\t');
        requestNewline = false;
    }

    container << rhs;
    return *this;
}

With these design choices, Archive keeps serialized data clean, ensuring that even complex structures are readable and formatted correctly.
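To make the deferred newline and comma logic concrete, here is a minimal sketch of the same idea. MiniArchive, open, close, and demoArchive are hypothetical names used only for illustration, and the comma flag is handled a little differently from the engine’s Archive: it is set after each value instead of inside the EndOfLine overload.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical stand-in for the post's Archive; names are illustrative only.
struct EndOfLine {};

class MiniArchive {
public:
    // Defer the line break: it is only written once more output follows.
    MiniArchive& operator<<(const EndOfLine&) {
        requestNewline = true;
        return *this;
    }

    template <typename Type>
    MiniArchive& operator<<(const Type& rhs) {
        if (requestNewline) {
            if (requestComma)
                container << ",";
            container << '\n' << std::string(indentLevel, '\t');
            requestNewline = false;
        }
        container << rhs;
        requestComma = true;  // a following line break now separates two entries
        return *this;
    }

    // Open a block: no comma after the brace, children are one level deeper.
    void open(const std::string& name) {
        *this << name << " {" << EndOfLine{};
        requestComma = false;
        ++indentLevel;
    }

    // Close a block: the brace sits on its own line without a leading comma.
    void close() {
        --indentLevel;
        requestComma = false;
        *this << EndOfLine{} << "}";
    }

    std::string str() const { return container.str(); }

private:
    std::ostringstream container;
    std::size_t indentLevel = 0;
    bool requestNewline = false;
    bool requestComma = false;
};

// Usage: build a small block with two entries.
std::string demoArchive() {
    MiniArchive archive;
    archive.open("Texture2DComponent");
    archive << "opacity: " << 0.8 << EndOfLine{};
    archive << "ratio: " << 0.75 << EndOfLine{};
    archive.close();
    return archive.str();
}
```

Because the line break is only written when the next value arrives, the archive can decide at that moment whether a comma is needed, so the last entry of a block never ends with a dangling comma.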

The UnserializedObject Class: Parsing Deserialized Data

The UnserializedObject class represents the deserialized data, allowing us to access fields by name and handle nested structures. It holds metadata such as object names and types, making it easier to retrieve individual fields from serialized data.

Key Features of `UnserializedObject`:

Attribute Handling:
- UnserializedObject includes a helper method, getAsAttribute, that retrieves individual attributes within serialized data. This is particularly useful for objects containing a mix of fields, as each attribute is stored with a unique name for easy access.
Error Checking and Logging:
- The UnserializedObject class performs checks to validate serialized data during deserialization, including checks for missing or mismatched braces, delimiters, and attribute names. Errors are logged to simplify debugging.
Overloaded Operators for Field Access:
- Operator overloads, such as operator[], make it easy to access attributes by name or index. This approach simplifies the code for handling nested data, allowing for intuitive retrieval of deserialized objects.

Here’s an example of how UnserializedObject handles attribute retrieval:

const UnserializedObject& UnserializedObject::operator[](const std::string& key) const
{
    // Capture the key by reference and take each child by const reference to avoid copies
    auto isObjectName = [&key](const UnserializedObject& obj) { return obj.objectName == key; };
    auto it = std::find_if(children.begin(), children.end(), isObjectName);

    if (it != children.end())
        return *it;

    LOG_ERROR(DOM, "Requested child '" + key + "' not present in the object");
    // Fallback: return the first child (this assumes children is not empty)
    return children[0];
}

With UnserializedObject, parsing serialized data becomes straightforward, supporting intuitive access to data fields while maintaining error handling and flexibility.
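One possible variation on this lookup, sketched below, is to return a shared "null" object when a key is missing, so the caller can test the result with isNull() instead of receiving an unrelated child. Node, nullNode, and makeTexture are hypothetical names and this is a simplified stand-in, not the engine’s UnserializedObject.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, simplified node type; not the engine's UnserializedObject.
struct Node {
    std::string objectName;
    std::string value;
    std::vector<Node> children;
    bool null = false;

    bool isNull() const { return null; }

    // Look up a child by name; a shared null node signals "not found".
    const Node& operator[](const std::string& key) const {
        auto it = std::find_if(children.begin(), children.end(),
                               [&key](const Node& child) { return child.objectName == key; });
        return it != children.end() ? *it : nullNode();
    }

private:
    // One immutable sentinel shared by every failed lookup.
    static const Node& nullNode() {
        static const Node instance{"", "", {}, true};
        return instance;
    }
};

// Usage: a root object with a single "opacity" child.
Node makeTexture() {
    Node root{"Texture2DComponent", "", {}, false};
    root.children.push_back(Node{"opacity", "0.8", {}, false});
    return root;
}
```

Since the sentinel has no children, chained lookups such as root["missing"]["deeper"] also end on the null node rather than touching an empty vector, which pairs naturally with the isNull() check used later in the Texture2DComponent example.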

The Serializer Class: Managing Files and Serialization Flow

The Serializer class orchestrates the entire serialization and deserialization process. It handles file input and output, reading and writing serialized data, and storing serialized objects in a serializedMap for easy retrieval.

Key Functions in `Serializer`:

File Management:
- The Serializer can read from and write to files, supporting both direct paths and file objects. It also has an optional auto-save feature (autoSave) that automatically writes serialized data to file when the program exits.
Version Control:
- Each serialized file includes a version header. This allows the Serializer to parse files according to different serialization formats if needed, supporting backward compatibility.
Serializing and Deserializing Objects:
- The Serializer class includes methods for serializing (serializeObject) and deserializing (deserializeObject) various data types. These methods use Archive and UnserializedObject instances to manage the data flow, ensuring that objects are serialized and deserialized consistently.

Here’s how Serializer initializes a file read:

void Serializer::readFile(const std::string& data)
{
    LOG_THIS_MEMBER(DOM);

    if (data.empty())
    {
        LOG_MILE(DOM, "Reading an empty file");
        return;
    }

    std::string line;
    std::istringstream stream(data);

    // First line of the file should always be the version number
    std::getline(stream, line);
    version = line;

    auto stringData = gulp(stream);
    LOG_INFO(DOM, stringData);
    serializedMap = readData(version, stringData);
}

The Serializer’s ability to handle file-based storage and version control makes it ideal for game development, where serialized data needs to be saved, loaded, and versioned consistently.
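The header handling in readFile can be isolated into a small function. The sketch below assumes that gulp simply returns everything left in the stream, which the post does not show. ParsedFile, readRemaining, and splitVersionHeader are hypothetical names.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical result type: the version line and everything after it.
struct ParsedFile {
    std::string version;
    std::string body;
};

// Assumed behaviour of a gulp-like helper: read all remaining characters.
std::string readRemaining(std::istream& stream) {
    return std::string(std::istreambuf_iterator<char>(stream),
                       std::istreambuf_iterator<char>());
}

ParsedFile splitVersionHeader(const std::string& data) {
    ParsedFile result;
    std::istringstream stream(data);

    // The first line is the version number.
    std::getline(stream, result.version);

    // Tolerate files saved with Windows line endings.
    if (!result.version.empty() && result.version.back() == '\r')
        result.version.pop_back();

    result.body = readRemaining(stream);
    return result;
}
```

Keeping the split separate from the parsing makes it easy to choose a parser based on the version string before any of the body is interpreted.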

The combination of Archive, UnserializedObject, and Serializer provides a powerful and flexible system for managing data in a structured, human-readable format. By controlling indentation, handling errors, and managing complex structures with ease, this serializer is a valuable tool for game development. It enables the efficient saving and loading of game state, assets, and configurations, making the development process smoother and more efficient.

In the next chapters, we’ll dive deeper into each component, exploring specific features like attribute handling, error checking, and practical examples for custom components.

Data Types and Extensibility

This custom C++ serializer is built to handle both basic data types and complex structures, a feature that’s particularly useful in game engines where data persistence and readability are essential. In this chapter, we’ll explore how basic types are serialized and deserialized, and how this system can be extended to support custom types.

The serializer achieves flexibility through template specializations for each type, allowing us to control precisely how different types of data are stored and retrieved. Let’s go through the handling of basic types first, and then look at how the system can easily accommodate custom types like vectors and models.

Basic Type Serialization and Deserialization

Each basic type has its own serialize and deserialize template specialization. This allows the serializer to convert each type to a string representation with a label, which is then used to identify the type during deserialization.

Below are some examples:

Boolean:

   template <>
   void serialize(Archive& archive, const bool& value) {
       LOG_THIS(DOM);
       std::string res = value ? "true" : "false";
       archive.setAttribute(res, "bool");
   }

   template <>
   bool deserialize(const UnserializedObject& serializedString) {
       LOG_THIS(DOM);
       auto attribute = serializedString.getAsAttribute();
       if (attribute.name != "bool") {
           LOG_ERROR(DOM, "Serialized string is not a bool (" << attribute.name << ")");
           return false;
       }
       return attribute.value == "true";
   }

The bool serializer converts the value to "true" or "false", which can be easily checked during deserialization.

Integer:

   template <>
   void serialize(Archive& archive, const int& value) {
       LOG_THIS(DOM);
       archive.setAttribute(std::to_string(value), "int");
   }

   template <>
   int deserialize(const UnserializedObject& serializedString) {
       LOG_THIS(DOM);
       int value = 0;
       auto attribute = serializedString.getAsAttribute();
       if (attribute.name != "int") {
           LOG_ERROR(DOM, "Serialized string is not an int (" << attribute.name << ")");
           return value;
       }
       std::stringstream sstream(attribute.value);
       sstream >> value;
       return value;
   }

Here, integers are converted to strings, with type validation during deserialization to ensure that data remains consistent.
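One thing to keep in mind with stream extraction is that it accepts a numeric prefix, so a value like 12abc is read as 12, and non-numeric text yields 0 unless the stream state is checked. If stricter validation is wanted, a C++17 alternative is std::from_chars. The helper below is a sketch of that idea; parseInt is a hypothetical name and not part of the serializer.

```cpp
#include <charconv>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical helper: returns std::nullopt unless the whole string is a valid int.
std::optional<int> parseInt(const std::string& text) {
    int value = 0;
    const char* first = text.data();
    const char* last = first + text.size();

    auto [ptr, ec] = std::from_chars(first, last, value);

    // Reject conversion errors (empty input, overflow) and trailing characters.
    if (ec != std::errc{} || ptr != last)
        return std::nullopt;
    return value;
}
```

A deserializer built on this could log an error and fall back to a default when the optional is empty, instead of silently accepting a partially parsed number.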

Floating Point Types:

   template <>
   void serialize(Archive& archive, const float& value) {
       LOG_THIS(DOM);
       archive.setAttribute(std::to_string(value), "float");
   }

   template <>
   float deserialize(const UnserializedObject& serializedString) {
       LOG_THIS(DOM);
       float value = 0;
       auto attribute = serializedString.getAsAttribute();
       if (attribute.name != "float") {
           LOG_ERROR(DOM, "Serialized string is not a float (" << attribute.name << ")");
           return value;
       }
       std::stringstream sstream(attribute.value);
       sstream >> value;
       return value;
   }

The floating-point serializers cover both float and double. Only the float specialization is shown here, and the double one follows the same pattern, converting the value to a string for easy readability and conversion.
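A detail worth knowing about std::to_string: before C++26 it formats floating-point values with six digits after the decimal point, so 0.8f is written as 0.800000 and very small values collapse to 0.000000. If an exact round trip matters, one option is to write max_digits10 significant digits through a stream. The helpers below are a sketch of that alternative with hypothetical names. They are not what the serializer does.

```cpp
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: write enough significant digits for an exact float round trip.
std::string floatToString(float value) {
    std::ostringstream out;
    out.precision(std::numeric_limits<float>::max_digits10);
    out << value;
    return out.str();
}

// Hypothetical counterpart: read the value back, defaulting to 0 on failure.
float floatFromString(const std::string& text) {
    std::istringstream in(text);
    float value = 0.0f;
    in >> value;
    return value;
}
```

The trade-off is readability: the stored text carries more digits than a hand-written value would, in exchange for restoring exactly the float that was saved.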

String:

   template <>
   void serialize(Archive& archive, const std::string& value) {
       LOG_THIS(DOM);
       archive.setAttribute(value, "string");
   }

   template <>
   std::string deserialize(const UnserializedObject& serializedString) {
       LOG_THIS(DOM);
       auto stringAttribute = serializedString.getAsAttribute();
       if (stringAttribute.name != "string") {
           LOG_ERROR(DOM, "String attribute name is not 'string' (" << stringAttribute.name << ")");
           return "";
       }
       return stringAttribute.value;
   }

Strings are handled directly, stored without conversion, and labeled as "string" for consistency during deserialization.
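Storing strings without conversion works as long as they never contain characters that the format itself uses. If the output wraps strings in double quotes, as in the sample output earlier, a quote or line break inside a texture name could confuse a parser. A common remedy is to escape on write and unescape on read. The two helpers below are a hypothetical sketch of that idea and are not part of the serializer described here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: escape quotes, backslashes, and newlines before writing.
std::string escapeString(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    for (char c : raw) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            default:   out += c;      break;
        }
    }
    return out;
}

// Hypothetical counterpart: undo the escaping after reading.
std::string unescapeString(const std::string& escaped) {
    std::string out;
    out.reserve(escaped.size());
    for (std::size_t i = 0; i < escaped.size(); ++i) {
        char c = escaped[i];
        if (c == '\\' && i + 1 < escaped.size()) {
            char next = escaped[++i];           // consume the escaped character
            out += (next == 'n') ? '\n' : next;
        } else {
            out += c;
        }
    }
    return out;
}
```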

Custom Type Support with Template Specializations

For game engines, custom data types like vectors and models are essential. Using template specializations, we can handle these types in the same way as basic types by creating specific serialize and deserialize functions for each custom type.

Vector2D:

   template <>
   void serialize(Archive& archive, const constant::Vector2D& vec2D) {
       LOG_THIS(DOM);
       archive.startSerialization("Vector 2D");
       serialize(archive, "x", vec2D.x);
       serialize(archive, "y", vec2D.y);
       archive.endSerialization();
   }

   template <>
   constant::Vector2D deserialize(const UnserializedObject& serializedString) {
       LOG_THIS(DOM);
       auto x = deserialize<float>(serializedString["x"]);
       auto y = deserialize<float>(serializedString["y"]);
       return constant::Vector2D{x, y};
   }

For Vector2D, each component (x and y) is serialized individually. During deserialization, each component is retrieved and used to reconstruct the vector.

Vector3D and Vector4D: Similar functions are defined for Vector3D and Vector4D, with each component (x, y, z, and w) serialized as a separate attribute. Here’s an example for Vector3D:

   template <>
   void serialize(Archive& archive, const constant::Vector3D& vec3D) {
       LOG_THIS(DOM);
       archive.startSerialization("Vector 3D");
       serialize(archive, "x", vec3D.x);
       serialize(archive, "y", vec3D.y);
       serialize(archive, "z", vec3D.z);
       archive.endSerialization();
   }

   template <>
   constant::Vector3D deserialize(const UnserializedObject& serializedString) {
       LOG_THIS(DOM);
       auto x = deserialize<float>(serializedString["x"]);
       auto y = deserialize<float>(serializedString["y"]);
       auto z = deserialize<float>(serializedString["z"]);
       return constant::Vector3D{x, y, z};
   }

ModelInfo: The ModelInfo struct represents a more complex custom type. Each attribute, such as vertices and indices, is serialized as a list. Here’s an example:

   template <>
   void serialize(Archive& archive, const constant::ModelInfo& modelInfo) {
       LOG_THIS(DOM);
       archive.startSerialization("Model Info");

       std::string attribute = "[ ";
       for (unsigned int i = 0; i < modelInfo.nbVertices; i++)
           attribute += std::to_string(modelInfo.vertices[i]) + " ";
       attribute += "]";
       archive.setAttribute(attribute, "Vertices");

       attribute = "[ ";
       for (unsigned int i = 0; i < modelInfo.nbIndices; i++)
           attribute += std::to_string(modelInfo.indices[i]) + " ";
       attribute += "]";
       archive.setAttribute(attribute, "Indices");

       archive.endSerialization();
   }

This function serializes arrays as lists, which makes it easy to store and retrieve large datasets.
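The matching deserialize function is not shown in this post, but reading such a list back mostly comes down to splitting the text between the brackets on whitespace. Here is a minimal sketch, assuming the "[ a b c ]" layout produced above. parseList is a hypothetical helper name.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse a "[ a b c ]" list into a vector of T.
template <typename T>
std::vector<T> parseList(const std::string& text) {
    std::vector<T> values;

    const auto open = text.find('[');
    const auto close = text.rfind(']');
    if (open == std::string::npos || close == std::string::npos || close < open)
        return values;  // malformed input yields an empty list

    // Stream extraction skips the spaces between elements.
    std::istringstream stream(text.substr(open + 1, close - open - 1));
    T value{};
    while (stream >> value)
        values.push_back(value);
    return values;
}
```

For a ModelInfo, the vertex list could be read with parseList<float> and the index list with parseList<unsigned int>, with the element counts taken from the resulting vector sizes.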

Extending the Serializer with New Data Types

The template-based design of this serializer makes it highly extensible. To add support for new types:

Define the serialize Function: Create a template specialization for serialize to store each field of the custom type.
Define the deserialize Function: Create a corresponding specialization for deserialize, retrieving each field from the serialized data.
Test the New Type: Once defined, new data types can be serialized and deserialized like any other, ensuring seamless integration.

This design allows new data types to be added with minimal changes to the core serializer, making it ideal for game engines where data structures evolve frequently.
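The three steps can be sketched end to end with a self-contained stand-in. In the example below, a flat name-to-value map (Fields) replaces Archive and UnserializedObject, and Circle is a made-up type. All of these names are assumptions for illustration, not the engine’s API.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in: a flat name -> value map instead of Archive/UnserializedObject.
using Fields = std::map<std::string, std::string>;

// Primary templates; every supported type provides a specialization.
template <typename T>
void serialize(Fields& out, const std::string& name, const T& value);

template <typename T>
T deserialize(const Fields& in, const std::string& name);

// Basic type support that the new type builds on.
template <>
void serialize<float>(Fields& out, const std::string& name, const float& value) {
    out[name] = std::to_string(value);
}

template <>
float deserialize<float>(const Fields& in, const std::string& name) {
    auto it = in.find(name);
    return it != in.end() ? std::stof(it->second) : 0.0f;
}

// The new type (illustrative only).
struct Circle {
    float radius = 1.0f;
    float thickness = 0.0f;
};

// Step 1: a serialize specialization that stores each field under its own name.
template <>
void serialize<Circle>(Fields& out, const std::string& name, const Circle& value) {
    serialize(out, name + ".radius", value.radius);
    serialize(out, name + ".thickness", value.thickness);
}

// Step 2: a deserialize specialization that reads the same names back.
template <>
Circle deserialize<Circle>(const Fields& in, const std::string& name) {
    Circle circle;
    circle.radius = deserialize<float>(in, name + ".radius");
    circle.thickness = deserialize<float>(in, name + ".thickness");
    return circle;
}

// Step 3: a round trip through the map, usable as a quick test.
Circle roundTrip(const Circle& original) {
    Fields fields;
    serialize(fields, "circle", original);
    return deserialize<Circle>(fields, "circle");
}
```

Nothing in the primary templates or the float specializations had to change to support Circle, which is the property that makes this style of serializer easy to extend.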

With specialized functions for both basic and custom types, this serializer can handle diverse data structures in a game engine. Its template-based extensibility allows new types to be added easily, and each serialized field remains labeled and readable. By supporting types from simple integers to complex vectors, the serializer is robust, flexible, and ready for any type of data management needs in game development.

Practical Example – Serializing a `Texture2DComponent`

Serialization becomes particularly valuable in scenarios where complex game components need to be stored, loaded, and modified easily. In this chapter, we’ll go through a detailed example of serializing and deserializing a Texture2DComponent, a struct in our game engine that represents a 2D texture with properties like opacity, color, and a texture name.

By using template specializations, we can define custom serialization and deserialization functions for Texture2DComponent, making it possible to save this component’s data in a readable, YAML-like format and retrieve it as needed.

The `Texture2DComponent` Struct

Here’s the structure of Texture2DComponent that we want to serialize:

struct Texture2DComponent
{
    std::string textureName;                     // Name of the texture
    float opacity = 1.0f;                        // Opacity level
    constant::Vector3D overlappingColor = {0.0f, 0.0f, 0.0f}; // Overlapping color
    float overlappingColorRatio = 0.0f;          // Intensity of the overlapping color
};

This struct includes various fields:

textureName: The name of the texture as a string.
opacity: A float representing the texture’s opacity level.
overlappingColor: A custom Vector3D struct, representing RGB color values.
overlappingColorRatio: A float that indicates the overlapping color's intensity.

To serialize and deserialize Texture2DComponent, we’ll create template specializations for each operation.

Step 1: Serializing `Texture2DComponent`

The serialize function for Texture2DComponent needs to convert each field into a human-readable format. Here’s the code for serializing Texture2DComponent:

template <>
void serialize(Archive& archive, const Texture2DComponent& value)
{
    LOG_THIS(DOM);  // Logging for debugging and tracking

    // Start the serialization process, labeling the component type
    archive.startSerialization("Texture2DComponent");

    // Serialize each field by its name
    serialize(archive, "textureName", value.textureName);
    serialize(archive, "opacity", value.opacity);
    serialize(archive, "overlappingColor", value.overlappingColor);
    serialize(archive, "overlappingRatio", value.overlappingColorRatio);

    // End the serialization
    archive.endSerialization();
}

In this code:

Start Serialization: archive.startSerialization("Texture2DComponent") begins the serialization process, identifying this block of data as a Texture2DComponent.
Serialize Fields: Each field in Texture2DComponent is serialized with a label. For instance, textureName is serialized as "textureName" so that it can be easily identified during deserialization.
End Serialization: We call archive.endSerialization() to mark the end of this component’s data.

The result is a structured, readable output that might look like this:

Texture2DComponent {
    textureName: "exampleTexture",
    opacity: 0.8,
    overlappingColor: {x: 1.0, y: 0.0, z: 0.5},
    overlappingRatio: 0.75
}

Each field is clearly labeled and indented, making it easy to understand and even edit directly if needed.

Step 2: Deserializing `Texture2DComponent`

Deserialization is the reverse process, where we reconstruct a Texture2DComponent from its serialized representation. Here’s the code for deserializing this component:

template <>
Texture2DComponent deserialize(const UnserializedObject& serializedString)
{
    LOG_THIS(DOM);  // Logging for debugging

    // Check if the serialized object is valid
    if (serializedString.isNull()) {
        LOG_ERROR(DOM, "Element is null");
        return Texture2DComponent{""};  // Return an empty Texture2DComponent if null
    }

    LOG_INFO(DOM, "Deserializing a Texture2DComponent");

    // Extract each field from the serialized object by its name
    auto textureName = deserialize<std::string>(serializedString["textureName"]);
    auto opacity = deserialize<float>(serializedString["opacity"]);
    auto overlappingColor = deserialize<constant::Vector3D>(serializedString["overlappingColor"]);
    auto overlappingColorRatio = deserialize<float>(serializedString["overlappingRatio"]);

    // Construct and populate the Texture2DComponent
    Texture2DComponent texture{textureName};
    texture.opacity = opacity;
    texture.overlappingColor = overlappingColor;
    texture.overlappingColorRatio = overlappingColorRatio;

    return texture;
}

Here’s what each part of this function does:

Null Check: if (serializedString.isNull()) checks whether the serialized object is missing. If it is, an error is logged and a default Texture2DComponent with an empty texture name is returned.
Retrieve Fields: Each field is extracted from serializedString using the field’s label, ensuring it matches the serialized format. This process allows the component’s data to be restored to its original values.
Reconstruct the Component: After all fields are retrieved, they’re used to populate a new Texture2DComponent object, which is then returned.

This approach allows us to reconstitute the Texture2DComponent from its serialized form with minimal effort.

Sample Usage

Here’s a quick example of how you might use the serializer to handle a Texture2DComponent in code:

// Creating a Texture2DComponent instance
Texture2DComponent texture;
texture.textureName = "grass_texture";
texture.opacity = 0.9f;
texture.overlappingColor = {0.5f, 0.8f, 0.2f};
texture.overlappingColorRatio = 0.4f;

// Serializing the component
Archive archive;
serialize(archive, texture);

// Display the serialized output
std::cout << archive.container.str() << std::endl;

// Deserializing the component from serialized data
UnserializedObject unserializedObj(archive.container.str(), "Texture2DComponent");
Texture2DComponent deserializedTexture = deserialize<Texture2DComponent>(unserializedObj);

// Verifying the deserialized component
std::cout << "Deserialized texture name: " << deserializedTexture.textureName << std::endl;
std::cout << "Opacity: " << deserializedTexture.opacity << std::endl;

In this example:

We create a Texture2DComponent, fill it with data, and serialize it.
The serialized output can be printed, edited, or saved to a file.
We then create an UnserializedObject from the serialized data and use deserialize to reconstruct the Texture2DComponent.
Finally, we print a couple of fields to confirm that they match the original values.
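Printing fields is fine for a quick look, but an automated check needs a comparison. Because floats pass through text on the way, comparing them with a small tolerance is safer than exact equality. The sketch below uses a flattened, hypothetical copy of the component (TextureData) so that it stands alone. The 1e-6 tolerance is an assumption chosen to match six-decimal output.

```cpp
#include <cmath>
#include <string>

// Hypothetical, flattened mirror of the component for a self-contained example.
struct TextureData {
    std::string textureName;
    float opacity = 1.0f;
    float overlappingColorRatio = 0.0f;
};

// Floats that went through text may differ in the last digits.
bool nearlyEqual(float a, float b, float epsilon = 1e-6f) {
    return std::fabs(a - b) <= epsilon;
}

// Field-by-field comparison: exact for strings, tolerant for floats.
bool sameTexture(const TextureData& lhs, const TextureData& rhs) {
    return lhs.textureName == rhs.textureName
        && nearlyEqual(lhs.opacity, rhs.opacity)
        && nearlyEqual(lhs.overlappingColorRatio, rhs.overlappingColorRatio);
}
```

A round-trip test would then serialize a component, deserialize the result, and assert that sameTexture returns true for the pair.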

Key Benefits of This Approach

Using this serializer for Texture2DComponent provides several advantages:

Readability: The serialized format is organized and easy to understand.
Modifiability: Fields are labeled, making it possible to edit values directly in serialized files.
Consistency: The serializer enforces a structure that can be easily parsed back into C++ objects.
Scalability: Adding new fields or even new types of components requires minimal code changes.

This example illustrates how custom template specializations allow complex components like Texture2DComponent to be serialized and deserialized easily. The process ensures data consistency and flexibility, making it simple to manage game components, configurations, and state information.

With this serializer, you have a powerful tool to handle everything from game assets to state data in a clear, readable format. The custom serializer is adaptable, handling both primitive and custom data types, and is a robust solution for data persistence in game development.

Conclusion and Final Thoughts

This custom serializer provides flexibility, readability, and efficiency, all of which are crucial for game development. By supporting both basic and custom types, it adapts to evolving game mechanics and data structures. Through rigorous testing, we ensure that the serializer can handle complex scenarios reliably.

If you’re building a game engine or data-driven application, this approach can streamline data management, improve debugging, and make saved data easily accessible.
The complete code is open source. Check it out, leave a star or a reaction if you found this post interesting, explore the examples, and consider contributing new features or improvements!
