Akash Jana
From Zero to Merge: Building a JSON Renaming Field Component in Go

Instill ai homepage

Introduction to Instill-ai

Working on Instill’s pipeline-backend project was like solving a jigsaw 🧩 puzzle—except some pieces kept changing names! My mission? To create a component that could rename JSON fields without creating conflicts. Join me as I share my journey through learning Go, studying Instill’s well-organized docs, and creating a solution that’s now merged and ready to roll! 🎉


The Challenge

Instill needed a way to rename fields in JSON data structures dynamically. The twist? We had to handle cases where a renamed field might clash with an existing field. Without a conflict resolution system, chaos would reign supreme!

Check out the issue here:

[JSON] Support Rename Fields for JSON operator #1133

Issue Description

Current State

  • It is very difficult to manipulate JSON data with JSON operator.

Proposed Change

  • Please fetch this JSON Schema to implement the functions.
  • Manipulating JSON data

JSON schema pseudo code

JsonOperator:
  Task: Rename fields
  
  Input:
    data: 
      type: object
      description: Original data, which can be a JSON object or array of objects.
    fields: 
      type: array
      description: An array of objects specifying the fields to be renamed.
      items:
        type: object
        properties:
          currentField: 
            type: string
            description: The field name in the original data to be replaced, supports nested paths if "supportDotNotation" is true.
          newField: 
            type: string
            description: The new field name that will replace the currentField, supports nested paths if "supportDotNotation" is true.
#    supportDotNotation:
#      type: boolean
#      default: true
#      description: Determines whether to interpret field names as paths using dot notation. If false, fields are treated as literal keys.
    conflictResolution:
      type: string
      enum: [overwrite, skip, error]
      default: overwrite
      description: Defines how conflicts are handled when the newField already exists in the data.
  
  Output:
    data:
      type: object
      description: The modified data with the specified fields renamed.
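
To make the schema a bit more concrete, here is a rough sketch of how it could map onto Go types. The names FieldRename, RenameInput, and parseRenameInput are hypothetical, and the JSON tags simply follow the camelCase keys in the pseudo-schema; the real component may be laid out differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FieldRename and RenameInput are hypothetical names that mirror the
// pseudo-schema; they are not taken from the Instill codebase.
type FieldRename struct {
	CurrentField string `json:"currentField"`
	NewField     string `json:"newField"`
}

type RenameInput struct {
	Data               any           `json:"data"` // object or array of objects
	Fields             []FieldRename `json:"fields"`
	ConflictResolution string        `json:"conflictResolution"`
}

// parseRenameInput decodes raw JSON and applies the schema default
// ("overwrite") when conflictResolution is omitted.
func parseRenameInput(raw []byte) (RenameInput, error) {
	var in RenameInput
	if err := json.Unmarshal(raw, &in); err != nil {
		return in, err
	}
	switch in.ConflictResolution {
	case "":
		in.ConflictResolution = "overwrite"
	case "overwrite", "skip", "error":
		// valid value from the enum, nothing to do
	default:
		return in, fmt.Errorf("unknown conflictResolution %q", in.ConflictResolution)
	}
	return in, nil
}

func main() {
	raw := []byte(`{"data":{"a":1},"fields":[{"currentField":"a","newField":"b"}]}`)
	in, err := parseRenameInput(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(in.ConflictResolution, in.Fields[0].NewField)
}
```

Validating the enum at the boundary means the renaming logic itself only ever sees one of the three known strategies.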

Key Features

conflictResolution: Handling conflicts when renaming fields in JSON, especially when working with nested objects and dot notation, is critical to avoid data loss or unexpected behavior. Allowing users to specify how they want conflicts to be resolved (e.g., via a parameter such as conflictResolution: 'overwrite'|'skip'|'error'):

  • Provides flexibility and control to the user.
  • Adapts to different use cases.

Here are different strategies to manage conflicts and some considerations for each.

1. Overwrite the Existing Field (Default Behavior)

Description: If the newField already exists in the object, overwrite its value with the value from currentField.

Pros:

  • Simple and straightforward.
  • Useful when the intention is to replace the existing value.

Cons:

  • Can lead to data loss if not used carefully.

Implementation:

# Overwrite: the assignment replaces any existing value at new_key,
# so no membership check is needed.
obj[new_key] = obj.pop(current_key)

2. Skip the Renaming Operation

Description: If the newField already exists, skip the renaming operation for that particular field.

Pros:

  • Prevents accidental overwriting of data.
  • Safeguards against potential conflicts without altering the existing data.

Cons:

  • The currentField remains unchanged, which might not be the desired outcome.

Implementation:

if new_key in obj:
    # Skip renaming if new_key already exists
    continue
else:
    obj[new_key] = obj.pop(current_key)

3. Merge Values

Description: If both currentField and newField exist and contain objects or arrays, merge the two values. This approach is more complex but can be very powerful.

Pros:

  • Preserves both sets of data.
  • Useful for combining information rather than choosing one over the other.

Cons:

  • Can be complex to implement, especially if the data types of currentField and newField differ.
  • May require custom logic depending on how you want to merge the data (e.g., combining arrays, merging objects, etc.).

Implementation:

if new_key in obj:
    if isinstance(obj[new_key], dict) and isinstance(obj[current_key], dict):
        # Merge dictionaries
        obj[new_key].update(obj.pop(current_key))
    elif isinstance(obj[new_key], list) and isinstance(obj[current_key], list):
        # Merge lists
        obj[new_key].extend(obj.pop(current_key))
    else:
        # Handle other types (overwrite, append, etc.)
        obj[new_key] = obj.pop(current_key)
else:
    obj[new_key] = obj.pop(current_key)
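
Since the component itself is written in Go, here is how the same merge idea might look with Go's type switches. This is only a sketch: mergeInto is a hypothetical helper, and the choice that keys from currentField win on collision is my assumption, not something the issue specifies.

```go
package main

import "fmt"

// mergeInto is a hypothetical helper: it renames current to next, merging the
// two values when both are objects or both are arrays, else overwriting.
func mergeInto(obj map[string]any, current, next string) {
	src, ok := obj[current]
	if !ok {
		return // nothing to rename
	}
	delete(obj, current)
	switch dst := obj[next].(type) {
	case map[string]any:
		if s, ok := src.(map[string]any); ok {
			for k, v := range s {
				dst[k] = v // assumption: keys from current win on collision
			}
			return
		}
	case []any:
		if s, ok := src.([]any); ok {
			obj[next] = append(dst, s...)
			return
		}
	}
	obj[next] = src // types differ or next is absent: plain rename/overwrite
}

func main() {
	obj := map[string]any{
		"tags":   []any{"a"},
		"labels": []any{"b", "c"},
	}
	mergeInto(obj, "labels", "tags")
	fmt.Println(obj)
}
```

The type switch makes the "data types differ" case from the cons list explicit: anything that is not object-with-object or array-with-array falls back to a plain overwrite.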

4. Rename with a Suffix or Prefix

Description: If the newField already exists, rename the new field by appending a suffix or prefix (e.g., _1, _conflict) to avoid conflicts.

Pros:

  • Both original and new data are preserved.
  • Easy to track conflicts.

Cons:

  • The resulting data structure may become less predictable or harder to work with if many conflicts occur.

Implementation:

suffix = 1
original_new_key = new_key
while new_key in obj:
    new_key = f"{original_new_key}_{suffix}"
    suffix += 1
obj[new_key] = obj.pop(current_key)

5. Return an Error or Warning

Description: If a conflict is detected, stop the operation and return an error or warning to the user. This forces the user to address the conflict before proceeding.

Pros:

  • Prevents accidental data overwriting.
  • Makes the user aware of potential issues immediately.

Cons:

  • Halts the process, which might be undesirable in automated workflows.

Implementation:

if new_key in obj:
    raise ValueError(f"Conflict detected: '{new_key}' already exists.")
else:
    obj[new_key] = obj.pop(current_key)

Summary:

  • Overwrite: Simple and effective, but can lead to data loss.
  • Skip: Safe but may leave data unchanged.
  • Error/Warning: Forces user intervention; best for critical operations.

Choose the strategy that best aligns with your application's needs and the user's expectations. Implementing a combination of these strategies, such as providing a default behavior with options for customization, can offer the best balance between usability and robustness.
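
The three strategies that made it into the schema (overwrite, skip, error) can be folded into a single Go function. This is a minimal sketch for flat objects only; renameKey and ErrConflict are hypothetical names, not the ones used in pipeline-backend.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConflict is a hypothetical sentinel error for the "error" strategy.
var ErrConflict = errors.New("field conflict")

// renameKey renames current to next in a flat object using one of the three
// strategies from the schema. It is a no-op when current is missing.
func renameKey(obj map[string]any, current, next, strategy string) error {
	val, ok := obj[current]
	if !ok || current == next {
		return nil
	}
	if _, clash := obj[next]; clash {
		switch strategy {
		case "skip":
			return nil // leave both fields untouched
		case "error":
			return fmt.Errorf("%w: %q already exists", ErrConflict, next)
		case "overwrite":
			// continue to the rename below
		default:
			return fmt.Errorf("unknown strategy %q", strategy)
		}
	}
	obj[next] = val
	delete(obj, current)
	return nil
}

func main() {
	obj := map[string]any{"a": 1, "b": 2}
	fmt.Println(renameKey(obj, "a", "b", "skip"), obj)
	fmt.Println(renameKey(obj, "a", "b", "error"))
	fmt.Println(renameKey(obj, "a", "b", "overwrite"), obj)
}
```

Wrapping a sentinel with %w lets callers check for a conflict with errors.Is while still getting a message that names the offending field.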

Example Usage:

Scenario: Input data as JSON object

// input
{
  "data": {
    "name": "John Doe",
    "age": 30,
    "address": {
      "street": "123 Main St",
      "city": "Anytown",
      "state": "CA"
    },
    "state": "conflict"
  },
  "fields": [
    {"currentField": "address.street", "newField": "address.road"},
    {"currentField": "state", "newField": "address.state"}
  ],
//  "supportDotNotation": true,
  "conflictResolution": "overwrite"
}

Conflict Resolution Scenarios:

1. Overwrite (Default):

  • The state field in data would be moved to address.state, overwriting the existing address.state field.
  • Final output:
{
  "data": {
    "name": "John Doe",
    "age": 30,
    "address": {
      "road": "123 Main St",
      "city": "Anytown",
      "state": "conflict"
    }
  }
}

2. Skip:

  • The renaming of state to address.state would be skipped, so both state and address.state remain unchanged.
  • Final output:
{
  "data": {
    "name": "John Doe",
    "age": 30,
    "address": {
      "road": "123 Main St",
      "city": "Anytown",
      "state": "CA"
    },
    "state": "conflict"
  }
}

3. Error:

  • The process would raise an error, stopping execution, because address.state already exists.

ValueError: Conflict detected: 'address.state' already exists.
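
The scenarios above depend on resolving dot-notation paths such as address.state inside nested objects. Below is one way this could be sketched in Go. The helpers parentOf and renamePath are hypothetical, numeric segments are assumed to index into arrays, and missing destination parents are assumed to be created on the fly; the merged component may handle these details differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parentOf walks all but the last segment of a dot path and returns the
// object that holds the final key. Numeric segments index into arrays.
// create controls whether missing intermediate objects are created.
func parentOf(root map[string]any, segs []string, create bool) (map[string]any, bool) {
	var cur any = root
	for _, seg := range segs[:len(segs)-1] {
		switch node := cur.(type) {
		case map[string]any:
			next, ok := node[seg]
			if !ok {
				if !create {
					return nil, false
				}
				next = map[string]any{}
				node[seg] = next
			}
			cur = next
		case []any:
			i, err := strconv.Atoi(seg)
			if err != nil || i < 0 || i >= len(node) {
				return nil, false
			}
			cur = node[i]
		default:
			return nil, false
		}
	}
	m, ok := cur.(map[string]any)
	return m, ok
}

// renamePath moves the value at path current to path next, applying the
// chosen conflict strategy when next already exists.
func renamePath(root map[string]any, current, next, strategy string) error {
	if current == next {
		return nil
	}
	srcSegs, dstSegs := strings.Split(current, "."), strings.Split(next, ".")
	src, ok := parentOf(root, srcSegs, false)
	if !ok {
		return nil // source path missing: nothing to rename
	}
	srcKey := srcSegs[len(srcSegs)-1]
	val, ok := src[srcKey]
	if !ok {
		return nil
	}
	dst, ok := parentOf(root, dstSegs, true)
	if !ok {
		return fmt.Errorf("cannot resolve path %q", next)
	}
	dstKey := dstSegs[len(dstSegs)-1]
	if _, clash := dst[dstKey]; clash {
		switch strategy {
		case "skip":
			return nil
		case "error":
			return fmt.Errorf("conflict detected: %q already exists", next)
		}
	}
	delete(src, srcKey)
	dst[dstKey] = val
	return nil
}

func main() {
	data := map[string]any{
		"name":    "John Doe",
		"address": map[string]any{"street": "123 Main St", "state": "CA"},
		"state":   "conflict",
	}
	_ = renamePath(data, "address.street", "address.road", "overwrite")
	fmt.Println(renamePath(data, "state", "address.state", "error"))
	fmt.Println(data)
}
```

Treating a missing source path as a no-op matches the behavior described in the issue, where a rename is skipped for objects that lack the field.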

Scenario: Input Data as an Array of Objects

If the input data is an array of objects, the logic needs to be adapted to handle each object in the array individually. The schema and the function would process each object within the array according to the specified fields and conflictResolution rules.

Below is an example demonstrating how the "Rename Fields" operation would work with input data that is an array of objects.

Input

{
  "data": [
    {
      "name": "John Doe",
      "age": 30,
      "address": {
        "street": "123 Main St",
        "city": "Anytown",
        "state": "CA"
      },
      "contacts": [
        {
          "type": "email",
          "value": "john.doe@example.com"
        }
      ]
    },
    {
      "name": "Jane Smith",
      "age": 28,
      "address": {
        "street": "456 Oak St",
        "city": "Othertown",
        "state": "NY"
      }
      // Note: Jane Smith does not have a "contacts" field
    }
  ],
  "fields": [
    {"currentField": "name", "newField": "fullName"},
    {"currentField": "address.street", "newField": "address.road"},
    {"currentField": "contacts.0.value", "newField": "contacts.0.contactInfo"},
    {"currentField": "age", "newField": "yearsOld"}
  ],
//  "supportDotNotation": true,
  "conflictResolution": "skip"
}

Explanation:

  • Field "name": The "name" field will be renamed to "fullName" for each object in the array.
  • Field "address.street": The "street" field inside the "address" object will be renamed to "road" for each object.
  • Field "contacts.0.value": The "value" field inside the first element of the "contacts" array will be renamed to "contactInfo" for the first object, but this step will be skipped for the second object because the "contacts" field does not exist.
  • Field "age": The "age" field will be renamed to "yearsOld" for each object.

Output:

{
  "data": [
    {
      "fullName": "John Doe",
      "yearsOld": 30,
      "address": {
        "road": "123 Main St",
        "city": "Anytown",
        "state": "CA"
      },
      "contacts": [
        {
          "type": "email",
          "contactInfo": "john.doe@example.com"
        }
      ]
    },
    {
      "fullName": "Jane Smith",
      "yearsOld": 28,
      "address": {
        "road": "456 Oak St",
        "city": "Othertown",
        "state": "NY"
      }
      // The "contacts" field is not present, so no renaming occurs for "contacts.0.value"
    }
  ]
}
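
Handling "object or array of objects" comes down to a small dispatch on the shape of data. Here is a sketch in Go; forEachObject is a hypothetical helper, and the flat rename closure in main only stands in for the full dot-notation logic.

```go
package main

import "fmt"

// forEachObject applies fn to data when it is a single object, or to every
// element when it is an array of objects. Other shapes are rejected.
func forEachObject(data any, fn func(obj map[string]any) error) error {
	switch v := data.(type) {
	case map[string]any:
		return fn(v)
	case []any:
		for i, item := range v {
			obj, ok := item.(map[string]any)
			if !ok {
				return fmt.Errorf("element %d is not an object", i)
			}
			if err := fn(obj); err != nil {
				return fmt.Errorf("element %d: %w", i, err)
			}
		}
		return nil
	default:
		return fmt.Errorf("data must be an object or an array of objects, got %T", data)
	}
}

func main() {
	people := []any{
		map[string]any{"name": "John Doe", "age": 30},
		map[string]any{"name": "Jane Smith", "age": 28},
	}
	// A flat rename stands in for the full dot-notation logic.
	rename := func(obj map[string]any) error {
		if v, ok := obj["name"]; ok {
			obj["fullName"] = v
			delete(obj, "name")
		}
		return nil
	}
	if err := forEachObject(people, rename); err != nil {
		panic(err)
	}
	fmt.Println(people)
}
```

Including the element index in the wrapped error makes it easier to see which object triggered a conflict when the error strategy is used on a long array.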

Rules for the Component Hackathon

  • Each issue will only be assigned to one person/team at a time.
  • You can only work on one issue at a time.
  • To express interest in an issue, please comment on it and tag @kuroxx, allowing the Instill AI team to assign it to you.
  • Ensure you address all feedback and suggestions provided by the Instill AI team.
  • If no commits are made within five days, the issue may be reassigned to another contributor.
  • Join our Discord to engage in discussions and seek assistance in #hackathon channel. For technical queries, you can tag @chuang8511.

Component Contribution Guideline | Documentation | Official Go Tutorial

This was the first time in my life that I had seen such a detailed issue ☠️, which is a good thing but quite overwhelming!

As I got into the issue, I realized that this wasn’t just a simple renaming task. It was like trying to change a tire while the car is still driving! 🚗💨 But the challenge was exactly what I needed to level up my coding skills.


Step 1️⃣: Learning Go and Exploring Instill’s Codebase

Since I was absolutely new to Go, I spent the first four days studying both the language and Instill’s codebase (huge shoutout to their amazing documentation 📚). Whether it’s the architecture, contribution guidelines, README documentation, or anything else, everything is very well defined 📝.
Moderators Anni and ChunHao were a huge help—Anni provided moral support 😃, while ChunHao patiently provided the technical 💻guidance I needed.

I was like, "Go what now?" But after a few days, Go went from "Golly, Oh no!" to "Great Opportunity" 💡. Their docs made it easy to dive into the Instill pipeline, and ChunHao’s clear instructions on the JSON schema were a huge help.

A Glimpse of Instill pipeline-backend:

instill-ai / pipeline-backend

⇋ A REST/gRPC server for Instill VDP API service

pipeline-backend

pipeline-backend manages all pipeline resources within Versatile Data Pipeline (VDP) to streamline data from the start component, through AI/Data/Application components and to the end component.

Concepts

Pipeline

In 💧 Instill VDP, a pipeline is a DAG (Directed Acyclic Graph) consisting of multiple components.

flowchart LR
s[Trigger] --> c1[OpenAI Component]
c1 --> c2[Stability AI Component]
c1 --> c3[MySQL Component]
c1 --> e[Response]
c2 --> e

Component

A Component serves as an essential building block within a Pipeline.

See the component package documentation for more details.

Recipe

A pipeline recipe specifies how components are configured and how they are interconnected.

Recipes are defined in YAML language:

variable:
  # pipeline input fields
output:
  # pipeline output fields
component:
  <component-id>:
    type: <component-definition-id>
    task: <task-id>
    input:
      # values for the input fields
    condition: <condition> # conditional statement to execute or bypass the

To be honest, I was starting to doubt if I could solve this issue, but then Anni dropped the perfect message that kept me going.

Screenshot of chat with Anni where she provided Moral support

Once I got comfortable, ChunHao, who had crafted a JSON schema for this task, gave me the green light🚦 to start coding. And so, the journey began!


Step 2️⃣: Designing the Solution

The key requirements were:

  1. Dynamic Renaming: Fields should be renamed without disturbing the JSON structure.
  2. Conflict Detection: We needed to spot any conflicts between the original and renamed fields.
  3. Conflict Resolution: A smooth solution, like appending a suffix, would prevent name clashes.

System Design


Step 3️⃣: Building the Component

Armed with coffee☕ and courage💪, I got down to coding. Here’s a sneak peek at the core logic:

Mapping Fields

First, I created a mapping system to track old and new field names. This was key to detecting conflicts.

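func mapFields(fields map[string]string) map[string]string {
    newFieldMap := make(map[string]string, len(fields))
    used := make(map[string]bool, len(fields)) // new names already assigned
    // Note: Go map iteration order is random, so which of two clashing
    // fields gets the suffix can differ between runs.
    for oldName, newName := range fields {
        // Conflict: another field has already claimed this new name.
        // Loop so the suffixed name cannot collide either.
        for used[newName] {
            newName += "_conflict" // Add suffix for conflicts
        }
        used[newName] = true
        newFieldMap[oldName] = newName
    }
    return newFieldMap
}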

Any time a conflict was detected, the function added "_conflict" to the new name. It’s a simple trick that ensures our JSON fields stay unique and, most importantly, friendly to each other! ✌️
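
One thing worth knowing here: Go randomizes map iteration order, so when two fields map to the same new name, which one ends up with the suffix can change from run to run. A possible way around this, sketched below, is to visit the old names in sorted order. mapFieldsSorted is a hypothetical variant for illustration, not the code that was merged.

```go
package main

import (
	"fmt"
	"sort"
)

// mapFieldsSorted is a hypothetical variant of mapFields. It visits old names
// in sorted order so the same input always yields the same suffixed names.
func mapFieldsSorted(fields map[string]string) map[string]string {
	oldNames := make([]string, 0, len(fields))
	for oldName := range fields {
		oldNames = append(oldNames, oldName)
	}
	sort.Strings(oldNames)

	used := make(map[string]bool, len(fields)) // new names already taken
	out := make(map[string]string, len(fields))
	for _, oldName := range oldNames {
		newName := fields[oldName]
		// Keep suffixing until the name is free.
		for used[newName] {
			newName += "_conflict"
		}
		used[newName] = true
		out[oldName] = newName
	}
	return out
}

func main() {
	m := mapFieldsSorted(map[string]string{"first": "name", "last": "name"})
	fmt.Println(m["first"], m["last"]) // prints "name name_conflict"
}
```

Deterministic output also makes the function much easier to cover with tests, since the expected map can be written down exactly.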

Renaming Fields

Once the field mappings were in place, the next step was applying them to our JSON data.

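func renameFields(data map[string]interface{}, fieldMap map[string]string) map[string]interface{} {
    renamedData := make(map[string]interface{}, len(data))
    for key, value := range data {
        newKey, ok := fieldMap[key]
        if !ok {
            // No mapping for this field: keep its original name
            // instead of collapsing it under the empty key "".
            newKey = key
        }
        renamedData[newKey] = value
    }
    return renamedData
}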

Here’s the logic that applies the mapped names to our JSON data. The result? Our data’s neatly renamed, conflicts resolved, and structure intact. 🔥

After creating the component, I opened a draft PR and got a comment:

Comment from Chunhao that I haven't added the test code


Step 4️⃣: Testing and Refinement

After familiarizing myself with Instill's testing methods and learning how to create effective test cases, I proceeded further.

screenshot of commits created

Testing time! 🧪 I wrote tests covering everything from simple renames to complex edge cases with nested JSON fields. Each round of testing led to further refinements.

{
    name: "ok - rename fields with overwrite conflict resolution",
    // Test where renaming conflicts are resolved by overwriting the existing field.
    // Expected outcome: "newField" holds "value1", overwriting any previous value.
},

{
    name: "ok - rename fields with skip conflict resolution",
    // Test where conflicts are resolved by skipping the rename if the target field already exists.
    // Expected outcome: "newField" remains "value2" without overwrite.
},

{
    name: "nok - rename fields with error conflict resolution",
    // Test with "error" strategy, which should raise an error on conflict.
    // Expected outcome: Error message "Field conflict."
},

// Additional cases for missing required fields and invalid conflict resolution strategy
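
The case names above follow Go's table-driven testing pattern. To show the shape of that pattern without the surrounding test harness, here is a self-contained sketch that runs as a plain program. The rename function is a simplified stand-in for the component logic, and testCase and runCases are hypothetical names; in a real _test.go file the loop body would live inside t.Run.

```go
package main

import (
	"fmt"
	"reflect"
)

// rename is a stand-in for the component logic under test (flat keys only).
// It works on a copy so every case starts from the same input.
func rename(obj map[string]any, current, next, strategy string) (map[string]any, error) {
	out := make(map[string]any, len(obj))
	for k, v := range obj {
		out[k] = v
	}
	val, ok := out[current]
	if !ok {
		return out, nil
	}
	if _, clash := out[next]; clash {
		switch strategy {
		case "skip":
			return out, nil
		case "error":
			return nil, fmt.Errorf("field conflict")
		}
	}
	delete(out, current)
	out[next] = val
	return out, nil
}

type testCase struct {
	name     string
	strategy string
	want     map[string]any
	wantErr  bool
}

// runCases runs every case against the same conflicting input and returns
// the names of the cases that failed.
func runCases(cases []testCase) []string {
	var failed []string
	for _, tc := range cases {
		in := map[string]any{"oldField": "value1", "newField": "value2"}
		got, err := rename(in, "oldField", "newField", tc.strategy)
		if (err != nil) != tc.wantErr || (!tc.wantErr && !reflect.DeepEqual(got, tc.want)) {
			failed = append(failed, tc.name)
		}
	}
	return failed
}

func main() {
	cases := []testCase{
		{name: "ok - overwrite", strategy: "overwrite", want: map[string]any{"newField": "value1"}},
		{name: "ok - skip", strategy: "skip", want: map[string]any{"oldField": "value1", "newField": "value2"}},
		{name: "nok - error", strategy: "error", wantErr: true},
	}
	fmt.Println("failed cases:", runCases(cases))
}
```

Keeping the inputs and expectations in one table makes it cheap to add the remaining cases, such as missing required fields or an invalid strategy.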

Here’s where I’d love to share a personal reflection: Testing was the hardest part of this project 😮‍💨. There were times when I thought, "Is this test even doing what it’s supposed to?"

Just then, I ran into a lint issue:

Lint issue comment by chunhao

He pointed out the problem and even provided the solution. All I had to do was implement it, but it was a reminder that even the smallest details matter in making the code work smoothly.

Once I got past those initial hurdles, testing became my safety net. It gave me the confidence to know that my code would work across different scenarios 🕵️‍♂️. It also showed me that testing isn’t just a step to check off—it’s a way to ensure my code is reliable and resilient.


Step 5️⃣: CI Check and Final Adjustments

After completing my tests, I pushed my code, ready for the review process. However, our CI (Continuous Integration) checks didn’t pass.

Anni’s comment gave me a gentle reminder to double-check my test cases:

“Hey @AkashJana18, could you check your test cases? Our CI check shows it has not passed here. Please test it locally first before pushing it to the PR. Whenever you push your commit, we’ll trigger the checks so you can spot any issues before our engineers review your code. Thanks!”

CI failed test

That’s when I realized I had to run the tests locally before submitting.

ChunHao also added:

"Please run and pass it before you request the review. Run $ go test ./pkg/component/operator/json/v0/... to check it locally."

I quickly ran the tests locally, identified the issues, and fixed them.

Commit image

Test cases all cleared

A little moment of celebration 🥳

This process made me appreciate the importance of local testing even more, as it ensured everything was solid before submitting for review.

Before merging, ChunHao did a final review, made a few tweaks, QA'd the test recipe, and updated the documentation to reflect the new changes. Big thanks to Anni for her ongoing support throughout the process; it made a huge difference. 🙌


👥 Reflection on the Collaborative Process 🫱🏼‍🫲🏼

One of the biggest lessons I learned was how collaboration and mentorship can make or break a project. Instill's moderators, Anni and ChunHao, provided me with the guidance I needed when I was lost in Go syntax or struggling with the right approach. Working together, we turned a complex problem into a clean, functional solution.

I’ll be honest, there were moments I felt like I had bitten off more than I could chew. But the constant encouragement from Anni, combined with the clear direction from ChunHao, kept me on track.


⏭️ Next Steps and Future Improvements

A next step could be expanding this approach to other parts of the pipeline that require dynamic field name handling, because who doesn’t love a little bit of automation ⚙️?


🛠️ Tools & Resources 🪛

  1. Go Documentation: For diving into Go syntax and understanding core concepts.
  2. Instill Docs: A goldmine of well-organized resources to understand the Instill pipeline.
  3. Go Testing Framework: The built-in testing package in Go for writing unit tests, ensuring everything works as expected, and integrating with CI tools.
  4. Golangci-lint: A Go linters aggregator to identify issues and enforce code quality during development and CI checks.

🧑🏻‍💻🗒️ My Learning

With Instill’s rock-solid documentation, guidance from ChunHao, and Anni's moral support, this project became a fantastic learning experience. I went from knowing nothing about Go to implementing a fully functional feature ready for production (and I have the merged PR to prove it 😉).

Proof:

feat(json): Support Rename Fields for JSON operator #813

Because

  • we want to manipulate JSON data

This commit

  • provide the function to rename the json key with different strategies.

Whether you're tackling JSON renaming, learning Go, or just love a good coding puzzle, I hope this blog sheds light on the process of solving conflicts, one field at a time. 🌈


📓✍🏻 Your Learning

1️⃣ How to Approach New Challenges: Whether it’s a new language or a complex feature, learn how to break down problems and solve them step by step, just like I did with this project.

2️⃣ The Power of Documentation: Discover how clear, organized documentation (like Instill’s) can be a game changer when working on complex projects.

3️⃣ Persistence and Moral Support: Understand the importance of perseverance and the value of support from your peers when tackling challenging tasks.

4️⃣ How to implement conflict resolution strategies: Learn how to handle various scenarios like overwrite, skip, and error conflict resolutions while renaming JSON fields.


Top comments (9)

An-Ni Chen

Congratulations on your first open-source contribution! I'm glad my words were enough to encourage you to keep going 💪

Thank you for joining us at Instill AI for Hacktoberfest 2024 ✨

See you at the next one!

Akash Jana

Definitely interested for the next time! ✨

Rohan Sharma

You nailed it!

I liked this blog a lot. This is how we make OS contributions. Contributing to Instill-AI is not an easy task. You're great!

Akash Jana

Thank you Rohan! Inspired by you ✨

Rohan Sharma

And I'm inspired by you ppl!

Ting-Yi Chen

Superb!

Akash Jana

Thank you! ✨

Ayush91010

Technically motivating blog.... Definitely enjoyed reading it... Keep it up

Akash Jana

Thanks for reading!