Introduction to Instill-ai
Working on Instill’s pipeline-backend project was like solving a jigsaw 🧩 puzzle—except some pieces kept changing names! My mission? To create a component that could rename JSON fields without creating conflicts. Join me as I share my journey through learning Go, studying Instill’s well-organized docs, and creating a solution that’s now merged and ready to roll! 🎉
The Challenge
Instill needed a way to rename fields in JSON data structures dynamically. The twist? We had to handle cases where a renamed field might clash with an existing field. Without a conflict resolution system, chaos would reign supreme!
Check out the issue here:
[JSON] Support Rename Fields for JSON operator #1133
Current State
- It is very difficult to manipulate JSON data with the JSON operator.
Proposed Change
- Please fetch this JSON Schema to implement the functions.
- Manipulating JSON data
JSON schema pseudo code
JsonOperator:
  Task: Rename fields
  Input:
    data:
      type: object
      description: Original data, which can be a JSON object or array of objects.
    fields:
      type: array
      description: An array of objects specifying the fields to be renamed.
      items:
        type: object
        properties:
          currentField:
            type: string
            description: The field name in the original data to be replaced, supports nested paths if "supportDotNotation" is true.
          newField:
            type: string
            description: The new field name that will replace the currentField, supports nested paths if "supportDotNotation" is true.
    # supportDotNotation:
    #   type: boolean
    #   default: true
    #   description: Determines whether to interpret field names as paths using dot notation. If false, fields are treated as literal keys.
    conflictResolution:
      type: string
      enum: [overwrite, skip, error]
      default: overwrite
      description: Defines how conflicts are handled when the newField already exists in the data.
  Output:
    data:
      type: object
      description: The modified data with the specified fields renamed.
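The schema maps naturally onto Go types. Below is a minimal sketch, assuming Go 1.18+ (for any) and the standard encoding/json package. The type and function names (FieldRename, RenameFieldsInput, parseInput) are hypothetical and not taken from pipeline-backend; only the JSON keys, the enum, and the overwrite default come from the schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FieldRename mirrors one item of the "fields" array in the schema.
type FieldRename struct {
	CurrentField string `json:"currentField"`
	NewField     string `json:"newField"`
}

// RenameFieldsInput mirrors the task input. Type names are hypothetical;
// only the JSON keys come from the schema.
type RenameFieldsInput struct {
	Data               any           `json:"data"` // JSON object or array of objects
	Fields             []FieldRename `json:"fields"`
	ConflictResolution string        `json:"conflictResolution"`
}

// parseInput decodes the raw task input, applies the schema default for
// conflictResolution, and rejects values outside the enum.
func parseInput(raw []byte) (RenameFieldsInput, error) {
	var in RenameFieldsInput
	if err := json.Unmarshal(raw, &in); err != nil {
		return in, err
	}
	if in.ConflictResolution == "" {
		in.ConflictResolution = "overwrite" // schema default
	}
	switch in.ConflictResolution {
	case "overwrite", "skip", "error":
		return in, nil
	default:
		return in, fmt.Errorf("invalid conflictResolution %q", in.ConflictResolution)
	}
}

func main() {
	in, err := parseInput([]byte(`{"data": {"state": "CA"}, "fields": [{"currentField": "state", "newField": "region"}]}`))
	fmt.Println(in.ConflictResolution, in.Fields[0].NewField, err) // overwrite region <nil>
}
```

Decoding data into any keeps both allowed shapes (object or array) available; the renaming logic can switch on the concrete type later.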
Key Features:
conflictResolution: Handling conflicts when renaming fields in JSON, especially when working with nested objects and dot notation, is critical to avoid data loss or unexpected behavior. Allow users to specify how they want conflicts to be resolved (e.g., via a parameter such as conflictResolution: 'overwrite'|'skip'|'error'). This:
- Provides flexibility and control to the user.
- Adapts to different use cases.
Here are different strategies to manage conflicts and some considerations for each.
1. Overwrite
Description: If the newField already exists in the object, overwrite its value with the value from currentField.
Pros:
- Simple and straightforward.
- Useful when the intention is to replace the existing value.
Cons:
- Can lead to data loss if not used carefully.
Implementation:
# Whether or not new_key already exists, move the value across,
# replacing any old value (obj.pop raises KeyError if current_key is missing)
obj[new_key] = obj.pop(current_key)
2. Skip
Description: If the newField already exists, skip the renaming operation for that particular field.
Pros:
- Prevents accidental overwriting of data.
- Safeguards against potential conflicts without altering the existing data.
Cons:
- The currentField remains unchanged, which might not be the desired outcome.
Implementation:
# Runs inside a loop over the fields to rename
if new_key in obj:
    # Skip renaming if new_key already exists
    continue
else:
    obj[new_key] = obj.pop(current_key)
3. Merge
Description: If both currentField and newField exist and contain objects or arrays, merge the two values. This approach is more complex but can be very powerful.
Pros:
- Preserves both sets of data.
- Useful for combining information rather than choosing one over the other.
Cons:
- Can be complex to implement, especially if the data types of currentField and newField differ.
- May require custom logic depending on how you want to merge the data (e.g., combining arrays, merging objects, etc.).
Implementation:
if new_key in obj:
    if isinstance(obj[new_key], dict) and isinstance(obj[current_key], dict):
        # Merge dictionaries
        obj[new_key].update(obj.pop(current_key))
    elif isinstance(obj[new_key], list) and isinstance(obj[current_key], list):
        # Merge lists
        obj[new_key].extend(obj.pop(current_key))
    else:
        # Handle other types (overwrite, append, etc.)
        obj[new_key] = obj.pop(current_key)
else:
    obj[new_key] = obj.pop(current_key)
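Since the component itself is written in Go, here is roughly how the merge strategy could look over a decoded JSON object (map[string]any). This is a sketch only: merge is not part of the schema's enum, renameMerge is a hypothetical name, and the "keys from currentField win" rule is an assumption.

```go
package main

import "fmt"

// renameMerge moves obj[currentKey] to obj[newKey]. If newKey already holds
// a value: two objects are merged (keys from currentKey win), two arrays are
// concatenated, and any other combination falls back to overwriting.
func renameMerge(obj map[string]any, currentKey, newKey string) {
	cur, ok := obj[currentKey]
	if !ok {
		return // nothing to rename
	}
	delete(obj, currentKey)
	existing, exists := obj[newKey]
	if !exists {
		obj[newKey] = cur
		return
	}
	switch e := existing.(type) {
	case map[string]any:
		if c, ok := cur.(map[string]any); ok {
			for k, v := range c {
				e[k] = v // maps are references, so this updates obj[newKey] in place
			}
			return
		}
	case []any:
		if c, ok := cur.([]any); ok {
			obj[newKey] = append(e, c...)
			return
		}
	}
	obj[newKey] = cur // mismatched or scalar types: overwrite
}

func main() {
	obj := map[string]any{
		"tags":     []any{"a"},
		"oldTags":  []any{"b", "c"},
		"meta":     map[string]any{"x": 1},
		"metaCopy": map[string]any{"y": 2},
	}
	renameMerge(obj, "oldTags", "tags")
	renameMerge(obj, "metaCopy", "meta")
	fmt.Println(obj["tags"], obj["meta"]) // [a b c] map[x:1 y:2]
}
```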
4. Rename with a suffix or prefix
Description: If the newField already exists, rename the new field by appending a suffix or prefix (e.g., _1, _conflict) to avoid conflicts.
Pros:
- Both original and new data are preserved.
- Easy to track conflicts.
Cons:
- The resulting data structure may become less predictable or harder to work with if many conflicts occur.
Implementation:
suffix = 1
original_new_key = new_key
while new_key in obj:
    new_key = f"{original_new_key}_{suffix}"
    suffix += 1
obj[new_key] = obj.pop(current_key)
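The same suffix loop in Go is shown below as a minimal sketch; renameWithSuffix is a hypothetical name. One small difference from the Python version: the source key is removed before probing, so renaming a key to itself is not treated as a conflict.

```go
package main

import "fmt"

// renameWithSuffix moves obj[currentKey] to newKey, or to newKey_1,
// newKey_2, ... if those names are already taken. It returns the key that
// was actually used, or "" if currentKey was not present.
func renameWithSuffix(obj map[string]any, currentKey, newKey string) string {
	v, ok := obj[currentKey]
	if !ok {
		return ""
	}
	delete(obj, currentKey) // remove first so renaming a key to itself is not a conflict
	candidate := newKey
	for i := 1; ; i++ {
		if _, taken := obj[candidate]; !taken {
			break
		}
		candidate = fmt.Sprintf("%s_%d", newKey, i)
	}
	obj[candidate] = v
	return candidate
}

func main() {
	obj := map[string]any{"state": "conflict", "region": "CA", "region_1": "NY"}
	fmt.Println(renameWithSuffix(obj, "state", "region")) // region_2
}
```

Returning the chosen key makes it easy to report where each value ended up, which helps with the "easy to track conflicts" point above.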
5. Error or warning
Description: If a conflict is detected, stop the operation and return an error or warning to the user. This forces the user to address the conflict before proceeding.
Pros:
- Prevents accidental data overwriting.
- Makes the user aware of potential issues immediately.
Cons:
- Halts the process, which might be undesirable in automated workflows.
Implementation:
if new_key in obj:
    raise ValueError(f"Conflict detected: '{new_key}' already exists.")
else:
    obj[new_key] = obj.pop(current_key)
- Overwrite: Simple and effective, but can lead to data loss.
- Skip: Safe but may leave data unchanged.
- Error/Warning: Forces user intervention; best for critical operations.
Choose the strategy that best aligns with your application's needs and the user's expectations. Implementing a combination of these strategies, such as providing a default behavior with options for customization, can offer the best balance between usability and robustness.
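Because the schema's enum only exposes overwrite, skip and error, the three can share one function that takes the strategy as a parameter, in line with the "default behavior with options" idea above. A minimal Go sketch over a flat, already-decoded JSON object; renameField and errConflict are hypothetical names, not the component's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict marks a rename rejected by the "error" strategy.
var errConflict = errors.New("conflict detected")

// renameField applies a single rename to a flat JSON object using one of the
// three strategies in the schema's enum. A missing currentKey is a no-op.
func renameField(obj map[string]any, currentKey, newKey, strategy string) error {
	v, ok := obj[currentKey]
	if !ok {
		return nil
	}
	if _, exists := obj[newKey]; exists && newKey != currentKey {
		switch strategy {
		case "overwrite":
			// proceed with the move below, replacing the old value
		case "skip":
			return nil // leave both fields untouched
		case "error":
			return fmt.Errorf("%w: %q already exists", errConflict, newKey)
		default:
			// unknown strategies are reported only when a conflict needs resolving
			return fmt.Errorf("unknown conflictResolution %q", strategy)
		}
	}
	delete(obj, currentKey)
	obj[newKey] = v
	return nil
}

func main() {
	for _, strategy := range []string{"overwrite", "skip", "error"} {
		obj := map[string]any{"state": "conflict", "region": "CA"}
		err := renameField(obj, "state", "region", strategy)
		fmt.Println(strategy, obj, err)
	}
}
```

Wrapping errConflict with %w lets callers detect the conflict case with errors.Is while still getting a readable message.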
Scenario: Input data as JSON object
// input
{
  "data": {
    "name": "John Doe",
    "age": 30,
    "address": {
      "street": "123 Main St",
      "city": "Anytown",
      "state": "CA"
    },
    "state": "conflict"
  },
  "fields": [
    {"currentField": "address.street", "newField": "address.road"},
    {"currentField": "state", "newField": "address.state"}
  ],
  // "supportDotNotation": true,
  "conflictResolution": "overwrite"
}
Conflict Resolution Scenarios:
1. Overwrite (Default):
- The state field in data would be moved to address.state, overwriting the existing address.state field.
- Final output:
{
  "data": {
    "name": "John Doe",
    "age": 30,
    "address": {
      "road": "123 Main St",
      "city": "Anytown",
      "state": "conflict"
    }
  }
}
2. Skip:
- The renaming of state to address.state would be skipped, so both state and address.state remain unchanged.
- Final output:
{
  "data": {
    "name": "John Doe",
    "age": 30,
    "address": {
      "road": "123 Main St",
      "city": "Anytown",
      "state": "CA"
    },
    "state": "conflict"
  }
}
3. Error:
- The process would raise an error, stopping execution, because address.state already exists:
ValueError: Conflict detected: 'address.state' already exists.
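These scenarios rely on dot notation to reach nested fields. One way to support that in Go is to walk each path to the parent object of its final key, then apply the chosen strategy there. A minimal sketch, assuming data has already been decoded into map[string]any / []any (as encoding/json does), that numeric segments index into arrays (as with contacts.0.value in the array example), and that missing intermediate objects are not created automatically. parentOf and renamePath are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parentOf walks every segment of a dot path except the last and returns the
// object that holds (or would hold) the final key. Numeric segments index
// into arrays, as in "contacts.0.value". The bool is false if a step is missing.
func parentOf(data any, path string) (map[string]any, string, bool) {
	segs := strings.Split(path, ".")
	cur := data
	for _, seg := range segs[:len(segs)-1] {
		switch node := cur.(type) {
		case map[string]any:
			next, found := node[seg]
			if !found {
				return nil, "", false
			}
			cur = next
		case []any:
			i, err := strconv.Atoi(seg)
			if err != nil || i < 0 || i >= len(node) {
				return nil, "", false
			}
			cur = node[i]
		default:
			return nil, "", false // path goes through a scalar
		}
	}
	obj, isObj := cur.(map[string]any)
	return obj, segs[len(segs)-1], isObj
}

// renamePath moves the value at currentField to newField. A clash at newField
// is resolved with "skip", "error", or (any other value) "overwrite".
// A missing source is a no-op; a missing destination parent is an error.
func renamePath(data any, currentField, newField, strategy string) error {
	src, srcKey, ok := parentOf(data, currentField)
	if !ok {
		return nil
	}
	v, ok := src[srcKey]
	if !ok {
		return nil
	}
	dst, dstKey, ok := parentOf(data, newField)
	if !ok {
		return fmt.Errorf("parent of %q does not exist", newField)
	}
	if _, exists := dst[dstKey]; exists && currentField != newField {
		switch strategy {
		case "skip":
			return nil
		case "error":
			return fmt.Errorf("conflict detected: %q already exists", newField)
		}
	}
	delete(src, srcKey)
	dst[dstKey] = v
	return nil
}

func main() {
	data := map[string]any{
		"name":    "John Doe",
		"address": map[string]any{"street": "123 Main St", "city": "Anytown", "state": "CA"},
		"state":   "conflict",
	}
	fmt.Println(renamePath(data, "address.street", "address.road", "skip"))
	fmt.Println(renamePath(data, "state", "address.state", "error"))
	fmt.Println(renamePath(data, "state", "address.state", "overwrite"))
	fmt.Println(data)
}
```

Resolving the parent rather than the leaf lets one lookup serve both the source and the destination, so moving state into address.state is just a delete from one map and an insert into another.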
If the input data is an array of objects, the logic needs to be adapted to handle each object in the array individually. The schema and the function would process each object within the array according to the specified fields and conflictResolution rules.
Below is an example demonstrating how the "Rename Fields" operation would work with input data that is an array of objects.
Input
{
  "data": [
    {
      "name": "John Doe",
      "age": 30,
      "address": {
        "street": "123 Main St",
        "city": "Anytown",
        "state": "CA"
      },
      "contacts": [
        {
          "type": "email",
          "value": "john.doe@example.com"
        }
      ]
    },
    {
      "name": "Jane Smith",
      "age": 28,
      "address": {
        "street": "456 Oak St",
        "city": "Othertown",
        "state": "NY"
      }
      // Note: Jane Smith does not have a "contacts" field
    }
  ],
  "fields": [
    {"currentField": "name", "newField": "fullName"},
    {"currentField": "address.street", "newField": "address.road"},
    {"currentField": "contacts.0.value", "newField": "contacts.0.contactInfo"},
    {"currentField": "age", "newField": "yearsOld"}
  ],
  // "supportDotNotation": true,
  "conflictResolution": "skip"
}
Explanation:
- Field "name": The "name" field will be renamed to "fullName" for each object in the array.
- Field "address.street": The "street" field inside the "address" object will be renamed to "road" for each object.
- Field "contacts.0.value": The "value" field inside the first element of the "contacts" array will be renamed to "contactInfo" for the first object, but this step will be skipped for the second object because the "contacts" field does not exist.
- Field "age": The "age" field will be renamed to "yearsOld" for each object.
Output:
{
  "data": [
    {
      "fullName": "John Doe",
      "yearsOld": 30,
      "address": {
        "road": "123 Main St",
        "city": "Anytown",
        "state": "CA"
      },
      "contacts": [
        {
          "type": "email",
          "contactInfo": "john.doe@example.com"
        }
      ]
    },
    {
      "fullName": "Jane Smith",
      "yearsOld": 28,
      "address": {
        "road": "456 Oak St",
        "city": "Othertown",
        "state": "NY"
      }
      // The "contacts" field is not present, so no renaming occurs for "contacts.0.value"
    }
  ]
}
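Handling an array mostly comes down to dispatching on the decoded type of data and applying the same per-object logic to each element, which is also why the missing contacts field in the second object only affects that object. A minimal Go sketch; applyToData is a hypothetical helper, and the per-object callback in main uses a simplified top-level rename with skip semantics rather than full dot-notation paths.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// applyToData runs renameOne on "data", which may be a single JSON object
// or an array of objects. Each array element is processed independently.
func applyToData(data any, renameOne func(obj map[string]any) error) error {
	switch d := data.(type) {
	case map[string]any:
		return renameOne(d)
	case []any:
		for i, item := range d {
			obj, ok := item.(map[string]any)
			if !ok {
				return fmt.Errorf("data[%d] is not an object", i)
			}
			if err := renameOne(obj); err != nil {
				return fmt.Errorf("data[%d]: %w", i, err)
			}
		}
		return nil
	default:
		return fmt.Errorf("data must be an object or an array of objects, got %T", data)
	}
}

func main() {
	var data any
	raw := `[{"name": "John Doe", "age": 30}, {"name": "Jane Smith", "age": 28}]`
	if err := json.Unmarshal([]byte(raw), &data); err != nil {
		panic(err)
	}
	// A simple top-level rename with "skip" semantics, applied to each object.
	renames := map[string]string{"name": "fullName", "age": "yearsOld"}
	err := applyToData(data, func(obj map[string]any) error {
		for oldKey, newKey := range renames {
			v, ok := obj[oldKey]
			if _, taken := obj[newKey]; !ok || taken {
				continue // missing source or conflict: skip
			}
			delete(obj, oldKey)
			obj[newKey] = v
		}
		return nil
	})
	out, _ := json.Marshal(data)
	fmt.Println(string(out), err)
	// [{"fullName":"John Doe","yearsOld":30},{"fullName":"Jane Smith","yearsOld":28}] <nil>
}
```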
- Each issue will only be assigned to one person/team at a time.
- You can only work on one issue at a time.
- To express interest in an issue, please comment on it and tag @kuroxx, allowing the Instill AI team to assign it to you.
- Ensure you address all feedback and suggestions provided by the Instill AI team.
- If no commits are made within five days, the issue may be reassigned to another contributor.
- Join our Discord to engage in discussions and seek assistance in the #hackathon channel. For technical queries, you can tag @chuang8511.
Component Contribution Guideline | Documentation | Official Go Tutorial
It was the first time in my life I had seen such a detailed issue ☠️, which was a good thing but also quite overwhelming!
As I got into the issue, I realized that this wasn’t just a simple renaming task. It was like trying to change a tire while the car is still driving! 🚗💨 But the challenge was exactly what I needed to level up my coding skills.
Step 1️⃣: Learning Go and Exploring Instill’s Codebase
Since I was absolutely new to Go, I spent the first four days studying both the language and Instill’s codebase (huge shoutout to their amazing documentation 📚). Whether it’s the architecture, contribution guidelines, README documentation, or anything else, everything is very well defined 📝.
Moderators Anni and ChunHao were a huge help—Anni provided moral support 😃, while ChunHao patiently provided the technical 💻guidance I needed.
I was like, "Go what now?" But after a few days, Go went from "Golly, Oh no!" to "Great Opportunity" 💡. Their docs made it easy to dive into the Instill pipeline, and ChunHao’s clear instructions on the JSON schema were a huge help.
A Glimpse of Instill pipeline-backend:
instill-ai / pipeline-backend
⇋ A REST/gRPC server for Instill VDP API service
pipeline-backend
pipeline-backend manages all pipeline resources within Versatile Data Pipeline (VDP) to streamline data from the start component, through AI/Data/Application components and to the end component.
Concepts
Pipeline
In 💧 Instill VDP, a pipeline is a DAG (Directed Acyclic Graph) consisting of multiple components.
flowchart LR
s[Trigger] --> c1[OpenAI Component]
c1 --> c2[Stability AI Component]
c1 --> c3[MySQL Component]
c1 --> e[Response]
c2 --> e
Component
A Component serves as an essential building block within a Pipeline.
See the component package documentation for more details.
Recipe
A pipeline recipe specifies how components are configured and how they are interconnected.
Recipes are defined in YAML language:
variable:
  # pipeline input fields
output:
  # pipeline output fields
component:
  <component-id>:
    type: <component-definition-id>
    task: <task-id>
    input:
      # values for the input fields
    condition: <condition> # conditional statement to execute or bypass the
…
To be honest, I was starting to doubt whether I could solve this issue, but then Anni dropped the perfect message that kept me going.
Once I got comfortable, ChunHao, who had crafted a JSON schema for this task, gave me the green light 🚦 to start coding. And so, the journey began!
Step 2️⃣: Designing the Solution
The key requirements were:
- Dynamic Renaming: Fields should be renamed without disturbing the JSON structure.
- Conflict Detection: We needed to spot any conflicts between the original and renamed fields.
- Conflict Resolution: A smooth solution, like appending a suffix, would prevent name clashes.
Step 3️⃣: Building the Component
Armed with coffee☕ and courage💪, I got down to coding. Here’s a sneak peek at the core logic:
Mapping Fields
First, I created a mapping system to track old and new field names. This was key to detecting conflicts.
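func mapFields(fields map[string]string) map[string]string {
	newFieldMap := make(map[string]string)
	usedNames := make(map[string]bool) // new names already assigned
	for oldName, newName := range fields {
		// Check for conflict against names already handed out
		// (newFieldMap is keyed by old names, so it can't be used for this check)
		if usedNames[newName] {
			newName += "_conflict" // Add suffix for conflicts
		}
		usedNames[newName] = true
		newFieldMap[oldName] = newName
	}
	return newFieldMap
}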
Any time a conflict was detected, the function added "_conflict" to the new name. It’s a simple trick that ensures our JSON fields stay unique and, most importantly, friendly to each other! ✌️
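One subtlety: Go randomizes map iteration order, so when two fields target the same name, which one ends up with the _conflict suffix can change from run to run. Below is a small sketch of a deterministic variant, assuming sorted order is an acceptable tie-breaker; mapFieldsSorted is a hypothetical name, not code from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// mapFieldsSorted visits the old names in sorted order so the same input
// always yields the same suffixes, and keeps appending "_conflict" until
// the target name is unused.
func mapFieldsSorted(fields map[string]string) map[string]string {
	oldNames := make([]string, 0, len(fields))
	for oldName := range fields {
		oldNames = append(oldNames, oldName)
	}
	sort.Strings(oldNames)

	result := make(map[string]string, len(fields))
	used := make(map[string]bool)
	for _, oldName := range oldNames {
		newName := fields[oldName]
		for used[newName] {
			newName += "_conflict"
		}
		used[newName] = true
		result[oldName] = newName
	}
	return result
}

func main() {
	fmt.Println(mapFieldsSorted(map[string]string{"street": "road", "avenue": "road"}))
	// map[avenue:road street:road_conflict]
}
```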
Renaming Fields
Once the field mappings were in place, the next step was applying them to our JSON data.
func renameFields(data map[string]interface{}, fieldMap map[string]string) map[string]interface{} {
	renamedData := make(map[string]interface{}, len(data))
	for key, value := range data {
		newKey, ok := fieldMap[key]
		if !ok {
			newKey = key // Keep fields that aren't being renamed
		}
		renamedData[newKey] = value
	}
	return renamedData
}
Here’s the logic that applies the mapped names to our JSON data. The result? Our data’s neatly renamed, conflicts resolved, and structure intact. 🔥
After creating the component, I opened a draft PR and got a comment:
Step 4️⃣: Testing and Refinement
After familiarizing myself with Instill's testing methods and learning how to create effective test cases, I proceeded further.
Testing time! 🧪 I wrote tests covering everything from simple renames to complex edge cases with nested JSON fields. Each round of testing led to further refinements.
{
	name: "ok - rename fields with overwrite conflict resolution",
	// Test where renaming conflicts are resolved by overwriting the existing field.
	// Expected outcome: "newField" holds "value1", overwriting any previous value.
},
{
	name: "ok - rename fields with skip conflict resolution",
	// Test where conflicts are resolved by skipping the rename if the target field already exists.
	// Expected outcome: "newField" remains "value2" without overwrite.
},
{
	name: "nok - rename fields with error conflict resolution",
	// Test with "error" strategy, which should raise an error on conflict.
	// Expected outcome: Error message "Field conflict."
},
// Additional cases for missing required fields and invalid conflict resolution strategy
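The case names above follow Go's table-driven testing style. Below is a self-contained sketch of that pattern. rename is a hypothetical stand-in for the component's logic, the oldField/newField/value1/value2 fixture is inferred from the expected outcomes in the comments, and results are printed from main so it runs on its own. In a real _test.go file, the loop body would go inside t.Run(tc.name, ...) and report failures with t.Errorf.

```go
package main

import (
	"errors"
	"fmt"
)

// rename is a hypothetical, simplified version of the operation under test:
// it renames one top-level key using overwrite, skip, or error.
func rename(obj map[string]any, from, to, strategy string) error {
	v, ok := obj[from]
	if !ok {
		return fmt.Errorf("field %q not found", from)
	}
	if _, exists := obj[to]; exists {
		switch strategy {
		case "skip":
			return nil
		case "error":
			return errors.New("field conflict")
		}
	}
	delete(obj, from)
	obj[to] = v
	return nil
}

func main() {
	cases := []struct {
		name     string
		strategy string
		wantErr  bool
		want     any // expected value of "newField" afterwards
	}{
		{name: "ok - rename fields with overwrite conflict resolution", strategy: "overwrite", want: "value1"},
		{name: "ok - rename fields with skip conflict resolution", strategy: "skip", want: "value2"},
		{name: "nok - rename fields with error conflict resolution", strategy: "error", wantErr: true, want: "value2"},
	}
	for _, tc := range cases {
		// Fresh input per case: "oldField" is renamed onto an existing "newField".
		obj := map[string]any{"oldField": "value1", "newField": "value2"}
		err := rename(obj, "oldField", "newField", tc.strategy)
		pass := (err != nil) == tc.wantErr && obj["newField"] == tc.want
		fmt.Printf("%s: pass=%v\n", tc.name, pass)
	}
}
```

Building a fresh input inside the loop keeps the cases independent, so one strategy's mutation can't leak into the next.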
Here’s where I’d love to share a personal reflection: Testing was the hardest part of this project 😮‍💨. There were times when I thought, "Is this test even doing what it’s supposed to?"
Just then, I ran into a lint issue—
He pointed out the problem and even provided the solution. All I had to do was implement it, but it was a reminder that even the smallest details matter in making the code work smoothly.
Once I got past those initial hurdles, testing became my safety net. It gave me the confidence to know that my code would work across different scenarios 🕵️‍♂️. It also showed me that testing isn’t just a step to check off; it’s a way to ensure my code is reliable and resilient.
Step 5️⃣: CI Check and Final Adjustments
After completing my tests, I pushed my code, ready for the review process. However, our CI (Continuous Integration) checks didn’t pass.
Anni’s comment gave me a gentle reminder to double-check my test cases:
“Hey @AkashJana18, could you check your test cases? Our CI check shows it has not passed here. Please test it locally first before pushing it to the PR. Whenever you push your commit, we’ll trigger the checks so you can spot any issues before our engineers review your code. Thanks!”
That’s when I realized I had to run the tests locally before submitting.
ChunHao also added:
"Please run and pass it before you request the review. Run
$ go test ./pkg/component/operator/json/v0/...
to check it locally."
I quickly ran the tests locally, identified the issues, and fixed them.
A little moment of celebration 🥳
This process made me appreciate the importance of local testing even more, as it ensured everything was solid before submitting for review.
Before merging, ChunHao did a final review, made a few tweaks, QAed the test recipe, and updated the documentation to reflect the new changes. Big thanks to Anni for her ongoing support throughout the process; it made a huge difference. 🙌
👥 Reflection on the Collaborative Process 🫱🏼‍🫲🏼
One of the biggest lessons I learned was how collaboration and mentorship can make or break a project. Instill's moderators, Anni and ChunHao, provided me with the guidance I needed when I was lost in Go syntax or struggling with the right approach. Working together, we turned a complex problem into a clean, functional solution.
I’ll be honest, there were moments I felt like I had bitten off more than I could chew. But the constant encouragement from Anni, combined with the clear direction from ChunHao, kept me on track.
⏭️ Next Steps and Future Improvements
One possible next step could be expanding this approach to other parts of the pipeline that require dynamic field name handling, because who doesn’t love a little bit of automation ⚙️?
🛠️ Tools & Resources 🪛
- Go Documentation: For diving into Go syntax and understanding core concepts.
- Instill Docs: A goldmine of well-organized resources to understand the Instill pipeline.
- Go Testing Framework: The built-in testing package in Go for writing unit tests, ensuring everything works as expected, and integrating with CI tools.
- Golangci-lint: A Go linters aggregator to identify issues and enforce code quality during development and CI checks.
🧑🏻‍💻🗒️ My Learning
With Instill’s rock-solid documentation, guidance from ChunHao, and Anni's moral support, this project became a fantastic learning experience. I went from knowing nothing about Go to implementing a fully functional feature ready for production (and I have the merged PR to prove it 😉).
Proof:
feat(json): Support Rename Fields for JSON operator #813
Because
- we want to manipulate JSON data
This commit
- provide the function to rename the json key with different strategies.
Whether you're tackling JSON renaming, learning Go, or just love a good coding puzzle, I hope this blog sheds light on the process of solving conflicts, one field at a time. 🌈
📓✍🏻 Your Learning
1️⃣ How to Approach New Challenges: Whether it’s a new language or a complex feature, learn how to break down problems and solve them step by step, just like I did with this project.
2️⃣ The Power of Documentation: Discover how clear, organized documentation (like Instill’s) can be a game changer when working on complex projects.
3️⃣ Persistence and Moral Support: Understand the importance of perseverance and the value of support from your peers when tackling challenging tasks.
4️⃣ How to implement conflict resolution strategies: Learn how to handle various scenarios like overwrite, skip, and error conflict resolutions while renaming JSON fields.