
Bruce Du

7 common pitfalls in JSON operations in Golang

0 Preface

JSON is a data format that many developers use daily, typically for configuration files or for transferring data over the network. Because it is simple, easy to understand, and readable, JSON has become one of the most common formats in the IT world. Like many other languages, Golang supports it at the standard library level, through encoding/json.

Just as JSON itself is easy to understand, the encoding/json library is easy to use. But I believe many developers have run into strange problems or bugs, as I did when I first used it. This article summarizes the problems and mistakes I personally encountered when working with JSON in Golang. I hope it helps readers handle JSON in Golang more smoothly and avoid unnecessary pitfalls.

BTW, the content of this article is based on Go 1.22. Behavior may differ slightly between versions, so readers should keep this in mind when reading and applying it. Also, all cases in this article use encoding/json and do not involve any third-party JSON library.

1 Basic use

Let's first briefly cover the basic use of the encoding/json library.

As a data format, JSON has just two core operations: serialization and deserialization. Serialization converts a Go object into a string (or byte sequence) in JSON format. Deserialization is the opposite: it converts JSON data into a Go object.

The object mentioned here is a broad concept. It refers not only to struct values but also to slices and maps, which support JSON serialization as well.
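As a quick illustration of the non-struct case, here is a minimal sketch that serializes a slice and a map. The helper name toJSON is my own and not part of encoding/json.

```go
package main

import (
    "encoding/json"
    "fmt"
)

// toJSON is a hypothetical helper: it marshals any value and
// returns the JSON text, panicking on error to keep the sketch short.
func toJSON(v any) string {
    out, err := json.Marshal(v)
    if err != nil {
        panic(err)
    }
    return string(out)
}

func main() {
    // A slice becomes a JSON array.
    fmt.Println(toJSON([]int{1, 2, 3})) // [1,2,3]
    // A map becomes a JSON object; encoding/json sorts the keys.
    fmt.Println(toJSON(map[string]int{"b": 2, "a": 1})) // {"a":1,"b":2}
}
```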

Example with a struct:

import (
    "encoding/json"
    "fmt"
)

type Person struct {
    ID   uint
    Name string
    Age  int
}

func MarshalPerson() {
    p := Person{
        ID:   1,
        Name: "Bruce",
        Age:  18,
    }
    output, err := json.Marshal(p)
    if err != nil {
        panic(err)
    }
    println(string(output))
}

func UnmarshalPerson() {
    str := `{"ID":1,"Name":"Bruce","Age":18}`
    var p Person
    err := json.Unmarshal([]byte(str), &p)
    if err != nil {
        panic(err)
    }
    fmt.Printf("%+v\n", p)
}

The core is the two functions json.Marshal and json.Unmarshal, used for serialization and deserialization respectively. Both return an error; here I simply panic on it.

Readers who have used encoding/json may know another pair of tools that is often used: NewEncoder and NewDecoder. They wrap an io.Writer and an io.Reader respectively, which suits streams such as files or HTTP bodies. A brief look at the source code shows that their core logic is the same as that of Marshal and Unmarshal, so the pitfalls below apply to them as well.
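For completeness, here is a minimal sketch of the Encoder/Decoder pair working on in-memory streams. The helper names encodePerson and decodePerson are assumptions made for this illustration. One visible difference from Marshal is that Encode appends a newline after the JSON value.

```go
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "strings"
)

type Person struct {
    ID   uint
    Name string
    Age  int
}

// encodePerson (hypothetical helper) writes p as JSON into a buffer
// through json.Encoder and returns the buffer content.
func encodePerson(p Person) (string, error) {
    var buf bytes.Buffer
    if err := json.NewEncoder(&buf).Encode(p); err != nil {
        return "", err
    }
    return buf.String(), nil
}

// decodePerson (hypothetical helper) reads one JSON value from a
// reader through json.Decoder.
func decodePerson(s string) (Person, error) {
    var p Person
    err := json.NewDecoder(strings.NewReader(s)).Decode(&p)
    return p, err
}

func main() {
    out, err := encodePerson(Person{ID: 1, Name: "Bruce", Age: 18})
    if err != nil {
        panic(err)
    }
    fmt.Printf("%q\n", out) // the quoted output ends with \n

    p, err := decodePerson(`{"ID":1,"Name":"Bruce","Age":18}`)
    if err != nil {
        panic(err)
    }
    fmt.Printf("%+v\n", p)
}
```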

2 Pitfalls

2.1 Public struct fields

This is probably the most common mistake made by developers who are new to Go or encoding/json: if we manipulate JSON with structs, the struct's fields must be public, i.e., start with a capital letter. Private fields are neither serialized nor populated.

Example:

type Person struct {
    ID   uint
    Name string
    age  int
}

func MarshalPerson() {
    p := Person{
        ID:   1,
        Name: "Bruce",
        age:  18,
    }
    output, err := json.Marshal(p)
    if err != nil {
        panic(err)
    }
    println(string(output))
}

func UnmarshalPerson() {
    str := `{"ID":1,"Name":"Bruce","age":18}`
    var p Person
    err := json.Unmarshal([]byte(str), &p)
    if err != nil {
        panic(err)
    }
    fmt.Printf("%+v\n", p)
}

// Output Marshal:
{"ID":1,"Name":"Bruce"}

// Output Unmarshal:
{ID:1 Name:Bruce age:0}

Here, age is a private field, so there is no age key in the serialized JSON string. Similarly, when you deserialize a JSON string into Person, the value of age is not read; it stays 0, and no error is reported.

The reason for this is simple. If we dig deeper into Marshal's source code, we can see that it actually uses reflect underneath to dynamically parse struct objects:

// .../src/encoding/json/encode.go

func (e *encodeState) marshal(v any, opts encOpts) (err error) {
    // ...skip
    e.reflectValue(reflect.ValueOf(v), opts)
    return nil
}

Go's reflect package can see the private fields of a struct, but it does not allow code to set them or to extract their values through Interface. encoding/json skips unexported fields altogether: they are silently omitted on serialization and ignored on deserialization, without any error.
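The following sketch shows both halves of the story under my own naming: settableFields is a hypothetical helper that asks reflect whether each field can be set, and TaggedPerson is a hypothetical variant of Person that exports Age while keeping a lowercase JSON key through a tag.

```go
package main

import (
    "encoding/json"
    "fmt"
    "reflect"
)

type Person struct {
    ID   uint
    Name string
    age  int // unexported: invisible to encoding/json
}

// TaggedPerson exports every field and uses tags for lowercase keys.
type TaggedPerson struct {
    ID   uint   `json:"id"`
    Name string `json:"name"`
    Age  int    `json:"age"`
}

// settableFields expects a pointer to a struct and reports, per field
// name, whether reflection is allowed to set that field.
func settableFields(ptr any) map[string]bool {
    rv := reflect.ValueOf(ptr).Elem()
    rt := rv.Type()
    result := make(map[string]bool, rt.NumField())
    for i := 0; i < rt.NumField(); i++ {
        result[rt.Field(i).Name] = rv.Field(i).CanSet()
    }
    return result
}

// decodeTagged deserializes JSON with lowercase keys into TaggedPerson.
func decodeTagged(s string) (TaggedPerson, error) {
    var p TaggedPerson
    err := json.Unmarshal([]byte(s), &p)
    return p, err
}

func main() {
    fmt.Println(settableFields(&Person{})) // map[ID:true Name:true age:false]

    p, err := decodeTagged(`{"id":1,"name":"Bruce","age":18}`)
    if err != nil {
        panic(err)
    }
    fmt.Printf("%+v\n", p) // {ID:1 Name:Bruce Age:18}
}
```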

2.2 Use map sparingly

As mentioned earlier, encoding/json can handle not only structs but also slices, maps, and other types. A slice becomes a JSON array, but a map and a struct produce the same kind of JSON object:

m := map[string]any{
    "ID": 1, 
    "Name": "Bruce",
}

// ...skip the structure code

// JSON output:
{
    "ID": 1,
    "Name": "Bruce"
}

In this case, unless there is a special circumstance or need, use maps sparingly, because maps bring extra overhead, extra code, and extra maintenance costs.

Why?

First of all, as in the Person example above, ID and Name have different types, so to deserialize this JSON with a map we have to declare it as map[string]any. any is an alias for interface{}, which means that to use Name or ID on its own we must first convert the value with a type assertion:

var m map[string]any
// ...deserialize JSON data into `m` (code omitted)...
// Fetch Name
name, ok := m["Name"].(string)

The type assertion itself is an extra step, and to prevent a panic we need to check the second return value, ok, which increases both the development effort and the amount of code.
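A fuller sketch of what this looks like in practice follows. The function name nameAndID is hypothetical. Note one extra trap: when the target is any, encoding/json decodes every JSON number as float64, so ID has to be asserted as float64 and then converted.

```go
package main

import (
    "encoding/json"
    "errors"
    "fmt"
)

// nameAndID (hypothetical helper) extracts Name and ID from JSON
// using a map instead of a struct.
func nameAndID(data []byte) (string, int, error) {
    var m map[string]any
    if err := json.Unmarshal(data, &m); err != nil {
        return "", 0, err
    }
    name, ok := m["Name"].(string)
    if !ok {
        return "", 0, errors.New("Name is missing or not a string")
    }
    // JSON numbers arrive as float64 when decoded into any.
    id, ok := m["ID"].(float64)
    if !ok {
        return "", 0, errors.New("ID is missing or not a number")
    }
    return name, int(id), nil
}

func main() {
    name, id, err := nameAndID([]byte(`{"ID":1,"Name":"Bruce"}`))
    if err != nil {
        panic(err)
    }
    fmt.Println(name, id) // Bruce 1
}
```

With a struct, all of this collapses into a single Unmarshal call, and a wrongly typed value is reported by Unmarshal itself.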

In addition, a map places no constraints on its data. We can predefine fields and types in a struct, but not in a map, so we can only learn what a map contains from documentation, comments, or the code itself. Moreover, a struct pins down the keys and value types it accepts from JSON, while a map accepts anything, so unexpected changes in the JSON can only be detected by business logic code. Just think about the amount of work and maintenance cost involved.

The reason I mention this pitfall is that before I developed in Go, my main language was Python, and Python code typically loads JSON into a dict (map) rather than a struct. When I first got into Go, I used map to interact with JSON in the same way. But because Go is statically typed, you have to convert types explicitly with type assertions, unlike in Python. That trouble gave me a headache for a while.

In short, use map sparingly, or as little as possible, to manipulate JSON.

2.3 Beware of struct combinations

Go supports an object-oriented style, but it has no classes, only structs, and structs have no inheritance. Instead, Go reuses structs through a kind of combination (struct embedding). In many cases this is very convenient, because we can use the members of the embedded struct as if they belonged to the outer struct itself, like this:

type Person struct {
    ID   uint
    Name string
    address
}

type address struct {
    Code   int
    Street string
}

func (a address) PrintAddr() {
    fmt.Println(a.Code, a.Street)
}

func Group() {
    p := Person{
        ID:   1,
        Name: "Bruce",
        address: address{
            Code:   100,
            Street: "Main St",
        },
    }
    // Access all address's fields and methods directly
    fmt.Println(p.Code, p.Street)
    p.PrintAddr()
}

// Output
100 Main St
100 Main St

Convenient, right? I think so too. But there's a small pitfall to be aware of when we incorporate combinations into our use of JSON. Take a look at the code below:

// The structs are the same as in the previous example, so they are not repeated.
// Errors are ignored to save space.

func MarshalPerson() {
    p := Person{
        ID:   1,
        Name: "Bruce",
        address: address{
            Code:   100,
            Street: "Main St",
        },
    }
    // MarshalIndent makes the output easier to read
    output, _ := json.MarshalIndent(p, "", "  ")
    println(string(output))
}

func UnmarshalPerson() {
    str := `{"ID":1,"Name":"Bruce","address":{"Code":100,"Street":"Main St"}}`
    var p Person
    _ = json.Unmarshal([]byte(str), &p)
    fmt.Printf("%+v\n", p)
}

// Output MarshalPerson:
{
  "ID": 1,
  "Name": "Bruce",
  "Code": 100,
  "Street": "Main St"
}

// Output UnmarshalPerson:
{ID:1 Name:Bruce address:{Code:0 Street:}}

This code packs in a bit more, so let's go through it step by step.

Let's start with the MarshalPerson function. It declares a Person object, formats the serialization result with MarshalIndent, and prints it. As the output shows, the entire Person object is flattened. Looking at the Person struct definition, it still seems to have an address member despite the combination, so we sometimes take it for granted that the serialized JSON of Person looks like this:

// The imagined JSON serialization result
{
  "ID": 1,
  "Name": "Bruce",
  "address": {
    "Code": 100,
    "Street": "Main St"
  }
}

But it doesn't; it's flattened. This matches what we saw earlier when we accessed the members of address directly through Person: they behave as if they were members of Person itself. So keep in mind that combination flattens the serialized JSON.

Another slightly counter-intuitive point is that address is a private struct type, so it seems its members shouldn't be serialized at all. Yet they are, and that's one of the less pleasant aspects of this form of composition: it exposes the public fields of the private embedded struct. This exposure is sometimes unintentional and can leak data you did not mean to publish.

Then there's the UnmarshalPerson function. With the previous function explained, this one is easy to understand: the combined struct expects flattened JSON. The nested "address" key matches no field of Person, so it is ignored and address keeps its zero values. If we want to deserialize back into Person, we need flattened JSON data as well.

In my personal use of Go over the years, when a struct needs to be converted to JSON I usually avoid combinations unless there is a special reason, because it's too easy to cause the problems mentioned above. Also, the JSON is flattened while the struct definition does not look flattened, so as the struct grows more complex it becomes harder to compare it visually with the flattened JSON data, which makes the code much less readable.

If you don't have special needs (e.g., the raw JSON data is flattened and there are multiple structs with duplicate fields that need to be reused), from my personal point of view, I would recommend that you try to write it this way:

type Person struct {
    ID      int
    Name    string
    Address address
}

Of course, if the struct isn't involved in JSON serialization, I'd still prefer a combination; it's really convenient.
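If you want to keep the combination and still get nested JSON, one option is to put a JSON tag on the embedded field: encoding/json then treats it as a named field instead of flattening it. The sketch below assumes an exported Address type (a change from the private address used above) and hypothetical helper names.

```go
package main

import (
    "encoding/json"
    "fmt"
)

type Address struct {
    Code   int
    Street string
}

type Person struct {
    ID      uint
    Name    string
    Address `json:"address"` // the tag turns off flattening in JSON
}

// marshalPerson (hypothetical helper) returns the compact JSON of p.
func marshalPerson(p Person) (string, error) {
    out, err := json.Marshal(p)
    return string(out), err
}

// unmarshalPerson (hypothetical helper) parses nested JSON into Person.
func unmarshalPerson(s string) (Person, error) {
    var p Person
    err := json.Unmarshal([]byte(s), &p)
    return p, err
}

func main() {
    p := Person{ID: 1, Name: "Bruce", Address: Address{Code: 100, Street: "Main St"}}
    out, err := marshalPerson(p)
    if err != nil {
        panic(err)
    }
    fmt.Println(out) // {"ID":1,"Name":"Bruce","address":{"Code":100,"Street":"Main St"}}

    back, err := unmarshalPerson(out)
    if err != nil {
        panic(err)
    }
    // Promoted fields still work on the Go side.
    fmt.Println(back.Code, back.Street) // 100 Main St
}
```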

2.4 Care needed when deserializing part of member fields (patch update data)

Look directly at the code:

type Person struct {
    ID   uint
    Name string
}

// PartUpdateIssue simulates parsing two different 
// JSON strings with the same structure
func PartUpdateIssue() {
    var p Person
    // The first data has the ID field and is not 0
    str := `{"ID":1,"Name":"Bruce"}`
    _ = json.Unmarshal([]byte(str), &p)
    fmt.Printf("%+v\n", p)
    // The second data does not have an ID field, 
    // deserializing it again with p preserves the last value
    str = `{"Name":"Jim"}`
    _ = json.Unmarshal([]byte(str), &p)
    // Notice the output ID is still 1
    fmt.Printf("%+v\n", p)
}

// Output
{ID:1 Name:Bruce}
{ID:1 Name:Jim}

The comments explain it clearly: when we reuse the same struct value to deserialize different JSON data, and a JSON document contains only some of the fields, the fields it does not cover keep the values from the previous deserialization. This is effectively a dirty-data pollution problem.

This is a problem that can be easily encountered and, when triggered, is quite insidious. I previously wrote a post (Bugs in Golang caused by Zero Value and features of the gob library) about a similar situation encountered when using the gob library. Of course, the gob problem is related to zero values, which is not quite the same as the JSON problem we're talking about today, but they both end up behaving in a similar way, with some of the member fields being contaminated by dirty data.

The solution is also simple: each time you deserialize JSON, use a brand new struct object to load the data.

Anyway, be careful with this situation.
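A minimal sketch of that fix is to hide the fresh struct inside a small function, so a previous value can never leak in. The name decodeFresh is hypothetical.

```go
package main

import (
    "encoding/json"
    "fmt"
)

type Person struct {
    ID   uint
    Name string
}

// decodeFresh (hypothetical helper) always decodes into a new,
// zero-valued Person, so fields absent from the JSON stay at zero.
func decodeFresh(data string) (Person, error) {
    var p Person
    err := json.Unmarshal([]byte(data), &p)
    return p, err
}

func main() {
    first, err := decodeFresh(`{"ID":1,"Name":"Bruce"}`)
    if err != nil {
        panic(err)
    }
    second, err := decodeFresh(`{"Name":"Jim"}`)
    if err != nil {
        panic(err)
    }
    fmt.Printf("%+v\n", first)  // {ID:1 Name:Bruce}
    fmt.Printf("%+v\n", second) // {ID:0 Name:Jim}
}
```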

2.5 Handling pointer fields

Many developers get a headache when they hear the word pointer, but it's not that complicated. Still, pointers are behind one of the most common panics in Go programs: the nil pointer dereference. So what happens when pointers are combined with JSON?

Look at this code:

type Address struct {
    Code   int
    Street string
}

type Person struct {
    ID      uint
    Name    string
    Address *Address
}

func UnmarshalPtr() {
    str := `{"ID":1,"Name":"Bruce"}`
    var p Person
    _ = json.Unmarshal([]byte(str), &p)
    fmt.Printf("%+v\n", p)
    // Uncommenting the next line would panic, because p.Address is nil
    // fmt.Printf("%+v\n", p.Address.Street)
}

// Output
{ID:1 Name:Bruce Address:<nil>}

We define the Address member as a pointer. When we deserialize JSON data that doesn't contain an Address, the pointer field stays nil because there is no corresponding data; encoding/json doesn't create an empty Address object for it to point to. If we access p.Address.xxx directly, the program panics because p.Address is nil.

So, if our struct has a pointer member, remember to check whether the pointer is nil before using it. This is a bit tedious, but it can't be helped. After all, writing a few extra lines of code is not a big deal compared to the damage caused by a panic in a production environment.
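The check can be wrapped in a small accessor so callers never touch the pointer directly. This is only a sketch; parsePerson and streetOf are hypothetical names, and returning an empty string for a missing address is an assumption about what the caller wants.

```go
package main

import (
    "encoding/json"
    "fmt"
)

type Address struct {
    Code   int
    Street string
}

type Person struct {
    ID      uint
    Name    string
    Address *Address
}

// parsePerson (hypothetical helper) deserializes a Person from JSON.
func parsePerson(s string) (Person, error) {
    var p Person
    err := json.Unmarshal([]byte(s), &p)
    return p, err
}

// streetOf (hypothetical helper) returns the street, or "" when the
// JSON carried no address and the pointer is therefore nil.
func streetOf(p Person) string {
    if p.Address == nil {
        return ""
    }
    return p.Address.Street
}

func main() {
    p, err := parsePerson(`{"ID":1,"Name":"Bruce"}`)
    if err != nil {
        panic(err)
    }
    fmt.Printf("%q\n", streetOf(p)) // ""

    q, err := parsePerson(`{"ID":2,"Name":"Jim","Address":{"Code":100,"Street":"Main St"}}`)
    if err != nil {
        panic(err)
    }
    fmt.Printf("%q\n", streetOf(q)) // "Main St"
}
```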

Also, when creating a structure with a pointer field, the assignment of the pointer field can be relatively cumbersome:

type Person struct {
    ID   int    
    Name string 
    Age  *int   
}

func Foo() {
    p := Person{
        ID:   1,
        Name: "Bruce",
        Age:  new(int),
    }
    *p.Age = 20
    // ...
}
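Since Go 1.18, a tiny generic helper can take some of the pain out of this. The helper name ptr is my own choice and not part of the standard library; it simply returns a pointer to a copy of its argument.

```go
package main

import "fmt"

type Person struct {
    ID   int
    Name string
    Age  *int
}

// ptr is a hypothetical generic helper: it copies v into a new
// variable and returns that variable's address.
func ptr[T any](v T) *T {
    return &v
}

func main() {
    // The pointer field can now be filled in the struct literal itself.
    p := Person{
        ID:   1,
        Name: "Bruce",
        Age:  ptr(20),
    }
    fmt.Println(p.Name, *p.Age) // Bruce 20
}
```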

Some people say, "Well, then shouldn't we avoid pointers in the fields of structs used for JSON?" I don't think so, because there are a couple of scenarios where pointers suit better than non-pointer members. One is that a pointer can avoid copying a large nested struct, which reduces some overhead; the other is the zero-value issue we'll discuss in the next section.

2.6 Confusion caused by zero values

Zero value is a feature of Golang variables that we can simply think of as a default value. That is, if we don't explicitly assign a value to a variable, Golang gives it one. As we saw in the previous examples, the zero value of int is 0, of string is the empty string, of a pointer is nil, and so on.

What are the pitfalls of handling JSON with zero values?

Look at the following example:

type Person struct {
    Name        string
    ChildrenCnt int
}

func ZeroValueConfusion() {
    str := `{"Name":"Bruce"}`
    var p Person
    _ = json.Unmarshal([]byte(str), &p)
    fmt.Printf("%+v\n", p)

    str2 := `{"Name":"Jim","ChildrenCnt":0}`
    var p2 Person
    _ = json.Unmarshal([]byte(str2), &p2)
    fmt.Printf("%+v\n", p2)
}

// Output
{Name:Bruce ChildrenCnt:0}
{Name:Jim ChildrenCnt:0}

We added a ChildrenCnt field to the Person struct to record the person's number of children. Because of the zero value, this field is 0 when the JSON data loaded into p has no ChildrenCnt, which is misleading: we can't distinguish objects with missing data from objects that really have 0 children. In the case of Bruce and Jim, one shows 0 children because the data is missing, and the other because he really has 0. Bruce's number of children should be "unknown", and treating it as 0 may cause problems in the business logic.

This kind of confusion can be fatal in scenarios with strict data requirements. So is there a way to avoid this zero-value interference? There is: it's the pointer use case left over from the previous section.

Let's change Person's ChildrenCnt type to *int and see what happens:

type Person struct {
    Name        string
    ChildrenCnt *int
}

// Output
{Name:Bruce ChildrenCnt:<nil>}
{Name:Jim ChildrenCnt:0xc0000124c8}

The difference is that Bruce has no data, so his ChildrenCnt is nil, while Jim's is a non-nil pointer (the printed value is an address, which varies between runs, and it points to 0). Now it is clear that the number of Bruce's children is unknown.

Essentially this approach still utilizes zero values, the zero values of pointers. It's kind of fighting fire with fire (laughs).
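To actually use the value, the pointer has to be checked and dereferenced. The sketch below uses a hypothetical helper, describeChildren, to turn the three possible inputs (field missing, field null, field 0) into readable text. A JSON null also leaves the pointer nil, so it is reported as unknown too.

```go
package main

import (
    "encoding/json"
    "fmt"
)

type Person struct {
    Name        string
    ChildrenCnt *int
}

// describeChildren (hypothetical helper) reports "unknown" when the
// count is absent or null, and the number otherwise.
func describeChildren(data string) (string, error) {
    var p Person
    if err := json.Unmarshal([]byte(data), &p); err != nil {
        return "", err
    }
    if p.ChildrenCnt == nil {
        return p.Name + ": unknown", nil
    }
    return fmt.Sprintf("%s: %d", p.Name, *p.ChildrenCnt), nil
}

func main() {
    inputs := []string{
        `{"Name":"Bruce"}`,
        `{"Name":"Jim","ChildrenCnt":0}`,
        `{"Name":"Ann","ChildrenCnt":null}`,
    }
    for _, in := range inputs {
        s, err := describeChildren(in)
        if err != nil {
            panic(err)
        }
        fmt.Println(s)
    }
    // Prints:
    // Bruce: unknown
    // Jim: 0
    // Ann: unknown
}
```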

2.7 Pitfalls of tags

Finally, let's talk about tags. Tags are an important feature in Golang and often accompany JSON. If you have used Go tags, you know how flexible and easy to use they are. So what are the pitfalls of such a great feature?

One is the name problem: a tag can specify the name of a field in the JSON data, which is flexible and useful but also error-prone, and it demands extra care from the programmer.

For example, if a programmer intentionally or unintentionally defines a structure like this:

type PersonWrong struct {
    FirstName string `json:"last_name"`
    LastName  string `json:"first_name"`
}

The tags of FirstName and LastName are swapped. Would you want to beat the programmer up if you encountered such code? I've actually encountered something like this in my production code. Of course, it was unintentional, a mistake during a code update. However, bugs like this are usually not easy to locate. Mostly because, well, who the **** would have thought?

Anyway, don't do it, and be careful when you write it.
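One cheap safeguard is a round-trip check, typically placed in a unit test: marshal a value whose fields hold distinct markers and verify that each JSON key carries the expected one. The sketch below is one possible shape; the function name checkTags and the marker values are assumptions for illustration.

```go
package main

import (
    "encoding/json"
    "fmt"
)

type Person struct {
    FirstName string `json:"first_name"`
    LastName  string `json:"last_name"`
}

// checkTags (hypothetical helper) marshals a Person with distinct
// field values and verifies which JSON key each value ends up under.
func checkTags() error {
    out, err := json.Marshal(Person{FirstName: "F", LastName: "L"})
    if err != nil {
        return err
    }
    var m map[string]string
    if err := json.Unmarshal(out, &m); err != nil {
        return err
    }
    if m["first_name"] != "F" || m["last_name"] != "L" {
        return fmt.Errorf("unexpected tag mapping: %s", out)
    }
    return nil
}

func main() {
    if err := checkTags(); err != nil {
        panic(err)
    }
    fmt.Println("tags look consistent")
}
```

With the swapped tags from PersonWrong, the same check would return an error instead of passing silently.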

Another problem is the combination of omitempty and zero values. See the code:

type Person struct {
    Name        string `json:"person_name"`
    ChildrenCnt int    `json:"cnt,omitempty"`
}

func TagMarshal() {
    p := Person{
        Name:        "Bruce",
        ChildrenCnt: 0,
    }
    output, _ := json.MarshalIndent(p, "", "  ")
    println(string(output))
}

// Output
{
  "person_name": "Bruce"
}

See the problem? We assigned 0 to ChildrenCnt when we created the struct object p. The omitempty option tells encoding/json to omit a field with an empty value when serializing, so the output JSON does not contain cnt at all and looks as if the data did not exist. What is an empty value? For most field types it is the zero value: false, 0, a nil pointer or interface, and an empty array, slice, map, or string. Note that a zero-valued struct is not considered empty.

So the familiar confusion arises: Bruce has 0 children, which is known data, yet the output JSON suggests that Bruce's number of children is unknown.

omitempty itself only affects serialization, but deserialization runs into the same confusion described in section 2.6: a missing field ends up as the zero value, so I won't repeat the example.

How do we solve this omitempty problem? Since it is still essentially a zero-value problem, use pointers.
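Here is a minimal sketch of the pointer variant. With *int and omitempty, a nil pointer is omitted, while a pointer to 0 is not empty and is therefore written as 0. The helper name marshalPerson is hypothetical.

```go
package main

import (
    "encoding/json"
    "fmt"
)

type Person struct {
    Name        string `json:"person_name"`
    ChildrenCnt *int   `json:"cnt,omitempty"`
}

// marshalPerson (hypothetical helper) builds a Person and returns its
// compact JSON; cnt may be nil to express "unknown".
func marshalPerson(name string, cnt *int) (string, error) {
    out, err := json.Marshal(Person{Name: name, ChildrenCnt: cnt})
    return string(out), err
}

func main() {
    zero := 0
    known, err := marshalPerson("Bruce", &zero)
    if err != nil {
        panic(err)
    }
    unknown, err := marshalPerson("Jim", nil)
    if err != nil {
        panic(err)
    }
    fmt.Println(known)   // {"person_name":"Bruce","cnt":0}
    fmt.Println(unknown) // {"person_name":"Jim"}
}
```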

3 Summary

In this article, I've listed 7 mistakes that can be made when using the encoding/json library; I've run into most of them in my own work. If you haven't encountered them yet, congratulations, and take this as a reminder to be careful with JSON in the future. If you have run into any of them and were confused, I hope this article has helped.

If there are any errors or lack of clarity in this post, please do not hesitate to point them out, thank you!

Top comments (1)

Joost

Not a single thing is worth writing a syllable, let alone a paragraph, about. Only one sub-section item (omitempty) is worth mentioning, the rest is pure incompetence.