Franz Wong

Process large JSON with limited memory

Sometimes we need to process a big JSON file or stream, but we don't need to keep all of its contents in memory.

For example, to count the items in a big array, we only need to load one item, increment the count, throw the item away, and repeat until the whole array has been counted.

I found a big JSON file in this Git repository: https://github.com/zemirco/sf-city-lots-json (~190MB).

The file looks like this, and I want to count the number of features.

{
  "type": "FeatureCollection",
  "features": [ /* lots of feature objects */ ]
}

This is what a feature object looks like, in case you are interested.

{
  "type": "Feature",
  "properties": {
    "MAPBLKLOT": "0001001",
    "BLKLOT": "0001001",
    "BLOCK_NUM": "0001",
    "LOT_NUM": "001",
    "FROM_ST": "0",
    "TO_ST": "0",
    "STREET": "UNKNOWN",
    "ST_TYPE": null,
    "ODD_EVEN": "E"
  },
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [
          -122.422003528252475,
          37.808480096967251,
          0.0
        ],
        [
          -122.422076013325281,
          37.808835019815085,
          0.0
        ],
        [
          -122.421102174348633,
          37.808803534992904,
          0.0
        ],
        [
          -122.421062569067274,
          37.808601056818148,
          0.0
        ],
        [
          -122.422003528252475,
          37.808480096967251,
          0.0
        ]
      ]
    ]
  }
}

Let's say my application can only allocate 50MB and I try to load the whole file into memory.

Path filePath = Path.of("/src/sf-city-lots-json/citylots.json");
String content = Files.readString(filePath);

Obviously, the file doesn't fit in memory.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
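The post doesn't say how the 50MB limit was applied. One common way, assumed here, is to start the JVM with the standard `-Xmx50m` flag. The sketch below uses a made-up class name, `HeapLimit`, and prints the maximum heap the JVM reports, so you can check that the cap is in effect before running the experiment.

```java
public class HeapLimit {
    /** Returns the maximum heap the JVM will attempt to use, in megabytes. */
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Assumed invocation (Java 11+ single-file launch): java -Xmx50m HeapLimit.java
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

The reported value can be slightly below the requested 50MB, depending on the garbage collector in use.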

Gson provides JsonReader, which reads JSON from a stream one token at a time instead of building the whole document in memory.

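import com.google.gson.stream.JsonReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Counts the elements of the top-level "features" array without loading the file into memory.
public int getFeatureCount(Path filePath) throws IOException {
    int count = 0;
    try (JsonReader reader = new JsonReader(Files.newBufferedReader(filePath))) {
        reader.beginObject();
        while (reader.hasNext()) {
            String name = reader.nextName();
            if ("features".equals(name)) {
                count = getFeatureCountFromArray(reader);
            } else {
                // Not the property we want (e.g. "type"): skip its value.
                reader.skipValue();
            }
        }
        reader.endObject();
    }
    return count;
}

// Walks the array one feature at a time, discarding each feature after counting it.
private int getFeatureCountFromArray(JsonReader reader) throws IOException {
    int count = 0;
    reader.beginArray();
    while (reader.hasNext()) {
        count++;
        reader.beginObject();
        while (reader.hasNext()) {
            // Skips the next token: a property name, or a value with everything nested in it.
            reader.skipValue();
        }
        reader.endObject();
    }
    reader.endArray();
    return count;
}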
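To try the approach without downloading the 190MB file, the same traversal can be run over a small in-memory document. This is a minimal sketch that assumes Gson is on the classpath. The class name `FeatureCountDemo` and the two-feature sample are made up for illustration. It also skips each feature with a single `skipValue()` call, which skips an object together with everything nested in it, rather than opening the object and looping over its properties.

```java
import com.google.gson.stream.JsonReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class FeatureCountDemo {
    // Hypothetical sample with the same overall shape as citylots.json, but only two features.
    public static final String SAMPLE =
        "{\"type\":\"FeatureCollection\",\"features\":["
        + "{\"type\":\"Feature\",\"properties\":{\"STREET\":\"UNKNOWN\"},\"geometry\":null},"
        + "{\"type\":\"Feature\",\"properties\":{\"STREET\":\"MARKET\"},\"geometry\":null}"
        + "]}";

    /** Counts the elements of the top-level "features" array without materializing them. */
    public static int countFeatures(Reader source) {
        int count = 0;
        try (JsonReader reader = new JsonReader(source)) {
            reader.beginObject();
            while (reader.hasNext()) {
                if ("features".equals(reader.nextName())) {
                    reader.beginArray();
                    while (reader.hasNext()) {
                        reader.skipValue(); // skips one whole feature object
                        count++;
                    }
                    reader.endArray();
                } else {
                    reader.skipValue(); // some other top-level property, e.g. "type"
                }
            }
            reader.endObject();
        } catch (IOException e) {
            // Wrapped so the demo can be called without declaring checked exceptions.
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countFeatures(new StringReader(SAMPLE))); // prints 2
    }
}
```

Because the method takes a `Reader`, the same code could be pointed at the real file by passing `Files.newBufferedReader(path)` instead of a `StringReader`.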

With great power comes great responsibility. Unlike Gson.fromJson, JsonReader requires us to call begin*, end*, and skipValue at the right moments, following the structure of the JSON document; otherwise it throws an exception. So it is best reserved for cases where memory footprint or performance is a real constraint.
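To see what a mismatch looks like, here is a small sketch that calls `beginArray()` on a document that actually starts with an object. It assumes Gson is on the classpath; the class and method names are hypothetical. Gson reports this kind of structural mismatch with an `IllegalStateException`.

```java
import com.google.gson.stream.JsonReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class WrongOrderDemo {
    /** Returns the name of the exception raised when the call doesn't match the JSON. */
    public static String readWithWrongCall(String json) {
        try (JsonReader reader = new JsonReader(new StringReader(json))) {
            reader.beginArray(); // wrong whenever the document starts with '{'
            return "no error";
        } catch (IllegalStateException e) {
            // Thrown because the next token is not the start of an array.
            return e.getClass().getSimpleName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readWithWrongCall("{\"features\":[]}")); // prints IllegalStateException
    }
}
```

The exception message also names the token that was expected and the one that was found, which helps locate the spot where the code and the document structure diverge.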
