Originally published on AnotherDevBlog.
The use case
I've got an interesting problem at work where I need to take any arbitrary JSON blob (object or array) and represent the leaf nodes in memory as a collection of key/value pairs. For example, given this JSON:
[
    {
        "Name": "Fish",
        "Color": "Silver",
        "Attributes": [
            {
                "Name": "Environment",
                "Value": "Aquatic"
            },
            {
                "Name": "Parts",
                "Value": [
                    {
                        "Type": "fin",
                        "Length": 3
                    }
                ]
            }
        ]
    }
]
I want to see something like this output:
[0].Name = Fish
[0].Color = Silver
[0].Attributes[0].Name = Environment
[0].Attributes[0].Value = Aquatic
[0].Attributes[1].Name = Parts
[0].Attributes[1].Value[0].Type = fin
[0].Attributes[1].Value[0].Length = 3
Here's what I did, and the lessons I learned.
Lesson 1: Not everything is on StackOverflow
"This sounds like a pretty common use case," I said to myself. "Surely there is something in the documentation or on StackOverflow."
Nope. I searched StackOverflow for quite a while, and while I found a few answers referring to Java libraries, I couldn't find one for the Json.NET library we are using here. The most popular NuGet package on the internet and no one has ever faced this issue before? Seriously?!
It took a few hours and lots of debugging, but I eventually wrote an extension method to allow me to grab the leaf node values of arbitrary JSON:
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

public static class JExtensions
{
    public static IEnumerable<JValue> GetLeafValues(this JToken jToken)
    {
        // Leaf: a primitive value (string, number, boolean, null, ...)
        if (jToken is JValue jValue)
        {
            yield return jValue;
        }
        // Array: recurse into every element
        else if (jToken is JArray jArray)
        {
            foreach (var result in GetLeafValuesFromJArray(jArray))
            {
                yield return result;
            }
        }
        // Property: recurse into the property's value
        else if (jToken is JProperty jProperty)
        {
            foreach (var result in GetLeafValuesFromJProperty(jProperty))
            {
                yield return result;
            }
        }
        // Object: recurse into each child property
        else if (jToken is JObject jObject)
        {
            foreach (var result in GetLeafValuesFromJObject(jObject))
            {
                yield return result;
            }
        }
    }

    #region Private helpers
    static IEnumerable<JValue> GetLeafValuesFromJArray(JArray jArray)
    {
        for (var i = 0; i < jArray.Count; i++)
        {
            foreach (var result in GetLeafValues(jArray[i]))
            {
                yield return result;
            }
        }
    }

    static IEnumerable<JValue> GetLeafValuesFromJProperty(JProperty jProperty)
    {
        foreach (var result in GetLeafValues(jProperty.Value))
        {
            yield return result;
        }
    }

    static IEnumerable<JValue> GetLeafValuesFromJObject(JObject jObject)
    {
        foreach (var jToken in jObject.Children())
        {
            foreach (var result in GetLeafValues(jToken))
            {
                yield return result;
            }
        }
    }
    #endregion
}
Then in my calling code, I just extract the Path and Value properties from the JValue objects returned:
using System;
using Newtonsoft.Json.Linq;

var jToken = JToken.Parse(json); // json holds the raw JSON text ("blah blah json here")
foreach (var jValue in jToken.GetLeafValues())
{
    Console.WriteLine($"{jValue.Path} = {jValue.Value}");
}
Awesome!
Lesson 2: But it's always on StackOverflow
So it turns out there is an answer on StackOverflow for this use case (link). I was searching for terms like "get all leaf nodes" or "get all values with paths," but the magic keyword to make the answer appear is "flatten." Here's the answer code that was posted:
JObject jsonObject = JObject.Parse(theJsonString);
IEnumerable<JToken> jTokens = jsonObject.Descendants().Where(p => p.Count() == 0);
Dictionary<string, string> results = jTokens.Aggregate(new Dictionary<string, string>(), (properties, jToken) =>
{
    properties.Add(jToken.Path, jToken.ToString());
    return properties;
});
Lesson 3: But you can't always just copy what's on StackOverflow
Wow, that code snippet is a lot shorter than my solution, so I tried it out, but I ultimately went back to my own. Here's why:
- This solution doesn't work, at least not for my case. I need it to handle an arbitrary JSON blob, and I can't promise it's going to be a JObject; it could be an array or something else. So this solution unfortunately fails right out of the gate with my first test case (an array). And JToken doesn't have a handy little Descendants() method I can call like JObject does, so I'd have to do some type checking anyway. Yuck.
- Another problem: this solution builds a dictionary in memory to represent the flattened structure. I'm dealing with some pretty massive objects, and it's already painful enough to load up that initial JToken. I'd really rather not add the memory pressure of the dictionary on top of that.
- Speaking of memory, I'd like to (eventually) return a JValue only if it's not null or the default for its value type.
- That .Count() looks really expensive, since it's called on every single descendant whether or not you end up using that descendant. It's probably safer to select only the descendants you know are JValue objects: .Descendants().OfType<JValue>(). Once you have a JValue, you can read .Value to get the underlying primitive (or pseudo-primitive string) value without calling .ToString().
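Combining the first and last points, here is a minimal sketch of a middle ground, assuming Json.NET (Newtonsoft.Json) as used throughout this post. Descendants() is defined on JContainer, the shared base class of JObject, JArray and JProperty, so one type check covers any container, and OfType<JValue>() replaces the Count() filter. The class and method names (JTokenLeafExtensions, GetLeafValuesLazily) are hypothetical; they are not part of Json.NET or of my extension method.

```csharp
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json.Linq;

public static class JTokenLeafExtensions
{
    // Hypothetical helper: returns every leaf JValue under any JToken
    // (object, array, property or bare value) without building a dictionary.
    public static IEnumerable<JValue> GetLeafValuesLazily(this JToken token)
    {
        // A bare primitive (e.g. the JSON text "42") is its own leaf.
        if (token is JValue value)
        {
            return new[] { value };
        }

        // JObject, JArray and JProperty all derive from JContainer,
        // which is where Descendants() lives.
        if (token is JContainer container)
        {
            // Filter by type instead of calling Count() on each descendant.
            // Empty objects/arrays are not JValues, so they are skipped.
            return container.Descendants().OfType<JValue>();
        }

        return Enumerable.Empty<JValue>();
    }
}
```

Descendants() is an iterator, so the caller can loop over the result and print jValue.Path and jValue.Value exactly as before, with nothing collected up front. Null leaves could later be dropped with something like .Where(v => v.Type != JTokenType.Null). One behavioral difference from the StackOverflow filter: an empty object or array has zero children, so p.Count() == 0 reports it as a leaf, while this version skips it.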