Today we're going to learn how to parse strings in JSON. This is more complicated than the datatypes we've done so far, but don't worry, it's not bad :)
Recap
Let's recall our string definition from part 0:
A string is a sequence of characters enclosed within double quotes ("). Any character can be put within a string, except for the following, which must be escaped:
- double quote (") [escaped with \"]
- backslash (\) [escaped with \\]
- backspace [escaped with \b]
- form feed [escaped with \f]
- line feed [escaped with \n]
- carriage return [escaped with \r]
- horizontal tab [escaped with \t]
- any other control character from U+0000 to U+001F [escaped with the \u form below]
In addition, the escape code \/ resolves to a forward slash (/), and a backslash followed by u followed by four hex digits resolves to the character at the Unicode codepoint specified by said hex digits. Characters above U+FFFF, such as most emoji, are written as two of these escapes forming a UTF-16 surrogate pair.
Writing it
First, let's set up the skeleton of our method:
fun readString(): String? {
    val oldCursor = cursor
    val result = StringBuilder()
}
The quotes
A string will always start with a quote, so if we don't see one at the start, we can fail:
if (step() != '"') {
    cursor = oldCursor
    return null
}
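The snippets in this post lean on cursor helpers built in earlier parts: cursor, step(), peek(), hasNext() and skip(). If you want to experiment with the pieces in isolation, here's a minimal stand-in. The class name CursorReader and the exact behaviour of each helper are assumptions, not the series' real code; in particular, this version checks hasNext() before the opening-quote step() so that empty input fails cleanly instead of throwing.

```kotlin
// Hypothetical stand-in for the cursor helpers built in earlier parts of the series.
class CursorReader(private val input: String) {
    var cursor = 0

    fun hasNext(): Boolean = cursor < input.length

    // Look at the current character without consuming it
    fun peek(): Char = input[cursor]

    // Consume the current character and return it
    fun step(): Char = input[cursor++]

    // Consume the current character, discarding it
    fun skip() {
        cursor++
    }

    // The opening-quote check from above, restoring the cursor on failure
    fun readOpeningQuote(): Boolean {
        val oldCursor = cursor
        if (!hasNext() || step() != '"') {
            cursor = oldCursor
            return false
        }
        return true
    }
}
```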
The loop
Next, we loop over the characters one at a time. The loop condition checks that there is another character and that it isn't a double quote (since that would mark the end of the string). Inside the loop, we consume the current character and store it in a variable so we can do our logic on it.
while (hasNext() && peek() != '"') {
    val char = step()
}
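One subtlety of that condition: an escaped quote, as in "a\"b", doesn't end the loop, because the escape branch we're about to write consumes the character after a backslash before peek() can see it. The hypothetical findClosingQuote below mirrors just that scanning behaviour on a plain String and index, without building the result.

```kotlin
// Hypothetical helper mirroring the loop's scanning. Returns the index of the
// closing quote for a string whose opening quote is at `start`, or -1 if the
// input runs out first. A backslash and the character after it are skipped as
// a pair, just as the escape branch consumes the character after a backslash.
fun findClosingQuote(input: String, start: Int): Int {
    var i = start + 1 // step past the opening quote
    while (i < input.length && input[i] != '"') {
        if (input[i] == '\\') i++ // skip the escaped character too
        i++
    }
    return if (i < input.length) i else -1
}
```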
Escaped characters
First, we need to check for escaped characters. These always start with a \ and are always followed by at least one more character, so let's check for a backslash and then pattern-match on the next character:
if (char == '\\') { // just a single backslash, written as a double backslash to escape it
    when (val it = step()) {
Now we can check for each valid following character, and add that to our string:
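        '"' -> result.append('"')
        '\\' -> result.append('\\')
        '/' -> result.append('/')
        'b' -> result.append('\b')
        'f' -> result.append(0x0C.toChar())
        'n' -> result.append('\n')
        'r' -> result.append('\r')
        't' -> result.append('\t')
        // readHexChar() returns null on bad input; appending that would add the text "null"
        'u' -> result.append(readHexChar() ?: run { cursor = oldCursor; return null })
        else -> {
            // Nothing else is valid after a backslash
            cursor = oldCursor
            return null
        }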
A few things of interest here:
- Kotlin does not support \f for form feeds, so we have to use the raw ASCII value, 0x0C (the char literal '\u000C' would also work).
- We've put the \u0000-reading logic into a new function, readHexChar, which we'll write in a second.
- Any other character after a backslash is invalid, so we fail. If you want to be slightly less spec-conforming, you could use else -> result.append(it) instead, which passes unrecognised escapes through as-is.
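To make the strict/lenient trade-off concrete, here's the simple-escape part of that when pulled out into a standalone function. Both decodeEscape and its lenient parameter are hypothetical, and u is deliberately left to the caller since it needs four more characters of input.

```kotlin
// Hypothetical standalone version of the escape `when`. Returns the decoded
// character, or null for an invalid escape unless `lenient` is true, in which
// case the escaped character is passed through unchanged.
// Callers must handle 'u' themselves before calling this.
fun decodeEscape(c: Char, lenient: Boolean = false): Char? = when (c) {
    '"', '\\', '/' -> c
    'b' -> '\b'
    'f' -> '\u000C'
    'n' -> '\n'
    'r' -> '\r'
    't' -> '\t'
    else -> if (lenient) c else null
}
```

Grouping the three characters that decode to themselves into one branch is just a Kotlin nicety; the behaviour matches the branch list above.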
readHexChar()
Let's make our readHexChar method:
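private fun readHexChar(): Char? {
    val oldCursor = cursor
    val digits = StringBuilder()
    // A \u escape is followed by exactly four hex digits, no more and no fewer
    repeat(4) {
        if (!hasNext() || !isHexDigit(peek())) {
            cursor = oldCursor
            return null
        }
        digits.append(step())
    }
    return digits.toString().toInt(16).toChar()
}
This reads exactly four characters, checking each one with our isHexDigit function, then parses the resulting hex string and converts it to a Char. We can't just use read(::isHexDigit) here: since read consumes characters for as long as the predicate holds, in "\u0041BC" it would grab all six hex digits instead of decoding \u0041 to A and leaving BC as ordinary characters. And because every digit is checked up front, toInt(16) can't throw, so no try/catch is needed.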
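The four-digit rule is easy to check in isolation with a standalone version that works on a plain String and index. decodeHex4 is a hypothetical name, and it checks hex digits itself rather than calling isHexDigit. It also shows something we get for free: Kotlin's Char is a UTF-16 code unit, so a character above U+FFFF, which JSON writes as a surrogate pair like \uD83D\uDE00, comes out right if we simply decode each half and append them in order.

```kotlin
// Hypothetical standalone version of the four-digit rule. Decodes the four
// characters starting at `start` as one UTF-16 code unit, or returns null if
// there aren't four characters left or any of them isn't a hex digit.
// Characters after the fourth are ignored, so "0041BC" decodes to 'A'.
fun decodeHex4(input: String, start: Int): Char? {
    if (start < 0 || start + 4 > input.length) return null
    val digits = input.substring(start, start + 4)
    // Only ASCII hex digits count (Char.isDigit() would also accept other scripts' digits)
    val allHex = digits.all { it in '0'..'9' || it in 'a'..'f' || it in 'A'..'F' }
    if (!allHex) return null
    return digits.toInt(16).toChar()
}
```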
The other characters
The other branch of our if, for non-escaped characters, is nice and simple:
} else {
    if (char >= 32.toChar()) {
        result.append(char)
    } else {
        cursor = oldCursor
        return null
    }
}
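Apart from the quote and backslash, which the loop condition and the escape branch already take care of, the only characters that can't appear unescaped in a JSON string are the first 32 codepoints (U+0000 to U+001F), the control characters. If we encounter one of those, we fail; otherwise, we add the character to our string.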
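Spelled out as a standalone predicate (isAllowedUnescaped is a hypothetical name), the check is just a comparison on the character's code, and it's worth noticing what it lets through: DEL (U+007F) and every non-ASCII character are fine unescaped in JSON. Char.code needs Kotlin 1.5 or newer; on older versions, char >= 32.toChar() as above does the same job.

```kotlin
// Hypothetical predicate: may `c` appear unescaped inside a JSON string?
// Assumes '"' and '\\' are already handled by the loop condition and the
// escape branch, so only the control characters U+0000..U+001F are rejected.
fun isAllowedUnescaped(c: Char): Boolean = c.code >= 0x20
```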
Leaving the loop
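}
// The loop also stops if we run out of input, which means the string was never closed
if (!hasNext()) {
    cursor = oldCursor
    return null
}
skip()
return result.toString()
Finally, we check that the loop stopped at a closing double quote rather than at the end of the input (an unterminated string like "abc is invalid), skip that quote, and return our final string.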
Conclusion
Here's our final code:
fun readString(): String? {
    val oldCursor = cursor
    val result = StringBuilder()
    if (step() != '"') {
        cursor = oldCursor
        return null
    }
    while (hasNext() && peek() != '"') {
        val char = step()
        if (char == '\\') {
            when (val it = step()) {
                '"' -> result.append('"')
                '\\' -> result.append('\\')
                '/' -> result.append('/')
                'b' -> result.append('\b')
                'f' -> result.append(0x0C.toChar())
                'n' -> result.append('\n')
                'r' -> result.append('\r')
                't' -> result.append('\t')
                'u' -> result.append(readHexChar() ?: run { cursor = oldCursor; return null })
                else -> {
                    cursor = oldCursor
                    return null
                }
            }
        } else {
            if (char >= 32.toChar()) {
                result.append(char)
            } else {
                cursor = oldCursor
                return null
            }
        }
    }
    // Unterminated string: we ran out of input before the closing quote
    if (!hasNext()) {
        cursor = oldCursor
        return null
    }
    skip()
    return result.toString()
}
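If you'd like something you can run on its own, here's a self-contained sketch that bundles readString with readHexChar and minimal stand-ins for the cursor helpers. The class JsonStringReader, the local fail() helper, and the helpers' exact behaviour are assumptions for this sketch rather than the series' real code. It also adds one check beyond the version above: a backslash at the very end of the input fails instead of calling step() past the end.

```kotlin
// Self-contained sketch of the string parser. The cursor helpers below are
// hypothetical stand-ins for the ones built in earlier parts of the series.
class JsonStringReader(private val input: String) {
    var cursor = 0

    private fun hasNext() = cursor < input.length
    private fun peek() = input[cursor]
    private fun step() = input[cursor++]
    private fun skip() { cursor++ }
    private fun isHexDigit(c: Char) = c in '0'..'9' || c in 'a'..'f' || c in 'A'..'F'

    fun readString(): String? {
        val oldCursor = cursor
        val result = StringBuilder()

        // Restore the cursor and report failure
        fun fail(): String? {
            cursor = oldCursor
            return null
        }

        if (!hasNext() || step() != '"') return fail()

        while (hasNext() && peek() != '"') {
            val char = step()
            if (char == '\\') {
                if (!hasNext()) return fail() // backslash at end of input
                when (step()) {
                    '"' -> result.append('"')
                    '\\' -> result.append('\\')
                    '/' -> result.append('/')
                    'b' -> result.append('\b')
                    'f' -> result.append('\u000C')
                    'n' -> result.append('\n')
                    'r' -> result.append('\r')
                    't' -> result.append('\t')
                    'u' -> result.append(readHexChar() ?: return fail())
                    else -> return fail() // not a valid escape
                }
            } else if (char.code >= 0x20) {
                result.append(char)
            } else {
                return fail() // unescaped control character
            }
        }

        if (!hasNext()) return fail() // ran out of input before the closing quote
        skip() // the closing quote
        return result.toString()
    }

    // Reads exactly four hex digits, or restores the cursor and returns null
    private fun readHexChar(): Char? {
        val oldCursor = cursor
        val digits = StringBuilder()
        repeat(4) {
            if (!hasNext() || !isHexDigit(peek())) {
                cursor = oldCursor
                return null
            }
            digits.append(step())
        }
        return digits.toString().toInt(16).toChar()
    }
}
```

For example, JsonStringReader("\"hello\" rest").readString() returns "hello" and leaves the cursor on the space after the closing quote, ready for whatever the parser reads next.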
Want to improve it? You could throw an exception when encountering invalid input instead of returning null.
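One low-effort way to get there is to keep the null-returning functions and convert failures into exceptions at the call site. The JsonParseException class and orFail helper below are hypothetical; throwing directly at each failure point inside readString would give more precise error messages.

```kotlin
// Hypothetical exception type carrying the cursor position where parsing failed
class JsonParseException(message: String, val position: Int) :
    Exception("$message at position $position")

// Hypothetical helper: turn a null "parse failed" result into an exception
fun <T : Any> T?.orFail(message: String, position: Int): T =
    this ?: throw JsonParseException(message, position)
```

Because readString restores the cursor when it fails, a call like readString().orFail("expected a string", cursor) would report the position where the string was supposed to start.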