I was working on a text processing example across several different programming languages, including C++, Java, Rust, and Scala, and noticed some discrepancies in the results.
It turned out that these are due to Unicode string length meaning different things in different languages:
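In Java, Scala, etc., the length() method returns the number of UTF-16 code units in the string. For text like the example below, that matches the number of abstract, high-level characters (glyphs) from a human reader's point of view, but the two can differ for characters outside the Basic Multilingual Plane and for combining sequences.
By contrast, in C++, Go, and Rust, the equivalent functions and methods return a result based on the number of bytes required to store those characters.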
jshell> "résumé".length()
$1 ==> 6
❯ evcxr
Welcome to evcxr. For help, type :help
>> "résumé".len()
8
>> "résumé".chars().count()
6
len([]rune("résumé")) // returns 6
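To make the three notions of "length" easier to compare side by side, here is a small Rust sketch. The names Lengths and lengths are made up for this illustration, and the mapping to Java in the comments rests on the fact that String.length() counts UTF-16 code units. It is only meant to show where the counts agree and where they drift apart, not to be a complete treatment of Unicode.

```rust
/// Length of a string under three different definitions.
/// (`Lengths` and `lengths` are hypothetical names for this sketch.)
#[derive(Debug, PartialEq)]
struct Lengths {
    utf8_bytes: usize,    // what Rust's str::len() and Go's len(s) report
    utf16_units: usize,   // what Java's String.length() reports
    scalar_values: usize, // what Rust's chars().count() reports
}

fn lengths(s: &str) -> Lengths {
    Lengths {
        utf8_bytes: s.len(),
        utf16_units: s.encode_utf16().count(),
        scalar_values: s.chars().count(),
    }
}

fn main() {
    // Precomposed é (U+00E9): 8 bytes, 6 UTF-16 units, 6 scalar values.
    println!("{:?}", lengths("r\u{e9}sum\u{e9}"));

    // Decomposed é (e + combining acute U+0301): 10 bytes, 8 units, 8 scalar values,
    // even though a reader still sees six characters.
    println!("{:?}", lengths("re\u{301}sume\u{301}"));

    // One emoji outside the Basic Multilingual Plane: 4 bytes, 2 units, 1 scalar value.
    println!("{:?}", lengths("\u{1F600}"));
}
```

None of the three counts is guaranteed to match what a reader perceives as characters. Counting grapheme clusters needs Unicode segmentation rules, which Rust's standard library does not provide.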
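Apparently it's a bit more complicated in C++, where std::string::size() counts char elements (bytes, for UTF-8 text) and counting code points takes extra work.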