Discussion on: The 7 assumptions about strings you probably have


The interesting thing is that, as @rdentato points out in his recent post, the UTF-8 encoding plays nicely with plain old C, as long as you don't need any special properties of Unicode.
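For example, UTF-8 never introduces a null byte (only U+0000 itself is encoded as 0x00), so strlen works, although it returns the length in bytes rather than in characters. UTF-8 also never introduces ASCII bytes: every byte of a multi-byte sequence is greater than 0x7F, so searching for an ASCII character in a UTF-8 encoded string still works by iterating over the string's bytes. And so on.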