Normalisation is a really interesting topic, especially because I think there's a tendency for devs to assume that all data can ultimately be normalised.
Which is probably not true? Things with standards are almost normalised, provided the standards work for all valid data. But what counts as valid is decided by the originating jurisdiction, not the consumer.
This is kind of a pet peeve of mine; I actually gave a talk at Automation Guild 2022 (guildconferences.com/automation-gu...) about bad test data assumptions and how they interact with names and addresses. Most of my examples were about Japan, where, for example:
The characters representing a name may also represent other names
Any particular name may have multiple characters it can be spelt with
There are no middle names
A person may interact with a Western system using a chosen Western name that isn't theirs, legally
Not all streets have names
Building numbers may be non-sequential
Kyoto has a "citizen" addressing scheme, which the government recognises as valid alongside the nationwide addressing scheme
There's more. A lot more. The upshot I have is that normalisation is a super useful tool when it's possible, but trying too hard for it leads to pain.
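To make that a bit more concrete, here's a rough sketch of what "not trying too hard" could look like in a data model. Everything in it (`PersonName`, `PostalAddress`, `display_name`, the field names) is made up for illustration, not taken from any real system: the idea is just to store what the person gave you instead of forcing it into first/middle/last or street/number.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PersonName:
    """Hypothetical model: keep the name as given, don't split it."""
    full_name: str                        # exactly as the person wrote it
    reading: Optional[str] = None         # e.g. a kana reading; the characters alone can be ambiguous
    preferred_name: Optional[str] = None  # a chosen name, which may not be the legal one


@dataclass(frozen=True)
class PostalAddress:
    """Hypothetical model: free-form lines, no assumed street name or number."""
    lines: Tuple[str, ...]        # as supplied, in the order supplied
    country_code: str             # assumed here to be an ISO 3166-1 alpha-2 code
    scheme: Optional[str] = None  # free-text label for places with more than one valid scheme


def display_name(name: PersonName) -> str:
    """Use the chosen name when there is one, otherwise the name as given."""
    return name.preferred_name or name.full_name
```

The trade-off is that you can't sort or search by "surname" any more, but you also can't reject a valid name or address just because it doesn't fit your columns.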
(Phone numbers tho; 100% normalise them)
Thanks for such a detailed and helpful comment, @dylanlacey! I watched you speak on this topic at Automation Guild this year, great talk.