Here is a discipline I am trying to adopt in my Python programs: use "My string".casefold() instead of "My string".lower() when comparing strings irrespective of case.
When checking two strings for equality without caring about uppercase vs. lowercase, it is tempting to do something like this:
if "StrinG".lower() == "string".lower():
    print("Case-insensitive equality!")
Of course, it works.
But some human languages have case rules with a bit more nuance. Let's say we have three strings, each a slightly different way of writing Kubernetes in Greek (which makes you sound doubly smart).
k8s = "ΚυβερνΉτης"
k8S = "ΚυβερνΉτηΣ"
k8s_odd = "ΚυβερνΉτησ" # Apologies to the scribes of Athens
These three are all mixed-case strings. The first correctly ends with a lowercase final sigma (ς), the second ends with a capital sigma (Σ), and the last, oddly, ends with a non-final lowercase sigma (σ).
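To see exactly which sigma each string ends with, a quick sketch using the standard library's unicodedata module can print the code point and Unicode name of each last character:

```python
import unicodedata

k8s = "ΚυβερνΉτης"
k8S = "ΚυβερνΉτηΣ"
k8s_odd = "ΚυβερνΉτησ"

# Print the code point and Unicode name of each string's last character.
for word in (k8s, k8S, k8s_odd):
    last = word[-1]
    print(f"U+{ord(last):04X} {unicodedata.name(last)}")

# Expected output:
# U+03C2 GREEK SMALL LETTER FINAL SIGMA
# U+03A3 GREEK CAPITAL LETTER SIGMA
# U+03C3 GREEK SMALL LETTER SIGMA
```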
Let's imagine we have a use case in which we want to consider all of these as equal. Would str.lower() work?
>>> k8s.lower()
'κυβερνήτης'
>>> k8S.lower()
'κυβερνήτης'
>>> k8s_odd.lower()
'κυβερνήτησ'
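Apparently not. str.lower() turns a capital Σ at the end of a word into the final ς, but it leaves the non-final σ untouched, so the third string still differs.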
Using str.casefold() instead:
>>> k8s.casefold()
'κυβερνήτησ'
>>> k8S.casefold()
'κυβερνήτησ'
>>> k8s_odd.casefold()
'κυβερνήτησ'
All are equal! Exactly what we want for case-insensitive string comparison.
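Wrapped in a small helper, the comparison might look like this sketch. equals_ignore_case is a hypothetical name chosen for illustration, not a standard-library function.

```python
def equals_ignore_case(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings via Unicode case folding."""
    return a.casefold() == b.casefold()


k8s = "ΚυβερνΉτης"
k8S = "ΚυβερνΉτηΣ"
k8s_odd = "ΚυβερνΉτησ"

# casefold() maps ς, Σ and σ all to σ, so every pair compares equal.
print(equals_ignore_case(k8s, k8S))      # True
print(equals_ignore_case(k8s, k8s_odd))  # True

# lower() keeps the final ς distinct from σ, so this comparison fails.
print(k8s.lower() == k8s_odd.lower())    # False
```

The same casefolded form also works as a key wherever you need case-insensitive lookup, such as a dict or set of normalized names.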
If you are aiming for clean spellings, though, str.casefold() is the wrong tool: it produces a form meant for comparison, not display. str.upper().lower() might yield a more printable result:
>>> k8s_odd.upper().lower()
'κυβερνήτης'
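Case folding can also change a word's spelling outright, which is another reason to keep its output away from readers. A minimal sketch using the German word Straße (an example not taken from the original post):

```python
word = "Straße"

# lower() keeps the sharp s; casefold() expands it to "ss".
print(word.lower())     # straße
print(word.casefold())  # strasse

# Compare on the casefolded form, but keep the original string for display.
print(word.casefold() == "STRASSE".casefold())  # True
```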
But for case-insensitive comparison that respects a wide range of human languages, str.casefold() is our friend.
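One caveat worth knowing: the same accented letter can be stored either precomposed (Ή, U+0389) or as a base letter plus a combining accent, and casefold() alone treats those as different. Section 3.13 of the Unicode Standard defines a canonical caseless match that applies NFD normalization before and after case folding. A sketch of that idea with unicodedata.normalize follows; canonical_caseless_equal is a hypothetical helper name.

```python
import unicodedata


def canonical_caseless_equal(a: str, b: str) -> bool:
    """Hypothetical helper sketching NFD(casefold(NFD(s))) comparison."""
    def key(s: str) -> str:
        return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())
    return key(a) == key(b)


composed = "ΚυβερνΉτης"                # Ή as a single code point (U+0389)
decomposed = "Κυβερν\u0397\u0301της"  # Η followed by a combining acute accent

print(composed.casefold() == decomposed.casefold())    # False
print(canonical_caseless_equal(composed, decomposed))  # True
```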
References
- Python docs on str.casefold
- The Unicode Standard, Section 3.13 "Default Case Algorithms" (page 150 of chapter 3)