So here you are.
res1 = " ".join(filter(lambda r: r.lower().startswith('a'), WORDS))
What is happening here?
We take any name, for instance "John" and converted it to lower cases, ie. "john".
Then we call another method to get its first letter and compare it with "a" character.
I realized, in this particular case, that I don't need:
- to lowercase the whole name,
- to use special method to get first letter of string,
- to lowercase a letter when it is already "a", thus short-circuit of or operator comes handy,
- to use == operator as my CPython implementation allows to (ab)use *is" operator.
Let's see some code:
# some very random words at first WORDS = ("Ahaha asa sas s sas aa AA A sd df df fdfdf sdfs ds fsf fsf sfs fsdfs fsfsf sfsf sf" "SAS D DA A SDAD ADD ADAD ADAD ADA DASDSDAFDEAA AD AD D SD ad sds saAA AADADDdas a a AA SAD" "fg ffhhf ghhfh h fh hgfh gf hhh h hhh hftrt hr gfghgfh fh fh fh hhfghf" "weewe e ee wer err t trt trt rtzt ztz tzz trt rtrt tr ttre rer erre rere rre r" "WWFOGFO FOFGKIGOG FKGFOGF KIGFOFGOFO GFK GFKF GF GFG EG FD GFDFG DG GD DF" * 1000).split()
%%timeit res1 = " ".join(filter(lambda r: r.lower().startswith('a'), WORDS))
26.7 ms ± 348 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Now to make the filtering function as fast (and error-prone) as possible I am storing the first unicode character of name and instead of full comparison I am just letting Python to check if the pointers of character refers to the same address. Furthermore, as and and or are short-circuited operators in Python, I prevent Python to perform the second unnecessary comparison if the letter in question is "a" or whatever.
def my_filter(name, lower="a", upper="A"): first = name return first is lower or first is upper
%%timeit res2 = " ".join(filter(my_filter, WORDS))
13 ms ± 105 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
It may seem I was able to reduce the runtime by 50%? Hurray?
Oh not so fast, you squishy poet from beyond the stars.
Consider, for instance, this:
character = "ř" character is "ř"
As it is now painfully obvious, CPython's is operator works only when deals with ASCII codes.
This is exactly why one should not use a clever hack when it comes to optimization.
But not only that, all my construction is fragile. To make it useless is enough to look for first two letters.
If we want to find names starting with "jo" we would need four arguments "jo", "Jo", "JO", "jO".
There is more, actually. The original article was dealing with querying a database.
In such particular case would be better to let the task to be performed by database engine as this is much faster and efficient.
Whatever, I hope you enjoyed our little excursion on The planet of pointless optimization.