In the excellent article from Apruzzese Francesco I have found one line of his code and I started my usual nitty-picking. One lambda stroke me like a good example for optimizing exercise.
So here you are.
res1 = " ".join(filter(lambda r: r.lower().startswith('a'), WORDS))
What is happening here?
We take any name, for instance "John" and converted it to lower cases, ie. "john".
Then we call another method to get its first letter and compare it with "a" character.
I realized, in this particular case, that I don't need:
- to lowercase the whole name,
- to use special method to get first letter of string,
- to lowercase a letter when it is already "a", thus short-circuit of or operator comes handy,
- to use == operator as my CPython implementation allows to (ab)use *is" operator.
Let's see some code:
# some very random words at first
WORDS = ("Ahaha asa sas s sas aa AA A sd df df fdfdf sdfs ds fsf fsf sfs fsdfs fsfsf sfsf sf"
"SAS D DA A SDAD ADD ADAD ADAD ADA DASDSDAFDEAA AD AD D SD ad sds saAA AADADDdas a a AA SAD"
"fg ffhhf ghhfh h fh hgfh gf hhh h hhh hftrt hr gfghgfh fh fh fh hhfghf"
"weewe e ee wer err t trt trt rtzt ztz tzz trt rtrt tr ttre rer erre rere rre r"
"WWFOGFO FOFGKIGOG FKGFOGF KIGFOFGOFO GFK GFKF GF GFG EG FD GFDFG DG GD DF"
* 1000).split()
%%timeit
res1 = " ".join(filter(lambda r: r.lower().startswith('a'), WORDS))
26.7 ms ± 348 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Now to make the filtering function as fast (and error-prone) as possible I am storing the first unicode character of name and instead of full comparison I am just letting Python to check if the pointers of character refers to the same address. Furthermore, as and and or are short-circuited operators in Python, I prevent Python to perform the second unnecessary comparison if the letter in question is "a" or whatever.
def my_filter(name, lower="a", upper="A"):
first = name[0]
return first is lower or first is upper
%%timeit
res2 = " ".join(filter(my_filter, WORDS))
13 ms ± 105 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Conclusion
It may seem I was able to reduce the runtime by 50%? Hurray?
Oh not so fast, you squishy poet from beyond the stars.
Consider, for instance, this:
character = "ř"
character is "ř"
False
As it is now painfully obvious, CPython's is operator works only when deals with ASCII codes.
This is exactly why one should not use a clever hack when it comes to optimization.
But not only that, all my construction is fragile. To make it useless is enough to look for first two letters.
If we want to find names starting with "jo" we would need four arguments "jo", "Jo", "JO", "jO".
There is more, actually. The original article was dealing with querying a database.
In such particular case would be better to let the task to be performed by database engine as this is much faster and efficient.
Whatever, I hope you enjoyed our little excursion on The planet of pointless optimization.
Discussion
The planet of pointless micro optimization.
:-)
Even better