DEV Community

loading...

One pointless optimization

hanpari profile image Pavel Morava ・3 min read

In the excellent article from Apruzzese Francesco I have found one line of his code and I started my usual nitty-picking. One lambda stroke me like a good example for optimizing exercise.

So here you are.

res1 = " ".join(filter(lambda r: r.lower().startswith('a'), WORDS))

What is happening here?
We take any name, for instance "John" and converted it to lower cases, ie. "john".
Then we call another method to get its first letter and compare it with "a" character.

I realized, in this particular case, that I don't need:

  1. to lowercase the whole name,
  2. to use special method to get first letter of string,
  3. to lowercase a letter when it is already "a", thus short-circuit of or operator comes handy,
  4. to use == operator as my CPython implementation allows to (ab)use *is" operator.

Let's see some code:

# some very random words at first
WORDS = ("Ahaha asa sas s sas aa AA A sd df df fdfdf  sdfs ds fsf fsf sfs fsdfs fsfsf sfsf sf" 
         "SAS D DA A SDAD ADD ADAD ADAD ADA DASDSDAFDEAA AD AD D  SD  ad sds saAA   AADADDdas  a a AA SAD"
         "fg ffhhf ghhfh h fh   hgfh gf hhh  h  hhh hftrt  hr gfghgfh fh  fh fh hhfghf"
         "weewe e ee wer err t trt trt rtzt ztz tzz trt rtrt tr ttre rer erre rere rre r"
         "WWFOGFO  FOFGKIGOG FKGFOGF KIGFOFGOFO GFK  GFKF GF GFG EG FD GFDFG DG GD DF"
         * 1000).split()
%%timeit
res1 = " ".join(filter(lambda r: r.lower().startswith('a'), WORDS))
26.7 ms ± 348 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Now to make the filtering function as fast (and error-prone) as possible I am storing the first unicode character of name and instead of full comparison I am just letting Python to check if the pointers of character refers to the same address. Furthermore, as and and or are short-circuited operators in Python, I prevent Python to perform the second unnecessary comparison if the letter in question is "a" or whatever.

def my_filter(name, lower="a", upper="A"):
    first = name[0]
    return first is  lower or first is  upper
%%timeit 
res2 = " ".join(filter(my_filter, WORDS))

13 ms ± 105 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Conclusion

It may seem I was able to reduce the runtime by 50%? Hurray?
Oh not so fast, you squishy poet from beyond the stars.
Consider, for instance, this:

character = "ř"
character is "ř"
False

As it is now painfully obvious, CPython's is operator works only when deals with ASCII codes.
This is exactly why one should not use a clever hack when it comes to optimization.
But not only that, all my construction is fragile. To make it useless is enough to look for first two letters.
If we want to find names starting with "jo" we would need four arguments "jo", "Jo", "JO", "jO".

There is more, actually. The original article was dealing with querying a database.
In such particular case would be better to let the task to be performed by database engine as this is much faster and efficient.

Whatever, I hope you enjoyed our little excursion on The planet of pointless optimization.

Discussion

pic
Editor guide
Collapse
paddy3118 profile image
Paddy3118

The planet of pointless micro optimization.
:-)

Collapse
hanpari profile image