Memoized Functions in Python with functools.lru_cache

Jonathan Bowman ・ 3 min read

"A memoized function 'remembers' the results corresponding to some set of specific inputs." (From Wikipedia's Memoization article). I think of memoization as an internal smart cache. A memoized function caches the results dependent on the arguments.

Python provides a convenient and high-performance way to memoize functions through the functools.lru_cache decorator. Feel free to geek out over the LRU (Least Recently Used) algorithm that is used here: when the cache is full, the entry that has gone unused the longest is evicted to make room for new results.
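To illustrate that eviction behavior, here is a quick sketch using a made-up square() function and a deliberately tiny maxsize (the default is 128):

import functools


@functools.lru_cache(maxsize=2)  # keep only the 2 most recently used results
def square(n):
    print(f"computing {n}")
    return n * n


square(1)  # miss: prints "computing 1"
square(2)  # miss: prints "computing 2"
square(1)  # hit: nothing printed, result comes from the cache
square(3)  # miss: evicts 2, the least recently used entry
square(2)  # miss again, because 2 was evicted
print(square.cache_info())  # CacheInfo(hits=1, misses=4, maxsize=2, currsize=2)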

Do you have "pure" functions that have no side effects? In other words, functions that, when called repeatedly with the same arguments, will and should return the exact same results? Furthermore, is the function computationally "expensive", or does it wait on input/output operations?

If so, that function could be memoized with lru_cache.

Side note: for the memoization to be effective, the arguments must all be "hashable", such as strings and numbers, not dicts or lists. More specifically, each argument must have a usable __hash__ method (unhashable types like list and dict set __hash__ to None).
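To see what happens otherwise, here is a small sketch with a made-up total() function; passing a tuple works, while an equivalent list does not:

import functools


@functools.lru_cache()
def total(numbers):
    return sum(numbers)


print(total((1, 2, 3)))  # fine: tuples are hashable
print(total([1, 2, 3]))  # TypeError: unhashable type: 'list'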

Take the following function:

import hashlib


def expensive(filepath):
    return hashlib.sha512(filepath.read_bytes()).hexdigest()

On a large file that never changes, this expensive operation need not run more than once. So this function is a decent candidate for memoization with functools.lru_cache. Here is a fuller rendition:

import functools
import hashlib
import os
import pathlib
import time


@functools.lru_cache()
def expensive(filepath):
    return hashlib.sha512(filepath.read_bytes()).hexdigest()


def run():
    file_path = pathlib.Path("bigfile.txt")
    if not file_path.exists():
        file_path.write_bytes(os.urandom(1024 ** 3))

    print(f"Start: {time.strftime('%X')}")
    print(expensive(file_path))
    print(f"End: {time.strftime('%X')}\n")

    print(f"Start memoized: {time.strftime('%X')}")
    print(expensive(file_path))
    print(f"End memoized: {time.strftime('%X')}")


if __name__ == "__main__":
    run()

Save this file as lru_test.py and you should be able to execute it with something like:

python lru_test.py

It will create a 1 GB file named bigfile.txt and hash it "twice", showing you before and after timestamps. Of course, it only hashes the file the first time. The second time, it looks up the cached result keyed by that path argument and returns the memo without re-computing anything.
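If you want to confirm the hit and miss counts yourself, the decorated function also exposes a cache_info() method. For example, adding this line at the end of run() (an optional addition, not part of the script above) would show one miss and one hit:

    print(expensive.cache_info())
    # After the two calls above: CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)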

Clearing the cache

The expensive() function above is deliberately not pure: it depends on the contents of a file, which could change on disk. That gives us a chance to demonstrate another feature of lru_cache.

What do you do if you want the performance benefits of lru_cache but also want to decide, on occasion, that the cached values cannot be trusted?

Call cache_clear() on the decorated function. Here is the full example to demonstrate the point:

import functools
import hashlib
import os
import pathlib
import time


@functools.lru_cache()
def expensive(filepath):
    return hashlib.sha512(filepath.read_bytes()).hexdigest()


def run():
    file_path = pathlib.Path("bigfile.txt")
    if not file_path.exists():
        file_path.write_bytes(os.urandom(1024 ** 3))

    print(f"Start: {time.strftime('%X')}")
    print(expensive(file_path))
    print(f"End: {time.strftime('%X')}\n")

    print(f"Start memoized: {time.strftime('%X')}")
    print(expensive(file_path))
    print(f"End memoized: {time.strftime('%X')}\n")

    expensive.cache_clear()

    print(f"Start cleared: {time.strftime('%X')}")
    print(expensive(file_path))
    print(f"End cleared: {time.strftime('%X')}")


if __name__ == "__main__":
    run()

Running this script again should demonstrate that, once cache_clear() is called, the function needs to fully run again. It has "forgotten" the values formerly computed.
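As a bonus, cache_clear() also resets the statistics reported by cache_info(), which makes the effect easy to verify. A minimal sketch with a made-up double() function:

import functools


@functools.lru_cache()
def double(n):
    return n * 2


double(2)
double(2)
print(double.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
double.cache_clear()
print(double.cache_info())  # CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)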

A side note: in Python 3.8 and later, you don't need the parentheses after the @lru_cache decorator. This works fine:

@functools.lru_cache
def expensive(filepath):
    return hashlib.sha512(filepath.read_bytes()).hexdigest()

I hope this helps you be fully functional.

Discussion (3)

Pavel Morava

Did I miss the part where the LRU acronym is explained? I think it is of crucial importance here.

Least Recently Used plays a significant role when limiting the size of the cache.

Moreover, you should emphasize that the cache is going to work correctly only with immutable, hashable arguments.

Jonathan Bowman Author • Edited

Good call on both counts. I have now included mention of both of the above. At this point, my treatment of LRU is inadequate, however, if the reader is hoping for a solid theoretical explanation. I hope it is adequate for practical usage, though.

Can you say more about why an understanding of LRU is important for daily usage of this Python decorator? Including usage of the maxsize argument may be useful, for instance.

Pavel Morava • Edited

It indicates how the cached values are stored. I always wondered why they named it so clumsily, when cache would have been enough.
