Memoized Functions in Python with functools.lru_cache

Jonathan Bowman ・ 3 min read

"A memoized function 'remembers' the results corresponding to some set of specific inputs." (From Wikipedia's Memoization article). I think of memoization as an internal smart cache. A memoized function caches the results dependent on the arguments.

Python provides a convenient and high-performance way to memoize functions through the functools.lru_cache decorator. Feel free to geek out over the LRU (Least Recently Used) algorithm that is used here: when the cache is full, the entry that has gone unused the longest is evicted to make room for new results.
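To illustrate that eviction behavior, here is a quick sketch using a made-up square() function and a deliberately tiny maxsize (the default is 128):

import functools


@functools.lru_cache(maxsize=2)  # keep only the 2 most recently used results
def square(n):
    print(f"computing {n}")
    return n * n


square(1)  # miss: prints "computing 1"
square(2)  # miss: prints "computing 2"
square(1)  # hit: nothing printed, result comes from the cache
square(3)  # miss: evicts 2, the least recently used entry
square(2)  # miss again, because 2 was evicted
print(square.cache_info())  # CacheInfo(hits=1, misses=4, maxsize=2, currsize=2)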

Do you have "pure" functions that have no side effects? In other words, functions that, when called repeatedly with the same arguments, will and should return the exact same results? Furthermore, is the function computationally "expensive", or does it wait on input/output operations?

If so, that function could be memoized with lru_cache.

Side note: for the memoization to be effective, the arguments must all be "hashable", such as strings and numbers, not dicts or lists. More specifically, each argument must have a usable __hash__ method (unhashable types like list and dict set __hash__ to None).
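To see what happens otherwise, here is a small sketch with a made-up total() function; passing a tuple works, while an equivalent list does not:

import functools


@functools.lru_cache()
def total(numbers):
    return sum(numbers)


print(total((1, 2, 3)))  # fine: tuples are hashable
print(total([1, 2, 3]))  # TypeError: unhashable type: 'list'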

Take the following function:

import hashlib


def expensive(filepath):
    return hashlib.sha512(filepath.read_bytes()).hexdigest()

On a large file that never changes, this expensive operation need not run more than once. So this function is a decent candidate for memoization with functools.lru_cache. Here is a fuller rendition:

import functools
import hashlib
import os
import pathlib
import time


@functools.lru_cache()
def expensive(filepath):
    return hashlib.sha512(filepath.read_bytes()).hexdigest()


def run():
    file_path = pathlib.Path("bigfile.txt")
    if not file_path.exists():
        file_path.write_bytes(os.urandom(1024 ** 3))

    print(f"Start: {time.strftime('%X')}")
    print(expensive(file_path))
    print(f"End: {time.strftime('%X')}\n")

    print(f"Start memoized: {time.strftime('%X')}")
    print(expensive(file_path))
    print(f"End memoized: {time.strftime('%X')}")


if __name__ == "__main__":
    run()

Save this file as lru_test.py and you should be able to execute it with something like:

python lru_test.py

It will create a 1 GB file named bigfile.txt and hash it "twice", showing you before and after timestamps. Of course, it only hashes the file the first time. The second time, it looks up the cached result keyed by that path argument and returns the memo without re-computing anything.
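If you want to confirm the hit and miss counts yourself, the decorated function also exposes a cache_info() method. For example, adding this line at the end of run() (an optional addition, not part of the script above) would show one miss and one hit:

    print(expensive.cache_info())
    # After the two calls above: CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)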

Clearing the cache

The expensive() function above is deliberately not pure: it depends on the contents of a file, which could change on disk. That gives us a chance to demonstrate another feature of lru_cache.

What do you do if you want the performance benefits of lru_cache but also want to decide, on occasion, that the cached values cannot be trusted?

Call cache_clear() on the decorated function. Here is the full example to demonstrate the point:

import functools
import hashlib
import os
import pathlib
import time


@functools.lru_cache()
def expensive(filepath):
    return hashlib.sha512(filepath.read_bytes()).hexdigest()


def run():
    file_path = pathlib.Path("bigfile.txt")
    if not file_path.exists():
        file_path.write_bytes(os.urandom(1024 ** 3))

    print(f"Start: {time.strftime('%X')}")
    print(expensive(file_path))
    print(f"End: {time.strftime('%X')}\n")

    print(f"Start memoized: {time.strftime('%X')}")
    print(expensive(file_path))
    print(f"End memoized: {time.strftime('%X')}\n")

    expensive.cache_clear()

    print(f"Start cleared: {time.strftime('%X')}")
    print(expensive(file_path))
    print(f"End cleared: {time.strftime('%X')}")


if __name__ == "__main__":
    run()

Running this script again should demonstrate that, once cache_clear() is called, the function needs to fully run again. It has "forgotten" the values formerly computed.
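As a bonus, cache_clear() also resets the statistics reported by cache_info(), which makes the effect easy to verify. A minimal sketch with a made-up double() function:

import functools


@functools.lru_cache()
def double(n):
    return n * 2


double(2)
double(2)
print(double.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
double.cache_clear()
print(double.cache_info())  # CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)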

A side note: in Python 3.8 and later, you don't need the parentheses after the @lru_cache decorator. This works fine:

@functools.lru_cache
def expensive(filepath):
    return hashlib.sha512(filepath.read_bytes()).hexdigest()

I hope this helps you be fully functional.

Discussion (3)

Pavel Morava

Did I miss the part where the LRU acronym is explained? I think it is of crucial importance here.

Least Recently Used plays a significant role when limiting the size of the cache.

Moreover, you should emphasize that the cache is going to work correctly only with immutable, hashable arguments.

Jonathan Bowman Author • Edited

Good call on both counts. I have now included mention of both of the above. At this point, my treatment of LRU is inadequate, however, if the reader is hoping for a solid theoretical explanation. I hope it is adequate for practical usage, though.

Can you say more about why an understanding of LRU is important for daily usage of this Python decorator? Including usage of the maxsize argument may be useful, for instance.

Pavel Morava • Edited

It indicates how the cached values are stored. I always wondered why they named it so clumsily, when cache would have been enough.
