Memoisation In Python - Exploring @lru_cache And Its Control Knobs


Memoisation is a common technique for cutting out unnecessary repeated calculations in programming. It is frequently brought up as a dynamic programming technique, and in articles and books aimed at helping you land your first developer job.

In this article, you'll:

  • learn what memoisation is
  • take a look at how memoisation can be implemented with a homemade solution
  • learn to achieve memoisation using the one-liner @lru_cache in Python
  • learn to inspect the cache info of a memoised function
  • explore the control knobs of @lru_cache
  • learn about the caveats when using @lru_cache-decorated functions

If you're already familiar with the concept of memoisation, feel free to skip the first section.

What is Memoisation?

Memoisation is a technique that stores the result of a function for a given set of arguments. Subsequent invocations with the same arguments immediately return the previously stored result instead of recalculating it.


Implementing Memoisation With A Homemade Solution

There are several ways to achieve memoisation. In Python, a decorator is a natural candidate. Let's say we define a function called calculate_variance() that has no ability to remember previously calculated results:

from typing import Tuple
import statistics

def calculate_variance(numbers: Tuple[int, ...]):
    return statistics.variance(numbers)

To make calculate_variance() memoise previously calculated results, there is no need to reimplement the variance logic. Instead, we can apply a @memoise decorator, as mentioned.

Eagle-eyed readers may also notice that the input argument numbers is typed as Tuple[int, ...]. You will see why a Tuple is used instead of a List in the last section.

from typing import Tuple
import statistics

@memoise    # This is not implemented yet
def calculate_variance(numbers: Tuple[int, ...]):
    return statistics.variance(numbers)

How should we implement the @memoise decorator, I hear you ask? A simple implementation is a function that keeps track of input arguments and their results in a hashmap (a dict in Python) and performs the calculation only when no stored result is found.

from typing import Tuple

def memoise(target_function):

    memo = {}    # maps input arguments to previously calculated results

    def wrapper(numbers: Tuple[int, ...]):
        if numbers not in memo:
            # Calculate only when no memoised result is found
            memo[numbers] = target_function(numbers)
        return memo[numbers]
    return wrapper
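
Here's a quick sanity check of the homemade decorator; only the first call performs an actual calculation, and the printed value is simply what statistics.variance returns for this input:

from typing import Tuple
import statistics

@memoise
def calculate_variance(numbers: Tuple[int, ...]):
    return statistics.variance(numbers)

print(calculate_variance((1, 2, 3, 4)))    # calculated: 1.6666666666666667
print(calculate_variance((1, 2, 3, 4)))    # served straight from the memo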

The @memoise decorator should work fine with our calculate_variance() function. However, oftentimes we come across situations where we need to tailor the memoisation behaviour. For instance, we may want to limit the size of the hashmap, or inspect whether the memoised function is working as intended. Fortunately, Python comes with a built-in decorator, @lru_cache, that gives us the customisability we need.

@lru_cache - The One-Liner To Memoise In Python

@lru_cache is a built-in decorator provided by the functools module that adds memoisation capability to any function it decorates. Strictly speaking, it works not just with functions but with any Callable.

Let's rewrite our calculate_variance() function with the @lru_cache decorator. (Using @lru_cache without parentheses requires Python 3.8 or later; on older versions, write @lru_cache().)

from functools import lru_cache    # Add this line
from typing import Tuple
import statistics

@lru_cache
def calculate_variance(numbers: Tuple[int, ...]):
    return statistics.variance(numbers)

With these slight changes, we have another memoised version of calculate_variance(). Let's invoke our memoised calculate_variance() with the same set of arguments several times and observe what happens.

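In an interactive session, the calls look something like this:

>>> calculate_variance((1, 2, 3, 4))
1.6666666666666667
>>> calculate_variance((1, 2, 3, 4))
1.6666666666666667
>>> calculate_variance((1, 2, 3, 4))
1.6666666666666667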

The function clearly returns the correct result every time. However, how do we know whether the returned values are the memoised ones?

Inspecting Memoisation Info Using cache_info()

Thanks to @lru_cache, the memoised function now exposes a useful method, cache_info(), that lets us peek into the memoisation information (or cache info).

Let's call cache_info() immediately after defining calculate_variance():

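In a fresh session, before any call is made, every counter starts at zero (note that maxsize defaults to 128):

>>> calculate_variance.cache_info()
CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)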

The output reports a bunch of cache info: hits, misses, maxsize, and currsize. But what are those?

  • cache hits (hits): the number of times a memoised result was found
  • cache misses (misses): the contrary of a cache hit, i.e. the number of times a calculation was required
  • maximum cache size (maxsize): the maximum number of memoised results that can be stored
  • current cache size (currsize): the number of memoised results currently stored

Let's try to invoke calculate_variance() with the same argument a few more times and inspect the cache info.

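Continuing the fresh session above, the first call is a miss, and every repeated call afterwards is a hit:

>>> calculate_variance((1, 2, 3, 4))
1.6666666666666667
>>> calculate_variance.cache_info()
CacheInfo(hits=0, misses=1, maxsize=128, currsize=1)
>>> calculate_variance((1, 2, 3, 4))
1.6666666666666667
>>> calculate_variance.cache_info()
CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
>>> calculate_variance((1, 2, 3, 4))
1.6666666666666667
>>> calculate_variance.cache_info()
CacheInfo(hits=2, misses=1, maxsize=128, currsize=1)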

As you can see, the hits counter increases as the function is called with the same argument. ✌️✌️✌️

Clearing Memoised Results

While I can't think of many situations where a cache purge is required (resetting state between unit tests is one), @lru_cache includes a handy method, cache_clear(), to purge the memoised results.

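Note that cache_clear() resets the statistics as well as the stored results:

>>> calculate_variance.cache_info()
CacheInfo(hits=2, misses=1, maxsize=128, currsize=1)
>>> calculate_variance.cache_clear()
>>> calculate_variance.cache_info()
CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)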

The Control Knobs Of @lru_cache

typed - Treat Values Of Different Types As Separate Entries

Is the integer 2 the same as the float 2.0? While 2 == 2.0 evaluates to True, we recognise that they are of different types: int and float. The second control knob of @lru_cache, the keyword argument typed, allows us to treat equal values of different types as separate cache entries. By default, typed is False.

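A sketch with a simple add() function and typed=True; the int and float arguments produce two separate cache entries:

>>> from functools import lru_cache
>>> @lru_cache(typed=True)
... def add(a, b):
...     return a + b
...
>>> add(1, 2)
3
>>> add(1.0, 2.0)
3.0
>>> add.cache_info()
CacheInfo(hits=0, misses=2, maxsize=128, currsize=2)

With the default typed=False, the second call would instead be a cache hit and return the memoised int 3.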

From the example above, we can see that currsize grows when add() is called with the same numbers but of different types.

maxsize - The Maximum Size Of The Cache

As mentioned earlier, @lru_cache allows us to tune the maximum size of the cache. To do so, simply pass the keyword argument maxsize into the decorator as shown:

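For instance, with maxsize=2 the cache never holds more than two entries:

>>> from functools import lru_cache
>>> @lru_cache(maxsize=2)
... def add(a, b):
...     return a + b
...
>>> add(1, 2)
3
>>> add(3, 4)
7
>>> add(5, 6)
11
>>> add.cache_info()
CacheInfo(hits=0, misses=3, maxsize=2, currsize=2)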

If we want unlimited cache size, all we need is to set maxsize to None.

What Happens When Cache Space Runs Out?

The cache replacement policy of @lru_cache is, as the name suggests, LRU, or Least Recently Used. What does that mean, though?

Before we think about which memoised value to replace, let's think about when a replacement is required. Recapping how memoisation works, we know that after a new calculation is performed, the new result is stored in the memoised results (or cache) before the value is returned to the caller.

With the LRU cache replacement policy, when a new result is produced and the cache is full (currsize == maxsize), the least recently used entry is evicted to make room for the new one.
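
We can watch the eviction happen by redefining add() with maxsize=2 (a sketch):

>>> @lru_cache(maxsize=2)
... def add(a, b):
...     return a + b
...
>>> add(1, 2)   # miss: cache holds (1, 2)
3
>>> add(3, 4)   # miss: cache holds (1, 2) and (3, 4)
7
>>> add(1, 2)   # hit: (1, 2) becomes the most recently used
3
>>> add(5, 6)   # miss: cache is full, so the least recently used (3, 4) is evicted
11
>>> add(3, 4)   # miss again: (3, 4) is no longer cached
7
>>> add.cache_info()
CacheInfo(hits=1, misses=4, maxsize=2, currsize=2)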

Caveats When Using @lru_cache-Decorated Functions

Order of Arguments Matters

@lru_cache treats add(1, 2) and add(2, 1) as two distinct argument sets.

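Both calls below are cache misses, even though they produce the same result:

>>> from functools import lru_cache
>>> @lru_cache
... def add(a, b):
...     return a + b
...
>>> add(1, 2)
3
>>> add(2, 1)
3
>>> add.cache_info()
CacheInfo(hits=0, misses=2, maxsize=128, currsize=2)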

Arguments Must Be Hashable

In our example of calculate_variance(), we typed the input argument numbers as Tuple[int, ...] rather than List[int]. This is because all input arguments passed to a memoised function must be hashable.

What does hashable mean? According to the official Python glossary, an object is hashable if it has a hash value which never changes during its lifetime, and if it can be compared to other objects.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

How do we know whether a value is hashable? The quickest way is to try hashing it with the built-in hash() function.

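For example (the exact hash values shown here are illustrative and vary between runs and platforms):

>>> hash(42)           # immutable built-ins such as int are hashable
42
>>> hash("hello")      # so are strings
-3838214059437183517
>>> hash((1, 2, 3))    # and tuples of hashable elements
529344067295497451
>>> class Foo: pass
...
>>> hash(Foo())        # user-defined class instances are hashable by default
270087502
>>> hash([1, 2, 3])    # but mutable containers such as list are not
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'list'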

From the example above, we can see that Python's immutable built-in values are hashable (immutable containers such as tuples, only when their elements are hashable), and so are user-defined class instances by default. However, Python's built-in mutable containers (list, dict, etc.) are not.

If we call a memoised (or @lru_cache-decorated) function with an unhashable argument, a TypeError is raised.

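For instance, passing a list instead of a tuple to our memoised calculate_variance():

>>> calculate_variance([1, 2, 3, 4])
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'list'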

Conclusion

By now, you should know:

  • what memoisation is
  • how the one-liner @lru_cache works in Python
  • how to inspect the caching info of a memoised function
  • the control knobs typed and maxsize of @lru_cache
  • the caveats when using @lru_cache-decorated functions

The Python functools module provides more than just @lru_cache. I recommend you check it out! 🧐

I hope this article helps. Feel free to share it with your mates, and I'd appreciate it if you could give this article some claps 👏👏👏.