Pointers? In My Python? It's More Likely Than You Think - Part 2: Equality

#python #memory #programming #computerscience

This is the second of a three-part series which covers various aspects of Python's memory management. It started life as a conference talk I gave in 2021, titled 'Pointers? In My Python?' and the most recent recording of it can be found here.

Check out Part 1 of the series, or read on for a discussion of Object IDs in Python!

Object IDs, and why they matter

We ended Part 1 with the following question: how do we know when two Python objects are really the same object in memory? If we do b = deepcopy(a), how can we know for sure that it didn't just create a new pointer instead of a whole new object? The answer is Object IDs.

Python has a built-in id function with the following properties:

id(x) is an integer
id(x) != id(y) exactly when x and y point at different objects that are in memory at the same time
id(x) is constant for the lifetime of x - that is, as long as x remains in memory
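
If you'd like to convince yourself of these three properties, you can check them directly. Here's a minimal sketch using two made-up lists, x and y, and plain assert statements. It should pass on any conforming implementation, though the actual id values will differ from run to run.

```python
# Two separate list objects that happen to hold the same values.
x = ["some", "list"]
y = ["some", "list"]

# Property 1: an id is always an integer.
assert isinstance(id(x), int)

# Property 2: two objects that are alive at the same time have different ids.
assert id(x) != id(y)

# Property 3: the id stays fixed while the object lives, even if we mutate it.
id_before = id(x)
x.append("grows")
assert id(x) == id_before
```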

There are many implementations of Python, and while the above three things must be true of the id function in each of them, they don't all achieve it in the same way under the hood. Some implementations, such as CPython (the Python interpreter written in C), use the object's memory address as its id - but don't assume that all implementations will!

For example, Skulpt is the Python-to-JavaScript compiler that Anvil uses to run Python client code in the browser, so you can develop for the web without having to write JavaScript. Skulpt's implementation of id generates and caches a random number for every object in memory.

For the rest of this article, we'll be using examples generated using CPython, which equates an object's id with its address in memory.
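
One way to see the link between id and memory address on CPython is to look at the default repr of an object, which includes an address in hexadecimal. The sketch below uses a hypothetical empty class called Thing and relies on the CPython repr format ('<... object at 0x...>'), so treat it as a CPython-only illustration rather than something guaranteed by the language.

```python
class Thing:
    """A hypothetical empty class that inherits the default repr."""


thing = Thing()

# On CPython the default repr looks like '<__main__.Thing object at 0x7f...>'.
default_repr = repr(thing)

# Pull out the hexadecimal address that follows ' at ' and parse it.
address_text = default_repr.rstrip(">").split(" at ")[-1]
address = int(address_text, 16)

# CPython-specific: the address shown in the repr is the same number as the id.
print(address == id(thing))
```

On CPython this should print True. On another implementation, the repr may not contain an address at all.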

So, let's look at what happens when we check an object's id!

>>> a = ["a", "list"]
>>> id(a)
139865338256192

>>> b = a
>>> id(a), id(b)
(139865338256192, 139865338256192)

Here we've defined a list a, and created a new pointer to it by setting b = a. When we check their ids, we can see that they're the same - a and b point to the same thing.

>>> c = a.copy()
>>> id(a), id(c)
(139865338256192, 139865337919872)

Trying the same thing with c = a.copy() shows that this creates a new list object; a and c have different ids.
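
The same check answers the deepcopy question we started with. Here's a small sketch with a made-up nested list, comparing a plain assignment, a shallow copy and a deep copy. The variable names are just for illustration.

```python
from copy import deepcopy

# A nested list: the outer list holds one inner list.
a = [["inner", "list"]]

b = a            # new pointer to the same outer list
c = a.copy()     # new outer list, but it still points at the same inner list
d = deepcopy(a)  # new outer list and a new inner list

# Assignment: same object, same id.
assert id(b) == id(a)

# Shallow copy: the outer list is new, the inner list is shared.
assert id(c) != id(a)
assert id(c[0]) == id(a[0])

# Deep copy: both the outer and the inner list are new objects.
assert id(d) != id(a)
assert id(d[0]) != id(a[0])
```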

However, that isn't the only notion of 'sameness' that Python provides. Consider our familiar example, with a pointing to a list object, b another pointer to that object, and c a pointer to a copy of that object:

>>> a = ["my", "list"]
>>> b = a
>>> c = a.copy()

With this setup, we can do the following comparisons:

>>> a == b
True
>>> a is b
True

>>> a == c
True
>>> a is c
False

Once again: what is going on here? How can two things be the same and not the same? The answer is that `is` and `==` are designed to serve two different purposes. `is` is for when you want to know if two pointers are pointing at the exact same object in memory; `==` is for when you want to know if two objects should be considered to be equal.

`is` uses `id(x)`

Saying `a is b` is directly equivalent to saying `id(a) == id(b)`. When you use `is` on two objects, Python checks whether they have the same id.
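
We can check that equivalence against our familiar example. This is only a sketch over two pairs of lists, not a proof, but it shows the two checks agreeing whether the answer is True or False.

```python
a = ["my", "list"]
b = a          # another pointer to the same list
c = a.copy()   # a pointer to a different list with equal contents

for left, right in [(a, b), (a, c)]:
    same_object = left is right
    same_id = id(left) == id(right)
    # The two checks always agree for objects that are alive together.
    assert same_object == same_id

assert a is b
assert a is not c
```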

`==` uses `__eq__`

When you write a == b, you're actually calling a magic method, also known as a dunder method (named for the double-underscore on each side). You might be familiar with some magic methods already, such as:

__init__, called when an instance of a Python class is initialised
__str__, called when you use the str built-in - e.g. str(some_object)
__repr__, similar to __str__ but also called in other circumstances such as error messages

Magic methods are simply methods on a Python class, and the double underscores indicate that they interact with Python's built-in functions and operators. For example, overriding the __str__ method on a Python class would change how the str built-in behaved if you called it on an instance of that modified class.
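
To make the __str__ example concrete, here's a small sketch with two hypothetical classes: Plain keeps the default behaviour, and Labelled overrides __str__. The class and attribute names are made up for illustration.

```python
class Plain:
    """Keeps the default __str__ inherited from object."""


class Labelled:
    """Overrides __str__ to return something friendlier."""

    def __init__(self, label):
        self.label = label

    def __str__(self):
        return f"Labelled({self.label})"


# The default form mentions the class name and where the object lives.
print(str(Plain()))

# The str built-in now calls our __str__ method instead.
print(str(Labelled("box")))  # Labelled(box)
```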

When it comes to == and __eq__, it's easiest to understand with some examples. Let's dive in!

class MyClass:
  def __eq__(self, other):
    return self is other

Here we've defined a custom class with its own __eq__ method. Every __eq__ method takes two arguments including self - because whenever it's called, it'll be comparing two objects, including the instance of the class in question. In the above example, we've just set the method to fall through to the is definition of equality (comparing the ids of each object). As it happens, this is actually the default behaviour for any user-defined class in Python.
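
You can see that default for yourself with a class that doesn't define __eq__ at all. In this sketch, NoEq is a hypothetical empty class, and == ends up agreeing with is for every pair we try.

```python
class NoEq:
    """A hypothetical class that does not define __eq__."""


first = NoEq()
second = NoEq()
alias = first  # another pointer to the same instance

# With no __eq__ of our own, == only succeeds for the very same object.
assert first == alias
assert not (first == second)

# In other words, == and is give the same answers here.
assert (first == alias) == (first is alias)
assert (first == second) == (first is second)
```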

So, what happens if we define some non-default behaviour?

class MyAlwaysTrueClass:
  def __init__(self, name):
    self.name = name

  def __eq__(self, other):
    return True

Here we've defined a class which takes a name argument (so we can keep track of our instances!) and has an __eq__ method which indiscriminately returns True. This gives us the following behaviour:

>>> jane = MyAlwaysTrueClass("Jane")
>>> bob = MyAlwaysTrueClass("Bob")
>>> jane.name == bob.name
False

>>> jane == bob
True

Because we overrode the __eq__ method to always return True, all instances of this class are considered equal under the == operator - even when their names have different values!
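
For contrast, here's a sketch of what a more conventional __eq__ might look like, where two instances count as equal when their names match. Person is a hypothetical class, and comparing on name alone is just one possible design choice.

```python
class Person:
    """A hypothetical class whose instances are equal when their names match."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Returning NotImplemented tells Python we don't know how to
        # compare against this kind of object.
        if not isinstance(other, Person):
            return NotImplemented
        return self.name == other.name


jane = Person("Jane")
other_jane = Person("Jane")
bob = Person("Bob")

assert jane == other_jane      # equal by name...
assert jane is not other_jane  # ...but still two separate objects
assert jane != bob
assert jane != "Jane"          # a Person is never equal to a plain string
```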

Conversely, we can also do the following:

class MyAlwaysFalseClass:
  def __init__(self, name):
    self.name = name

  def __eq__(self, other):
    return False

You might think this is more sensible, but consider:

>>> a = MyAlwaysFalseClass("name")
>>> a == a
False

Moreover, because the behaviour of __eq__ is dependent on which object is self and which is other, we can get the following:

>>> jane = MyAlwaysTrueClass("Jane")
>>> bob = MyAlwaysFalseClass("Bob")
>>> jane == bob
True

>>> bob == jane
False
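
To see which __eq__ method actually runs, we can record the calls. In this sketch, Loud is a hypothetical class whose __eq__ notes the name of the instance it was called on before returning a fixed answer. Both instances are of the same class, so Python asks the left-hand operand first.

```python
calls = []  # names of the instances whose __eq__ was called, in order


class Loud:
    """A hypothetical class that records every __eq__ call."""

    def __init__(self, name, answer):
        self.name = name
        self.answer = answer

    def __eq__(self, other):
        calls.append(self.name)
        return self.answer


jane = Loud("Jane", True)
bob = Loud("Bob", False)

result_1 = jane == bob  # calls jane.__eq__(bob)
result_2 = bob == jane  # calls bob.__eq__(jane)

assert calls == ["Jane", "Bob"]
assert result_1 is True
assert result_2 is False
```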

In summary: magic methods are a fun way to make Python do things that seem very strange.

Earlier, we mentioned that id(x) is constant and unique 'for the lifetime of x', which is equivalent to saying 'as long as x remains in memory'. That raises the question: once we've created an object x, how does its 'lifetime' end? Hold on 'til Part 3 of this series, where we'll get the answer!

More about Anvil

If you're new here, welcome! Anvil is a platform for building full-stack web apps with nothing but Python. No need to wrestle with JS, HTML, CSS, Python, SQL and all their frameworks – just build it all in Python.
