
Python: Pattern Matching Proposal

Swastik Baranwal (delta456) ・ 6 min read

As you know, the Python language keeps evolving, adding new features through proposals. This time, a proposal would introduce pattern matching to Python in the form of a match statement.

Origins

The work has several origins:

  • Many statically compiled languages (especially functional ones) have a match expression, for example Scala, Rust, F#;
  • Several extensive discussions on python-ideas, culminating in a summarizing blog post by Tobias Kohn;
  • An independently developed draft PEP by Ivan Levkivskyi.

Draft

The draft was written by Guido van Rossum and can be found in the gvanrossum/patma repository on GitHub.

Semantics

At a high level, the proposed semantics are: choose the first pattern that matches and execute the corresponding suite. The remaining patterns are not tried. If no pattern matches, the statement "falls through" and execution continues at the following statement.

Essentially, this is equivalent to a chain of if ... elif ... else statements. Note that, unlike the previously proposed switch statement, pre-computed dispatch-dictionary semantics do not apply here: the patterns are simply tried one after another.

There is no default or else case; instead, the special wildcard _ can be used as a final catch-all pattern (see the Name Pattern section).
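Since the draft describes the statement as roughly an if ... elif ... else chain, here is a hand-written sketch of that behavior in current Python. The function names and cases are made up for illustration; this mirrors the described semantics and is not code the draft says the compiler generates.

```python
def classify(value):
    """Hand-desugared sketch of a match statement with four cases."""
    # Cases are tried top to bottom; the first match wins, even when a
    # later case would also have matched.
    if value == 0:
        return "zero"
    elif isinstance(value, int):
        return "some other int"
    elif isinstance(value, (int, float)):  # never reached for ints
        return "a number"
    else:  # plays the role of a final `case _:` catch-all
        return "something else"


def without_catch_all(value):
    # With no matching case and no `case _:`, no case body runs: the
    # statement "falls through" and execution continues after it.
    message = "no case ran"
    if value == 0:
        message = "zero"
    return message
```

For example, classify(7) returns "some other int": the third branch would also accept 7, but it is never tried.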

Syntax

Literal Pattern

match number:
    case 0:
        print("Nothing")
    case 1:
        print("Just one")
    case 2:
        print("A couple")
    case -1:
        print("One less than nothing")
    case 1-1j:
        print("Good luck with that...")

Raw strings and byte strings are supported. F-strings are not allowed (since in general they are not really literals).
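Literal patterns are compared with the subject by equality, so the literal example above behaves roughly like the following chain of == tests. This is an illustrative sketch; the function name literal_match is hypothetical.

```python
def literal_match(number):
    # Each literal pattern is an == comparison against the subject,
    # tried in order; None means no case matched (fall-through).
    if number == 0:
        return "Nothing"
    elif number == 1:
        return "Just one"
    elif number == 2:
        return "A couple"
    elif number == -1:
        return "One less than nothing"
    elif number == 1 - 1j:
        return "Good luck with that..."
    return None
```

Because the comparison is ==, a float such as 1.0 would match case 1 as well.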

Name Pattern

A name pattern serves as an assignment target for the matched expression:

match greeting:
    case "":
        print("Hello!")
    case name:
        print(f"Hi {name}!")
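A name pattern never fails: it just binds the subject. The greeting example therefore reduces to an ordinary assignment in the last branch, as in this sketch (the function greet is hypothetical and returns strings instead of printing so it is easy to check):

```python
def greet(greeting):
    # `case "":` is a literal pattern: an equality test.
    if greeting == "":
        return "Hello!"
    # `case name:` always succeeds and binds the subject to `name`.
    name = greeting
    return f"Hi {name}!"
```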

Constant Value Pattern

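It is used to match against constants and enum values. Every dotted name in a pattern is looked up using normal Python name resolution rules, and the value is compared by equality with the matching expression (same as for literals). The example below relies on this: a bare BLACK would be a name pattern, so it is written with a leading dot, .BLACK, to be looked up as a constant instead.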

from enum import Enum

class Color(Enum):
    BLACK = 1
    RED = 2

BLACK = 1
RED = 2

match color:
    case .BLACK | Color.BLACK:
        print("Black suits every color")
    case BLACK:  # This will just assign a new value to BLACK.
        ...
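In plain Python, the first case of the color example amounts to two equality tests joined by or. This sketch leaves out the rebinding case BLACK: branch; the function name describe_color is hypothetical.

```python
from enum import Enum


class Color(Enum):
    BLACK = 1
    RED = 2


BLACK = 1


def describe_color(color):
    # `case .BLACK | Color.BLACK:` looks both names up and compares by ==.
    if color == BLACK or color == Color.BLACK:
        return "Black suits every color"
    # No other case is modelled here, so anything else falls through.
    return None
```

Note that plain Enum members do not compare equal to ints, so describe_color(1) matches only through the module-level BLACK, while Color.RED matches neither alternative.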

Sequence Pattern

A sequence pattern follows the same semantics as unpacking assignment. Like unpacking assignment, both tuple-like and list-like syntax can be used, with identical semantics. Each element can be an arbitrary pattern; there may also be at most one *name pattern to catch all remaining items:

match collection:
    case 1, [x, *others]:
        print("Got 1 and a nested sequence")
    case (1, x):
        print(f"Got 1 and {x}")

For a sequence pattern to match, the target must be an instance of collections.abc.Sequence and must not be any kind of string (str, bytes, bytearray). Iterators are not matched either.
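The eligibility rule plus the unpacking behavior of case (1, x): can be sketched by hand as follows. The helper names are hypothetical, and the (matched, x) return value is just a convenient way to report the result.

```python
from collections.abc import Sequence


def is_sequence_subject(obj):
    # Any collections.abc.Sequence qualifies except the string-like types;
    # iterators are not Sequences, so they are rejected as well.
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes, bytearray))


def match_one_then_x(collection):
    """Sketch of `case (1, x):`; returns (matched, x)."""
    if is_sequence_subject(collection) and len(collection) == 2 and collection[0] == 1:
        x = collection[1]
        return True, x
    return False, None
```

Tuple-like and list-like subjects behave the same, while a length-3 tuple fails because, as in unpacking assignment, the lengths must agree.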

The _ wildcard can be starred to match sequences of varying lengths. For example:

  • [*_] matches a sequence of any length.
  • (_, _, *_) matches any sequence of length two or more.
  • ["a", *_, "z"] matches any sequence of length two or more that starts with "a" and ends with "z".
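The three starred-wildcard patterns in the list above boil down to length checks plus tests on the fixed positions. A rough sketch with hypothetical helper names, assuming the subject has already passed the sequence-type check:

```python
def matches_any_length(seq):  # [*_]
    return True


def matches_two_or_more(seq):  # (_, _, *_)
    return len(seq) >= 2


def matches_a_to_z(seq):  # ["a", *_, "z"]
    # The starred wildcard absorbs everything between the fixed ends, so
    # the minimum length is the number of non-starred items (two).
    return len(seq) >= 2 and seq[0] == "a" and seq[-1] == "z"
```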

Mapping Pattern

The mapping pattern is a generalization of iterable unpacking to mappings. Its syntax is similar to a dictionary display, except that each key and value is a pattern: "{" (pattern ":" pattern)+ "}". A **name pattern is also allowed, to extract the remaining items. Only literal and constant value patterns are allowed in key positions:

import constants

match config:
    case {"route": route}:
        process_route(route)
    case {constants.DEFAULT_PORT: sub_config, **rest}:
        process_config(sub_config, rest)

The target must be an instance of collections.abc.Mapping. Extra keys in the target are ignored even if **rest is not present. This differs from sequence patterns, where extra items cause the match to fail. The difference is deliberate: mappings naturally exhibit structural sub-typing, i.e. passing a dictionary with extra keys somewhere will usually just work.

For this reason, **_ is invalid in mapping patterns; it would always be a no-op that could be removed without consequence.
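Here is a hand-written sketch of the two mapping cases from the config example. The key parameter stands in for constants.DEFAULT_PORT (constants is a hypothetical module in the original example), and the function names are made up for illustration.

```python
from collections.abc import Mapping


def match_route(config):
    """Sketch of `case {"route": route}:`; returns (matched, route)."""
    # Extra keys are simply ignored, unlike extra items in a sequence.
    if isinstance(config, Mapping) and "route" in config:
        return True, config["route"]
    return False, None


def match_key_and_rest(config, key):
    """Sketch of `case {key: sub_config, **rest}:`.

    Returns (sub_config, rest) on a match, otherwise None.
    """
    if isinstance(config, Mapping) and key in config:
        # `**rest` collects every item except the ones matched by key.
        rest = {k: v for k, v in config.items() if k != key}
        return config[key], rest
    return None
```

A dict with an extra "debug" key still matches the route case, while a list of pairs does not match at all because it is not a Mapping.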

Class Pattern

A class pattern provides support for destructuring arbitrary objects. There are two possible ways of matching on object attributes: by position like Point(1, 2), and by name like User(id=id, name="Guest"). These two can be combined, but positional match cannot follow a match by name. Each item in a class pattern can be an arbitrary pattern. A simple example:

match shape:
    case Point(x, y):
        ...
    case Rectangle(x0, y0, x1, y1, painted=True):
        ...

Whether a match succeeds or not is determined by calling a special __match__() method on the class named in the pattern (Point and Rectangle in the example), with the value being matched (shape) as the only argument. If the method returns None, the match fails; otherwise matching continues against the attributes of the returned proxy object (see the runtime section of the draft PEP for details).

The draft only fully specifies the behavior of __match__() for object and some builtin and standard library classes; custom classes are only required to follow the protocol specified in the runtime section.
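Python does not call __match__() on its own, so this sketch uses a hand-written driver, class_pattern, to emulate a class pattern under the protocol described above. Two parts are assumptions rather than details taken from this article: the default-style __match__ returns the value only for instances of the class, and positional sub-patterns are mapped to attribute names through a __match_args__ tuple. Sub-patterns are plain values compared with == to keep it short.

```python
class Point:
    # Assumed positional order for Point(x, y) patterns.
    __match_args__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def __match__(cls, value):
        # Default-style behavior (assumed): accept instances, else None.
        return value if isinstance(value, cls) else None


def class_pattern(cls, value, *positional, **by_name):
    """Hypothetical driver emulating `case cls(*positional, **by_name):`."""
    proxy = cls.__match__(value)
    if proxy is None:  # __match__ returned None: the match fails
        return False
    names = getattr(cls, "__match_args__", ())
    if len(positional) > len(names):
        return False
    expected = dict(zip(names, positional))
    expected.update(by_name)  # by-name items come after positional ones
    missing = object()
    # Every requested attribute must exist on the proxy and compare equal.
    return all(getattr(proxy, attr, missing) == want for attr, want in expected.items())
```

For example, class_pattern(Point, Point(0, 5), y=5) succeeds, while class_pattern(Point, (1, 2), 1, 2) fails because a tuple is not a Point.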

Combining Multiple Patterns

Multiple alternative patterns can be combined into one using |. This means the whole pattern matches if at least one alternative matches. Alternatives are tried from left to right and short-circuit: once one alternative matches, the remaining ones are not tried. For example:

match something:
    case 0 | 1 | 2:
        print("Small number")
    case [] | [_]:
        print("A short sequence")
    case str() | bytes():
        print("Something string-like")
    case _:
        print("Something else")
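The alternatives in the example above behave much like or-chains, which short-circuit in the same left-to-right way. A rough sketch (the function name kind is hypothetical; treating str() and bytes() as isinstance checks assumes the default __match__ behavior):

```python
from collections.abc import Sequence


def kind(something):
    # `0 | 1 | 2`: `or` short-circuits just like the alternatives do.
    if something == 0 or something == 1 or something == 2:
        return "Small number"
    # `[] | [_]`: a non-string sequence of length 0 or 1.
    if (
        isinstance(something, Sequence)
        and not isinstance(something, (str, bytes, bytearray))
        and (len(something) == 0 or len(something) == 1)
    ):
        return "A short sequence"
    # `str() | bytes()`: argument-less class patterns, here as isinstance checks.
    if isinstance(something, str) or isinstance(something, bytes):
        return "Something string-like"
    # `case _:`
    return "Something else"
```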

The alternatives may bind variables, as long as each alternative binds the same set of variables (excluding _). For example:

match something:
    case 1 | x:  # Error!
        ...
    case x | 1:  # Error!
        ...
    case one := [1] | two := [2]:  # Error!
        ...
    case Foo(arg=x) | Bar(arg=x):  # Valid, both arms bind 'x'
        ...
    case [x] | x:  # Valid, both arms bind 'x'
        ...

Guards

Each top-level pattern can be followed by a guard of the form if expression. A case clause succeeds if the pattern matches and the guard evaluates to a true value. For example:

match input:
    case [x, y] if x > MAX_INT and y > MAX_INT:
        print("Got a pair of large numbers")
    case x if x > MAX_INT:
        print("Got a large number")
    case [x, y] if x == y:
        print("Got equal items")
    case _:
        print("Not an outstanding input")

If evaluating a guard raises an exception, it is propagated onwards rather than failing the case clause. Names that appear in a pattern are bound before the guard is evaluated, so they stay bound even when the guard fails. So this will work:

value = [0]

match value:
    case [x] if x:
        ...  # This is not executed
    case _:
        ...
print(x)  # This will print "0"

Note that guards are not allowed for nested patterns, so that [x if x > 0] is a SyntaxError and 1 | 2 if 3 | 4 will be parsed as (1 | 2) if (3 | 4).
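The binding order is easier to see once it is desugared. In this sketch (the function name guard_demo is hypothetical, and the pattern is simplified to lists only), the pattern binds x, then the guard runs; when the guard fails, control falls through to the catch-all with x still bound.

```python
def guard_demo(value):
    """Sketch of:

        match value:
            case [x] if x: ...
            case _: ...
    """
    pattern_matched = isinstance(value, list) and len(value) == 1
    if pattern_matched:
        x = value[0]  # the pattern binds x before the guard runs
        if x:  # the guard; an exception raised here would propagate
            return "first case", x
    # Guard failed or pattern did not match: fall through to `case _:`.
    # If the pattern matched, x is still bound here.
    return "catch-all", (x if pattern_matched else None)
```

With the article's input, guard_demo([0]) takes the catch-all, yet x is still 0.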

Named sub-patterns

It is often useful to match a sub-pattern and to bind the corresponding value to a name. For example, it can be useful to write more efficient matches, or simply to avoid repetition. To simplify such cases, a name pattern can be combined with another arbitrary pattern using named sub-patterns of the form name := pattern. For example:

match get_shape():
    case Line(start := Point(x, y), end) if start == end:
        print(f"Zero length line at {x}, {y}")

Note that the name pattern used in the named sub-pattern can be used in the match suite, or after the match statement. However, the name will only be bound if the sub-pattern succeeds. Another example:

match group_shapes():
    case [], [point := Point(x, y), *other]:
        print(f"Got {point} in the second group")
        process_coordinates(x, y)
        ...

Technically, most such examples can be rewritten using guards and/or nested match statements, but this will be less readable and/or will produce less efficient code.

Note that _ is not a valid name in a named sub-pattern.
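A named sub-pattern binds both the whole sub-value and, through the inner pattern, its parts. The group_shapes example can be sketched by hand like this; Point is a hypothetical namedtuple stand-in for the article's class, match_groups is a made-up name, and sequence-type checks are omitted for brevity.

```python
from collections import namedtuple

# Hypothetical stand-in for the article's Point class.
Point = namedtuple("Point", ["x", "y"])


def match_groups(groups):
    """Sketch of `case [], [point := Point(x, y), *other]:`.

    Returns (point, x, y, other) on a match, else None.
    """
    if len(groups) != 2:
        return None
    first, second = groups
    if len(first) != 0 or len(second) < 1 or not isinstance(second[0], Point):
        return None
    point = second[0]  # `point :=` binds the whole sub-value...
    x, y = point.x, point.y  # ...while Point(x, y) binds its parts
    other = list(second[1:])  # `*other` collects the rest as a list
    return point, x, y, other
```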

More

This article only covers the main features and syntax. For more information please refer to:

gvanrossum / patma

Pattern Matching

This repo contains a draft PEP proposing a match statement.


Implementation

A full reference implementation written by Brandt Bucher is available as a fork of the CPython repo. This is readily converted to a pull request.

Examples

Some example code is available from this repo.

Tutorial

A match statement takes an expression and compares it to successive patterns given as one or more case blocks. This is superficially similar to a switch statement in C, Java or JavaScript (and many other languages), but much more powerful.


Note

This is only a proposal, so details may still change, but most of the design is settled. You can follow the open issues in the repository above.
