Pattern Matching in Python 3.10
Intro
Today I want to talk about the new feature bring in Python 3.10 – Pattern matching 🎉
Those who have learned C language must be familiar with the following switch
statement:
switch (expression)
{
case constant_1:
// statements
break;
case constant_2:
// statements
break;
// Fall through
// the value of the expression can be either constant_3 or constant_4 :)
case constant_3:
case constant_4:
// statements
default:
// default statements
}
To recap, the syntax rules of the switch
statement:
expression
should be theint
orchar
type.constant
should be theint
orchar
constant- The execution process of the
switch
statement: Calculate the value ofexpression
, and compare this value with eachconstant
from top to bottom. If they are equal, the statements inside the specificcase
and all thecase
after the matchedcase
will be executed, unless it finds abreak
statement. This feature is called Fall through, and we can use this feature to stack multiplecase
statements together to represent a logical “or” relationship - The
default
branch will be executed when all the previouscase
branches fail to match
Python does not provide a switch
statement, but we can use if...elif..elif..else
to achieve the same effect, for example, suppose we want to perform different operations depending on the length of the list
, We can write this:
some_list = [1, 2, 3, 4, 5]
if len(some_list) == 1:
# do something when the length is 1
...
# or more pythonic way: elif len(some_list) in [3, 5]:
elif len(some_list) == 3 or len(some_list) == 5:
# do something when the length is 3 or 5
...
else:
...
The above series of if...elif..elif..else
is actually slightly less readable, and it also violates the DRY(Don’t repeat yourself) principle, we write len(some_list)
many times.
Of course, we can choose to use a variable length
to remember the length of some_list
first, so that we can type less code. However, if the situation is more complicated, this trick is not applicable.
A more elegant way to handle this situation is what this article is about: Pattern matching ⬇️
match len(some_list):
case 1:
# do something when the length is 1
...
case 3 | 5:
# do something when the length is 3 or 5
...
case _: # equal to the `default:`
...
Syntax rules
The syntax of the pattern matching is:
match subject:
case <pattern_1>:
<action_1>
case <pattern_2>:
<action_2>
case <pattern_3>:
<action_3>
# [Optional] wildcard to cover all situations
case _:
<action_wildcard>
Syntactically, it is similar to the aforementioned switch
statement in C. The differences are:
- There is no fall through in pattern matching. Only the code inside the matched
case
branch will be executed. So we don’t need to add abreak
statement at the end of eachcase
block - No
default
available. But we can usecase _
to capture all cases, which is called the Wildcard pattern that will be discussed later - The
subject
andpattern
here are much more powerful than the C language, not only the integer and char types, butpattern
can also be combined and nested with each other. It will be explained in detail later
Patterns
In pattern matching, a pattern basically does two things:
- Structure constraints: we can add constraints to the
subject
in various ways. For example the length, the specific value in a specific position, etc - Bind variables: we can bind some names in the pattern to component elements of the subject. See the Capture pattern below.
Below I discuss different patterns :)
To avoid confusion, it’s worth explaining in advance that both ()
and []
are optional when pattern matching matches a sequence. For example, case foo, bar
and case (foo, bar)
and case [foo, bar]
are equivalent. They have the same meaning.
Capture pattern
Capture pattern means that when we check whether a pattern
matches, we can use variable names to bind to any part of it, and we can use these variables later
some_list = ["foo", "bar"]
match some_list:
# we want to match a seq which has length = 2
# , we also use `first` and `second` to capture \
# the 1st and 2nd elements here.
case [first, second]:
# we can access first, second now
print(f'the 1st element: {first}, 2nd element: {second}')
the 1st element: foo, 2nd element: bar
People who often deal with sequences must be familiar with the following code:
*before, last = [1, 2, 3, 4]
assert last == 4, "Error"
first, *middle, last = [1, 2, 3, 4]
assert first == 1 and last == 4, "Error"
first, *rest = [1, 2, 3, 4]
assert first == 1, "Error"
Similarly, we can do this in pattern matching:
some_list = ["foo", "bar", "another_foo", "another_bar"]
match some_list:
# we want to match a seq
# , we also use `*rest` to capture the remaining elements
case [first, *rest]:
print(f'the 1st element: {first}, 2nd element: {rest}')
the 1st element: foo, 2nd element: ['bar', 'another_foo', 'another_bar']
Literal pattern
Literal pattern means that we can specify literal values in specific positions. The literals here can be number literals, string literals, True
, False
, and None
📒 Note: For number literals and string literals, python will use
==
for comparison, and forTrue/False/None
these three will useis
.
some_list = ["foo", "bar"]
match some_list:
# we want to match a seq which has length = 2
# , the 1st element should be equal to "foo"
# , and we use `second` to capture the 2nd element
case ["foo", second]:
print(f'the 2nd element: {second}')
the 2nd element: bar
some_list = [True]
match some_list:
case [1]:
print(f'Matched, 1 == True')
Matched, 1 == True
Wildcard pattern
It is a common practice in python that: use _
to indicate that we don’t care about the value. Similarly, _
can be used here, and it will not bind any variables
some_list = ["foo", "bar"]
match some_list:
# we want to match a seq which has length = 2
# , the 1st element should be equal to "foo"
# , and we use `_` to ignore the 2nd value
case ["foo", _]:
print(f'the 2nd value: {_}')
# you should see empty output because we aren't binding value here
the 2nd value:
Another common usage is the case _
that appeared before. The _
will match everything, so case _
is often put at the end to indicate the default case
some_list = ["foo", "bar"]
match some_list:
# this case branch will not be matched
case ["bar", _]:
print('Match successfully')
case _:
print('Default case')
Default case
Or pattern
Just like in the if
statement we can use or
, in pattern matching we also have a similar syntax. Like most other languages, Python chooses to use |
to express an “or” logical relationship. We can declare the alternatives conveniently
some_list1 = ["foo"]
some_list2 = ["bar"]
match some_list1:
# we want to match a seq which has length = 1
# , the 1st element can be "foo" or "bar"
case ["foo" | "bar"]:
print('[First match] Match foo or bar')
match some_list2:
case ["foo" | "bar"]:
print('[Second match] Match foo or bar')
[First match] Match foo or bar
[Second match] Match foo or bar
The disadvantage of the Or pattern above is that we have no way of knowing which one we matched exactly
As pattern
In the previous example, we may match any of the alternatives, so how do we know which one we match? Because we may need to decide what to do based on which one is matched. In pattern matching, we can use as
to bind a value
some_list = ["foo"]
match some_list:
# we want to match a seq which has length = 1
# , the 1st element can be "foo" or "bar"
# we bind matched string literal with `matched_element`
case ("foo" | "bar") as matched_element:
print(f'Match {matched_element}')
Class pattern
Python is a dynamically typed language, and sometimes we need to decide whether to match based on the type. The intuitive way is using isinstance()
to check if it is qualified, but there is a better way. Next, I will show you how to add type constraints step by step:
some_list = ["foo", 1, 3.14]
match some_list:
# match without type constraints
case [s, v1, v2]:
if isinstance(s, str) and isinstance(v1, int) and isinstance(v2, float):
print(f'Match {s} - {v1} - {v2}')
Match foo - 1 - 3.14
The intuitive way: we use the Capture pattern to bind values, and use isinstance
to check the type in the code block
some_list = ["foo", 1, 3.14]
match some_list:
# match with type constraints
case [str() as s, int() as v1, float() as v2]:
print(f'Match {s} - {v1} - {v2}')
Match foo - 1 - 3.14
The syntax here is quite similar to the previous Literal pattern + As pattern. i.e. We declare the type we want to match in the corresponding position. At the same time, we use the as
keyword to bind the value.
Less typing but still less elegant :(. Fortunately, python provides us with syntactic sugar 🍬
some_list = ["foo", 1, 3.14]
match some_list:
# match with type constraints
case [str(s), int(v1), float(v2)]:
print(f'Match {s} - {v1} - {v2}')
Match foo - 1 - 3.14
Mapping pattern
The previous ones are all matching against a sequence, here we are matching against the dict
. I believe that after reading the previous examples, it is not difficult to understand the following example. But there are a few things to keep in mind:
- we match
dict
based on its present keys. The keys here must be a literal value or a value of enum type (for performance considerations), values do not have this restriction - we can use
**<name>
to capture the key-value pair we didn’t write inpattern
. Otherwise the undeclared key-value pairs will be ignored
some_dict = {
'first_name': 'foo',
'second_name': 'bar'
}
match some_dict:
case {'first_name': first_name}:
print(f'[First match] The first_name: {first_name}')
match some_dict:
case {'first_name': first_name, **rest}:
print(f'[Second match] The rest: {rest}')
[First match] The first_name: foo
[Second match] The rest: {'second_name': 'bar'}
Value pattern
It’s good practice to use “named variables” as parameter values or to clarify the meaning of specific values. e.g. case (HttpStatus.OK, body)
is better than case (200, body)
The challenge of implementing the Value pattern in Python is to distinguish it from the previous Capture pattern. A discussion of this can be found at 1
The final solution provided by python is a restricted Value pattern that only supports Value pattern of the form foo.bar
. Common usage will be combined with the enum type, see the following example
from enum import Enum
class HttpStatusCode(Enum):
CONTINUE = 100
OK = 200
some_list = [HttpStatusCode["OK"]]
match some_list:
case [HttpStatusCode.OK as status_code]:
print(f"Receive {status_code}")
Receive HttpStatusCode.OK
Use pattern matching on a class
If we can only use pattern matching on built-in types, it doesn’t seem to be that useful. But in fact, Python also allows us to use pattern matching on objects of our own custom classes.
Considering the application scenario, when we use pattern matching on an object, we often want to check whether the object is a certain class type. We may also care about some of its fields and want to extract the values.
But in Python, this is difficult to implement1, mainly because the class has a lot of fields, most of which are special methods like __repr__
, and these fields are unordered. Because it is unordered, we can’t directly bind variables by position in pattern
, see the following example:
class Point:
""" A simple class represents a Point in a 2D"""
def __init__(x: int, y: int):
self.x = x
self.y = y
some_point = Point(1, 2)
match some_point:
# the intuitive way, we want to match a Point type
# , and we want to bind the `x` and `y` and their two fields respectively
case Point(x, y):
print(f"The x: {x}")
print(f"The y: {y}")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [17], in <cell line: 7>()
4 self.x = x
5 self.y = y
----> 7 some_point = Point(1, 2)
9 match some_point:
10 # the intuitive way, we want to match a Point type
11 # , and we want to bind the `x` and `y` and their two fields respectively
12 case Point(x, y):
13 print(f"The x: {x}")
14 print(f"The y: {y}")
TypeError: Point.__init__() takes 2 positional arguments but 3 were given
Python provides two solutions, which are syntactically very similar to calling a function: we can choose to pass arguments by position, or we can choose to use the form foo=bar
Let’s talk about the simple one first: add constraints by using foo=bar
, which means that the object should have a field called foo
and we want to bind bar
to it.
class Point:
""" A simple class represents a Point in a 2D"""
def __init__(self, x: int, y: int):
self.x = x
self.y = y
some_point = Point(1, 2)
match some_point:
# the intuitive way, we want to match a Point type
# , and we want to bind the `x` and `y` and their two fields respectively
case Point(x=x, y=y):
print(f"The x: {x}")
print(f"The y: {y}")
The x: 1
The y: 2
Another solution is to modify the __match_args__
attribute of the class, which specifies the order of the fields
class Point:
""" A simple class represents a Point in a 2D"""
# we tell python that the order is first "x" and then "y"
__match_args__ = ("x", "y")
def __init__(self, x: int, y: int):
self.x = x
self.y = y
some_point = Point(1, 2)
match some_point:
# the intuitive way, we want to match a Point type
# , and we want to bind the `x` and `y` and their two fields respectively
case Point(x, y):
print(f"The x: {x}")
print(f"The y: {y}")
The x: 1
The y: 2
if you are familiar with @dataclass
2, you can do this:
from dataclasses import dataclass
@dataclass(match_args=True)
class Point:
""" A simple class represents a Point in a 2D"""
x: int
y: int
print(f"The order is {Point.__match_args__}")
some_point = Point(1, 2)
match some_point:
# the intuitive way, we want to match a Point type
# , and we want to bind the `x` and `y` and their two fields respectively
case Point(x, y):
print(f"The x: {x}")
print(f"The y: {y}")
The order is ('x', 'y')
The x: 1
The y: 2
Guard
Sometimes we not only care about whether the pattern matches, but we also want to impose certain restrictions.
Consider such a situation where we want to match a sequence containing two int
values, but we require the first element shoudld be larger than the second. How would you write it? Combined with the previous Class pattern, it is not difficult for us to write the following code:
some_list = [3, 4]
match some_list:
case [int(first), int(second)]:
if first > second:
...
else:
print("Expect first > second. Match failed")
Expect first > second. Match failed
The above solution is fine, we definitely can use the if
statement in the code block to add constraints. But just like type constraints, Python has taken this requirement into account, so it provides a Guard 💂♀️ mechanism, which allows us to move the if
statement to the end of the pattern
for readability.
The syntax rules to follow:
match subject:
case <pattern> if <expression>:
...
<pattern>
followed by anif
statement- Note how python evaluates this here:
- Check if
<pattern>
matches - If it matches, bind values if we declared
- Check if the
if <expression>
statement returnsTrue
. The<expression>
here can use the previously bound variable in step 2
- Check if
- The code inside the code block will be executed if and only if
<pattern>
matches +if
statement returnsTrue
. Otherwise, it will fail and try to match the next<pattern>
some_list = [3, 4]
match some_list:
case [int(first), int(second)] if first > second:
print("Match successfully!")
Wrap up
I would prefer pattern matching to the if...elif...elif...else
for several reasons:
- We can easily bind values for subsequent processing when matching
- For better readabilityg
- The various patterns of pattern matching can actually be nested and combined, that’s what makes pattern matching a shining point.
The above is a brief introduction to pattern matching introduced in Python 3.10 🚀