87 Lesser-known Python Features
This post is for people who use Python daily, but have never actually sat down and read through all the documentation.
If you’ve been using Python for years and know enough to get the job done, it’s probably not a wise use of your time to read through several thousand pages of documentation just to maybe discover a handful of new tricks. So I’ve put together a short (by comparison) list of features that aren’t widely used, but are widely applicable to general programming tasks.
This list will of course contain some stuff you already know, and a sprinkling of things you’ll never use, but among them I hope there are a few that are new and useful to you.
First, a bit of housekeeping:
- I’ll assume you’re on at least Python 3.8. For the few features added after this, I’ll mention when they were added.
- There’s no strict order to the list, but the more basic features are toward the start.
- Wherever you see assert on this page, you can assume that the assert passes.
Now, on to the list…
1. help(x)
help can take a string, an object, or anything. For class instances, it will collect methods from all parent classes and show you what is inherited from where (and the method resolution order).
This is also useful when you want help on something that’s hard to search for, like or. Typing help('or') will be much faster than trying to find the part of the Python docs that describes how or works.
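As a quick sketch of what this looks like for a class instance (the Animal/Dog classes here are made up for illustration; help() prints to stdout, so we capture it to inspect the text):

```python
import contextlib
import io

class Animal:
    def speak(self):
        """Make a noise."""

class Dog(Animal):
    def fetch(self):
        """Bring the ball back."""

# help() writes to stdout, so capture it to look at the text
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    help(Dog())

text = buffer.getvalue()
assert "fetch" in text  # Dog's own method
assert "speak" in text  # inherited from Animal
assert "Method resolution order" in text
```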
2. 1_000_000
You can use an underscore as a thousands separator. This makes large numbers more readable.
x = 1_000_000
You can format a number in this way, too:
assert f"{1000000:_}" == "1_000_000"
3. str.endswith() takes a tuple
filename = "report.xlsx"
if filename.endswith((".csv", ".xls", ".xlsx")):
    ...  # Do something spreadsheety
The same is true for startswith().
Here's endswith in the docs.
4. isinstance() takes a tuple
If you want to check if an object is an instance of one of several classes, you don’t need multiple isinstance expressions, just pass a tuple of types as the second argument. Or — from Python 3.10 onwards — use a union:
assert isinstance(7, (float, int))
assert isinstance(7, float | int)
More on isinstance and union types.
5. … is a valid function body
I use ... to mean “I’ll get to this shortly” (before committing any changes) and pass to mean something closer to a no-op.
def do_something():
    ...
6. The walrus operator :=
The walrus operator (AKA assignment expressions) allows you to assign a value to a variable, but as an expression, which means that you can do something with the resulting value.
This is useful in if statements and even more useful with elif:
if first_prize := get_something():
    ...  # Do something with first_prize
elif second_prize := get_something_else():
    ...  # Do something with second_prize
Just remember that if you want to compare the result with something, you’ll need to wrap the walrus expression in parentheses, like so:
if (first_prize := get_something()) is not None:
    ...  # Do something with first_prize
elif (second_prize := get_something_else()) is not None:
    ...  # Do something with second_prize
These can be used in plenty of places. To borrow an example from the original PEP:
filtered_data = [y for x in data if (y := f(x)) is not None]
See Assignment expressions in the docs.
7. attrgetter and itemgetter
If you need to sort a list of objects by a specific property of those objects, you can use attrgetter (when the value you’re after is an attribute, e.g. for class instances) or itemgetter (when the value you’re after is an index or dictionary key).
For example, to sort a list of dicts by the "score" key of the dicts:
from operator import itemgetter
scores = [
    {"name": "Alice", "score": 12},
    {"name": "Bob", "score": 7},
    {"name": "Charlie", "score": 17},
]
scores.sort(key=itemgetter("score"))
assert list(map(itemgetter("name"), scores)) == ["Bob", "Alice", "Charlie"]
attrgetter is similar, used when you would otherwise use dot notation. It can do nested access too, such as name.first in the example below:
from operator import attrgetter
from typing import NamedTuple
class Name(NamedTuple):
    first: str
    last: str

class Person(NamedTuple):
    name: Name
    height: float

people = [
    Person(name=Name("Gertrude", "Stein"), height=1.55),
    Person(name=Name("Shirley", "Temple"), height=1.57),
]
first_names = map(attrgetter("name.first"), people)
assert list(first_names) == ["Gertrude", "Shirley"]
There’s also a methodcaller that does what you might guess.
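For instance (a small sketch, with made-up sample data):

```python
from operator import methodcaller

names = ["alice", "BOB", "Charlie"]
# methodcaller("upper") calls .upper() on whatever it's given
upper_names = list(map(methodcaller("upper"), names))
assert upper_names == ["ALICE", "BOB", "CHARLIE"]

# Extra arguments to methodcaller are passed on to the method
split_on_dash = methodcaller("split", "-")
assert split_on_dash("e-mail") == ["e", "mail"]
```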
If you don’t know what a NamedTuple does, keep reading…
See the operator module docs for more info.
8. Operators as functions
All the familiar operators like +, <, and != have functional equivalents in the operator module. These are useful if you’re iterating over collections, for example looking for mismatches between two lists using is_not:
import operator

list1 = [1, 2, 3, 4]
list2 = [1, 2, 7, 4]
mismatches = list(map(operator.is_not, list1, list2))
assert mismatches == [False, False, True, False]
(Note that is_not is an identity check, which only works here because CPython interns small ints; for value comparison, operator.ne is the safer choice.)
Here’s a table of operators and their functional equivalents.
9. Sorting a dictionary by its values
You can use the key argument to sort a dict by its values. The function you pass as key will be called with each key of the dict and should return the value to sort by, which is exactly what the get method of a dictionary does.
my_dict = {
    "Plan A": 1,
    "Plan B": 3,
    "Plan C": 2,
}
my_dict = {key: my_dict[key] for key in sorted(my_dict, key=my_dict.get)}
assert list(my_dict.keys()) == ['Plan A', 'Plan C', 'Plan B']
10. Create a dict from tuples
dict() can take a sequence of key/value tuples. So if you have lists of keys and values, you can zip them together to turn them into a dictionary:
keys = ["a", "b", "c"]
vals = [1, 2, 3]
assert dict(zip(keys, vals)) == {'a': 1, 'b': 2, 'c': 3}
The dict docs.
11. Combining dicts with **
You can combine dictionaries by using the ** operator to unpack the dicts into the body of a dict literal:
sys_config = {
    "Option A": True,
    "Option B": 13,
}
user_config = {
    "Option B": 33,
    "Option C": "yes",
}
config = {
    **sys_config,
    **user_config,
    "Option 12": 700,
}
Here the user_config will override the sys_config since it comes later.
Dictionary unpacking in the docs.
12. Updating dicts with |=
If you want to extend an existing dict without doing it one key at a time, you can use the |= operator.
config = {
    "Option A": True,
    "Option B": 13,
}
# later
config |= {
    "Option C": 7,
    "Option D": "bananas",
}
This was added in Python 3.9.
13. defaultdicts, or setdefault
Let’s say you want to create a dict of lists from a list of dicts. A naive approach might be to create a placeholder dict, then check whether or not you need to create a new key at every step of the loop.
pets = [
    {"name": "Fido", "type": "dog"},
    {"name": "Rex", "type": "dog"},
    {"name": "Paul", "type": "cat"},
]
pets_by_type = {}
# Bad code
for pet in pets:
    # If the type isn't already a key,
    # we'll need to create it with an empty list
    if pet["type"] not in pets_by_type:
        pets_by_type[pet["type"]] = []
    # Now we can safely call .append()
    pets_by_type[pet["type"]].append(pet["name"])
assert pets_by_type == {'dog': ['Fido', 'Rex'], 'cat': ['Paul']}
A cleaner way is to create a defaultdict. In the below, when I attempt to read a key that doesn’t exist, it will automatically create that key with the default value of an empty list.
from collections import defaultdict

pets_by_type = defaultdict(list)
for pet in pets:
    pets_by_type[pet["type"]].append(pet["name"])
The downsides are that you need to import defaultdict, and if you want a true dict at the end you’d need to do dict(pets_by_type).
Another option is to use a plain dict with the oddly named .setdefault():
pets_by_type = {}
for pet in pets:
    pets_by_type.setdefault(pet["type"], []).append(pet["name"])
Think of “set default” as being “get, but set with default when required”.
Bonus tip: you can create a dict that returns None if a key doesn’t exist with defaultdict(lambda: None).
Here’s defaultdict in the collections docs.
14. TypedDict for … typed dicts
If you want to define types for the values of your dict:
from typing import TypedDict

class Config(TypedDict):
    port: int
    name: str

config: Config = {
    'port': 4000,
    'name': 'David',
    'unknown': True,  # warning, unexpected key
}
port = config["poort"]  # warning, what's a poort?
Depending on your IDE (I use PyCharm) you’ll get the correct auto-complete suggestions (types aren’t just about robust code, they’re about being lazy and typing fewer characters.)
I find this particularly useful for adding types to dictionaries that come from other sources. If I want types for an object that I create, I would rarely use a dict.
15. a // b does not always return an int
Let’s say you have a function that splits a list into a certain number of batches. Can you spot the potential bug in this code?
# Bad code
def batch_a_list(the_list, num_batches):
    batch_size = len(the_list) // num_batches
    batches = []
    for i in range(0, len(the_list), batch_size):
        batches.append(the_list[i : i + batch_size])
    return batches
my_list = list(range(13))
batched_list = batch_a_list(my_list, 4)
This would raise a TypeError: batch_a_list(my_list, 4.)
The problem occurs because when num_batches is a float, len(the_list) // num_batches also returns a float, which is not a valid index and so raises an error.
Specifically, if either side of the // is a float, then both sides are converted to floats and the result is a float.
int(a / b) is the safer bet.
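Following that advice, a minimal fix (the same function, with int(a / b) so the index is always an int):

```python
def batch_a_list(the_list, num_batches):
    # int(a / b): an int, even when num_batches is a float
    batch_size = int(len(the_list) / num_batches)
    batches = []
    for i in range(0, len(the_list), batch_size):
        batches.append(the_list[i : i + batch_size])
    return batches

my_list = list(range(13))
# An accidental float no longer raises a TypeError
assert batch_a_list(my_list, 4) == batch_a_list(my_list, 4.0)
```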
As mentioned in the Expressions docs, but not in a particularly clear way.
16. Circular imports are fine without ‘from’
You can have two modules that each import the other; this is not an error. In fact, I think it’s kinda sweet.
Problems only arise when one of them does from other_module import something. So if you’re getting circular import errors, consider dropping from from the problem imports.
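To see this in action without setting up a real project, here’s a sketch that writes two mutually-importing modules to a temp directory (the mod_a/mod_b names are invented for the demo):

```python
import sys
import tempfile
import textwrap
from pathlib import Path

tmp = Path(tempfile.mkdtemp())
(tmp / "mod_a.py").write_text(textwrap.dedent("""
    import mod_b  # plain import: fine, even though mod_b imports us back
    name = "mod_a"

    def describe():
        return "a sees " + mod_b.name
"""))
(tmp / "mod_b.py").write_text(textwrap.dedent("""
    import mod_a  # the partially-initialised module is returned, no error
    name = "mod_b"
"""))

sys.path.insert(0, str(tmp))
import mod_a
assert mod_a.describe() == "a sees mod_b"
```

Switch either plain import to a `from` import of a name defined in the other module, and the same pair fails at import time.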
Programming FAQ: What are the “best practices” for using import in a module?
17. Use ‘or’ to set defaults
Say you have a function with an optional argument. The default value should be an empty list, but you can’t do that in the function signature (all calls to the function would share the same list), so you need to set the default in the body of the function.
You don’t need a two-line check for None like I often see:
def make_list(start_list=None):
    if start_list is None:
        start_list = []
    ...
You can instead just use or:
def make_list(start_list=None):
    start_list = start_list or []
    ...
If start_list is truthy (a non-empty list) it will be used; otherwise it will be set to an empty list.
This is only a reasonable approach if the function won’t be called with a value where bool(value) == False.
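To make that caveat concrete, here’s the failure mode in miniature:

```python
def make_list(start_list=None):
    return start_list or []

assert make_list([1, 2]) == [1, 2]
assert make_list() == []
# The pitfall: an intentionally-passed falsy value is replaced too
assert make_list(0) == []  # a bug, if the caller meant to keep the 0
```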
18. Use default_timer as your … default timer
There are a lot of ways to get the time in Python. If your plan is to calculate the difference between two points in time, here are some incorrect ways to do it:
# Wrong
start_time = datetime.now()
start_time = datetime.today()
start_time = time.time()
start_time = time.gmtime()
start_time = time.localtime()
# Less wrong but will fail on Windows
start_time = time.clock_gettime(1)
start_time = time.clock_gettime(time.CLOCK_MONOTONIC) # same thing
You don’t want time to go backwards, but the first five above will allow such a situation (daylight savings, leap seconds, clock adjustments, etc.). Bugs caused by the belief that time can’t go backwards are rare but do happen and can be darn hard to track down.
The best way is this:
import timeit

start_time = timeit.default_timer()
This is easy to remember and gives you a time that can’t go backwards.
Since Python 3.3, this has pointed to time.perf_counter, so there are two more honorable mentions:
start_time = time.perf_counter()
start_time = time.perf_counter_ns()  # nanoseconds
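Measuring an interval then looks like this (the sleep is a stand-in for real work):

```python
import time
import timeit

start_time = timeit.default_timer()
time.sleep(0.05)  # stand-in for the work being timed
elapsed = timeit.default_timer() - start_time
assert elapsed > 0  # monotonic: the difference can never be negative
```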
19. ‘_’ in the interpreter: the last result
This is handy if you’ve performed some operation and a value has been output and now you want to do something with that output.
>>> get_some_data() # Takes a while to run
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39])
>>> _.mean()
19.5
type(_) is another one I often use. Just beware that you can only use it once, because you’ll then have a new output, and that’s what _ will then refer to. (For IPython users, you can type _4 to refer to Out[4] — docs.)
The _ has other jobs too; see the lexical analysis docs for more, and also the next section…
20. *_ to gather unrequired elements
If you’re calling a function that returns many values in a tuple but you only want the first one, you could append [0] to the call, or ignore the other returned values with *_:
def get_lots_of_things():
    return "Puffin", (0, 1), True, 77

bird, *_ = get_lots_of_things()
The * denotes iterable unpacking. The single underscore is a convention for an unused variable (most type/style checkers won’t give an ‘unused variable’ warning when the variable name is _).
This behaviour was defined in PEP 3132.
21. dict keys need not be strings
All sorts of things can be dict keys: functions, tuples, numbers — anything hashable.
Let’s say you wanted to represent a graph as a dict, where each key is a pair of nodes and the value describes the edge between them. You could do this with tuples, but if you want (a, b) to return the same value as (b, a) you’ll need a set. Sets aren’t hashable, but frozen sets are, so we have:
graph = {
    frozenset(["a", "b"]): "The edge between a and b",
    frozenset(["b", "c"]): "The edge between b and c",
}
assert graph[frozenset(["b", "a"])] == graph[frozenset(["a", "b"])]
Clearly, if you wanted to do this for reals you’d do something like extend dict:
class OrderFreeKeys(dict):
    def __getitem__(self, key):
        return super().__getitem__(frozenset(key))

    def __setitem__(self, key, value):
        return super().__setitem__(frozenset(key), value)

graph = OrderFreeKeys()
graph["A", "B"] = "An edge between A and B"
assert graph["B", "A"] == "An edge between A and B"
This second example uses capital letters for A and B because variety is the spice of life.
22. __call__ makes a class instance callable
If you want to create a function that has some sort of state, you could use a function in a function (creating a ‘closure’), or create a class with a __call__ method.
class LikeAFunction:
    counter = 0

    def __call__(self, *args, **kwargs):
        self.counter += 1
        return self.counter
func = LikeAFunction()
assert func() == 1
assert func() == 2
assert func() == 3
23. Format a number as a percentage
If you’ve got a value like 0.1234 that you want to display as 12.3%, there’s no need to multiply it by 100, format it as a float, and append a percentage sign. Just use the % presentation type.
pct = 0.1234
assert f"{pct * 100:.1f}%" == "12.3%" # The old way
assert f"{pct:.1%}" == "12.3%" # The new way
Docs on the Format Specification Mini-Language.
24. Format a number with commas
Thousands separators are underused, says me. Just add :, after your value, and optionally define the number of decimals to show, like so:
num = 1234.56
assert f"{num:,}" == "1,234.56"
assert f"{num:,.0f}" == "1,235"
Keep in mind that not everyone on earth uses the comma for a thousands separator, so a more robust approach is to use the user’s locale. But if you’re, say, producing an image of a chart so can’t format the values dynamically, a comma is better than nothing.
25. Format different scale numbers with ‘g’
If you want a format that will handle numbers from 1,000,000 to 0.0000001 with style, try g. By default, it switches to scientific notation above/below certain thresholds:
assert f"{1e-6:g}" == "1e-06"
assert f"{1e-5:g}" == "1e-05"
assert f"{1e-4:g}" == "0.0001"
assert f"{1e-3:g}" == "0.001"
assert f"{1e-2:g}" == "0.01"
assert f"{1e-1:g}" == "0.1"
assert f"{1e0:g}" == "1"
assert f"{1e1:g}" == "10"
assert f"{1e2:g}" == "100"
assert f"{1e3:g}" == "1000"
assert f"{1e4:g}" == "10000"
assert f"{1e5:g}" == "100000"
assert f"{1e6:g}" == "1e+06"
assert f"{1e7:g}" == "1e+07"
You can configure the threshold, which is explained oh-so-simply in the docs.
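In short, the switch point follows the precision (six significant digits by default): a number flips to scientific notation once its exponent reaches the precision. For example:

```python
# Default precision is 6 significant digits, so 1e6 flips to scientific
assert f"{1234567:g}" == "1.23457e+06"
# Raise the precision and the same number stays in fixed notation
assert f"{1234567:.8g}" == "1234567"
# Lower it and the flip happens sooner
assert f"{1234567:.2g}" == "1.2e+06"
```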
26. Format a number as currency
You can format a number as a currency for a given locale without manually placing dollar signs and the like.
Simply set the locale… No, just kidding, nothing about locale is simple. To quote the docs: “[excuses, excuses] … This makes the locale somewhat painful to use correctly”.
But, once you know what to do, and have worked out what locale to use, it’s really not so bad:
import locale
import os

# Either
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')  # Set a specific locale
# OR
os.environ['LANG'] = "en_US.UTF-8"  # The next line relies on LANG
locale.setlocale(locale.LC_ALL, '')  # Tell Python to use the 'preferred' locale, e.g. LANG

# Then you're good to go, lovely currency formatting
assert locale.currency(1234, grouping=True) == "$1,234.00"
Where do you find the exact string to set the locale for your country/currency/system? There’s a list in an undocumented attribute of the locale module, locale_alias. Here’s the list in the GitHub repo.
Or, on Linux and macOS, you can run locale -a to get a list of available locales. In PowerShell, it’s Get-WinUserLanguageList, although you might need to cross-reference those results against the locale_alias list to find the exact string.
27. Print an expression and its value with ‘=’
When formatting an expression as an f-string, adding = afterwards indicates that you want to see both the expression and its result.
def get_value():
    return 77

assert f"{get_value() = }" == "get_value() = 77"
You can do this with or without spaces around the = sign (this will be reflected in the output), and you can add : and a format specifier after the equals sign.
Here’s a quick overview of f-strings and more detail on the Lexical analysis page.
28. Use Path for dealing with files
There are still tutorials out there suggesting the use of with open() to open files, but I think this is unnecessarily clunky for most cases. A lot of the time you can just use Path.read_text():
from pathlib import Path
file_contents = Path('my.log').read_text()
Alas, it needs an import, but Path objects have all sorts of neat properties, like testing for existence, automatic file-creation when writing, and joining paths in an OS-agnostic way. Here’s a smattering of examples:
import datetime
import json
from pathlib import Path

file_contents = Path("my.log").read_text()  # one big string
file_lines = Path("my.log").read_text().splitlines()  # list of lines

path = Path(__file__).parent / "config.json"  # relative to this module
if not path.exists():  # Check for existence
    path.parent.mkdir(parents=True, exist_ok=True)  # Create directories
    path.write_text("{}")  # Writing creates a new file by default

config = json.loads(path.read_text())  # load/parse JSON
config["last_modified"] = str(datetime.datetime.utcnow())
path.write_text(json.dumps(config))  # save as JSON
There’s even path.glob() and its recursive sibling path.rglob() to find all files matching a pattern (where the path in question refers to a directory, not a file).
29. Booleans are ints
If you have a function that accepts either an int or a bool, you need to be careful when checking what you’ve got, because bools are ints.
def do_something(any_value):
    if isinstance(any_value, int):
        return f"I got your int {any_value}"
    if isinstance(any_value, bool):
        return f"I got your bool {any_value}"
assert do_something(7) == "I got your int 7"
assert do_something(True) == "I got your int True" # Not bool!
So just remember that more specific checks should come first, which in this case means switching the order of the if statements.
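With the checks reordered, the bool branch gets first dibs:

```python
def do_something(any_value):
    # The more specific type must be checked first: bool is a subclass of int
    if isinstance(any_value, bool):
        return f"I got your bool {any_value}"
    if isinstance(any_value, int):
        return f"I got your int {any_value}"

assert do_something(True) == "I got your bool True"
assert do_something(7) == "I got your int 7"
```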
30. Lazy load modules
Python modules support a module-level __getattr__ function (PEP 562) that you can use to lazy-load submodules. For example, I have my own collection of data science utilities, which are handy, but if I import them all (and all the third-party packages they use), it takes 4 seconds to load. And I just don’t have that sort of free time.
The clunky approach is to require sub-modules to be explicitly imported before being used.
The smart thing to do is to allow dot-notation access all the way from the top module, but wait until a module is needed before loading it. The following goes in the top-level __init__.py file.
import importlib

__all__ = ["data", "mpl", "pd", "pl", "pt", "sk", "vega"]

def __getattr__(name):
    if name in __all__:
        return importlib.import_module(f".{name}", __name__)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
I still only need a single import utils; when I do utils.pt.some_helper() it will load the file utils/pt.py (which loads PyTorch, which takes quite a while). If I never reference utils.pt, then it won’t load that module.
The __all__ statement is required for typing/auto-complete.
By the way, !r on the last line above is a way to print quote marks around values in a format string. It calls repr.
This lazy-loading approach reduced the loading time for my utils from 4 seconds to 0.4 seconds. That’s, like, at least 9 times faster.
Package authors, please do this!
See Customizing module attribute access in the docs for more.
31. divmod is // and %
Let’s say you want to iterate over all the x/y coordinates of a grid. divmod is a handy tool for this.
grid = [divmod(x, 3) for x in range(2 * 3)]
assert grid == [
    (0, 0),
    (0, 1),
    (0, 2),
    (1, 0),
    (1, 1),
    (1, 2),
]
The above is a 2 x 3 grid, but it also works for 7 x 4 grids, and perhaps even others!
You can picture this as divmod restricting movement to a certain number of columns. So divmod(5, 3) turns the 5 into row 1, column 2.
There are two other ways to achieve this that are worth a mention:
import itertools
grid = [divmod(x, 3) for x in range(2 * 3)]
assert grid == list(itertools.product(range(2), range(3)))
assert grid == [(row, col) for row in range(2) for col in range(3)]
32. Lists of month and day names
The calendar module has lists of month and day names. This can be useful for sorting by day name (e.g. in a chart axis) without having to first convert to a day-of-week integer.
import calendar
# A list of days you would like to have sorted
days = ["Tuesday", "Monday", "Saturday", "Monday"]
day_names = list(calendar.day_name)
days.sort(key=day_names.index)
assert days == ['Monday', 'Monday', 'Tuesday', 'Saturday']
Here’s day_name in the calendar docs; there’s also month_name, and abbreviated versions of both.
33. Counter: it counts things
Let’s say you wanted to count the occurrences of each word in a text. One way is to print them out and use a pen and paper to keep track. An equally bad approach is to manually create a dict where each word is a key, and the value is the count of that word, then loop over all the words, incrementing counters.
The best way is to use Counter.
import re
from collections import Counter

words = re.findall(
    pattern=r"[\w'-]+(?<!\d)",
    string="I'll e-mail François from the café via e-mail! Hey, qu'est-ce?",
)
word_frequencies = Counter(words)
assert word_frequencies.most_common(1)[0] == ("e-mail", 2)
Here’s Counter in the collections docs.
34. getpass(), not input()
If you’re creating a CLI and need to prompt a user for their password, don’t use input(). Do this instead:
from getpass import getpass
password = getpass("What's your password?")
There’s also a getuser() function, which gets the user’s name without even asking them.
35. Identifiers don’t need to be ASCII
You can use many (but not just any) Unicode characters in identifier names.
from math import tau as τ, exp, sqrt

def normal_dist_pdf(μ, σ, x):
    return exp(((x - μ) / σ) ** 2 / -2) / (σ * sqrt(τ))
I am not condoning this behaviour, only stating that it is possible. Clearly it’s a pain to type and modify such code, but if you have a function with a high read-to-modify ratio, and using the correct symbols makes it easier to comprehend (and you’re fond of weirdness), go for it.
Full details of what’s allowed.
36. In-place printing with \r
With print, you can set end="\r" so that the next print will overwrite what you’ve printed, because \r means return to the beginning of the line.
I find this approach useful in loops where I want to print progress, but don’t want a wall of logs.
import time

total_steps = 17
for step in range(total_steps):
    print(f"Processing {(step + 1) / total_steps:.0%}", end="\r")
    time.sleep(0.1)
I wouldn’t ship this in package code, because it can be messy, and doesn’t behave well everywhere, but for quick and dirty progress-logging it works a treat.
37. atexit as a decorator
If you need to run some code, even after a crash, use atexit. The easiest way to implement this is to decorate a function with @atexit.register.
import atexit

def do_raise():
    raise ValueError("Oh no, we crashed")

@atexit.register
def cleanup():
    # Close connections, persist settings, etc
    print("Finished in style regardless")

do_raise()
This will run, raise an error, and then print "Finished in style regardless". (Note that this might not fire in something like the PyCharm Python Console.)
38. Functions can have attributes
def multiply(a, b):
    return a * b

multiply.test_cases = [
    ((2, 3), 6),
    ((0, 1), 0),
    ((4, 4), 16),
]
I used this just the other day for a package that expected a callable class instance with a name attribute. I already had a function that did the job, so I just added my_func.name = "MyFunc" to appease the third-party package.
39. zip(*list) for transposing
This is useful if you’ve got data in some 2D structure and you want to extract the columns into variables.
combined = [
    ["this", [1, 2, 3]],
    ["that", [6, 5, 4]],
]
labels, values = zip(*combined)
assert labels == ("this", "that")
assert values == ([1, 2, 3], [6, 5, 4])
If you find it hard to wrap your head around what’s happening here, you’re not alone. My only advice is to tinker about until it clicks. zip is useful surprisingly often.
40. http.server couldn’t be simpler
The below will start a simple http server, serving static files from wherever you run the command.
python -m http.server
On some systems (e.g. WSL) you may need to add --bind localhost.
The docs are very clear that this is not for production.
41. webbrowser.open()
If you want to open a URL in the default browser (as defined by the operating system):
import webbrowser
webbrowser.open("http://localhost:8080")
42. copy() doesn’t always really copy
To copy a list or dict you can write my_list.copy() or my_dict.copy(). So far, so good. But if you think that a change to the original version can’t affect the copy, I regret to inform you that you’re wrong.
The complicating factor is compound objects, for example a list of dicts. copy() will create a new list, but each dict in the new list will be a reference to the same object as in the original list.
The below shows the difference between a shallow copy and a deep copy (which really does copy) for a list-of-dicts.
from copy import deepcopy
my_list = [{"one": 1}]
# Create two types of copies
shallow_copy = my_list.copy()
deep_copy = deepcopy(my_list)
# Change the original
my_list.append({"two": 2})
my_list[0]["one"] = 77
# Look at the changes
assert my_list == [{"one": 77}, {"two": 2}]
assert shallow_copy == [{"one": 77}] # Mutated!
assert deep_copy == [{"one": 1}]
The copy module docs explain in more detail.
43. dict.keys() and set comparison
Three things: a) if you subtract one set from another, you’re left with the items that are only in the first set; b) you can use < to check if one set doesn’t have all the elements of another set; c) dict.keys() is set-like.
Combining these ideas we get the following example: if you’re expecting a dictionary with a certain set of keys, you can check for and warn about missing keys.
user_info = {
    "name": "Sam",
    "age": 6,
}
required_keys = {"name", "age", "weight"}  # A set
if user_info.keys() < required_keys:
    print("Missing info:", *required_keys.difference(user_info.keys()))
44. Flatten a list with sum
You can combine two lists with +, a symbol that (you may have heard) also means ‘sum’. Extending this thought, you can combine many lists with the sum function, as long as you tell it to start with an empty list.
list_of_lists = [
    [1, 2, 3],
    [3, 4, 5],
    [5, 6, 7],
]
flat_list = sum(list_of_lists, start=[])
assert flat_list == [1, 2, 3, 3, 4, 5, 5, 6, 7]
This is handy for flattening a list of a few lists, but if fast code is your jam and you have thousands of lists, then you should know it’s ten times faster to use a comprehension:
flat_list = [item for row in list_of_lists for item in row]
Nested for loops in comprehensions can be a bit of an eyeful, but if you remember that the for loops appear in the same order that they would if you wrote them out in their nested form, they’re not so bad.
itertools.chain(*list_of_lists) also does the job and is on par with a comprehension for speed.
Speaking of lists and fastness…
45. deque: the fast list, sometimes
If you have a list and find yourself inserting items at the start on a regular basis, a deque (pronounced ‘deck’) will do the same job in about 1% of the time.
import collections

my_list = []
my_list.insert(0, "left")  # Slow as molasses

my_deque = collections.deque()
my_deque.appendleft("left")  # Fast as molasses on a plane
Here’s deque in the collections docs. Side note: if you only read one page in the docs, make it the collections page. Followed by math, itertools, and functools. I choose to offer no rationale for this advice.
46. all and any
You can boil down a sequence of booleans into a single boolean with any() and all().
Let’s say you’ve got a list of messages, and a list of things that you care about, and you want to know if any of the things you care about is in the list of messages.
concerns = ["money", "time", "cat"]
messages = ["The cat is sick", "The dog ran away", "The fish is bloated"]
# One of the nine combinations is True, so any() returns True
assert any(concern in message for message in messages for concern in concerns)
If you’re a fan of the functional style, you can use itertools.product, itertools.starmap, and operator.contains to get the same result:
from itertools import product, starmap
from operator import contains
concerns = ["money", "time", "cat"]
messages = ["The cat is sick", "The dog ran away", "The fish is bloated"]
assert any(starmap(contains, product(messages, concerns)))
Here’s all and any on the built-ins page.
47. Optional does not make an arg optional
Let’s say you have a class with an optional argument. I sometimes see code like this, attempting to indicate that short_name is optional.
from typing import Optional

# Bad code
class Config:
    def __init__(
        self,
        port: int,
        host: str,
        short_name: Optional[str] = None,
    ):
        self.port = port
        self.host = host
        self.short_name = short_name
But Optional doesn’t mean optional (seriously!), it means None is allowed. Specifically, Optional[str] is shorthand for str | None. (Using the pipe operator to denote a union was added in Python 3.10.)
The below shows the difference between “required”, “required, None is acceptable”, and “not required”:
from typing import Optional

class SomeClass:
    def __init__(
        self,
        required: str,
        required_none_is_ok: Optional[str],
        not_required: str = None,
    ):
        ...
My preference is to not use Optional. If None is an acceptable value, I’ll almost always make it the default, thus making the argument optional. In rare cases where I want to force the user to provide a value, even if it’s None, I’ll use str | None rather than Optional[str], simply because Optional is confusing.
Here’s the docs on Optional explaining that “this is not the same concept as an optional argument”.
48. Six ways to print multi-line text
Let’s say you want to print a multi-line message, have each line start without an indent, and have it align nicely in the source code too. You have a few options.
I recommend the last one, the others are to show why this is a fiddly problem.
import textwrap

def function_that_prints_things():
    # Triple-quote and manually dedent. Gross.
    print(
        """line 1 Lorem ipsum dolor sit amet, consectetur adipiscing
line 2 Lorem ipsum dolor sit amet, consectetur adipiscing
line 3 Lorem ipsum dolor sit amet, consectetur adipiscing"""
    )

    # Triple-quote with backslash, and manually dedent, also gross.
    print(
        """\
line 1 Lorem ipsum dolor sit amet, consectetur adipiscing
line 2 Lorem ipsum dolor sit amet, consectetur adipiscing
line 3 Lorem ipsum dolor sit amet, consectetur adipiscing"""
    )

    # Rely on auto-concatting strings and manual newlines, meh
    print(
        "line 1 Lorem ipsum dolor sit amet, consectetur adipiscing\n"
        "line 2 Lorem ipsum dolor sit amet, consectetur adipiscing\n"
        "line 3 Lorem ipsum dolor sit amet, consectetur adipiscing"
    )

    # textwrap module with starting backslash, meh but getting there
    print(
        textwrap.dedent(
            """\
            line 1 Lorem ipsum dolor sit amet, consectetur adipiscing
            line 2 Lorem ipsum dolor sit amet, consectetur adipiscing
            line 3 Lorem ipsum dolor sit amet, consectetur adipiscing"""
        )
    )

    # textwrap module with starting/trailing backslash, ugh
    print(
        textwrap.dedent(
            """\
            line 1 Lorem ipsum dolor sit amet, consectetur adipiscing
            line 2 Lorem ipsum dolor sit amet, consectetur adipiscing
            line 3 Lorem ipsum dolor sit amet, consectetur adipiscing\
            """
        )
    )

    # textwrap module and strip() for leading space, noice.
    print(
        textwrap.dedent(
            """
            line 1 Lorem ipsum dolor sit amet, consectetur adipiscing
            line 2 Lorem ipsum dolor sit amet, consectetur adipiscing
            line 3 Lorem ipsum dolor sit amet, consectetur adipiscing
            """
        ).strip()
    )

function_that_prints_things()
49. Decorators are pretty easy
You’ve no doubt seen decorators, but may not realise that they’re quite easy to create. Let’s say we wanted a decorator that prints how long a function takes to run.
from functools import wraps
from timeit import default_timer
# The definition of the decorator
def time_function(func):
    # Use wraps() to keep the original function signature
    @wraps(func)
    def wrapped(*args, **kwargs):
        # Start a timer
        start_time = default_timer()
        # Run the function, store the result
        result = func(*args, **kwargs)
        # Calculate/print the elapsed time
        elapsed = default_timer() - start_time
        print(f"{func.__name__}() ran in {elapsed:g}s")
        return result
    return wrapped

# Using the decorator
@time_function
def do_something():
    return sum(x for x in range(1_000_000))

do_something()
# This prints something like: do_something() ran in 0.0160405s
There’s quite a lot to dig into there, but once you’ve written a few decorators you’ll wonder how you survived without them. Or maybe not. I don’t know what sorts of things you wonder about. What sorts of things do you wonder about?
Decorators can be added to function or class definitions.
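For instance, a class decorator works the same way: it receives the class object and returns a class (the same one, or a replacement). A minimal sketch, where the register/registry names are invented for illustration:

```python
# A sketch of a class decorator that records every decorated class
# in a module-level registry, keyed by class name
registry = {}

def register(cls):
    registry[cls.__name__] = cls
    return cls  # Return the class unchanged so the name binds normally

@register
class Widget:
    pass

assert registry["Widget"] is Widget
```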
50. Create a formatting function
In some cases, in order to define the formatting of a value, you’ll need to provide a function (e.g. with Pandas display.float_format
). You could write a lambda for this, but the better way is to leverage the .format
method that all strings have.
format_percent = "{:.1%}".format
assert format_percent(0.1234) == "12.3%"
Here are the docs for str.format.
51. Getters and setters are easy
Creating a getter property for a class is as easy as adding the @property
decorator to a method.
Then, through some sort of witchcraft, you can create a setter using the getter’s name as a decorator.
class MyClass:
    _my_prop = 77

    @property
    def my_property(self):
        return self._my_prop

    @my_property.setter
    def my_property(self, new_value):
        self._my_prop = new_value
The best practice is to only use getters for operations that are fast, and stick to methods in other cases. So I tend to use them as a shortcut for otherwise verbose code that has to reach down into nested objects, or if I want to validate the setting of a value.
Docs for the @property built-in.
52. dict(key=val)
You can type out a dictionary literal with strings for keys, or save typing all those quote marks and use the dict()
constructor. The two configs below are the same:
config1 = {
    "OptionA": True,
    "OptionB": 33,
    "OptionC": "yes",
}
config2 = dict(
    OptionA=True,
    OptionB=33,
    OptionC="yes",
)
assert config1 == config2
Using the dict(key=val)
style limits what you can have as keys (they must be valid Python identifiers, so strings only, no spaces), but I find it easier to read and type.
53. SimpleNamespace
Speaking of ‘easier to type’. Accessing the values of a dictionary — with all those brackets and quote marks — requires quite a lot of right-pinky gymnastics. If you prefer dot-notation to access your values, you might like SimpleNamespaces.
from types import SimpleNamespace
config = SimpleNamespace(
    OptionA=True,
    OptionB=33,
    OptionC="yes",
)
assert config.OptionB == 33
I share this here as an interesting alternative to dicts, but personally I never use them. Both PyCharm and VS Code fail to provide auto-complete for the attributes, and you can’t iterate over the keys or values (since they’re really class attributes). You can access them as a dict
using vars(config)
though.
Instead, I prefer…
54. Dataclasses
These are great, they save you from having to type a bunch of repetitive code in an __init__
function to map arguments to attributes, like self.name = name
, etc.
from dataclasses import dataclass
@dataclass
class Animal:
    name: str
    kind: str
    age: int = None

a_pet = Animal(name="Philbert", kind="fish", age=402)
Since this creates the __init__
function for you, you can’t also write your own; instead you can use __post_init__
.
There is a lot more to dataclasses, check out the docs for more.
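A brief sketch of __post_init__, reusing the Animal example from above to add a derived attribute (the description attribute is my own invention for illustration):

```python
from dataclasses import dataclass

@dataclass
class Animal:
    name: str
    kind: str
    age: int = None

    # Runs right after the generated __init__ finishes; a good place
    # for validation or derived attributes
    def __post_init__(self):
        self.description = f"{self.name} the {self.kind}"

a_pet = Animal(name="Philbert", kind="fish", age=402)
assert a_pet.description == "Philbert the fish"
```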
55. NamedTuples
These are also great! They’re particularly useful as return values for functions. Instead of returning a tuple of several values, return a named tuple:
from typing import NamedTuple
class Things(NamedTuple):
    values: list[float]
    indices: list[int]

def get_things():
    values = [1.2, 3.4, 5.6]
    indices = [1, 2, 3]
    return Things(values, indices)

things = get_things()
assert things.indices == [1, 2, 3]
This way, if the consumer of a function only wants one part of the returned tuple, they can be explicit with code like get_things().indices
.
Another nice feature of the named tuple is that you can treat it like a regular tuple. So from the above code, values, indices = get_things()
would also work, as would indices = get_things()[1]
. This means if you have a function that returns a tuple, you can upgrade it to return a named tuple without breaking existing code.
56. Enums can have methods
This code probably speaks for itself:
from enum import Enum
class ResponseCode(Enum):
    FINE = 200
    ALSO_FINE = 203
    NOT_FINE = 500

    def is_ok(self):
        return 200 <= self.value < 300

assert ResponseCode(203).is_ok()
assert not ResponseCode(500).is_ok()
Here’s Enum in the docs. Note that there were some significant changes in 3.11, so pay attention to what version of the docs you’re looking at.
Bonus tip: there’s an HTTPStatus
enum built into the http
module that has all the status codes. (Fun fact inside a bonus tip: I was beginning to question whether I really needed to read every single word of the docs, when I came across something that made it all worthwhile. HTTP status code 418: IM_A_TEAPOT
).
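The bonus tip in code — HTTPStatus is an IntEnum, so its members compare equal to plain integers:

```python
from http import HTTPStatus

assert HTTPStatus.IM_A_TEAPOT == 418  # The one that makes it all worthwhile
assert HTTPStatus.NOT_FOUND.phrase == "Not Found"  # Each member has a phrase
```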
57. Context managers are pretty easy
You’ve probably used a context manager (e.g. with open()
), but perhaps you don’t know that they’re really very easy to create yourself; they’re nothing more than a class with __enter__
and __exit__
methods.
Let’s say you wanted to know how long some code takes to run. You can ‘wrap’ it in a context manager (a with
statement) that starts a timer, runs the code, then prints the elapsed time.
from dataclasses import dataclass
from timeit import default_timer
@dataclass
class timer:
    name: str

    def __enter__(self):
        self.start_time = default_timer()

    def __exit__(self, *args):
        elapsed = default_timer() - self.start_time
        print(f"{elapsed} ⮜ {self.name}")

# Using the context manager
with timer("Make a big number"):
    sum(x for x in range(1_000_000))
Pro-tip: if you’re printing times and want all times to be printed in a consistent format, use datetime.timedelta
:
elapsed = timedelta(seconds=default_timer() - self.start_time)
print(f"{elapsed} ⮜ {self.name}")
This makes comparing different times easier. For example I’ve got a data-processing pipeline logging times:
⏱ 0:00:00.157739 ⮜ join_with_stores
⏱ 0:00:01.175930 ⮜ fill_missing_days
⏱ 0:00:00.693886 ⮜ add_school_zones
⏱ 0:00:06.870284 ⮜ add_holidays
⏱ 0:00:08.036416 ⮜ add_features
⏱ 0:00:00.922283 ⮜ fill_closed_periods
⏱ 0:00:03.338278 ⮜ smooth_outliers
If this was a mixture of ‘seconds’ and ‘milliseconds’ with varying decimal places, it would be harder to visually scan and see what’s taking the longest, and the right side wouldn’t be aligned.
58. Run code after a function returns
If you’ve ever wanted to define some code inside a function and have it run after the function returns, you may like ExitStack
.
from contextlib import ExitStack
def any_function():
    with ExitStack() as stack:
        stack.callback(print, "After function return")
        print("About to return...")
        return 7
If you’re doing this frequently, a decorator might be a better option.
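A sketch of that decorator alternative (after_return is a name invented for illustration); a try/finally inside the wrapper runs the extra code once the wrapped function has returned:

```python
from functools import wraps

def after_return(func):
    @wraps(func)
    def wrapped(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        finally:
            # Runs after the wrapped function has returned (or raised)
            print("After function return")
    return wrapped

@after_return
def any_function():
    print("About to return...")
    return 7

assert any_function() == 7
```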
59. Unpack a zip/tar file with shutil
There are modules for zipfile and tarfile, but if you want to unpack a compressed file of any supported type, you can do this:
import shutil
shutil.unpack_archive("my_archive.tar.gz", extract_dir="unpacked")
This will work out which module to use based on the file extension. You can type shutil.get_unpack_formats()
to see the mappings from file extension to format.
unpack_archive in the shutil docs.
60. Combine raw and formatted strings
If you want to insert a variable into a string, you use f"..."
, if you want to write a raw string that doesn’t treat a backslash as anything special, you use a raw string r"..."
. If you want to do both, no problem, just do rf"..."
— this is useful when creating regex patterns.
import re
search_term = "hat"
results = re.findall(rf"\b{search_term}\b", "the cat shat in my hat")
assert results == ["hat"]
61. You don’t need to .compile() your regex
I see a lot of code that first uses re.compile()
to create a regular expression object before calling a method like findall()
. This is not necessary.
The re
module internally caches the most recent patterns (512 of them as of Python 3.10), so unless you have an application using (and reusing) a lot of unique regular expressions, you’ve got nothing to gain by using re.compile()
.
See the note under re.compile() in the re docs for more.
62. Extract multiple values at once with RegEx
The .group
method can take multiple values to access multiple matches at once.
import re
key, val = re.match("(.*): (.*)", "this: that").group(1, 2)
assert key == "this"
assert val == "that"
In this case, there only are two groups so we could have used .groups()
:
key, val = re.match("(.*): (.*)", "this: that").groups()
Or if you like ugly regexes, you can name your capture groups and extract them as a dict. Here the names key
and val
are defined in the regex.
import re
my_dict = re.match("(?P<key>.*): (?P<val>.*)", "this: that").groupdict()
assert my_dict["key"] == "this"
assert my_dict["val"] == "that"
63. Adjacent strings collapse
If you’ve got two adjacent strings (separated by whitespace, which includes newlines), they’ll automatically be combined into one string. This can be useful for long messages, just don’t forget the spaces at the end of the string.
assert start_angle > 2, (
    "There are so many things wrong with what you've done, "
    "I don't even know where to begin. "
    "For one thing, your start angle is too small. "
    "I mean, whatever were you thinking? "
    "Have you got the vapours? "
    "Why must you always be like this?"
)
And to borrow an example from the Lexical analysis page of the docs, this is also useful for commenting regexs:
import re
re.compile(
    "[A-Za-z_]"       # letter or underscore
    "[A-Za-z0-9_]*"   # letter, digit or underscore
)
Reminder: you don’t need to compile your regexes just because the authors of the docs seem to like doing it.
Fun fact: PEP 3126 proposed to remove the string collapsing behaviour, but it was rejected.
64. Use sentinel objects to detect unprovided args
Imagine you have a function and would like to differentiate between an arg being provided with the value None
and not being provided at all. You can do this with a ‘sentinel’ object, like so:
_sentinel = object()
def do_something(arg=_sentinel):
    if arg is _sentinel:
        return "Arg not provided"
    else:
        return f"Got {arg}"

assert do_something(None) == "Got None"
assert do_something(10) == "Got 10"
assert do_something() == "Arg not provided"
The idea is that the caller of the function isn’t going to accidentally pass that exact _sentinel
object, so if that’s what the value of arg
is, you can infer that arg
was not provided.
This sometimes goes by the name missing
or undefined
.
This is not exactly a ‘feature’ of Python, but is a pattern used a lot in the source code and even gets a mention in the FAQ.
65. Capturing import times
If you want to know where time is being spent in your imports, you can run a file with the importtime
flag.
python -X importtime my_file.py
Or do the same for a specific command with -c
:
python -X importtime -c 'import numpy'
Although I wanted to keep this post free of third party packages, if you’re being a good citizen and trying to make your code load faster, you’ll probably want to visualize the resulting logs. For this, I’m a fan of the tuna
package.
python -X importtime
will output to stderr
, so to capture this and send it to a file, use 2>
(Windows or Linux), then view that file with tuna
:
python -X importtime my_file.py 2> import_times.log
tuna import_times.log
This will give you a flame chart showing where time is being spent.
Here’s importtime in the command line docs.
66. Reference a class before it exists
Let’s say you’ve got a structure that’s recursive. That is, it can contain children with the same type as itself. Since the class doesn’t exist yet at the point where you reference it, you put the type in quotes, as with the type of children below: a list of Node objects, the very class being defined.
from dataclasses import dataclass
@dataclass
class Node:
    name: str
    children: list["Node"] = None  # "Node" refers to the Node class

tree = Node(
    name="A parent",
    children=[
        Node("A child"),
        Node("Another child"),
    ],
)
If you’re not fond of these quote marks, you can reach into the future and fix that with annotations
:
from __future__ import annotations
from dataclasses import dataclass
@dataclass
class Node:
    name: str
    children: list[Node] = None  # No more quotes

tree = Node(
    name="A parent",
    children=[
        Node("A child"),
        Node("Another child"),
    ],
)
Normally you can expect things in the __future__
to at one point or another become things in the present. However the release notes for 3.11 mentioned that this has been put on hold indefinitely.
67. Type your strings with Literal
If you’ve got a function parameter that accepts one of several strings, you can type it with Literal
to make life easier for consumers.
from typing import Literal
def do_something(color: Literal["Red", "Blue", "Green"]):
    ...
Both PyCharm and VS Code will warn if an invalid string is provided.
PyCharm will show the options in a tooltip, and VS Code goes a step further and gives autocomplete suggestions.
68. Final: prevent a value from being changed
A variable declared as Final
can’t be changed.
from typing import Final
MY_CONST: Final = 12
Well, actually it can, this is just a type hint, but a type checker will complain.
This not only works for top-level variables, but class attributes too. So if you want a class to have an attribute with a particular value, and be sure that even subclasses can’t change this, mark it as Final
.
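A minimal sketch of a Final class attribute (the Connection class and TIMEOUT value are invented for illustration). Note that nothing stops the override at runtime; the point is that a type checker will flag it:

```python
from typing import Final

class Connection:
    TIMEOUT: Final = 10  # Subclasses shouldn't change this

class FastConnection(Connection):
    TIMEOUT = 1  # Runs fine, but a type checker will complain here

assert Connection.TIMEOUT == 10
assert FastConnection.TIMEOUT == 1
```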
69. Type hints without assignment
Let’s say you’re iterating over some un-typed data from an external source (API, JSON file, etc). You can define the type of a variable on its own line, like so:
for row in get_some_data():
    row: tuple[int, str, list[str]]
That second row doesn’t do anything at runtime, other than tell the typing machinery what sort of thing you’re dealing with, so you get the right autocomplete options. For example, the IDE will know that the middle element of each row is a str and offer string methods accordingly.
Side-note: PyCharm is quietly brilliant here, even suggesting the method I probably want (‘upper’) based on the name I gave my variable.
70. Class methods returning Self
Imagine you have a class method that returns self
, to allow for method chaining. You can type this return value using a TypeVar
, by convention called Self
. This even behaves correctly through inheritance.
from typing import TypeVar
Self = TypeVar("Self")
class BaseEstimator:
    def fit(self: Self) -> Self:
        # Do stuff
        return self

class Estimator(BaseEstimator):
    def score(self: Self) -> Self:
        # Do stuff
        return self
If we create an instance of Estimator
and call fit()
, the system knows that although that method is defined on BaseEstimator
, the return value Self
refers to Estimator
and an IDE will offer the appropriate completions.
This pattern is so useful that Self
was added to the typing
module in 3.11. See more in the docs.
71. Convert timedelta into a specific unit
Let’s say you’ve calculated the difference between two datetimes and you want to know “how long was that, in days, as a float?”. You can divide any timedelta
by another timedelta
of a specific duration (such as 1 day).
from datetime import datetime, timedelta
the_incident = datetime(2010, 11, 3)
time_since_the_incident = datetime.now() - the_incident
days_since_the_incident = time_since_the_incident / timedelta(days=1)
assert isinstance(days_since_the_incident, float)
4,612 days without laughing, time flies!
Pro-tip: if you just want to know the value in seconds, you can use time_since_the_incident.total_seconds()
, but not time_since_the_incident.seconds
which is something else.
You can even do divmod(time_since_the_incident, timedelta(days=1))
and get an int
and a timedelta
back, if that’s what floats your boat.
More about timedelta in the datetime docs.
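Those last two tips as a runnable sketch, using a made-up duration of two and a half days:

```python
from datetime import timedelta

delta = timedelta(days=2, hours=12)

# total_seconds() gives the full duration in seconds
assert delta.total_seconds() == 216000.0

# .seconds is only the seconds portion within the current day
assert delta.seconds == 43200

# divmod with a one-day timedelta gives whole days plus the remainder
whole_days, remainder = divmod(delta, timedelta(days=1))
assert whole_days == 2
assert remainder == timedelta(hours=12)
```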
72. Don’t check for ints with isinstance(x, int)
A check for int
is typically not what you want. For example NumPy can produce values that very much look like an int
or a float
but aren’t. So if you want to know whether or not you can consider a value to be an integer, the safe bet is to check for numbers.Integral
.
from numbers import Integral
import numpy as np
for item in np.array([1, 2, 3]):
assert isinstance(item, Integral)
assert not isinstance(item, int) # These aren't really ints!
73. Number is a tricky concept
Checking if a variable is a ‘number’ is not as straightforward as one might hope. By now you know that checking for int
or float
is overly restrictive. If you look in the docs at the numbers
module you will see this beacon of hope:
numbers.Number: The root of the numeric hierarchy. If you just want to check if an argument x is a number, without caring what kind, use isinstance(x, Number).
So surely that’s how you check if something’s a number, right?
Well, here’s a fun quiz: is it possible for the following code to raise an error on the return
line?
from numbers import Number
def compare_numbers(a, b):
    if isinstance(a, Number) and isinstance(b, Number):
        return a < b
The answer is yes, because a Number
object might be complex and asking if one complex number is less than another is like asking if cheese is greater than turtle.
If you aren’t familiar with the various types of numbers, don’t want to be, and want a general rule, I’d say that numbers.Real
is the closest thing to what you’d refer to as a number in everyday language. To be a bit more rigorous, you should consider the operations you are about to perform on the number and look through the hierarchy of types in the numbers docs to see at which level those operators are implemented.
So now you’re all set … except … another quiz: can this function raise an error?
from numbers import Real
def to_int(a):
    if isinstance(a, Real):
        return int(a)
The answer is of course yes. float("nan")
is technically a “real” number, but can’t be converted to int
and will raise an error.
A more Pythonic approach is the principle of EAFP (Easier to Ask Forgiveness than Permission). That is, use a try
/except
.
def to_int(a):
    try:
        return int(a)
    except (ValueError, TypeError):
        return None
But then someone comes along and throws infinity at the function which gets you an uncaught OverflowError
. So to implement EAFP, you either have to know in advance every possible exception that a function can raise (and even in this very simple case of a single, built-in function it isn’t documented) or resort to except Exception
and have people tell you this is too broad and that you’re a bad programmer.
I’ve digressed a little, but the point to all this is that number types are tricky and you should take the time to think through all the edge cases and proceed with caution. And write tests!
74. Fractions: interesting
If you’re dealing with fractions and don’t want to run into floating point troubles, you can use a Fraction
. You need to be careful though that you don’t create floating point errors before even passing the value to Fraction
.
from fractions import Fraction
assert Fraction("1/10") + Fraction("2/10") == Fraction("3/10")
assert Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10)
assert Fraction(1 / 10) + Fraction(2 / 10) != Fraction(3 / 10) # Not equal!
75. Decimals: interesting
A similar story to Fraction
s, Decimal
s can work around floating point problems, but again you need to be careful:
from decimal import Decimal
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
assert Decimal(0.1) + Decimal(0.2) != Decimal(0.3) # Not equal!
There’s a lot more to decimals, in the docs.
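For instance, the module’s context lets you control working precision, something floats can’t do. A minimal sketch (note that changing the context affects all subsequent Decimal arithmetic in that thread):

```python
from decimal import Decimal, getcontext

getcontext().prec = 4  # Work to 4 significant figures
assert Decimal(1) / Decimal(3) == Decimal("0.3333")
```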
76. Euclidean distance with math.dist()
import math
assert math.dist([0, 0], [3, 4]) == 5
This is not limited to the 2-dimensional case, the arguments can be any vector.
77. Cache your functions
Let’s say you have a function that searches for matching strings. It responds to a user’s input as they type. So a search for ‘abc’ is really a search for ‘a’, followed by a search for ‘ab’, then a search for ‘abc’.
Once you’ve already searched for ‘a’, you might as well use those results as a starting point when it comes time to search for ‘ab’ (we don’t need to search through all the strings that don’t have ‘a’ in them), and when we search for ‘abc’, we only need to search in the results of the search for ‘ab’.
One way to implement this is using cache (or to be more direct: one way to demonstrate @functools.lru_cache
is to implement this concept.)
from functools import lru_cache
all_words = ["a", "ab", "abc", "abcd", "butter"]
class Finder:
    matched_words = []

    @lru_cache  # Cache calls to this method
    def filter_words(self, text):
        return [word for word in self.matched_words if word.startswith(text)]

    def find(self, text):
        self.matched_words = all_words
        if len(text) > 1:
            self.matched_words = self.filter_words(text[:-1])  # will hit cache
        self.matched_words = self.filter_words(text)
        return self.matched_words

finder = Finder()

# Simulate a user typing a-b-c
finder.find("a")
finder.find("ab")
finder.find("abc")

assert (
    str(finder.filter_words.cache_info())
    == "CacheInfo(hits=2, misses=3, maxsize=128, currsize=3)"
)
So when we call finder.find("abc")
(knowing that we’re handling user-typed input on each keystroke) we assume that we’ve already searched for "ab"
and first call that, which is fast since it’s cached, and results in a much smaller search space for the search for "abc"
.
Note the assert at the end showing that our cached method has a cache_info()
method attached to it with some useful info about hits and misses. There’s also a cache_clear()
method which does what you think it does.
I’ve found that having cache as a mental tool in my toolbox sometimes results in quite different solutions. If I’m struggling to come up with an elegant, performant solution to a problem, I’ll think “what would an elegant, non-performant solution look like” and then ask if cache could be used to make it fast without causing memory use issues.
The docs have a much simpler example and more details.
78. Shelve
shelve
is a quick way to store some data on disk. Much like a salad car, it uses pickle under the hood, so you can store anything that pickle can store (which does not include lambdas).
import shelve
with shelve.open("my_shelf") as shelf:
    shelf["config"] = dict(port=4000, host="localhost")

# later
with shelve.open("my_shelf") as shelf:
    port = shelf["config"]["port"]

assert port == 4000
The shelve docs will appear in front of your eyes if you click this underlined text.
79. Compare values in tuples
You can ask if one tuple is larger than another. Particularly useful when checking version info:
import sys
assert sys.version_info < (4, 7)
Be careful though, the exact rules for comparing sequences are tricky. To quote the Expressions docs: “Collections that support order comparison are ordered the same as their first unequal elements (for example, [1,2,x] <= [1,2,y]
has the same value as x <= y
). If a corresponding element does not exist, the shorter collection is ordered first (for example, [1,2] < [1,2,3]
is true).”
In the case of sys.version_info
, which contains more than just the version values, stick to <
comparisons and you’ll be fine.
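The quoted rules as runnable assertions:

```python
x, y = 5, 9
assert ([1, 2, x] <= [1, 2, y]) == (x <= y)  # First unequal elements decide
assert [1, 2] < [1, 2, 3]                    # Shorter collection orders first
assert (2, 7, 1) > (2, 5, 99)                # Tuples compare the same way
```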
80. Pool makes multiprocessing easy
I’ll preface this by saying that if you’re doing multi-threading, you should probably just use a package like joblib
. With that said, some parallel operations are surprisingly easy. Let’s say you’ve got a big 2D matrix and you want the mean of reach row.
import statistics
from multiprocessing import Pool
data = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [7, 6, 5, 4],
]

with Pool() as pool:
    row_means = pool.map(statistics.fmean, data)

assert row_means == [2.5, 6.5, 5.5]
By default, Pool
will use all your available CPU cores. But this (or any parallelism) doesn’t magically make everything faster. On my machine, the overhead is about 100ms, so anything that runs faster than that isn’t worth parallelising.
Read more about Pool
in the multiprocessing docs.
81. Share across processes
Imagine you have a long-running process, keeping some sort of state, and you would like to be able to reach into that process to inspect the state without interrupting it. One way to achieve this is with shared memory.
This first block of code represents the main, long-running process. Where arr
is the object we’d like to be able to take a peek at from another process.
import time
import numpy as np
from multiprocessing import shared_memory
arr_base = np.arange(30)  # Create an object of the appropriate size
my_memory = shared_memory.SharedMemory(
    create=True,
    size=arr_base.nbytes,    # Use this size when creating the shared memory
    name="my_shared_memory", # Remember this name
)
arr = np.frombuffer(my_memory.buf)  # Make a NumPy array backed by this buffer
arr[:] = arr_base[:]  # Copy the original data into shared memory

# Some long-running process that periodically changes state
for i in range(len(arr)):
    print("Tick", i)
    arr[i] = 77
    time.sleep(1)
And this is the code that we’d run (e.g. in a different shell on the same machine) to access and print the current arr
:
import numpy as np
from multiprocessing import shared_memory
my_shared_memory = shared_memory.SharedMemory(name="my_shared_memory")
arr = np.frombuffer(my_shared_memory.buf)
print(arr)  # Prints the current state, e.g. [77. 77. 77. ...]
Please, for the love of Enya don’t go copy-pasting this code blindly, it’s the absolute bare minimum to demonstrate the magic of shared memory. You’ll want to do some docs reading in order to get something working robustly.
And of course this is a problem created to fit the solution, I’m not suggesting shared memory is the smartest way to access the state of a running process.
82. Respect robots.txt with robotparser
If you’re doing some web scraping and want to be a good human, you may wish to respect robots.txt
. There’s a module for that: urllib.robotparser
. In theory, it works like this:
from urllib.robotparser import RobotFileParser
robo_parser = RobotFileParser()
robo_parser.set_url("https://medium.com/robots.txt")
robo_parser.read()

can_fetch_trending = robo_parser.can_fetch(
    useragent="*",
    url="/trending",
)
can_hit_api = robo_parser.can_fetch(
    useragent="*",
    url="/_/api/users",
)
In practice, RobotFileParser
tries urllib.request.urlopen(url)
to open the robots.txt
and if that fails it will silently swallow the error, then always return False
for any call to can_fetch
.
Dodgy.
As it turns out, medium.com
will give you a 403 if you don’t pass a user-agent, so the real solution is a bit nastier:
import urllib.request
from urllib.error import HTTPError
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser
robo_parser = RobotFileParser()
robo_parser.set_url("https://medium.com/robots.txt")

try:
    urlopen(robo_parser.url)  # OK you can use RobotFileParser
    robo_parser.read()
except HTTPError:
    # Try getting the file manually, and passing to parse
    response = urlopen(
        urllib.request.Request(
            url=robo_parser.url,
            headers={"User-Agent": "David"},
        )
    )
    robo_parser.parse(response.read().decode("utf-8").splitlines())

can_fetch_trending = robo_parser.can_fetch(
    useragent="*",
    url="/trending",
)
can_hit_api = robo_parser.can_fetch(
    useragent="*",
    url="/_/api/users",
)

assert not can_fetch_trending  # Not Allowed
assert can_hit_api  # Allowed
This isn’t great but you can wrap it up as a PatchedRobotFileParser if you’re doing this often, or use a third party package.
83. Set the log level of 3rd party packages
By default, Python runs with the log level set to WARNING
, so you won’t see INFO
messages logged. But some 3rd party packages have a nasty habit of setting their logger log level to INFO
(which is both illogical and annoying) and spamming your output.
You can fix this by reaching into any logger and setting the level.
import logging
logging.getLogger('package_name').setLevel(logging.WARNING)
There’s more a package can do to break logging best practices; stay tuned for a post on taking control of all aspects of logging…
84. Inspect the arguments of a callback
Let’s say you’ve got a function that takes a callback, and you would like to perform different actions based on the types of the callback’s parameters.
You can inspect a function’s parameters with the inspect
module, like so:
import inspect
def do_with_callback(callback):
    # Extract information about the callback's parameter types
    signature = inspect.signature(callback)
    arg_types = [val.annotation for val in signature.parameters.values()]

    if arg_types == [str, str]:
        # Caller is expecting two strings, give 'em two strings
        return callback("A string", "another string")
    if arg_types == [int, str]:
        # This one will match the callback defined below
        return callback(77, "A string")
    raise ValueError(f"Unsupported callback signature {arg_types}")

# Dummy callback with types [int, str]
def my_callback(num: int, text: str):
    return num, text

# Call the function, passing a callback
assert do_with_callback(my_callback) == (77, "A string")
Admittedly this code is unusual, program flow is not normally affected by the types of function parameters (as opposed to the types of the passed arguments), so I would think twice before using this in a public-facing API.
There’s a lot more the inspect module can do, here’s the docs.
85. breakpoint() to drop into the debugger
You can write a breakpoint()
statement anywhere in your code to trigger the debugger (which is what you use when your code is buggered and you want to de-bugger it). This is a built-in function, so no imports required. Wrap it in an if
statement and tell your friends you made a ‘conditional breakpoint’.
I’ll often have something like this in the training loop of a machine learning model (which can run for hours, and is occasionally interesting to interrupt and inspect).
if Path("BREAK.me").exists():
    breakpoint()
Then to interrupt the running script and drop into the debugger, I create a BREAK.me
file and it will pause on the next loop. Keep performance in mind when doing this, I’d suggest its more suited to experimentation than production code.
breakpoint() on the built-ins page, and the pdb (Python Debugger) docs.
86. PDB post mortem is magic
You run some code, an error is raised, and you want to know what was going on to cause the crash. Getting to the point where the crash occurred is a piece of cake, using pdb.pm()
Take this code:
def compare_numbers(first, second):
    return first < second

compare_numbers(7, 1j)
If I run this, I’ll get an error. If I’m using an interactive interpreter (e.g. in PyCharm), I can then type import pdb; pdb.pm()
which will run ‘post mortem’ mode of the Python debugger (PDB), essentially reinstating the interpreter at the point where the error occurred. I can then run all the usual PDB commands to inspect what’s going on. For example, running the command args lists the arguments at the point of failure.
But what if you’re running a script in a shell/console/terminal? Well, you can’t do much once it’s crashed, but you can run it again with the -i
(interactive) flag, which keeps the Python interpreter running after the script has finished (or crashed):
python -i your_file.py
Or you could run python -m pdb your_file.py
which will drop into post-mortem mode automatically if the program crashes.
There’s lots more to pdb
, and even though the debuggers in PyCharm and VS Code are pretty good, I’ve found that at times pdb
is the better choice (e.g. when the other debuggers struggle with multiple processes, are too slow, or I want to use post-mortem). I highly recommend getting to know some basic commands, like moving up and down the stack, printing args, and displaying the source code where the debugger is stopped. It can even answer those existential questions that plague us all, like whatis self
?
87. random() will never return 0.05954861408025609
This last one is in no way meant to be useful information, just a final fun fact. You can multiply any random.random()
number by 2⁵³ and you’ll get an integer.
import random
assert (random.random() * 2**53).is_integer()
Now you know.
Hey there, we’re all done! Thanks for reading, have a good night’s sleep.