87 Lesser-known Python Features

This post is for people who use Python daily, but have never actually sat down and read through all the documentation.


If you’ve been using Python for years and know enough to get the job done, it’s probably not a wise use of your time to read through several thousand pages of documentation just to maybe discover a handful of new tricks. So I’ve put together a short (by comparison) list of features that aren’t widely used, but are widely applicable to general programming tasks.

This list will of course contain some stuff you already know and have a sprinkling of things you’ll never use, but among them, I hope there are a few that are new and useful to you.

First, a bit of housekeeping:

  • I’ll assume you’re on at least Python 3.8. For the few features added after this, I’ll mention when they were added.
  • There’s no strict order to the list, but the more basic features are toward the start.
  • Wherever you see assert on this page, you can assume that the assert passes.
  • If you would like to quench my curiosity, you can highlight the titles of the ones you find useful.

Now, on to the list…

1. help(x)

help() can take a string, an object, or just about anything else. For class instances, it collects methods from all parent classes and shows you what is inherited from where (following the method resolution order).

This is also useful when you want help on something that’s hard to search for, like or. Typing help('or') will be much faster than trying to find the part of the Python docs that describes how or works.
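
A couple of quick examples:

# Full documentation for dict, including inherited methods
help({})

# Docs for the `or` keyword, which is otherwise hard to search for
help("or")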

2. 1_000_000

You can use an underscore as a thousands separator, which makes large numbers more readable.

x = 1_000_000

You can format a number in this way, too:

assert f"{1000000:_}" == "1_000_000"

3. str.endswith() takes a tuple

if filename.endswith((".csv", ".xls", ".xlsx")):
    # Do something spreadsheety
    ...

The same is true for startswith().

Here's endswith in the docs.

4. isinstance() takes a tuple

If you want to check if an object is an instance of one of several classes, you don’t need multiple isinstance expressions, just pass a tuple of types as the second argument. Or — from Python 3.10 onwards — use a union:

assert isinstance(7, (float, int))
assert isinstance(7, float | int)

More on isinstance and union types.

5. … is a valid function body

I use ... to mean “I’ll get to this shortly” (before committing any changes) and pass to mean something closer to a no-op.

def do_something():
    ...

6. The walrus operator :=

The walrus operator (AKA assignment expressions) allows you to assign a value to a variable, but as an expression, which means that you can do something with the resulting value.

This is useful in if statements and even more useful with elif:

if first_prize := get_something():
    ...  # Do something with first_prize
elif second_prize := get_something_else():
    ...  # Do something with second_prize

Just remember that if you want to compare the result with something, you’ll need to wrap the walrus expression in parentheses like so:

if (first_prize := get_something()) is not None:
    ...  # Do something with first_prize
elif (second_prize := get_something_else()) is not None:
    ...  # Do something with second_prize

These can be used in plenty of places. To borrow an example from the original PEP:

filtered_data = [y for x in data if (y := f(x)) is not None]

See the Assignment expressions section of the docs.

7. attrgetter and itemgetter

If you need to sort a list of objects by a specific property of those objects, you can use attrgetter (when the value you’re after is an attribute, e.g. for class instances) or itemgetter (when the value you’re after is an index or dictionary key).

For example, to sort a list of dicts by the "score" key of the dicts:

from operator import itemgetter

scores = [
    {"name": "Alice", "score": 12},
    {"name": "Bob", "score": 7},
    {"name": "Charlie", "score": 17},
]

scores.sort(key=itemgetter("score"))

assert list(map(itemgetter("name"), scores)) == ["Bob", "Alice", "Charlie"]

attrgetter is similar, used when you would otherwise use dot notation. This can do nested access too, such as name.first in the example below:

from operator import attrgetter
from typing import NamedTuple

class Name(NamedTuple):
    first: str
    last: str

class Person(NamedTuple):
    name: Name
    height: float

people = [
    Person(name=Name("Gertrude", "Stein"), height=1.55),
    Person(name=Name("Shirley", "Temple"), height=1.57),
]

first_names = map(attrgetter("name.first"), people)

assert list(first_names) == ["Gertrude", "Shirley"]

There’s also a methodcaller that does what you might guess.
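
For instance, methodcaller("lower") returns a callable that calls .lower() on whatever you hand it:

from operator import methodcaller

words = ["Hello", "WORLD"]

assert list(map(methodcaller("lower"), words)) == ["hello", "world"]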

If you don’t know what a NamedTuple does, keep reading…

See the operator module docs for more info.

8. Operators as functions

All the familiar operators like +, <, and != have functional equivalents in the operator module. These are useful if you’re iterating over collections, for example looking for mismatches between two lists using is_not:

import operator

list1 = [1, 2, 3, 4]
list2 = [1, 2, 7, 4]

mismatches = list(map(operator.is_not, list1, list2))

assert mismatches == [False, False, True, False]

Here’s a table of operators and their functional equivalents.

9. Sorting a dictionary by its values

You can use key to sort a dict by its values. The function you pass as key is called with each key of the dict and should return the value to sort by, which is exactly what a dictionary’s get method does.

my_dict = {
    "Plan A": 1,
    "Plan B": 3,
    "Plan C": 2,
}

my_dict = {key: my_dict[key] for key in sorted(my_dict, key=my_dict.get)}

assert list(my_dict.keys()) == ['Plan A', 'Plan C', 'Plan B']

10. Create a dict from tuples

dict() can take a sequence of tuples of key/value pairs. So if you have lists of keys and values, you can zip them together to turn them into a dictionary:

keys = ["a", "b", "c"]
vals = [1, 2, 3]

assert dict(zip(keys, vals)) == {'a': 1, 'b': 2, 'c': 3}

The dict docs.

11. Combining dicts with **

You can combine dictionaries by using the ** operator to unpack the dicts into the body of a dict literal:

sys_config = {
    "Option A": True,
    "Option B": 13,
}

user_config = {
    "Option B": 33,
    "Option C": "yes",
}

config = {
    **sys_config,
    **user_config,
    "Option 12": 700,
}

Here the user_config will override the sys_config since it comes later.

Dictionary unpacking in the docs.

12. Updating dicts with |=

If you want to extend an existing dict, without doing it one key at a time, you can use the |= operator.

config = {
    "Option A": True,
    "Option B": 13,
}

# later
config |= {
    "Option C": 7,
    "Option D": "bananas",
}

This was added in Python 3.9.

13. defaultdicts, or setdefault

Let’s say you want to create a dict of lists from a list of dicts. A naive approach might be to create a placeholder dict, then check whether or not you need to create a new key at every step of the loop.

pets = [
    {"name": "Fido", "type": "dog"},
    {"name": "Rex", "type": "dog"},
    {"name": "Paul", "type": "cat"},
]

pets_by_type = {}

# Bad code
for pet in pets:
    # If the type isn't already a key,
    # we'll need to create it with an empty list
    if pet["type"] not in pets_by_type:
        pets_by_type[pet["type"]] = []

    # Now we can safely call .append()
    pets_by_type[pet["type"]].append(pet["name"])

assert pets_by_type == {'dog': ['Fido', 'Rex'], 'cat': ['Paul']}

A cleaner way is to create a defaultdict. In the below, when I attempt to read a key that doesn’t exist, it will automatically create that key with the default value of an empty list.

from collections import defaultdict

pets_by_type = defaultdict(list)

for pet in pets:
    pets_by_type[pet["type"]].append(pet["name"])

The downsides are that you need to import defaultdict, and if you want a true dict at the end you’d need to call dict(pets_by_type).

Another option is to use a plain dict with the oddly named .setdefault():

pets_by_type = {}

for pet in pets:
    pets_by_type.setdefault(pet["type"], []).append(pet["name"])

Think of “set default” as being “get, but set with default when required”.

Bonus tip: you can create a dict that returns None if a key doesn’t exist with defaultdict(lambda: None).

Here’s defaultdict in the docs.

14. TypedDict for … typed dicts

If you want to define types for the values of your dict:

from typing import TypedDict

class Config(TypedDict):
    port: int
    name: str

config: Config = {
    'port': 4000,
    'name': 'David',
    'unknown': True,  # warning, unexpected key
}

port = config["poort"]  # warning, what's a poort?

Depending on your IDE (I use PyCharm) you’ll get the correct auto-complete suggestions (types aren’t just about robust code, they’re about being lazy and typing fewer characters.)

I find this particularly useful for adding types to dictionaries that come from other sources. If I want types for an object that I create, I would rarely use a dict.

TypedDict in the docs.

15. a // b does not always return an int

Let’s say you have a function that splits a list into a certain number of batches. Can you spot the potential bug in this code?

# Bad code
def batch_a_list(the_list, num_batches):
    batch_size = len(the_list) // num_batches
    batches = []

    for i in range(0, len(the_list), batch_size):
        batches.append(the_list[i : i + batch_size])

    return batches

my_list = list(range(13))

batched_list = batch_a_list(my_list, 4)

Calling it with a float, batch_a_list(my_list, 4.0), would raise a TypeError.

The problem is that when num_batches is a float, len(the_list) // num_batches also returns a float, which is not a valid argument for range() (or a valid index), so an error is raised.

Specifically, if either side of the // is a float, then both sides will be converted to floats and the result will be a float.

int(a / b) is the safer bet.
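
To see the rule in action:

assert 13 // 4 == 3  # int // int gives an int
assert 13 // 4.0 == 3.0  # one float makes the result a float
assert int(13 / 4.0) == 3  # always an int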

As mentioned in the Expressions docs, but not in a particularly clear way.

16. Circular references are fine without ‘from’

You can have two modules that each import the other; this is not an error. In fact, I think it’s kinda sweet.

Problems only arise when one of them does from other_module import something. So if you’re getting circular reference errors, consider dropping from from the problematic imports, as in the sketch below.
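
As a minimal sketch, assuming two hypothetical modules a.py and b.py:

# a.py
import b

def func_a():
    return "a"

# b.py
import a  # A plain import tolerates the cycle

def func_b():
    # a.func_a is looked up at call time, after both modules have loaded
    return a.func_a() + "b"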

Programming FAQ: What are the “best practices” for using import in a module?

17. Use ‘or’ to set defaults

Say you have a function with an optional argument. The default value should be an empty list, but you can’t do that in the function signature (all calls to the function would share the same list), so you need to set the default in the body of the function.

You don’t need a two-line check for None like I often see:

def make_list(start_list: list = None):
    if start_list is None:
        start_list = []
    ...

You can instead just use or.

def make_list(start_list: list = None):
    start_list = start_list or []
    ...

If start_list is truthy (a non-empty list) it will be used, otherwise it will be set to an empty list.

This is only a reasonable approach if the function won’t be called with a value where bool(value) == False.

More on or in the docs.

18. Use default_timer as your … default timer

There are a lot of ways to get the time in Python. If your plan is to calculate the difference between two points in time, here are some incorrect ways to do it:

import time
from datetime import datetime

# Wrong
start_time = datetime.now()
start_time = datetime.today()
start_time = time.time()
start_time = time.gmtime()
start_time = time.localtime()

# Less wrong but will fail on Windows
start_time = time.clock_gettime(1)
start_time = time.clock_gettime(time.CLOCK_MONOTONIC)  # same thing

You don’t want time to go backwards, but the first five above will allow such a situation (daylight savings, leap seconds, clock adjustments, etc.). Bugs caused by the belief that time can’t go backwards are rare but do happen and can be darn hard to track down.

The best way is this:

import timeit

start_time = timeit.default_timer()

This is easy to remember and gives you a time that can’t go backwards.

default_timer is an alias for time.perf_counter, so there are two more honorable mentions:

start_time = time.perf_counter()
start_time = time.perf_counter_ns() # nanoseconds

default_timer in the docs

19. ‘_’ in the interpreter: the last result

This is handy if you’ve performed some operation and a value has been output and now you want to do something with that output.

>>> get_some_data()  # Takes a while to run
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39])
>>> _.mean()
19.5

type(_) is another one I often use. Just beware that you can only use it once, because you’ll then have a new output, and that’s what the _ will then refer to. (For IPython users, you can type _4 to refer to Out[4] — docs).

The _ has other jobs too, see the lexical analysis docs for more, and also the next section…

20. *_ to gather unrequired elements

If you’re calling a function that returns many values in a tuple but you only want the first one, you could append [0] to the call, or ignore the other returned values with *_:

def get_lots_of_things():
    return "Puffin", (0, 1), True, 77

bird, *_ = get_lots_of_things()

The * denotes iterable unpacking. The single underscore is a convention for an unused variable (most type/style checkers won’t give an ‘unused variable’ warning when the variable name is _).

This behaviour was defined in PEP 3132.

21. dict keys need not be strings

All sorts of things can be dict keys: functions, tuples, numbers — anything hashable.

Let’s say you wanted to represent a graph as a dict, where each key is a pair of nodes and the value describes the edge between them. You could do this with tuples, but if you want (a, b) to return the same value as (b, a) you’ll need a set. Sets aren’t hashable, but frozen sets are, so we have:

graph = {
    frozenset(["a", "b"]): "The edge between a and b",
    frozenset(["b", "c"]): "The edge between b and c",
}

assert graph[frozenset(["b", "a"])] == graph[frozenset(["a", "b"])]

Clearly, if you wanted to do this for reals you’d do something like extend dict:

class OrderFreeKeys(dict):
    def __getitem__(self, key):
        return super().__getitem__(frozenset(key))

    def __setitem__(self, key, value):
        return super().__setitem__(frozenset(key), value)

graph = OrderFreeKeys()
graph["A", "B"] = "An edge between A and B"

assert graph["B", "A"] == "An edge between A and B"

This second example uses capital letters for A and B because variety is the spice of life.

22. __call__ makes a class instance callable

If you want to create a function that has some sort of state, you could use a function in a function (creating a ‘closure’), or create a class with a __call__ method.

class LikeAFunction:
    counter = 0

    def __call__(self, *args, **kwargs):
        self.counter += 1
        return self.counter

func = LikeAFunction()

assert func() == 1
assert func() == 2
assert func() == 3

23. Format a number as a percentage

If you’ve got a value like 0.1234 that you want to display as 12.3%, no need to multiply it by 100, format as a float, and append a percentage sign, just use the % presentation type.

pct = 0.1234

assert f"{pct * 100:.1f}%" == "12.3%" # The old way
assert f"{pct:.1%}" == "12.3%" # The new way

Docs on the Format Specification Mini-Language.

24. Format a number with commas

Thousands separators are underused, says me. Just add :, after your value or optionally define the number of decimals to show, like so:

num = 1234.56

assert f"{num:,}" == "1,234.56"
assert f"{num:,.0f}" == "1,235"

Keep in mind that not everyone on earth uses the comma for a thousands separator, so a more robust approach is to use the user’s locale. But if you’re, say, producing an image of a chart so can’t format the values dynamically, a comma is better than nothing.

25. Format different scale numbers with ‘g’

If you want a format that will handle numbers from 1,000,000 to 0.0000001 with style, try g. By default, it will switch to scientific notation above/below certain thresholds:

assert f"{1e-6:g}" == "1e-06"
assert f"{1e-5:g}" == "1e-05"
assert f"{1e-4:g}" == "0.0001"
assert f"{1e-3:g}" == "0.001"
assert f"{1e-2:g}" == "0.01"
assert f"{1e-1:g}" == "0.1"
assert f"{1e0:g}" == "1"
assert f"{1e1:g}" == "10"
assert f"{1e2:g}" == "100"
assert f"{1e3:g}" == "1000"
assert f"{1e4:g}" == "10000"
assert f"{1e5:g}" == "100000"
assert f"{1e6:g}" == "1e+06"
assert f"{1e7:g}" == "1e+07"

You can configure the threshold, which is explained oh-so-simply in the docs.

26. Format a number as currency

You can format a number as a currency for a given locale without manually placing dollar signs and the like.

Simply set the locale… No just kidding, nothing about locale is simple. To quote the docs: “[excuses, excuses] … This makes the locale somewhat painful to use correctly”.

But, once you know what to do, and have worked out what locale to use, it’s really not so bad:

import locale
import os

# Either
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')  # Set a specific locale

# OR
os.environ['LANG'] = "en_US.UTF-8"  # The next line relies on LANG
locale.setlocale(locale.LC_ALL, '')  # Tell Python to use the 'preferred' locale, e.g. LANG

# Then you're good to go, lovely currency formatting
assert locale.currency(1234, grouping=True) == "$1,234.00"

Where do you find the exact string to set the locale for your country/currency/system? There’s a list in an undocumented attribute of the locale module, locale_alias. Here’s the list in the GitHub repo.

Or, on Linux and macOS, you can run locale -a to get a list of available locales. In PowerShell, it’s Get-WinUserLanguageList, although you might need to cross-reference those results against the locale_alias list to find the exact string.

27. Print an expression and its value with ‘=’

When formatting an expression as an f-string, adding = afterwards indicates that you want to see the expression and the result.

def get_value():
    return 77

assert f"{get_value() = }" == "get_value() = 77"

You can do this with or without spaces around the = sign (this will be reflected in the output), and can add : and a format specifier after the equal sign.

Here’s a quick overview of f-strings and more detail on the Lexical analysis page.

28. Use Path for dealing with files

There are still tutorials out there suggesting the use of with open() to open files, but I think this is unnecessarily clunky for most cases. A lot of the time you can just use Path.read_text():

from pathlib import Path

file_contents = Path('my.log').read_text()

Alas, it needs an import, but Path objects have all sorts of neat properties, like testing for existence, automatic file-creation when writing, and joining paths in an OS-agnostic way. Here’s a smattering of examples:

import datetime
import json
from pathlib import Path

file_contents = Path("my.log").read_text() # one big string
file_lines = Path("my.log").read_text().splitlines() # list of lines
path = Path(__file__).parent / "config.json" # relative to this module

if not path.exists():  # Check for existence
    path.parent.mkdir(parents=True, exist_ok=True)  # Create directories
    path.write_text("{}")  # Writing creates a new file by default

config = json.loads(path.read_text()) # load/parse JSON
config["last_modified"] = str(datetime.datetime.utcnow())

path.write_text(json.dumps(config)) # save as JSON

There’s even path.glob() or its recursive sibling path.rglob() to find all files matching a pattern (where the path in question refers to a directory, not a file).

Docs for the pathlib module.

29. Booleans are ints

If you have a function that accepts either an int or a bool, you need to be careful when checking what you’ve got, because bools are ints.

def do_something(any_value):
    if isinstance(any_value, int):
        return f"I got your int {any_value}"

    if isinstance(any_value, bool):
        return f"I got your bool {any_value}"

assert do_something(7) == "I got your int 7"
assert do_something(True) == "I got your int True"  # Not bool!

So just remember that more specific checks should come first, which in this case means switching the order of the if statements.

30. Lazy load modules

Python modules support a module-level __getattr__ function that you can use to lazy-load submodules. For example, I have my own collection of data science utilities, which are handy, but if I import them all (and all the third-party packages they use), it takes 4 seconds to load. And I just don’t have that sort of free time.

The clunky approach is to require sub-modules to be explicitly imported before being used.

The smart thing to do is to allow dot-notation access all the way from the top module, but wait until a module is needed before loading it. The following goes in the top-level __init__.py file.

import importlib

__all__ = ["data", "mpl", "pd", "pl", "pt", "sk", "vega"]

def __getattr__(name):
    if name in __all__:
        return importlib.import_module(f".{name}", __name__)

    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

I still only need a single import utils; when I do utils.pt.some_helper() it will load the file utils/pt.py (which loads PyTorch which takes quite a while). If I never reference utils.pt, then it won’t load that module.

The __all__ statement is required for typing/auto-complete.

By the way, !r on the last line above is a way to print quote marks around values in a format string. It calls repr.

This lazy-loading approach reduced the loading time for my utils from 4 seconds to 0.4 seconds. That’s, like, at least 9 times faster.

Package authors, please do this!

See Customizing module attribute access in the docs for more.

31. divmod is // and %

Let’s say you want to iterate over all the x/y coordinates of a grid. divmod is a handy tool for this.

grid = [divmod(x, 3) for x in range(2 * 3)]

assert grid == [
    (0, 0),
    (0, 1),
    (0, 2),
    (1, 0),
    (1, 1),
    (1, 2),
]

The above is a 2 x 3 grid, but it also works for 7 x 4 grids, and perhaps even others!

You can picture this as divmod restricting movement to a certain number of columns. So divmod(5, 3) turns the 5 into row 1, column 2.

There are two other ways to achieve this that are worth a mention:

import itertools

grid = [divmod(x, 3) for x in range(2 * 3)]

assert grid == list(itertools.product(range(2), range(3)))
assert grid == [(row, col) for row in range(2) for col in range(3)]

32. Lists of month and day names

The calendar module has lists of month and day names. This can be useful for sorting by day name (e.g. in a chart axis) without having to first convert to a day-of-week integer.

import calendar

# A list of days you would like to have sorted
days = ["Tuesday", "Monday", "Saturday", "Monday"]

day_names = list(calendar.day_name)
days.sort(key=day_names.index)

assert days == ['Monday', 'Monday', 'Tuesday', 'Saturday']

Here’s day_name in the calendar docs, there’s also month_name, and abbreviated versions of both.

33. Counter: it counts things

Let’s say you wanted to count the occurrences of each word in a text. One way is to print them out and use a pen and paper to keep track. An equally bad approach is to manually create a dict where each word is a key, and the value is the count of that word, then loop over all the words, incrementing counters.

The best way is to use Counter.

import re
from collections import Counter

words = re.findall(
    pattern=r"[\w'-]+(?<!\d)",
    string="I'll e-mail François from the café via e-mail! Hey, qu'est-ce?",
)

word_frequencies = Counter(words)

assert word_frequencies.most_common(1)[0] == ("e-mail", 2)

Here’s Counter in the collections docs.

34. getpass(), not input()

If you’re creating a CLI interface and need to prompt a user to provide their password, don’t use input(). Do this instead:

from getpass import getpass

password = getpass("What's your password?")

There’s also a getuser() function which gets the user’s name without even asking them.
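
For example:

from getpass import getuser

print(f"Hi, {getuser()}")  # Greets whoever is logged in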

getpass in the docs.

35. Identifiers don’t need to be ASCII

You can use many (but not just any) Unicode characters in identifier names.

from math import tau as τ, exp, sqrt

def normal_dist_pdf(μ, σ, x):
    return exp(((x - μ) / σ) ** 2 / -2) / (σ * sqrt(τ))

I am not condoning this behaviour, only stating that it is possible. Clearly it’s a pain to type and modify such code, but if you have a function with a high read-to-modify ratio, and using the correct symbols makes it easier to comprehend (and you’re fond of weirdness), go for it.

Full details of what’s allowed.

36. In-place printing with \r

With print, you can set end="\r" so that the next print will overwrite what you’ve printed, because \r means return to the beginning of the line.

I find this approach useful in loops where I want to print progress, but don’t want a wall of logs.

import time

total_steps = 17

for step in range(total_steps):
    print(f"Processing {(step + 1) / total_steps:.0%}", end="\r")
    time.sleep(0.1)

I wouldn’t ship this in package code, because it can be messy, and doesn’t behave well everywhere, but for quick and dirty progress-logging it works a treat.

37. atexit as a decorator

If you need to run some code, even after a crash, use atexit. The easiest way to implement this is to decorate a function with @atexit.register.

import atexit

def do_raise():
    raise ValueError("Oh no, we crashed")

@atexit.register
def cleanup():
    # Close connections, persist settings, etc
    print("Finished in style regardless")

do_raise()

This will run, raise an error, and then print "Finished in style regardless". (Note though this might not fire in something like the PyCharm Python Console).

The atexit module docs.

38. Functions can have attributes

def multiply(a, b):
    return a * b

multiply.test_cases = [
    ((2, 3), 6),
    ((0, 1), 0),
    ((4, 4), 16),
]

I used this just the other day for a package that expected a callable class instance with a name attribute. I already had a function that did the job, so I just added my_func.name = "MyFunc" to appease the third-party package.

39. zip(*list) for transposing

This is useful if you’ve got data in some 2D structure and you want to extract the columns into variables.

combined = [
    ["this", [1, 2, 3]],
    ["that", [6, 5, 4]],
]

labels, values = zip(*combined)

assert labels == ("this", "that")
assert values == ([1, 2, 3], [6, 5, 4])

If you find it hard to wrap your head around what’s happening here, you’re not alone. My only advice is to tinker about until it clicks. zip is useful surprisingly often.

More about the built-in zip function.

40. http.server couldn’t be simpler

The below will start a simple http server, serving static files from wherever you run the command.

python -m http.server

On some systems (e.g. WSL) you may need to add --bind localhost.
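
You can also choose a port, so to serve on port 8000 and only accept local connections:

python -m http.server 8000 --bind localhost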

The docs are very clear that this is not for production.

41. webbrowser.open()

If you want to open a URL in the default browser (as defined by the operating system):

import webbrowser

webbrowser.open("http://localhost:8080")

42. copy() doesn’t always really copy

To copy a list or dict you can write my_list.copy() or my_dict.copy(). So far, so good. But if you think that a change to the original version can’t affect the copy, I regret to inform you that you’re wrong.

The complicating factor is compound objects, for example a list of dicts. copy() will create a new list, but each dict in the list will be a reference to the same object in the original list.

The below shows the difference between a shallow copy and a deep copy (which really does copy) for a list-of-dicts.

from copy import deepcopy

my_list = [{"one": 1}]

# Create two types of copies
shallow_copy = my_list.copy()
deep_copy = deepcopy(my_list)

# Change the original
my_list.append({"two": 2})
my_list[0]["one"] = 77

# Look at the changes
assert my_list == [{"one": 77}, {"two": 2}]
assert shallow_copy == [{"one": 77}] # Mutated!
assert deep_copy == [{"one": 1}]

The copy module docs explain in more detail.

43. dict.keys() and set comparison

Three things: a) if you subtract one set from another, you’re left with the items that are only in the first set. b) you can use < to check if one set doesn’t have all the elements of another set. c) dict.keys() is set-like.

Combining these ideas we get the following example: if you’re expecting a dictionary with a certain set of keys, you can check and warn about missing keys.

user_info = {
    "name": "Sam",
    "age": 6,
}

required_keys = {"name", "age", "weight"}  # A set

if user_info.keys() < required_keys:
    print("Missing info:", *required_keys.difference(user_info.keys()))

44. Flatten a list with sum

You can combine two lists with +, a symbol that (you may have heard) also means ‘sum’. Extending this thought, you can combine many lists with the sum function, as long as you tell it you want to start with an empty list.

list_of_lists = [
    [1, 2, 3],
    [3, 4, 5],
    [5, 6, 7],
]

flat_list = sum(list_of_lists, start=[])

assert flat_list == [1, 2, 3, 3, 4, 5, 5, 6, 7]

This is handy for flattening a list of a few lists, but if fast code is your jam and you have thousands of lists, then you should know it’s ten times faster to use a comprehension:

flat_list = [item for row in list_of_lists for item in row]

Nested for loops in comprehensions can be a bit of an eyeful, but if you remember that the for loops appear in the same order that they would if you wrote them out in their nested form, they’re not so bad.

itertools.chain(*list_of_lists) also does the job and is on par with a comprehension for speed.

Speaking of lists and fastness…

45. deque: the fast list, sometimes

If you have a list and find yourself inserting items at the start on a regular basis, a deque (pronounced ‘deck’) will do the same job in about 1% of the time.

import collections

my_list = []
my_list.insert(0, "left") # Slow as molasses

my_deque = collections.deque()
my_deque.appendleft("left") # Fast as molasses on a plane

Here’s deque in the collections docs. Side note: if you only read one page in the docs, make it the collections page. Followed by math, itertools, and functools. I choose to offer no rationale for this advice.

46. all and any

You can boil down a sequence of booleans into a single boolean with any() and all().

Let’s say you’ve got a list of messages, and a list of things that you care about, and you want to know if any of the things you care about is in the list of messages.

concerns = ["money", "time", "cat"]
messages = ["The cat is sick", "The dog ran away", "The fish is bloated"]

# One of the nine combinations is True, so any() returns True
assert any(concern in message for message in messages for concern in concerns)

If you’re a fan of the functional style, you can use itertools.product, itertools.starmap, and operator.contains to get the same result:

from itertools import product, starmap
from operator import contains

concerns = ["money", "time", "cat"]
messages = ["The cat is sick", "The dog ran away", "The fish is bloated"]

assert any(starmap(contains, product(messages, concerns)))

Here’s all and any on the built-ins page.

47. Optional does not make an arg optional

Let’s say you have a class with an optional argument. I sometimes see code like this, attempting to indicate that short_name is optional.

from typing import Optional

# Bad code
class Config:
    def __init__(
        self,
        port: int,
        host: str,
        short_name: Optional[str] = None,
    ):
        self.port = port
        self.host = host
        self.short_name = short_name

But Optional doesn’t mean optional (seriously!), it means None is allowed. Specifically, Optional[str] is shorthand for str | None. (Using the pipe operator to denote a union was added in Python 3.10.)

The below shows the difference between “required”, “required, None is acceptable”, and “not required”:

from typing import Optional

class SomeClass:
    def __init__(
        self,
        required: str,
        required_none_is_ok: Optional[str],
        not_required: str = None,
    ):
        ...

My preference is to not use Optional. If None is an acceptable value, I’ll almost always make it the default, thus making the argument optional. In rare cases where I want to force the user to provide a value, even if it’s None, I’ll use str | None rather than Optional[str] simply because Optional is confusing.

Here’s the docs on Optional explaining that “this is not the same concept as an optional argument”.

48. Six ways to print multi-line text

Let’s say you want to print a multi-line message, have each line start without an indent, and have it align nicely in the source code too. You have a few options.

I recommend the last one, the others are to show why this is a fiddly problem.

import textwrap

def function_that_prints_things():
    # Triple-quote and manually dedent. Gross.
    print(
        """line 1 Lorem ipsum dolor sit amet, consectetur adipiscing
line 2 Lorem ipsum dolor sit amet, consectetur adipiscing
line 3 Lorem ipsum dolor sit amet, consectetur adipiscing"""
    )

    # Triple-quote with backslash, and manually dedent, also gross.
    print(
        """\
line 1 Lorem ipsum dolor sit amet, consectetur adipiscing
line 2 Lorem ipsum dolor sit amet, consectetur adipiscing
line 3 Lorem ipsum dolor sit amet, consectetur adipiscing"""
    )

    # Rely on auto-concatting strings and manual newlines, meh
    print(
        "line 1 Lorem ipsum dolor sit amet, consectetur adipiscing\n"
        "line 2 Lorem ipsum dolor sit amet, consectetur adipiscing\n"
        "line 3 Lorem ipsum dolor sit amet, consectetur adipiscing"
    )

    # textwrap module with starting backslash, meh but getting there
    print(
        textwrap.dedent(
            """\
            line 1 Lorem ipsum dolor sit amet, consectetur adipiscing
            line 2 Lorem ipsum dolor sit amet, consectetur adipiscing
            line 3 Lorem ipsum dolor sit amet, consectetur adipiscing"""
        )
    )

    # textwrap module with starting/trailing backslash, ugh
    print(
        textwrap.dedent(
            """\
            line 1 Lorem ipsum dolor sit amet, consectetur adipiscing
            line 2 Lorem ipsum dolor sit amet, consectetur adipiscing
            line 3 Lorem ipsum dolor sit amet, consectetur adipiscing\
            """
        )
    )

    # textwrap module and strip() for leading space, noice.
    print(
        textwrap.dedent(
            """
            line 1 Lorem ipsum dolor sit amet, consectetur adipiscing
            line 2 Lorem ipsum dolor sit amet, consectetur adipiscing
            line 3 Lorem ipsum dolor sit amet, consectetur adipiscing
            """
        ).strip()
    )


function_that_prints_things()

textwrap.dedent in the docs.

49. Decorators are pretty easy

You’ve no-doubt seen decorators, but may not realise that they’re quite easy to create. Let’s say we wanted a decorator that prints how long a function takes to run.

from functools import wraps
from timeit import default_timer

# The definition of the decorator
def time_function(func):
    # Use wraps() to keep the original function signature
    @wraps(func)
    def wrapped(*args, **kwargs):
        # Start a timer
        start_time = default_timer()

        # Run the function, store the result
        result = func(*args, **kwargs)

        # Calculate/print the elapsed time
        elapsed = default_timer() - start_time
        print(f"{func.__name__}() ran in {elapsed:g}s")

        return result

    return wrapped

# Using the decorator
@time_function
def do_something():
    return sum(x for x in range(1_000_000))

do_something()

# This prints: do_something() ran in 0.016040s

There’s quite a lot to dig into there, but once you’ve done a few decorators you’ll wonder how you survived without them. Or maybe not. I don’t know what sorts of things you wonder about. What sorts of things do you wonder about?

Decorators can be added to function or class definitions.

50. Create a formatting function

In some cases, in order to define the formatting of a value, you’ll need to provide a function (e.g. with Pandas display.float_format). You could write a lambda for this, but the better way is to leverage the .format method that all strings have.

format_percent = "{:.1%}".format

assert format_percent(0.1234) == "12.3%"

Here are the docs for str.format.

51. Getters and setters are easy

Creating a getter property for a class is as easy as adding the @property decorator to a method.

Then, through some sort of witchcraft, you can create a setter using the getter’s name as a decorator.

class MyClass:
    _my_prop = 77

    @property
    def my_property(self):
        return self._my_prop

    @my_property.setter
    def my_property(self, new_value):
        self._my_prop = new_value

The best practice is to only use getters for operations that are fast, and stick to methods in other cases. So I tend to use them as a shortcut for otherwise verbose code that has to reach down into nested objects, or if I want to validate the setting of a value.

Docs for the @property built-in.

52. dict(key=val)

You can type out a dictionary literal with strings for keys, or save typing all those quote marks and use the dict() constructor. The two configs below are the same:

config1 = {
    "OptionA": True,
    "OptionB": 33,
    "OptionC": "yes",
}

config2 = dict(
    OptionA=True,
    OptionB=33,
    OptionC="yes",
)

assert config1 == config2

Using the dict(key=val) style limits what you can have as keys (they must be valid Python identifiers, so no spaces), but I find it easier to read and type.

53. SimpleNamespace

Speaking of ‘easier to type’. Accessing the values of a dictionary — with all those brackets and quote marks — requires quite a lot of right-pinky gymnastics. If you prefer dot-notation to access your values, you might like SimpleNamespaces.

from types import SimpleNamespace

config = SimpleNamespace(
    OptionA=True,
    OptionB=33,
    OptionC="yes",
)

assert config.OptionB == 33

I share this here as an interesting alternative to dicts, but personally I never use them. Both PyCharm and VS Code fail to provide auto-complete for the attributes, and you can’t iterate over the keys or values (since they’re really class attributes). You can access them as a dict using vars(config) though.

Instead, I prefer…

54. Dataclasses

These are great, they save you from having to type a bunch of repetitive code in an __init__ function to map arguments to attributes, like self.name = name, etc.

from dataclasses import dataclass

@dataclass
class Animal:
    name: str
    kind: str
    age: int = None

a_pet = Animal(name="Philbert", kind="fish", age=402)

Since this creates the __init__ function for you, you can’t also write your own; instead you can use __post_init__.
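
As a minimal sketch (the is_ancient field here is made up for illustration):

from dataclasses import dataclass, field

@dataclass
class Animal:
    name: str
    age: int
    is_ancient: bool = field(init=False)  # Derived, not an __init__ argument

    def __post_init__(self):
        # Runs immediately after the generated __init__
        self.is_ancient = self.age > 100

assert Animal(name="Philbert", age=402).is_ancient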

There is a lot more to dataclasses, check out the docs for more.

55. NamedTuples

These are also great! They’re particularly useful as return values for functions. Instead of returning a tuple of several values, return a named tuple:

from typing import NamedTuple

class Things(NamedTuple):
    values: list[float]
    indices: list[int]

def get_things():
    values = [1.2, 3.4, 5.6]
    indices = [1, 2, 3]
    return Things(values, indices)

things = get_things()

assert things.indices == [1, 2, 3]

This way, if the consumer of a function only wants one part of the returned tuple, they can be explicit with code like get_things().indices.

Another nice feature of the named tuple is that you can treat it like a regular tuple. So from the above code, values, indices = get_things() would also work, as would indices = get_things()[1]. This means if you have a function that returns a tuple, you can upgrade it to return a named tuple without breaking existing code.

56. Enums can have methods

This code probably speaks for itself:

from enum import Enum

class ResponseCode(Enum):
    FINE = 200
    ALSO_FINE = 203
    NOT_FINE = 500

    def is_ok(self):
        return 200 <= self.value < 300

assert ResponseCode(203).is_ok()
assert not ResponseCode(500).is_ok()

Here’s Enum in the docs. Note that there were some significant changes in 3.11, so pay attention to what version of the docs you’re looking at.

Bonus tip: there’s an HTTPStatus enum built into the http module that has all the status codes. (Fun fact inside a bonus tip: I was beginning to question whether I really needed to read every single word of the docs, when I came across something that made it all worthwhile. HTTP status code 418: IM_A_TEAPOT).
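
For example (the teapot was added in Python 3.9):

from http import HTTPStatus

assert HTTPStatus.IM_A_TEAPOT == 418
assert HTTPStatus(200).name == "OK"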

57. Context managers are pretty easy

You’ve probably used a context manager (e.g. with open()), but perhaps you don’t know that they’re really very easy to create yourself; they’re nothing more than a class with __enter__ and __exit__ methods.

Let’s say you wanted to know how long some code takes to run. You can ‘wrap’ it in a context manager (a with statement) that starts a timer, runs the code, then prints the elapsed time.

from dataclasses import dataclass
from timeit import default_timer

@dataclass
class timer:
    name: str

    def __enter__(self):
        self.start_time = default_timer()

    def __exit__(self, *args):
        elapsed = default_timer() - self.start_time
        print(f"{elapsed} ⮜ {self.name}")

# Using the context manager
with timer("Make a big number"):
    sum(x for x in range(1_000_000))

Pro-tip: if you’re printing times and want all times to be printed in a consistent format, use datetime.timedelta:

from datetime import timedelta

elapsed = timedelta(seconds=default_timer() - self.start_time)
print(f"{elapsed} ⮜ {self.name}")

This makes comparing different times easier. For example I’ve got a data-processing pipeline logging times:

⏱ 0:00:00.157739 ⮜ join_with_stores
⏱ 0:00:01.175930 ⮜ fill_missing_days
⏱ 0:00:00.693886 ⮜ add_school_zones
⏱ 0:00:06.870284 ⮜ add_holidays
⏱ 0:00:08.036416 ⮜ add_features
⏱ 0:00:00.922283 ⮜ fill_closed_periods
⏱ 0:00:03.338278 ⮜ smooth_outliers

If this was a mixture of ‘seconds’ and ‘milliseconds’ with varying decimal places, it would be harder to visually scan and see what’s taking the longest, and the right side wouldn’t be aligned.

58. Run code after a function returns

If you’ve ever wanted to define some code inside a function and have it run after the function returns, you may like ExitStack.

from contextlib import ExitStack


def any_function():
    with ExitStack() as stack:
        stack.callback(print, "After function return")
        print("About to return...")
        return 7

If you’re doing this frequently, a decorator might be a better option.

ExitStack in the docs.

59. Unpack a zip/tar file with shutil

There are dedicated zipfile and tarfile modules, but if you want to unpack a compressed file of any supported type, you can do this:

import shutil

shutil.unpack_archive("my_archive.tar.gz", extract_dir="unpacked")

This will work out which module to use based on the file extension. You can call shutil.get_unpack_formats() to see the mappings from file extension to format.

unpack_archive in the shutil docs.

60. Combine raw and formatted strings

If you want to insert a variable into a string, you use f"...", if you want to write a raw string that doesn’t treat a backslash as anything special, you use a raw string r"...". If you want to do both, no problem, just do rf"..." — this is useful when creating regex patterns.

import re

search_term = "hat"
results = re.findall(rf"\b{search_term}\b", "the cat shat in my hat")

assert results == ["hat"]

61. You don’t need to .compile() your regex

I see a lot of code that first uses re.compile() to create a regular expression object before calling a method like findall(). This is not necessary.

The re module internally caches the most recent patterns (512 of them as of Python 3.10), so unless you have an application using (and reusing) a lot of unique regular expressions, you’ve got nothing to gain by using re.compile().
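
In other words, after the first call these two are effectively equivalent:

import re

# With explicit compilation
pattern = re.compile(r"\d+")
assert pattern.findall("a1b22c") == ["1", "22"]

# Without: re caches the compiled pattern internally
assert re.findall(r"\d+", "a1b22c") == ["1", "22"]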

See the note under re.compile() in the re docs for more.

62. Extract multiple values at once with RegEx

The .group method can take multiple values to access multiple matches at once.

import re

key, val = re.match("(.*): (.*)", "this: that").group(1, 2)

assert key == "this"
assert val == "that"

In this case, there are only two groups, so we could have used .groups():

key, val = re.match("(.*): (.*)", "this: that").groups()

Or if you like ugly regexes, you can name your capture groups and extract them as a dict. Here the names key and val are defined in the regex.

import re

my_dict = re.match("(?P<key>.*): (?P<val>.*)", "this: that").groupdict()

assert my_dict["key"] == "this"
assert my_dict["val"] == "that"

63. Adjacent strings collapse

If you’ve got two adjacent strings (separated by whitespace, which includes newlines), they’ll automatically be combined into one string. This can be useful for long messages, just don’t forget the spaces at the end of the string.

assert start_angle > 2, (
    "There are so many things wrong with what you've done, "
    "I don't even know where to begin. "
    "For one thing, your start angle is too small. "
    "I mean, whatever were you thinking? "
    "Have you got the vapours? "
    "Why must you always be like this?"
)

And to borrow an example from the Lexical analysis page of the docs, this is also useful for commenting regexes:

import re

re.compile(
    "[A-Za-z_]"  # letter or underscore
    "[A-Za-z0-9_]*"  # letter, digit or underscore
)

Reminder: you don’t need to compile your regexes just because the authors of the docs seem to like doing it.

Fun fact: PEP 3126 proposed to remove the string collapsing behaviour, but it was rejected.

64. Use sentinel objects to detect unprovided args

Imagine you have a function and would like to differentiate between an arg being provided with the value None and not being provided at all. You can do this with a ‘sentinel’ object, like so:

_sentinel = object()

def do_something(arg=_sentinel):
    if arg is _sentinel:
        return "Arg not provided"
    else:
        return f"Got {arg}"

assert do_something(None) == "Got None"
assert do_something(10) == "Got 10"
assert do_something() == "Arg not provided"

The idea is that the caller of the function isn’t going to accidentally pass that exact _sentinel object, so if that’s what the value of arg is, you can infer that arg was not provided.

This sometimes goes by the name missing or undefined.

This is not exactly a ‘feature’ of Python, but is a pattern used a lot in the source code and even gets a mention in the FAQ.

65. Capturing import times

If you want to know where time is being spent in your imports, you can run a file with the -X importtime option.

python -X importtime my_file.py

Or do the same for a specific command with -c:

python -X importtime -c 'import numpy'

Although I wanted to keep this post free of third party packages, if you’re being a good citizen and trying to make your code load faster, you’ll probably want to visualize the resulting logs. For this, I’m a fan of the tuna package.

python -X importtime will output to stderr, so to capture this and send it to a file, use 2> (Windows or Linux), then view that file with tuna:

python -X importtime my_file.py 2> import_times.log
tuna import_times.log

This will give you a flame chart showing where time is being spent.

Here’s importtime in the command line docs.

66. Reference a class before it exists

Let’s say you’ve got a structure that’s recursive. That is, it can contain children of the same type as itself. You can do this by putting the type in quotes, as with the type of children below: a list of Node objects, referenced from inside the Node class itself.

from dataclasses import dataclass

@dataclass
class Node:
    name: str
    children: list["Node"] = None  # "Node" refers to the Node class

tree = Node(
    name="A parent",
    children=[
        Node("A child"),
        Node("Another child"),
    ],
)

If you’re not fond of these quote marks, you can reach into the future and fix that with annotations:

from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    children: list[Node] = None  # No more quotes

tree = Node(
    name="A parent",
    children=[
        Node("A child"),
        Node("Another child"),
    ],
)

Normally you can expect things in the __future__ to at one point or another become things in the present. However the release notes for 3.11 mentioned that this has been put on hold indefinitely.

67. Type your strings with Literal

If you’ve got a function parameter that accepts one of several strings, you can type it with Literal to make life easier for consumers.

from typing import Literal

def do_something(color: Literal["Red", "Blue", "Green"]):
    ...

Both PyCharm and VS Code will warn if an invalid string is provided.

PyCharm will show the options in a tooltip, while VS Code goes a step further and gives autocomplete suggestions.

68. Final: prevent a value from being changed

A variable declared as Final can’t be changed.

from typing import Final

MY_CONST: Final = 12

Well, actually it can, this is just a type hint, but a type checker will complain.

This not only works for top-level variables, but class attributes too. So if you want a class to have an attribute with a particular value, and be sure that even subclasses can’t change this, mark it as Final.
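
A minimal sketch (the Connection class is made up for illustration):

from typing import Final

class Connection:
    TIMEOUT: Final = 10

class FastConnection(Connection):
    TIMEOUT = 1  # A type checker will flag this override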

Final in the docs.

69. Type hints without assignment

Let’s say you’re iterating over some un-typed data from an external source (API, JSON file, etc). You can define the type of a variable on its own line, like so:

for row in get_some_data():
    row: tuple[int, str, list[str]]

That second row doesn’t do anything, other than tell the typing machinery what sort of thing you’re dealing with, so you get the right autocomplete options. For example, it knows that the second element of each row is a str.

Side-note: PyCharm is quietly brilliant here, working out which method I probably want to use (‘upper’) based on the name I gave my variable.

70. Class methods returning Self

Imagine you have a class method that returns self, to allow for method chaining. You can type this return value using a TypeVar, by convention called Self. This even behaves correctly through inheritance.

from typing import TypeVar

Self = TypeVar("Self")

class BaseEstimator:
    def fit(self: Self) -> Self:
        # Do stuff
        return self

class Estimator(BaseEstimator):
    def score(self: Self) -> Self:
        # Do stuff
        return self

If we create an instance of Estimator and call fit(), the system knows that although that method is defined on BaseEstimator, the return value Self refers to Estimator, so I see the appropriate completions.

This pattern is so useful that Self was added to the typing module in 3.11. See more in the docs.
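
On 3.11+, the same idea needs no TypeVar boilerplate:

from typing import Self  # Python 3.11+

class BaseEstimator:
    def fit(self) -> Self:
        # Do stuff
        return self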

71. Convert timedelta into a specific unit

Let’s say you’ve calculated the difference between two datetimes and you want to know “how long was that, in days, as a float?”. You can divide any timedelta by another timedelta of a specific duration (such as 1 day).

from datetime import datetime, timedelta

the_incident = datetime(2010, 11, 3)

time_since_the_incident = datetime.now() - the_incident

days_since_the_incident = time_since_the_incident / timedelta(days=1)

assert isinstance(days_since_the_incident, float)

4,612 days without laughing, time flies!

Pro-tip: if you just want to know the value in seconds, you can use time_since_the_incident.total_seconds(), but not time_since_the_incident.seconds which is something else.

You can even do divmod(time_since_the_incident, timedelta(days=1)) and get an int and a timedelta back, if that’s what floats your boat.

More about timedelta in the datetime docs.

72. Don’t check for ints with isinstance(x, int)

A check for int is typically not what you want. For example, NumPy can produce values that very much look like an int or a float but aren’t. So if you want to know whether or not you can consider a value to be an integer, the safe bet is to check for numbers.Integral.

from numbers import Integral
import numpy as np

for item in np.array([1, 2, 3]):
    assert isinstance(item, Integral)
    assert not isinstance(item, int)  # These aren't really ints!

73. Number is a tricky concept

Checking if a variable is a ‘number’ is not as straightforward as one might hope. By now you know that checking for int or float is overly restrictive. If you look in the docs at the numbers module you will see this beacon of hope:

numbers.Number: The root of the numeric hierarchy. If you just want to check if an argument x is a number, without caring what kind, use isinstance(x, Number).

So surely that’s how you check if something’s a number, right?

Well, here’s a fun quiz: is it possible for the following code to raise an error on the return line?

from numbers import Number

def compare_numbers(a, b):
    if isinstance(a, Number) and isinstance(b, Number):
        return a < b

The answer is yes, because a Number object might be complex and asking if one complex number is less than another is like asking if cheese is greater than turtle.

If you aren’t familiar with the various types of numbers, don’t want to be, and want a general rule, I’d say that numbers.Real is the closest thing to what you’d refer to as a number in everyday language. To be a bit more rigorous, you should consider the operations you’re about to perform on the number and look through the hierarchy of types in the numbers docs to see at which level those operators are implemented.

So now you’re all set … except … another quiz: can this function raise an error?

from numbers import Real

def to_int(a):
    if isinstance(a, Real):
        return int(a)

The answer is of course yes. float("nan") is technically a “real” number, but can’t be converted to int and will raise an error.

A more Pythonic approach is the principle of EAFP (Easier to Ask Forgiveness than Permission). That is, use a try/except.

def to_int(a):
    try:
        return int(a)
    except (ValueError, TypeError):
        return None

But then someone comes along and throws infinity at the function which gets you an uncaught OverflowError. So to implement EAFP, you either have to know in advance every possible exception that a function can raise (and even in this very simple case of a single, built-in function it isn’t documented) or resort to except Exception and have people tell you this is too broad and that you’re a bad programmer.

I’ve digressed a little, but the point to all this is that number types are tricky and you should take the time to think through all the edge cases and proceed with caution. And write tests!

74. Fractions: interesting

If you’re dealing with fractions and don’t want to run into floating point troubles, you can use a Fraction. You need to be careful though that you don’t create floating point errors before even passing the value to Fraction.

from fractions import Fraction

assert Fraction("1/10") + Fraction("2/10") == Fraction("3/10")
assert Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10)
assert Fraction(1 / 10) + Fraction(2 / 10) != Fraction(3 / 10) # Not equal!

The fractions module docs.

75. Decimals: interesting

A similar story to Fractions, Decimals can work around floating point problems, but again you need to be careful:

from decimal import Decimal

assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
assert Decimal(0.1) + Decimal(0.2) != Decimal(0.3) # Not equal!

There’s a lot more to decimals, in the docs.

76. Euclidean distance with math.dist()

import math

assert math.dist([0, 0], [3, 4]) == 5

This is not limited to the 2-dimensional case, the arguments can be any vector.
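
For example, in three dimensions:

import math

assert math.dist([0, 0, 0], [2, 3, 6]) == 7.0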

77. Cache your functions

Let’s say you have a function that searches for matching strings. It responds to a user’s input as they type. So a search for ‘abc’ is really a search for ‘a’, followed by a search for ‘ab’, then a search for ‘abc’.

Once you’ve already searched for ‘a’, you might as well use those results as a starting point when it comes time to search for ‘ab’ (we don’t need to search through all the strings that don’t have ‘a’ in them), and when we search for ‘abc’, we only need to search in the results of the search for ‘ab’.

One way to implement this is using cache (or to be more direct: one way to demonstrate @functools.lru_cache is to implement this concept.)

from functools import lru_cache

all_words = ["a", "ab", "abc", "abcd", "butter"]

class Finder:
    matched_words = []

    @lru_cache  # Cache calls to this method
    def filter_words(self, text):
        return [word for word in self.matched_words if word.startswith(text)]

    def find(self, text):
        self.matched_words = all_words

        if len(text) > 1:
            self.matched_words = self.filter_words(text[:-1])  # will hit cache

        self.matched_words = self.filter_words(text)

        return self.matched_words


finder = Finder()

# Simulate a user typing a-b-c
finder.find("a")
finder.find("ab")
finder.find("abc")

assert (
    str(finder.filter_words.cache_info())
    == "CacheInfo(hits=2, misses=3, maxsize=128, currsize=3)"
)

So when we call finder.find("abc") (knowing that we’re handling user-typed input on each keystroke) we assume that we’ve already searched for "ab" and first call that, which is fast since it’s cached, and results in a much smaller search space for the search for "abc".

Note the assert at the end showing that our cached method has a cache_info() method attached to it with some useful info about hits and misses. There’s also a cache_clear() method which does what you think it does.

I’ve found that having cache as a mental tool in my toolbox sometimes results in quite different solutions. If I’m struggling to come up with an elegant, performant solution to a problem, I’ll think “what would an elegant, non-performant solution look like” and then ask if cache could be used to make it fast without causing memory use issues.

The docs have a much simpler example and more details.

78. Shelve

shelve is a quick way to store some data on disk. Much like a salad bar, it uses pickle under the hood, so you can store anything that pickle can store (which does not include lambdas).

import shelve

with shelve.open("my_shelf") as shelf:
    shelf["config"] = dict(port=4000, host="localhost")

# later
with shelve.open("my_shelf") as shelf:
    port = shelf["config"]["port"]

assert port == 4000
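
One gotcha worth knowing: mutating a stored object in place won’t persist unless you open the shelf with writeback=True. A minimal sketch, reusing the my_shelf file from above:

import shelve

with shelve.open("my_shelf", writeback=True) as shelf:
    shelf["config"]["port"] = 5000  # Persisted, because writeback=True

with shelve.open("my_shelf") as shelf:
    assert shelf["config"]["port"] == 5000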

The shelve docs will appear in front of your eyes if you click this underlined text.

79. Compare values in tuples

You can ask if one tuple is larger than another. Particularly useful when checking version info:

import sys

assert sys.version_info < (4, 7)

Be careful though: the exact rules for comparing sequences are tricky. To quote the Expressions docs: “Collections that support order comparison are ordered the same as their first unequal elements (for example, [1,2,x] <= [1,2,y] has the same value as x <= y). If a corresponding element does not exist, the shorter collection is ordered first (for example, [1,2] < [1,2,3] is true).”
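
To make those rules concrete:

assert (1, 2, 0) <= (1, 2, 5)  # First unequal elements decide: 0 <= 5
assert (1, 2) < (1, 2, 3)      # The shorter sequence orders first
assert (2,) > (1, 999, 999)    # Later elements don't matter once one differs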

In the case of sys.version_info, which contains more than just the three version numbers (there’s also releaselevel and serial), stick to < comparisons and you’ll be fine.

80. Pool makes multiprocessing easy

I’ll preface this by saying that if you’re doing serious parallel work, you should probably just use a package like joblib. With that said, some parallel operations are surprisingly easy. Let’s say you’ve got a big 2D matrix and you want the mean of each row.

import statistics
from multiprocessing import Pool

data = [
[1, 2, 3, 4],
[5, 6, 7, 8],
[7, 6, 5, 4],
]

with Pool() as pool:
    row_means = pool.map(statistics.fmean, data)

assert row_means == [2.5, 6.5, 5.5]
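
One caveat: with the spawn start method (the default on Windows and macOS), code that creates a Pool must sit under an if __name__ == "__main__": guard, otherwise each worker re-imports the module and tries to spawn its own pool.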

By default, Pool will use all your available CPU cores. But this (or any parallelism) doesn’t magically make everything faster. On my machine, the overhead is about 100ms, so anything that runs faster than that isn’t worth parallelising.

Read more about Pool in the multiprocessing docs.

81. Share across processes

Imagine you have a long-running process, keeping some sort of state, and you would like to be able to reach into that process to inspect the state without interrupting it. One way to achieve this is with shared memory.

This first block of code represents the main, long-running process, where arr is the object we’d like to be able to peek at from another process.

import time
import numpy as np
from multiprocessing import shared_memory

arr_base = np.arange(30, dtype=np.int64)  # Create an object of the appropriate size and dtype
my_memory = shared_memory.SharedMemory(
    create=True,
    size=arr_base.nbytes,  # Use this size when creating the shared memory
    name="my_shared_memory",  # Remember this name
)

# Make a NumPy array backed by this buffer; be explicit about dtype,
# since np.frombuffer defaults to float64
arr = np.frombuffer(my_memory.buf, dtype=arr_base.dtype)
arr[:] = arr_base[:]  # Copy the original data into shared memory

# Some long-running process that periodically changes state
for i in range(len(arr)):
    print("Tick", i)
    arr[i] = 77
    time.sleep(1)

And this is the code that we’d run (e.g. in a different shell on the same machine) to access and print the current arr:

import numpy as np
from multiprocessing import shared_memory

my_shared_memory = shared_memory.SharedMemory(name="my_shared_memory")
arr = np.frombuffer(my_shared_memory.buf, dtype=np.int64)  # dtype must match the writer's

print(arr)  # Prints the current state, e.g. [77 77 77 ... 27 28 29]

Please, for the love of Enya, don’t go copy-pasting this code blindly; it’s the absolute bare minimum to demonstrate the magic of shared memory. You’ll want to do some docs reading in order to get something working robustly.
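
At a bare minimum, robust code would call close() on the SharedMemory block in every process that uses it, and unlink() exactly once (in the creating process) when it’s no longer needed.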

And of course this is a problem created to fit the solution, I’m not suggesting shared memory is the smartest way to access the state of a running process.

82. Respect robots.txt with robotparser

If you’re doing some web scraping and want to be a good human, you may wish to respect robots.txt. There’s a module for that: urllib.robotparser. In theory, it works like this:

from urllib.robotparser import RobotFileParser

robo_parser = RobotFileParser()
robo_parser.set_url("https://medium.com/robots.txt")
robo_parser.read()

can_fetch_trending = robo_parser.can_fetch(
    useragent="*",
    url="/trending",
)

can_hit_api = robo_parser.can_fetch(
    useragent="*",
    url="/_/api/users",
)

In practice, RobotFileParser calls urllib.request.urlopen(url) to fetch the robots.txt, and if that fails it silently swallows the error, then returns False for every subsequent can_fetch call.

Dodgy.

As it turns out, medium.com will give you a 403 if you don’t pass a user-agent, so the real solution is a bit nastier:

import urllib.request
from urllib.error import HTTPError
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

robo_parser = RobotFileParser()
robo_parser.set_url("https://medium.com/robots.txt")

try:
    urlopen(robo_parser.url)  # OK, you can use RobotFileParser as normal
    robo_parser.read()
except HTTPError:
    # Fetch the file manually, and pass its lines to parse()
    response = urlopen(
        urllib.request.Request(
            url=robo_parser.url,
            headers={"User-Agent": "David"},
        )
    )
    robo_parser.parse(response.read().decode("utf-8").splitlines())

can_fetch_trending = robo_parser.can_fetch(
    useragent="*",
    url="/trending",
)

can_hit_api = robo_parser.can_fetch(
    useragent="*",
    url="/_/api/users",
)

assert not can_fetch_trending # Not Allowed
assert can_hit_api # Allowed

This isn’t great, but you can wrap it up as a PatchedRobotFileParser if you’re doing this often, or use a third-party package.
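
While you’re in there, RobotFileParser can also report any crawl delay or request rate the site declares. Continuing from the robo_parser above (both return None when robots.txt doesn’t specify them):

delay = robo_parser.crawl_delay("*")  # A number of seconds, or None
rate = robo_parser.request_rate("*")  # A named tuple with .requests and .seconds, or None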

83. Set the log level of 3rd party packages

By default, Python’s root logger is set to WARNING, so you won’t see INFO messages logged. But some third-party packages have a nasty habit of setting their logger’s level to INFO (which is both illogical and annoying) and spamming your output.

You can fix this by grabbing any logger by name and setting its level:

import logging

logging.getLogger('package_name').setLevel(logging.WARNING)
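
If you’re not sure what the offending logger is called, you can peek at every logger registered so far (loggerDict is technically an internal attribute, so treat this as a debugging aid rather than a stable API):

import logging

# List the names of all loggers created so far
print(list(logging.root.manager.loggerDict))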

There’s more a package can do to break logging best practices, stay tuned for a post on taking control of all aspects of logging…

84. Inspect the arguments of a callback

Let’s say you’ve got a function that takes a callback, and you would like to perform different actions based on the types of the callback’s parameters.

You can inspect a function’s parameters with the inspect module, like so:

import inspect

def do_with_callback(callback):
    # Extract information about the callback's parameter types
    signature = inspect.signature(callback)
    arg_types = [val.annotation for val in signature.parameters.values()]

    if arg_types == [str, str]:
        # Caller is expecting two strings, give 'em two strings
        return callback("A string", "another string")

    if arg_types == [int, str]:
        # This one will match the callback defined below
        return callback(77, "A string")

    raise ValueError(f"Unsupported callback signature {arg_types}")

# Dummy callback with types [int, str]
def my_callback(num: int, text: str):
    return num, text

# Call the function, passing a callback
assert do_with_callback(my_callback) == (77, "A string")

Admittedly this code is unusual; program flow is not normally affected by the types of a function’s parameters (as opposed to the types of the passed arguments), so I would think twice before using this in a public-facing API.

There’s a lot more the inspect module can do; here’s the docs.

85. breakpoint() to drop into the debugger

You can drop a breakpoint() call anywhere in your code to trigger the debugger (which is what you use when your code is buggered and you want to de-bugger it). This is a built-in function, so no imports required. Wrap it in an if statement and tell your friends you made a ‘conditional breakpoint’.

I’ll often have something like this in the training loop of a machine learning model (which can run for hours, and is occasionally interesting to interrupt and inspect).

from pathlib import Path

if Path("BREAK.me").exists():
    breakpoint()

Then to interrupt the running script and drop into the debugger, I create a BREAK.me file and it will pause on the next loop. Keep performance in mind when doing this; I’d suggest it’s more suited to experimentation than production code.
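
Related tip: breakpoint() respects the PYTHONBREAKPOINT environment variable, so setting PYTHONBREAKPOINT=0 turns every breakpoint() into a no-op (handy if one sneaks into production), and PYTHONBREAKPOINT=ipdb.set_trace swaps in a different debugger (if you have ipdb installed).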

breakpoint() on the built-ins page, and the pdb (Python Debugger) docs.

86. PDB post mortem is magic

You run some code, an error is raised, and you want to know what was going on to cause the crash. Getting to the point where the crash occurred is a piece of cake using pdb.pm().

Take this code:

def compare_numbers(first, second):
    return first < second

compare_numbers(7, 1j)

If I run this, I’ll get an error (complex numbers don’t support <). If I’m using an interactive interpreter (e.g. in PyCharm), I can then type import pdb; pdb.pm(), which runs the ‘post mortem’ mode of the Python debugger (PDB), essentially reinstating the interpreter at the point where the error occurred. I can then run all the usual PDB commands to inspect what’s going on; for example, the args command lists the arguments at the point of failure.

But what if you’re running a script in a shell/console/terminal? Well, you can’t do much once it’s crashed, but you can run it again with the -i (interactive) flag, as in python -i your_file.py; this keeps the Python interpreter running after the script has finished (or crashed), so you can call pdb.pm() from there.

Or you could run python -m pdb your_file.py, which will drop into post-mortem mode automatically if the program crashes.

There’s lots more to pdb, and even though the debuggers in PyCharm and VS Code are pretty good, I’ve found that at times pdb is the better choice (e.g. when the other debuggers struggle with multiple processes, are too slow, or I want to use post-mortem). I highly recommend getting to know some basic commands, like moving up and down the stack, printing args, and displaying the source code where the debugger is stopped. It can even answer those existential questions that plague us all, like whatis self?

The pdb docs be here.

87. random() will never return 0.05954861408025609

This last one is in no way meant to be useful information, just a final fun fact. random() builds its result from a random 53-bit integer divided by 2⁵³, so every possible return value is a multiple of 2⁻⁵³ (and 0.05954861408025609 isn’t one). That means you can multiply any random.random() number by 2⁵³ and you’ll get an integer.

import random

assert (random.random() * 2**53).is_integer()

Now you know.

As explained in the docs.

Hey there, we’re all done! Thanks for reading, have a good night’s sleep.
