Python concurrency and parallelism explained

Learn how to use Python’s async functions, threads, and multiprocessing capabilities to juggle tasks and improve the responsiveness of your applications.


If you program in Python, you have most likely encountered situations where you wanted to speed up some operation by executing multiple tasks in parallel or by interleaving between multiple tasks.

Python has mechanisms for both of these approaches. The first is parallelism and the second is concurrency. In this article, you'll learn the differences between parallelism and concurrency, then we'll discuss how each technique is implemented in Python. I'll also share tips for deciding which technique to employ for different use cases in your programs.

Concurrency vs. parallelism

Concurrency and parallelism are names for two different mechanisms for juggling tasks in a program. Concurrency involves allowing multiple jobs to take turns accessing the same shared resources, like disk, network, or a single CPU core. Parallelism is about allowing several tasks to run side by side on independently partitioned resources, like multiple CPU cores.

Concurrency and parallelism have different aims. The goal of concurrency is to prevent tasks from blocking each other by switching among them when one is forced to wait on an external resource. A common example is completing multiple network requests. The crude way to do it is to launch one request, wait for it to finish, launch another, and so on. The concurrent way to do it is to launch all requests at once, then switch among them as the responses come back. Through concurrency, we can overlap the time spent waiting for responses, so the total wait is closer to that of the slowest request than to the sum of all of them.
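As a rough illustration of the network-request example, the sketch below compares the two approaches. It makes no real requests: fake_request() is a hypothetical stand-in that sleeps for a fixed delay, and the names and the 0.2-second delay are assumptions chosen for the demonstration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(name, delay=0.2):
    """Hypothetical stand-in for a network call: waits, then returns."""
    time.sleep(delay)
    return f"response from {name}"

names = ["a", "b", "c"]

def run_sequentially():
    # Each wait must finish before the next one starts.
    return [fake_request(n) for n in names]

def run_concurrently():
    # All waits are in flight at once; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=len(names)) as ex:
        return list(ex.map(fake_request, names))

if __name__ == "__main__":
    for runner in (run_sequentially, run_concurrently):
        start = time.perf_counter()
        results = runner()
        elapsed = time.perf_counter() - start
        print(f"{runner.__name__}: {len(results)} responses in {elapsed:.2f}s")
```

Both functions return the same responses. The sequential version should take roughly the sum of the delays, while the concurrent one should take roughly the length of a single delay.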

Parallelism, by contrast, is about maximizing the use of hardware resources. If you have eight CPU cores, you don’t want to max out only one while the other seven lie idle. Rather, you want to launch processes or threads that make use of all those cores, if possible.

Concurrency and parallelism in Python

Python provides mechanisms for both concurrency and parallelism, each with its own syntax and use cases. For concurrency, Python offers two different mechanisms which share many common components. These are threading and coroutines, or async.

For parallelism, Python offers multiprocessing, which launches multiple instances of the Python interpreter, each one running independently on its own hardware thread.

All three of these mechanisms—threading, coroutines, and multiprocessing—have distinctly different use cases. Threading and coroutines can often be used interchangeably, but not always. Multiprocessing is the most powerful mechanism, used for scenarios where you need to max out the CPU utilization.

Python threading

If you’re familiar with threading in general, threading in Python won’t be a big step. Threads in Python are units of work where you can take one or more functions and execute them independently of the rest of the program. You can then aggregate the results, typically by waiting for all threads to run to completion.

Here is a simple example of threading in Python:

Listing 1. How Python handles threading


from concurrent.futures import ThreadPoolExecutor
import urllib.request as ur

datas = []

def get_from(url):
    connection = ur.urlopen(url)
    data = connection.read()
    datas.append(data)

urls = [
    "https://python.org",
    "https://docs.python.org/",
    "https://wikipedia.org",
    "https://imdb.com",
]

with ThreadPoolExecutor() as ex:
    for url in urls:
        ex.submit(get_from, url)
       
# let's just look at the beginning of each data stream
# as this could be a lot of data
print([d[:200] for d in datas])

This snippet uses threading to read data from multiple URLs at once by running several calls to get_from() at the same time. Each call appends its result to a shared list.

Rather than create threads directly, the example uses one of Python’s convenient mechanisms for running threads, ThreadPoolExecutor. We could submit dozens of URLs this way without slowing things down much because each thread yields to the others whenever it’s only waiting for a remote server to respond.
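One detail worth noting about submit(): it returns a Future, and an exception raised inside the worker only surfaces when you ask that Future for its result. The sketch below keeps the futures and checks each one. risky_fetch() and the example URLs are hypothetical; a real worker would perform blocking I/O such as urlopen().

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def risky_fetch(url):
    """Hypothetical worker; a real one would call something like urlopen()."""
    if not url.startswith("https://"):
        raise ValueError(f"unsupported URL: {url}")
    return f"contents of {url}"

def fetch_all(urls):
    results, errors = {}, {}
    with ThreadPoolExecutor() as ex:
        # Keep each future so its outcome can be inspected later.
        futures = {ex.submit(risky_fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()  # re-raises a worker's exception
            except ValueError as exc:
                errors[url] = exc
    return results, errors

if __name__ == "__main__":
    ok, failed = fetch_all(["https://python.org", "ftp://example.invalid"])
    print(ok)
    print(failed)
```

Returning values through futures also avoids the need for a shared list, which can make the flow of data easier to follow.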

Python users are often confused about whether threads in Python are the same as threads exposed by the underlying operating system. In CPython, the default Python implementation used in the vast majority of Python applications, Python threads are OS threads. The difference is that the runtime's global interpreter lock (GIL) lets only one thread execute Python code at a time, so threads take turns, with a thread giving up its turn whenever it blocks on I/O.

Advantages of Python threads

Threads in Python provide a convenient, well-understood way to run tasks that wait on other resources. The above example features a network call, but other waiting tasks could include a signal from a hardware device or a signal from the program’s main thread.

Also, as shown in Listing 1, Python’s standard library comes with high-level conveniences for running operations in threads. You don’t need to know how operating system threads work to use Python threads.

Disadvantages of Python threads

As mentioned before, Python threads take turns rather than run truly in parallel. The Python runtime lets only one thread execute Python code at a time, so that objects accessed by threads can be managed correctly. As a result, threads shouldn't be used for CPU-intensive work. If you run several CPU-intensive operations in threads, each one is paused whenever the runtime switches to another, so there is little or no performance benefit over running them one after another.

Another downside of threads is that you, the programmer, are responsible for managing state between them. In the above example, the only state outside of the threads is the contents of the datas list, which just aggregates the results from each thread. The only synchronization needed is provided automatically by the Python runtime when we append to the list. Nor do we check the state of that object until all threads run to completion anyway.

However, if we were to read and write to datas from different threads, we’d need to manually synchronize these processes to ensure we get the results we expect. The threading module does have tools to make this possible, but it falls to the developer to use them—and they’re complex enough to deserve a separate article.
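As a small taste of what manual synchronization looks like, here is a minimal sketch using threading.Lock. The shared counts dictionary, the record() function, and the word list are hypothetical, invented for this illustration. The point is that a read-modify-write on shared state is guarded so that two threads cannot interleave in the middle of it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

counts = {}                     # shared state touched by every worker
counts_lock = threading.Lock()

def record(word):
    # Reading the old value and writing the new one is not a single atomic
    # step, so hold the lock for the whole update.
    with counts_lock:
        counts[word] = counts.get(word, 0) + 1

words = ["spam", "eggs", "spam", "ham"] * 250

with ThreadPoolExecutor(max_workers=8) as ex:
    # Consuming the iterator surfaces any exception raised in a worker.
    list(ex.map(record, words))

print(counts)
```

Without the lock, two threads could read the same old value and one increment could be lost. With it, the totals always add up to the number of words submitted.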

Python coroutines and async

Coroutines or async are a different way to execute functions concurrently in Python, by way of special programming constructs rather than system threads. Coroutines are also managed by the Python runtime but require far less overhead than threads.

Here is another version of the previous program, written as an async/coroutine construct and using a library that supports asynchronous handling of network requests:

Listing 2. Async handling a network request in Python


import aiohttp
import asyncio

urls = [
    "https://imdb.com",    
    "https://python.org",
    "https://docs.python.org",
    "https://wikipedia.org",
]

async def get_from(session, url):
    async with session.get(url) as r:
        return await r.text()


async def main():
    async with aiohttp.ClientSession() as session:
        datas = await asyncio.gather(*[get_from(session, u) for u in urls])
        print([d[:200] for d in datas])

if __name__ == "__main__":
    # asyncio.run() creates the event loop, runs main(), and closes the loop
    asyncio.run(main())

get_from() is a coroutine, i.e., a function object that can run side by side with other coroutines. asyncio.gather launches several coroutines (multiple instances of get_from() fetching different URLs), waits until they all run to completion, then returns their aggregated results as a list.

The aiohttp library allows network connections to be made asynchronously. We can’t use plain old urllib.request in a coroutine, because it would block the progress of other asynchronous requests.
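When no async-aware library is available, one common workaround is to push the blocking call onto a worker thread so the event loop stays free. The sketch below uses asyncio.to_thread(), which requires Python 3.9 or later. blocking_read() is a hypothetical stand-in for a blocking call such as urlopen(), and its delay is arbitrary.

```python
import asyncio
import time

def blocking_read(name):
    """Hypothetical blocking call, standing in for something like urlopen()."""
    time.sleep(0.1)
    return f"data from {name}"

async def main(names):
    # to_thread() runs each blocking call in a worker thread, so the event
    # loop remains free to service other coroutines in the meantime.
    tasks = [asyncio.to_thread(blocking_read, n) for n in names]
    return await asyncio.gather(*tasks)

if __name__ == "__main__":
    print(asyncio.run(main(["a", "b", "c"])))
```

This trades some of the low overhead of coroutines for compatibility, since each blocking call still occupies a thread while it waits.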

Advantages of Python coroutines

Coroutines make perfectly clear in the program’s syntax which functions run side by side. You can tell at a glance that get_from() is a coroutine. With threads, any function can be run in a thread, making it more difficult to reason about what may be running in a thread.

Another advantage of coroutines is that they are not bound by some of the architectural limitations of using threads. If you have many coroutines, there is less overhead involved in switching between them, and coroutines require slightly less memory than threads. Coroutines don’t even require threads, as they can be managed directly by the Python runtime, although they can be run in separate threads if needed.
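To get a feel for how cheap coroutines are, the following sketch launches several thousand of them at once on a single thread. wait_and_return() is a hypothetical coroutine that merely sleeps, and the count of 5,000 is an arbitrary figure chosen for the demonstration.

```python
import asyncio

async def wait_and_return(i):
    # asyncio.sleep() yields control to the event loop instead of blocking.
    await asyncio.sleep(0.1)
    return i

async def main(count):
    # gather() returns the results in the order the coroutines were passed in.
    return await asyncio.gather(*(wait_and_return(i) for i in range(count)))

if __name__ == "__main__":
    results = asyncio.run(main(5_000))
    print(len(results))
```

Starting the same number of OS threads would typically cost noticeably more memory and setup time, although the exact difference depends on the platform.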

Disadvantages of Python coroutines

Coroutines and async require writing code that follows its own distinct syntax, the use of async def and await. Such code, by design, can't be freely mingled with synchronous code: await works only inside an async def function, and a blocking call inside a coroutine stalls every other coroutine. For programmers who aren't used to thinking about how their code can run asynchronously, using coroutines and async presents a learning curve.

Also, coroutines and async don’t enable CPU-intensive tasks to run efficiently side by side. As with threads, they’re designed for operations that need to wait on some external condition.

Python multiprocessing

Multiprocessing allows you to run many CPU-intensive tasks side by side by launching multiple, independent copies of the Python runtime. Each Python instance receives the code and data needed to run the task in question.

Listing 3 presents our web-reading script rewritten to use multiprocessing.

Listing 3. Multiprocessing in Python


import urllib.request as ur
from multiprocessing import Pool
import re

urls = [
    "https://python.org",
    "https://docs.python.org",
    "https://wikipedia.org",
    "https://imdb.com",    
]

meta_match = re.compile("<meta .*?>")

def get_from(url):
    connection = ur.urlopen(url)
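    # decode the bytes to text; str() on bytes would yield a "b'...'" repr
    data = connection.read().decode("utf-8", errors="replace")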
    return meta_match.findall(data)

def main():
    with Pool() as p:
        datas = p.map(get_from, urls)
    # We're not truncating data here,
    # since we're only getting extracts anyway
    print(datas)

if __name__ == "__main__":
    main()

The Pool() object represents a reusable group of processes. .map() lets you submit a function to run across these processes, along with an iterable whose items are distributed among the calls to that function. In this case, those are get_from and the list of URLs.

Another key difference in this version of the script is that we perform a CPU-bound operation in get_from(). The regular expression searches for anything that looks like a meta tag. This isn’t the ideal way to look for such things, of course, but the point is that we can perform what could be a computationally expensive operation in get_from without having it block all the other requests.

Advantages of Python multiprocessing

With threading and coroutines, the Python runtime forces all operations to run serially, the better to manage access to any Python objects. Multiprocessing sidesteps this limitation by giving each operation a separate Python runtime, which the operating system can schedule on its own CPU core.

Disadvantages of Python multiprocessing

Multiprocessing has two distinct downsides. First, additional overhead is associated with creating the processes. However, you can minimize the impact of this if you spin up those processes once over the lifetime of an application and re-use them. The Pool object in Listing 3 can work like this: Once set up, we can submit jobs to it as needed, so there’s only a one-time cost across the lifetime of the program to start the subprocesses.
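The reuse pattern might look something like the sketch below, in which one pool is started and then fed several batches of work. count_primes() is a hypothetical, deliberately naive CPU-bound function, and the batch sizes and the choice of two worker processes are arbitrary assumptions.

```python
from multiprocessing import Pool

def count_primes(limit):
    """Deliberately naive CPU-bound work: count the primes below limit."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def main():
    batches = [[10_000, 20_000], [30_000, 40_000]]
    # Pay the process start-up cost once...
    with Pool(processes=2) as pool:
        # ...then reuse the same workers for every batch that arrives.
        for batch in batches:
            print(pool.map(count_primes, batch))

if __name__ == "__main__":
    main()
```

The if __name__ == "__main__" guard matters here: on platforms that start subprocesses by re-importing the main module, it keeps each worker from trying to create a pool of its own.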

The second downside is that each subprocess needs to have a copy of the data it works with sent to it from the main process. Generally, each subprocess also has to return data to the main process. To do this, it uses Python’s pickle protocol, which serializes Python objects into binary form. Common objects (numbers, strings, lists, dictionaries, tuples, bytes, etc.) are all supported, but anything that requires its own object definition will need to have that definition available to the subprocess, too.
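You can check ahead of time whether an object will survive the trip by round-tripping it through the pickle module yourself. In the sketch below, the payload dictionary is a made-up example. The lambda illustrates an object that can't be sent because it has no importable definition.

```python
import pickle

# Built-in containers and scalars survive a round trip unchanged.
payload = {"url": "https://python.org", "sizes": [1, 2, 3], "raw": b"\x00\x01"}
restored = pickle.loads(pickle.dumps(payload))
print(restored == payload)

# A lambda has no importable name, so pickle cannot serialize it.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
print(lambda_pickled)
```

This is why functions passed to a process pool are normally defined at the top level of a module rather than inline.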

Which Python concurrency model should I use?

Whenever you are performing long-running, CPU-intensive operations, use multiprocessing. “CPU-intensive” refers to work happening inside the Python runtime (e.g., the regular expressions in Listing 3). You don’t want the Python runtime constrained to a single instance that blocks when doing CPU-based work.

For operations that don’t involve the CPU but require waiting on an external resource, like a network call, use threading or coroutines. While the difference in efficiency between the two is insignificant when dealing with only a few tasks at once, coroutines will be more efficient when dealing with thousands of tasks, as it’s easier for the runtime to manage large numbers of coroutines than large numbers of threads.

Finally, note that coroutines work best when using libraries that are themselves async-friendly, such as aiohttp in Listing 2. If a coroutine calls a blocking library that is not async-friendly, it can stall the progress of every other coroutine.