For all of Python's great and convenient features, one goal remains out of reach: Python apps running on the CPython reference interpreter and using multiple CPU cores in parallel.
This has long been one of Python's biggest stumbling blocks, especially since all of the workarounds are clumsy. The urgency to find a long-term solution to the issue is growing, notably as core counts on processors continue to ramp up (see Intel's 24-core behemoth).
One lock for all
In truth, it's possible to use threads in Python applications -- plenty of them already do. What's not possible is for CPython to run multithreaded applications with each thread executing in parallel on a different core. CPython's internal memory management isn't thread-safe, so the interpreter runs only one thread at a time, switching between them as needed and controlling access to the global state.
This locking mechanism, the Global Interpreter Lock (GIL), is the single biggest reason why CPython can't run threads in parallel. There are some mitigating factors; for instance, I/O operations like disk or network reads are not bound by the GIL, so those can run freely in their own threads. But anything both multithreaded and CPU-bound is a problem.
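A quick way to see the difference (a minimal sketch; the task functions are invented for illustration): four threads doing pure-Python arithmetic take roughly as long as running the work back to back, while four threads that block on I/O-style waits overlap almost completely.

    import threading
    import time

    def cpu_task(n=5000000):
        # Pure-Python arithmetic: the thread holds the GIL the whole time.
        total = 0
        for i in range(n):
            total += i * i
        return total

    def io_task():
        # Stand-in for a blocking disk or network read; like real I/O calls,
        # time.sleep releases the GIL while it waits.
        time.sleep(1)

    def timed(label, target, count=4):
        threads = [threading.Thread(target=target) for _ in range(count)]
        start = time.perf_counter()
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        print("{0}: {1:.2f}s with {2} threads".format(
            label, time.perf_counter() - start, count))

    timed("CPU-bound", cpu_task)   # roughly serial: the GIL admits one thread at a time
    timed("I/O-bound", io_task)    # close to 1 second total: the waits overlap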
For Python programmers, this means heavy computational tasks that benefit from being spread out across multiple cores don't run well, barring the use of an external library. The convenience of working in Python comes at a major performance cost, which is becoming harder to swallow as faster, equally convenient languages like Google's Go come to the fore.
Pick the lock
Over time, a slew of options have emerged that ameliorate -- but do not eliminate -- the limits of the GIL. One standard tactic is to launch multiple instances of CPython and share context and state between them; each instance runs independently of the others in a separate process. But as Jeff Knupp explains, the gains provided by running in parallel can be lost to the effort needed to share state, so this technique is best suited to long-running operations that pool their results over time.
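A minimal sketch of that tactic using the standard library's multiprocessing module (the worker function and inputs are invented for illustration): each worker runs in its own CPython process with its own GIL, and the results are pooled back in the parent at the end.

    import multiprocessing

    def crunch(chunk):
        # Runs in a separate CPython process, each with its own GIL.
        return sum(i * i for i in chunk)

    if __name__ == "__main__":
        chunks = [range(i * 1000000, (i + 1) * 1000000) for i in range(4)]
        with multiprocessing.Pool(processes=4) as pool:
            partials = pool.map(crunch, chunks)  # results are pickled back to the parent
        print(sum(partials))

The cost is in the hand-offs: arguments and results have to be serialized and shipped between processes, which is why this pays off mainly for coarse-grained, long-running work.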
C extensions aren't bound by the GIL, so many libraries for Python that need speed (such as the math-and-stats library Numpy) can run across multiple cores. But the limitations in CPython itself remain. If the best way to avoid the GIL is to use C, that will drive more programmers away from Python and toward C.
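For example, many NumPy operations drop the GIL while they run in compiled code, so plain Python threads can keep several cores busy. This is a hedged sketch: whether and how much the GIL is released depends on the operation and the underlying BLAS build.

    import threading
    import numpy as np

    def matmul_worker(size=2000):
        # np.dot on large float arrays runs in compiled BLAS code, which
        # typically releases the GIL for the duration of the call.
        a = np.random.rand(size, size)
        b = np.random.rand(size, size)
        np.dot(a, b)

    threads = [threading.Thread(target=matmul_worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()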
PyPy, the Python implementation that compiles code with a JIT compiler, doesn't get rid of the GIL but makes up for it by simply running code faster. In some ways this isn't a bad substitute: if speed is the main reason you've been eyeing multithreading, PyPy may deliver the velocity without the complications of multithreading.
Finally, the GIL itself was reworked somewhat in Python 3, with a better thread-switching handler. But all of its underlying assumptions -- and limitations -- remain. There's still a GIL, and it still holds up proceedings.
No GIL? No problem
Despite all this, the quest for a GIL-less Python, compatible with existing applications, continues. Other implementations of Python have done away with the GIL entirely, but at a cost. Jython, for instance, runs on top of the JVM and uses the JVM's object-tracking system instead of the GIL. IronPython takes the same approach via Microsoft's CLR. But both suffer from inconsistent performance, and they sometimes run much slower than CPython. They also can't interface readily with external C code, so many existing Python applications won't work.
PyParallel, a project created by Trent Nelson of Continuum Analytics, is an "experimental, proof-of-concept fork of Python 3 designed to optimally exploit multiple CPU cores." It doesn't remove the GIL, but it ameliorates the GIL's impact by replacing the async module, so apps that use async for parallelism (such as multithreaded I/O like a web server) benefit most. The project has been dormant for several months, but its documentation states that its developers are comfortable taking their time to get it right, so it can eventually be included in CPython: "There's nothing wrong with slow and steady as long as you're heading in the right direction."
One long-running project by PyPy's creators has been a version of Python that uses a technique called "software transactional memory" (PyPy-STM). The advantage, according to PyPy's creators, is "you can do minor tweaks to your existing, nonmultithreaded programs and get them to use multiple cores."
PyPy-STM sounds like magic, but it has two drawbacks. First, it's a work in progress that currently only supports Python 2.x, and second, it still takes a performance hit for applications running on a single core. Since one of the stipulations cited by Python creator Guido van Rossum for any attempts to remove the GIL from CPython is that its replacement shouldn't degrade performance for single-core, single-threaded applications, a fix like this won't land in CPython in its current state.
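Those "minor tweaks" revolve around a transaction-queue interface described in the PyPy-STM documentation. The sketch below assumes that interface (a TransactionQueue with add() and run()), requires a pypy-stm build, and is illustrative rather than definitive.

    # Assumption: the transaction module and TransactionQueue interface
    # documented for PyPy-STM; this only runs on a pypy-stm build.
    import transaction

    results = []

    def crunch(n):
        # Each queued call runs as a transaction; conflicting transactions
        # are retried, so the code reads like ordinary sequential Python.
        results.append(sum(i * i for i in range(n)))

    tq = transaction.TransactionQueue()
    for n in range(1, 9):
        tq.add(crunch, n * 100000)
    tq.run()   # transactions run in parallel across cores when they don't conflict
    print(sum(results))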
Hurry up and wait
Larry Hastings, a core Python developer, shared some of his views at PyCon 2016 about how the GIL could be removed. Hastings documented his attempts to remove the GIL and in doing so ended up with a version of Python that had no GIL, but ran agonizingly slowly because of constant cache misses.
You can lose the GIL, Hastings summed up, but you need to have some way to guarantee that only one thread at a time is modifying global objects -- for instance, by having a dedicated thread in the interpreter handle such state changes.
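One generic way to picture that idea in today's Python (a hypothetical sketch, not Hastings' actual design): funnel all mutations of shared state through a single dedicated thread via a queue, so only that thread ever touches the global objects.

    import queue
    import threading

    shared_state = {}              # only the owner thread ever writes to this
    updates = queue.Queue()

    def state_owner():
        # The one thread allowed to modify shared_state.
        while True:
            key, value = updates.get()
            if key is None:        # sentinel: shut down
                break
            shared_state[key] = value

    owner = threading.Thread(target=state_owner)
    owner.start()

    def worker(i):
        # Workers never touch shared_state directly; they post updates.
        updates.put(("result-%d" % i, i * i))

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    updates.put((None, None))
    owner.join()
    print(shared_state)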
One piece of long-term good news is that if and when CPython sheds the GIL, developers using the language will already be primed to exploit multithreading. Many changes now baked into Python's syntax, like queues and the async/await keywords introduced in Python 3.5, make it easy to apportion tasks across cores at a high level.
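For instance, async/await composes cleanly with the standard library's process pools, which gives roughly that kind of high-level apportionment today. A minimal sketch, with an invented CPU-bound function:

    import asyncio
    from concurrent.futures import ProcessPoolExecutor

    def crunch(n):
        # CPU-bound work; runs in a separate process, outside the GIL's reach.
        return sum(i * i for i in range(n))

    async def main(loop, pool):
        # await farms the heavy calls out to worker processes, one per task.
        tasks = [loop.run_in_executor(pool, crunch, 1000000) for _ in range(4)]
        results = await asyncio.gather(*tasks)
        print(sum(results))

    if __name__ == "__main__":
        loop = asyncio.get_event_loop()
        with ProcessPoolExecutor(max_workers=4) as pool:
            loop.run_until_complete(main(loop, pool))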
Still, the amount of work needed to make Python GIL-less all but guarantees it will show up first in a separate implementation like PyPy-STM. Those who want to try a GIL-less system can do so through such a third-party effort, but the original CPython is likely to remain untouched for now. Here's hoping the wait isn't much longer.