I'll try to:
I won't:
source: www.spec.org, github.com/samuelcolvin/analyze-spec-benchmarks
Machine = host/computer/virtual machine/container
import requests

def count_words(year: int):
    resp = requests.get(f'https://ep{year}.europython.eu/en/')
    print(f'{year}: {len(resp.text.split())}')
worker.py
from redis import Redis
from rq import Queue
from worker import count_words
q = Queue(connection=Redis())
for year in range(2016, 2020):
    print(q.enqueue(count_words, year))
rq_example.py
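Nothing runs until an RQ worker picks the jobs up; assuming a local Redis server is running, the worker is typically started in another shell with something like:

➤ rq worker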
from multiprocessing import Process, JoinableQueue

import requests

def count_words(year: int):
    resp = requests.get(f'https://ep{year}.europython.eu/en/')
    print(f'{year}: {len(resp.text.split())} words')

def worker(id):
    while True:
        item = q.get()
        if item is None:
            # None is the sentinel telling this worker to exit
            print('quitting worker', id)
            break
        count_words(item)
        q.task_done()

q = JoinableQueue()

processes = []
for id in range(2):
    p = Process(target=worker, args=(id,))
    p.start()
    processes.append(p)

for year in range(2016, 2020):
    q.put(year)

q.join()
# one sentinel per worker process, then wait for them to finish
for _ in processes:
    q.put(None)
for p in processes:
    p.join()
➤ python multiprocessing_example.py
2017: 4123 words
2016: 3794 words
2019: 1953 words
2018: 4334 words
quitting worker 0
quitting worker 1
from queue import Queue
from threading import Thread

import requests

def count_words(year: int):
    resp = requests.get(f'https://ep{year}.europython.eu/en/')
    print(f'{year}: {len(resp.text.split())} words')

def worker(id):
    while True:
        item = q.get()
        if item is None:
            print('quitting thread', id)
            break
        count_words(item)
        q.task_done()

q = Queue()

threads = []
for id in range(2):
    t = Thread(target=worker, args=(id,))
    t.start()
    threads.append(t)

for year in range(2016, 2020):
    q.put(year)

q.join()
for _ in threads:
    q.put(None)
for t in threads:
    t.join()
➤ python threading_example.py
2017: 4123 words
2016: 3794 words
2019: 1953 words
2018: 4334 words
quitting thread 0
quitting thread 1
Memory locking is horrid.
The GIL limits the usefulness of threading with Python:
Do not communicate by sharing memory; instead, share memory by communicating.
- Go Proverb
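A tiny sketch of the difference (illustrative names, not from the talk): in the first pattern every thread mutates shared state under a lock; in the second, threads only put values on a queue and one place drains it.

from queue import Queue
from threading import Lock, Thread

# "communicate by sharing memory": every thread mutates shared state,
# and correctness depends on taking the lock everywhere
total = 0
lock = Lock()

def add_shared(n: int):
    global total
    with lock:
        total += n

# "share memory by communicating": threads only put values on a queue
results = Queue()

def add_queued(n: int):
    results.put(n)

for target in (add_shared, add_queued):
    threads = [Thread(target=target, args=(n,)) for n in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

print(total)                                  # 45
print(sum(results.get() for _ in range(10)))  # 45

The queue-based examples above follow the second pattern, which is why none of them need explicit locks.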
GIL ... protects access to Python objects, preventing multiple threads from executing Python bytecodes at once
- Python Wiki
from queue import Queue
from threading import Thread
from time import time

def do_calcs(year: int):
    print(sum(range(year * int(1e5))))

t1 = time()
for year in range(2016, 2020):
    do_calcs(year)
t2 = time()
print(f'Time taken without threads: {t2 - t1:0.2f}s')

def worker(id):
    while True:
        item = q.get()
        if item is None:
            print('quitting thread', id)
            break
        do_calcs(item)
        q.task_done()

t3 = time()
...
for year in range(2016, 2020):
    q.put(year)
...
t4 = time()
print(f'Time taken with 2 threads: {t4 - t3:0.2f}s')
➤ python gil.py
20321279899200000
20341444899150000
20361619899100000
20381804899050000
Time taken without threads: 7.63s
20321279899200000
20341444899150000
20361619899100000
20381804899050000
quitting thread 1
quitting thread 0
Time taken with 2 threads: 7.65s
from queue import Queue
from threading import Thread
from time import time

import numpy as np

def do_calcs(year: int):
    # np.sum runs in C and releases the GIL, so threads can compute in parallel
    print(np.sum(np.arange(year * int(1e5))))

t1 = time()
for year in range(2016, 2020):
    do_calcs(year)
t2 = time()
print(f'Time taken without threads: {t2 - t1:0.2f}s')

def worker(id):
    while True:
        item = q.get()
        if item is None:
            print('quitting thread', id)
            break
        do_calcs(item)
        q.task_done()

t3 = time()
...
for year in range(2016, 2020):
    q.put(year)
...
t4 = time()
print(f'Time taken with 2 threads: {t4 - t3:0.2f}s')
➤ python gil_numpy.py
20321279899200000
20341444899150000
20361619899100000
20381804899050000
Time taken without threads: 2.36s
20321279899200000
20341444899150000
20381804899050000
20361619899100000
quitting thread 1
quitting thread 0
Time taken with 2 threads: 1.34s
from aiohttp import ClientSession

import asyncio

async def count_words(year: int):
    async with ClientSession() as session:
        async with session.get(f'https://ep{year}.europython.eu/en/') as resp:
            text = await resp.text()
            print(f'{year}: {len(text.split())} words')

async def main():
    coroutines = []
    for year in range(2016, 2020):
        coroutines.append(count_words(year))
    await asyncio.gather(*coroutines)

asyncio.run(main())
➤ python asyncio_example.py
2019: 1953 words
2017: 4123 words
2016: 3782 words
2018: 4334 words
explicit cooperative scheduling is awesome, but it can't be implicit
- me
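In other words, a coroutine only gives up control at an explicit await; a plain blocking call never does. A minimal sketch (illustrative, not from the talk):

import asyncio
import time

async def polite(name: str):
    await asyncio.sleep(1)  # explicitly hands control back to the event loop
    print(name, 'done')

async def rude(name: str):
    time.sleep(1)  # never awaits, so it blocks every other coroutine for 1s
    print(name, 'done')

async def main():
    # two polite() calls finish together after ~1s;
    # swap in rude() and they run one after the other, ~2s total
    await asyncio.gather(polite('a'), polite('b'))

asyncio.run(main())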
Machines
Processes
Threads
Asyncio
rq forks the main process to run the worker
ThreadPoolExecutor
ProcessPoolExecutor
aiohttp, arq
multiprocessing.Queue
from concurrent.futures import ThreadPoolExecutor
import asyncio
from time import time

import numpy as np

def do_calcs(year: int):
    print(np.sum(np.arange(year * int(1e5))))

async def main():
    loop = asyncio.get_event_loop()
    with ThreadPoolExecutor(max_workers=2) as pool:
        coroutines = [
            loop.run_in_executor(pool, do_calcs, v)
            for v in range(2016, 2020)
        ]
        await asyncio.gather(*coroutines)

t1 = time()
asyncio.run(main())
print(f'Time taken with 2 threads: {time() - t1:0.2f}s')
➤ python asyncio_numpy.py
20321279899200000
20341444899150000
20381804899050000
20361619899100000
Time taken with 2 threads: 1.27s
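The summary above also lists ProcessPoolExecutor; for CPU-bound pure-Python work that the GIL would otherwise serialise, a sketch (not from the talk, same shape as the threaded example) only needs the pool type swapped:

from concurrent.futures import ProcessPoolExecutor
import asyncio
from time import time

def do_calcs(year: int):
    # pure-Python arithmetic: each worker process has its own interpreter and GIL
    print(sum(range(year * int(1e5))))

async def main():
    loop = asyncio.get_event_loop()
    with ProcessPoolExecutor(max_workers=2) as pool:
        coroutines = [
            loop.run_in_executor(pool, do_calcs, v)
            for v in range(2016, 2020)
        ]
        await asyncio.gather(*coroutines)

if __name__ == '__main__':  # required when child processes are spawned
    t1 = time()
    asyncio.run(main())
    print(f'Time taken with 2 processes: {time() - t1:0.2f}s')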
It's easy to read the docs; the tricky thing (and what I've tried to do today) is understanding the big picture.
this presentation: tiny.cc/pythonsppp