How to parallelize for loops in Python

I’ve previously written about parallelizing for loops in C/C++ and C#. This is possible in Python, too, and might even be more important than in the two other languages, as Python’s interpreted nature can make it a bit slow. Slow for-loops are in fact one of my main criticisms of Python. Where possible, one should use efficient libraries like NumPy and Pandas (which are largely implemented in C), but there are cases where you can’t avoid implementing a more demanding computation in Python itself.

multiprocessing.Pool

If you call a function that only requires a single argument and has no return values, such as when processing multiple files in parallel, the easiest way to parallelize this is with multiprocessing.Pool:

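import multiprocessing

def process_file(f):
    do_something_with(f)

if __name__ == '__main__':
    # The guard keeps worker processes from re-running this block on import
    # (needed on platforms that spawn workers, such as Windows and macOS).
    with multiprocessing.Pool() as pool:
        pool.map(process_file, infiles)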

This does not use threads, but subprocesses. It is possible to pass multiple arguments using starmap instead of map, but I find that a bit clunky to use.
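For reference, here is a minimal sketch of the starmap variant. The function scale and the list of argument tuples are hypothetical stand-ins, not part of the example above. Each tuple is unpacked into the function's positional arguments, and the results come back as a list in the order of the inputs.

```python
import multiprocessing


def scale(value, factor):
    # Hypothetical two-argument worker; stands in for the real computation.
    return value * factor


def main():
    # Each tuple is unpacked into scale(value, factor).
    jobs = [(1, 10), (2, 10), (3, 10)]
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.starmap(scale, jobs)
    print(results)


if __name__ == "__main__":
    main()
```

The clunky part is mostly building the list of tuples up front; the call itself looks the same as map.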

joblib

The other option, and the one that I prefer for functions that require multiple arguments and return a value, is joblib. This code example shows how to parallelize a for loop over the rows of a Pandas dataframe:

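import pandas as pd
from joblib import Parallel, delayed

def get_chainage(x: float, y: float, data: pd.DataFrame):
    # compute chainage
    ...

result = Parallel(n_jobs=-1)(delayed(get_chainage)(comp.loc[i, 'x'], comp.loc[i, 'y'], ref) for i in comp.index)
# result is ordered like comp.index, so pair values with labels instead of indexing result by label
for i, value in zip(comp.index, result):
    comp.loc[i, 'chainage'] = value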

It’s pretty straightforward to see what is going on here:

  1. n_jobs=-1 instructs joblib to use all available cores.
  2. delayed(get_chainage) wraps the function to call; its arguments go in the parentheses that follow.
  3. The for loop uses the variable i and loops over comp.index.
  4. The result is a list with one element per loop iteration, in the same order as the iterations. A second for loop is therefore needed to assign the results to a dataframe or to do other computations such as summations.
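The four points above can be tried out on a toy problem without a dataframe. This is only a sketch and assumes joblib is installed; distance_to_origin and the points list are made up for illustration and stand in for get_chainage and the dataframe rows.

```python
import math

from joblib import Parallel, delayed


def distance_to_origin(x, y):
    # Hypothetical stand-in for a more expensive per-row computation.
    return math.hypot(x, y)


def compute_distances(points, n_jobs=-1):
    # One delayed call per loop iteration; the returned list keeps input order.
    return Parallel(n_jobs=n_jobs)(
        delayed(distance_to_origin)(x, y) for x, y in points
    )


if __name__ == "__main__":
    points = [(3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
    distances = compute_distances(points)
    # Second pass over the results: pair each one with its input, then sum.
    for point, distance in zip(points, distances):
        print(point, distance)
    print("total:", sum(distances))
```

Because the results arrive in input order, zip is enough to match each result to the item it came from.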

You can select which backend to use. By default this is loky, which uses subprocesses. There’s also a threading backend. I’ve had cases where loky was significantly faster, but others where it was very inconsistent, so I would suggest trying both.
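Switching backends is a one-argument change, so the comparison is cheap to run. A minimal sketch, assuming joblib is installed; work is a hypothetical placeholder for the real function, and the timings depend entirely on the workload and the machine.

```python
import time

from joblib import Parallel, delayed


def work(n):
    # Hypothetical CPU-bound placeholder for the real function.
    return sum(i * i for i in range(n))


def run(backend, inputs, n_jobs=-1):
    # backend is passed straight to joblib, e.g. "loky" or "threading".
    start = time.perf_counter()
    results = Parallel(n_jobs=n_jobs, backend=backend)(
        delayed(work)(n) for n in inputs
    )
    return results, time.perf_counter() - start


if __name__ == "__main__":
    inputs = [200_000] * 16
    for backend in ("loky", "threading"):
        _, elapsed = run(backend, inputs)
        print(f"{backend}: {elapsed:.2f} s")
```

As a rough guide, threads share Python's global interpreter lock, so pure-Python number crunching often gains little from the threading backend, while loky has to pickle arguments and results to move them between processes. Which cost dominates depends on the function, which is why measuring both is worthwhile.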
