Embarrassingly parallel tasks in Python
Recently, I found myself executing the same commands (or some variation thereof) at the command line.
Over and over and over and over.
Sometimes I just write a do loop (bash) or a for loop (python). But the command I was executing was taking almost a minute to finish. Not terrible if you are doing this once, but several hundred (thousand?) times more and I get antsy.
If you are curious, I was generating .cube files for a large quantum dot, whose molecular orbital density data I needed to analyze. There was no way I was waiting half a day for these files to generate.
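For reference, the serial version of this is just a plain loop over a system call, something like the sketch below (the actual command and its arguments are explained further down):

from subprocess import call

# run the command once per molecular orbital, one at a time;
# at roughly a minute apiece, 500 of these eats most of a day
for mo in range(1, 501):
    call(["cubegen", "0", "MO=" + str(mo), "qd.fchk",
          str(mo) + ".cube", "120", "h"])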
So instead, I decided to parallelize the for loop that was executing my commands. It was easier than I thought, so I am writing it here not only so I don’t forget how, but also because I’m sure there are others out there like me who (a) aren’t experts at writing parallel code, and (b) are lazy.
Most of the following came from following along here.
First, the package I used was the joblib package in python. I’ll assume you have it installed; if not, you can use pip or something like that to get it on your system. You want to import Parallel and delayed.
So start off your code with
from joblib import Parallel, delayed
If you want to execute a system command, you’ll also need the call function from the subprocess package. So you have
from joblib import Parallel, delayed
from subprocess import call
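If you haven’t used call before, it takes the command and its arguments as a list of strings. A quick toy illustration (my example, not part of the script we’re building):

from subprocess import call

# equivalent to typing `echo hello` at the shell
call(["echo", "hello"])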
Once you have these imported, you have to structure your code (according to the joblib people) like so:
import ...

def function1(...):
    ...

def function2(...):
    ...

if __name__ == '__main__':
    # do stuff with the imports and functions defined above
    ...
So do your imports first (duh), then define the functions you want to run (in my case, a function that executes a command on the command line), and then finally call those functions in the main block. The if __name__ == '__main__': guard is not just decoration here: joblib can spawn worker processes that re-import your script, and the guard keeps them from re-running your main block themselves.
I learn by example, so I’ll show you how I pieced the rest of it together.
Now, the command I was executing was the Gaussian “cubegen” utility. So an example command looks like
cubegen 0 MO=50 qd.fchk 50.cube 120 h
This makes a .cube file (50.cube) containing the volumetric data of molecular orbital 50 (MO=50) from the formatted checkpoint file (qd.fchk). I wanted 120 points per side, and I wanted the headers printed (that’s the 120 h at the end).
Honestly, the command doesn’t matter. If you want to parallelize ls -lh over a for loop, you certainly could. That’s not my business.
What does matter is that we can execute these commands from a python script using the call function that we imported from the subprocess package.
So we replace our functions with the system calls
from joblib import Parallel, delayed
from subprocess import call

def makeCube(cube, npts):
    # generate the .cube file for molecular orbital number `cube`
    call(["cubegen", "1", "MO=" + str(cube), "qd.fchk",
          str(cube) + ".cube", str(npts), "h"])

def listDirectory(i):  # kidding, sorta.
    call(["ls", "-lh"])

if __name__ == '__main__':
    # do stuff with the imports and functions defined above
    ...
Now that we have the command(s) defined, we need to piece it together in the main block.
In the case of the makeCube function, I want to feed it a list of molecular orbital (MO) numbers and let that define my for loop. So let’s start at MO #1 and go to, say, MO #500. This will define our inputs. I also want the cube resolution (npts) as a variable (well, a parameter, really).
I’ll also use 8 processors, so I’ll define a variable num_cores and set it to 8. Your mileage may vary. Parallel() is fairly forgiving about its inputs; passing n_jobs=-1, for example, tells it to use every core it can find.
(Also, if you do decide to use cubegen like I did, please make sure you have enough space on disk.)
Putting this in, our code looks like
from joblib import Parallel, delayed
from subprocess import call

def makeCube(cube, npts):
    call(["cubegen", "1", "MO=" + str(cube), "qd.fchk",
          str(cube) + ".cube", str(npts), "h"])

def listDirectory(i):  # kidding, sorta
    call(["ls", "-lh"])

if __name__ == '__main__':
    start = 1
    end = 501  # python's range ends at N-1
    inputs = range(start, end)
    npts = 120
    num_cores = 8
Great. Almost done.
Now we need to call this function from within Parallel() from joblib.
results = Parallel(n_jobs=num_cores)(delayed(makeCube)(i, npts)
                                     for i in inputs)
The Parallel function (an object, technically: you instantiate it, then call the result) first takes the number of cores as its n_jobs argument. You could easily hard-code this if you want, or let Python’s multiprocessing package determine the number of CPUs available to you. Next we call the function using the delayed() function. This is “a trick to create a tuple (function, args, kwargs) with a function-call syntax”. It’s on the developer’s web page. I can’t make this stuff up.
Then we feed it the list defined by our start and end values.
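If you’d rather not hard-code the core count, the standard library can report it for you. A small sketch (and remember, joblib also accepts n_jobs=-1 as shorthand for “use all cores”):

import multiprocessing

# detect how many CPUs this machine reports and use them all
num_cores = multiprocessing.cpu_count()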
If you wanted to list the contents of your directory 500 times over 8 cores, it would look like this (assuming you defined the function and inputs above):

results = Parallel(n_jobs=8)(delayed(listDirectory)(i) for i in inputs)
Essentially we are making the equivalence that
delayed(listDirectory)(i) for i in inputs
is the same as
for i in inputs:
    listDirectory(i)
Does that make sense? It’s just
delayed(function)(arguments)
instead of
function(arguments)
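If you want a self-contained sanity check before pointing this at real work, here is a toy version (my own example, not from the joblib docs) that squares some numbers in parallel and collects the return values:

from joblib import Parallel, delayed

def square(x):
    return x * x

if __name__ == '__main__':
    # same pattern: delayed(function)(arguments)
    results = Parallel(n_jobs=2)(delayed(square)(i) for i in range(10))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]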
Okay. Enough already. Putting it all together we have:
from joblib import Parallel, delayed
from subprocess import call

def makeCube(cube, npts):
    call(["cubegen", "1", "MO=" + str(cube), "qd.fchk",
          str(cube) + ".cube", str(npts), "h"])

def listDirectory(i):  # kidding, sorta
    call(["ls", "-lh"])

if __name__ == '__main__':
    start = 1
    end = 501  # python's range ends at N-1
    inputs = range(start, end)
    npts = 120
    num_cores = 8
    results = Parallel(n_jobs=num_cores)(delayed(makeCube)(i, npts)
                                         for i in inputs)
There you have it!
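Save it as something like make_cubes.py (the name is up to you) and run it with python make_cubes.py. joblib takes care of farming the 500 cubegen calls out over your 8 cores.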