Quantum Dynamics Movies

One of the cool things about working with real-time electronic dynamics is that you can make awesome videos.

I recently made a video of a resonant electronic excitation of hydrogen peroxide.

You can check out the video by clicking the picture below.

[Video: Quantum Dynamics]

The video was made using the real-time TDHF module in Chronus Quantum. We visualize how the electronic density changes over time.

For this simple STO-3G model, hydrogen peroxide absorbs strongly at 18.0 eV. So we apply an oscillating electric field at that same frequency, and the electronic density goes nuts. You can do this for any frequency of light, but driving the system on resonance gives the most dramatic response.
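
If you want a feel for what that driving field looks like, here’s a minimal sketch in Python (the amplitude is a made-up illustrative value, not the one from the actual simulation):

import numpy as np

# Illustrative sketch only: a resonant driving field like the one described above.
# 1 hartree = 27.2114 eV, so 18.0 eV in atomic units is:
omega = 18.0 / 27.2114  # resonant frequency (au)
E0 = 0.005              # field amplitude (au); made-up illustrative value

def field(t):
    """Oscillating electric field E(t) = E0*sin(omega*t), with t in au."""
    return E0 * np.sin(omega * t)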

In the future, we can look at more realistic systems with real-time TDDFT.

It’s important because this type of phenomenon underlies our ability to make better solar cells and faster computer processors.

Easy transparent SSH tunnels

Most of my development work is done on remote computing clusters, which are usually firewalled.

This means I can’t log in directly when I work from home. I have to log into a network machine, and then log in to my development machine from there.

It gets kind of tedious to keep doing something along the lines of

home:>$ ssh network-host
network-host:>$ ssh development-machine
development-machine:>$

It also makes it a pain to copy files from home to the remote machine (and vice versa!).
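
For example, pulling a file home means staging it on the network host first, something like this (the file name is hypothetical):

home:>$ ssh network-host
network-host:>$ scp development-machine:~/results.tar .
network-host:>$ exit
home:>$ scp network-host:~/results.tar .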

However, you can easily set up a transparent tunnel through your network-host that lets you (effectively) log on to your development machine as if it weren’t even firewalled.

Just open up (or create) ~/.ssh/config:

Add the following lines, replacing network-machine and development-machine with the actual names of your computers. The Host aliases can be whatever you want; here I named them network and devel.

# Non-firewalled computer
Host network
    HostName network-machine

# Firewalled computer
Host devel
    HostName development-machine
    ProxyCommand ssh -o 'ForwardAgent yes' network 'ssh-add && nc %h %p'

What this does is log you into the non-firewalled network host, then establish a connection to the firewalled computer through netcat (nc). You never actually see the network host; it acts on your behalf, invisibly in the background.
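
As an aside, if your OpenSSH is version 7.3 or newer, the ProxyJump directive accomplishes the same thing with less typing:

# Firewalled computer, using ProxyJump (OpenSSH 7.3+)
Host devel
    HostName development-machine
    ProxyJump network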

So, from your home computer, you can login directly to the development machine:

home:>$ ssh devel 
development-machine:>$

You can now scp directly to and from your development machine, too.

home:>$ scp devel:~/path/to/files .
files                           100% 4084     4.0KB/s   00:00 
home:>$

Moving from WordPress to Jekyll

I’m currently in the process of porting my old WordPress blog over to GitHub Pages, which supports Jekyll. I was getting tired of dealing with WordPress bloat, and wanted a cleaner, snappier website.

That, and now I can blog in Markdown, which is a great advantage of using Jekyll! From the terminal! With version control and hosting via git and GitHub!

I’m a little late to the party, but glad I made it.

New NFL team ranking program available

I released some new code today that should allow you to easily rank NFL teams for any given season (back through 2009) using either the Colley or Massey methods. Both methods have several weighting schemes implemented, and with the Massey method you can additionally rank according to point differential or total yard differential.

You can check it out on my GitHub here.

I’ve become increasingly dissatisfied with my old NFL Colley ranking system and its derivatives, mostly because (a) I didn’t write it in an object-oriented style, and (b) it became very (VERY) hard to modify later on. The new structure, by encapsulating everything in a “Season” object, should keep everything pretty self-contained.

For example, you can now do:

from season import Season

season = Season()
season.year = 2014  # default season is 2015
season.massey()     # Massey ranking method, unweighted
for team in season.rating:
    print(team)

Which gives the output

['NE', 11.407605241747872]  
['DEN', 9.6038904227178108]  
['SEA', 9.5169656599013628]  
['GB', 8.2526935620363258]  
...  
['OAK', -8.9899785292554917]  
['TB', -9.8107991356959232]  
['JAC', -10.427123019821691]  
['TEN', -11.805248019821692]

So obviously the NE Patriots were ranked #1 for the 2014 season with this method. You’ll recall they ended up winning the Super Bowl that same season.
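
If you’re curious what the Massey method is actually doing under the hood, here’s a bare-bones sketch (a toy example, not the package’s actual code): accumulate game counts and point differentials into a linear system, pin the ratings so they sum to zero, and solve.

import numpy as np

# Toy Massey example: 3 teams, games given as (winner, loser, point differential)
games = [(0, 1, 7), (1, 2, 3), (0, 2, 10)]
nteams = 3

M = np.zeros((nteams, nteams))  # Massey matrix
p = np.zeros(nteams)            # cumulative point differentials

for w, l, diff in games:
    M[w, w] += 1   # games played on the diagonal
    M[l, l] += 1
    M[w, l] -= 1   # head-to-head counts off the diagonal
    M[l, w] -= 1
    p[w] += diff
    p[l] -= diff

# M is singular, so replace the last row to force the ratings to sum to zero
M[-1, :] = 1.0
p[-1] = 0.0

ratings = np.linalg.solve(M, p)  # higher rating = stronger team
print(ratings)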

So anyway, I’m starting over and making use of some great NFL APIs I’ve found elsewhere on GitHub. In particular, I am using nflgame, which does a lot of the heavy lifting associated with scraping the necessary data.

Check it out if this sounds like it may be something you’re interested in!

Embarrassingly parallel tasks in Python

Recently, I found myself executing the same commands (or some variation thereof) at the command line.

Over and over and over and over.

Sometimes I just write a do loop (bash) or for loop (python). But the command I was executing was taking almost a minute to finish. Not terrible if you are doing this once, but several hundred (thousand?) times more and I get antsy.

If you are curious, I was generating .cube files for a large quantum dot; I needed the molecular orbital density data for analysis. There was no way I was waiting half a day for these files to generate.

So instead, I decided to parallelize the for loop that was executing my commands. It was easier than I thought, so I am writing it here not only so I don’t forget how, but also because I’m sure there are others out there like me who (a) aren’t experts at writing parallel code, and (b) are lazy.

Most of the following came from following along here.

First, the package I used was the joblib package in Python. I’ll assume you have it installed; if not, you can use pip or something like that to get it on your system. You want to import Parallel and delayed.

So start off your code with

from joblib import Parallel, delayed

If you want to execute a system command, you’ll also need the call function from the subprocess package. So you have

from joblib import Parallel, delayed
from subprocess import call

Once you have these imported, you have to structure your code (according to the joblib people) like so:

import ....

def function1(...):
    ...

def function2(...):
    ...

...
if __name__ == '__main__':
    # do stuff with imports and functions defined above
    ...

So: do your imports first (duh), then define the functions you want to run (in my case, one that executes a command on the command line), and finally call that function in the main block.

I learn by example, so I’ll show you how I pieced the rest of it together.

Now, the command I was executing was the Gaussian “cubegen” utility. So an example command looks like

cubegen 0 MO=50 qd.fchk 50.cube 120 h  

This makes a .cube file (50.cube) containing the volumetric data of molecular orbital 50 (MO=50) from the formatted checkpoint file (qd.fchk). I wanted 120 points per side, and I wanted headers printed (120 h). The first argument is the number of processors cubegen itself uses.

Honestly, the command doesn’t matter. If you want to parallelize

ls -lh  

over a for loop, you certainly could. That’s not my business.

What does matter is that we can execute these commands from a Python script using the call function that we imported from the subprocess package.

So we replace our functions with the system calls

from joblib import Parallel, delayed
from subprocess import call

def makeCube(cube, npts):
    call(["cubegen", "1", "MO=" + str(cube), "qd.fchk",
          str(cube) + ".cube", str(npts), "h"])

def listDirectory(i):  # kidding, sorta.
    call(["ls", "-lh"])

if __name__ == '__main__':
    # do stuff with imports and functions defined above
    ...

Now that we have the command(s) defined, we need to piece it together in the main block.

In the case of the makeCube function, I want to feed it a list of molecular orbital (MO) numbers and let that define my for loop. So let’s start at MO #1 and go to, say, MO #500. This will define our inputs. I also want the cube resolution (npts) as a variable (well, parameter really).

I’ll also use 8 processors, so I’ll define a variable num_cores and set it to 8. Your mileage may vary. Parallel() is smart enough to handle fairly dumb inputs.
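
If you’d rather not hard-code that, you can ask Python how many CPUs you have (joblib also accepts n_jobs=-1 as shorthand for “use all my cores”):

import multiprocessing
num_cores = multiprocessing.cpu_count()  # number of CPUs on this machine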

(Also, if you do decide to use cubegen, like I did, please make sure you have enough space on disk.)

Putting this in, our code looks like

from joblib import Parallel, delayed
from subprocess import call

def makeCube(cube, npts):
    call(["cubegen", "1", "MO=" + str(cube), "qd.fchk",
          str(cube) + ".cube", str(npts), "h"])

def listDirectory(i):  # kidding, sorta
    call(["ls", "-lh"])

if __name__ == '__main__':
    start = 1
    end = 501  # python's range ends at N-1
    inputs = range(start, end)
    npts = 120
    num_cores = 8

Great. Almost done.

Now we need to call this function from within Parallel() from joblib.

results = Parallel(n_jobs=num_cores)(delayed(makeCube)(i, npts)
                                     for i in inputs)

The Parallel object first takes the number of cores as an input (n_jobs). You could easily hard-code this if you want, or let Python’s multiprocessing package determine the number of CPUs available to you, as above. Next we call our function using delayed(). This is “a trick to create a tuple (function, args, kwargs) with a function-call syntax”.

It’s on the developer’s web page. I can’t make this stuff up.

Then we feed it the list defined by our start and end values.

If you wanted to list the contents of your directory 500 times over 8 cores, it would look like this (assuming you defined the function and inputs above):

results = Parallel(n_jobs=8)(delayed(listDirectory)(i)
                             for i in inputs)

Essentially, we are saying that

delayed(listDirectory)(i) for i in inputs

is the same as

for i in inputs:
    listDirectory(i)

Does that make sense? It’s just

delayed(function)(arguments)

instead of

function(arguments)

Okay. Enough already. Putting it all together we have:

from joblib import Parallel, delayed
from subprocess import call

def makeCube(cube, npts):
    call(["cubegen", "1", "MO=" + str(cube), "qd.fchk",
          str(cube) + ".cube", str(npts), "h"])

def listDirectory(i):  # kidding, sorta
    call(["ls", "-lh"])

if __name__ == '__main__':
    start = 1
    end = 501  # python's range ends at N-1
    inputs = range(start, end)
    npts = 120
    num_cores = 8
    results = Parallel(n_jobs=num_cores)(delayed(makeCube)(i, npts)
                                         for i in inputs)

There you have it!