Embarrassingly parallel tasks in Python

Recently, I found myself executing the same commands (or some variation thereof) at the command line.

Over and over and over and over.

Sometimes I just write a do loop (bash) or a for loop (Python). But the command I was executing was taking almost a minute to finish. Not terrible if you are doing this once, but several hundred (thousand?) times more and I get antsy.

If you are curious, I was generating .cube files for a large quantum dot, of which I needed the molecular orbital density data to analyze. There was no way I was waiting half a day for these files to generate.

So instead, I decided to parallelize the for loop that was executing my commands. It was easier than I thought, so I am writing it here not only so I don’t forget how, but also because I’m sure there are others out there like me who (a) aren’t experts at writing parallel code, and (b) are lazy.

Most of what follows came from following along here.

First, the package I used was the joblib package in Python. I’ll assume you have it installed; if not, you can use pip or something like that to get it on your system. You want to import Parallel and delayed.

So start off your code with

  
from joblib import Parallel, delayed  

If you want to execute a system command, you’ll also need the call function from the subprocess package. So you have

  
from joblib import Parallel, delayed  
from subprocess import call  

Once you have these imported, you have to structure your code (according to the joblib people) like so:

  
import ....

def function1(...):  
 ...

def function2(...):  
 ...

...  
if __name__ == '__main__':  
 # do stuff with imports and functions defined above  
 ...  

So do your imports first (duh), then define the functions you want to run (in my case, executing a command on the command line), and finally call those functions in the main block.

I learn by example, so I’ll show you how I pieced the rest of it together.

Now, the command I was executing was the Gaussian “cubegen” utility. So an example command looks like

cubegen 0 MO=50 qd.fchk 50.cube 120 h  

Which makes a .cube file (50.cube) containing the volumetric data of molecular orbital 50 (MO=50) from the formatted checkpoint file (qd.fchk). I wanted 120 points per side, and I wanted headers printed (120 h).

Honestly, the command doesn’t matter. If you want to parallelize

ls -lh  

over a for loop, you certainly could. That’s not my business.

What does matter is that we can execute these commands from a Python script using the call function that we imported from the subprocess package.

So we replace our functions with the system calls

  
from joblib import Parallel, delayed  
from subprocess import call

def makeCube(cube,npts):  
    call(["cubegen","1","MO="+str(cube),"qd.fchk",str(cube)+".cube",  
    str(npts),"h"])

def listDirectory(i): #kidding, sorta.  
    call(["ls", "-lh"])

if __name__ == '__main__':  
 # do stuff with imports and functions defined above  
 ...  

Now that we have the command(s) defined, we need to piece it together in the main block.

In the case of the makeCube function, I want to feed it a list of molecular orbital (MO) numbers and let that define my for loop. So let’s start at MO #1 and go to, say, MO #500. This will define our inputs. I also want the cube resolution (npts) as a variable (well, parameter really).

I’ll also use 8 processors, so I’ll define a variable num_cores and set it to 8. Your mileage may vary. Parallel() is smart enough to handle fairly dumb inputs.

(Also, if you do decide to use cubegen, like I did, please make sure you have enough space on disk.)

Putting this in, our code looks like

  
from joblib import Parallel, delayed  
from subprocess import call

def makeCube(cube,npts):  
    call(["cubegen","1","MO="+str(cube),"qd.fchk",str(cube)+".cube",  
    str(npts),"h"])

def listDirectory(i): #kidding, sorta  
    call(["ls", "-lh"])

if __name__ == '__main__':  
    start = 1  
    end = 501 # python's range ends at N-1  
    inputs = range(start,end)  
    npts = 120  
    num_cores = 8

Great. Almost done.

Now we need to call this function from within Parallel() from joblib.

  
results = Parallel(n_jobs=num_cores)(delayed(makeCube)(i,npts)  
    for i in inputs)  

The Parallel function (object?) first takes the number of cores as an input. You could easily hard code this if you want, or let Python’s multiprocessing package determine the number of CPUs available to you. Next we call the function using the delayed() function. This is “a trick to create a tuple (function, args, kwargs) with a function-call syntax”.

It’s on the developer’s web page. I can’t make this stuff up.

Then we feed it the list defined by our start and end values.
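(An aside on the core count: rather than hard-coding num_cores, one option is to ask Python’s multiprocessing module how many CPUs the machine has. This is just a sketch of that alternative, not part of my original script.)

from multiprocessing import cpu_count

num_cores = cpu_count()  # use however many CPUs the machine reports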

If you wanted to list the contents of your directory 500 times over 8 cores, it would look like this (assuming you defined the function and inputs above)

  
results = Parallel(n_jobs=8)(delayed(listDirectory)(i)  
     for i in inputs)  

Essentially, we are saying that

  
delayed(listDirectory)(i) for i in inputs

is the same as

  
for i in inputs:
    listDirectory(i)

Does that make sense? It’s just

  
delayed(function)(arguments)

instead of

function(arguments)

Okay. Enough already. Putting it all together we have:

  
from joblib import Parallel, delayed  
from subprocess import call

def makeCube(cube,npts):  
    call(["cubegen","1","MO="+str(cube),"qd.fchk",str(cube)+".cube",  
    str(npts),"h"])

def listDirectory(i): #kidding, sorta  
    call(["ls", "-lh"])

if __name__ == '__main__':  
    start = 1  
    end = 501 # python's range ends at N-1  
    inputs = range(start,end)  
    npts = 120  
    num_cores = 8  
    results = Parallel(n_jobs=num_cores)(delayed(makeCube)(i,npts)  
        for i in inputs)  

There you have it!

Biorthogonalizing left and right eigenvectors the easy (lazy?) way

Lately I have been diagonalizing some nasty matrices.

Large. Non-Hermitian. Complex. Matrices. The only thing I suppose I have going for me is that they are relatively sparse.

Usually I haven’t had much of a problem getting eigenvalues. Eigenvalues are easy. Plug into ZGEEV, compute, move on.

The problem I ran into came when I wanted to use the eigenvectors. If you are used to using symmetric matrices all the time, you might not realize that non-Hermitian matrices have two sets of eigenvectors, left and right. In general, these eigenvectors of a non-Hermitian matrix are not orthonormal. But you can biorthogonalize them.

Why do you care if your eigenvectors are biorthogonalized?

Well, if your matrix corresponds to a Hamiltonian, and if you want to compute wave function properties, then you need a biorthonormal set of eigenvectors. This happens in linear response coupled cluster theory, for example. It is essential for a unique and physical description of molecular properties, e.g. transition dipole moments.

Now, with Hermitian matrices, your left and right eigenvectors are just conjugate transposes of each other, so it’s super easy to orthogonalize a set of eigenvectors. You can compute the QR decomposition (a la Gram-Schmidt) of your right eigenvectors \(\mathbf{C}\) to get

\[\begin{equation} \mathbf{C} = \mathbf{QR} \end{equation}\]

where \(\mathbf{Q}\) is your set of orthogonalized eigenvectors. (Usually they are orthogonalized anyway during your eigenvalue decomposition.)
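(In NumPy, for example, that orthogonalization is a one-liner. Here is a minimal sketch, where C is just a random stand-in for your matrix of right eigenvectors stored as columns.)

import numpy as np

C = np.random.rand(5, 5) + 1j*np.random.rand(5, 5)  # stand-in for your eigenvectors
Q, R = np.linalg.qr(C)                               # C = QR, columns of Q are orthonormal
assert np.allclose(Q.conj().T @ Q, np.eye(5))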

For non-Hermitian matrices, this situation is different. You have two eigenvalue equations you want to solve:

\(\mathbf{HR} = \mathbf{RE}\), and \(\mathbf{LH} = \mathbf{EL}\)

where \(\mathbf{R}\) and \(\mathbf{L}\) are your right and left eigenvectors, and \(\mathbf{H}\) and \(\mathbf{E}\) are your matrix and eigenvalues. The eigenvalues are the same regardless of which side you solve for.

To biorthonormalize \(\mathbf{L}\) and \(\mathbf{R}\), we want to enforce the constraint

\[\mathbf{LR} = \mathbf{I}\]

which says that the inner product of these two set of vectors is the identity. This is what biorthonormalization means.

Many times \(\mathbf{LR} = \mathbf{D}\), where \(\mathbf{D}\) is diagonal. This is easy: the vectors are already biorthogonal, so you just scale each pair by the corresponding diagonal element.
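For concreteness, here’s a minimal NumPy/SciPy sketch of that scaling. The matrix is a random stand-in, and I’m storing left eigenvectors as rows of L and right eigenvectors as columns of R; for distinct eigenvalues their overlap really is diagonal.

import numpy as np
from scipy.linalg import eig

H = np.random.rand(6, 6) + 1j*np.random.rand(6, 6)  # non-Hermitian stand-in
w, vl, vr = eig(H, left=True, right=True)            # left and right eigenvectors
L, R = vl.conj().T, vr                                # rows of L, columns of R

d = np.diag(L @ R)        # diagonal overlaps
L = L / d[:, None]        # rescale the left vectors
assert np.allclose(L @ R, np.eye(6))  # now biorthonormal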

If that isn’t the case, how do you do it? One way would be to modify Gram-Schmidt. You could do that, but you won’t find a LAPACK routine to do it for you (that I know of). So you’d have to write one yourself, and doing that well is time-consuming and may be buggy. Furthermore, I’ve found modified Gram-Schmidt to run into serious problems when you encounter degenerate eigenvalues. In that case, the eigenvectors for each degenerate eigenvalue aren’t unique, even after constraining for biorthonormality, and so it’s tough to enforce biorthonormality overall.

Here’s a trick if you just want to get those dang eigenvectors biorthonormalized and be on your way. The trick lies in the LU decomposition.

Consider the following. Take the inner product of \(\mathbf{L}\) and \(\mathbf{R}\) to get the matrix \(\mathbf{M}\).

\[\mathbf{LR} = \mathbf{M}\]

Now take the LU decomposition of \(\mathbf{M}\)

\[\mathbf{M} = \mathbf{M}_L \mathbf{M}_U\]

Where \(\mathbf{M}_L\) is lower triangular, and \(\mathbf{M}_U\) is upper triangular. So our equation now reads:

\[\mathbf{LR} = \mathbf{M}_L\mathbf{M}_U\]

Triangular matrices are super easy to invert, so invert the right hand side to get:

\[\mathbf{M}_L^{-1}\mathbf{LR}\mathbf{M}_U^{-1} = \mathbf{I}\]

Now, since we want left and right eigenvectors that are biorthonormal, we can replace the identity:

\[\mathbf{M}_L^{-1}\mathbf{LR}\mathbf{M}_U^{-1} = \mathbf{L}'\mathbf{R}'\]

where the prime indicates our new biorthonormal left and right eigenvectors.

This suggests our new biorthonormal left and right eigenvectors take the form:

\[\mathbf{L}' = \mathbf{M}_L^{-1}\mathbf{L}\]

and

\[\mathbf{R}' = \mathbf{R}\mathbf{M}_U^{-1}\]

And there you have it! An easy-to-implement method of biorthonormalizing your eigenvectors. All of the steps have a corresponding LAPACK routine. I’ve found it to be pretty robust. You can go a step further and prove that the new eigenvectors are still eigenvectors of the Hamiltonian. Just plug them into the eigenvalue equation and you’ll see.
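If you’d rather see it in code than in LAPACK calls, here is a minimal NumPy/SciPy sketch of the whole trick. The conventions are mine (left eigenvectors as rows of L, right eigenvectors as columns of R), and SciPy’s lu hands back a permutation matrix as well, which I simply fold into the lower-triangular factor.

import numpy as np
from scipy.linalg import eig, lu, solve_triangular

def biorthonormalize(L, R):
    """Biorthonormalize left/right eigenvectors via an LU decomposition of L @ R."""
    M = L @ R                                        # overlap matrix M = LR
    P, Ml, Mu = lu(M)                                # M = P @ Ml @ Mu
    Lp = np.linalg.solve(P @ Ml, L)                  # L' = (P Ml)^(-1) L
    Rp = solve_triangular(Mu.T, R.T, lower=True).T   # R' = R Mu^(-1)
    return Lp, Rp

# quick check on a random non-Hermitian stand-in
H = np.random.rand(8, 8) + 1j*np.random.rand(8, 8)
w, vl, vr = eig(H, left=True, right=True)
Lp, Rp = biorthonormalize(vl.conj().T, vr)
assert np.allclose(Lp @ Rp, np.eye(8))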

I don’t think this scales any worse than the diagonalization in the first place, and it is a super useful trick. For example, it even works for sets of eigenvectors that don’t span the full space (i.e. rectangular eigenvector matrices). Because you do the LU on the inner product of the left and right eigenvectors, you get a much smaller square matrix whose dimension is just the number of eigenvectors you have.

Another use you might find with this is if the eigenvectors are nearly biorthonormal (which often happens when you have degenerate eigenvalues). You can do the same trick, but on the subspace of the eigenvectors corresponding to the degenerate eigenvalues. So if you have three degenerate eigenvalues, you can do an LU decomposition plus inversion on a 3x3 matrix.
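Using the biorthonormalize sketch from above, the subspace version is just a slice (idx here is a made-up list of the indices belonging to the degenerate eigenvalue, with L and R the full eigenvector arrays as before):

idx = [3, 4, 5]  # e.g. three degenerate roots
L[idx, :], R[:, idx] = biorthonormalize(L[idx, :], R[:, idx])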

Several New Papers now Online

The end of the year is rapidly approaching, and I’d like to draw your attention to three papers of mine that have been recently published online.

Theoretical Characterization of Conduction-Band Electrons in Photodoped and Aluminum-Doped Zinc Oxide (AZO) Quantum Dots

This first paper was a fun collaboration between our theoretical work in the Li group and the experimental work done by Dan Gamelin at the UW. The paper looks at calculated UV-Vis spectra of photodoped and Aluminum-doped ZnO quantum dots. Both types of doping added “extra” electrons to the conduction band, and this resulted in some interesting properties that you can see in both theory and experiment.

What I was most surprised to see was that the “extra” electron simply acted like an electron moving in a spherical potential, just like a hydrogen atom! On the left we have the HOMO, which looks like an s-orbital. Then the two on the right are different LUMOs, one that looks like a p-orbital and another that looks like a d-orbital. We called these “super-orbitals”, since they look like atomic orbitals but exist in these tiny crystal structures. I think these are the first images of DFT-computed “super-orbitals” ever published, though the “artificial atom” paradigm has been around for some time. It’s interesting to see complicated systems behaving like the simple quantum mechanical models we study when we first learn quantum mechanics!

[Image: HOMO and LUMO “super-orbitals” of the ZnO quantum dots]

J. J. Goings, A. Schimpf, J. W. May, R. Johns, D. R. Gamelin, X. Li, “Theoretical Characterization of Conduction-Band Electrons in Photodoped and Aluminum-Doped Zinc Oxide (AZO) Quantum Dots,” J. Phys. Chem. C, 2014, 118, 26584.

Assessment of Low-scaling Approximations to the Equation of Motion Coupled-Cluster Singles and Doubles Equations

This second paper details the implementation of several low-scaling methods for computing excited states (e.g. computing UV/Vis absorption). The idea was to take the highly accurate EOM-CCSD equations and simplify them using perturbation theory. That way, we could keep most of the accuracy of the method, while still having fast enough equations that large molecules could be studied. The resulting equations are closely related to other methods like CC2, CIS(D), and ADC(2), and I showed how they all relate to each other. We compared the performance to EOM-CCSD as well as experimental values. The results were promising, and the perturbative equations performed particularly well for Rydberg states. CC2, on the other hand, performs great for valence excitations.

J. J. Goings, M. Caricato, M. Frisch, X. Li, “Assessment of Low-scaling Approximations to the Equation of Motion Coupled-Cluster Singles and Doubles Equations,” J. Chem. Phys., 2014, 141, 164116.

Ab Initio Non-Relativistic Spin Dynamics

Finally, in this paper, we extended the generalized Hartree-Fock method to the time domain. In this proof-of-concept paper, we showed how a magnetic field can guide the spin dynamics of simple spin-frustrated systems. The key is reformulating the real-time time-dependent Hartree-Fock equations in the complex spinor basis. This allows the spin magnetizations on each atom to vary as a response to, say, an externally applied magnetic field. Here’s an example with a lithium trimer. Initially (left picture), all the spin magnetizations (that is, the spatial average of spin magnetic moment) point away from each other. Then, applying a static magnetic field (right picture) into the plane of the molecule causes each magnetization to precess at the Larmor frequency. The precession is shown in picoseconds by the colorization.

[Figure: spin magnetizations of a lithium trimer precessing in an applied static magnetic field]

It’s really a beautiful idea in my opinion, and there is so much more to be done with it. For example, in our simple ab initio model, the spins only “talk” through Pauli repulsion, so they behave more or less independently. What would happen if we include spin-orbit coupling and other perturbations? That remains to be seen.

F. Ding, J. J. Goings, M. Frisch, X. Li, “Ab Initio Non-Relativistic Spin Dynamics,” J. Chem. Phys., 2014, 141, 214111.

Broken Symmetries in Hartree-Fock

It’s been a busy summer, with traveling and giving talks and writing papers.

I’m back in Seattle again (yay!), and have been thinking a lot about how we get the simplest models of electrons in molecules, the Hartree-Fock equations.

While most theorists know about restricted Hartree-Fock (RHF) and unrestricted Hartree-Fock (UHF), not as many know that these can be derived from simple (okay, sometimes not-so-simple) symmetry considerations.

RHF and UHF are actually subgroups of a larger class of problems called the generalized Hartree-Fock (GHF) equations. GHF allows mixed spin descriptions (not just simple spin up or spin down), as well as complex wave functions.

These are actually quite important for describing frustrated spin systems, like this simple triangular system

[Image: spin frustration on a triangle, where the three spins cannot all be pairwise antiparallel]

The spins can’t all align favorably, so we call it “spin frustrated”. You can try this out on your own with, say, a triangle of hydrogens or lithiums or chromiums. UHF won’t give you the lowest energy solution. It will be GHF-unstable. Only GHF can give you the lowest energy solution, since it allows the MOs to take on both “spin up” and “spin down” character.

Weird, right?

If we insist that the GHF equations are invariant with respect to spin rotations along an axis (say, the z axis), we can get the UHF equations back.

If we insist that the GHF equations are invariant to time-reversal (doesn’t that sound cool?) as well as spin rotations along all axes (x,y,z), we get the real RHF equations.

Unfortunately, the literature on it is pretty sparse, and the content pretty dense. A paradox, if I’ve ever seen one. Thankfully, I found you can derive all the relationships using the tools of linear algebra.

The results are illuminating, and it’s fun to see how the different broken symmetry HF solutions relate to each other.

So here’s my take on re-deriving the different HF classes from generalized Hartree-Fock!

We want to classify broken symmetry wave functions by investigating how they transform under the action of the invariance operators \({\hat{g}}\) constituting the symmetry group \({\mathcal{G}}\) of the spin-free electronic Hamiltonian \({\hat{H}}\), which is equivalent to

\[\hat{g}\hat{H}\hat{g}^{-1} = \hat{H}, \qquad (\forall \hat{g} \in \mathcal{G}). \ \ \ \ \ (1)\]

This simply means \({\hat{H}}\) is invariant to transformation by \({\hat{g}}\), whatever \({\hat{g}}\) may be. In general, when \({\vert \psi\rangle}\) is an eigenstate of \({\hat{H}}\), then \({\hat{g}\vert \psi\rangle}\) is also an eigenstate belonging to the same eigenvalue as \({\vert \psi\rangle}\).
Now, exact eigenstates of \({\hat{H}}\) can be chosen to be simultaneous eigenstates of the various symmetries in \({\mathcal{G}}\). In contrast, for approximate variational wave functions (e.g. Hartree-Fock) the symmetry requirements represent additional constraints.

This is Löwdin’s ‘‘symmetry dilemma’’.

To state this dilemma again, we know the exact solution (lowest energy solution) will have certain symmetries, but if we include these symmetries in our approximate, variational Hamiltonian, we can only raise the energy and not lower it.

This means we can get closer to the exact solution by removing physical constraints! This is troublesome, but not a huge deal in practice.

It’s crucial to recognize that this problem arises only because we use an approximate independent-particle Hamiltonian.

There are several ways to break the single-determinantal Hartree-Fock model into its various broken symmetry subgroups. However we do this, what we ultimately want to determine is the form of the operators \({\hat{A}}\) that are invariant with respect to similarity transformations by various subgroups \({\mathcal{H} \subset \mathcal{G}}\), i.e.

\[\hat{h}\hat{A}\hat{h}^{-1} = \hat{A}, \qquad (\forall \hat{h} \in \mathcal{H} \subset \mathcal{G}) \ \ \ \ \ (2)\]

I find it easier to put \({\hat{A}}\) into a finite basis and treat this problem with the tools of linear algebra.

The basis for our independent particle model will be the spinor basis,

\[\vert \alpha\rangle = \left( \begin{array}{c} 1 \\ 0 \end{array} \right), \qquad \vert \beta\rangle = \left( \begin{array}{c} 0 \\ 1 \end{array} \right), \qquad \hbox{and} \quad \vert \alpha\rangle\langle\alpha\vert + \vert \beta\rangle\langle\beta\vert = \mathbf{I}_2 \ \ \ \ \ (3)\]

so any arbitrary spin function can be written

\[\vert p\rangle = p_1\vert \alpha\rangle + p_2\vert \beta\rangle = \left( \begin{array}{c} p_1 \\ p_2 \end{array} \right) \ \ \ \ \ (4)\]

We will only worry about one body operators \({\hat{A}}\), which include the Fock operator as well as the unitary parameterization of the single determinant (c.f. Thouless representation).

Putting \({\hat{A}}\) into a spinor basis gives us

\[\hat{A} = \vert i\sigma_1\rangle\langle i\sigma_1\vert \hat{A} \vert j \sigma_2 \rangle \langle j \sigma_2 \vert = A_{i\sigma_1,j\sigma_2} \vert i \sigma_1 \rangle \langle j \sigma_2 \vert = \left( \begin{array}{cc} \mathbf{A}_{\alpha\alpha} & \mathbf{A}_{\alpha\beta} \\ \mathbf{A}_{\beta\alpha} & \mathbf{A}_{\beta\beta} \end{array} \right) \ \ \ \ \ (5)\]

with

\[\left(A_{\sigma_1 \sigma_2}\right)_{ij} \in \mathbb{C} \quad \hbox{and} \quad \sigma_1,\sigma_2 \in \alpha,\beta \ \ \ \ \ (6)\]

Or, in second quantization

\[\hat{A} = \sum\limits_{ij} \sum\limits_{\sigma_1,\sigma_2} A_{i\sigma_1,j\sigma_2} a^{\dagger}_{i\sigma_1} a_{j\sigma_2} = \sum\limits_{ij} \sum\limits_{\sigma_1,\sigma_2} A_{i\sigma_1,j\sigma_2} \hat{E}^{i \sigma_1}_{j \sigma_2} \ \ \ \ \ (7)\]

The transformation properties of \({\hat{A}}\) under symmetry operations can be determined by examining the transformation of the \({U(2n)}\) generators \({\hat{E}^{i \sigma_1}_{j \sigma_2}}\), but it is much simpler to consider the transformations of the \({2\times2}\) (block) matrix \({\mathbf{A}}\), e.g.

\[\hat{h} \left( \begin{array}{cc} \mathbf{A}_{\alpha\alpha} & \mathbf{A}_{\alpha\beta} \\ \mathbf{A}_{\beta\alpha} & \mathbf{A}_{\beta\beta} \end{array} \right)\hat{h}^{-1} = \left( \begin{array}{cc} \mathbf{A}_{\alpha\alpha} & \mathbf{A}_{\alpha\beta} \\ \mathbf{A}_{\beta\alpha} & \mathbf{A}_{\beta\beta} \end{array} \right) \ \ \ \ \ (8)\]

Basically, we are looking for the constraints on \({\mathbf{A}}\) that make the above equation true, for any given symmetry operation.

The general invariance group of the spin-free electronic Hamiltonian involves the spin rotation group SU(2) and the time reversal group \({\mathcal{T}}\), i.e. SU(2) \({\otimes \mathcal{T}}\). SU(2) can be given in terms of spin operators,

\[\hbox{SU(2)}=\{ U_R (\vec{n},\theta) = exp(i\theta \sum\limits_{a=1}^3 \hat{S}_a \hat{n}_a ); \vec{n} = (n_1, n_2, n_3); \vec{n} \in \mathbb{R}^3; \vert \vert \vec{n}\vert \vert = 1, \theta \in (-2\pi, 2\pi] \} \ \ \ \ \ (9)\]

All of this amounts to performing rotations in spin space. The form looks awfully similar to rotations in 3D space (in fact, the relationship between rotations in spin space and 3D space is much deeper than that). The time reversal group is given as

\[\mathcal{T} = \{ \pm \hat1, \pm \hat{\Theta} \} \ \ \ \ \ (10)\]

In general,

\[\hat{\Theta} = \hat{U}_R({\vec{e}}_2,-\pi)\hat{K} = exp(-i\pi\hat{S}_2)\hat{K} \ \ \ \ \ (11)\] \[\hat{\Theta}^{-1} = - \hat{\Theta} = exp(i\pi\hat{S}_2)\hat{K} \ \ \ \ \ (12)\]

Where \({\hat{K}}\) is the complex conjugation operator.

There is nothing special about using \({\hat{S}_2}\), it’s just convention.

Time reversal doesn’t really affect time per se, but rather changes the direction of movement, be it linear momentum or angular momentum.

It’s an antiunitary operator (it must be, in fact), consisting of a unitary spin-rotation part followed by the antiunitary complex conjugation operator, so that it is antiunitary overall. It affects electrons by flipping their spin and then taking the complex conjugate.

One final detail about the symmetry operators before we go on: observe that since SU(2) is a double cover (of the rotation group SO(3)), if we rotate in spin space by \({2\pi}\), we flip the sign of the wave function. If we rotate by another \({2\pi}\), we get the original state back. This is characteristic of fermions, of which electrons are a part.

Let us now consider the unitary transformations of the form \({U= exp(i\hat{B})}\) where \({\hat{B} = \hat{B}^{\dagger}}\), acting on some operator \({\hat{A}}\). This is valid for any unitary transformation that depends on a continuous parameter (which we have absorbed into \({\hat{B}}\)). We insist

\[\hat{A} = \hat{U}\hat{A}\hat{U}^{-1} = e^{(-i\hat{B})} \hat{A} e^{(i\hat{B})} \ \ \ \ \ (13)\]

Now, by the Baker-Campbell-Hausdorff expansion, we can rewrite the right hand side as

\[e^{(-i\hat{B})} \hat{A} e^{(i\hat{B})} = \hat{A} + \left[\hat{A},i\hat{B}\right] + \frac{1}{2!} \left[\left[\hat{A},i\hat{B}\right],i\hat{B}\right] + \cdots \ \ \ \ \ (14)\]

Since by definition \({\hat{A} = e^{(-i\hat{B})} \hat{A} e^{(i\hat{B})}}\), it suffices to show that the constraints on \({\hat{A}}\) introduced by the symmetry operation are satisfied when \({\left[\hat{A},i\hat{B}\right] = 0}\).
This is an excellent result, because it means that we can just look at how the \({2\times2}\) matrix \({\mathbf{A}}\) transforms under the Pauli spin matrices which define the SU(2) spin rotation as well as the time-reversal operation (in the time-reversal case, we can use the BCH expansion of the unitary part \({\hat{S}_y}\) and then absorb the complex conjugation into the commutator expression).

We can show how RHF, UHF, and GHF all fall out of the different symmetry combinations (or lack thereof).

From here on out, I’ll drop the hats on my operators for clarity.

Let’s start by considering how many different types of symmetry subgroups we can have. A moment’s thought about the form of the spin rotation and time-reversal groups gives the following subgroups:

\({\mathbf{I}}\). This means there is no symmetry.
\({\Theta}\). This means we are just symmetric to time-reversal symmetry.
\({K}\). Only symmetric to complex conjugation.
\({S_z}\). Only symmetric to spin rotations about the z-axis.
\({S_z}\),\({\Theta}\). Symmetric to both spin rotation about z, as well as time-reversal symmetry.
\({S_z}\), \({K}\). Symmetric to spin rotation about z, as well as complex conjugation.
\({S^2}\). Symmetric to all spin rotations.
\({S^2}\), \({\Theta}\). Symmetric to all spin rotations and time reversal symmetries.

That’s it! That’s all the cases you can get.

Let’s go through them, case by case, using the complex Fock matrix as an example, i.e.

\[\left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{F}_{\alpha\beta} \\ \mathbf{F}_{\beta\alpha} & \mathbf{F}_{\beta\beta} \end{array} \right) \ \ \ \ \ (15)\]

Where we solve, for each symmetry (or group of symmetries),

\[\mathbf{h} \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{F}_{\alpha\beta} \\ \mathbf{F}_{\beta\alpha} & \mathbf{F}_{\beta\beta} \end{array} \right)\mathbf{h}^{-1} = \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{F}_{\alpha\beta} \\ \mathbf{F}_{\beta\alpha} & \mathbf{F}_{\beta\beta} \end{array} \right) \ \ \ \ \ (16)\]

where \({\mathbf{h}}\) is the generator of the symmetry.

\({\mathbf{I}}\). No symmetry.

In this case, our transformation on \({\mathbf{F}}\) is rather simple. It looks like

\[\mathbf{I} \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{F}_{\alpha\beta} \\ \mathbf{F}_{\beta\alpha} & \mathbf{F}_{\beta\beta} \end{array} \right)\mathbf{I}^{-1} = \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{F}_{\alpha\beta} \\ \mathbf{F}_{\beta\alpha} & \mathbf{F}_{\beta\beta} \end{array} \right) \ \ \ \ \ (17)\]

So we get no constraints. We can mix spin, as well as take on complex values. This is the structure of the complex generalized Hartree Fock (GHF) Fock matrix.

\({K}\). Complex conjugation symmetry.

If the only symmetry that holds is complex conjugation, our transformation looks like

\[K \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{F}_{\alpha\beta} \\ \mathbf{F}_{\beta\alpha} & \mathbf{F}_{\beta\beta} \end{array} \right)K= \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha}^{*} & \mathbf{F}_{\alpha\beta}^{*} \\ \mathbf{F}_{\beta\alpha}^{*} & \mathbf{F}_{\beta\beta}^{*} \end{array} \right) \ \ \ \ \ (18)\]

Note that \({K}\) is its own inverse. It also only acts to either the left or the right. The asterisk indicates complex conjugation (not an adjoint!).

The constraint we get here is that the values of the Fock matrix have to be identical on complex conjugation. Since this can only happen if the values are real, we get the real GHF Fock equations.

\({\Theta}\). Time reversal symmetry.

Now we start to get slightly more complicated. Using the Pauli matrix

\[\sigma_2 = \left( \begin{array}{cc} 0 & -i \\ i & 0 \end{array} \right) \ \ \ \ \ (19)\]

to represent the unitary \({S_2}\) operation gives us

\[-i\left( \begin{array}{cc} 0 & -i \\ i & 0 \end{array} \right) K \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{F}_{\alpha\beta} \\ \mathbf{F}_{\beta\alpha} & \mathbf{F}_{\beta\beta} \end{array} \right) i K \left( \begin{array}{cc} 0 & i \\ -i & 0 \end{array} \right) = \left( \begin{array}{cc} \mathbf{F}_{\beta\beta}^{*} & -\mathbf{F}_{\beta\alpha}^{*} \\ -\mathbf{F}_{\alpha\beta}^{*} & \mathbf{F}_{\alpha\alpha}^{*} \end{array} \right) \ \ \ \ \ (20)\]

We see that this really only introduces two constraints, so we choose to eliminate \({\mathbf{F}_{\beta\alpha}}\) and \({\mathbf{F}_{\beta\beta}}\). This gives the final result of paired GHF, or

\[\left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{F}_{\alpha\beta} \\ -\mathbf{F}_{\alpha\beta}^{*} & \mathbf{F}_{\alpha\alpha}^{*} \end{array} \right) \ \ \ \ \ (21)\]
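If you want a numerical sanity check, here is a small NumPy sketch (my own construction, not from the references): build a random matrix with the paired GHF block structure of Eq. (21) and verify that it is unchanged by \({\sigma_2 \mathbf{F}^{*} \sigma_2}\), which is all time reversal amounts to here once the \({\pm i}\) phases cancel.

import numpy as np

n = 3
Faa = np.random.rand(n, n) + 1j*np.random.rand(n, n)
Fab = np.random.rand(n, n) + 1j*np.random.rand(n, n)

# paired GHF structure of Eq. (21)
F = np.block([[Faa,          Fab],
              [-Fab.conj(),  Faa.conj()]])

# sigma_2 acting on the spin blocks (identity within each block)
s2 = np.kron(np.array([[0, -1j], [1j, 0]]), np.eye(n))

assert np.allclose(s2 @ F.conj() @ s2, F)  # invariant under time reversal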

\({S_z}\). Rotation about spin z-axis.

Here we use the Pauli matrix

\[\sigma_3 = \left( \begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array} \right) \ \ \ \ \ (22)\]

And show that

\[\left( \begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array} \right) \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{F}_{\alpha\beta} \\ \mathbf{F}_{\beta\alpha} & \mathbf{F}_{\beta\beta} \end{array} \right) \left( \begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array} \right) = \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & -\mathbf{F}_{\alpha\beta} \\ -\mathbf{F}_{\beta\alpha} & \mathbf{F}_{\beta\beta} \end{array} \right) \ \ \ \ \ (23)\]

which is only satisfied if

\[\mathbf{F} = \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}_{\beta\beta} \end{array} \right) \ \ \ \ \ (24)\]

This gives us the complex version of UHF. We see invariance with respect to \({S_z}\) results in two separate spin blocks, with no restriction on whether they take real or complex values, or the dimension of either spin block.
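The same kind of numerical check works here (again just a sketch of my own): a matrix with the block-diagonal structure of Eq. (24) commutes with \({\sigma_3}\), while any off-diagonal spin blocks would flip sign.

import numpy as np

n = 3
Faa = np.random.rand(n, n) + 1j*np.random.rand(n, n)
Fbb = np.random.rand(n, n) + 1j*np.random.rand(n, n)
zero = np.zeros((n, n))

# complex UHF structure of Eq. (24)
F = np.block([[Faa,  zero],
              [zero, Fbb]])

s3 = np.kron(np.diag([1.0, -1.0]), np.eye(n))  # sigma_3 on the spin blocks
assert np.allclose(s3 @ F @ s3, F)             # invariant under rotations about the spin z-axis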

\({S_z}\),\({\Theta}\). Rotation about spin z-axis and time reversal.

We are now at the point where we examine the effect of invariance to multiple symmetry operations.

It might concern you that considering multiple symmetry operations means that the order in which we perform them matters. In general, symmetry operations do not commute, but we will see that for our purposes order really doesn’t matter. Because we insist on invariance, the multiple symmetries never actually act on each other, so we don’t need to consider the commutator between them.

Or, put a different way: because each symmetry operation returns the system to its original state, we can consider each operation separately. The system contains no memory of the previous symmetry operation.

We can show this another way using the BCH expansion. Consider two symmetry operations on \({F}\) parameterized by \({A}\) and \({B}\):

\[e^{(-iA)}e^{(-iB)}Fe^{(iB)}e^{(iA)} = e^{(-iA)} [F + [F,iB] + \cdots ] e^{(iA)} = e^{(-iA)}Fe^{(iA)} = [F + [F,iA] + \cdots ] \ \ \ \ \ (25)\]

Which is true if and only if \({[F,iB] = [F,iA] = 0}\), which decouples \({A}\) and \({B}\). Considering multiple symmetry operations only gives us more constraints, and order doesn’t matter. Let’s see it in action then for time reversal and z-axial spin symmetry. Using the results for time reversal symmetry, we have

\[\left( \begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array} \right) \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{F}_{\alpha\beta} \\ -\mathbf{F}_{\alpha\beta}^{*} & \mathbf{F}_{\alpha\alpha}^{*} \end{array} \right) \left( \begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array} \right) = \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & -\mathbf{F}_{\alpha\beta} \\ \mathbf{F}_{\alpha\beta}^{*} & \mathbf{F}_{\alpha\alpha}^{*} \end{array} \right) \ \ \ \ \ (26)\]

Which means the off diagonals must go to zero, giving our final result of paired UHF,

\[\left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}_{\alpha\alpha}^{*} \end{array} \right) \ \ \ \ \ (27)\]

\({S_z}\),\({K}\). Rotation about spin z-axis and complex conjugation.

We do a similar thing as above for rotation about spin z-axis and complex conjugation. This one is particularly easy to show, starting from the results of complex conjugation symmetry.

Since symmetry with respect to \({K}\) forces all matrix elements to be real, we just get the real version of the results of symmetry with respect to \({S_z}\) — we get the real UHF equations!

\[\mathbf{F} = \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}_{\beta\beta} \end{array} \right), \qquad (\mathbf{F} \in \mathbb{R}) \ \ \ \ \ (28)\]

\({S^2}\),\({S_z}\). Rotation about all spin axes.

We finally move onto invariance with respect to rotations about all spin axes. Again, this is a little weird because these operations aren’t commutative, but we have already shown that insisting on invariance leads to a decoupling of the symmetry operations.

(I should mention that if you are still unsure of this, feel free to brute force through all orders of symmetry operations. You’ll see that it makes no difference.)

Two things worth mentioning here: first, technically \({S_z}\) is already a part of the total spin rotation group, so it’s a little weird to separate them into \({S^2}\) and \({S_z}\), but we understand this as drawing the distinction that you can be symmetric to \({S_z}\) but not \({S^2}\).

If you are invariant to \({S^2}\), though, you will be invariant to \({S_z}\).

Second, while \({S^2}\) does technically have an operator representation, it is not a symmetry operation. Think about the form of the operator \({S^2}\): it’s essentially the identity, right? So when we say invariant with respect to \({S^2}\), what we really mean is that we are invariant to the whole spin rotation group, \({S_x,S_y,S_z}\).

To show invariance with respect to the spin group, it suffices just to consider any two spin rotations, since each spin operator can be generated by the commutator of the other two.

You can show this using the Jacobi identity. Say we are looking at the invariance of \({F}\) with respect to generators \({A}\), \({B}\), and \({C}\), with \({[A,B] = iC}\) (like our spin matrices do). We want to show

\[[F, A] = [F, B] = 0 \implies [F, C] = 0\]

The Jacobi identity tells us

\[[A, [B, F]] + [B, [F, A]] + [F, [A, B]] = 0\]

Now, by definition the first two terms are zero, and we can evaluate the commutator of \({[A,B] = iC}\), which means if \({[F,A] = [F,B] = 0}\), then it must follow that \({[F,C] = 0}\) as well (the imaginary in front doesn’t make a difference; expand to see).

That being said, we can evaluate invariance with respect to all spin axes by using the results of \({S_z}\) and applying the generator \({S_x}\), defined by the Pauli matrix \({\sigma_1}\), to it, where

\[\sigma_1 = \left( \begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array} \right) \ \ \ \ \ (29)\]

Applying this gives

\[\left( \begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array} \right)\left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}_{\beta\beta} \end{array} \right)\left( \begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array} \right) = \left( \begin{array}{cc} \mathbf{F}_{\beta\beta} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}_{\alpha\alpha} \end{array} \right) \ \ \ \ \ (30)\]

Which means that \({\mathbf{F}_{\alpha\alpha} = \mathbf{F}_{\beta\beta}}\), or

\[\mathbf{F} = \left( \begin{array}{cc} \mathbf{F}_{R} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}_{R} \end{array} \right), \qquad (\mathbf{F}_{R} \in \mathbb{C}) \ \ \ \ \ (31)\]

Invariance in this symmetry group results in the complex RHF equations, where the alpha and beta spin blocks are equivalent. Thus orbitals are doubly occupied.

\({S^2}\),\({S_z}\),\({\Theta}\),\({K}\). All spin rotations and time reversal.

Given the results we just obtained above, and understanding that time reversal contains the \({S_y}\) operator, we only need to take the previous results and make them invariant to complex conjugation. This is very simple, and we see that

\[K\left( \begin{array}{cc} \mathbf{F}_{R} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}_{R} \end{array} \right)K = \left( \begin{array}{cc} \mathbf{F}_{R}^{*} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}_{R}^{*} \end{array} \right) \ \ \ \ \ (32)\]

In other words, the real RHF equations, since invariance with respect to complex conjugation forces the elements to be real.

N.B. Most of this work was first done by Fukutome, though my notes are based on Stuber and Paldus. The notes here agree with both authors, though the derivations here make a rather large departure from the literature. Thus, it is helpful to reference the following work:

Fukutome, Hideo. “Unrestricted Hartree-Fock theory and its applications to molecules and chemical reactions.” International Journal of Quantum Chemistry 20.5 (1981): 955-1065.

Stuber, J. L., and J. Paldus. “Symmetry breaking in the independent particle model.” Fundamental World of Quantum Chemistry, A Tribute Volume to the Memory of Per-Olov Löwdin 1 (2003): 67-139.

I’d strongly recommend looking at Stuber and Paldus’ work first. Their terminology is more consistent with mainstream electronic structure theory.

Linear Response Coupled Cluster

Linear response coupled cluster (LR-CC) is a way of getting electronic excitation energies at a coupled cluster level of theory. It’s also a general way of getting many response properties, which basically answer the question of “what do molecules do when you poke them”?

Here is a simple derivation of LR-CC, which I like because it parallels the derivation of linear response time-dependent DFT and Hartree-Fock. There are much more robust ways to get these results (propagator approaches, for example, have been particularly successful when applied to coupled cluster response theory), but I think they can mask a lot of the physics if you aren’t careful.

Anyway, here it is.

The coupled cluster ground state equations are given by

\[\hat{H}^{(0)} e^{\hat{T}^{(0)}} \vert \phi_0 \rangle = E_{CC}^{(0)} e^{T^{(0)}} \vert \phi_0 \rangle \ \ \ \ \ (1)\]

Since this is, by definition, a stationary state, the following is equivalent

\[\hat{H}^{(0)} e^{\hat{T}^{(0)}} \vert \phi_0 \rangle e^{-iE_{CC}^{(0)}t} = E_{CC}^{(0)} e^{T^{(0)}} \vert \phi_0 \rangle e^{-iE_{CC}^{(0)}t} \ \ \ \ \ (2)\]

Since

\[\vert \phi(t) \rangle = \vert \phi(0)\rangle e^{-i E t} \ \ \ \ \ (3)\]

Now, perturbing the system with monochromatic light of frequency \({\omega}\) gives us the modified Hamiltonian (in the dipole approximation):

\[\hat{H} = \hat{H}^{(0)} + \hat{f}^{(1)}e^{-i\omega t} + \hat{f}^{(1)*}e^{i\omega t} \ \ \ \ \ (4)\]

This induces a response in the wavefunction, which we parameterize through the cluster operator \({\hat{T}}\).

\[\hat{T} = \hat{T}^{(0)} + \hat{R}^{(1)}e^{-i \omega t} + R^{(1)*}e^{i \omega t} + \cdots \ \ \ \ \ (5)\]

The extra terms are higher order responses in the cluster operator. I should also note that this means T and R are in the same excitation manifold: in other words, if T is a single excitation operator, so is R. Same for doubles, triples, etc. Anyway, we are only considering the linear response here. Furthermore, from here on out I won’t write the complex conjugate parts. Though they are still there (and I will indicate them by dots), ultimately we collect terms of only \({e^{-i \omega t}}\) to get our final equations. We could collect the conjugates as well, but would get the same equations in the end.

The time dependent Schrödinger equation is

\[\hat{H}\vert \psi \rangle = i \frac{\partial}{\partial t} \vert \psi \rangle \ \ \ \ \ (6)\]

Plugging in the above perturbed expressions yields

\[\left( \hat{H}^{(0)} + \hat{f}^{(1)}e^{-i \omega t} + \cdots \right) \left( e^{\hat{T}^{(0)} + \hat{R}^{(1)}e^{-i\omega t} + \cdots} \right) \vert \phi_0 \rangle e^{-i E_{CC}^{(0)} t} = i \frac{\partial}{\partial t} \left( e^{\hat{T}^{(0)} + \hat{R}^{(1)}e^{-i\omega t} + \cdots} \right) \vert \phi_0 \rangle e^{-i E_{CC}^{(0)} t} \ \ \ \ \ (7)\]

Since \({\hat{T^{(0)}}}\) and \({\hat{R}^{(1)}}\) commute, we can rewrite as

\[\left( \hat{H}^{(0)} + \hat{f}^{(1)}e^{-i \omega t} + \cdots \right) \left( e^{\hat{T}^{(0)}}e^{\hat{R}^{(1)}e^{-i\omega t} + \cdots} \right) \vert \phi_0 \rangle e^{-i E_{CC}^{(0)} t} = i \frac{\partial}{\partial t} \left( e^{\hat{T}^{(0)}}e^{\hat{R}^{(1)}e^{-i\omega t} + \cdots} \right) \vert \phi_0 \rangle e^{-i E_{CC}^{(0)} t} \ \ \ \ \ (8)\]

Expanding the exponential containing \({\hat{R}^{(1)}}\)

\[\left( \hat{H}^{(0)} + \hat{f}^{(1)}e^{-i \omega t} + \cdots \right) \left[e^{\hat{T}^{(0)}}\left(1 + \hat{R}^{(1)}e^{-i\omega t} + \cdots \right) \right] \vert \phi_0 \rangle e^{-i E_{CC}^{(0)} t} = i \frac{\partial}{\partial t} \left[e^{\hat{T}^{(0)}}\left(1 + \hat{R}^{(1)}e^{-i\omega t} + \cdots \right) \right] \vert \phi_0 \rangle e^{-i E_{CC}^{(0)} t} \ \ \ \ \ (9)\]

Finally, perform the differentiation, making sure not to forget the time-dependent phase

\[\left( \hat{H}^{(0)} + \hat{f}^{(1)}e^{-i \omega t} + \cdots \right) \left[e^{\hat{T}^{(0)}}\left(1 + \hat{R}^{(1)}e^{-i\omega t} + \cdots \right) \right] \vert \phi_0 \rangle e^{-i E_{CC}^{(0)} t} = i e^{\hat{T}^{(0)}} \left(-i \omega {R}^{(1)}e^{-i\omega t} - i E_{CC}^{(0)} \hat{R}^{(1)} e^{-i \omega t} + \cdots \right) \vert \phi_0 \rangle e^{-i E_{CC}^{(0)} t} \ \ \ \ \ (10)\]

Collect terms with \({e^{-i\omega t}}\), cancel out phase factors

\[\hat{H}^{(0)}e^{\hat{T}^{(0)}} \hat{R}^{(1)} \vert \phi_0 \rangle + \hat{f}^{(1)} e^{\hat{T}^{(0)}} \vert \phi_0 \rangle = \omega e^{\hat{T}^{(0)}} \hat{R}^{(1)} \vert \phi_0 \rangle + E_{CC}^{(0)} e^{\hat{T}^{(0)}} \hat{R}^{(1)} \vert \phi_0 \rangle \ \ \ \ \ (11)\]

In linear response theory, we assume the perturbation is small and send \({\hat{f}^{(1)} \rightarrow 0}\).

\[\hat{H}^{(0)}e^{\hat{T}^{(0)}} \hat{R}^{(1)} \vert \phi_0 \rangle = (\omega + E_{CC}^{(0)}) e^{\hat{T}^{(0)}} \hat{R}^{(1)} \vert \phi_0 \rangle \ \ \ \ \ (12)\]

Thus the eigenvalues are just the ground state coupled cluster energy plus an excitation energy.

As an aside, it is possible (in fact, this is how LR-CC is done in practice) to get rid of the \({E_{CC}^{(0)}}\) entirely, so that you solve for the excitation energies directly. How? The simplest way to think about it is that when you evaluate the left hand side, you find that it contains expressions for the ground state energy (times \({R}\)), so if we leave out these terms when solving the equations, we can solve for the excitation energies directly. In other words we force the ground state energies on both sides to cancel by being careful how we set up the equation. And so we get an equation like so:

\[(e^{-\hat{T}^{(0)}}\hat{H}^{(0)}e^{\hat{T}^{(0)}})_c \hat{R}^{(1)} \vert \phi_0 \rangle = \omega \hat{R}^{(1)} \vert \phi_0 \rangle \ \ \ \ \ (13)\]

Here’s the more jargony answer: if you apply Wick’s theorem to the Hausdorff expansion of the similarity transformed coupled cluster Hamiltonian, you retain only terms that share a sum with each other. Diagrammatically, this means we keep only the connected diagrams. If you go ahead and derive all diagrams, connected and disconnected, you find that the disconnected diagrams correspond exactly to the terms you want to cancel on the right.