Davidson's Method

Often in quantum chemistry you want just the lowest few eigenvalues of very large matrices. For example, in my work with quantum dots, we want the first few excitation energies of our dots from a TDDFT calculation. Our smallest systems have 1000 basis functions with roughly 300 electrons, which requires creating and diagonalizing a 210,000 by 210,000 matrix (the dimension is the number of occupied orbitals times the number of unoccupied orbitals). This is way too huge! For perspective, just storing this matrix on disk requires about 300 GB of space.
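
As a quick back-of-the-envelope check on that storage estimate (assuming double precision): an \(N \times N\) matrix of 8-byte numbers takes \(8N^2\) bytes, so

\[8 \times (2.1 \times 10^{5})^{2} \approx 3.5 \times 10^{11} \ \hbox{bytes},\]

which is a few hundred gigabytes before we have even started diagonalizing.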

Ernest Davidson ran into this problem decades ago working on diagonalizing Configuration Interaction Hamiltonians, which are generally enormous. He came up with a method, the so-called Davidson method, which iteratively diagonalizes a subspace of the matrix instead of the whole thing and gives you the first few lowest (or highest!) eigenvalues. It is much more cost-efficient, and actually doesn’t require you to create the whole matrix in the first place (it projects the matrix onto an appropriate subspace instead). The method turned out to work so well that quantum chemists adopted it and have used it ever since.

I wanted to try it out, so I implemented Davidson’s method on a Hermitian matrix (in Python, of course :)).

Let’s step through what I’ve done and see if it makes this method any clearer. The first bit simply creates our fake Hamiltonian.

  
import numpy as np

n = 1200                 # dimension of our fake Hamiltonian (same as in the full listing below)
sparsity = 0.000001  
A = np.zeros((n,n))  
for i in range(0,n):  
    A[i,i] = i + 1  
A = A + sparsity*np.random.randn(n,n)  
A = (A.T + A)/2  

While it may look arbitrary, take a closer look at the structure. First, it is diagonally dominant: the diagonal is filled with increasing integers, while the off-diagonals are random numbers multiplied by a small scaling factor to “mute” them, which makes the matrix effectively sparse. Davidson’s method really excels with sparse, diagonally dominant matrices, and this is actually very similar to the Hamiltonians we encounter as quantum chemists. The closer you scale the sparsity factor toward zero, the faster the Davidson method converges. (But don’t set it to exactly zero or the routine will crash; in that case the matrix is already diagonal, and there is nothing left to do!)

Next we set up our subspace “trial vectors”:

  
k = 8 # number of initial guess vectors  
eig = 4 # number of eigenvalues to solve  
t = np.eye(n,k) # set of k unit vectors as guess  
V = np.zeros((n,n)) # array of zeros to hold guess vec  
I = np.eye(n) # identity matrix same dimen as A  

Since we are choosing to find the first four eigenvalues, we need at least four guess vectors, k. In practice, we choose maybe two to three times that, because we want to increase the span of our guess space. In other words, it helps us home in on the appropriate eigenvectors faster. But don’t make the guess too big! If it gets too large, we basically end up diagonalizing the whole matrix, which we don’t want to do, since avoiding that is the whole point of Davidson’s method. Speaking of the span of the guess vectors, it is important to make a good initial guess. Because the matrix is diagonally dominant, I chose a set of unit vectors as my guess, which is a good choice since the matrix is already nearly diagonal and its true eigenvectors are close to unit vectors.
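
If the diagonal weren’t already sorted the way this toy matrix’s is, a common variant (just a sketch, not what the code here uses) is to seed the guess with unit vectors sitting at the k smallest diagonal entries:

idx = np.argsort(np.diag(A))[:k]   # positions of the k smallest diagonal elements
t = np.zeros((n,k))
t[idx, np.arange(k)] = 1.0         # one unit "trial" vector per column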

Finally we get to the meat of the main routine:

  
for m in range(k,mmax,k):
    if m <= k:
        for j in range(0,k):
            V[:,j] = t[:,j]/np.linalg.norm(t[:,j])
        theta_old = 1 
    elif m > k:
        theta_old = theta[:eig]
    V[:,:m],R = np.linalg.qr(V[:,:m])
    T = np.dot(V[:,:m].T,np.dot(A,V[:,:m]))
    THETA,S = np.linalg.eig(T)
    idx = THETA.argsort()
    theta = THETA[idx]
    s = S[:,idx]
    for j in range(0,k):
        w = np.dot((A - theta[j]*I),np.dot(V[:,:m],s[:,j])) 
        q = w/(theta[j]-A[j,j])
        V[:,(m+j)] = q
    norm = np.linalg.norm(theta[:eig] - theta_old)
    if norm < tol:
        break

(Thanks to Matthew Goldey for pointing out ways to speed up the code!)

We first check to see if this is our first iteration (i.e. m <= k, which only happens on the first pass, since m starts at k). If it is, we normalize our guess vectors and add them to our set V, and set theta_old equal to one. Setting theta_old equal to one is arbitrary; it just ensures we don’t “converge” on our first try (we can’t, since we have to compare at least two iterations to determine convergence). If this isn’t our first iteration, we set theta_old to the previous iteration’s eigenvalues.

Next we orthonormalize our vectors with numpy’s QR decomposition routine (the subspace vectors must be orthonormal or the method breaks down). Then we project our matrix A onto the subspace defined by our guess vectors, \(\mathbf{V}^{T}\mathbf{A}\mathbf{V}\), diagonalize this small projected matrix, and sort its eigenvalues to pick out the lowest few we want. For each target root we then form the residual vector w, which would be exactly zero if our subspace already contained the true eigenvector, scale it by a simple preconditioner, and append the resulting correction vector to our set of guess vectors. If the eigenvalues have stopped changing between iterations (their change drops below tol), we quit; otherwise we keep iterating, and the subspace grows each time. The hope is that the added vectors quickly pull the subspace into the span of the true eigenvectors, and once that happens we have the lowest few eigenvalues.
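
In equations, each pass forms, for the j-th target root, the residual and a preconditioned correction vector (written here with the standard diagonal preconditioner; the code above uses the single element \(A_{jj}\) as a simpler scalar denominator):

\[\mathbf{w}_j = (\mathbf{A} - \theta_j \mathbf{I})\mathbf{V}\mathbf{s}_j, \qquad q_{j,i} = \frac{w_{j,i}}{\theta_j - A_{ii}}\]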

Here is some sample output:

davidson = [0.99999921 2.00000133 3.00000042 3.99999768] ; 0.6596 seconds  
numpy = [0.99999921 2.00000133 3.00000042 3.99999768] ; 1.7068 seconds

You can see it works! (And the eigenvalues are really similar to the integer values we put along the diagonal).

I’ve attached the full routine at the end. With a sparse enough matrix, I can beat numpy by about a second. Of course, comparisons aren’t really fair, since I make numpy compute all the eigenvalues. From what I know, this method has a lot of intricacies, and I am still learning many of them. If you know of any ways I can improve this code, let me know! Comments and messages are always appreciated.

#!/bin/python
from __future__ import division
from __future__ import print_function
import math
import numpy as np
import time

''' Block Davidson, Joshua Goings (2013)

    Block Davidson method for finding the first few
	lowest eigenvalues of a large, diagonally dominant,
    sparse Hermitian matrix (e.g. Hamiltonian)
'''

n = 1200					# Dimension of matrix
tol = 1e-8				# Convergence tolerance
mmax = n//2				# Maximum subspace dimension

''' Create sparse, diagonally dominant matrix A with 
	diagonal containing 1,2,3,...n. The eigenvalues
    should be very close to these values. You can 
    change the sparsity. A smaller number for sparsity
    increases the diagonal dominance. Larger values
    (e.g. sparsity = 1) create a dense matrix
'''

sparsity = 0.0001
A = np.zeros((n,n))
for i in range(0,n):
    A[i,i] = i + 1 
A = A + sparsity*np.random.randn(n,n) 
A = (A.T + A)/2 


k = 8					# number of initial guess vectors 
eig = 4					# number of eigenvalues to solve 
t = np.eye(n,k)			# set of k unit vectors as guess
V = np.zeros((n,n))		# array of zeros to hold guess vec
I = np.eye(n)			# identity matrix same dimen as A

# Begin block Davidson routine

start_davidson = time.time()

for m in range(k,mmax,k):
    if m <= k:
        for j in range(0,k):
            V[:,j] = t[:,j]/np.linalg.norm(t[:,j])
        theta_old = 1 
    elif m > k:
        theta_old = theta[:eig]
    V[:,:m],R = np.linalg.qr(V[:,:m])
    T = np.dot(V[:,:m].T,np.dot(A,V[:,:m]))
    THETA,S = np.linalg.eig(T)
    idx = THETA.argsort()
    theta = THETA[idx]
    s = S[:,idx]
    for j in range(0,k):
        w = np.dot((A - theta[j]*I),np.dot(V[:,:m],s[:,j])) 
        q = w/(theta[j]-A[j,j])
        V[:,(m+j)] = q
    norm = np.linalg.norm(theta[:eig] - theta_old)
    if norm < tol:
        break

end_davidson = time.time()

# End of block Davidson. Print results.

print("davidson = ", theta[:eig],";",
    end_davidson - start_davidson, "seconds")

# Begin Numpy diagonalization of A

start_numpy = time.time()

E,Vec = np.linalg.eig(A)
E = np.sort(E)

end_numpy = time.time()

# End of Numpy diagonalization. Print results.

print("numpy = ", E[:eig],";",
     end_numpy - start_numpy, "seconds") 

Monte Carlo Simulation of the "Bookshelf" Problem

I had a friend share with me this probability problem her professor had posed to the class — and, yes, it was actually posed this colorfully:

“During a massive earthquake, all twelve of your books — of which five are mathematical, four are literary, and three are reference — are cast haphazardly across your living room. A fireman graciously put them back on the shelf while your wounds are being tended; however, he simply decides to randomly arrange them. What is the probability that the books are in natural sets; i.e., all mathematical books are adjacent, and all reference books are adjacent?”

Okay. Now, probability is one of those things I love doing, but am absolutely horrid at. I had no idea how to approach the problem, so I thought it would be a good candidate for a Monte Carlo simulation instead. The idea behind a Monte Carlo simulation comes from the frequentist school of probability: the probability of an event is the relative frequency with which it occurs in the limit of a large number of trials. For example, saying that flipping a coin has a 50% probability of landing ‘heads’ and a 50% probability of landing ‘tails’ is equivalent to saying that if we were to flip a coin two million times, we’d expect it to land heads about 1 million of those times. Now flipping a coin two million times (or more!) is tedious and terrible work for people, but computers excel at doing the same thing over and over again. So with a suitable random number generator, computers can accurately reproduce probabilities of events, especially if the exact probability is not as obvious as flipping a coin. Like arranging a bookshelf, say.
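
To make that concrete, here is a minimal coin-flip sketch of the same idea (the function name estimate_heads is just for illustration):

from __future__ import division
import random

def estimate_heads(trials):
    # Estimate P(heads) as the relative frequency of heads over many random flips
    heads = 0
    for _ in range(trials):
        if random.random() < 0.5:  # fair coin: heads with probability 0.5
            heads += 1
    return heads / trials

print(estimate_heads(2000000))  # should land very close to 0.5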

In the bookshelf problem, we can program our computer to take the 12 books, arrange them at random, and then count up the times it formed a ‘natural set’. Simply divide the number of times the bookshelf was ordered by the number of times we arranged the bookshelf, and we get a probability for an ordered bookshelf! Here is an example of how I solved it in Python:

First, import the random and itertools modules, which help us generate random numbers and iterate over the lists. Then define two functions: one to check whether all the elements of a list (or a slice of one) are identical, and one to generate random integers from 1 to 12 inclusive.

#!/bin/python

from __future__ import division  
import random  
import itertools  
from time import time

# returns True if all elements of the list are identical, False if not  
def checkEqual(lst):  
    return lst[1:] == lst[:-1]

# Get random integer from [1,12]  
def rnum():  
    return random.randint(1,12)  

Then we build our bookshelf. You can see that — pre-earthquake — the books formed a so-called ‘natural set’. We initialize the total number of times we arrange our shelf, as well as the number of ordered arrangements, to zero. We haven’t done anything yet!

Then we begin the main loop. Each iteration of the loop does two big things: first, it randomly arranges the bookshelf, and second, it checks to see if it forms a natural set. The checking is where you see all the “if/elif/else” sections of code. Since there are only 6 ways we can form a natural set (3! ways to arrange 3 types of books), I just go ahead and check each case explicitly.

#!/bin/python

from __future__ import division
from __future__ import print_function
import random
import itertools
from time import time

# returns True if all elements of the list are identical, False if not
def checkEqual(lst):
  return lst[1:] == lst[:-1]

# Get random integer from [1,12]
def rnum():
  return random.randint(1,12)

# Build our bookshelf!
bookshelf = ['math','math','math','math','math','lit','lit','lit','lit','ref','ref','ref']
TOTAL = 0
ORDERED = 0
t0 = time()
for i in range(0,5000000):
  TOTAL += 1   # Count number of times we put the bookshelf back together
  pile = [] # Nothing in our pile yet!
  pile += bookshelf # Books falling off into our pile!
  new_shelf = [] #Now the shelf is empty!
  while len(pile) > 0:
    book = random.choice(pile) # Put the books on the shelf at random
    new_shelf.append(book)
    pile.remove(book) # remove book from pile once we put it on the shelf

  # At this point, we evaluate how the shelf is arranged. There are 3! ways
  # of arranging the shelf so that it is ordered, and we evaluate each one to see
  # if the new shelf fits the 'natural set' criteria.
  if checkEqual(new_shelf[:5]):
    if checkEqual(new_shelf[5:9]):
      if checkEqual(new_shelf[9:]):
        ORDERED +=1
    elif checkEqual(new_shelf[5:8]):
      if checkEqual(new_shelf[8:]):
        ORDERED +=1
  if checkEqual(new_shelf[:4]):
    if checkEqual(new_shelf[4:9]):
      if checkEqual(new_shelf[9:]):
        ORDERED +=1
    elif checkEqual(new_shelf[4:7]):
      if checkEqual(new_shelf[7:]):
        ORDERED +=1
  if checkEqual(new_shelf[:3]):
    if checkEqual(new_shelf[3:8]):
      if checkEqual(new_shelf[8:]):
        ORDERED +=1
    elif checkEqual(new_shelf[3:7]):
      if checkEqual(new_shelf[7:]):
        ORDERED +=1
t1 = time()
print('ORDERED = ',ORDERED)
print('TOTAL = ',TOTAL)
print('PERCENT = ',(ORDERED/TOTAL)*100,'%')
print('TIME = ',round(t1-t0,5),'seconds')
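
As an aside, those explicit if/elif checks could be collapsed by precomputing the 3! = 6 “natural set” arrangements up front and testing membership. A sketch of that alternative (which also gives the itertools import something to do):

natural = set()
for perm in itertools.permutations([['math']*5, ['lit']*4, ['ref']*3]):
    natural.add(tuple(book for block in perm for book in block))

# inside the trial loop, the whole cascade would become:
#   if tuple(new_shelf) in natural:
#       ORDERED += 1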

And we’re done! Running the code for a suitable number of iterations gives the probability that the books are ordered. More trials give a more accurate answer; we’d get the exact answer in the limit of infinitely many trials. Running it 1 million times gave me a probability of 0.0228%. Of course, since they were random trials, this number may change a bit, and if you run the code, you may get a different number — though hopefully not too different!

So how does this compare? The exact answer is 0.02165%. We get this by treating the math books as one set, the literary books as one set, and the reference books as one set. There are 3! ways to arrange these three sets. Within each set, there are 5! ways to arrange the math books, 4! ways to arrange the literary books, and 3! ways to arrange the reference books. Thus we have (3!5!4!3!) ways to arrange the books in a natural set. Since we have 12 books total, there are 12! ways to arrange them in any order. This gives us our final probability of (3!5!4!3!)/(12!), or 0.02165%.
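
That ratio is quick to verify directly with math.factorial (a two-line check, nothing more):

from __future__ import division
from math import factorial

p = factorial(3)*factorial(5)*factorial(4)*factorial(3)/factorial(12)
print(p*100)   # ~0.0216 (in percent)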

So our Monte Carlo simulation works, though it obviously does not get the exact answer. This is not surprising, considering we performed only 1 million trials — which seems like a lot, but for a computer, it isn’t. To really reproduce the data, we’d need the total number of trials to come close to the 12! we had in the exact denominator…about 479 million times. For many problems, a big challenge is determining how many trials you have to perform for confident results.

All said and done: don’t bet on that dashing firefighter putting your books back in order :)

UPDATE: I was still curious about how the simulation would run if we approached the total number of ways to arrange the books — that is, somewhere around 479 million trials. So I uploaded my program to our group’s computing clusters, and parallelized the run over 5 processors, each running 100 million trials. The results? Out of 500 million trials, 108084 of them were ordered. This gives us a probability of 0.021617%, which is accurate to +/- 0.0001%! Not bad at all. In case you were curious, each 100 million trials took about 45 minutes. Glad I ran them on separate nodes — otherwise I’d be waiting almost 4 hours! (Which admittedly is not very long in scientific computing…but considering the problem we are dealing with, even 45 minutes is excessive).

Coupled Cluster with Singles and Doubles (CCSD) in Python

Coupled cluster methods are among the most accurate electronic structure methods available today. For example, with a good choice of basis, the CCSD(T) equations will give you the correct energy of a molecular system to chemical accuracy (~1 kcal/mol). They are also quite formidable to derive, and (in my opinion) to program, which is why I only coded up CCSD!

Thankfully there are some wonderful resources available to understand the coupled cluster methods. I highly recommend Crawford and Schaefer’s “An Introduction to Coupled Cluster Theory for Computational Chemists” (2000). I have yet to find a clearer and more complete explanation of coupled cluster theory. The derivations inside it are nasty, but once you get a grasp of diagrammatic techniques, it isn’t so bad :)

In order to understand coupled cluster a bit better, I recently programmed the CCSD energy and amplitude equations in Python. It is for an HeH+ molecule with a bond length of 0.9295 Angstrom, using an STO-3G basis — same system I’ve used before on this blog. The results match what Gaussian09 calculates as well, so I was pretty happy to see it work. As always, I’ve hard-coded the two-electron integrals and other SCF results into the program, so you can just focus on what CCSD does. The functions will look esoteric, and unless you’ve worked with coupled cluster before, the program should NOT look intuitive or easy to understand — point is, don’t panic. But I’ve provided a reference, Stanton et al. (1991), that contains all the equations used. Between Stanton and Crawford, you can understand what is going on here. Read through the comments to get a better idea: the main idea is to take the results of an SCF calculation and use them to build a similarity-transformed Hamiltonian. The transformed Hamiltonian couples the reference to ‘excited’ determinants, which is what brings in electron correlation — in other words you get multi-reference quality from a single reference (a Hartree-Fock calculation).
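
For orientation, in standard notation the CCSD wavefunction and the similarity-transformed Hamiltonian are

\[\vert \Psi_{CC} \rangle = e^{T}\vert \Phi_{HF} \rangle, \qquad \bar{H} = e^{-T}He^{T}, \qquad T = T_1 + T_2\]

with \({T_1 = \sum_{ia} t_i^a a_a^{\dagger}a_i}\) and \({T_2 = \tfrac{1}{4}\sum_{ijab} t_{ij}^{ab} a_a^{\dagger}a_b^{\dagger}a_j a_i}\); those t amplitudes are the ts and td arrays that the code below iterates on.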

Anyway, here you have it:

(Special thanks to Junhao Li for pointing out bugs in the code! Everything should be correct now.)

#!/usr/bin/python

####################################
#
#  CCSD ENERGY CALCULATION ON 
#    HeH+ / STO-3G / R = 0.9295 Ang
#
#  Reference for Equations:
#    Stanton, et al
#      J. Chem. Phys 94 (6),
#      15 March 1991
#
####################################

from __future__ import division
from __future__ import print_function
import math
import numpy as np

####################################
#
#   FUNCTIONS
#
####################################

# Return compound index given four indices
def eint(a,b,c,d):
  if a > b: ab = a*(a+1)/2 + b
  else: ab = b*(b+1)/2 + a
  if c > d: cd = c*(c+1)/2 + d
  else: cd = d*(d+1)/2 + c
  if ab > cd: abcd = ab*(ab+1)/2 + cd
  else: abcd = cd*(cd+1)/2 + ab
  return abcd

# Return Value of spatial MO two electron integral
# Example: (12|34) = teimo(1,2,3,4)
def teimo(a,b,c,d):
  return ttmo.get(eint(a,b,c,d),0.0e0)

####################################
#
#  INITIALIZE ORBITAL ENERGIES 
#  AND TRANSFORMED TWO ELECTRON
#  INTEGRALS  
#
####################################

Nelec = 2 # we have 2 electrons in HeH+
dim = 2 # we have two spatial basis functions in STO-3G
E = [-1.52378656, -0.26763148] # molecular orbital energies
# python dictionary containing two-electron repulsion integrals
ttmo = {5.0: 0.94542695583037617, 12.0: 0.17535895381500544, 14.0: 0.12682234020148653, 17.0: 0.59855327701641903, 19.0: -0.056821143621433257, 20.0: 0.74715464784363106}
ENUC = 1.1386276671 # nuclear repulsion energy for HeH+ -- constant
EN   = -3.99300007772 # SCF energy
####################################################
#
#  CONVERT SPATIAL TO SPIN ORBITAL MO
#
####################################################

# This makes the spin basis double bar integral (physicists' notation)

spinints=np.zeros((dim*2,dim*2,dim*2,dim*2))
for p in range(1,dim*2+1):
  for q in range(1,dim*2+1):
    for r in range(1,dim*2+1):
      for s in range(1,dim*2+1):
        value1 = teimo((p+1)//2,(r+1)//2,(q+1)//2,(s+1)//2) * (p%2 == r%2) * (q%2 == s%2)
        value2 = teimo((p+1)//2,(s+1)//2,(q+1)//2,(r+1)//2) * (p%2 == s%2) * (q%2 == r%2)
        spinints[p-1,q-1,r-1,s-1] = value1 - value2

#####################################################
#
#  Spin basis fock matrix eigenvalues 
#
#####################################################

fs = np.zeros((dim*2))
for i in range(0,dim*2):
    fs[i] = E[i//2]
fs = np.diag(fs) # put MO energies in diagonal array

#######################################################
#
#   CCSD CALCULATION
#
#######################################################
dim = dim*2 # twice the dimension of spatial orbital

# Init empty T1 (ts) and T2 (td) arrays

ts = np.zeros((dim,dim))
td = np.zeros((dim,dim,dim,dim))

# Initial guess T2 --- from MP2 calculation!

for a in range(Nelec,dim):
  for b in range(Nelec,dim):
    for i in range(0,Nelec):
      for j in range(0,Nelec):
        td[a,b,i,j] += spinints[i,j,a,b]/(fs[i,i] + fs[j,j] - fs[a,a] - fs[b,b])
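
# Quick sanity check on the guess: contracting these T2 amplitudes with the
# integrals gives the MP2 correlation energy, E(MP2) = 1/4 sum_{ijab} <ij||ab> t_abij
EMP2 = 0.0
for a in range(Nelec,dim):
  for b in range(Nelec,dim):
    for i in range(0,Nelec):
      for j in range(0,Nelec):
        EMP2 += 0.25*spinints[i,j,a,b]*td[a,b,i,j]
print("E(MP2,corr) = ", EMP2)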

# Make denominator arrays Dai, Dabij
# Equation (12) of Stanton
Dai = np.zeros((dim,dim))
for a in range(Nelec,dim):
  for i in range(0,Nelec):
    Dai[a,i] = fs[i,i] - fs[a,a]

# Stanton eq (13)
Dabij = np.zeros((dim,dim,dim,dim))
for a in range(Nelec,dim):
  for b in range(Nelec,dim):
    for i in range(0,Nelec):
      for j in range(0,Nelec):
        Dabij[a,b,i,j] = fs[i,i] + fs[j,j] - fs[a,a] - fs[b,b]

# Stanton eq (9)
def taus(a,b,i,j):
  taus = td[a,b,i,j] + 0.5*(ts[a,i]*ts[b,j] - ts[b,i]*ts[a,j])
  return taus

# Stanton eq (10)
def tau(a,b,i,j):
  tau = td[a,b,i,j] + ts[a,i]*ts[b,j] - ts[b,i]*ts[a,j]
  return tau

# We need to update our intermediates at the beginning, and 
# at the end of each iteration. Each iteration provides a new
# guess at the amplitudes T1 (ts) and T2 (td), that *hopefully*
# converges to a stable, ground-state, solution.

def updateintermediates(x):
  if x == True:
    # Stanton eq (3)
    Fae = np.zeros((dim,dim))
    for a in range(Nelec,dim):
      for e in range(Nelec,dim):
        Fae[a,e] = (1 - (a == e))*fs[a,e]
        for m in range(0,Nelec):
          Fae[a,e] += -0.5*fs[m,e]*ts[a,m]
          for f in range(Nelec,dim):
            Fae[a,e] += ts[f,m]*spinints[m,a,f,e] 
            for n in range(0,Nelec):
              Fae[a,e] += -0.5*taus(a,f,m,n)*spinints[m,n,e,f]

    # Stanton eq (4)
    Fmi = np.zeros((dim,dim))
    for m in range(0,Nelec):
      for i in range(0,Nelec):
        Fmi[m,i] = (1 - (m == i))*fs[m,i]
        for e in range(Nelec,dim):
          Fmi[m,i] += 0.5*ts[e,i]*fs[m,e]
          for n in range(0,Nelec):
            Fmi[m,i] += ts[e,n]*spinints[m,n,i,e] 
            for f in range(Nelec,dim):
              Fmi[m,i] += 0.5*taus(e,f,i,n)*spinints[m,n,e,f]

    # Stanton eq (5)
    Fme = np.zeros((dim,dim))
    for m in range(0,Nelec):
      for e in range(Nelec,dim):
        Fme[m,e] = fs[m,e]
        for n in range(0,Nelec):
          for f in range(Nelec,dim):
            Fme[m,e] += ts[f,n]*spinints[m,n,e,f]

    # Stanton eq (6)
    Wmnij = np.zeros((dim,dim,dim,dim))
    for m in range(0,Nelec):
      for n in range(0,Nelec):
        for i in range(0,Nelec):
          for j in range(0,Nelec):
            Wmnij[m,n,i,j] = spinints[m,n,i,j]
            for e in range(Nelec,dim):
              Wmnij[m,n,i,j] += ts[e,j]*spinints[m,n,i,e] - ts[e,i]*spinints[m,n,j,e]
              for f in range(Nelec,dim):
                Wmnij[m,n,i,j] += 0.25*tau(e,f,i,j)*spinints[m,n,e,f]

    # Stanton eq (7)
    Wabef = np.zeros((dim,dim,dim,dim))
    for a in range(Nelec,dim):
      for b in range(Nelec,dim):
        for e in range(Nelec,dim):
          for f in range(Nelec,dim):
            Wabef[a,b,e,f] = spinints[a,b,e,f]
            for m in range(0,Nelec):
              Wabef[a,b,e,f] += -ts[b,m]*spinints[a,m,e,f] + ts[a,m]*spinints[b,m,e,f]
              for n in range(0,Nelec):
                Wabef[a,b,e,f] += 0.25*tau(a,b,m,n)*spinints[m,n,e,f]

    # Stanton eq (8)
    Wmbej = np.zeros((dim,dim,dim,dim))
    for m in range(0,Nelec):
      for b in range(Nelec,dim):
        for e in range(Nelec,dim):
          for j in range(0,Nelec):
            Wmbej[m,b,e,j] = spinints[m,b,e,j]
            for f in range(Nelec,dim):
              Wmbej[m,b,e,j] += ts[f,j]*spinints[m,b,e,f]
            for n in range(0,Nelec):
              Wmbej[m,b,e,j] += -ts[b,n]*spinints[m,n,e,j]
              for f in range(Nelec,dim):
                Wmbej[m,b,e,j] += -(0.5*td[f,b,j,n] + ts[f,j]*ts[b,n])*spinints[m,n,e,f]

    return Fae, Fmi, Fme, Wmnij, Wabef, Wmbej

# makeT1 and makeT2, as they imply, construct the actual amplitudes necessary for computing
# the CCSD energy (or computing an EOM-CCSD Hamiltonian, etc)

# Stanton eq (1)
def makeT1(x,ts,td):
  if x == True:
    tsnew = np.zeros((dim,dim))
    for a in range(Nelec,dim):
      for i in range(0,Nelec):
        tsnew[a,i] = fs[i,a]
        for e in range(Nelec,dim):
          tsnew[a,i] += ts[e,i]*Fae[a,e]
        for m in range(0,Nelec):
          tsnew[a,i] += -ts[a,m]*Fmi[m,i]
          for e in range(Nelec,dim):
            tsnew[a,i] += td[a,e,i,m]*Fme[m,e]
            for f in range(Nelec,dim):
              tsnew[a,i] += -0.5*td[e,f,i,m]*spinints[m,a,e,f]
            for n in range(0,Nelec):
              tsnew[a,i] += -0.5*td[a,e,m,n]*spinints[n,m,e,i]
        for n in range(0,Nelec):
          for f in range(Nelec,dim): 
            tsnew[a,i] += -ts[f,n]*spinints[n,a,i,f]
        tsnew[a,i] = tsnew[a,i]/Dai[a,i]
  return tsnew

# Stanton eq (2)
def makeT2(x,ts,td):
  if x == True:
    tdnew = np.zeros((dim,dim,dim,dim))
    for a in range(Nelec,dim):
      for b in range(Nelec,dim):
        for i in range(0,Nelec):
          for j in range(0,Nelec):
            tdnew[a,b,i,j] += spinints[i,j,a,b]
            for e in range(Nelec,dim):
              tdnew[a,b,i,j] += td[a,e,i,j]*Fae[b,e] - td[b,e,i,j]*Fae[a,e]
              for m in range(0,Nelec):
                tdnew[a,b,i,j] += -0.5*td[a,e,i,j]*ts[b,m]*Fme[m,e] + 0.5*td[b,e,i,j]*ts[a,m]*Fme[m,e]
            for m in range(0,Nelec):
              tdnew[a,b,i,j] += -td[a,b,i,m]*Fmi[m,j] + td[a,b,j,m]*Fmi[m,i]
              for e in range(Nelec,dim):
                tdnew[a,b,i,j] += -0.5*td[a,b,i,m]*ts[e,j]*Fme[m,e] + 0.5*td[a,b,j,m]*ts[e,i]*Fme[m,e]
            for e in range(Nelec,dim):
              tdnew[a,b,i,j] += ts[e,i]*spinints[a,b,e,j] - ts[e,j]*spinints[a,b,e,i]
              for f in range(Nelec,dim):
                tdnew[a,b,i,j] += 0.5*tau(e,f,i,j)*Wabef[a,b,e,f]
            for m in range(0,Nelec):
              tdnew[a,b,i,j] += -ts[a,m]*spinints[m,b,i,j] + ts[b,m]*spinints[m,a,i,j]  
              for e in range(Nelec,dim):
                tdnew[a,b,i,j] +=  td[a,e,i,m]*Wmbej[m,b,e,j] - ts[e,i]*ts[a,m]*spinints[m,b,e,j]
                tdnew[a,b,i,j] += -td[a,e,j,m]*Wmbej[m,b,e,i] + ts[e,j]*ts[a,m]*spinints[m,b,e,i]
                tdnew[a,b,i,j] += -td[b,e,i,m]*Wmbej[m,a,e,j] + ts[e,i]*ts[b,m]*spinints[m,a,e,j]
                tdnew[a,b,i,j] +=  td[b,e,j,m]*Wmbej[m,a,e,i] - ts[e,j]*ts[b,m]*spinints[m,a,e,i]
              for n in range(0,Nelec):
                tdnew[a,b,i,j] += 0.5*tau(a,b,m,n)*Wmnij[m,n,i,j]
            tdnew[a,b,i,j] = tdnew[a,b,i,j]/Dabij[a,b,i,j] 
    return tdnew

# Expression from Crawford, Schaefer (2000) 
# DOI: 10.1002/9780470125915.ch2
# Equation (134) and (173)
# computes CCSD energy given T1 and T2
def ccsdenergy():
  ECCSD = 0.0
  for i in range(0,Nelec):
    for a in range(Nelec,dim):
      ECCSD += fs[i,a]*ts[a,i]
      for j in range(0,Nelec):
        for b in range(Nelec,dim):
          ECCSD += 0.25*spinints[i,j,a,b]*td[a,b,i,j] + 0.5*spinints[i,j,a,b]*(ts[a,i])*(ts[b,j]) 
  return ECCSD

#================
# MAIN LOOP
# CCSD iteration
#================
ECCSD = 0
DECC = 1.0
while DECC > 0.000000001: # arbitrary convergence criteria
  OLDCC = ECCSD
  Fae,Fmi,Fme,Wmnij,Wabef,Wmbej = updateintermediates(True)
  tsnew = makeT1(True,ts,td)
  tdnew = makeT2(True,ts,td)
  ts = tsnew
  td = tdnew
  ECCSD = ccsdenergy()
  DECC = abs(ECCSD - OLDCC)

print "E(corr,CCSD) = ", ECCSD
print "E(CCSD) = ", ECCSD + ENUC + EN

Boom: -2.8626 Hartrees :)

Equation of Motion (EOM) Derivation of RPA

The Equation of Motion derivations of excited state and response properties are elegant, and (in my opinion) very direct. Here I’ll give an example of how the approach can be used to derive TDHF/RPA. We’ve already done it two other ways, and the agreement between them is important if we wish to extend these methods further.

Given an exact ground state, we can say that

\[H \vert n \rangle = E_n \vert n \rangle\]

Define operators \({Q_n^{\dagger}}\) and \({Q_n}\):
\(\vert n \rangle = Q_n^{\dagger} \vert 0 \rangle, \qquad Q_n\vert 0 \rangle = 0 \implies Q_n^{\dagger} = \vert n \rangle \langle 0 \vert\)

These operators generate excited states from the ground state (not excited determinants, as in the case of post-HF correlation methods). So it is clear that, when acting on an exact ground state:

\[[H, Q_n^{\dagger}]\vert 0 \rangle = (E_n - E_0)Q_n^{\dagger} \vert 0 \rangle = \hbar\omega_{0n} Q_n^{\dagger} \vert 0 \rangle\]

Multiplying on the left by an arbitrary state of the form \({\langle 0 \vert \delta Q}\) gives
\(\langle 0 \vert [\delta Q, [H, Q_n^{\dagger}]]\vert 0 \rangle = \hbar\omega_{0n} \langle 0 \vert [\delta Q, Q_n^{\dagger}] \vert 0 \rangle\)

Where we have made use of the fact that \({\langle 0 \vert Q_n^{\dagger} = \langle 0 \vert HQ_n^{\dagger} = 0}\). Note that if we express \({Q}\) by particle-hole operators \({a_p^{\dagger}a_q}\), \({a_p^{\dagger}a_q^{\dagger}a_r a_s}\) with coefficients \({C_{pq}}\) and \({C_{pqrs}}\), then \({\delta Q}\) is given by \({\frac{\partial Q}{\partial C} \delta C}\) for arbitrary variations \({\delta C}\). These are in principle exact, since \({\delta Q \vert 0 \rangle}\) exhausts the whole Hilbert space, such that the above equation corresponds to the full Schrödinger equation. Tamm-Dancoff (or Configuration Interaction Singles) can be obtained by approximating \({\vert 0 \rangle \rightarrow \vert \hbox{HF} \rangle}\) and the operator \({Q_n^{\dagger} = \sum\limits_{ia} C_{ia} a_a^{\dagger}a_i}\), restricting ourselves to 1p-1h excitations. Thus \({\delta Q \vert 0 \rangle = \sum_{ia} a_a^{\dagger}a_i \vert \hbox{HF} \rangle \delta C_{ai}}\), (\({\delta C_{ai}}\) cancels), and
\(\sum\limits_{bj} \langle \hbox{HF} \vert [a_i^{\dagger}a_a, [H, a_b^{\dagger}a_j]]\vert \hbox{HF} \rangle C_{jb} = \hbar\omega \langle \hbox{HF} \vert [a_i^{\dagger}a_a, a_a^{\dagger}a_i] \vert \hbox{HF} \rangle C_{ia}\)

These are the CIS equations. Put another way:
\(\sum\limits_{bj} \left\{(\epsilon_a - \epsilon_i)\delta_{ab}\delta_{ij} + \langle aj \vert \vert ib \rangle \right\} C_{bj} = E^{CIS}C_{ai}\)

Similarly, for RPA/TDHF, if we consider a ground state containing 2p-2h correlations, we can not only create a p-h pair, but also destroy one. Thus (choosing the minus sign for convenience):
\(Q_n^{\dagger} = \sum\limits_{ia} X_{ia} a_a^{\dagger}a_i - \sum\limits_{ia} Y_{ia} a_i^{\dagger}a_a, \qquad \hbox{and} \qquad Q_n \vert \hbox{RPA} \rangle = 0\)

So instead of the basis of only single excitations, and therefore one matrix \({C_{ia}}\), we work in a basis of single excitations and single de-excitations, and have two matrices \({X_{ia}}\) and \({Y_{ia}}\). We also have two kinds of variations \({\delta Q \vert 0 \rangle}\), namely \({a_a^{\dagger}a_i \vert 0 \rangle}\) and \({a_i^{\dagger}a_a \vert 0 \rangle}\). This gives us two sets of equations:
\(\langle \hbox{RPA} \vert [a_i^{\dagger}a_a, [H,Q_n^{\dagger}]]\vert \hbox{RPA} \rangle = \hbar \omega \langle \hbox{RPA} \vert [a_i^{\dagger}a_a, Q_n^{\dagger}] \vert \hbox{RPA} \rangle\)

\[\langle \hbox{RPA} \vert [a_a^{\dagger}a_i, [H,Q_n^{\dagger}]]\vert \hbox{RPA} \rangle = \hbar \omega \langle \hbox{RPA} \vert [a_a^{\dagger}a_i, Q_n^{\dagger}] \vert \hbox{RPA} \rangle\]

These contain only expectation values of our four Fermion operators, which cannot be calculated since we still do not know \({\vert \hbox{RPA} \rangle}\). Thus we assume \({\vert \hbox{RPA} \rangle \rightarrow \vert \hbox{HF} \rangle}\). This gives
\(\langle \hbox{RPA} \vert [a_i^{\dagger}a_a, a_b^{\dagger}a_j] \vert \hbox{RPA} \rangle = \langle \hbox{HF} \vert [a_i^{\dagger}a_a, a_b^{\dagger}a_j] \vert \hbox{HF} \rangle = \delta_{ij}\delta_{ab}\)

The amplitudes for finding the states \({a_a^{\dagger}a_i \vert 0 \rangle}\) and \({a_i^{\dagger}a_a \vert 0 \rangle}\) in the excited state \({\vert n\rangle}\), that is, the p-h and h-p matrix elements of the transition density matrix \({\rho^{(1)}}\), are:
\(\rho^{(1)}_{ai} = \langle 0 \vert a_i^{\dagger}a_a \vert n \rangle \simeq \langle \hbox{HF} \vert [a_i^{\dagger}a_a, Q_n^{\dagger}] \vert \hbox{HF} \rangle = X_{ia}\)

\[\rho^{(1)}_{ia} = \langle 0 \vert a_a^{\dagger}a_i \vert n \rangle \simeq \langle \hbox{HF} \vert [a_a^{\dagger}a_i, Q_n^{\dagger}] \vert \hbox{HF} \rangle = Y_{ia}\]

Thus altogether
\(\begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B}^* & \mathbf{A}^* \\ \end{bmatrix} \begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \\ \end{bmatrix} = \hbar\omega \begin{bmatrix} \mathbf{1} & \mathbf{0} \\ \mathbf{0} & \mathbf{-1} \\ \end{bmatrix} \begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \\ \end{bmatrix}\)

with
\(A_{ia,jb} = \langle \hbox{HF} \vert [a_i^{\dagger}a_a, [H,a_b^{\dagger}a_j]]\vert \hbox{HF} \rangle\)

and
\(B_{ia,jb} = - \langle \hbox{HF} \vert [a_i^{\dagger}a_a, [H,a_j^{\dagger}a_b]]\vert \hbox{HF} \rangle\)

which are the TDHF/RPA equations.

Polarization Propagator Derivation of TDHF/RPA

Following immediately from where we left off last time, when \({A}\) and \({B}\) are one-electron number-conserving operators (i.e., the creation and annihilation operators come in pairs), we have

\[A = \sum\limits_{rs} A_{rs}a_r^{\dagger}a_s, \qquad B = \sum\limits_{rs} B_{rs}a^{\dagger}_ra_s \ \ \ \ \ (28)\]

which allow us to define the general polarization propagator. If the elements did not conserve the number of electrons, we could use the same formalism to get the electron propagator, which describes the attachment/detachment of electrons from a system, and thus processes such as ionization. In general, it is sufficient to consider the typical elements \({A \rightarrow a_r^{\dagger}a_s}\) and \({B \rightarrow a_t^{\dagger}a_u}\). If the spin orbitals of the reference are \({\psi_i, \psi_j}\), …, then a particle-hole basis includes only the excitation and de-excitation operators \({a_a^{\dagger}a_i}\) and \({a_i^{\dagger}a_a}\), respectively. Thus the basis and its dual are:
\(\mathbf{n} = (\mathbf{e} \quad \mathbf{d}), \qquad \mathbf{n}^{\dagger} = (\mathbf{e} \quad \mathbf{d})^{\dagger}\)
\(e_{ia} = a_a^{\dagger}a_i = E_{ia}, \qquad d_{ai} = a_i^{\dagger}a_a = E_{ia}^{\dagger} \ \ \ \ \ (29)\)

This gives, for our resolvent,
\(R(\omega) = \left[\mathbf{n}^{\dagger}(\hbar\omega\hat{1} - \hat{H})\mathbf{n}\right]^{-1} = \begin{bmatrix} \mathbf{e}^{\dagger}(\hbar\omega\hat{1} - \hat{H})\mathbf{e} & \mathbf{e}^{\dagger}(\hbar\omega\hat{1} - \hat{H})\mathbf{d} \\ \mathbf{d}^{\dagger}(\hbar\omega\hat{1} - \hat{H})\mathbf{e} & \mathbf{d}^{\dagger}(\hbar\omega\hat{1} - \hat{H})\mathbf{d} \\ \end{bmatrix}^{-1} \ \ \ \ \ (30)\)

As an example, we give the elements of the first block:
\(E^{\dagger}_{ia}(\hbar\omega\hat{1} - \hat{H})E_{jb} = \hbar\omega E^{\dagger}_{ia} E_{jb} - E^{\dagger}_{ia} \left[H,E_{jb}\right] \ \ \ \ \ (31)\)

with scalar products
\(E^{\dagger}_{ia} E_{jb} = \langle \psi_0 \vert [E^{\dagger}_{ia},E_{jb}]\vert \psi_0 \rangle, \qquad E^{\dagger}_{ia} \left[H,E_{jb}\right] = \langle \psi_0 \vert [E^{\dagger}_{ia},[H,E_{jb}]]\vert \psi_0 \rangle \ \ \ \ \ (32)\)

Evaluating these products gives
\(E^{\dagger}_{ia} E_{jb} = \langle \psi_0 \vert [E^{\dagger}_{ia},E_{jb}]\vert \psi_0 \rangle = \langle \psi_0 \vert E^{\dagger}_{ia}E_{jb}\vert \psi_0 \rangle - \langle \psi_0 \vert E_{jb}E^{\dagger}_{ia}\vert \psi_0 \rangle = \langle \psi_0 \vert E^{\dagger}_{ia}E_{jb}\vert \psi_0 \rangle = \delta_{ia,jb} \ \ \ \ \ (33)\)

and

\(E^{\dagger}_{ia} \left[H,E_{jb}\right] = \langle \psi_0 \vert [E^{\dagger}_{ia},[H,E_{jb}]]\vert \psi_0 \rangle\)
\(E^{\dagger}_{ia} \left[H,E_{jb}\right] = \langle \psi_0 \vert E^{\dagger}_{ia} H E_{jb}\vert \psi_0 \rangle - \langle \psi_0 \vert E^{\dagger}_{ia}E_{jb} H\vert \psi_0 \rangle - \langle \psi_0 \vert H E_{jb} E^{\dagger}_{ia}\vert \psi_0 \rangle + \langle \psi_0 \vert E_{jb} H E^{\dagger}_{ia} \vert \psi_0 \rangle\)
\(E^{\dagger}_{ia} \left[H,E_{jb}\right] = \langle \psi^a_i \vert H \vert \psi_j^b \rangle - \delta_{ij}\delta_{ab}\langle \psi_0 \vert H \vert \psi_0 \rangle\)
\(E^{\dagger}_{ia} \left[H,E_{jb}\right] = E_0\delta_{ij}\delta_{ab} + F_{ab}\delta_{ij} - F_{ij}\delta_{ab} + \langle aj \vert \vert ib \rangle - \delta_{ij}\delta_{ab}E_0\)
\(E^{\dagger}_{ia} \left[H,E_{jb}\right] = (\epsilon_a - \epsilon_i)\delta_{ij}\delta_{ab} + \langle aj \vert \vert ib \rangle = A_{ia,jb}\)
These are precisely the same matrix elements we derived earlier for TDHF. Completing the rest of the blocks, we have:
\(M(\omega) = R(\omega)^{-1} = \begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B}^* & \mathbf{A}^* \\ \end{bmatrix} - \hbar\omega \begin{bmatrix} \mathbf{1} & \mathbf{0} \\ \mathbf{0} & \mathbf{-1} \\ \end{bmatrix} \ \ \ \ \ (34)\)

With \({B_{ia,jb} = \langle ab \vert \vert ij \rangle}\). Now, we saw from the definition of the resolvent that \({R(\omega) \rightarrow \infty}\) as \({\omega \rightarrow \omega_{0n}}\); these are the poles of \({R(\omega)}\). Therefore, \({R(\omega)^{-1} \rightarrow 0}\) there, and
\(\begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B}^* & \mathbf{A}^* \\ \end{bmatrix} \begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \\ \end{bmatrix} = \hbar\omega \begin{bmatrix} \mathbf{1} & \mathbf{0} \\ \mathbf{0} & \mathbf{-1} \\ \end{bmatrix} \begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \\ \end{bmatrix} \ \ \ \ \ (35)\)

which are the RPA/TDHF equations. The eigencolumns determine the linear combinations of excitation and de-excitation operators that produce corresponding approximate excited states when working on the reference state \({\vert \psi_0 \rangle}\).
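
To connect these working equations back to something concrete, here is a minimal numpy sketch of building \({\mathbf{A}}\) and \({\mathbf{B}}\) from the spin-orbital quantities. The names eps, spinints, and nocc are placeholders: the spin-orbital energies, the antisymmetrized integrals \(\langle pq \vert \vert rs \rangle\) in physicists’ notation, and the number of occupied spin orbitals, with occupied orbitals indexed first.

import numpy as np

def build_rpa_matrices(eps, spinints, nocc):
    # eps[p]: spin-orbital energies; spinints[p,q,r,s] = <pq||rs>; occupied orbitals first
    nso = len(eps)
    nvirt = nso - nocc
    A = np.zeros((nocc*nvirt, nocc*nvirt))
    B = np.zeros((nocc*nvirt, nocc*nvirt))
    for i in range(nocc):
        for a in range(nvirt):
            ia = i*nvirt + a
            for j in range(nocc):
                for b in range(nvirt):
                    jb = j*nvirt + b
                    # A_{ia,jb} = (e_a - e_i) delta_ij delta_ab + <aj||ib>
                    A[ia,jb] = (eps[nocc+a] - eps[i])*(i == j)*(a == b) \
                               + spinints[nocc+a, j, i, nocc+b]
                    # B_{ia,jb} = <ab||ij>
                    B[ia,jb] = spinints[nocc+a, nocc+b, i, j]
    return A, B

# Equation (35) is then the generalized eigenvalue problem
#   [[A, B], [B*, A*]] [X; Y] = hw [[1, 0], [0, -1]] [X; Y]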