Many times in quantum chemistry you want the lowest few eigenvalues of very large matrices. For example, in my work with quantum dots, we want the first few excitation energies of our dots from a TDDFT calculation. Our smallest systems have 1000 basis functions with roughly 300 electrons, which requires creating and diagonalizing a 210,000 by 210,000 matrix (the dimension is the number of occupied orbitals times the number of unoccupied orbitals). This is way too huge! For perspective, just storing this matrix on disk requires about 300 GB of space.
Ernest Davidson ran into this problem decades ago while working to diagonalize Configuration Interaction Hamiltonians, which are generally enormous. He came up with a method, the so-called Davidson method, which iteratively diagonalizes a subspace of the matrix instead of the whole thing and gives you the first few lowest (or highest!) eigenvalues. It is much more cost-efficient, and it doesn’t actually require you to build the whole matrix in the first place (it projects the matrix onto an appropriate subspace instead). The method turned out to work so well that quantum chemists adopted it and have used it ever since.
I wanted to try it out, so I implemented Davidson’s method on a Hermitian matrix (in Python, of course :)).
Let’s step through what I’ve done and see if it makes this method any clearer. The first bit simply creates our fake Hamiltonian.
While it may look arbitrary, take a closer look at the structure. First, it is diagonally dominant: the diagonal is filled with increasing integers, while the off-diagonals are random numbers multiplied by a small scaling factor to “mute” them somewhat. This makes the matrix effectively sparse. Davidson’s method really excels with sparse, diagonally dominant matrices, which are actually very similar to the Hamiltonians we encounter as quantum chemists. As you shrink that scaling factor toward zero, the Davidson method converges faster and faster. (But don’t set it to exactly zero or the routine will crash; at that point the matrix is already diagonal!)
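Here is a minimal sketch of what such a fake Hamiltonian could look like (the dimension and the scaling factor below are illustrative values, not necessarily the ones used for the timings later):

```python
import numpy as np

n = 1200              # dimension of the fake Hamiltonian (illustrative value)
sparsity = 0.0001     # scaling factor that "mutes" the off-diagonal noise

# Diagonally dominant "Hamiltonian": increasing integers down the diagonal,
# small symmetric random noise everywhere else.
A = np.diag(np.arange(1, n + 1, dtype=float))
A += sparsity * np.random.randn(n, n)
A = (A + A.T) / 2     # symmetrize so the matrix stays Hermitian
```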
Next we set up our subspace “trial vectors”:
Since we are choosing to find the first four eigenvalues, we need at least four guess vectors k. In practice, we choose maybe two to three times that, because we want to increase the span of our guess space. In other words, it helps us home in on the appropriate eigenvectors faster. But don’t make the guess too big! If it gets too large, we basically end up diagonalizing the whole matrix, which we don’t want to do, since that is the whole point of Davidson’s method. Speaking of the span of the guess vectors, it is important to make a good initial guess. Because the matrix is diagonally dominant, I chose a set of unit vectors as my guess, which is a good choice since our matrix is so nearly diagonal.
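Continuing the sketch above, the guess space can be seeded with unit vectors; k = 4 and a block of 2k guesses are illustrative choices:

```python
k = 4                 # number of lowest eigenvalues we want
guess_dim = 2 * k     # a few extra guess vectors to widen the span

# Unit vectors are a good initial guess for a diagonally dominant matrix:
# the true eigenvectors are nearly aligned with the coordinate axes.
V = np.eye(n, guess_dim)
```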
Finally we get to the meat of the main routine:
(Thanks to Matthew Goldey for pointing out ways to speed up the code!)
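Here is a minimal sketch of that main loop, continuing from the A, n, k, and V defined in the sketches above. The bookkeeping is simplified compared to the full routine described below, and the new search directions are the standard diagonal-preconditioned residual corrections; tol and max_iter are arbitrary choices.

```python
tol = 1e-8            # convergence tolerance on the k lowest eigenvalues
max_iter = 20         # cap on the number of subspace-growing iterations

theta_old = np.ones(k)    # arbitrary; guarantees we don't "converge" on iteration 1

for it in range(max_iter):
    # Keep the subspace vectors orthonormal; the routine fails without this
    V, _ = np.linalg.qr(V)

    # Project A onto the subspace, V^T A V, and diagonalize the small matrix
    T = V.T @ A @ V
    theta, S = np.linalg.eigh(T)      # eigh returns eigenvalues in ascending order

    # Converged once the k lowest eigenvalues stop changing
    if np.linalg.norm(theta[:k] - theta_old) < tol:
        break
    theta_old = theta[:k]

    # Form residuals for the k lowest Ritz pairs, precondition them with the
    # diagonal of A, and append the corrections to the search space
    new_dirs = []
    for j in range(k):
        w = V @ S[:, j]                        # Ritz vector in the full space
        r = A @ w - theta[j] * w               # residual; zero for an exact eigenvector
        denom = theta[j] - np.diag(A)          # diagonal (Davidson) preconditioner
        denom[np.abs(denom) < 1e-12] = 1e-12   # guard against division by zero
        new_dirs.append(r / denom)
    V = np.hstack([V, np.column_stack(new_dirs)])

print("davidson eigenvalues:", theta[:k])
```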
We first check to see if this is our first iteration (i.e., m < k). If it is, we add our guess vectors to our set V and set theta_old equal to one. Setting theta_old equal to one is arbitrary; it just ensures we don’t “converge” on our first try (we can’t, since we have to compare at least two iterations to determine convergence). If this isn’t our first iteration, we set theta_old to the previous iteration’s eigenvalues.
Next we ensure our vectors are orthonormal using numpy’s QR decomposition routine; the vectors must be orthonormal or the routine will fail. Then we project our matrix A onto the subspace defined by our guess vectors, \(\mathbf{V}^{T}\mathbf{A}\mathbf{V}\). We diagonalize this projected matrix, sort the eigenvalues to find the lowest few we want, and compute the residual r. The residual will be zero if the guess eigenvectors are the real eigenvectors. If the residual is lower than some criterion, we quit. Else, we orthonormalize the eigenvectors from that iteration and add them to our guess vectors. In this way, our subspace grows each time we iterate…the hope is that each added vector brings the subspace closer to containing the real eigenvectors. Once that happens, we have solved the problem of getting the lowest few eigenvalues.
Here is some sample output:
You can see it works! (And the eigenvalues are really similar to the integer values we put along the diagonal).
I’ve attached the full routine at the end. With a sparse enough matrix, I can beat numpy by about a second. Of course, comparisons aren’t really fair, since I make numpy compute all the eigenvalues. From what I know, this method has a lot of intricacies, and I am still learning many of them. If you know of any ways I can improve this code, let me know! Comments and messages are always appreciated.
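For the numpy side of that comparison, something like this does the trick (reusing A and k from the sketches above; the Davidson loop can be timed the same way by wrapping it in time.time() calls):

```python
import time

start = time.time()
evals_np = np.linalg.eigh(A)[0]      # numpy is forced to compute the full spectrum
end = time.time()

print("numpy's lowest %d eigenvalues:" % k, evals_np[:k])
print("numpy took %.3f seconds" % (end - start))
```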
I had a friend share with me this probability problem her professor had posed to the class — and, yes, it was actually posed this colorfully:
“During a massive earthquake, all twelve of your books — of which five are mathematical, four are literary, and three are reference — are cast haphazardly across your living room. A fireman graciously put them back on the shelf while your wounds are being tended; however, he simply decides to randomly arrange them. What is the probability that the books are in natural sets; i.e., all mathematical books are adjacent, and all reference books are adjacent?”
Okay. Now, probability is one of those things I love doing, but am absolutely horrid at. I had no idea how to approach the problem, so I thought it would be a good candidate for a Monte Carlo simulation instead. The idea behind a Monte Carlo simulation comes from the frequentist school of probability. The idea is that the probability of an event happening is the relative frequency of that event happening in the limit of a large number of trials. For example, if we were to say that flipping a coin has a 50% probability of landing ‘heads’ and a 50% probability of landing ‘tails’, this would be equivalent to saying that if we were to flip a coin two million times, we’d expect it to land heads 1 million of those times. Now flipping a coin two million times (or more!) is tedious and terrible work for people, but computers excel at doing the same thing over and over again. So with a suitable random number generator, computers can accurately reproduce probabilities of events, especially if the exact probability is not as obvious as flipping a coin. Like arranging a bookshelf, say.
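As a toy illustration of that frequentist picture, here is how the coin flip could be “measured” (the trial count is arbitrary):

```python
import random

trials = 2000000
heads = sum(random.randint(0, 1) for _ in range(trials))   # count 1 as 'heads'
print("estimated P(heads):", heads / trials)               # hovers around 0.5
```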
In the bookshelf problem, we can program our computer to take the 12 books, arrange them at random, and then count up the times it formed a ‘natural set’. Simply divide the number of times the bookshelf was ordered by the number of times we arranged the bookshelf, and we get a probability for an ordered bookshelf! Here is an example of how I solved it in Python:
First, import the random and itertools modules, to help us generate random numbers and iterate among the lists. Then define two functions: one to check if two lists (or parts of lists) have all the same elements, and one to generate random numbers from 1 to 12 inclusive.
Then we build our bookshelf. You can see that — pre-earthquake — the books formed a so-called ‘natural set’. We initialize the total number of times we arrange our shelf, as well as the number of ordered arrangements, to zero. We haven’t done anything yet!
Then we begin the main loop. Each iteration of the loop does two big things: first, it randomly arranges the bookshelf, and second, it checks whether it forms a natural set. The checking is where you see all the “if/elif/else” sections of code. Since there are only 6 ways we can form a natural set (3! ways to arrange the 3 types of books), I just go ahead and check each case explicitly.
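My listing checks those six cases with explicit if/elif branches; the sketch below does the same check by looping over the 3! orderings of the three categories with itertools.permutations, which is equivalent (the book labels and trial count are illustrative):

```python
import random
import itertools

# Books 1-5 are mathematical, 6-9 literary, 10-12 reference
math_books = set(range(1, 6))
lit_books  = set(range(6, 10))
ref_books  = set(range(10, 13))
shelf      = list(range(1, 13))     # pre-earthquake: already a natural set

trials  = 1000000
ordered = 0

for _ in range(trials):
    random.shuffle(shelf)           # the fireman's random arrangement

    # A 'natural set' means the three categories sit in contiguous blocks,
    # in any of the 3! = 6 possible category orders.
    for sets in itertools.permutations([math_books, lit_books, ref_books]):
        start = 0
        for s in sets:
            block = set(shelf[start:start + len(s)])
            if block != s:
                break
            start += len(s)
        else:                       # all three blocks matched this ordering
            ordered += 1
            break

print("estimated probability:", ordered / trials)
```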
And we’re done! Running the code for a suitable number of iterations gives the probability that the books are ordered. More simulations give a more accurate answer; we’d get the exact answer in the limit of infinitely many trials. Running it 1 million times gave me a probability of 0.0228%. Of course, since these were random trials, this number may change a bit, and if you run the code, you may get a different number — though hopefully not too different!
So how does this compare? The exact answer is 0.02165%. We get this by treating the math books as one set, the literary books as one set, and the reference books as one set. There are 3! ways to arrange these three sets. Within each set, there are 5! ways to arrange the math books, 4! ways to arrange the literary books, and 3! ways to arrange the reference books. Thus we have (3!5!4!3!) ways to arrange the books in a natural set. Since we have 12 books in total, there are 12! possible arrangements overall. This gives us our final probability of (3!5!4!3!)/(12!), or 0.02165%.
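That arithmetic is easy to double-check with math.factorial:

```python
from math import factorial

p = factorial(3) * factorial(5) * factorial(4) * factorial(3) / factorial(12)
print(p * 100)   # ~0.021645, i.e. about 0.02165%
```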
So our Monte Carlo simulation works, though it obviously does not get the exact answer. This is not surprising, considering we performed only 1 million trials — which seems like a lot, but for a computer, it isn’t. To really reproduce the data, we’d need the total number of trials to come close to the 12! we had in the exact denominator…about 479 million times. For many problems, a big challenge is determining how many trials you have to perform for confident results.
All said and done: don’t bet on that dashing firefighter putting your books back in order :)
UPDATE: I was still curious about how the simulation would run if we approached the total number of ways to arrange the books — that is, somewhere around 479 million trials. So I uploaded my program to our group’s computing clusters, and parallelized the run over 5 processors, each running 100 million trials. The results? Out of 500 million trials, 108084 of them were ordered. This gives us a probability of 0.021617%, which is accurate to +/- 0.0001%! Not bad at all. In case you were curious, each 100 million trials took about 45 minutes. Glad I ran them on separate nodes — otherwise I’d be waiting almost 4 hours! (Which admittedly is not very long in scientific computing…but considering the problem we are dealing with, even 45 minutes is excessive).
Coupled cluster methods are among the most accurate electronic structure methods available today. For example, with a good choice of basis, the CCSD(T) equations will give you the correct energy of a molecular system to chemical accuracy (~1 kcal/mol). They are also quite formidable to derive, and (in my opinion) to program. Which is why I only coded up CCSD!
Thankfully there are some wonderful resources available for understanding coupled cluster methods. I highly recommend Crawford and Schaefer’s “An Introduction to Coupled Cluster Theory for Computational Chemists” (2000). I have yet to find a clearer and more complete explanation of coupled cluster theory. The derivations inside it are nasty, but once you get a grasp of diagrammatic techniques, it isn’t so bad :)
In order to understand coupled cluster a bit better, I recently programmed the CCSD energy and amplitude equations in Python. It is for a HeH+ molecule with a bond length of 0.9295 Angstrom, using an STO-3G basis — same system I’ve used before on this blog. The results match what Gaussian09 calculates as well, so I was pretty happy to see it work. As always, I’ve hard-coded the two-electron integrals and other SCF results into the program, so you can just focus on what CCSD does. The functions will look esoteric, and unless you’ve worked with coupled cluster before, the program should NOT look intuitive or easy to understand — point is, don’t panic. But I’ve provided a reference, Stanton et al. (1991), that contains all the equations used. Between Stanton and Crawford, you can understand what is going on here. Read through the comments to get a better idea: the main idea is to take the results of an SCF calculation and use them to build a similarity-transformed Hamiltonian. The transformed Hamiltonian now contains ‘excited’ determinants, which is the requirement for describing electron correlation — in other words, you get a multi-reference-quality calculation from a single reference (a Hartree-Fock calculation).
Anyway, here you have it:
(Special thanks to Junhao Li for pointing out bugs in the code! Everything should be correct now.)
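The final energy evaluation, at least, is compact: once the amplitude equations have converged, the CCSD correlation energy is the standard spin-orbital expression given in Stanton et al. (1991). Here is a sketch, where the Fock matrix fs, the antisymmetrized two-electron integrals spinints, the converged amplitudes ts (singles) and td (doubles), and the index lists occ and virt are assumed to exist already (hypothetical names, with amplitudes stored as ts[a, i] and td[a, b, i, j]):

```python
def ccsd_energy(fs, spinints, ts, td, occ, virt):
    """CCSD correlation energy from converged singles and doubles amplitudes.

    E_CCSD = sum_ia f_ia t_i^a
           + 1/4 sum_ijab <ij||ab> t_ij^ab
           + 1/2 sum_ijab <ij||ab> t_i^a t_j^b
    """
    energy = 0.0
    for i in occ:
        for a in virt:
            energy += fs[i, a] * ts[a, i]
            for j in occ:
                for b in virt:
                    energy += 0.25 * spinints[i, j, a, b] * td[a, b, i, j]
                    energy += 0.5  * spinints[i, j, a, b] * ts[a, i] * ts[b, j]
    return energy
```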
The Equation of Motion (EOM) derivation of excited state and response properties is elegant and (in my opinion) very direct. Here I’ll give an example of how it can be used to derive TDHF/RPA. We’ve already done it two other ways, and the agreement between the approaches is important if we wish to extend these methods further.
These operators generate excited states from the ground state (not excited determinants, as in the case of post-HF correlation methods). So it is clear that, when acting on an exact ground state:

\(Q_n^{\dagger} \vert 0 \rangle = \vert n \rangle, \qquad Q_n \vert 0 \rangle = 0\)

The equation of motion for these operators is then

\([H, Q_n^{\dagger}] \vert 0 \rangle = (E_n - E_0) Q_n^{\dagger} \vert 0 \rangle = \hbar\omega_{0n} Q_n^{\dagger} \vert 0 \rangle\)
Multiplying on the left by an arbitrary state of the form \({\langle 0 \vert \delta Q}\) gives
\(\langle 0 \vert [\delta Q, [H, Q_n^{\dagger}]]\vert 0 \rangle = \hbar\omega_{0n} \langle 0 \vert [\delta Q, Q_n^{\dagger}] \vert 0 \rangle\)
Where we have made use of the fact that \({\langle 0 \vert Q_n^{\dagger} = \langle 0 \vert HQ_n^{\dagger} = 0}\). Note that if we express \({Q}\) in terms of particle-hole operators \({a_p^{\dagger}a_q}\), \({a_p^{\dagger}a_q^{\dagger}a_r a_s}\), … with coefficients \({C_{pq}}\) and \({C_{pqrs}}\), then \({\delta Q}\) is given by \({\frac{\partial Q}{\partial C} \delta C}\) for arbitrary variations \({\delta C}\). These equations are in principle exact, since \({\delta Q \vert 0 \rangle}\) exhausts the whole Hilbert space, so that the above equation is equivalent to the full Schrödinger equation. The Tamm-Dancoff approximation (or Configuration Interaction Singles) is obtained by approximating \({\vert 0 \rangle \rightarrow \vert \hbox{HF} \rangle}\) and the operator \({Q_n^{\dagger} = \sum\limits_{ia} C_{ia} a_a^{\dagger}a_i}\), restricting ourselves to 1p-1h excitations. Thus \({\delta Q \vert 0 \rangle = \sum_{ia} a_a^{\dagger}a_i \vert \hbox{HF} \rangle \delta C_{ai}}\) (the arbitrary \({\delta C_{ai}}\) cancels), and
\(\sum\limits_{bj} \langle \hbox{HF} \vert [a_i^{\dagger}a_a, [H, a_b^{\dagger}a_j]]\vert \hbox{HF} \rangle C_{jb} = \hbar\omega \langle \hbox{HF} \vert [a_i^{\dagger}a_a, a_a^{\dagger}a_i] \vert \hbox{HF} \rangle C_{ia}\)
These are the CIS equations. Put another way:
\(\sum\limits_{bj} \left\{(\epsilon_a - \epsilon_i)\delta_{ab}\delta_{ij} + \langle aj \vert \vert ib \rangle \right\} C_{bj} = E^{CIS}C_{ai}\)
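In matrix form this is just an ordinary Hermitian eigenvalue problem. As a sketch, assuming a vector of spin-orbital energies eps and an antisymmetrized integral tensor spinints with \({\langle pq \vert\vert rs \rangle}\) = spinints[p, q, r, s] (hypothetical names), one could build and diagonalize it like so:

```python
import numpy as np

def cis_excitation_energies(eps, spinints, occ, virt):
    """A[ia,jb] = (eps_a - eps_i) d_ij d_ab + <aj||ib>, then diagonalize."""
    pairs = [(i, a) for i in occ for a in virt]   # composite particle-hole index
    A = np.zeros((len(pairs), len(pairs)))
    for I, (i, a) in enumerate(pairs):
        for J, (j, b) in enumerate(pairs):
            A[I, J] = spinints[a, j, i, b]
            if i == j and a == b:
                A[I, J] += eps[a] - eps[i]
    return np.linalg.eigvalsh(A)    # excitation energies (hbar*omega)
```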
Similarly, for RPA/TDHF, if we consider a ground state containing 2p-2h correlations, we can not only create a p-h pair, but also destroy one. Thus (choosing the minus sign for convenience):
\(Q_n^{\dagger} = \sum\limits_{ia} X_{ia} a_a^{\dagger}a_i - \sum\limits_{ia} Y_{ia} a_i^{\dagger}a_a, \qquad \hbox{and} \qquad Q_n \vert \hbox{RPA} \rangle = 0\)
So instead of the basis of only single excitations, and therefore one matrix \({C_{ia}}\), we work in a basis of single excitations and single de-excitations, and have two matrices \({X_{ia}}\) and \({Y_{ia}}\). We also have two kinds of variations \({\delta Q \vert 0 \rangle}\), namely \({a_a^{\dagger}a_i \vert 0 \rangle}\) and \({a_i^{\dagger}a_a \vert 0 \rangle}\). This gives us two sets of equations:
\(\langle \hbox{RPA} \vert [a_i^{\dagger}a_a, [H,Q_n^{\dagger}]]\vert \hbox{RPA} \rangle = \hbar \omega \langle \hbox{RPA} \vert [a_i^{\dagger}a_a, Q_n^{\dagger}] \vert \hbox{RPA} \rangle\)

\(\langle \hbox{RPA} \vert [a_a^{\dagger}a_i, [H,Q_n^{\dagger}]]\vert \hbox{RPA} \rangle = \hbar \omega \langle \hbox{RPA} \vert [a_a^{\dagger}a_i, Q_n^{\dagger}] \vert \hbox{RPA} \rangle\)
These contain only expectation values of our four Fermion operators, which cannot be calculated since we still do not know \({\vert \hbox{RPA} \rangle}\). Thus we assume \({\vert \hbox{RPA} \rangle \rightarrow \vert \hbox{HF} \rangle}\). This gives
\(\langle \hbox{RPA} \vert [a_i^{\dagger}a_a, a_b^{\dagger}a_j] \vert \hbox{RPA} \rangle = \langle \hbox{HF} \vert [a_i^{\dagger}a_a, a_b^{\dagger}a_j] \vert \hbox{HF} \rangle = \delta_{ij}\delta_{ab}\)
The probability amplitudes for finding the states \({a_a^{\dagger}a_i \vert 0 \rangle}\) and \({a_i^{\dagger}a_a \vert 0 \rangle}\) in the excited state \({\vert n\rangle}\), that is, the p-h and h-p matrix elements of the transition density matrix \({\rho^{(1)}}\), are:
\(\rho^{(1)}_{ai} = \langle 0 \vert a_i^{\dagger}a_a \vert n \rangle \simeq \langle \hbox{HF} \vert [a_i^{\dagger}a_a, Q_n^{\dagger}] \vert \hbox{HF} \rangle = X_{ia}\)

\(\rho^{(1)}_{ia} = \langle 0 \vert a_a^{\dagger}a_i \vert n \rangle \simeq \langle \hbox{HF} \vert [a_a^{\dagger}a_i, Q_n^{\dagger}] \vert \hbox{HF} \rangle = Y_{ia}\)
Following immediately from where we left off last time, when \({A}\) and \({B}\) are one-electron number-conserving operators (i.e., the creation/annihilation operators come in pairs), we have
which allows us to define the general polarization propagator. If the elements did not conserve the number of electrons, we could use the same formalism to get the electron propagator, which describes the attachment/detachment of electrons from a system, and thus processes such as ionization. In general, it is sufficient to consider the typical elements \({A \rightarrow a_r^{\dagger}a_s}\) and \({B \rightarrow a_t^{\dagger}a_u}\). If the spin orbitals of the reference are \({\psi_i, \psi_j}\), …, then a particle-hole basis includes only the excitation and de-excitation operators \({a_a^{\dagger}a_i}\) and \({a_i^{\dagger}a_a}\), respectively. Thus the basis and its dual are:
\(\mathbf{n} = (\mathbf{e} \quad \mathbf{d}), \qquad \mathbf{n}^{\dagger} = (\mathbf{e} \quad \mathbf{d})^{\dagger}\)

\(e_{ia} = a_a^{\dagger}a_i = E_{ia}, \qquad d_{ai} = a_i^{\dagger}a_a = E_{ia}^{\dagger}\)
As an example, we give the elements of the first block:
\(E^{\dagger}_{ia}(\hbar\omega\hat{1} - \hat{H})E_{jb} = \hbar\omega E^{\dagger}_{ia} E_{jb} - E^{\dagger}_{ia} \left[H,E_{jb}\right]\)

\(E^{\dagger}_{ia} \left[H,E_{jb}\right] = \langle \psi_0 \vert [E^{\dagger}_{ia},[H,E_{jb}]]\vert \psi_0 \rangle\)

\(E^{\dagger}_{ia} \left[H,E_{jb}\right] = \langle \psi_0 \vert E^{\dagger}_{ia} H E_{jb}\vert \psi_0 \rangle - \langle \psi_0 \vert E^{\dagger}_{ia}E_{jb} H\vert \psi_0 \rangle - \langle \psi_0 \vert H E_{jb} E^{\dagger}_{ia}\vert \psi_0 \rangle + \langle \psi_0 \vert E_{jb} H E^{\dagger}_{ia} \vert \psi_0 \rangle\)

\(E^{\dagger}_{ia} \left[H,E_{jb}\right] = \langle \psi^a_i \vert H \vert \psi_j^b \rangle - \delta_{ij}\delta_{ab}\langle \psi_0 \vert H \vert \psi_0 \rangle\)

\(E^{\dagger}_{ia} \left[H,E_{jb}\right] = E_0\delta_{ij}\delta_{ab} + F_{ab}\delta_{ij} - F_{ij}\delta_{ab} + \langle aj \vert \vert ib \rangle - \delta_{ij}\delta_{ab}E_0\)

\(E^{\dagger}_{ia} \left[H,E_{jb}\right] = (\epsilon_a - \epsilon_i)\delta_{ij}\delta_{ab} + \langle aj \vert \vert ib \rangle = \mathbf{A}\)
These are precisely the same elements we derived earlier for TDHF. Completing the rest, we have:
\(M(\omega) = R(\omega)^{-1} = \begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B}^* & \mathbf{A}^* \\ \end{bmatrix} - \hbar\omega \begin{bmatrix} \mathbf{1} & \mathbf{0} \\ \mathbf{0} & \mathbf{-1} \\ \end{bmatrix}\)
With \({\mathbf{B} = \langle ab \vert \vert ij \rangle}\). Now, we saw from the definition of the resolvent that \({R(\omega) \rightarrow \infty}\) at \({\omega \rightarrow \omega_{0n}}\), which are the poles of \({R(\omega)}\). Therefore, \({R(\omega)^{-1} \rightarrow 0}\) and
\(\begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B}^* & \mathbf{A}^* \\ \end{bmatrix} \begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \\ \end{bmatrix} = \hbar\omega \begin{bmatrix} \mathbf{1} & \mathbf{0} \\ \mathbf{0} & \mathbf{-1} \\ \end{bmatrix} \begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \\ \end{bmatrix}\)
which are the RPA/TDHF equations. The eigenvectors determine the linear combinations of excitation and de-excitation operators that produce the corresponding approximate excited states when acting on the reference state \({\vert \psi_0 \rangle}\).
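To make the block structure concrete, here is a sketch that builds A and B from the same assumed eps and spinints as in the CIS sketch above and extracts the positive eigenvalues. Multiplying through by the metric diag(1, -1) turns the generalized problem into an ordinary (non-Hermitian) one, which is crude but fine for illustration:

```python
import numpy as np

def rpa_excitation_energies(eps, spinints, occ, virt):
    """Solve [[A, B],[B*, A*]] [X;Y] = w [[1,0],[0,-1]] [X;Y] for w (real orbitals)."""
    pairs = [(i, a) for i in occ for a in virt]
    n = len(pairs)
    A = np.zeros((n, n))
    B = np.zeros((n, n))
    for I, (i, a) in enumerate(pairs):
        for J, (j, b) in enumerate(pairs):
            A[I, J] = spinints[a, j, i, b]      # <aj||ib>
            B[I, J] = spinints[a, b, i, j]      # <ab||ij>
            if i == j and a == b:
                A[I, J] += eps[a] - eps[i]
    # Multiply through by diag(1, -1): eigenvalues of [[A, B], [-B, -A]]
    # come in +/- pairs, and the positive ones are the excitation energies.
    M = np.block([[A, B], [-B, -A]])
    w = np.linalg.eigvals(M)
    return np.sort(w.real[w.real > 1e-10])
```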