Most higher-order response methods use some form of the polarization propagator, which is what I intend to derive here. It can then be used to derive other methods, such as TDHF/RPA and SOPPA.
Earlier, we derived a response function, or frequency-dependent polarizability,
\[\Pi(BA\vert \omega) = \lim_{\eta \rightarrow 0} \left(\frac{1}{\hbar}\right) \sum\limits_{n\neq0} \left\{ \frac{\langle 0 \vert B \vert n\rangle\langle n \vert A \vert 0\rangle}{\omega + i\eta - \omega_{0n}} - \frac{\langle 0 \vert A \vert n\rangle \langle n \vert B \vert 0 \rangle}{\omega + i\eta + \omega_{0n}} \right\} \ \ \ \ \ (1)\]
where \({A}\) is the applied perturbation and \({B}\) is the observable; both are assumed to be Hermitian. \({\omega_{0n}}\) is the excitation energy for the transition between states \({0}\) and \({n}\). It should be clear that the response function has poles when \({\omega}\), the applied field frequency, equals the excitation energy \({\omega_{0n}}\). Finding these poles is precisely the goal of polarization propagator methods. In the polarization propagator approach, \({\eta}\) in the above equation is set to 0, and the response function (the `propagator') is defined as:
\(\langle \langle B;A \rangle \rangle_{\omega} \equiv \sum\limits_{n\neq0} \left\{ \frac{\langle 0 \vert B \vert n\rangle\langle n \vert A \vert 0\rangle}{\hbar\omega - \hbar\omega_{0n}} + \frac{\langle 0 \vert A \vert n\rangle \langle n \vert B \vert 0 \rangle}{-\hbar\omega - \hbar\omega_{0n}} \right\} \ \ \ \ \ (2)\)
Now we want to describe the propagator in terms of commutators between \({A}\) and \({B}\). Make the observation that \({\frac{ab}{c+d} = \frac{ab}{c} - \frac{d}{c}\left(\frac{ab}{c+d}\right)}\); applying this to the first term of the above yields:
\(\sum\limits_{n\neq0} \frac{\langle 0 \vert B \vert n\rangle\langle n \vert A \vert 0\rangle}{\hbar\omega - \hbar\omega_{0n}} = \sum\limits_{n\neq0} \frac{\langle 0 \vert B \vert n\rangle\langle n \vert A \vert 0\rangle}{\hbar\omega} -\sum\limits_{n\neq0} \frac{-\hbar\omega_{0n}\langle 0 \vert B \vert n\rangle\langle n \vert A \vert 0\rangle}{\hbar\omega\left(\hbar\omega - \hbar\omega_{0n}\right)} \ \ \ \ \ (3)\)
Do the same for the second term and combine, recognizing that the \({n=0}\) term vanishes in the first part (thus we get a sum over all \({n}\)), and making use of the fact that \({1 = \sum\limits_{n}\vert n\rangle\langle n\vert }\) and \({H\vert n \rangle = E_n\vert n\rangle}\) and \({\hbar\omega_{0n} = E_n - E_0}\):
\(\langle \langle B;A \rangle \rangle_{\omega} = \frac{1}{\hbar\omega} \langle 0 \vert \left[B,A\right] \vert 0 \rangle + \frac{1}{\hbar\omega} \sum\limits_{n\neq0} \left\{\frac{\langle 0 \vert B \vert n\rangle\langle n \vert \left[H,A\right] \vert 0\rangle}{\hbar\omega - \hbar\omega_{0n}} + \frac{\langle 0 \vert \left[H,A\right] \vert n\rangle\langle n \vert B \vert 0\rangle}{-\hbar\omega - \hbar\omega_{0n}}\right\} \ \ \ \ \ (4)\)
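To see where the second piece comes from, apply the same identity to the second term of (2), with \(c = -\hbar\omega\) and \(d = -\hbar\omega_{0n}\), and use \(-\hbar\omega_{0n}\langle 0 \vert A \vert n \rangle = \langle 0 \vert \left[H,A\right] \vert n \rangle\):

\[\sum\limits_{n\neq0} \frac{\langle 0 \vert A \vert n\rangle\langle n \vert B \vert 0\rangle}{-\hbar\omega - \hbar\omega_{0n}} = -\sum\limits_{n\neq0} \frac{\langle 0 \vert A \vert n\rangle\langle n \vert B \vert 0\rangle}{\hbar\omega} + \frac{1}{\hbar\omega}\sum\limits_{n\neq0} \frac{\langle 0 \vert \left[H,A\right] \vert n\rangle\langle n \vert B \vert 0\rangle}{-\hbar\omega - \hbar\omega_{0n}}\]

Adding this to (3) and collapsing the leading sums with the resolution of the identity gives (4).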
Which is to say that
\(\langle \langle B;A \rangle \rangle_{\omega} = \frac{1}{\hbar\omega} \langle 0 \vert \left[B,A\right] \vert 0 \rangle + \frac{1}{\hbar\omega}\langle \langle B;\left[H,A\right] \rangle \rangle_{\omega} \ \ \ \ \ (5)\)
Or, as we will use it:
\(\hbar\omega\langle \langle B;A \rangle \rangle_{\omega} = \langle 0 \vert \left[B,A\right] \vert 0 \rangle - \langle \langle \left[H,B\right];A \rangle \rangle_{\omega} \ \ \ \ \ (6)\)

The sign flip relative to (5) uses the fact that \({H}\) can be moved from \({A}\) to \({B}\) inside the propagator at the cost of a minus sign, \({\langle \langle B;\left[H,A\right] \rangle \rangle_{\omega} = -\langle \langle \left[H,B\right];A \rangle \rangle_{\omega}}\), which follows from (2) together with \({\langle n \vert \left[H,A\right] \vert 0 \rangle = \hbar\omega_{0n}\langle n \vert A \vert 0 \rangle}\); it is essentially the relation we will meet again in (10).
As you may have started to see, we can define the propagator iteratively in terms of commutator expectation values of ever-increasing complexity. This is the so-called ‘‘moment expansion’’ of the propagator. Thus, by iteration:
\(\langle \langle B;A \rangle \rangle_{\omega} = \frac{1}{\hbar\omega} \left\{ \langle 0 \vert \left[B,A\right] \vert 0 \rangle + \left(\frac{-1}{\hbar\omega}\right)\langle 0 \vert \left[\left[H,B\right],A\right] \vert 0 \rangle + \left(\frac{-1}{\hbar\omega}\right)^2\langle 0 \vert \left[\left[H,\left[H,B\right]\right],A\right] \vert 0 \rangle + \cdots \right\} \ \ \ \ \ (7)\)
We introduce the ‘‘superoperator’’ \({\hat{H}}\) (analogous to the Liouville operator in statistical mechanics), which acts on an operator to give its commutator with the Hamiltonian:

\(\hat{H}B = \left[H,B\right] \ \ \ \ \ (8)\)
With this definition, we have the power series
\(\langle \langle B;A \rangle \rangle_{\omega} = \frac{1}{\hbar\omega} \sum\limits_{n=0}^{\infty} \left(\frac{-1}{\hbar\omega}\right)^n \langle 0 \vert \left[\hat{H}^nB,A\right]\vert 0\rangle \ \ \ \ \ (9)\)
At this point we make two useful observations. First, recognize that
\(\langle 0 \vert \left[\hat{H}B,A\right]\vert 0\rangle = -\langle0\vert \left[B,\hat{H}A\right]\vert 0\rangle \ \ \ \ \ (10)\)
and so \({\hat{H}}\) can be applied to \({A}\) instead of \({B}\), provided we introduce a factor of \({(-1)^n}\). Furthermore, note that the power series has the same form as the geometric series
\(\frac{1}{1-x} = 1 + x + x^2 + x^3 + \cdots \ \ \ \ \ (11)\)
Making use of these two observations (and using \({\hat{1}X = X}\) and \({\hat{H}^0 = \hat{1}}\), where \({\hat{1}}\) is the unit superoperator), we have
\(\langle \langle B;A \rangle \rangle_{\omega} = \langle 0 \vert \left[B,\left(\hbar\omega\hat{1} - \hat{H}\right)^{-1}A\right]\vert 0\rangle \ \ \ \ \ (12)\)
This is merely a cosmetic change at this point, as the superoperator resolvent is defined by its series expansion. We need to find a matrix representation of the resolvent, which means we need a complete basis of operators. To do this, we are going to develop an operator space, where \({\hat{H}}\) is defined by its effect on operators instead of vectors. Introducing the notation
\(\left(X\vert Y\right) = \langle 0 \vert \left[X^{\dagger},Y\right] \vert 0 \rangle \ \ \ \ \ (13)\)
it follows that \({\left(Y\vert X\right) = \left(X\vert Y\right)^*}\). As defined, we now have
\(\langle \langle B;A \rangle \rangle_{\omega} = \left(B^{\dagger}\vert \left(\hbar\omega\hat{1} - \hat{H}\right)^{-1}\vert A \right) \ \ \ \ \ (14)\)
This expression is formally exact, albeit useless until we develop approximations. However, its form does look similar to what we encounter in the ordinary linear vector spaces of Hartree-Fock and related methods. Truncation of a basis in a linear vector space \({V}\) to \({n}\) elements produces a subspace \({V_n}\), and truncation of a general vector corresponds to finding its projection onto that subspace. It follows, then, that we need to find the projection operator \({\rho}\) associated with the truncated basis. If the basis (\({\mathbf{e}}\), say) is orthonormal, we write
\(\rho = \sum\limits_i e_ie_i^* = \mathbf{ee}^{\dagger} \ \ \ \ \ (15)\)
which in a complete basis gives:
\(\rho = \sum\limits_i \vert e_i\rangle \langle e_i\vert = \mathbf{1} \ \ \ \ \ (16)\)
If the basis is not orthonormal, we must include the metric matrix \({\mathbf{S} = \mathbf{e}^{\dagger}\mathbf{e}}\) (or work in the Löwdin basis \({\mathbf{\overline{e}} = \mathbf{e}\mathbf{S}^{-1/2}}\)):
\(\rho = \mathbf{eS}^{-1}\mathbf{e}^{\dagger} = \sum\limits_{ij} e_i (S^{-1})_{ij}e_j^* \ \ \ \ \ (17)\)
When using a truncated basis in operator space, two kinds of projections are useful (Löwdin, 1977, 1982),
\(A' = \rho A \rho, \qquad A'' = A^{1/2} \rho A^{1/2} \ \ \ \ \ (18)\)
which are the outer projection and inner projection, respectively, onto the space \({V_n}\) defined by \({\rho}\). Note that \(AB = C\) implies neither \(A'B' = C'\) nor \(A''B'' = C''\). Plugging the metric into \({A''}\):
\(A'' = A^{1/2}\mathbf{eS}^{-1}\mathbf{e}^{\dagger}A^{1/2} \ \ \ \ \ (19)\)
and we define
\(\mathbf{f} \equiv A^{1/2}\mathbf{e} = \left(A^{1/2}\mathbf{e}_1 \quad A^{1/2}\mathbf{e}_2 \quad \cdots \right) \ \ \ \ \ (20)\)
We assume that \({A}\) is Hermitian and positive-definite, so that \({A^{1/2}}\) can be defined. Note that \({\mathbf{S} = \mathbf{e}^{\dagger}\mathbf{e} = \left(A^{-1/2}\mathbf{f}\right)^{\dagger}\left(A^{-1/2}\mathbf{f}\right) = \mathbf{f}^{\dagger}A^{-1}\mathbf{f} \implies A'' = \mathbf{f}\left(\mathbf{f}^{\dagger}A^{-1}\mathbf{f}\right)^{-1}\mathbf{f}^{\dagger}}\). Because \({A}\) is arbitrary, replace it with \({A^{-1}}\), and since \({\mathbf{f}^{\dagger}A\mathbf{f} = \mathbf{A}}\) with \({A_{ij} = \langle f_i \vert A\vert f_j\rangle}\):
\(\left(A^{-1}\right)'' = \mathbf{f}\left(\mathbf{f}^{\dagger}A\mathbf{f}\right)^{-1}\mathbf{f}^{\dagger} = \mathbf{fA}^{-1}\mathbf{f}^{\dagger} \ \ \ \ \ (21)\)
As the basis becomes complete (\({V_n \rightarrow V}\)), the inner projection goes to \({A^{-1}}\) exactly; otherwise it is simply a finite basis approximation to the inverse. This expresses the operator inverse in terms of a matrix inverse. Since \({\mathbf{e}}\) was an arbitrary basis defining \({V_n}\), let \({\mathbf{f}}\) define the n-dimensional subspace \({V_n'}\). Thus:
\(A^{-1} \approx \mathbf{e}\mathbf{A}^{-1}\mathbf{e}^{\dagger} = \sum\limits_{ij} e_i(\mathbf{e}^{\dagger}A\mathbf{e})^{-1}_{ij}e_j^* \ \ \ \ \ (22)\)
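As a quick numerical illustration of (22) (a toy sketch; the matrix \(A\) and the basis here are random stand-ins), the inner projection reproduces the exact inverse in a complete basis and approximates it in a truncated one:

```python
import numpy as np

# Toy check of the inner projection A^{-1} ~ e (e^T A e)^{-1} e^T.
np.random.seed(0)
N = 6
X = np.random.rand(N, N)
A = X @ X.T + N * np.eye(N)                      # Hermitian, positive definite

e_full = np.linalg.qr(np.random.rand(N, N))[0]   # complete orthonormal basis
e_trunc = e_full[:, :3]                          # truncated basis spanning V_n

def inner_projection_inverse(A, e):
    """Finite-basis (inner projection) approximation to A^{-1}."""
    return e @ np.linalg.inv(e.T @ A @ e) @ e.T

exact = np.linalg.inv(A)
print(np.allclose(inner_projection_inverse(A, e_full), exact))       # True: complete basis
print(np.linalg.norm(inner_projection_inverse(A, e_trunc) - exact))  # finite-basis error
```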
Thus the inner projection leads to a finite basis approximation for the operator inverse. Let us now define the (as yet unspecified) operator basis:
\(\mathbf{n} = (\mathbf{n}_1 \quad \mathbf{n}_2 \quad \mathbf{n}_3 \quad \cdots) \ \ \ \ \ (23)\)
Defining the binary product (compare with \({\langle x\vert y\rangle = x^*y}\) for vectors) as
\(X^{\dagger} \cdot Y = (X\vert Y) = \langle 0 \vert \left[X^{\dagger},Y\right] \vert 0 \rangle \ \ \ \ \ (24)\)
we have, for our resolvent superoperator,
\(\hat{R}(\omega) = (\hbar\omega\hat{1} - \hat{H})^{-1} = \mathbf{n}R(\omega)\mathbf{n}^{\dagger} = \sum\limits_{r,s} \mathbf{n}_r[R(\omega)]_{rs}\mathbf{n}_{s}^{\dagger} \ \ \ \ \ (25)\)
where \({\mathbf{n}_r}\) and \({\mathbf{n}_s^{\dagger}}\) are the analogues of \({\mathbf{e}}\) and \({\mathbf{e}^*}\) in operator space. Finally, if
\(R(\omega) = M(\omega)^{-1}, \qquad M(\omega) = \mathbf{n}^{\dagger} (\hbar\omega\hat{1} - \hat{H})\mathbf{n} \ \ \ \ \ (26)\)
then we have
\(\langle \langle B;A \rangle \rangle_{\omega} = B\cdot \hat{R}(\omega)\cdot A = B \cdot \mathbf{n} R(\omega) \mathbf{n}^{\dagger} \cdot A \ \ \ \ \ (27)\)
which is the key to calculating approximations to response properties. The matrix \(\mathbf{M}\) is determined once we have chosen an operator basis. The quality of the approximation depends on two things: 1) the operator basis \({\mathbf{n}}\) (and how it is truncated), and 2) the reference function used in place of the exact ground state. Approximations to these two things are where we get our various response methods.
If you have the transformed two-electron integrals from a Hartree-Fock calculation, it's trivial to perform second-order Møller-Plesset perturbation theory (MP2). It is a non-iterative way to calculate the electron correlation energy (i.e. the energy obtained from having electrons “avoid” each other in molecules). The expression for the MP2 energy contribution (in a spin orbital basis) looks like this:
\[E_{\rm MP2}=\frac{1}{4}\sum_{ijab}\frac{\vert \langle ij \vert \vert ab \rangle\vert ^2}{\epsilon_i+\epsilon_j-\epsilon_a-\epsilon_b}\]
Where i and j are the indices of the occupied orbitals, and a and b are the indices of the virtual orbitals. The numerator is a “double-bar integral”, which accounts for the Coulomb and exchange interactions between pairs of electrons, and the denominator contains the orbital energies (the Fock matrix eigenvalues). The numerator comes from transforming the two-electron integrals to the molecular orbital basis, and the orbital energies come from a Hartree-Fock calculation. (Well, strictly speaking, all of this comes from a Hartree-Fock calculation…) You can find a derivation of this in Szabo and Ostlund (1996).
This is actually one of the easiest electronic structure methods to program. It is non-iterative (which means no guess vectors, no chaotic convergence behavior, etc.), and can be readily constructed from integrals on hand.
Computationally speaking, we can see that we must construct four loops, one over each of our four indices: the first two loop over occupied orbitals, and the second two loop over virtual orbitals. I've attached some Python code that accomplishes this. Similar to the CIS and TDHF code, I have hard-coded the transformed integrals for HeH+ in an STO-3G basis, at a bond length of 0.9295 Angstroms.
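The core of those four loops might look something like the sketch below (the names are just for illustration: `ints[p,q,r,s]` is assumed to hold the antisymmetrized spin-orbital integrals \(\langle pq \vert \vert rs \rangle\), `fs` the spin-orbital energies, `Nelec` the number of occupied spin orbitals, and `dim` the total number of spin orbitals):

```python
import numpy as np

def mp2_energy(ints, fs, Nelec, dim):
    """Sum the MP2 correlation energy in a spin-orbital basis (real integrals assumed)."""
    emp2 = 0.0
    for i in range(Nelec):                  # occupied
        for j in range(Nelec):              # occupied
            for a in range(Nelec, dim):     # virtual
                for b in range(Nelec, dim): # virtual
                    emp2 += 0.25 * ints[i, j, a, b]**2 / (fs[i] + fs[j] - fs[a] - fs[b])
    return emp2
```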
If you run the code you'll get a correlation energy of -0.00640 Hartrees. This is pretty small, and in fact most correlation energies are: Hartree-Fock does a good job, recovering ~99% of the energy of a molecule. However, the remaining correlation energy still translates to something on the order of 10 kcal/mol, which is chemically significant. This is why including correlation effects is so important; it's hard to make accurate predictions about reactivity with such large error bars when you neglect correlation. MP2 is an easy way to include it. Much more accurate methods, like coupled-cluster methods, will give you energetics to chemical accuracy (~1 kcal/mol), but they are often very costly!
So you've completed a Hartree-Fock procedure, and you've even transformed your two-electron integrals. Now what can you do? We can use those results, specifically the orbital energies (eigenvalues of the Fock matrix) and the two-electron integrals transformed with the MO coefficients (eigenvectors of the Fock matrix), to calculate some simple response properties, such as excitation energies! Configuration Interaction Singles (CIS) is a subset of the TDHF equations. If you haven't performed the previous calculations, I'll include my initial eigenvalues and transformed two-electron integrals at the end. The values are for HeH+ at a bond length of 0.9295 Angstrom with an STO-3G basis set. We create and solve the eigenvalue problem:

\[\begin{pmatrix} \mathbf{A} & \mathbf{B} \\ -\mathbf{B} & -\mathbf{A} \end{pmatrix} \begin{pmatrix} \mathbf{X} \\ \mathbf{Y} \end{pmatrix} = \omega \begin{pmatrix} \mathbf{X} \\ \mathbf{Y} \end{pmatrix}, \qquad A_{ia,jb} = \delta_{ij}\delta_{ab}(\epsilon_a - \epsilon_i) + \langle aj \vert\vert ib \rangle, \qquad B_{ia,jb} = \langle ab \vert\vert ij \rangle\]

where diagonalizing \(\mathbf{A}\) alone (setting \(\mathbf{B} = 0\)) gives the CIS excitation energies, and the full matrix gives TDHF.
Now, when we transformed our two-electron integrals from the atomic orbital (AO) basis to the molecular orbital (MO) basis, we made the implicit assumption that we were working under a closed-shell approximation. Electrons have spin, and we assumed we had an even number of electrons so that all their spins paired and integrated out. If you've taken any general chemistry, you know that we fill each orbital with two electrons: one spin up, and one spin down. (Electrons are fermions, which means two electrons can occupy the same spatial location only if they have opposite spin. Their counterparts, bosons, are a little weirder, which is why we can get Bose-Einstein condensates: think of a lot of particles occupying the same place in space.) Anyway, we make the assumption that opposite-spin electrons can share the same spatial orbital, which is reasonable for most calculations but not strictly true; after all, the definition of an orbital is a one-electron wavefunction. Moving along now…
We need to explicitly account for the spin of the electrons, which means we need to satisfy this transformation:

\[\langle pq \vert rs \rangle = (pr \vert qs) \int \omega_p(1)\omega_r(1)\, d\omega_1 \int \omega_q(2)\omega_s(2)\, d\omega_2\]
Where \(\omega\) is the spin of the electron, with coordinates 1 and 2. Note that \((pq\vert rs) \neq \langle pq \vert rs\rangle\); this is because we are moving from chemists' notation to physicists' notation, which dominates the literature. Since the CIS and TDHF equations need only double bar integrals, e.g.

\[\langle pq \vert \vert rs \rangle = \langle pq \vert rs \rangle - \langle pq \vert sr \rangle\]
We can also account for that in our transformation when we build the spin-orbital integrals in Python.
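A sketch of that spin-blocking step (assuming the spatial MO integrals sit in a 0-based, 4-index NumPy array called `MO`, in chemists' notation, with spin orbitals numbered from 1 as described below):

```python
import numpy as np

def spin_block_tei(MO):
    """Build <pq||rs> over spin orbitals from spatial MO integrals (pq|rs)."""
    nbf = MO.shape[0]              # number of spatial orbitals
    dim = 2 * nbf                  # number of spin orbitals
    ints = np.zeros((dim, dim, dim, dim))
    for p in range(1, dim + 1):
        for q in range(1, dim + 1):
            for r in range(1, dim + 1):
                for s in range(1, dim + 1):
                    # <pq|rs> = (pr|qs) when the spins of p,r and of q,s match
                    value1 = MO[(p+1)//2 - 1, (r+1)//2 - 1, (q+1)//2 - 1, (s+1)//2 - 1] \
                             * (p % 2 == r % 2) * (q % 2 == s % 2)
                    # <pq|sr> = (ps|qr) when the spins of p,s and of q,r match
                    value2 = MO[(p+1)//2 - 1, (s+1)//2 - 1, (q+1)//2 - 1, (r+1)//2 - 1] \
                             * (p % 2 == s % 2) * (q % 2 == r % 2)
                    # double bar integral <pq||rs>, shifted back to 0-based storage
                    ints[p-1, q-1, r-1, s-1] = value1 - value2
    return ints
```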
Now, because Python begins its indexing at 0, and I began my indexing at 1, I had to make the index change at the end. From now on, the indexing starts at 0, so that \((11\vert 11)\) becomes \((00\vert 00)\) and so on. Now that we have our integrals, we can also spin adapt our orbital energies. This basically maps spatial orbital 1 (which contains two electrons) to spin orbitals 1 and 2: odd numbers are spin up, even numbers are spin down. If the spatial orbital energies are found in an array E, they now move to an array fs of double the dimension:
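A sketch of that mapping (again, the names are just illustrative):

```python
import numpy as np

def spin_orbital_energies(E):
    """Each spatial orbital energy E[k] is shared by two spin orbitals."""
    dim = 2 * len(E)
    fs = np.zeros(dim)
    for i in range(dim):
        fs[i] = E[i // 2]   # spin orbitals 2k and 2k+1 share spatial energy E[k]
    return fs
```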
Simple enough.
Putting it all together then (I hard-coded the initial values for self-containment):
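In place of the full hard-coded script, here is a sketch of the assembly and diagonalization, reusing the `spin_block_tei` and `spin_orbital_energies` sketches above (the HeH+ integrals and orbital energies themselves are omitted):

```python
import numpy as np

def cis_tdhf_energies(ints, fs, Nelec):
    """Build the A and B matrices in the spin-orbital basis and diagonalize."""
    dim = len(fs)
    occ = range(Nelec)                    # occupied spin orbitals
    virt = range(Nelec, dim)              # virtual spin orbitals
    nov = Nelec * (dim - Nelec)           # number of single excitations

    A = np.zeros((nov, nov))
    B = np.zeros((nov, nov))
    ia = -1
    for i in occ:
        for a in virt:
            ia += 1
            jb = -1
            for j in occ:
                for b in virt:
                    jb += 1
                    # A_{ia,jb} = (e_a - e_i) delta_ij delta_ab + <aj||ib>
                    A[ia, jb] = (fs[a] - fs[i]) * (i == j) * (a == b) + ints[a, j, i, b]
                    # B_{ia,jb} = <ab||ij>
                    B[ia, jb] = ints[a, b, i, j]

    e_cis = np.linalg.eigvalsh(A)                   # CIS excitation energies
    M = np.block([[A, B], [-B, -A]])                # full TDHF matrix
    e_tdhf = np.sort(np.linalg.eigvals(M).real)     # eigenvalues come in +/- pairs
    return e_cis, e_tdhf[e_tdhf > 0]                # keep the positive roots
```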
Running the code, we find our first excitation energy at the CIS level is E(CIS) = 0.911, and our first excitation at the TDHF level is E(TDHF) = 0.902 (all in Hartrees), in agreement with published literature values. A couple of notes about the method. First, the TDHF method can be seen as an extension of the CIS approach: the \(\mathbf{A}\) matrix in the TDHF working equations is just the CIS matrix. This also means that the TDHF matrix is twice the dimension of the CIS matrix (and so has four times as many elements), which can get very costly. With a little bit of (linear) algebra, we can get the TDHF equations into the form:

\[(\mathbf{A} - \mathbf{B})(\mathbf{A} + \mathbf{B})(\mathbf{X} + \mathbf{Y}) = \omega^2 (\mathbf{X} + \mathbf{Y})\]
which is the same dimension as the CIS matrix. This is why you rarely see CIS in practice: for the same cost, you can get a TDHF calculation, which is more accurate because it includes the off-diagonal blocks. I also want to mention that TDDFT takes almost exactly the same form. The only differences are that we replace the two-electron integrals with the corresponding exchange-correlation kernel elements from DFT, and the HF orbital energies become the Kohn-Sham (KS) orbital eigenvalues. Because DFT generally handles correlation better, most single excitations are modeled using TDDFT. Finally, we did solve for all the excitation energies in the minimal basis, but in general we don't need them all, and it is way too costly to do so. In that case, we can use iterative solvers, like the Davidson method, to pick off just the lowest few eigenvalues. Gaussian09, by default, solves for just the lowest three eigenvalues unless you ask for more.
For your reference, the transformed two-electron integrals \((pq\vert rs)\) are given below; the first four columns are the indices p, q, r, s, and the last column is the value:
Addendum: Getting transition dipole moment
Let's say you have your z-component dipole moment integrals (in the molecular orbital basis) collected into a matrix \(Z_{pq} = \langle p \vert \hat{z} \vert q \rangle\).
You can compute the z-component of the transition dipole moment from your transition density, which is built from the appropriate eigenvector for a given excitation: \(X_{ia}\) (CIS) or \(X_{ia}\) and \(Y_{ai}\) (TDHF). If the dipole integral matrix is \(N \times N\), you'll want to reshape your \(X_{ia}\) and \(Y_{ai}\) to be \(N \times N\) as well, instead of \(OV \times 1\). This gives you an \(N \times N\) transition density matrix with elements

\[D_{ia} = X_{ia}, \qquad D_{ai} = Y_{ai}, \qquad D_{ij} = D_{ab} = 0\]
The expectation value of the z-component of this transition dipole moment is then the trace of the density dotted into the z-component dipole integrals, e.g.
\[\langle z \rangle = Tr(\mathbf{D}\mathbf{Z})\]
Then do the same for \(\langle x \rangle\) and \(\langle y \rangle\) and sum each component for the total transition dipole.
You’ll also need to repeat the process for each transition you are interested in.
Note that most dipole moment integrals are computed in the atomic orbital basis, so don't forget to transform your transition density to the AO basis (or your dipole integrals to the MO basis; it doesn't matter which way, since the trace is independent of basis, as long as both matrices are in the same one).
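For concreteness, here is a small sketch of the bookkeeping (the arrays are random stand-ins; in practice \(X\), \(Y\), and \(Z\) come from your TDHF solution and your dipole integrals):

```python
import numpy as np

nocc, nvirt = 2, 2                     # illustrative sizes; N = nocc + nvirt
N = nocc + nvirt
X = np.random.rand(nocc, nvirt)        # stand-in for the occupied->virtual amplitudes
Y = np.random.rand(nvirt, nocc)        # stand-in for the virtual->occupied amplitudes (zero for CIS)
Z = np.random.rand(N, N)               # stand-in for the <p|z|q> integrals (MO basis)

D = np.zeros((N, N))                   # N x N transition density
D[:nocc, nocc:] = X                    # occupied-virtual block
D[nocc:, :nocc] = Y                    # virtual-occupied block

z_transition = np.trace(D @ Z)         # <z> = Tr(D Z)
print(z_transition)
```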
Low-scaling algorithms are important in any computational development, and this is never more true than in quantum chemistry. A golden example is the integral transformation routine in electronic structure packages. Without a smarter algorithm, the transformation scales as \(O(N^8)\) (yikes!), making computations on large systems nearly impossible. But with a little thought, we can rein in that transformation to a (not nearly as bad) \(N^5\) method.
If you have performed a Hartree-Fock (HF) calculation, you are left with a matrix of coefficients. Mathematically, these coefficients are the eigenvectors that correspond to your HF Hamiltonian (the Fock matrix), and the eigenvalues are your orbital energies. So far so good. The real accurate (and interesting) work in electronic structure theory comes afterward, where we either refine the energies we have obtained or calculate molecular properties like UV-Vis spectra and so on. Most of these methods (actually, all of these methods) require transforming your two-electron integrals from the atomic orbital basis (your standard, run-of-the-mill basis functions are your AO basis) into the molecular orbital basis. Put another way, now that you have solved for the wavefunction of the Hartree-Fock system, you need to project your two-electron integrals onto that wavefunction. The way you do this looks like so:

\[(pq\vert rs) = \sum\limits_{\mu\nu\lambda\sigma} C_{\mu}^{p} C_{\nu}^{q} C_{\lambda}^{r} C_{\sigma}^{s} (\mu\nu\vert\lambda\sigma)\]
Holy cow. EIGHT loops, each of dimension the number of basis functions. That should scale on the order of \(N^8\). Few methods in electronic structure theory scale worse than that (though they do exist… here's lookin' at you, Full CI). This means that for most calculations, if you use this method of integral transformation, the transformation will be your most expensive step. Thankfully, we can come up with a smarter way to do it. Because the coefficients are independent (i.e. \(C^{p}_\mu\) is independent of \(C^{q}_\nu\)), we can rewrite our first equation as

\[(pq\vert rs) = \sum\limits_{\mu} C_{\mu}^{p} \left[ \sum\limits_{\nu} C_{\nu}^{q} \left[ \sum\limits_{\lambda} C_{\lambda}^{r} \left[ \sum\limits_{\sigma} C_{\sigma}^{s} (\mu\nu\vert\lambda\sigma) \right] \right] \right]\]
Where we perform four “quarter transformations”, saving the result of each and using it in the next. Doing it this way gives us four steps of five nested loops each, so it should scale on the order of \(N^5\). In code, that looks like:
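A sketch of those four quarter transformations (assuming the AO-basis integrals live in a 4-index array `teiao` and the MO coefficients in `C`, stored with the AO index in the rows so that `C[mu, p]` \(= C_{\mu}^{p}\)):

```python
import numpy as np

def ao2mo_n5(teiao, C):
    """AO -> MO transformation via four O(N^5) quarter transformations."""
    dim = C.shape[0]
    temp  = np.zeros((dim, dim, dim, dim))
    temp2 = np.zeros((dim, dim, dim, dim))
    temp3 = np.zeros((dim, dim, dim, dim))
    teimo = np.zeros((dim, dim, dim, dim))
    # first quarter transformation: contract over mu
    for mu in range(dim):
        for p in range(dim):
            temp[p, :, :, :] += C[mu, p] * teiao[mu, :, :, :]
    # second quarter transformation: contract over nu
    for nu in range(dim):
        for q in range(dim):
            temp2[:, q, :, :] += C[nu, q] * temp[:, nu, :, :]
    # third quarter transformation: contract over lambda
    for lam in range(dim):
        for r in range(dim):
            temp3[:, :, r, :] += C[lam, r] * temp2[:, :, lam, :]
    # fourth quarter transformation: contract over sigma
    for sig in range(dim):
        for s in range(dim):
            teimo[:, :, :, s] += C[sig, s] * temp3[:, :, :, sig]
    return teimo
```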
Much nicer. You'll notice that we had to pre-allocate the ‘temp’ arrays to store the results between quarter transformations. The transformation also makes use of the ‘slice’ notation in Python/NumPy. Using this, we perform a transformation over a whole dimension at once, instead of one index at a time. It's a little weird to work with full dimensions instead of individual indices, but it works well. Here is the full code, with random integer arrays built in to act as our toy four-dimensional integrals. The toy matrix of coefficients is, like all matrices, 2D. I built in a check so you can compare the two methods: it spits out two transformed integrals with randomly chosen indices. If and when you run it, you should make sure the values match. If they don't, something is wrong!
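A toy driver along those lines might look like the following, reusing the `ao2mo_n5` sketch above (the random integer arrays are stand-ins for real integrals and coefficients; bump up `dim` to see the scaling difference):

```python
import time
import numpy as np

def ao2mo_n8(teiao, C):
    """Naive AO -> MO transformation with eight nested loops, O(N^8)."""
    dim = C.shape[0]
    teimo = np.zeros((dim, dim, dim, dim))
    for p in range(dim):
        for q in range(dim):
            for r in range(dim):
                for s in range(dim):
                    for mu in range(dim):
                        for nu in range(dim):
                            for lam in range(dim):
                                for sig in range(dim):
                                    teimo[p, q, r, s] += (C[mu, p] * C[nu, q] *
                                                          C[lam, r] * C[sig, s] *
                                                          teiao[mu, nu, lam, sig])
    return teimo

dim = 4
teiao = np.random.randint(1, 9, (dim, dim, dim, dim)).astype(float)
C = np.random.randint(1, 9, (dim, dim)).astype(float)

t0 = time.time(); slow = ao2mo_n8(teiao, C); t_n8 = time.time() - t0
t0 = time.time(); fast = ao2mo_n5(teiao, C); t_n5 = time.time() - t0

p, q, r, s = np.random.randint(0, dim, 4)
print("N^8: %.3f s   N^5: %.3f s" % (t_n8, t_n5))
print("spot check:", slow[p, q, r, s], fast[p, q, r, s])   # these two should match
```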
When I ran the code, moving from a dimension of 4 to a dimension of 8 (i.e. I doubled the number of basis functions), the first method went from 0.29 seconds to 71.5 seconds, a jump of 246 times, while the second method went from 0.01 seconds to 0.29 seconds, a jump of only 29 times. This is almost exactly as predicted: doubling the basis for an \(N^8\) method takes \(2^8 = 256\) times longer, and doubling the basis for an \(N^5\) algorithm takes \(2^5 = 32\) times longer.
The new method is also very amenable to parallelization. Most electronic structure computations need to be parallelized, as model systems get larger and larger. Note that within each of the four quarter transformations of the \(N^5\) method, the contributions to different target indices are independent of one another. Because of this independence, we can make our code run in parallel and split each quarter transformation across separate processors.
A great example why choosing your algorithm matters in quantum chemistry!
When I am generating UV-Vis spectra (in some cases near-IR spectra) for some system, I am often dealing with hundreds of individual electronic transitions. And most of the time, these transitions don’t even matter (e.g. their oscillator strength is ~ 0). This happens a ton when I am generating spectra for semiconductor quantum dots. TD analysis shows tons of ‘transitions’ but they exist in the band gap, and I’d rather filter them out. See, for example, below:
TD-DFT actually shows tons of transitions from ~2 to 4.5 eV… they just don't matter! So, for my own sanity in poring through the data, I wrote this Awk script (well, Awk in a Bash script) to pull out the transitions I want, based on oscillator strength (intensity). If you don't know Awk, you really should learn it. It has nasty syntax (like Perl…), but once you get used to it, it makes text editing and parsing SO SO EASY. I love it. I use Awk all the time for my work. Take a look, and feel free to grab a copy for yourself. To run the script, just type (minus the quotes, and assuming you saved it as ‘tdanalysis’):
Enjoy!
You may need to change gawk to awk, depending on your system. I’ve never had problems with gawk though.