Coupled cluster methods are among the most accurate electronic structure methods available today. For example, with a good choice of basis, the CCSD(T) equations will give you the energy of a molecular system to chemical accuracy (~1 kcal/mol). They are also quite formidable to derive, and (in my opinion) to program. Which is why I only coded up CCSD!
Thankfully there are some wonderful resources available for understanding coupled cluster methods. I highly recommend Crawford and Schaefer's "An Introduction to Coupled Cluster Theory for Computational Chemists" (2000). I have yet to find a clearer and more complete explanation of coupled cluster theory. The derivations inside it are nasty, but once you get a grasp of diagrammatic techniques, it isn't so bad :)
In order to understand coupled cluster a bit better, I recently programmed the CCSD energy and amplitude equations in Python. It is for a HeH+ molecule with a bond length of 0.9295 Angstrom, using an STO-3G basis, the same system I've used before on this blog. The results match what Gaussian09 calculates as well, so I was pretty happy to see it work. As always, I've hard-coded the two-electron integrals and other SCF results into the program, so you can just focus on what CCSD does. The functions will look esoteric, and unless you've worked with coupled cluster before, the program should NOT look intuitive or easy to understand; point is, don't panic. I've provided a reference, Stanton (1991), that contains all the equations used; between Stanton and Crawford, you can understand what is going on here. Read through the comments to get a better idea: the program takes the results of an SCF calculation and uses them to build a similarity-transformed Hamiltonian. The transformed Hamiltonian now contains 'excited' determinants, which is what describing electron correlation requires; in other words, you get a multi-reference quality calculation from a single reference (a Hartree-Fock calculation).
Anyway, here you have it:
(Special thanks to Junhao Li for pointing out bugs in the code! Everything should be correct now.)
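To give a flavor of what the working equations look like in code, here is a minimal sketch of just the CCSD correlation energy evaluation, given already-converged amplitudes; it follows the energy expression in Stanton (1991). The array names (`fs`, `spinints`, `ts`, `td`) and index conventions are placeholders of mine, not necessarily those of the full program:

```python
import numpy as np

def ccsd_energy(fs, spinints, ts, td, occ, virt):
    """CCSD correlation energy from converged amplitudes.

    fs       -- spin-orbital Fock matrix, fs[p,q]
    spinints -- antisymmetrized two-electron integrals <pq||rs>, spinints[p,q,r,s]
    ts       -- singles amplitudes, ts[a,i]
    td       -- doubles amplitudes, td[a,b,i,j]
    occ/virt -- lists of occupied/virtual spin-orbital indices
    """
    energy = 0.0
    for i in occ:
        for a in virt:
            energy += fs[i, a] * ts[a, i]
            for j in occ:
                for b in virt:
                    energy += 0.25 * spinints[i, j, a, b] * td[a, b, i, j]
                    energy += 0.5 * spinints[i, j, a, b] * ts[a, i] * ts[b, j]
    return energy
```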
The Equation of Motion derivations of excited state and response properties are elegant, and (in my opinion) very direct. Here I’ll give an example of how it can be used to derive TDHF/RPA. We’ve already done it two other ways, and the agreement between them is important if we wish to extend these methods further.
The excitation operators \({Q_n^{\dagger}}\) generate excited states from the ground state (not excited determinants, as in the case of post-HF correlation methods): \({Q_n^{\dagger}\vert 0 \rangle = \vert n \rangle}\), while \({Q_n \vert 0 \rangle = 0}\). So it is clear that, when acting on an exact ground state:

\(\left[H, Q_n^{\dagger}\right]\vert 0 \rangle = \left(E_n - E_0\right) Q_n^{\dagger}\vert 0 \rangle = \hbar\omega_{0n} Q_n^{\dagger}\vert 0 \rangle \ \ \ \ \\)
Multiplying on the left by an arbitrary state of the form \({\langle 0 \vert \delta Q}\) gives
\(\langle 0 \vert [\delta Q, [H, Q_n^{\dagger}]]\vert 0 \rangle = \hbar\omega_{0n} \langle 0 \vert [\delta Q, Q_n^{\dagger}] \vert 0 \rangle \ \ \ \ \\)
Where we have made use of the fact that \({\langle 0 \vert Q_n^{\dagger} = \langle 0 \vert HQ_n^{\dagger} = 0}\). Note that if we express \({Q}\) by particle-hole operators \({a_p^{\dagger}a_q}\), \({a_p^{\dagger}a_q^{\dagger}a_r a_s}\) with coefficients \({C_{pq}}\) and \({C_{pqrs}}\), then \({\delta Q}\) is given by \({\frac{\partial Q}{\partial C} \delta C}\) for arbitrary variations \({\delta C}\). These are in principle exact, since \({\delta Q \vert 0 \rangle}\) exhausts the whole Hilbert space, such that the above equation corresponds to the full Schrödinger equation. Tamm-Dancoff (or Configuration Interaction Singles) can be obtained by approximating \({\vert 0 \rangle \rightarrow \vert \hbox{HF} \rangle}\) and the operator \({Q_n^{\dagger} = \sum\limits_{ia} C_{ia} a_a^{\dagger}a_i}\), restricting ourselves to 1p-1h excitations. Thus \({\delta Q \vert 0 \rangle = \sum_{ia} a_a^{\dagger}a_i \vert \hbox{HF} \rangle \delta C_{ai}}\) (the \({\delta C_{ai}}\) cancels), and
\(\sum\limits_{bj} \langle \hbox{HF} \vert [a_i^{\dagger}a_a, [H, a_b^{\dagger}a_j]]\vert \hbox{HF} \rangle C_{bj} = \hbar\omega \sum\limits_{bj} \langle \hbox{HF} \vert [a_i^{\dagger}a_a, a_b^{\dagger}a_j] \vert \hbox{HF} \rangle C_{bj} = \hbar\omega C_{ia} \ \ \ \ \\)
These are the CIS equations. Put another way:
\(\sum\limits_{bj} \left\{(\epsilon_a - \epsilon_i)\delta_{ab}\delta_{ij} + \langle aj \vert \vert ib \rangle \right\} C_{bj} = E^{CIS}C_{ai} \ \ \ \ \\)
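In code, the CIS equations above amount to building the matrix \({(\epsilon_a - \epsilon_i)\delta_{ab}\delta_{ij} + \langle aj \vert \vert ib \rangle}\) and diagonalizing it once. Here is a minimal numpy sketch (my own placeholder names: `eps` for spin-orbital energies, `spinints` for antisymmetrized MO integrals \({\langle pq \vert \vert rs \rangle}\); real orbitals assumed, so the matrix is symmetric):

```python
import numpy as np

def cis_excitations(eps, spinints, nocc, nso):
    """Build and diagonalize the CIS matrix in a spin-orbital basis."""
    occ, virt = range(nocc), range(nocc, nso)
    pairs = [(i, a) for i in occ for a in virt]       # p-h basis |i -> a>
    H = np.zeros((len(pairs), len(pairs)))
    for ia, (i, a) in enumerate(pairs):
        for jb, (j, b) in enumerate(pairs):
            H[ia, jb] = spinints[a, j, i, b]          # <aj||ib>
            if i == j and a == b:
                H[ia, jb] += eps[a] - eps[i]          # orbital energy difference
    return np.linalg.eigvalsh(H)                      # CIS excitation energies
```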
Similarly, for RPA/TDHF, if we consider a ground state containing 2p-2h correlations, we can not only create a p-h pair, but also destroy one. Thus (choosing the minus sign for convenience):
\(Q_n^{\dagger} = \sum\limits_{ia} X_{ia} a_a^{\dagger}a_i - \sum\limits_{ia} Y_{ia} a_i^{\dagger}a_a, \qquad \hbox{and} \qquad Q_n \vert \hbox{RPA} \rangle = 0 \ \ \ \ \\)
So instead of the basis of only single excitations, and therefore one matrix \({C_{ia}}\), we work in a basis of single excitations and single de-excitations, and have two matrices \({X_{ia}}\) and \({Y_{ia}}\). We also have two kinds of variations \({\delta Q \vert 0 \rangle}\), namely \({a_a^{\dagger}a_i \vert 0 \rangle}\) and \({a_i^{\dagger}a_a \vert 0 \rangle}\). This gives us two sets of equations:
\(\langle \hbox{RPA} \vert [a_i^{\dagger}a_a, [H,Q_n^{\dagger}]]\vert \hbox{RPA} \rangle = \hbar \omega \langle \hbox{RPA} \vert [a_i^{\dagger}a_a, Q_n^{\dagger}] \vert \hbox{RPA} \rangle \ \ \ \ \\)

\(\langle \hbox{RPA} \vert [a_a^{\dagger}a_i, [H,Q_n^{\dagger}]]\vert \hbox{RPA} \rangle = \hbar \omega \langle \hbox{RPA} \vert [a_a^{\dagger}a_i, Q_n^{\dagger}] \vert \hbox{RPA} \rangle \ \ \ \ \\)
These contain only ground-state expectation values of products of our Fermion operators, which cannot be evaluated since we still do not know \({\vert \hbox{RPA} \rangle}\). Thus we assume \({\vert \hbox{RPA} \rangle \rightarrow \vert \hbox{HF} \rangle}\). This gives, for example,
\(\langle \hbox{RPA} \vert [a_i^{\dagger}a_a, a_b^{\dagger}a_j] \vert \hbox{RPA} \rangle = \langle \hbox{HF} \vert [a_i^{\dagger}a_a, a_b^{\dagger}a_j] \vert \hbox{HF} \rangle = \delta_{ij}\delta_{ab} \ \ \ \ \\)
The probabilities of finding the states \({a_a^{\dagger}a_i \vert 0 \rangle}\) and \({a_i^{\dagger}a_a \vert 0 \rangle}\) in the excited state \({\vert n\rangle}\), that is, the p-h and h-p matrix elements of the transition density matrix \({\rho^{(1)}}\), are:
\(\rho^{(1)}_{ai} = \langle 0 \vert a_i^{\dagger}a_a \vert n \rangle \simeq \langle \hbox{HF} \vert [a_i^{\dagger}a_a, Q_n^{\dagger}] \vert \hbox{HF} \rangle = X_{ia} \ \ \ \ \\)

\(\rho^{(1)}_{ia} = \langle 0 \vert a_a^{\dagger}a_i \vert n \rangle \simeq \langle \hbox{HF} \vert [a_a^{\dagger}a_i, Q_n^{\dagger}] \vert \hbox{HF} \rangle = Y_{ia} \ \ \ \ \\)
Following immediately from where we left off last time: when \({A}\) and \({B}\) are one-electron, number-conserving operators (i.e. creation and annihilation operators come in pairs), we obtain the general polarization propagator. If the elements did not conserve the number of electrons, we could use the same formalism to get the electron propagator, which describes the attachment/detachment of electrons from a system, and thus processes such as ionization. In general, it is sufficient to consider the typical elements \({A \rightarrow a_r^{\dagger}a_s}\) and \({B \rightarrow a_t^{\dagger}a_u}\). If the occupied spin-orbitals of the reference are \({\psi_i, \psi_j, \ldots}\), then a particle-hole basis includes only the excitation and de-excitation operators \({a_a^{\dagger}a_i}\) and \({a_i^{\dagger}a_a}\), respectively. Thus the basis and its dual:
\(\mathbf{n} = (\mathbf{e} \quad \mathbf{d}), \qquad \mathbf{n}^{\dagger} = (\mathbf{e} \quad \mathbf{d})^{\dagger} \ \ \ \ \\)
\(e_{ia} = a_a^{\dagger}a_i = E_{ia}, \qquad d_{ai} = a_i^{\dagger}a_a = E_{ia}^{\dagger} \ \ \ \ \ (29)\)
As an example, we give the elements of the first block:
\(E^{\dagger}_{ia}(\hbar\omega\hat{1} - \hat{H})E_{jb} = \hbar\omega E^{\dagger}_{ia} E_{jb} - E^{\dagger}_{ia} \left[H,E_{jb}\right] \ \ \ \ \ (31)\)
\(E^{\dagger}_{ia} \left[H,E_{jb}\right] = \langle \psi_0 \vert [E^{\dagger}_{ia},[H,E_{jb}]]\vert \psi_0 \rangle \ \ \ \ \\)
\(E^{\dagger}_{ia} \left[H,E_{jb}\right] = \langle \psi_0 \vert E^{\dagger}_{ia} H E_{jb}\vert \psi_0 \rangle - \langle \psi_0 \vert E^{\dagger}_{ia}E_{jb} H\vert \psi_0 \rangle - \langle \psi_0 \vert H E_{jb} E^{\dagger}_{ia}\vert \psi_0 \rangle + \langle \psi_0 \vert E_{jb} H E^{\dagger}_{ia} \vert \psi_0 \rangle \ \ \ \ \\)
\(E^{\dagger}_{ia} \left[H,E_{jb}\right] = \langle \psi^a_i \vert H \vert \psi_j^b \rangle - \delta_{ij}\delta_{ab}\langle \psi_0 \vert H \vert \psi_0 \rangle \ \ \ \ \\)
\(E^{\dagger}_{ia} \left[H,E_{jb}\right] = E_0\delta_{ij}\delta_{ab} + F_{ab}\delta_{ij} - F_{ij}\delta_{ab} + \langle aj \vert \vert ib \rangle - \delta_{ij}\delta_{ab}E_0 \ \ \ \ \\)
\(E^{\dagger}_{ia} \left[H,E_{jb}\right] = (\epsilon_a - \epsilon_i)\delta_{ij}\delta_{ab} + \langle aj \vert \vert ib \rangle = \mathbf{A} \ \ \ \ \\)
These are precisely the same matrix elements we derived earlier for TDHF. Completing the rest, we have:
\(M(\omega) = \begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B}^* & \mathbf{A}^* \\ \end{bmatrix} - \hbar\omega \begin{bmatrix} \mathbf{1} & \mathbf{0} \\ \mathbf{0} & \mathbf{-1} \\ \end{bmatrix}, \qquad R(\omega) = M(\omega)^{-1} \ \ \ \ \ (34)\)
with \({\mathbf{B}}\) having elements \({B_{ia,jb} = \langle ab \vert \vert ij \rangle}\). Now, we saw from the definition of the resolvent that \({R(\omega) \rightarrow \infty}\) as \({\omega \rightarrow \omega_{0n}}\); these are the poles of \({R(\omega)}\). Therefore \({R(\omega)^{-1} \rightarrow 0}\) and
\(\begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B}^* & \mathbf{A}^* \\ \end{bmatrix} \begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \\ \end{bmatrix} = \hbar\omega \begin{bmatrix} \mathbf{1} & \mathbf{0} \\ \mathbf{0} & \mathbf{-1} \\ \end{bmatrix} \begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \\ \end{bmatrix} \ \ \ \ \ (35)\)
which are the RPA/TDHF equations. The eigenvectors determine the linear combinations of excitation and de-excitation operators that produce the corresponding approximate excited states when acting on the reference state \({\vert \psi_0 \rangle}\).
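Numerically, these equations are a non-Hermitian eigenvalue problem of twice the particle-hole dimension; folding the metric into the left-hand side gives an ordinary eigenproblem whose eigenvalues come in \({\pm\omega}\) pairs. A minimal sketch, assuming the \({\mathbf{A}}\) and \({\mathbf{B}}\) blocks have already been built (e.g., with loops like the CIS ones above):

```python
import numpy as np

def rpa_excitations(A, B):
    """Solve [[A, B], [B*, A*]] (X,Y)^T = w [[1, 0], [0, -1]] (X,Y)^T.

    Multiplying through by the inverse metric diag(1, -1) turns the
    generalized problem into a standard (non-Hermitian) eigenproblem.
    """
    M = np.block([[A, B],
                  [-B.conj(), -A.conj()]])
    w = np.linalg.eigvals(M)
    return np.sort(w.real[w.real > 0])    # keep the positive roots
```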
Most higher-order response methods use some form of the polarization propagator, which is what I intend to derive here. It can then be used to derive other methods, such as TDHF/RPA and SOPPA.
We derived earlier a response function, or frequency-dependent polarizability,
\[\Pi(BA\vert \omega) = \lim_{\eta \rightarrow 0} \left(\frac{1}{\hbar}\right) \sum\limits_{n\neq0} \left\{ \frac{\langle 0 \vert B \vert n\rangle\langle n \vert A \vert 0\rangle}{\omega + i\eta - \omega_{0n}} - \frac{\langle 0 \vert A \vert n\rangle \langle n \vert B \vert 0 \rangle}{\omega + i\eta + \omega_{0n}} \right\} \ \ \ \ \ (1)\]
where \({A}\) is the applied perturbation and \({B}\) is the observable, both assumed to be Hermitian. \({\omega_{0n}}\) is the excitation energy for the transition between states \({0}\) and \({n}\). It should be clear that the response function has poles when \({\omega}\), the applied field frequency, equals the excitation energy \({\omega_{0n}}\). Finding these poles is precisely the goal of polarization propagator methods. In the polarization propagator approach, the above equation has \({\eta}\) set to 0, and the response function (the 'propagator') is defined as:
\(\langle \langle B;A \rangle \rangle_{\omega} \equiv \sum\limits_{n\neq0} \left\{ \frac{\langle 0 \vert B \vert n\rangle\langle n \vert A \vert 0\rangle}{\hbar\omega - \hbar\omega_{0n}} + \frac{\langle 0 \vert A \vert n\rangle \langle n \vert B \vert 0 \rangle}{-\hbar\omega - \hbar\omega_{0n}} \right\} \ \ \ \ \ (2)\)
Now we want to describe the propagator in terms of commutators between \({A}\) and \({B}\). Observe that \({\frac{ab}{c+d} = \frac{ab}{c} - \frac{d}{c}\left(\frac{ab}{c+d}\right)}\); applying this to the first term of the above yields:
\(\sum\limits_{n\neq0} \frac{\langle 0 \vert B \vert n\rangle\langle n \vert A \vert 0\rangle}{\hbar\omega - \hbar\omega_{0n}} = \sum\limits_{n\neq0} \frac{\langle 0 \vert B \vert n\rangle\langle n \vert A \vert 0\rangle}{\hbar\omega} -\sum\limits_{n\neq0} \frac{-\hbar\omega_{0n}\langle 0 \vert B \vert n\rangle\langle n \vert A \vert 0\rangle}{\hbar\omega\left(\hbar\omega - \hbar\omega_{0n}\right)} \ \ \ \ \ (3)\)
Do the same for the second term and combine, recognizing that the \({n=0}\) term vanishes in the first part (thus we get a sum over all \({n}\)), and making use of the fact that \({1 = \sum\limits_{n}\vert n\rangle\langle n\vert }\) and \({H\vert n \rangle = E_n\vert n\rangle}\) and \({\hbar\omega_{0n} = E_n - E_0}\):
\(\langle \langle B;A \rangle \rangle_{\omega} = \frac{1}{\hbar\omega} \langle 0 \vert \left[B,A\right] \vert 0 \rangle + \frac{1}{\hbar\omega} \sum\limits_{n\neq0} \left\{\frac{\langle 0 \vert B \vert n\rangle\langle n \vert \left[H,A\right] \vert 0\rangle}{\hbar\omega - \hbar\omega_{0n}} + \frac{\langle 0 \vert \left[H,A\right] \vert n\rangle\langle n \vert B \vert 0\rangle}{-\hbar\omega - \hbar\omega_{0n}}\right\} \ \ \ \ \ (4)\)
Which is to say that
\(\langle \langle B;A \rangle \rangle_{\omega} = \frac{1}{\hbar\omega} \langle 0 \vert \left[B,A\right] \vert 0 \rangle + \frac{1}{\hbar\omega}\langle \langle B;\left[H,A\right] \rangle \rangle_{\omega} \ \ \ \ \ (5)\)
Or, as we will use it:
\(\hbar\omega\langle \langle B;A \rangle \rangle_{\omega} = \langle 0 \vert \left[B,A\right] \vert 0 \rangle - \langle \langle \left[H,B\right];A \rangle \rangle_{\omega} \ \ \ \ \ (6)\)
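(As an aside of my own, identities like Eq. (6) are easy to sanity-check in a small matrix model: diagonalize a random Hermitian "Hamiltonian," build the propagator from the sum-over-states definition (2) with \({\hbar = 1}\), and compare the two sides numerically.)

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 6
H = rng.normal(size=(dim, dim)); H += H.T      # random Hermitian "Hamiltonian"
A = rng.normal(size=(dim, dim)); A += A.T      # Hermitian perturbation A
B = rng.normal(size=(dim, dim)); B += B.T      # Hermitian observable B
E, V = np.linalg.eigh(H)                       # exact eigenstates |n>

def prop(B, A, w):
    """Sum-over-states propagator <<B;A>>_w of Eq. (2), hbar = 1."""
    out = 0.0
    for k in range(1, dim):                    # sum over n != 0
        w0n = E[k] - E[0]
        out += (V[:, 0] @ B @ V[:, k]) * (V[:, k] @ A @ V[:, 0]) / (w - w0n)
        out += (V[:, 0] @ A @ V[:, k]) * (V[:, k] @ B @ V[:, 0]) / (-w - w0n)
    return out

w = 2.7                                        # arbitrary off-resonance frequency
lhs = w * prop(B, A, w)
rhs = V[:, 0] @ (B @ A - A @ B) @ V[:, 0] - prop(H @ B - B @ H, A, w)
print(np.isclose(lhs, rhs))                    # True: Eq. (6) holds
```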
As you may have started to see, we can define the propagator iteratively in terms of commutator expectation values of ever-increasing complexity. This is the so-called "moment expansion" of the propagator. Thus by iteration:
\(\langle \langle B;A \rangle \rangle_{\omega} = \frac{1}{\hbar\omega} \left\{ \langle 0 \vert \left[B,A\right] \vert 0 \rangle + \left(\frac{-1}{\hbar\omega}\right)\langle 0 \vert \left[\left[H,B\right],A\right] \vert 0 \rangle + \left(\frac{-1}{\hbar\omega}\right)^2\langle 0 \vert \left[\left[H,\left[H,B\right]\right],A\right] \vert 0 \rangle + \cdots \right\} \ \ \ \ \ (7)\)
We introduce the "superoperator" (analogous to the Liouville operator in statistical mechanics), which acts on an operator to give its commutator with \({H}\):

\(\hat{H}X = \left[H,X\right] \ \ \ \ \ (8)\)
With this definition, we have the power series
\(\langle \langle B;A \rangle \rangle_{\omega} = \frac{1}{\hbar\omega} \sum\limits_{n=0}^{\infty} \left(\frac{-1}{\hbar\omega}\right)^n \langle 0 \vert \left[\hat{H}^nB,A\right]\vert 0\rangle \ \ \ \ \ (9)\)
At this point we make two useful observations. First, recognize that
\(\langle 0 \vert \left[\hat{H}B,A\right]\vert 0\rangle = -\langle0\vert \left[B,\hat{H}A\right]\vert 0\rangle \ \ \ \ \ (10)\)
and so \({\hat{H}}\) can be applied to \({A}\) instead of \({B}\), so long as we introduce a factor of \({(-1)^n}\). Furthermore, note that the power series has the same form as the geometric series
\(\frac{1}{1-x} = 1 + x + x^2 + x^3 + \cdots \ \ \ \ \ (11)\)
Making use of these two observations (and using \({\hat{1}X = X}\) and \({\hat{H}^0 = \hat{1}}\), where \({\hat{1}}\) is the unit superoperator), we have
\(\langle \langle B;A \rangle \rangle_{\omega} = \langle 0 \vert \left[B,\left(\hbar\omega\hat{1} - \hat{H}\right)^{-1}A\right]\vert 0\rangle \ \ \ \ \ (12)\)
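(Another quick aside of mine: the matrix analogue of this resolvent expansion is easy to verify numerically. For a finite Hermitian matrix \({H}\) and \({\omega}\) beyond its spectral radius, the partial sums of \({\frac{1}{\omega}\sum_n (H/\omega)^n}\) converge to \({(\omega\hat{1} - H)^{-1}}\).)

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 5)); H += H.T            # Hermitian test matrix
w = 2.0 * np.abs(np.linalg.eigvalsh(H)).max()    # w beyond the spectral radius

resolvent = np.linalg.inv(w * np.eye(5) - H)     # (w 1 - H)^-1, computed directly
series, term = np.zeros((5, 5)), np.eye(5) / w
for _ in range(200):                             # partial sums of (1/w) sum_n (H/w)^n
    series += term
    term = term @ (H / w)
print(np.allclose(series, resolvent))            # True
```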
Equation (12) is merely a cosmetic change at this point, as the superoperator resolvent is defined by its series expansion. We need to find a matrix representation of the resolvent, which requires finding a complete basis set of operators. To do this, we are going to develop an operator space, where \({\hat{H}}\) is defined by its effect on operators instead of vectors. Introducing the notation
\(\left(X\vert Y\right) = \langle 0 \vert \left[X^{\dagger},Y\right] \vert 0 \rangle \ \ \ \ \ (13)\)
it follows that \({\left(Y\vert X\right) = \left(X\vert Y\right)^*}\). As defined, we now have
\(\langle \langle B;A \rangle \rangle_{\omega} = \left(B^{\dagger}\vert \left(\hbar\omega\hat{1} - \hat{H}\right)^{-1}\vert A \right) \ \ \ \ \ (14)\)
Which is formally exact, albeit useless until we develop approximations. However, the form of the above equation does look similar to ordinary vector spaces in Hartree-Fock, etc. methods. Truncation of a basis in linear vector space \({V}\) to \({n}\) elements produces a subspace \({V_n}\), and truncation of a general vector corresponds to finding its projection onto the subspace. It follows, then, that we need to find a projection operator \({\rho}\), associated with the truncated basis. If the basis (\({\mathbf{e}}\), say) is orthonormal we write
\(\rho = \sum\limits_i e_ie_i^* = \mathbf{ee}^{\dagger} \ \ \ \ \ (15)\)
which in a complete basis gives:
\(\rho = \sum\limits_i \vert e_i\rangle \langle e_i\vert = \mathbf{1} \ \ \ \ \ (16)\)
If the basis is not orthonormal, we must include the metric matrix \({\mathbf{S} = \mathbf{e}^{\dagger}\mathbf{e}}\) (or work in the Löwdin basis \({\mathbf{\overline{e}} = \mathbf{e}\mathbf{S}^{-1/2}}\)):
\(\rho = \mathbf{eS}^{-1}\mathbf{e}^{\dagger} = \sum\limits_{ij} e_i (S^{-1})_{ij}e_j^* \ \ \ \ \ (17)\)
When using a truncated basis in operator space, two kinds of projections are useful (Löwdin, 1977, 1982),
\(A' = \rho A \rho, \qquad A'' = A^{1/2} \rho A^{1/2} \ \ \ \ \ (18)\)
which are the outer projection and inner projection, respectively, onto the space \({V_n}\) defined by \({\rho}\). Note that \({AB = C}\) implies neither \({A'B' = C'}\) nor \({A''B'' = C''}\). Plugging the metric into \({A''}\):
\(A'' = A^{1/2}\mathbf{eS}^{-1}\mathbf{e}^{\dagger}A^{1/2} \ \ \ \ \ (19)\)
and we define
\(\mathbf{f} \equiv A^{1/2}\mathbf{e} = \left(A^{1/2}\mathbf{e}_1 \quad A^{1/2}\mathbf{e}_2 \quad \cdots \right) \ \ \ \ \ (20)\)
We assume that \({A}\) is Hermitian and positive-definite, so that \({A^{1/2}}\) can be defined. Note that \({\mathbf{S} = \mathbf{e}^{\dagger}\mathbf{e} = \left(A^{-1/2}\mathbf{f}\right)^{\dagger}\left(A^{-1/2}\mathbf{f}\right) = \mathbf{f}^{\dagger}A^{-1}\mathbf{f} \implies A'' = \mathbf{f}\left(\mathbf{f}^{\dagger}A^{-1}\mathbf{f}\right)^{-1}\mathbf{f}^{\dagger}}\). Because \({A}\) is arbitrary, replace it with \({A^{-1}}\), and since \({\mathbf{f}^{\dagger}A\mathbf{f} = \mathbf{A}}\) with \({A_{ij} = \langle f_i \vert A\vert f_j\rangle}\):
\(\left(\mathbf{A}^{-1}\right)'' = \mathbf{f}\left(\mathbf{f}^{\dagger}A\mathbf{f}\right)^{-1}\mathbf{f}^{\dagger} = \mathbf{fA}^{-1}\mathbf{f}^{\dagger} \ \ \ \ \ (21)\)
As the basis approaches completeness (\({V_n \rightarrow V}\)), the inner projection tends to \({\mathbf{A}^{-1}}\); otherwise it is simply a finite-basis approximation to the inverse. This expresses the operator inverse in terms of a matrix inverse. Since \({\mathbf{e}}\) was an arbitrary basis defining \({V_n}\), let \({\mathbf{f}}\) define the n-dimensional subspace \({V_n'}\). Thus:
\(\mathbf{A}^{-1} \approx \mathbf{eA}^{-1}\mathbf{e}^{\dagger} = \sum\limits_{ij} e_i(\mathbf{e}^{\dagger}\mathbf{Ae})^{-1}_{ij}\mathbf{e}^*_j \ \ \ \ \ (22)\)
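Here is a quick numerical illustration of Eq. (22) (a toy of mine, with an arbitrary positive-definite \({A}\) and random basis vectors): the projected inverse approaches the true inverse as the basis grows to span the full space.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
M = rng.normal(size=(N, N))
A = M @ M.T + N * np.eye(N)                        # Hermitian, positive-definite

for n in (2, 4, 8):                                # truncated ... complete basis
    e = rng.normal(size=(N, n))                    # random (non-orthonormal) basis
    Ainv_n = e @ np.linalg.inv(e.T @ A @ e) @ e.T  # Eq. (22)
    print(n, np.linalg.norm(Ainv_n - np.linalg.inv(A)))
# the error drops to machine precision once the basis spans the whole space
```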
Thus the inner projection leads to an approximation for the operator inverse. Let us now define the operator basis:
\(\mathbf{n} = (\mathbf{n}_1 \quad \mathbf{n}_2 \quad \mathbf{n}_3 \quad \cdots) \ \ \ \ \ (23)\)
Given the binary product (compare with \({\langle x\vert y\rangle = x^*y}\) for vectors)
\(X^{\dagger} \cdot Y = (X\vert Y) = \langle 0 \vert \left[X^{\dagger},Y\right] \vert 0 \rangle \ \ \ \ \ (24)\)
then for our resolvent superoperator we have
\(\hat{R}(\omega) = (\hbar\omega\hat{1} - \hat{H})^{-1} = \mathbf{n}R(\omega)\mathbf{n}^{\dagger} = \sum\limits_{r,s} \mathbf{n}_r[R(\omega)]_{rs}\mathbf{n}_{s}^{\dagger} \ \ \ \ \ (25)\)
where \({\mathbf{n}_r}\) and \({\mathbf{n}_s^{\dagger}}\) are the analogues of \({\mathbf{e}}\) and \({\mathbf{e}^*}\) in operator space. Finally, if
\(R(\omega) = M(\omega)^{-1}, \qquad M(\omega) = \mathbf{n}^{\dagger} (\hbar\omega\hat{1} - \hat{H})\mathbf{n} \ \ \ \ \ (26)\)
then we have
\(\langle \langle B;A \rangle \rangle_{\omega} = B \cdot \hat{R}(\omega) \cdot A = B \cdot \mathbf{n} R(\omega) \mathbf{n}^{\dagger} \cdot A \ \ \ \ \ (27)\)
which is the key to calculating approximations to response properties. The matrix \({\mathbf{M}}\) is determined once we have chosen an operator basis. The approximation depends on two things: 1) the basis \({\mathbf{n}}\) (and its truncations), and 2) the reference function, which in practice is not the exact ground state. Approximations to these two things are where we get our various response methods.
If you have the transformed two-electron integrals after a Hartree-Fock calculation, it's trivial to perform second-order Møller-Plesset perturbation theory (MP2). It is a non-iterative way to calculate the electron correlation energy (that is, the energy obtained from having electrons "avoid" each other in molecules). The expression for the MP2 energy contribution (in a spin-orbital basis) looks like this:
\[E_{\rm MP2}=\frac{1}{4}\sum_{ijab}\frac{\vert \langle ij \vert \vert ab \rangle\vert ^2}{\epsilon_i+\epsilon_j-\epsilon_a-\epsilon_b}\]
where \({i}\) and \({j}\) are indices of occupied orbitals, and \({a}\) and \({b}\) are indices of virtual orbitals. The numerator is a "double-bar integral," which accounts for the Coulomb and exchange interactions between pairs of electrons, and the denominator contains the orbital energies (eigenvalues). The numerator is a result of transforming the two-electron integrals to the molecular orbital basis, and the orbital energies are the result of a Hartree-Fock calculation. (Well, strictly speaking, all of this comes from a Hartree-Fock calculation...) You can find a derivation of this in Szabo and Ostlund (1996).
This is actually one of the easiest electronic structure methods to program. It is non-iterative (which means no guess vectors, no convergence problems, etc.), and can be readily constructed from integrals on hand.
Computationally speaking, we must construct four nested loops, one over each index: the first two loop over occupied orbitals, and the second two loop over virtual orbitals. I've attached some Python code that accomplishes this; a minimal sketch of the core loop follows below. Similar to the CIS and TDHF code, I have hard-coded the transformed integrals for HeH+ in an STO-3G basis, at a bond length of 0.9295 Angstroms.
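Here is that quadruple loop in minimal form (my placeholder names: `ints` for the transformed antisymmetrized spin-orbital integrals \({\langle pq \vert \vert rs \rangle}\), `eps` for the orbital energies; real orbitals assumed, so the squared modulus is just a square):

```python
def mp2_energy(ints, eps, nocc, nso):
    """MP2 correlation energy in a spin-orbital basis.

    ints -- antisymmetrized MO integrals <pq||rs>, ints[p,q,r,s]
    eps  -- orbital energies; orbitals 0..nocc-1 occupied, nocc..nso-1 virtual
    """
    emp2 = 0.0
    for i in range(nocc):                          # occupied
        for j in range(nocc):                      # occupied
            for a in range(nocc, nso):             # virtual
                for b in range(nocc, nso):         # virtual
                    emp2 += 0.25 * ints[i, j, a, b]**2 \
                            / (eps[i] + eps[j] - eps[a] - eps[b])
    return emp2
```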
If you run the code you'll get a correlation energy of -0.00640 Hartrees. This is pretty small, and in fact most correlation energies are. Hartree-Fock does a good job of recovering ~99% of the energy of a molecule. However, correlation energies still translate to errors on the order of 10 kcal/mol, which is chemically significant; this is why including correlation effects is so important. It's hard to make accurate predictions about reactivity with such large error bars if you neglect correlation. MP2 is an easy way to include it; much more accurate methods, like coupled cluster, can give you energetics to chemical accuracy (~1 kcal/mol). However, they are often very costly!