Broken Symmetries in Hartree-Fock

It’s been a busy summer, with traveling and giving talks and writing papers.

I’m back in Seattle again (yay!), and have been thinking a lot about how we get the simplest models of electrons in molecules, the Hartree-Fock equations.

While most theorists know about restricted Hartree-Fock (RHF) and unrestricted Hartree-Fock (UHF), not as many know that these can be derived from simple (okay, sometimes not-so-simple) symmetry considerations.

RHF and UHF are actually special cases of a larger class of problems described by the generalized Hartree-Fock (GHF) equations. GHF allows mixed spin descriptions (not just simple spin up or spin down), as well as complex wave functions.

These are actually quite important for describing frustrated spin systems, like this simple triangular system

[Figure: three spins on the vertices of a triangle, illustrating spin frustration]

The spins can’t all align favorably, so we call it “spin frustrated”. You can try this out on your own with, say, a triangle of hydrogens or lithiums or chromiums. UHF won’t give you the lowest energy solution. It will be GHF-unstable. Only GHF can give you the lowest energy solution, since it allows the MOs to take on both “spin up” and “spin down” character.

Weird, right?

If we insist that the GHF equations are invariant with respect to spin rotations along an axis (say, the z axis), we can get the UHF equations back.

If we insist that the GHF equations are invariant to time-reversal (doesn’t that sound cool?) as well as spin rotations along all axes (x,y,z), we get the real RHF equations.

Unfortunately, the literature on it is pretty sparse, and the content pretty dense. A paradox, if I’ve ever seen one. Thankfully, I found you can derive all the relationships using the tools of linear algebra.

The results are illuminating, and it’s fun to see how the different broken symmetry HF solutions relate to each other.

So here’s my take on re-deriving the different HF classes from generalized Hartree-Fock!

We want to classify broken symmetry wave functions by investigating how they transform under the action of the invariance operators \({\hat{g}}\) constituting the symmetry group \({\mathcal{G}}\) of the spin-free electronic Hamiltonian \({\hat{H}}\), i.e. the operators satisfying

\[\hat{g}\hat{H}\hat{g}^{-1} = \hat{H}, \qquad (\forall \hat{g} \in \mathcal{G}). \ \ \ \ \ (1)\]

This simply means \({\hat{H}}\) is invariant to transformation by \({\hat{g}}\), whatever \({\hat{g}}\) may be. In general, when \({\vert \psi\rangle}\) is an eigenstate of \({\hat{H}}\), then \({\hat{g}\vert \psi\rangle}\) is also an eigenstate belonging to the same eigenvalue as \({\vert \psi\rangle}\).
Now, exact eigenstates of \({\hat{H}}\) can be chosen to be simultaneous eigenstates of the various symmetries in \({\mathcal{G}}\). In contrast, for approximate variational wave functions (e.g. Hartree-Fock) the symmetry requirements represent additional constraints.

This is Löwdin’s ‘‘symmetry dilemma’’.

To state this dilemma again: we know the exact solution (the lowest energy solution) will have certain symmetries, but if we impose these symmetries on our approximate, variational wave function, we can only raise the energy, never lower it.

This means we can get closer to the exact solution by removing physical constraints! This is troublesome, but not a huge deal in practice.

It’s crucial to recognize that this problem arises only because we use an approximate independent-particle Hamiltonian.

There are several ways to break the single-determinantal Hartree-Fock model into its various broken symmetry subgroups. However we do this, what we ultimately want to determine is the form of the operators \({\hat{A}}\) that are invariant with respect to similarity transformations by various subgroups \({\mathcal{H} \subset \mathcal{G}}\), i.e.

\[\hat{h}\hat{A}\hat{h}^{-1} = \hat{A}, \qquad (\forall \hat{h} \in \mathcal{H} \subset \mathcal{G}) \ \ \ \ \ (2)\]

I find it easier to put \({\hat{A}}\) into a finite basis and treat this problem with the tools of linear algebra.

The basis for our independent particle model will be the spinor basis,

\[\vert \alpha\rangle = \left( \begin{array}{c} 1 \\ 0 \end{array} \right), \qquad \vert \beta\rangle = \left( \begin{array}{c} 0 \\ 1 \end{array} \right), \qquad \hbox{and} \quad \vert \alpha\rangle\langle\alpha\vert + \vert \beta\rangle\langle\beta\vert = \mathbf{I}_2 \ \ \ \ \ (3)\]

so any arbitrary spin function can be written

\[\vert p\rangle = p_1\vert \alpha\rangle + p_2\vert \beta\rangle = \left( \begin{array}{c} p_1 \\ p_2 \end{array} \right) \ \ \ \ \ (4)\]

We will only worry about one body operators \({\hat{A}}\), which include the Fock operator as well as the unitary parameterization of the single determinant (c.f. Thouless representation).

Putting \({\hat{A}}\) into a spinor basis gives us

\[\hat{A} = \vert i\sigma_1\rangle\langle i\sigma_1\vert \hat{A} \vert j \sigma_2 \rangle \langle j \sigma_2 \vert = A_{i\sigma_1,j\sigma_2} \vert i \sigma_1 \rangle \langle j \sigma_2 \vert = \left( \begin{array}{cc} \mathbf{A}_{\alpha\alpha} & \mathbf{A}_{\alpha\beta} \\ \mathbf{A}_{\beta\alpha} & \mathbf{A}_{\beta\beta} \end{array} \right) \ \ \ \ \ (5)\]

with

\[\left(A_{\sigma_1 \sigma_2}\right)_{ij} \in \mathbb{C} \quad \hbox{and} \quad \sigma_1,\sigma_2 \in \alpha,\beta \ \ \ \ \ (6)\]

Or, in second quantization

\[\hat{A} = \sum\limits_{ij} \sum\limits_{\sigma_1,\sigma_2} A_{i\sigma_1,j\sigma_2} a^{\dagger}_{i\sigma_1} a_{j\sigma_2} = \sum\limits_{ij} \sum\limits_{\sigma_1,\sigma_2} A_{i\sigma_1,j\sigma_2} \hat{E}^{i \sigma_1}_{j \sigma_2} \ \ \ \ \ (7)\]
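To make the block structure concrete, here is a minimal NumPy sketch (random complex blocks, purely illustrative) that assembles the \({2n\times2n}\) spinor-basis matrix of Eq. (5) from its four spin blocks:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2  # number of spatial orbitals (illustrative)

def random_block():
    return rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

# Four complex n-by-n spin blocks, (A_{sigma1 sigma2})_{ij} in C as in Eq. (6).
A_aa, A_ab, A_ba, A_bb = (random_block() for _ in range(4))

# The full 2n-by-2n spinor-basis matrix of Eq. (5).
A = np.block([[A_aa, A_ab],
              [A_ba, A_bb]])
assert A.shape == (2 * n, 2 * n)
```

Each block is an \({n\times n}\) complex matrix, exactly as in Eq. (6); the symmetry analysis below is all about which of these blocks survive and how they relate.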

The transformation properties of \({\hat{A}}\) under symmetry operations can be determined by examining the transformation of the \({U(2n)}\) generators \({\hat{E}^{i \sigma_1}_{j \sigma_2}}\), but it is much simpler to consider the transformations of the \({2\times2}\) (block) matrix \({\mathbf{A}}\), e.g.

\[\hat{h} \left( \begin{array}{cc} \mathbf{A}_{\alpha\alpha} & \mathbf{A}_{\alpha\beta} \\ \mathbf{A}_{\beta\alpha} & \mathbf{A}_{\beta\beta} \end{array} \right)\hat{h}^{-1} = \left( \begin{array}{cc} \mathbf{A}_{\alpha\alpha} & \mathbf{A}_{\alpha\beta} \\ \mathbf{A}_{\beta\alpha} & \mathbf{A}_{\beta\beta} \end{array} \right) \ \ \ \ \ (8)\]

Basically, we are looking for the constraints on \({\mathbf{A}}\) that make the above equation true, for any given symmetry operation.

The general invariance group of the spin-free electronic Hamiltonian involves the spin rotation group SU(2) and the time reversal group \({\mathcal{T}}\), i.e. SU(2) \({\otimes \mathcal{T}}\). SU(2) can be given in terms of spin operators,

\[\hbox{SU(2)}=\{ U_R (\vec{n},\theta) = exp(i\theta \sum\limits_{a=1}^3 \hat{S}_a \hat{n}_a ); \vec{n} = (n_1, n_2, n_3); \vec{n} \in \mathbb{R}^3; \vert \vert \vec{n}\vert \vert = 1, \theta \in (-2\pi, 2\pi] \} \ \ \ \ \ (9)\]

All of this amounts to performing rotations in spin space. The form looks awfully similar to rotations in 3D space (in fact, the relationship between rotations in spin space and 3D space is much deeper than that). The time reversal group is given as

\[\mathcal{T} = \{ \pm \hat1, \pm \hat{\Theta} \} \ \ \ \ \ (10)\]

In general,

\[\hat{\Theta} = \hat{U}_R({\vec{e}}_2,-\pi)\hat{K} = exp(-i\pi\hat{S}_2)\hat{K} \ \ \ \ \ (11)\] \[\hat{\Theta}^{-1} = - \hat{\Theta} = exp(i\pi\hat{S}_2)\hat{K} \ \ \ \ \ (12)\]

Where \({\hat{K}}\) is the complex conjugation operator.

There is nothing special about using \({\hat{S}_2}\), it’s just convention.

Time reversal doesn’t really affect time per se, but rather changes the direction of movement, be it linear momentum or angular momentum.

It’s an antiunitary operator (it must be, in fact) that consists of the spin-looking part (unitary) and then the complex conjugation operator (antiunitary), to be antiunitary overall. It affects electrons by flipping their spin and then taking the complex conjugate.

One final detail about the symmetry operators before we go on, and that is to observe that since SU(2) is a double cover, if we rotate in spin space by \({2\pi}\), we flip the sign of the wave function. If we rotate by another \({2\pi}\), we get the original state back. This is characteristic of fermions, of which electrons are a part.
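You can see the double cover numerically. A small sketch (using \({\hat{S}_z = \sigma_3/2}\) and SciPy's matrix exponential; the choice of axis is arbitrary):

```python
import numpy as np
from scipy.linalg import expm

# Spin operator S_z = sigma_3 / 2 (hbar = 1).
sigma3 = np.array([[1, 0], [0, -1]], dtype=complex)

def spin_rotation(theta):
    # U_R(e_z, theta) = exp(i * theta * S_z), as in Eq. (9).
    return expm(1j * theta * sigma3 / 2)

# Rotating by 2*pi flips the sign of a spinor; another 2*pi restores it.
assert np.allclose(spin_rotation(2 * np.pi), -np.eye(2))
assert np.allclose(spin_rotation(4 * np.pi), np.eye(2))
```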

Let us now consider unitary transformations of the form \({U= exp(i\hat{B})}\), where \({\hat{B} = \hat{B}^{\dagger}}\), acting on some operator \({\hat{A}}\). This is valid for any unitary transformation that depends on a continuous parameter (which we have absorbed into \({\hat{B}}\)). We insist

\[\hat{A} = \hat{U}\hat{A}\hat{U}^{-1} = e^{(-i\hat{B})} \hat{A} e^{(i\hat{B})} \ \ \ \ \ (13)\]

Now, by the Baker-Campbell-Hausdorff expansion, we can rewrite the right-hand side as

\[e^{(-i\hat{B})} \hat{A} e^{(i\hat{B})} = \hat{A} + \left[\hat{A},i\hat{B}\right] + \frac{1}{2!} \left[\left[\hat{A},i\hat{B}\right],i\hat{B}\right] + \cdots \ \ \ \ \ (14)\]

Since by definition \({\hat{A} = e^{(-i\hat{B})} \hat{A} e^{(i\hat{B})}}\), it suffices to show that the constraints on \({\hat{A}}\) introduced by the symmetry operation are satisfied when \({\left[\hat{A},i\hat{B}\right] = 0}\): if the first commutator vanishes, every higher term in the expansion vanishes with it.
This is an excellent result, because it means that we can just look at how the \({2\times2}\) matrix \({\mathbf{A}}\) transforms under the Pauli spin matrices which define the SU(2) spin rotation as well as the time-reversal operation (in the time-reversal case, we can use the BCH expansion of the unitary part \({\hat{S}_y}\) and then absorb the complex conjugation into the commutator expression).
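Here's a quick numerical confirmation of that claim with toy \({4\times4}\) matrices (not a physical Fock operator): build a Hermitian \({B}\), take \({A}\) to be a polynomial in \({B}\) so that \({[A,iB]=0}\) by construction, and check that the similarity transform leaves \({A}\) untouched.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)

# Hermitian generator B; A is a polynomial in B, so [A, iB] = 0 exactly.
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
B = (M + M.conj().T) / 2
A = B @ B + 2 * B

# e^{-iB} A e^{iB}: with [A, iB] = 0 the whole BCH series collapses to A.
A_transformed = expm(-1j * B) @ A @ expm(1j * B)
assert np.allclose(A_transformed, A)
```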

We can show how RHF, UHF, and GHF all fall out of the different symmetry combinations (or lack thereof).

From here on out, I’ll drop the hats on my operators for clarity.

Let’s start by considering how many different types of symmetry subgroups we can have. A moment’s thought about the form of the spin rotation and time-reversal groups gives the following subgroups:

\({\mathbf{I}}\). No symmetry at all.
\({\Theta}\). Symmetric only under time reversal.
\({K}\). Symmetric only under complex conjugation.
\({S_z}\). Symmetric only under spin rotations about the z-axis.
\({S_z}\), \({\Theta}\). Symmetric under spin rotation about z as well as time reversal.
\({S_z}\), \({K}\). Symmetric under spin rotation about z as well as complex conjugation.
\({S^2}\). Symmetric under all spin rotations.
\({S^2}\), \({\Theta}\). Symmetric under all spin rotations and time reversal.

That’s it! That’s all the cases you can get.

Let’s go through them, case by case, using the complex Fock matrix as an example, i.e.

\[\left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{F}_{\alpha\beta} \\ \mathbf{F}_{\beta\alpha} & \mathbf{F}_{\beta\beta} \end{array} \right) \ \ \ \ \ (15)\]

Where we solve, for each symmetry (or group of symmetries),

\[\mathbf{h} \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{F}_{\alpha\beta} \\ \mathbf{F}_{\beta\alpha} & \mathbf{F}_{\beta\beta} \end{array} \right)\mathbf{h}^{-1} = \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{F}_{\alpha\beta} \\ \mathbf{F}_{\beta\alpha} & \mathbf{F}_{\beta\beta} \end{array} \right) \ \ \ \ \ (16)\]

where \({\mathbf{h}}\) is the generator of the symmetry.
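In code, checking Eq. (16) for any candidate generator takes one line. A sketch (the `is_invariant` helper and the numbers are hypothetical, with \({1\times1}\) spin blocks for brevity):

```python
import numpy as np

def is_invariant(F, h):
    """Check Eq. (16): does h F h^{-1} equal F (numerically)?"""
    return np.allclose(h @ F @ np.linalg.inv(h), F)

# sigma_3 generates spin rotation about z; spin blocks here are scalars.
sigma3 = np.diag([1.0, -1.0])
F_uhf = np.array([[0.3, 0.0], [0.0, 0.7]])   # block diagonal in spin
F_ghf = np.array([[0.3, 0.1], [0.1, 0.7]])   # mixes alpha and beta spin

assert is_invariant(F_uhf, sigma3)
assert not is_invariant(F_ghf, sigma3)
```

The cases below just work out, analytically, which block structures pass this test for each generator.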

\({\mathbf{I}}\). No symmetry.

In this case, our transformation on \({\mathbf{F}}\) is rather simple. It looks like

\[\mathbf{I} \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{F}_{\alpha\beta} \\ \mathbf{F}_{\beta\alpha} & \mathbf{F}_{\beta\beta} \end{array} \right)\mathbf{I}^{-1} = \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{F}_{\alpha\beta} \\ \mathbf{F}_{\beta\alpha} & \mathbf{F}_{\beta\beta} \end{array} \right) \ \ \ \ \ (17)\]

So we get no constraints. We can mix spin, as well as take on complex values. This is the structure of the complex generalized Hartree Fock (GHF) Fock matrix.

\({K}\). Complex conjugation symmetry.

If the only symmetry that holds is complex conjugation, our transformation looks like

\[K \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{F}_{\alpha\beta} \\ \mathbf{F}_{\beta\alpha} & \mathbf{F}_{\beta\beta} \end{array} \right)K= \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha}^{*} & \mathbf{F}_{\alpha\beta}^{*} \\ \mathbf{F}_{\beta\alpha}^{*} & \mathbf{F}_{\beta\beta}^{*} \end{array} \right) \ \ \ \ \ (18)\]

Note that \({K}\) is its own inverse. It also only acts to either the left or the right. The asterisk indicates complex conjugation (not an adjoint!).

The constraint we get here is that the values of the Fock matrix have to be identical on complex conjugation. Since this can only happen if the values are real, we get the real GHF Fock equations.

\({\Theta}\). Time reversal symmetry.

Now we start to get slightly more complicated. Using the Pauli matrix

\[\sigma_2 = \left( \begin{array}{cc} 0 & -i \\ i & 0 \end{array} \right) \ \ \ \ \ (19)\]

to represent the unitary part of the time-reversal operation gives us

\[-i\left( \begin{array}{cc} 0 & -i \\ i & 0 \end{array} \right) K \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{F}_{\alpha\beta} \\ \mathbf{F}_{\beta\alpha} & \mathbf{F}_{\beta\beta} \end{array} \right) i K \left( \begin{array}{cc} 0 & i \\ -i & 0 \end{array} \right) = \left( \begin{array}{cc} \mathbf{F}_{\beta\beta}^{*} & -\mathbf{F}_{\beta\alpha}^{*} \\ -\mathbf{F}_{\alpha\beta}^{*} & \mathbf{F}_{\alpha\alpha}^{*} \end{array} \right) \ \ \ \ \ (20)\]

We see that this really only introduces two constraints, so we choose to eliminate \({\mathbf{F}_{\beta\alpha}}\) and \({\mathbf{F}_{\beta\beta}}\). This gives the final result of paired GHF, or

\[\left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{F}_{\alpha\beta} \\ -\mathbf{F}_{\alpha\beta}^{*} & \mathbf{F}_{\alpha\alpha}^{*} \end{array} \right) \ \ \ \ \ (21)\]
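We can sanity-check this structure numerically. For a \({2\times2}\) spin matrix, conjugation by \({\hat{\Theta} = -i\sigma_2 K}\) works out to \({\sigma_2 \mathbf{F}^{*} \sigma_2}\), so a matrix with the paired structure of Eq. (21) should come back unchanged (scalar \({1\times1}\) "blocks" here, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2 = np.array([[0, -1j], [1j, 0]])

def time_reverse(F):
    # Theta F Theta^{-1} with Theta = -i sigma_2 K reduces to sigma_2 F* sigma_2.
    return sigma2 @ F.conj() @ sigma2

# Paired GHF structure of Eq. (21), with scalar "blocks":
Faa = rng.normal() + 1j * rng.normal()
Fab = rng.normal() + 1j * rng.normal()
F = np.array([[Faa, Fab],
              [-np.conj(Fab), np.conj(Faa)]])

assert np.allclose(time_reverse(F), F)
```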

\({S_z}\). Rotation about spin z-axis.

Here we use the Pauli matrix

\[\sigma_3 = \left( \begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array} \right) \ \ \ \ \ (22)\]

And show that

\[\left( \begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array} \right) \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{F}_{\alpha\beta} \\ \mathbf{F}_{\beta\alpha} & \mathbf{F}_{\beta\beta} \end{array} \right) \left( \begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array} \right) = \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & -\mathbf{F}_{\alpha\beta} \\ -\mathbf{F}_{\beta\alpha} & \mathbf{F}_{\beta\beta} \end{array} \right) \ \ \ \ \ (23)\]

which is only satisfied if

\[\mathbf{F} = \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}_{\beta\beta} \end{array} \right) \ \ \ \ \ (24)\]

This gives us the complex version of UHF. We see invariance with respect to \({S_z}\) results in two separate spin blocks, with no restriction on whether they take real or complex values, or the dimension of either spin block.

\({S_z}\),\({\Theta}\). Rotation about spin z-axis and time reversal.

We are now at the point where we examine the effect of invariance to multiple symmetry operations.

It might concern you that considering multiple symmetry operations means that the order in which we perform symmetry operations matters. In general symmetry operations do not commute, but we will see that for our purposes order really doesn’t matter. Because we insist on invariance, the multiple symmetries never actually act on each other, therefore we don’t need to consider the commutator between them.

Or, put a different way: because each symmetry operation returns the system to its original state, we can consider each operation separately. The system contains no memory of the previous symmetry operation.

We can show this another way using the BCH expansion. Consider two symmetry operations on \({F}\) parameterized by \({A}\) and \({B}\):

\[e^{(-iA)}e^{(-iB)}Fe^{(iB)}e^{(iA)} = e^{(-iA)} [F + [F,iB] + \cdots ] e^{(iA)} = e^{(-iA)}Fe^{(iA)} = [F + [F,iA] + \cdots ] \ \ \ \ \ (25)\]

which is true if and only if \({[F, iA] = [F, iB] = 0}\). This decouples \({A}\) and \({B}\): considering multiple symmetry operations only gives us more constraints, and order doesn’t matter. Let’s see it in action for time reversal and z-axial spin symmetry. Using the results for time reversal symmetry, we have

\[\left( \begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array} \right) \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{F}_{\alpha\beta} \\ -\mathbf{F}_{\alpha\beta}^{*} & \mathbf{F}_{\alpha\alpha}^{*} \end{array} \right) \left( \begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array} \right) = \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & -\mathbf{F}_{\alpha\beta} \\ \mathbf{F}_{\alpha\beta}^{*} & \mathbf{F}_{\alpha\alpha}^{*} \end{array} \right) \ \ \ \ \ (26)\]

Which means the off diagonals must go to zero, giving our final result of paired UHF,

\[\left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}_{\alpha\alpha}^{*} \end{array} \right) \ \ \ \ \ (27)\]

\({S_z}\),\({K}\). Rotation about spin z-axis and complex conjugation.

We do a similar thing as above for rotation about spin z-axis and complex conjugation. This one is particularly easy to show, starting from the results of complex conjugation symmetry.

Since symmetry with respect to \({K}\) forces all matrix elements to be real, we just get the real version of the results of symmetry with respect to \({S_z}\) — we get the real UHF equations!

\[\mathbf{F} = \left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}_{\beta\beta} \end{array} \right), \qquad (\mathbf{F} \in \mathbb{R}) \ \ \ \ \ (28)\]

\({S^2}\),\({S_z}\). Rotation about all spin axes.

We finally move onto invariance with respect to rotations about all spin axes. Again, this is a little weird because these operations aren’t commutative, but we have already shown that insisting on invariance leads to a decoupling of the symmetry operations.

(I should mention that if you are still unsure of this, feel free to brute force through all orders of symmetry operations. You’ll see that it makes no difference.)

Two things worth mentioning here: first, technically \({S_z}\) is already a part of the total spin rotation group, so it’s a little weird to separate them into \({S^2}\) and \({S_z}\), but we understand this as drawing the distinction that you can be symmetric to \({S_z}\) but not \({S^2}\).

If you are invariant to \({S^2}\), though, you will be invariant to \({S_z}\).

Second, while \({S^2}\) does technically have an operator representation, it is not itself a symmetry operation. Think about the form of \({S^2}\) for a single spin-1/2: it’s proportional to the identity. So when we say invariant with respect to \({S^2}\), what we really mean is that we are invariant to the whole spin rotation group, \({S_x,S_y,S_z}\).

To show invariance with respect to the spin group, it suffices just to consider any two spin rotations, since each spin operator can be generated by the commutator of the other two.

You can show this using the Jacobi identity. Say we are looking the invariance of \({F}\) with respect to generators \({A}\), \({B}\), and \({C}\), and \({[A,B] = iC}\) (like our spin matrices do). We want to show

\[[F, A] = [F, B] = 0 \implies [F, C] = 0\]

The Jacobi identity tells us

\[[A, [B, F]] + [B, [F, A]] + [F, [A, B]] = 0\]

Now, by definition the first two terms are zero, and we can evaluate the commutator of \({[A,B] = iC}\), which means if \({[F,A] = [F,B] = 0}\), then it must follow that \({[F,C] = 0}\) as well (the imaginary in front doesn’t make a difference; expand to see).
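Both facts are easy to verify numerically: the Jacobi identity holds for any three square matrices, and the Pauli matrices close under commutation (a sketch with random toy matrices):

```python
import numpy as np

rng = np.random.default_rng(3)

def comm(X, Y):
    return X @ Y - Y @ X

# The Jacobi identity holds for any three square matrices:
A, B, F = (rng.normal(size=(4, 4)) for _ in range(3))
jacobi = comm(A, comm(B, F)) + comm(B, comm(F, A)) + comm(F, comm(A, B))
assert np.allclose(jacobi, 0)

# And the Pauli matrices close under commutation: [sigma_1, sigma_2] = 2i sigma_3.
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
assert np.allclose(comm(s1, s2), 2j * s3)
```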

That being said, we can evaluate invariance with respect to all spin axes by using the results of \({S_z}\) and applying the generator \({S_x}\), defined by the Pauli matrix

\[\sigma_1 = \left( \begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array} \right) \ \ \ \ \ (29)\]

Applying this gives

\[\left( \begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array} \right)\left( \begin{array}{cc} \mathbf{F}_{\alpha\alpha} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}_{\beta\beta} \end{array} \right)\left( \begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array} \right) = \left( \begin{array}{cc} \mathbf{F}_{\beta\beta} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}_{\alpha\alpha} \end{array} \right) \ \ \ \ \ (30)\]

Which means that \({\mathbf{F}_{\alpha\alpha} = \mathbf{F}_{\beta\beta}}\), or

\[\mathbf{F} = \left( \begin{array}{cc} \mathbf{F}_{R} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}_{R} \end{array} \right), \qquad (\mathbf{F}_{R} \in \mathbb{C}) \ \ \ \ \ (31)\]

Invariance in this symmetry group results in the complex RHF equations, where the alpha and beta spin blocks are equivalent. Thus orbitals are doubly occupied.

\({S^2}\),\({S_z}\),\({\Theta}\),\({K}\). All spin rotations and time reversal.

Given the results we just obtained above, and understanding that time reversal contains the \({S_y}\) operator, we only need to take the previous results and make them invariant to complex conjugation. This is very simple, and we see that

\[K\left( \begin{array}{cc} \mathbf{F}_{R} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}_{R} \end{array} \right)K = \left( \begin{array}{cc} \mathbf{F}_{R}^{*} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}_{R}^{*} \end{array} \right) \ \ \ \ \ (32)\]

In other words, the real RHF equations, since invariance with respect to complex conjugation forces the elements to be real.

N.B. Most of this work was first done by Fukutome, though my notes are based on Stuber and Paldus. The notes here agree with both authors, though the derivations here make a rather large departure from the literature. Thus, it is helpful to reference the following work:

Fukutome, Hideo. “Unrestricted Hartree-Fock theory and its applications to molecules and chemical reactions.” International Journal of Quantum Chemistry 20.5 (1981): 955-1065.

Stuber, J. L., and J. Paldus. “Symmetry breaking in the independent particle model.” Fundamental World of Quantum Chemistry, A Tribute Volume to the Memory of Per-Olov Löwdin 1 (2003): 67-139.

I’d strongly recommend looking at Stuber and Paldus’ work first. Their terminology is more consistent with mainstream electronic structure theory.

Linear Response Coupled Cluster

Linear response coupled cluster (LR-CC) is a way of getting electronic excitation energies at a coupled cluster level of theory. It’s also a general way of getting many response properties, which basically answer the question of “what do molecules do when you poke them”?

Here is a simple derivation of LR-CC, which I like because it parallels the derivation of linear response time-dependent DFT and Hartree Fock. There are much more robust ways to get these results, for example, propagator approaches have been particularly successful when applied to coupled cluster response theory, but I think it can mask a lot of the physics if you aren’t careful.

Anyway, here it is.

The coupled cluster ground state equations are given by

\[\hat{H}^{(0)} e^{\hat{T}^{(0)}} \vert \phi_0 \rangle = E_{CC}^{(0)} e^{T^{(0)}} \vert \phi_0 \rangle \ \ \ \ \ (1)\]

Since this is, by definition, a stationary state, the following is equivalent:

\[\hat{H}^{(0)} e^{\hat{T}^{(0)}} \vert \phi_0 \rangle e^{-iE_{CC}^{(0)}t} = E_{CC}^{(0)} e^{T^{(0)}} \vert \phi_0 \rangle e^{-iE_{CC}^{(0)}t} \ \ \ \ \ (2)\]

Since

\[\vert \phi(t) \rangle = \vert \phi(0)\rangle e^{-i E t} \ \ \ \ \ (3)\]

Now, perturbing the system with monochromatic light of frequency \({\omega}\) gives us the modified Hamiltonian (in the dipole approximation):

\[\hat{H} = \hat{H}^{(0)} + \hat{f}^{(1)}e^{-i\omega t} + \hat{f}^{(1)*}e^{i\omega t} \ \ \ \ \ (4)\]

This induces a response to the wavefunction, which is parameterized by the cluster operator \({\hat{T}}\).

\[\hat{T} = \hat{T}^{(0)} + \hat{R}^{(1)}e^{-i \omega t} + R^{(1)*}e^{i \omega t} + \cdots \ \ \ \ \ (5)\]

The extra terms are higher order responses in the cluster operator. I should also note that this means T and R are in the same excitation manifold: in other words, if T is a single excitation operator, so is R. Same for doubles, triples, etc. Anyway, we are only considering the linear response here. Furthermore, from here on out I won’t write the complex conjugate parts. Though they are still there (and I will indicate them by dots), ultimately we collect terms of only \({e^{-i \omega t}}\) to get our final equations. We could collect the conjugates as well, but would get the same equations in the end.

The time dependent Schrödinger equation is

\[\hat{H}\vert \psi \rangle = i \frac{\partial}{\partial t} \vert \psi \rangle \ \ \ \ \ (6)\]

Plugging in the above perturbed expressions yields

\[\left( \hat{H}^{(0)} + \hat{f}^{(1)}e^{-i \omega t} + \cdots \right) \left( e^{\hat{T}^{(0)} + \hat{R}^{(1)}e^{-i\omega t} + \cdots} \right) \vert \phi_0 \rangle e^{-i E_{CC}^{(0)} t} = i \frac{\partial}{\partial t} \left( e^{\hat{T}^{(0)} + \hat{R}^{(1)}e^{-i\omega t} + \cdots} \right) \vert \phi_0 \rangle e^{-i E_{CC}^{(0)} t} \ \ \ \ \ (7)\]

Since \({\hat{T}^{(0)}}\) and \({\hat{R}^{(1)}}\) commute, we can rewrite this as

\[\left( \hat{H}^{(0)} + \hat{f}^{(1)}e^{-i \omega t} + \cdots \right) \left( e^{\hat{T}^{(0)}}e^{\hat{R}^{(1)}e^{-i\omega t} + \cdots} \right) \vert \phi_0 \rangle e^{-i E_{CC}^{(0)} t} = i \frac{\partial}{\partial t} \left( e^{\hat{T}^{(0)}}e^{\hat{R}^{(1)}e^{-i\omega t} + \cdots} \right) \vert \phi_0 \rangle e^{-i E_{CC}^{(0)} t} \ \ \ \ \ (8)\]

Expanding the exponential containing \({\hat{R}^{(1)}}\)

\[\left( \hat{H}^{(0)} + \hat{f}^{(1)}e^{-i \omega t} + \cdots \right) \left[e^{\hat{T}^{(0)}}\left(1 + \hat{R}^{(1)}e^{-i\omega t} + \cdots \right) \right] \vert \phi_0 \rangle e^{-i E_{CC}^{(0)} t} = i \frac{\partial}{\partial t} \left[e^{\hat{T}^{(0)}}\left(1 + \hat{R}^{(1)}e^{-i\omega t} + \cdots \right) \right] \vert \phi_0 \rangle e^{-i E_{CC}^{(0)} t} \ \ \ \ \ (9)\]

Finally, perform the differentiation, making sure not to forget the time-dependent phase:

\[\left( \hat{H}^{(0)} + \hat{f}^{(1)}e^{-i \omega t} + \cdots \right) \left[e^{\hat{T}^{(0)}}\left(1 + \hat{R}^{(1)}e^{-i\omega t} + \cdots \right) \right] \vert \phi_0 \rangle e^{-i E_{CC}^{(0)} t} = i e^{\hat{T}^{(0)}} \left(-i \omega {R}^{(1)}e^{-i\omega t} - i E_{CC}^{(0)} \hat{R}^{(1)} e^{-i \omega t} + \cdots \right) \vert \phi_0 \rangle e^{-i E_{CC}^{(0)} t} \ \ \ \ \ (10)\]

Collect terms with \({e^{-i\omega t}}\), cancel out phase factors

\[\hat{H}^{(0)}e^{\hat{T}^{(0)}} \hat{R}^{(1)} \vert \phi_0 \rangle + \hat{f}^{(1)} e^{\hat{T}^{(0)}} \vert \phi_0 \rangle = \omega e^{\hat{T}^{(0)}} \hat{R}^{(1)} \vert \phi_0 \rangle + E_{CC}^{(0)} e^{\hat{T}^{(0)}} \hat{R}^{(1)} \vert \phi_0 \rangle \ \ \ \ \ (11)\]

In linear response theory, we assume the perturbation is small and send \({\hat{f}^{(1)} \rightarrow 0}\).

\[\hat{H}^{(0)}e^{\hat{T}^{(0)}} \hat{R}^{(1)} \vert \phi_0 \rangle = (\omega + E_{CC}^{(0)}) e^{\hat{T}^{(0)}} \hat{R}^{(1)} \vert \phi_0 \rangle \ \ \ \ \ (12)\]

Thus the eigenvalues are just the ground state coupled cluster energy plus an excitation energy.

As an aside, it is possible (in fact, this is how LR-CC is done in practice) to get rid of the \({E_{CC}^{(0)}}\) entirely, so that you solve for the excitation energies directly. How? The simplest way to think about it is that when you evaluate the left hand side, you find that it contains expressions for the ground state energy (times \({R}\)), so if we leave out these terms when solving the equations, we can solve for the excitation energies directly. In other words we force the ground state energies on both sides to cancel by being careful how we set up the equation. And so we get an equation like so:

\[(e^{-\hat{T}^{(0)}}\hat{H}^{(0)}e^{\hat{T}^{(0)}})_c \hat{R}^{(1)} \vert \phi_0 \rangle = \omega \hat{R}^{(1)} \vert \phi_0 \rangle \ \ \ \ \ (13)\]
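A reassuring sanity check on Eq. (13): similarity transformations never change a matrix's spectrum, so diagonalizing the non-Hermitian \({e^{-\hat{T}^{(0)}}\hat{H}^{(0)}e^{\hat{T}^{(0)}}}\) still yields exact eigenvalues. A toy numerical sketch (random matrices standing in for \({H}\) and \({T}\), not a real coupled cluster calculation):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)
dim = 5

H = rng.normal(size=(dim, dim))
H = H + H.T                                     # toy symmetric "Hamiltonian"
T = np.triu(rng.normal(size=(dim, dim)), k=1)   # toy excitation-like operator

# Similarity-transformed Hamiltonian, as in Eq. (13).
Hbar = expm(-T) @ H @ expm(T)

# The spectrum is unchanged, even though Hbar is no longer symmetric.
assert np.allclose(np.sort(np.linalg.eigvals(Hbar).real),
                   np.sort(np.linalg.eigvals(H).real))
```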

Here’s the more jargony answer: if you apply Wick’s theorem to the Hausdorff expansion of the similarity transformed coupled cluster Hamiltonian, you retain only terms that share a sum with each other. Diagrammatically, this means we keep only the connected diagrams. If you go ahead and derive all diagrams, connected and disconnected, you find that the disconnected diagrams correspond exactly to the terms you want to cancel on the right.

Metal-Organic Frameworks Paper Now Online

I had my second first author paper published a few weeks back in the Journal of Physical Chemistry A, and wanted to share it here. It’s an investigation of how the tiny little hydrogen molecule binds to open metal sites in metal organic frameworks.

Metal organic frameworks are like molecular sponges. They are very porous, and small gases seem drawn to them, so researchers want to use them to store hydrogen gas for fuel cells. Another interesting application is purifying gases! Imagine if we could store and condense carbon dioxide right out of the air! Lots of potential environmental applications. Problem is, no one really knows how they work.

We studied hydrogen gas binding to a series of different metals in these metal organic frameworks. Here is a picture of one of the systems we studied:

[Figure: the full metal organic framework on the left; ‘mimic’ structures in the center and on the right]

The full metal organic framework is on the left. To simplify our method, we made ‘mimics’, meaning we studied just one part of a metal organic framework. That’s what you see in the center and on the right. Here is a close up of an interaction:

[Figure: close-up of the staggered hydrogen-metal interaction]

Hydrogen is kind of weird in that it has no charge and only two electrons. So why does it bind to the metal centers? To understand why, we used a method called Symmetry Adapted Perturbation Theory (SAPT). This method was designed to study weak interactions between molecules. In addition to calculating the energy of the hydrogen sticking to the metal, it also decomposes the interaction into physical forces: electrostatic, dispersion, induction, etc. It can be an extremely accurate method (we compared it to CCSD(T) energies and got pretty much the same results).

What we found might surprise you, as it did me! Hydrogen binds weakly (like ~ 3 kcal/mol weakly) and in many cases a good third of the interaction comes from dispersion forces. Now, if you remember from general chemistry, the more electrons you have, the greater the effects of dispersion become. Yet hydrogen — that little H2, with only two electrons — had an interaction largely mediated by dispersion (electrostatics was the other major component, which wasn’t too surprising).

The conclusion? Don’t ignore (or underestimate) your dispersion forces! This is very important to researchers who continue to study these metal organic frameworks, as they like to use density functional theory (DFT), which has some problems describing dispersion (if you aren’t careful).

You can read the whole paper here.

Kinetic Balance

Kinetic balance was discovered when early attempts at relativistic SCF calculations failed to converge to a bound state. Often, the energy was too low because of the negative energy continuum. The crux of the problem was that the four components of the spinor in relativistic methods were each allowed to vary independently. In other words, scientists treated each component with its own, independent basis without regard to how the components depended on each other. This problem was eventually solved by paying attention to the non-relativistic limit of the kinetic energy, and noting that the small and large components of the four-spinor are not independent. There is, in fact, a coupling that we refer to as ‘‘kinetic balance’’. I’ll show you how it works.

First, as you may have guessed, we only need to consider the one-electron operator. This is the matrix form of the Dirac equation, and it contains (in addition to other terms) the contributions to the kinetic energy. Written as a pair of coupled equations, we have

\[\begin{array}{c} \left[\mathbf{V}^{LL} - E \mathbf{S}^{LL} \right] \mathbf{c}^L + c \mathbf{\Pi}^{LS} \mathbf{c}^S = 0 \\ c \mathbf{\Pi}^{SL} \mathbf{c}^L + \left[\mathbf{V}^{SS} - (2mc^2 + E)\mathbf{S}^{SS}\right]\mathbf{c}^S = 0 \end{array} \ \ \ \ \ (1)\]

Where \({\mathbf{V}^{LL} = \langle \chi^{L} \vert V \vert \chi^{L} \rangle}\), \({\mathbf{S}^{LL} = \langle \chi^{L} \vert \chi^{L} \rangle}\), and \({\mathbf{\Pi}^{LS} = \langle \chi^{L} \vert -i\hbar \mathbf{\sigma} \cdot \mathbf{\nabla} \vert \chi^{S} \rangle}\), and so on. These are the potential, overlap, and momentum terms of the Dirac equation in a basis \({\vert \chi \rangle}\).

Now, if we look at the second of the two paired equations, we note that for potentials of chemical interest (e.g. molecular or atomic), \({\mathbf{V}^{SS}}\) is negative definite. We also know that when we solve the equation, we are looking for an energy above the negative energy continuum, which is to say we want \({E > -2mc^2}\). Since the overlap is positive definite, putting all of these constraints together means that we have a nonsingular (and therefore invertible!) matrix in the second of our coupled equations. We can rewrite the second equation as

\[\mathbf{c}^S = \left[(2mc^2 + E)\mathbf{S}^{SS} - \mathbf{V}^{SS} \right]^{-1}c\mathbf{\Pi}^{SL} \mathbf{c}^L \ \ \ \ \ (2)\]

Substituting this expression back into the first (top) equation yields

\[\left[\mathbf{V}^{LL} - E \mathbf{S}^{LL} \right] \mathbf{c}^L + c \mathbf{\Pi}^{LS} \left[(2mc^2 + E)\mathbf{S}^{SS} - \mathbf{V}^{SS} \right]^{-1}c\mathbf{\Pi}^{SL} \mathbf{c}^L = 0 \ \ \ \ \ (3)\]
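Before moving on, equation (2) can be sanity-checked numerically: build a small Hermitian toy analog of the matrix Dirac equation, diagonalize it, and confirm that the small-component coefficients of an eigenvector are recovered from the large-component ones. The matrices below are random stand-ins, not real integrals, and the basis is taken orthonormal for simplicity.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, c = 3, 1.0, 10.0                      # tiny toy basis; c deliberately small
def herm(a):
    return (a + a.conj().T) / 2

V_LL = herm(rng.standard_normal((n, n)))    # stand-in for V^LL
V_SS = herm(rng.standard_normal((n, n)))    # stand-in for V^SS
Pi = rng.standard_normal((n, n))            # stand-in for Pi^LS; Pi^SL is its adjoint
S_SS = np.eye(n)                            # orthonormal toy basis

# Full block matrix of the coupled equations (1)
H = np.block([[V_LL,     c * Pi],
              [c * Pi.T, V_SS - 2 * m * c**2 * S_SS]])
E_all, V_all = np.linalg.eigh(H)
E, v = E_all[-1], V_all[:, -1]              # pick a positive-energy solution
cL, cS = v[:n], v[n:]

# Eq. (2): recover the small-component coefficients from the large ones
cS_rec = np.linalg.solve((2 * m * c**2 + E) * S_SS - V_SS, c * Pi.T @ cL)
print(np.allclose(cS, cS_rec))  # True
```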

This form is very useful for analysis. Now, we are going to use the matrix relation

\[(\mathbf{A} - \mathbf{B})^{-1} = \mathbf{A}^{-1} + \mathbf{A}^{-1}\mathbf{B}(\mathbf{A}-\mathbf{B})^{-1} \ \ \ \ \ (4)\]

with \({\mathbf{A} = 2mc^2\mathbf{S}^{SS}}\) and \({\mathbf{B} = \mathbf{V}^{SS} - E\mathbf{S}^{SS}}\). This leads to the rather long expression
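The identity (4) itself is easy to confirm numerically with random matrices; any \({\mathbf{A}}\) where both \({\mathbf{A}}\) and \({\mathbf{A}-\mathbf{B}}\) are invertible will do. It follows from inserting \({\mathbf{A}^{-1}\mathbf{A} = \mathbf{1}}\) and writing \({\mathbf{A} = (\mathbf{A}-\mathbf{B}) + \mathbf{B}}\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Well-conditioned stand-ins for A = 2mc^2 S^SS and B = V^SS - E S^SS
A = 5.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = 0.1 * rng.standard_normal((n, n))

lhs = np.linalg.inv(A - B)
rhs = np.linalg.inv(A) + np.linalg.inv(A) @ B @ np.linalg.inv(A - B)
print(np.allclose(lhs, rhs))  # True
```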

\[\begin{array}{c} \left[\mathbf{V}^{LL} - E\mathbf{S}^{LL} + \frac{1}{2m} \mathbf{\Pi}^{LS} \left[\mathbf{S}^{SS}\right]^{-1}\mathbf{\Pi}^{SL}\right] \mathbf{c}^{L} = \cdots \\ \cdots = \frac{1}{2m}\mathbf{\Pi}^{LS}\left[\mathbf{S}^{SS}\right]^{-1} \left[\mathbf{V}^{SS} - E\mathbf{S}^{SS}\right]\left[\mathbf{V}^{SS} - (2mc^2+E)\mathbf{S}^{SS}\right]^{-1} \mathbf{\Pi}^{SL}\mathbf{c}^{L} \end{array} \ \ \ \ \ (5)\]

Why this ridiculous form? Look closely at each side and their dependence on the speed of light, \({c}\). The left hand side has the \({c^0}\) terms, and the right hand side has the \({c^{-2}}\) terms. Since the non-relativistic limit is found when the speed of light is infinite (\({c \rightarrow \infty}\)), the whole right hand side goes to zero. This gives us

\[\left[\mathbf{V}^{LL} - E\mathbf{S}^{LL} + \frac{1}{2m} \mathbf{\Pi}^{LS} \left[\mathbf{S}^{SS}\right]^{-1}\mathbf{\Pi}^{SL}\right] \mathbf{c}^{L} = 0 \ \ \ \ \ (6)\]
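The same \({c \rightarrow \infty}\) limit can be illustrated with the simplest possible case, a free particle, where the exact positive-energy eigenvalue (shifted by \({-mc^2}\), as in our equations) approaches \({p^2/2m}\) as \({c}\) grows. The numbers below are arbitrary sample values in atomic units.

```python
import numpy as np

m, p = 1.0, 0.5  # electron mass and a sample momentum (atomic units)
for c in [10.0, 100.0, 1000.0]:
    # positive-energy branch of the free Dirac particle, shifted by -mc^2
    E_rel = np.sqrt(c**2 * p**2 + m**2 * c**4) - m * c**2
    print(c, E_rel)
print("non-relativistic:", p**2 / (2 * m))
```

The printed energies converge to the non-relativistic value \({p^2/2m = 0.125}\) from below as the speed of light increases.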

Now, if this is indeed the true non-relativistic limit, then we find that our kinetic energy term \({\mathbf{T}^{LL}}\) is given by

\[\mathbf{T}^{LL} = \frac{1}{2m} \mathbf{\Pi}^{LS} \left[\mathbf{S}^{SS}\right]^{-1}\mathbf{\Pi}^{SL} \ \ \ \ \ (7)\]

Or, more explicitly,

\[T^{LL}_{\mu \nu} = \frac{-\hbar^2}{2m} \sum\limits_{\kappa \lambda} \langle \chi_{\mu}^{L} \vert \mathbf{\sigma}\cdot\mathbf{\nabla}\vert \chi_{\kappa}^{S} \rangle \left[S^{SS}\right]_{\kappa \lambda}^{-1} \langle \chi_{\lambda}^{S} \vert \mathbf{\sigma} \cdot \mathbf{\nabla} \vert \chi_{\nu}^{L} \rangle \ \ \ \ \ (8)\]

Where that inner part, \({\vert \chi_{\kappa}^{S} \rangle \left[S^{SS}\right]_{\kappa \lambda}^{-1} \langle \chi_{\lambda}^{S} \vert }\), is an inner projection onto the small component basis space. Less formally, the small component is the ‘‘mathematical glue’’ that connects the two momentum operators. If the small component spans the same space as the momentum operators in the large component space, \({\{\sigma\cdot\nabla\chi_{\mu}^{L}\}}\), then that inner projection just becomes the identity. This means that the expression becomes

\[T_{\mu \nu} = \frac{-\hbar^2}{2m}\langle \chi_{\mu}^{L} \vert (\mathbf{\sigma}\cdot\mathbf{\nabla})(\mathbf{\sigma} \cdot \mathbf{\nabla}) \vert \chi_{\nu}^{L} \rangle = \frac{-\hbar^2}{2m} \langle \chi_{\mu}^{L} \vert \mathbf{\nabla}^2 \vert \chi_{\nu}^{L} \rangle \ \ \ \ \ (9)\]

Which is the kinetic energy term in the non-relativistic formulation! (N.B. We used the relation \({(\mathbf{\sigma}\cdot\mathbf{\nabla})(\mathbf{\sigma}\cdot\mathbf{\nabla}) = \mathbf{\nabla}^2}\)).
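That Pauli relation is a special case of the identity \({(\mathbf{\sigma}\cdot\mathbf{a})(\mathbf{\sigma}\cdot\mathbf{b}) = (\mathbf{a}\cdot\mathbf{b})\mathbf{1} + i\mathbf{\sigma}\cdot(\mathbf{a}\times\mathbf{b})}\), where the cross term vanishes for commuting components like those of \({\mathbf{\nabla}}\). It is easy to check numerically for a fixed vector:

```python
import numpy as np

# Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

a = np.array([0.3, -1.2, 0.7])       # stand-in for the commuting components of grad
sigma_dot_a = a[0] * sx + a[1] * sy + a[2] * sz

lhs = sigma_dot_a @ sigma_dot_a      # (sigma.a)(sigma.a)
rhs = np.dot(a, a) * np.eye(2)       # (a.a) times the 2x2 identity
print(np.allclose(lhs, rhs))  # True
```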

So, when we set up our relativistic calculations, as long as we have the constraint that

\[\chi_{\mu}^{S} = (\mathbf{\sigma}\cdot\mathbf{p})\chi_{\mu}^{L} \ \ \ \ \ (10)\]

Then we will find that we recover the correct non-relativistic limit of our equations. The basis is called ‘‘kinetically balanced’’, and we won’t collapse to energies lower than \({-2mc^2}\).

A few stray observations before we finish. First, if we enforce the relation between small and large component basis functions, then we find that \({\mathbf{\Pi}^{SL} = \mathbf{S}^{SS}}\) and \({\mathbf{\Pi}^{LS} = 2m\mathbf{T}^{LL} = 2m\mathbf{T}}\). Second, this constraint actually maximizes the kinetic energy, and any approximation that does not satisfy the kinetic balance condition will lower the energy. This was weird to me coming from the non-relativistic Hartree-Fock background, where, variationally, removing basis functions raises the energy. The thing is, relativistic calculations aren’t bounded from below like their non-relativistic counterparts. While you can get variational stability, you are actually doing an ‘‘excited state’’ calculation (I am using the term ‘‘excited state’’ very loosely). Kind of odd, but the negative energy continuum does exist, and was a big factor in predicting the existence of antimatter.

Finally, modern methods of relativistic electronic structure theory make use of the kinetic balance between large and small component basis functions to eliminate the small component completely. These are called ‘‘Dirac-exact’’ methods. One such example is NESC, or Normalized Elimination of the Small Component. In addition to reproducing the Dirac equation exactly, they have numerous computational benefits, as well as easily allowing for most (if not all) non-relativistic correlated methods to be applied directly. Thus after doing an NESC calculation, you get relativistic ‘‘orbitals’’ which can immediately be used in, say, a coupled cluster calculation with no computational modification.

Comments, questions, or corrections? Let me know!

Dirac Hartree Fock equations

I wanted to try and understand more of how the physical picture changes going from non-relativistic theories to relativistic theories in electronic structure theory. So I figured a good place to start would be with the relativistic analog to the Hartree-Fock theory we know and love so well :)

Why do I care about relativistic electronic structure theory? It is the most fundamental theory of molecular science we have available! In particular, it is the most natural way to model and understand spin in molecules: molecular magnets, spin-orbit couplings, and high-energy spectroscopy all intrinsically depend on a theory of molecules that includes special relativity and a first principles treatment of spin. It also explains the reactivity (or lack thereof) of heavy metals, as well as their colors and properties. Did you know that if we did not include special relativity in our theories we’d predict gold to be silver instead of yellow? It’s just one of the reasons I think the next decade or so will refine relativistic electronic structure theory into the de facto theory of molecules. Kind of like what coupled cluster did to the electron correlation problem over the past few decades, relativistic theories are maturing fast and should become standard in your favorite quantum chemistry packages. Anyway…

The Dirac-Hartree-Fock (DHF) operator for a single determinant wave function in terms of atomic spinors is identical to that in terms of atomic orbitals. The derivation is the same: you find the energetic stationary point with respect to spinor (orbital) rotations. Thus we start with the DHF operator:

\[f_{pq} = h_{pq} + \sum\limits_{i}^{N} \left( pq \vert \vert ii \right) \ \ \ \ \ (1)\]

The first term, \({h_{pq}}\), is the one-electron part. The second term is the two-electron part. We will look at the one-electron part first. For \({h_{pq}}\), we (obviously) cannot use the Schrödinger equation equivalent. Instead, we use the Dirac equation, which is the relativistic equation for a single electron. In two-spinor form, where

\[\psi = \left( \begin{array}{c} \psi^{L} \\ \psi^{S} \end{array} \right) \ \ \ \ \ (2)\]

(Which is to say \({\psi^{L}}\) is the large component and \({\psi^{S}}\) is the small component.) We have the expression

\[\left[\begin{array}{cc} V-E & c(\mathbf{\sigma}\cdot\mathbf{p}) \\ c(\mathbf{\sigma}\cdot\mathbf{p}) & V-E-2mc^2 \end{array} \right] \left[\begin{array}{c} \psi^{L} \\ \psi^{S} \end{array} \right] = 0 \ \ \ \ \ (3)\]

In this case \({c}\) is the speed of light and \({m}\) is the electron rest mass. \({V}\) is the potential; in non-relativistic work this is usually the point-charge nucleus, though relativistic quantum chemistry generally assumes a finite nucleus. The bold operators are vectors, for example momentum \({\mathbf{p} = (p_x, p_y, p_z)}\). Same with the Pauli operators. Now we will expand \({\psi^L}\) and \({\psi^S}\) in a basis. Again, the bold type means a vector.

\[\psi^L = \vert \chi^L \rangle \mathbf{c}^L; \qquad \psi^S = \vert \chi^S \rangle \mathbf{c}^S \ \ \ \ \ (4)\]

The \({\mathbf{c}}\) is a column vector containing the basis coefficients, just like in non-relativistic Hartree-Fock theory. Similarly, the \({\vert \chi \rangle}\) ket is our basis. Inserting these expressions and multiplying on the left by \({\left[\langle \chi^L \vert \quad \langle \chi^S \vert \right]}\) (compare with Szabo and Ostlund p. 137) we transform the integro-differential equations into a matrix equation.

\[\left[\begin{array}{cc} \langle \chi^L \vert V \vert \chi^L \rangle - E\langle \chi^L \vert \chi^L \rangle & c\langle \chi^L \vert -i\hbar \mathbf{\sigma}\cdot\mathbf{\nabla} \vert \chi^S \rangle \\ c\langle \chi^S \vert -i\hbar \mathbf{\sigma}\cdot\mathbf{\nabla} \vert \chi^L \rangle & \langle \chi^S \vert V \vert \chi^S \rangle - (2mc^2 +E) \langle \chi^S \vert \chi^S \rangle \end{array} \right] \left[\begin{array}{c} \mathbf{c}^L \\ \mathbf{c}^S \end{array} \right] = 0 \ \ \ \ \ (5)\]

Which, we will simplify to

\[\left[\begin{array}{cc} \mathbf{V}^{LL} - E\mathbf{S}^{LL} & c\mathbf{\Pi}^{LS} \\ c\mathbf{\Pi}^{SL} & \mathbf{V}^{SS} - (2mc^2 +E) \mathbf{S}^{SS} \end{array} \right] \left[\begin{array}{c} \mathbf{c}^L \\ \mathbf{c}^S \end{array} \right] = 0 \ \ \ \ \ (6)\]

Hopefully, it is obvious we made the notational replacements

\[\mathbf{V}^{LL} = \langle \chi^L \vert V \vert \chi^L \rangle; \quad \mathbf{S}^{LL} = \langle \chi^L \vert \chi^L \rangle; \quad \mathbf{\Pi}^{LS} = \langle \chi^L \vert -i\hbar \mathbf{\sigma}\cdot\mathbf{\nabla} \vert \chi^S \rangle; ~\hbox{etc.} \ \ \ \ \ (7)\]
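To get a feel for equation (6), consider the simplest possible case: a single free-particle plane wave, where \({V = 0}\), the overlaps are 1, and \({\mathbf{\Pi}}\) reduces to the momentum \({p}\) (a scalar toy that ignores the spin structure). Diagonalizing the resulting 2×2 matrix gives the shifted relativistic dispersion, with the negative-energy solution sitting below \({-2mc^2}\); the momentum value is an arbitrary sample.

```python
import numpy as np

c, m, p = 137.035999, 1.0, 0.5   # atomic units; p is a sample momentum
# Matrix Dirac equation (6) for one free-particle plane wave:
# V = 0, S = 1, and Pi reduces to the scalar p
H = np.array([[0.0,   c * p],
              [c * p, -2 * m * c**2]])
E_minus, E_plus = np.linalg.eigvalsh(H)   # ascending order

E_exact = np.sqrt(m**2 * c**4 + c**2 * p**2) - m * c**2
print(E_plus, E_exact)    # positive-energy branch, close to p^2/2m
print(E_minus)            # negative-energy solution, below -2mc^2
```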

For the DHF equations, we will slide the parts containing energy \({E}\) over to the right hand side to recover the eigenvalue equation. We must now consider the analog to the Coulomb and exchange integrals in the non-relativistic HF equations to the relativistic Dirac equations. There are a few extra things to consider. The first is made clearest if we recall the definition of the two electron integrals (in Mulliken notation) for the non relativistic case. In general, we have

\[(pq\vert rs) = \int\int \psi^{*}_p(\mathbf{r}_1) \psi_q(\mathbf{r}_1) \frac{1}{r_{12}} \psi^{*}_r(\mathbf{r}_2) \psi_s(\mathbf{r}_2)d\mathbf{r}_1d\mathbf{r}_2 \ \ \ \ \ (8)\]

In the relativistic case, we swap the orbitals \({\psi}\) with their four-component spinors. We will have the same equation as above, except instead of the complex conjugate, we have the adjoint.

\[(pq\vert rs) = \int\int \psi^{\dagger}_p(\mathbf{r}_1) \psi_q(\mathbf{r}_1) \frac{1}{r_{12}} \psi^{\dagger}_r(\mathbf{r}_2) \psi_s(\mathbf{r}_2)d\mathbf{r}_1d\mathbf{r}_2 \ \ \ \ \ (9)\]

This slight change is just because we aren’t dealing with scalar functions anymore. For the most part things stay the same (at least in appearance), and we deal with charge distributions between two four-spinors (written in two-spinor form) like so:

\[\psi_p^{\dagger}\psi_q = \left( \psi_p^{L\dagger} \psi_p^{S\dagger} \right) \left( \begin{array}{c} \psi_q^{L} \\ \psi_q^S \end{array} \right) = \psi_p^{L\dagger}\psi_q^{L} + \psi_p^{S\dagger}\psi_q^{S} \ \ \ \ \ (10)\]

So it is ever so slightly more involved, but nothing too out of the ordinary. Extending this idea to the two electron integrals gives

\[(pq\vert rs) = (p^{L}q^{L}\vert r^{L}s^{L}) + (p^{L}q^{L}\vert r^{S}s^{S}) + (p^{S}q^{S}\vert r^{L}s^{L}) + (p^{S}q^{S}\vert r^{S}s^{S}) \ \ \ \ \ (11)\]

Okay. Now implicit to all of this so far is that the interaction between two charge densities is just Coulombic and hasn’t changed going from non relativistic to relativistic theories. This is not really the case, because the Coulombic interaction assumes an instantaneous response between electrons. QED tells us that this interaction is mediated by photons and therefore the interaction cannot occur any faster than the speed of light. In other words, there is a retardation effect between electrons that we must account for. This correction to the Coulomb interaction is called the Breit interaction, given by

\[V^{C}(0,r_{ij}) = \frac{1}{r_{ij}} - \frac{\mathbf{\alpha}_i\cdot\mathbf{\alpha}_j}{2r_{ij}} - \frac{(\mathbf{\alpha}_i \cdot \mathbf{r}_{ij})(\mathbf{\alpha}_j \cdot \mathbf{r}_{ij})}{2r_{ij}^3} \ \ \ \ \ (12)\]

We will ignore this for the rest of the derivation (thus ultimately giving the Dirac-Coulomb-Hartree-Fock equations), but to see how the interaction may be accounted for, consider that you now have new integrals that take the form

\[\psi_p^{\dagger}\mathbf{\alpha}\psi_q = \left( \psi_p^{L\dagger} \psi_p^{S\dagger} \right) \left( \begin{array}{cc} 0_2 & \mathbf{\sigma} \\ \mathbf{\sigma} & 0_2 \end{array} \right) \left( \begin{array}{c} \psi_q^{L} \\ \psi_q^S \end{array} \right) = \psi_p^{S\dagger}\mathbf{\sigma}\psi_q^{L} + \psi_p^{L\dagger}\mathbf{\sigma}\psi_q^{S} \ \ \ \ \ (13)\]

Which basically adds some different coupling elements. The alpha is the same as in the Dirac equation, and the sigmas are again the Pauli matrices. Anyway, going back, we now have a form of the two-electron integrals in 2-spinor form. We can insert the same basis set expansion elements as before for the one-electron case, and we get, for example

\[(p^L q^L \vert r^L s^L) = \sum\limits_{\mu \nu \kappa \lambda} c^{L*}_{\mu p} c^{L}_{\nu q} c^{L*}_{\kappa r} c^{L}_{\lambda s} (\mu^L \nu^L \vert \kappa^L \lambda^L) \ \ \ \ \ (14)\]

And similar expressions hold for the other terms. This should look nearly identical to the molecular-orbital/atomic-orbital relation in Hartree Fock theory.
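Equation (14) is the familiar four-index transformation, and it maps directly onto an `einsum`, with complex conjugation on the bra indices. Below is a toy illustration; the arrays are random stand-ins for the AO-basis integrals and large-component coefficients, not real quantities.

```python
import numpy as np

rng = np.random.default_rng(1)
nbf, nmo = 6, 4
ao = rng.standard_normal((nbf,) * 4)              # stand-in for (mu nu | kappa lam)
C = (rng.standard_normal((nbf, nmo))
     + 1j * rng.standard_normal((nbf, nmo)))      # stand-in for c^L (complex in general)

# Eq. (14): contract one coefficient per index, conjugating the bra indices
mo = np.einsum('up,vq,kr,ls,uvkl->pqrs',
               C.conj(), C, C.conj(), C, ao, optimize=True)
print(mo.shape)  # (4, 4, 4, 4)
```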

And that’s really all there is to it! To clean up our expressions, we introduce the density matrix \({\mathbf{P}}\), where, for example

\[\mathbf{P} = \left[\begin{array} {cc} \mathbf{P}^{LL} & \mathbf{P}^{LS} \\ \mathbf{P}^{SL} & \mathbf{P}^{SS} \end{array} \right] \ \ \ \ \ (15)\]

with

\[\mathbf{P}^{LL} = \mathbf{c}^{L}\mathbf{c}^{L\dagger}, ~\hbox{etc.} \ \ \ \ \ (16)\]

Then our Dirac-(Coulomb-) Hartree Fock matrix looks like

\[\mathbf{F} = \left[\begin{array} {cc} \mathbf{F}^{LL} & \mathbf{F}^{LS} \\ \mathbf{F}^{SL} & \mathbf{F}^{SS} \end{array} \right] \ \ \ \ \ (17)\]

With

\[F^{LL}_{\mu \nu} = V^{LL}_{\mu \nu} + \sum\limits_{\kappa \lambda} P^{LL}_{\kappa \lambda} \left[(\mu^L \nu^L \vert \kappa^L \lambda^{L}) - (\mu^{L} \lambda^{L} \vert \kappa^{L} \nu^{L})\right] + \sum\limits_{\kappa\lambda} P^{SS}_{\kappa \lambda} \left[(\mu^L \nu^L \vert \kappa^S \lambda^S )\right] \ \ \ \ \ (18)\] \[F^{LS}_{\mu \nu} = c \Pi^{LS}_{\mu \nu} - \sum\limits_{\kappa \lambda} P_{\kappa \lambda}^{SL} (\mu^L \lambda^L \vert \kappa^{S} \nu^{S} ) \ \ \ \ \ (19)\] \[F^{SS}_{\mu \nu} = V^{SS}_{\mu \nu} - 2c^2 S^{SS}_{\mu\nu} + \sum\limits_{\kappa \lambda} P_{\kappa \lambda}^{LL} (\mu^S \nu^S \vert \kappa^L \lambda^L) + \sum\limits_{\kappa \lambda} P_{\kappa \lambda}^{SS} \left[(\mu^S \nu^S \vert \kappa^S \lambda^S) - (\mu^S \lambda^S \vert \kappa^S \nu^S ) \right] \ \ \ \ \ (20)\]

And of course the form of the DHF equations is the same as in the non-relativistic case (with \({\mathbf{C}}\) the matrix of all eigenvectors \({\mathbf{c}}\)):

\[\mathbf{FC} = \mathbf{SC}E \ \ \ \ \ (21)\]

The Dirac Fock matrix is Hermitian, and you can see that it depends on the density \({\mathbf{P}=\mathbf{CC^{\dagger}}}\) as well, meaning we have to solve the equations iteratively.
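One way to sketch the eigenvalue step at each iteration is via symmetric orthogonalization (as in Szabo and Ostlund): form \({\mathbf{S}^{-1/2}}\), diagonalize \({\mathbf{S}^{-1/2}\mathbf{F}\mathbf{S}^{-1/2}}\), and back-transform. Here \({\mathbf{F}}\) and \({\mathbf{S}}\) are random Hermitian/positive-definite stand-ins, not matrices from an actual calculation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
# Toy Hermitian "Fock" matrix and positive-definite "overlap"
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
F = (A + A.conj().T) / 2
X = rng.standard_normal((n, n))
S = X @ X.T + n * np.eye(n)

# Symmetric orthogonalization: S^(-1/2) F S^(-1/2) is a standard eigenproblem
s, U = np.linalg.eigh(S)
S_half_inv = U @ np.diag(s**-0.5) @ U.conj().T
E, Cp = np.linalg.eigh(S_half_inv @ F @ S_half_inv)
C = S_half_inv @ Cp                      # back-transform to the original basis

print(np.allclose(F @ C, S @ C @ np.diag(E)))  # True: FC = SCE
```

In a real SCF loop you would rebuild \({\mathbf{F}}\) from the new density \({\mathbf{P}}\) and repeat until self-consistency.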

Now, there is still a lot more to do with these equations, such as finding a suitable basis set and actually setting up the problem and integrals. It’s not a trivial task, but I plan to explore these more in the future.

Questions, comments, or corrections? Let me know! I’d love to hear from you.