Thouless Theorem

Kind of a dry post today (hey, I put in a cat picture at least)… but I had to share a theorem, the Thouless theorem, that is so important in computational and quantum chemistry and yet surprisingly little known. It ties together Hartree-Fock, Brillouin’s theorem, the successes of coupled cluster, and even some MCSCF ideas. It basically explains why single excitation operators are so important and useful in quantum chemistry, and what they are actually doing when we run our calculations and devise new methods. For someone working closely in electronic structure, it is hard to overstate the importance of this theorem. Also, the proof is kind of hard to a) find and b) understand. So I tried to make it simpler and more accessible!

The Thouless theorem (Thouless, 1960) says that the effect of the \({e^{\hat{T}_1}}\) operator is to transform a given single determinant into any other single determinant, as long as the two are not orthogonal! Concretely, we can write
\(\vert \phi\rangle = [\hbox{exp}\left(\sum\limits_{i=1}^{N}\sum\limits_{a = N+1}^{\infty} t_i^a a^{\dagger}_{a}a_i\right)]\vert \phi_0\rangle \ \ \ \ \ (1)\)
or
\(\vert \phi\rangle= e^{\hat{T}_1} \vert \phi_0 \rangle \ \ \ \ \ (2)\)

In other words, we can generate any single determinant from another determinant by mixing virtual orbitals (single excitations) into it. This is sometimes called “orbital rotation.” If we minimize the energy of the Slater determinant with respect to the rotation parameters (the \({t_i^a}\) amplitudes), we get the Hartree-Fock wavefunction out. The resulting stationarity condition is Brillouin’s theorem, which says that the Hartree-Fock reference does not interact (directly) with singly excited determinants, \({\langle \phi_i^a \vert \hat{H} \vert \phi_{\rm HF}\rangle = 0}\). And of course it shouldn’t! We just variationally optimized with respect to those parameters! The Thouless theorem also explains why coupled cluster singles (CCS) on its own is not a useful correlated method: it just gives Hartree-Fock back. We do include single excitations in CCSD, but they contribute only through their coupling to the correlation generated by higher excitations (doubles, triples, etc.). The Thouless theorem also explains why CCSD and higher methods are relatively insensitive to the choice of reference: the \({e^{\hat{T}_1}}\) operator effectively rotates the reference toward the optimal one. I’m sure there are plenty more uses in the quantum chemistry literature (you see them a lot in MCSCF, actually).
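As a toy illustration of Brillouin’s condition, and of \({\hat{T}_1}\) steering an arbitrary reference toward the optimal one, here is a minimal NumPy sketch under a strong simplifying assumption: a non-interacting (purely one-body) Hamiltonian \({h}\), for which Hartree-Fock is exact and the Fock matrix is just \({h}\). The orbitals of \({e^{\hat{T}_1}\vert \phi_0\rangle}\) are the columns of \({C = [I; t]}\) (this follows from Eq. (18) below), and the energy of the determinant they span is \({\hbox{tr}[(C^{T}C)^{-1}C^{T}hC]}\). The function names, the random \({h}\), and the sizes are illustrative choices, not anything taken from Thouless.

```python
import numpy as np

def det_energy(h, C):
    """Energy of the determinant spanned by the columns of C (not necessarily orthonormal)
    for a one-body Hamiltonian h: E = tr[(C^T C)^{-1} C^T h C]."""
    S = C.T @ C
    return np.trace(np.linalg.solve(S, C.T @ h @ C))

def energy_gradient_t(h, t):
    """Analytic dE/dt for orbitals C = [I; t]; t[m, i] mixes virtual m into occupied i."""
    n_occ = t.shape[1]
    C = np.vstack([np.eye(n_occ), t])
    S_inv = np.linalg.inv(C.T @ C)
    P_virt = np.eye(C.shape[0]) - C @ S_inv @ C.T  # projector onto the complement of span(C)
    grad_C = 2.0 * P_virt @ h @ C @ S_inv
    return grad_C[n_occ:, :]  # only the virtual rows of C are free parameters

rng = np.random.default_rng(0)
n_orb, n_occ = 6, 2
A = rng.standard_normal((n_orb, n_orb))
h = (A + A.T) / 2  # random symmetric one-body Hamiltonian (toy assumption)

# At t = 0 the gradient is 2 h[virtual, occupied]: Brillouin's condition fails
# unless that block of h happens to vanish.
g0 = energy_gradient_t(h, np.zeros((n_orb - n_occ, n_occ)))

# Thouless-style construction of the optimal determinant: write the lowest n_occ
# eigenvectors V as [I; t] via t = V_virt @ inv(V_occ) (assumes V_occ is invertible,
# i.e. the optimum is not orthogonal to the reference).
eigvals, V = np.linalg.eigh(h)
V_low = V[:, :n_occ]
t_opt = V_low[n_occ:, :] @ np.linalg.inv(V_low[:n_occ, :])
E_opt = det_energy(h, np.vstack([np.eye(n_occ), t_opt]))
g_opt = energy_gradient_t(h, t_opt)

print("E(t_opt) =", E_opt, " sum of lowest eigenvalues =", eigvals[:n_occ].sum())
print("max |dE/dt| at t = 0:", np.abs(g0).max(), " at t_opt:", np.abs(g_opt).max())
```

At \({t = 0}\) the gradient is just \({2h_{mi}}\), so it vanishes exactly when the virtual-occupied block of the Fock matrix does, which is Brillouin’s condition in this toy setting. The amplitudes built from the lowest eigenvectors reproduce the exact non-interacting ground-state energy with a vanishing gradient.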

Let’s say we have a single N-electron Slater determinant, written in second quantization as

\[\vert \phi_0 \rangle = a^{\dagger}_{1}a^{\dagger}_{2}\cdots a^{\dagger}_{N}\vert 0\rangle = \prod\limits_{i=1}^{N} a^{\dagger}_{i} \vert 0\rangle \ \ \ \ \ (3)\]

where \({N}\) is the number of electrons in our aptly named N-electron wavefunction, and \({\vert 0\rangle}\) is the vacuum state. Let’s say we know what this wavefunction \({\vert \phi_0\rangle}\) is; it will be our reference determinant. This might be the result of a Hartree-Fock calculation, for example. Now say we want to make a new single determinant \({\vert \phi\rangle}\) in terms of our reference determinant (I’ll use determinant and wavefunction interchangeably). First we write our wavefunction:
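To make the second-quantized algebra concrete, the following sketch represents creation operators as dense matrices on the \({2^M}\)-dimensional Fock space of \({M}\) spin orbitals, using the Jordan-Wigner sign convention (a standard choice, assumed here; the derivation itself never needs a matrix representation). The helper name `creation_ops` and the sizes \({M = 4}\), \({N = 2}\) are illustrative. Orbitals are 0-indexed in code, so reference orbitals \({1, \ldots, N}\) in the text are \({0, \ldots, N-1}\) here.

```python
import numpy as np

def creation_ops(n_orb):
    """Dense matrices for a_p^dagger (p = 0..n_orb-1) in the occupation-number basis.

    Basis state k has orbital p occupied iff bit p of k is set. The sign
    (-1)^(number of occupied orbitals with index < p) is the Jordan-Wigner
    convention, which makes the matrices obey the fermionic anticommutators.
    """
    dim = 2 ** n_orb
    ops = []
    for p in range(n_orb):
        op = np.zeros((dim, dim))
        for k in range(dim):
            if not (k >> p) & 1:  # a_p^dagger only acts if orbital p is empty
                sign = (-1) ** bin(k & ((1 << p) - 1)).count("1")
                op[k | (1 << p), k] = sign
        ops.append(op)
    return ops

n_orb, n_elec = 4, 2
adag = creation_ops(n_orb)
a = [op.T for op in adag]  # real matrices, so a_p is the transpose of a_p^dagger

vacuum = np.zeros(2 ** n_orb)
vacuum[0] = 1.0

# Eq. (3): |phi_0> = a_1^dagger a_2^dagger ... a_N^dagger |0>, rightmost operator acts first
phi0 = vacuum.copy()
for p in reversed(range(n_elec)):
    phi0 = adag[p] @ phi0

# Sanity checks: {a_p, a_q^dagger} = delta_pq and <phi_0|phi_0> = 1
identity = np.eye(2 ** n_orb)
for p in range(n_orb):
    for q in range(n_orb):
        anti = a[p] @ adag[q] + adag[q] @ a[p]
        assert np.allclose(anti, identity * (p == q))
print("<phi_0|phi_0> =", phi0 @ phi0)
```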
\(\vert \phi \rangle = \tilde{a}^{\dagger}_{1}\tilde{a}^{\dagger}_{2}\cdots \tilde{a}^{\dagger}_{N}\vert 0\rangle = \prod\limits_{\alpha=1}^{N} \tilde{a}^{\dagger}_{\alpha} \vert 0\rangle \ \ \ \ \ (4)\)

The difference between \({a^{\dagger}}\) and \({\tilde{a}^{\dagger}}\) is simply that they create electrons in different orbitals in our Slater determinant. We will assume that both sets of creation (and corresponding annihilation) operators form complete sets, and that the two sets are not orthogonal with respect to each other. If this is the case, we can write one set in terms of the other like so:
\(\tilde{a}^{\dagger}_{\alpha} = \sum\limits_{i}^{\infty} u_{\alpha i} a^{\dagger}_{i} \ \ \ \ \ (5)\)

This is just a change of basis operation. See, for example, Szabo and Ostlund p13. This suggests that we can write
\(\vert \phi\rangle = \prod\limits_{\alpha=1}^{N}\tilde{a}^{\dagger}_{\alpha} \vert 0\rangle = \prod\limits_{\alpha = 1}^{N} \left(\sum\limits_{i}^{\infty} u_{\alpha i} a^{\dagger}_{i}\right) \vert 0\rangle \ \ \ \ \ (6)\)
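A quick numerical check of Eq. (6), as a self-contained sketch (the Jordan-Wigner helper is repeated so the block runs on its own, and the random \({N \times M}\) matrix \({\mathbf{u}}\) is an arbitrary illustrative choice). It builds \({\tilde{a}^{\dagger}_{\alpha} = \sum_i u_{\alpha i} a^{\dagger}_i}\) and confirms the standard result that the amplitude of \({\vert \phi\rangle}\) on the reference-basis determinant with occupied orbitals \({s_1 < \cdots < s_N}\) is the \({N \times N}\) minor of \({\mathbf{u}}\) built from those columns. In particular, the amplitude on \({\vert \phi_0\rangle}\) itself, i.e. the overlap \({\langle \phi_0 \vert \phi\rangle}\), is the determinant of the occupied block.

```python
import itertools
import numpy as np

def creation_ops(n_orb):
    """Jordan-Wigner matrices for a_p^dagger; bit p of a basis index = occupation of orbital p."""
    dim = 2 ** n_orb
    ops = []
    for p in range(n_orb):
        op = np.zeros((dim, dim))
        for k in range(dim):
            if not (k >> p) & 1:
                op[k | (1 << p), k] = (-1) ** bin(k & ((1 << p) - 1)).count("1")
        ops.append(op)
    return ops

n_orb, n_elec = 5, 2
adag = creation_ops(n_orb)
rng = np.random.default_rng(1)
u = rng.standard_normal((n_elec, n_orb))  # rows: new orbitals alpha, columns: reference orbitals i

# Eq. (5): tilde_a_alpha^dagger = sum_i u[alpha, i] a_i^dagger
adag_tilde = [sum(u[alpha, i] * adag[i] for i in range(n_orb)) for alpha in range(n_elec)]

# Eq. (6): |phi> = tilde_a_1^dagger ... tilde_a_N^dagger |0>, rightmost operator acts first
phi = np.zeros(2 ** n_orb)
phi[0] = 1.0
for alpha in reversed(range(n_elec)):
    phi = adag_tilde[alpha] @ phi

# Each sorted occupation pattern S carries amplitude det(u[:, S])
for S in itertools.combinations(range(n_orb), n_elec):
    index = sum(1 << s for s in S)
    assert np.isclose(phi[index], np.linalg.det(u[:, list(S)]))
print("amplitudes match the N x N minors of u")
```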

Following the presentation of Thouless, we then split the sum over occupied and virtual spaces (with respect to the reference):
\(\vert \phi\rangle = \prod\limits_{\alpha = 1}^{N} \left(\sum\limits_{i}^{\infty} u_{\alpha i} a^{\dagger}_{i}\right) \vert 0\rangle = \prod\limits_{\alpha = 1}^{N} \left(\sum\limits_{i}^{N} u_{\alpha i} a^{\dagger}_{i} + \sum\limits_{m= N + 1}^{\infty} u_{\alpha m} a^{\dagger}_{m}\right) \vert 0\rangle \ \ \ \ \ (7)\)

Alright. Since we assumed \({\vert \phi\rangle}\) and \({\vert \phi_0\rangle}\) were not orthogonal, we can choose an intermediate normalization, setting
\(\langle \phi_0 \vert \phi \rangle = 1 \ \ \ \ \ (8)\)

The overlap \({\langle \phi_0 \vert \phi \rangle}\) is exactly the determinant of the square \({N \times N}\) block of \({\mathbf{u}}\) in which \({\alpha}\) and \({i}\) both run from 1 to \({N}\) (remember \({\mathbf{u}}\) is actually a rectangular matrix according to our change of basis definition in Eq. (5); again, Szabo and Ostlund p13 is a good refresher on basis changes). Since \({\vert \phi\rangle}\) and \({\vert \phi_0\rangle}\) are not orthogonal, this determinant is nonzero (and equal to 1 with intermediate normalization), which means the square \({N \times N}\) block of \({\mathbf{u}}\) is invertible, and we represent its \({N \times N}\) inverse as \({U_{i\alpha}}\). Here’s another way to look at it, as well as derive some of the properties of \({\mathbf{u}}\) and \({\mathbf{U}}\):
\(\mathbf{Uu} = \mathbf{U} \begin{bmatrix} \mathbf{u}_{(N\times N)} & \mathbf{u}_{N \times (N+1):\infty} \end{bmatrix} = \begin{bmatrix} \mathbf{I} & \mathbf{T} \end{bmatrix} \ \ \ \ \ (9)\)

where
\(\mathbf{U}\mathbf{u}_{(N\times N)} = \mathbf{I} \qquad \hbox{or} \qquad \sum\limits_{\alpha=1}^{N} U_{i \alpha} u_{\alpha j} = \delta_{i j} \ \ \ \ \ (10)\)

and conversely,
\(\mathbf{u}_{(N\times N)} \mathbf{U} = \mathbf{I} \qquad \hbox{or} \qquad \sum\limits_{i=1}^{N} u_{\alpha i} U_{i \beta} = \delta_{\alpha \beta} \ \ \ \ \ (11)\)

and that odd-looking \({\mathbf{T}}\) matrix comes from the columns of \({\mathbf{u}}\) that extend past the number of electrons \({N}\) (the reference virtual orbitals):
\(\mathbf{U}\mathbf{u}_{(N\times (N+1):\infty)} = \mathbf{T} \qquad \hbox{or} \qquad \sum\limits_{\alpha=1}^{N} U_{i \alpha} u_{\alpha m} = t_{mi} \ \ \ \ \ (12)\)

where \({i\leq N}\) and \({m > N}\) in the above equation. Now we put it all together. Because \({\mathbf{U}}\) is invertible, we can form \({N}\) linearly independent combinations of the \({\tilde{a}^{\dagger}_{\alpha}}\), each of which we’ll call \({\bar{a}^{\dagger}_i}\). (Yes, we have now introduced three sets of creation operators, and yes, this is essentially what Thouless does, but one of the sets is just an intermediate, and we will end up relating only two sets. Trust me.)
\(\bar{a}_i^{\dagger} = \sum\limits_{\alpha =1}^{N} U_{i \alpha} \tilde{a}_{\alpha}^{\dagger} \ \ \ \ \ (13)\)

Now the fun happens: using Eqs. (5), (10), and (12), we can rewrite the above in terms of just our reference creation operators:

\[\bar{a}_i^{\dagger} = \sum\limits_{\alpha =1}^{N} U_{i \alpha} \sum\limits_{j=1}^{\infty} u_{\alpha j} a_j^{\dagger} \ \ \ \ \ (14)\] \[\bar{a}_i^{\dagger} = \sum\limits_{\alpha =1}^{N} U_{i \alpha} \left( \sum\limits_{j=1}^{N} u_{\alpha j} a_j^{\dagger} + \sum\limits_{m=N+1}^{\infty} u_{\alpha m} a_m^{\dagger}\right) \ \ \ \ \ (15)\] \[\bar{a}_i^{\dagger} = \sum\limits_{\alpha =1}^{N}\sum\limits_{j=1}^{N} U_{i \alpha} u_{\alpha j} a_j^{\dagger} + \sum\limits_{\alpha=1}^{N}\sum\limits_{m=N+1}^{\infty} U_{i \alpha} u_{\alpha m} a_m^{\dagger} \ \ \ \ \ (16)\] \[\bar{a}_i^{\dagger} = \sum\limits_{j = 1}^{N} \delta_{i j} a_{j}^{\dagger} + \sum\limits_{m=N+1}^{\infty} t_{mi} a_{m}^{\dagger} \ \ \ \ \ (17)\] \[\bar{a}_i^{\dagger} = a_{i}^{\dagger} + \sum\limits_{m=N+1}^{\infty} t_{mi} a_m^{\dagger} \ \ \ \ \ (18)\]
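The bookkeeping in Eqs. (9) through (18) is plain matrix algebra, so it can be checked with NumPy alone. This sketch assumes a random real \({\mathbf{u}}\) with the reference occupied orbitals listed first, rescales one row so that the occupied block has determinant 1 (which is what intermediate normalization, Eq. (8), amounts to, since \({\langle \phi_0 \vert \phi\rangle}\) equals that determinant), and then forms \({\mathbf{U}}\) and \({\mathbf{T}}\). Each row of \({\mathbf{Uu}}\) is then the list of coefficients of one \({\bar{a}^{\dagger}_i}\) on the reference orbitals, which is Eq. (18) in matrix form. Sizes and the random seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n_orb, n_elec = 6, 3
u = rng.standard_normal((n_elec, n_orb))  # Eq. (5) coefficients: rows alpha, columns reference orbitals

# Eq. (8): rescale one new orbital so that det(u_occ) = <phi_0|phi> = 1 (intermediate normalization)
u[0] /= np.linalg.det(u[:, :n_elec])

u_occ = u[:, :n_elec]   # N x N block over the reference occupied orbitals
u_virt = u[:, n_elec:]  # N x (M - N) block over the reference virtual orbitals

U = np.linalg.inv(u_occ)  # Eqs. (10) and (11)
T = U @ u_virt            # Eq. (12): T[i, k] = t_{mi} for the k-th virtual orbital m = N + 1 + k

# Eq. (9): U u = [I  T]
Uu = U @ u
assert np.allclose(Uu[:, :n_elec], np.eye(n_elec))
assert np.allclose(Uu[:, n_elec:], T)

# Eq. (18): row i of U u lists the coefficients of bar_a_i^dagger on the reference orbitals,
# 1 on orbital i, 0 on the other occupied orbitals, and t_{mi} on each virtual m.
for i, row in enumerate(Uu):
    print(f"bar_a_{i + 1}^dagger coefficients:", np.round(row, 3))
print("det(U) =", np.linalg.det(U))
```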

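In other words, any new orbital may be generated by mixing contributions from the virtual orbitals into a reference orbital. We now apply this to build the new single determinant. Because the \({\bar{a}^{\dagger}_i}\) are linear combinations of the \({\tilde{a}^{\dagger}_{\alpha}}\) with coefficient matrix \({\mathbf{U}}\), their product is \({\det \mathbf{U}}\) times the product of the \({\tilde{a}^{\dagger}_{\alpha}}\). Intermediate normalization, Eq. (8), fixes the determinant of the \({N \times N}\) block of \({\mathbf{u}}\) (which equals \({\langle \phi_0 \vert \phi\rangle}\)) to 1, so \({\det \mathbf{U} = 1}\) and the product below is exactly \({\vert \phi\rangle}\):
\(\vert \phi\rangle = \prod\limits_{i=1}^{N}\bar{a}_i^{\dagger} \vert 0\rangle \ \ \ \ \ (19)\)

\[\vert \phi\rangle= \prod\limits_{i=1}^{N}\left(a_{i}^{\dagger} + \sum\limits_{m=N+1}^{\infty}t_{mi}a_m^{\dagger}\right)\vert 0\rangle \ \ \ \ \ (20)\] \[\vert \phi\rangle= \prod\limits_{i=1}^{N}\left[\left(1 + \sum\limits_{m=N+1}^{\infty}t_{mi}a_m^{\dagger}a_i\right)a_i^{\dagger}\right]\vert 0\rangle \ \ \ \ \ (21)\] \[\vert \phi\rangle= \prod\limits_{i=1}^{N} \prod\limits_{m=N+1}^{\infty} \left(1 + t_{mi}a_m^{\dagger}a_i\right)\vert \phi_0\rangle \ \ \ \ \ (22)\] \[\vert \phi\rangle = e^{\hat{T}_1}\vert \phi_0 \rangle \ \ \ \ \ (23)\]

We got from Eq. (21) to Eq. (22) by noting two things. First, the excitation operators \({a_m^{\dagger}a_i}\) (\({m}\) virtual, \({i}\) occupied) commute with one another and with every \({a_j^{\dagger}}\) for \({j \neq i}\), so all of the factors in parentheses can be moved to the left of \({\prod\limits_{j=1}^{N} a_j^{\dagger} \vert 0\rangle = \vert \phi_0\rangle}\). Second, any term containing the same annihilation operator twice vanishes, since \({a_ia_i = 0}\), so the sum over \({m}\) inside each factor can be rewritten as a product over \({m}\). The same fact, \({(a_m^{\dagger}a_i)^2 = 0}\), means each factor \({1 + t_{mi}a_m^{\dagger}a_i}\) is exactly \({\hbox{exp}(t_{mi}a_m^{\dagger}a_i)}\), and since all of these exponents commute, the infinite product of exponentials is the exponential of \({\hat{T}_1 = \sum_{i,m} t_{mi}a_m^{\dagger}a_i}\), which is Eq. (23).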
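Finally, an end-to-end numerical check of the theorem, again a self-contained toy (\({M = 5}\) spin orbitals, \({N = 2}\) electrons, and a random real \({\mathbf{u}}\), all illustrative assumptions). It builds \({\vert \phi\rangle}\) directly from Eq. (6), rescales it to intermediate normalization, takes \({t_{mi}}\) from Eq. (12), and compares against both the product form of Eq. (22) and the exponential of Eq. (23). Because \({\hat{T}_1}\) is nilpotent on a finite Fock space (each application moves one more electron out of the \({N}\) reference occupied orbitals, so \({\hat{T}_1^{N+1} = 0}\)), the exponential is computed exactly by a truncated Taylor series rather than a general matrix-exponential routine.

```python
import math
import numpy as np

def creation_ops(n_orb):
    """Jordan-Wigner matrices for a_p^dagger; bit p of a basis index = occupation of orbital p."""
    dim = 2 ** n_orb
    ops = []
    for p in range(n_orb):
        op = np.zeros((dim, dim))
        for k in range(dim):
            if not (k >> p) & 1:
                op[k | (1 << p), k] = (-1) ** bin(k & ((1 << p) - 1)).count("1")
        ops.append(op)
    return ops

n_orb, n_elec = 5, 2
dim = 2 ** n_orb
occ, virt = range(n_elec), range(n_elec, n_orb)
adag = creation_ops(n_orb)
a = [op.T for op in adag]
vacuum = np.zeros(dim)
vacuum[0] = 1.0

def determinant(coeffs):
    """Apply prod_alpha (sum_i coeffs[alpha, i] a_i^dagger) to |0>, last factor first."""
    vec = vacuum.copy()
    for row in coeffs[::-1]:
        vec = sum(c * adag[i] for i, c in enumerate(row)) @ vec
    return vec

rng = np.random.default_rng(3)
u = rng.standard_normal((n_elec, n_orb))
phi0 = determinant(np.eye(n_elec, n_orb))  # Eq. (3)
phi = determinant(u)                       # Eq. (6)
phi = phi / (phi0 @ phi)                   # Eq. (8): intermediate normalization

# Eq. (12): amplitudes t_{mi}, stored as t[m - N, i] with 0-based orbital indices
U = np.linalg.inv(u[:, :n_elec])
t = (U @ u[:, n_elec:]).T

# Single-excitation operators a_m^dagger a_i and T1 = sum_{mi} t_{mi} a_m^dagger a_i
E = {(m, i): adag[m] @ a[i] for m in virt for i in occ}
T1 = sum(t[m - n_elec, i] * op for (m, i), op in E.items())

# Eq. (22): product of (1 + t_mi a_m^dagger a_i) factors (they commute, so order is irrelevant)
product = np.eye(dim)
for (m, i), op in E.items():
    product = product @ (np.eye(dim) + t[m - n_elec, i] * op)

# Eq. (23): exp(T1) from its Taylor series, exact here because T1^(N+1) = 0
exp_T1 = sum(np.linalg.matrix_power(T1, k) / math.factorial(k) for k in range(n_elec + 1))

assert np.allclose(np.linalg.matrix_power(T1, n_elec + 1), 0.0)
assert np.allclose(product @ phi0, phi)
assert np.allclose(exp_T1 @ phi0, phi)
assert np.isclose(phi0 @ exp_T1 @ phi0, 1.0)  # e^{T1}|phi_0> always has unit overlap with |phi_0>
print("e^{T1}|phi_0> reproduces the intermediate-normalized |phi>")
```

The last assertion is the “non-orthogonal!” caveat in action: whatever the amplitudes, \({e^{\hat{T}_1}\vert \phi_0\rangle}\) has unit overlap with \({\vert \phi_0\rangle}\), so no determinant orthogonal to the reference can be reached this way.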

HT to Ismail Aydin and Matthias Degroote for pointing out some typos that have now been fixed!

Joshua Goings Blog Publications
