I have a new paper out in the latest issue of Advances in Quantum Chemistry, called “Self-Consistent Field using Direct Inversion in Iterative Subspace Method and Quasi-Newton Vectors”. Quite the mouthful, but hey, it is science. The idea behind the paper was to “steal” some ideas from geometry optimizations and apply them to the problem of converging an SCF calculation (that is, a Hartree-Fock or DFT calculation). The standard acceleration method for converging SCF-type equations is an algorithm called DIIS (pronounced “dice”). Without going into the mathematics, DIIS is a way of improving your iterative guess at each step of the calculation. A really cool introduction can be found online at Dr. David Sherrill’s website. Generally it brings a modest speed-up to your calculation, and some form of DIIS is almost always used in electronic structure calculations nowadays. Our twist was to use a different type of vector (a quasi-Newton vector, which gets a lot of use in geometry optimizations) to help the DIIS acceleration. Practically, we found that the QN-DIIS method got a Hartree-Fock calculation to converge a few iterations faster. That’s pretty cool!
You can read the paper here, but it’s probably behind a paywall for most of you. If you are really interested (it won’t hurt my feelings if you are not), feel free to email me and I’ll send you a reprint.
Ab initio calculations of UV-Vis spectra can be kind of daunting, and there are many methods available. Here are a few of the most popular methods you’ll find in the literature (and in packages like Gaussian or CFOUR), along with my take on them.
Configuration Interaction Singles (CIS). This is perhaps the most basic method available, and the basic idea is to treat an electronic excitation as a (linear) combination of singly excited determinants. In other words, we take the Hartree-Fock wavefunction and promote one of the electrons in an occupied orbital into a virtual, unoccupied one. We do this for every possible single excitation out of our Hartree-Fock wavefunction. From these singly excited determinants, we make combinations such that the energy is minimized. Just like we take linear combinations of atomic orbitals to get our Hartree-Fock wavefunction, we take linear combinations of singly excited determinants to get our CIS wavefunction. It’s important to note that because of Brillouin’s theorem, the ground state and singly excited determinants don’t interact. The practical result is that CIS excitation energies don’t account for electron correlation. They are cheap to compute (about as much as a Hartree-Fock calculation), but you can often expect them to fail, even qualitatively, for many cases.
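For the curious: concretely, the CIS excitation energies \(\omega\) come from diagonalizing the Hamiltonian in the basis of singly excited determinants, whose matrix elements (in spin orbitals) are

\(A_{ia,jb} = \langle \Phi_i^a \vert \hat{H} - E_{HF} \vert \Phi_j^b \rangle = (\epsilon_a - \epsilon_i)\delta_{ij}\delta_{ab} + \langle aj \vert\vert ib \rangle\)

where \(i,j\) label occupied orbitals, \(a,b\) label virtuals, and \(\langle aj \vert\vert ib \rangle\) is an antisymmetrized two-electron integral.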
Time Dependent Hartree Fock (TDHF). Since CIS fails to include electron correlation, can we find a way to include it? The simplest way is through TDHF, which treats the process of electron excitation as a perturbation, and is derived using linear response theory. Practically, it includes some selected “off-diagonal” elements that mix with the CIS wavefunction. It turns out that TDHF has more or less the same computational cost as CIS, so if you can afford to do CIS, you are better off doing TDHF.
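For the curious, those extra elements show up as the \(\mathbf{B}\) block in the standard TDHF (random phase approximation) eigenvalue problem,

\(\begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B}^* & \mathbf{A}^* \end{bmatrix} \begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \end{bmatrix} = \omega \begin{bmatrix} \mathbf{1} & \mathbf{0} \\ \mathbf{0} & -\mathbf{1} \end{bmatrix} \begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \end{bmatrix}\)

where \(\mathbf{A}\) is just the CIS matrix from above and \(B_{ia,jb} = \langle ab \vert\vert ij \rangle\) couples the excitations \(\mathbf{X}\) with the de-excitations \(\mathbf{Y}\); setting \(\mathbf{B} = \mathbf{0}\) recovers CIS (the Tamm-Dancoff approximation).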
Time Dependent Density Functional Theory (TDDFT). Similar to TDHF, if we do a linear response calculation on a DFT reference instead of a HF wavefunction, we get time dependent density functional theory. This is hands-down your best choice for most practical calculations. It has roughly the same cost as a CIS or TDHF calculation, so you are going to want to choose TDDFT over the other two. It is far from perfect, as I’ll explain, but the critical observation is that a DFT reference (it isn’t really a wavefunction…) contains electron correlation, unlike Hartree-Fock. Practically, this carries over to a TDDFT calculation, and it is possible to get absorption energies really close to experimental values. Of course, I say possible. The problem is that we have no idea in advance which functional will reproduce experimental results. This makes predictions with TDDFT notoriously tricky, and really you have to verify your results against experiment, so its predictive use is limited. However, with a cleverly chosen functional, I have seen TDDFT results outperform many other, more complicated methods. The downside is that you don’t know the right functional beforehand, and because TDDFT is density-based (as opposed to wavefunction-based methods like Hartree-Fock, etc.) there is no way to systematically improve it.
Equation of Motion and Linear Response Coupled Cluster (EOM-CC, LR-CC). Recall how I just said that while we can’t systematically improve TDDFT, we can improve TDHF? EOM methods are just those systematic improvements. Instead of a Hartree-Fock reference we use a coupled cluster reference, which does account for electron correlation (and often quite well, I might add). There are two methods in particular I want to note. The first is called CC2, or LR-CC2, which cost-wise is the next step up from TDHF. TDHF (and TDDFT and CIS) scale as \(O(N^4)\), which means that as you increase your basis set (N is the number of basis functions), the cost increases as that number to the fourth power. In contrast, CC2 scales as \(O(N^5)\). So whatever amount of electron correlation you get from the CC2 reference, you pay for in increased computational cost. If you are doing work with small- to medium-sized molecules, however, this is the method you want to use. Aside from a few failures, it generally blows TDDFT out of the water. TDDFT fails to describe charge-transfer states and systems where dispersion forces are prominent (say, clusters or DNA stacking), whereas CC2 does not have these failures. I haven’t seen this method used too much, but if it is available, I really recommend at least trying it out for your system, especially if it has the features I just mentioned. The other method is EOM-CCSD, which treats electron correlation in a much more balanced manner. A more detailed description is beyond the scope here (coupled cluster literature can be very challenging to decipher!), but suffice it to say that it is easily the best method I will describe here. Unfortunately it scales as \(O(N^6)\), which makes it unusable for many practical computations. You’re really limited to small- to medium-sized molecules with a decent basis set. For example, I can run an EOM-CCSD calculation on a molecule with ~200 basis functions in maybe 8 hours on a single processor. Your mileage may vary.
In summary: use EOM-CCSD if you can, then CC2 if available, then TDDFT. I can’t see a practical use for TDHF/CIS, frankly. Of course, your accuracy requirements may not be that stringent, so in most cases TDDFT is your best bet. Just be sure to do your homework when choosing a functional!
Now a quick note about basis sets: when doing excited state calculations, always, always, ALWAYS use diffuse functions. If I see an excited state calculation, the first thing I’ll check is whether the basis includes diffuse functions. You can read more on the why in this article. While the choice of basis is situation dependent, I recommend starting with 6-31+G*. The + symbol means a set of diffuse functions added to the heavy (non-hydrogen) atoms. (The * means polarization functions, which are in general very good to have.) You might also see diffuse functions in the Dunning sets as an ‘aug-’ prefix. Use these; say, start with aug-cc-pVDZ.
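To make this concrete, here is a minimal (hypothetical) G09-style input for a TDDFT calculation with a diffuse, polarized basis; the molecule and functional are just placeholders, so do your homework on the functional as I said above:

```
#P TD(NStates=6) B3LYP/6-31+G(d)

water TDDFT with diffuse and polarization functions

0 1
O    0.000000    0.000000    0.117300
H    0.000000    0.757200   -0.469200
H    0.000000   -0.757200   -0.469200
```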
Have any more questions, need some references, or find any errors? Let me know! I’m glad to help.
(NB. Astute readers might have noticed I only mentioned single reference methods. If you know enough to wonder if a multi-reference calculation is necessary for your problem, you probably don’t need my advice anyway :) ).
I’ve finally gotten around to cleaning up my home-brewed research code (at least a little bit), and I’ve hosted it here on my GitHub. I’m calling the project pyqchem, and it contains Hartree-Fock, MP2, coupled cluster, and a smattering of excited state methods. I’d love for you to check it out!
You’ll notice that I still don’t have a way of reading in a molecular geometry and basis set, so I am still reliant on parsing G09 output for the atomic orbital integrals. Oh well, it’s a future project :) I’d also love to learn how to make it object oriented, as well as optimize the lower level routines in C/C++. It is so far from optimized right now that it is ridiculous. I think my EOM-CCSD code scales as \(O(N^8)\) instead of \(O(N^6)\), just because I haven’t factored the equations right. But it works, and it has helped me learn so much about electronic structure methods. If you get the chance to look at it, I would love input for the future. I’m still developing as a programmer and I could use all the help I can get!
I hosted a lot of premade molecules (and their AO integrals), so all you have to do is execute pyqchem.py followed by the folder of interest.
For example:
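```
python pyqchem.py h2_3-21G
```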
Would execute the pyqchem.py script on the folder h2_3-21G, which contains all the precomputed atomic-orbital basis integrals. The type of calculation is changed by editing the pyqchem.py script itself. Say I wanted to perform an MP2 calculation on H2 in a 3-21G basis. I open pyqchem.py, and edit at the top:
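(The actual flag names in the script may differ; this sketch is just the idea.)

```python
# near the top of pyqchem.py: pick which methods to run
# (illustrative variable names; check the script for the real ones)
do_mp2  = True    # do an MP2 calculation on top of the SCF
do_ccsd = False   # skip coupled cluster this time
```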
Then run:
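```
python pyqchem.py h2_3-21G
```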
And you’ll see the pretty output dump to your terminal :)
Kind of a dry post today (hey, I put in a cat picture at least)… but I had to share a theorem, the Thouless theorem, that is so important in computational and quantum chemistry, yet remains surprisingly little known. It ties together Hartree-Fock, Brillouin’s theorem, the successes of coupled cluster, and even some MCSCF ideas. It basically explains why single excitation operators are so important and useful in quantum chemistry, and what they are actually doing when we run our calculations and devise new methods. For someone working closely in electronic structure, it is hard to overstate the importance of this theorem. Also, the proof is kind of hard to a) find and b) understand. So I tried to make it simpler and more accessible!
The Thouless theorem (Thouless, 1960) says that the effect of the \({e^{\hat{T}_1}}\) operator is to transform any single determinant into any other (non-orthogonal!) single determinant. In other words, we can write
\(\vert \phi\rangle = [\hbox{exp}\left(\sum\limits_{i=1}^{N}\sum\limits_{a = N+1}^{\infty} t_i^a a^{\dagger}_{a}a_i\right)]\vert \phi_0\rangle \ \ \ \ \ (1)\)
or
\(\vert \phi\rangle= e^{\hat{T}_1} \vert \phi_0 \rangle \ \ \ \ \ (2)\)
In other words, we can generate any single determinant in terms of another determinant with virtual orbitals (or single excitations) mixed in. This is sometimes called “orbital rotation”. If we minimize the energy of the Slater determinant with respect to the rotation parameters (the \({t_i^a}\) amplitudes), we get the Hartree-Fock wavefunction out. This is Brillouin’s condition, which says that the Hartree-Fock reference will not interact (directly) with singly excited determinants. And of course it shouldn’t! We just variationally optimized with respect to those parameters! The Thouless theorem also explains why there is no such thing as a coupled cluster singles equation (it just gives Hartree-Fock back). Now, we do include single excitations in CCSD, but they work themselves in because of the correlation generated by higher excitations (doubles, triples, etc.). The Thouless theorem also explains why CCSD and higher methods are so insensitive to the choice of reference: you generate the optimal reference through the \({\hat{T}_1}\) operator. I’m sure there are plenty more uses (you see them a lot in MCSCF, actually) in the quantum chemistry literature.
Let’s say we have a single N-electron Slater determinant, written in second quantization as
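\(\vert \phi_0 \rangle = a^{\dagger}_{1}a^{\dagger}_{2}\cdots a^{\dagger}_{N}\vert 0\rangle = \prod\limits_{i=1}^{N} a^{\dagger}_{i} \vert 0\rangle \ \ \ \ \ (3)\)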
where \({N}\) is the number of electrons in our aptly named N-electron wavefunction, and \({\vert 0\rangle}\) is the vacuum state. Let’s say we know what this wavefunction \({\vert \phi_0\rangle}\) is, which is to say it will be our reference determinant. This might be the result of a Hartree-Fock calculation, for example. Now say we want to make a new single determinant \({\vert \phi\rangle}\) in terms of our reference determinant (I’ll use determinant and wavefunction interchangeably). First we write our wavefunction:
\(\vert \phi \rangle = \tilde{a}^{\dagger}_{1}\tilde{a}^{\dagger}_{2}\cdots \tilde{a}^{\dagger}_{N}\vert 0\rangle = \prod\limits_{\alpha=1}^{N} \tilde{a}^{\dagger}_{\alpha} \vert 0\rangle \ \ \ \ \ (4)\)
The difference between \({a^{\dagger}}\) and \({\tilde{a}^{\dagger}}\) is simply that they create electrons in different orbitals in our Slater determinant. We will assume that each set of creation (and corresponding annihilation) operators is complete, and that the two sets are non-orthogonal with respect to each other. If this is the case, we can write one set in terms of the other like so:
\(\tilde{a}^{\dagger}_{\alpha} = \sum\limits_{i}^{\infty} u_{\alpha i} a^{\dagger}_{i} \ \ \ \ \ (5)\)
This is just a change of basis operation. See, for example, Szabo and Ostlund p13. This suggests that we can write
\(\vert \phi\rangle = \prod\limits_{\alpha=1}^{N}\tilde{a}^{\dagger}_{\alpha} \vert 0\rangle = \prod\limits_{\alpha = 1}^{N} \left(\sum\limits_{i}^{\infty} u_{\alpha i} a^{\dagger}_{i}\right) \vert 0\rangle \ \ \ \ \ (6)\)
Following the presentation of Thouless, we then split the sum over occupied and virtual spaces (with respect to the reference):
\(\vert \phi\rangle = \prod\limits_{\alpha = 1}^{N} \left(\sum\limits_{i}^{\infty} u_{\alpha i} a^{\dagger}_{i}\right) \vert 0\rangle = \prod\limits_{\alpha = 1}^{N} \left(\sum\limits_{i}^{N} u_{\alpha i} a^{\dagger}_{i} + \sum\limits_{m= N + 1}^{\infty} u_{\alpha m} a^{\dagger}_{m}\right) \vert 0\rangle \ \ \ \ \ (7)\)
Alright. Since we assumed \({\vert \phi\rangle}\) and \({\vert \phi_0\rangle}\) were not orthogonal, we can choose an intermediate normalization, setting
\(\langle \phi_0 \vert \phi \rangle = 1 \ \ \ \ \ (8)\)
This further implies (again, check out Szabo and Ostlund p. 13) that the square \({N \times N}\) subsection of the transformation matrix composed of the \({u_{\alpha i}}\) (with \({\alpha}\) and \({i}\) running from 1 to \({N}\); remember \({\mathbf{u}}\) is actually a rectangular matrix according to our change of basis definition in Eq. (5)) is invertible. We represent its \({N \times N}\) inverse as \({U_{i\alpha}}\). Here’s another way to look at it, as well as a way to derive some of the properties of \({\mathbf{u}}\) and \({\mathbf{U}}\):
\(\mathbf{Uu} = \mathbf{U} \begin{bmatrix} \mathbf{u}_{(N\times N)} & \mathbf{u}_{N \times (N+1):\infty} \end{bmatrix} = \begin{bmatrix} \mathbf{I} & \mathbf{T} \end{bmatrix} \ \ \ \ \ (9)\)
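The \(\mathbf{I}\) block is just the statement that \(\mathbf{U}\) is the inverse of the square part of \(\mathbf{u}\):

\(\mathbf{U}\mathbf{u}_{(N\times N)} = \mathbf{I} \ \ \ \ \ (10)\)

\(\sum\limits_{\alpha=1}^{N} U_{i \alpha} u_{\alpha j} = \delta_{ij} \ \ \ \ \ (11)\)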
and that odd looking \({\mathbf{T}}\) matrix is the result of the parts of \({\mathbf{u}}\) that extend past the number of electrons \({N}\):
\(\mathbf{U}\mathbf{u}_{(N\times (N+1):\infty)} = \mathbf{T} \qquad \hbox{or} \qquad \sum\limits_{\alpha=1}^{N} U_{i \alpha} u_{\alpha m} = t_{mi} \ \ \ \ \ (12)\)
where \({i\leq N}\) and \({m > N}\) in the above equation. Now we put it all together. Because \({\mathbf{U}}\) is invertible, we can write \({N}\) linearly independent combinations of the creation operators, each of which we’ll call \({\bar{a}^{\dagger}_i}\). (Yes, we have now introduced three sets of creation operators, and yes, this is essentially what Thouless does, but one of the sets is intermediate to our results and then we will go back to relating two sets. Trust me.)
\(\bar{a}_i^{\dagger} = \sum\limits_{\alpha =1}^{N} U_{i \alpha} \tilde{a}_{\alpha}^{\dagger} \ \ \ \ \ (13)\)
Now the fun happens: using Eqs. (5), (10), (11), and (12), we can rewrite the above in terms of just our reference creation operators,
\(\bar{a}_i^{\dagger} = \sum\limits_{\alpha =1}^{N} U_{i \alpha} \sum\limits_{j}^{\infty} u_{\alpha j} a^{\dagger}_{j} = \sum\limits_{j=1}^{N} \delta_{ij} a^{\dagger}_{j} + \sum\limits_{m=N+1}^{\infty} t_{mi} a^{\dagger}_{m} = a^{\dagger}_{i} + \sum\limits_{m=N+1}^{\infty} t_{mi} a^{\dagger}_{m} \ \ \ \ \ (14)\)
In other words, any new orbital may be generated by mixing in contributions from virtual orbitals. We apply this to generating any new single determinant below.
\(\vert \phi\rangle = \prod\limits_{i=1}^{N}\bar{a}_i^{\dagger} \vert 0\rangle \ \ \ \ \ (19)\)
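Substituting Eq. (14) for the \(\bar{a}_i^{\dagger}\),

\(\vert \phi\rangle = \prod\limits_{i=1}^{N}\left(a^{\dagger}_{i} + \sum\limits_{m=N+1}^{\infty} t_{mi} a^{\dagger}_{m}\right)\vert 0\rangle \ \ \ \ \ (20)\)

\(= \left[\prod\limits_{i=1}^{N}\left(1 + \sum\limits_{m=N+1}^{\infty} t_{mi} a^{\dagger}_{m}a_{i}\right)\right]\prod\limits_{j=1}^{N} a^{\dagger}_{j}\vert 0\rangle \ \ \ \ \ (21)\)

\(= \hbox{exp}\left(\sum\limits_{i=1}^{N}\sum\limits_{m=N+1}^{\infty} t_{mi} a^{\dagger}_{m}a_{i}\right)\vert \phi_0\rangle = e^{\hat{T}_1}\vert \phi_0 \rangle \ \ \ \ \ (22)\)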
We got from Eq. (21) to Eq. (22) by realizing that \({\prod\limits_{i=1}^{N} a_i^{\dagger} \vert 0\rangle = \vert \phi_0\rangle}\), and that \({a_ia_i\vert \phi_0\rangle = 0}\). The fact that all terms in which the same creation operator occurs more than once vanish is also the reason we can write the product as an exponential.
HT to Ismail Aydin and Matthias Degroote for pointing out some typos that have now been fixed!
I have a friend here at the UW who is using mass spectrometry to identify proteins. Proteins (polypeptides) are made up of building blocks called amino acid residues. Because different polypeptides fragment differently, the fragmentation patterns can be used to determine the identity of an unknown protein. Kind of a cool idea, I think. The hope is to be able to use it for blood work analysis in medical settings, though I am sure there are plenty of other applications. In fact, C&EN just ran an article this past week on how mass spec is really getting used in biology. You can read it here (sorry, it’s behind a paywall…).
Anyway, he was trying to fit patterns of polypeptides together to match the different peaks he found. For example, if he found a peak at 500.38 m/z, he would assume he had a GLN-TRP-TRP chain, which has a mass of 500.38 (or some combination thereof). The problem, then, is determining what different combinations of amino acids could fit the peaks! Now, there are really only 20 amino acids he could be looking at, so this turns out to be an interesting combinatorics problem: how many different ways can we combine amino acid residues to fit a given mass (or mass range)?
I realized it wouldn’t be too bad to write a program, and so I am sharing the script with you!
For this problem, we need a set of amino acids and their masses, so I just looked up the information and hard-coded it into the program. This is the set we will be drawing our combinations from, and it looks like this:
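(A sketch: the mass values below are monoisotopic residue masses I looked up; the original script may have used slightly different numbers, its largest, TRP, weighing in at 186.12.)

```python
# The 20 standard amino acid residues and their (monoisotopic) masses,
# stored as two parallel lists that we zip into (name, mass) pairs.
names  = ['GLY', 'ALA', 'SER', 'PRO', 'VAL', 'THR', 'CYS', 'LEU', 'ILE',
          'ASN', 'ASP', 'GLN', 'LYS', 'GLU', 'MET', 'HIS', 'PHE', 'ARG',
          'TYR', 'TRP']
masses = [57.02, 71.04, 87.03, 97.05, 99.07, 101.05, 103.01, 113.08,
          113.08, 114.04, 115.03, 128.06, 128.09, 129.04, 131.04, 137.06,
          147.07, 156.10, 163.06, 186.08]
amino_acids = list(zip(names, masses))
```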
We zip them together to pair up the name of the residue (or amino acid) and its mass.
Once we have this, we can start building our combinations. First we take all combinations of 1 amino acid, then 2 amino acids, then 3, and so on. We don’t need to explicitly make and check each combination if we know the chain will always be below the specified mass, or if the chain will always be too heavy for our target mass. For example, say we want all combinations of residues that have a mass between 500.3 and 500.5. In this case, no chain of 2 amino acids can reach this, because the largest residue we are working with has a mass of 186.12! Similarly, if the lightest possible chain is already too big (e.g. a chain of glycines), we’d better stop right there, because no longer chain will ever fit. So we have a check for if the mass is too small (increment the chain length by one and continue) and for if the mass is too big (break and exit). These bounds keep our computation relatively sane! Here it is in action:
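(The function and variable names in this sketch are mine; the original script’s may differ.)

```python
def find_chains(amino_acids, low, high):
    """Return all residue combinations whose total mass is in [low, high]."""
    matches = []
    lightest = min(mass for name, mass in amino_acids)   # glycine
    heaviest = max(mass for name, mass in amino_acids)   # tryptophan
    length = 1
    while True:
        if length * heaviest < low:
            # even a chain of all-heaviest residues is too light:
            # move on to chains one residue longer
            length += 1
            continue
        if length * lightest > high:
            # even a chain of all glycines is too heavy:
            # no longer chain can ever fit, so stop
            break
        # ... generate and test all combinations of this length (next) ...
        length += 1
    return matches
```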
Finally we need to generate our combinations. Inside the loop, we use itertools to generate all the unique combinations of length N from our set of amino acids. itertools has two tools for this: combinations and combinations_with_replacement. combinations won’t allow a residue to be used more than once, so we choose combinations_with_replacement. This allows for combinations such as a polyalanine chain, but it also allows for many more combinations, so it can get pretty slow! For a chain of length 10 drawn from our 20 residues, combinations gives \(\binom{20}{10} \approx 1.8 \times 10^{5}\) candidates, while combinations_with_replacement gives \(\binom{29}{10} \approx 2.0 \times 10^{7}\). Once we have our unique combinations, we sum the mass of each one, see if it matches the criteria, and if it does, we add it to a list that stores all the polypeptides we want. Here it is:
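(Again a sketch, filling in the generation step inside the length loop above.)

```python
from itertools import combinations_with_replacement

# inside the length loop: enumerate every multiset of residues of this
# length, sum its mass, and keep it if it lands inside the window
for combo in combinations_with_replacement(amino_acids, length):
    total = sum(mass for name, mass in combo)
    if low <= total <= high:
        matches.append(combo)
```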
This function will then return a list of all the polypeptide chains that fit our mass criteria. All we have to do is print or write to file and go! As the counts above suggest, this script gets slow quite quickly: the number of candidate chains grows combinatorially with the chain length. I’m sure there is even more going on behind the scenes in the itertools module, but even treating the creation of one unique combination as a single “operation”, this is a slow-scaling method. Do you know a better way to do this? Let me know! I’d love to hear how to make this more efficient, and I am certain that some computer scientist somewhere has explored this question before… Here is the whole script:
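(Reconstructed from the pieces above into one minimal, self-contained version; names and mass values are hedged as before.)

```python
#!/usr/bin/env python
# Hypothetical reconstruction of the peptide mass-matching script.
from itertools import combinations_with_replacement

# 20 standard residues with monoisotopic residue masses (the original
# script's values may have differed slightly).
names  = ['GLY', 'ALA', 'SER', 'PRO', 'VAL', 'THR', 'CYS', 'LEU', 'ILE',
          'ASN', 'ASP', 'GLN', 'LYS', 'GLU', 'MET', 'HIS', 'PHE', 'ARG',
          'TYR', 'TRP']
masses = [57.02, 71.04, 87.03, 97.05, 99.07, 101.05, 103.01, 113.08,
          113.08, 114.04, 115.03, 128.06, 128.09, 129.04, 131.04, 137.06,
          147.07, 156.10, 163.06, 186.08]
amino_acids = list(zip(names, masses))

def find_chains(amino_acids, low, high):
    """Return all residue combinations whose total mass is in [low, high]."""
    matches = []
    lightest = min(mass for name, mass in amino_acids)
    heaviest = max(mass for name, mass in amino_acids)
    length = 1
    while True:
        if length * heaviest < low:       # guaranteed too light: go longer
            length += 1
            continue
        if length * lightest > high:      # guaranteed too heavy: stop
            break
        for combo in combinations_with_replacement(amino_acids, length):
            total = sum(mass for name, mass in combo)
            if low <= total <= high:
                matches.append(combo)
        length += 1
    return matches

if __name__ == '__main__':
    # example: every residue combination between 500.3 and 500.5
    for combo in find_chains(amino_acids, 500.3, 500.5):
        chain = '-'.join(name for name, mass in combo)
        total = sum(mass for name, mass in combo)
        print('%-40s %10.2f' % (chain, total))
```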