Computing absorption spectra

Ab initio calculations of UV-Vis spectra can be kind of daunting, and there are many methods available. Here are a few of the most popular methods you’ll find in the literature (and in packages like Gaussian, CFOUR, etc.), along with my take on them.

  1. Configuration Interaction Singles (CIS). This is perhaps the most basic method available, and the basic idea is to treat an electronic excitation as a (linear) combination of singly excited determinants. In other words, we take the Hartree-Fock wavefunction and move one electron from an occupied orbital into a virtual (unoccupied) one. We do this for every possible single excitation out of our Hartree-Fock wavefunction. From these singly excited determinants, we make combinations such that the energy is minimized. Just like we take linear combinations of atomic orbitals to get our Hartree-Fock wavefunction, we take linear combinations of singly excited determinants to get our CIS wavefunction. It’s important to note that because of Brillouin’s theorem, the ground state and singly excited determinants don’t interact. The practical result is that CIS excitation energies don’t account for electron correlation. They are cheap to compute (about as much as a Hartree-Fock calculation), but you can expect them to fail, even qualitatively, in many cases.
  2. Time Dependent Hartree Fock (TDHF). Since CIS fails to include electron correlation, can we find a way to include it? The simplest way is through TDHF, which treats the process of electron excitation as a perturbation. TDHF is derived using linear response theory. Practically, it includes some selected “off-diagonal” elements that mix with the CIS wavefunction. It turns out that TDHF has more or less the same computational cost as CIS, so if you can afford to do a CIS, you are better off doing TDHF.
  3. Time Dependent Density Functional Theory (TDDFT). Similar to TDHF, if we do a linear response calculation on a DFT reference instead of a HF wavefunction, we get time dependent density functional theory. This is hands-down your best choice for most practical calculations. It has roughly the same cost as a CIS or TDHF calculation, so you are going to want to choose TDDFT over the other two. It is far from perfect, as I’ll explain, but the critical observation is that a DFT reference (it isn’t really a wavefunction…) contains electron correlation, unlike Hartree-Fock. Practically, this carries over to a TDDFT calculation, and it is possible to get absorption energies really close to experimental values. Of course, I say possible. The problem is that we have no way of knowing ahead of time which functional will reproduce experimental results. This makes predictions with TDDFT notoriously tricky, and really you have to verify your results against experiment. So its use is limited. However, with a cleverly chosen functional, I have seen TDDFT results perform better than many other more complicated methods. The downside is that you don’t know the functional, so between that and the density basis (as opposed to wavefunction-based methods like Hartree-Fock, etc.) there is no way to systematically improve TDDFT.
  4. Equation of Motion and Linear Response Coupled Cluster (EOM-CC, LR-CC). Recall how I just said that there is no way to systematically improve TDDFT, unlike wavefunction-based methods? EOM methods are just those systematic improvements. Instead of a Hartree-Fock reference we use a coupled cluster reference, which does account for electron correlation (and often quite well, I might add). There are two methods in particular I want to note. The first is called CC2, or LR-CC2, which cost-wise is the next step up from TDHF. TDHF (and TDDFT and CIS) scale as \(O(N^4)\), where N is the number of basis functions, which means that as you increase your basis set the cost increases as N to the fourth power. In contrast, CC2 scales as \(O(N^5)\). So whatever amount of electron correlation you get from the CC2 reference, you pay for in increased computational cost. If you are doing work with small- to medium-sized molecules, however, this is the method you want to use. Aside from a few failures, it generally blows TDDFT out of the water. TDDFT fails to describe charge-transfer states and systems where dispersion forces are prominent (say, clusters or DNA stacking), whereas CC2 does not have these failures. I haven’t seen this method used too much, but if it is available, I really recommend at least trying it out for your system, especially if it has the features I just mentioned. The other one is EOM-CCSD, which treats electron correlation in a much more balanced manner. A more detailed description is beyond the scope here (coupled cluster literature can be very challenging to decipher!), but suffice it to say that it is easily the best method I will describe here. Unfortunately it scales as \(O(N^6)\), which makes it unusable for many practical computations. You’re really limited to small- to medium-sized molecules with a decent basis set. For example, I can run an EOM-CCSD calculation on a molecule with ~200 basis functions in maybe 8 hours with a single processor. Your mileage may vary.
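
To get a feel for what those scaling exponents mean in practice, here is a minimal Python sketch. It uses only the formal exponents quoted above and ignores prefactors, memory, parallelism, and everything else that matters in a real calculation, so treat the numbers as rough guides at best. The function and variable names are made up for illustration.

```python
# Formal scaling exponents as quoted in the list above (N = number of basis functions).
SCALING_EXPONENT = {
    "CIS": 4,
    "TDHF": 4,
    "TDDFT": 4,
    "CC2": 5,
    "EOM-CCSD": 6,
}


def cost_ratio(method: str, n_old: int, n_new: int) -> float:
    """Factor by which the formal cost grows going from n_old to n_new basis functions."""
    return (n_new / n_old) ** SCALING_EXPONENT[method]


def extrapolate_hours(method: str, n_ref: int, hours_ref: float, n_new: int) -> float:
    """Naive wall-time guess from one reference timing, assuming pure N^p scaling."""
    return hours_ref * cost_ratio(method, n_ref, n_new)


if __name__ == "__main__":
    # What does doubling the basis (200 -> 400 functions) cost for each method?
    for method in SCALING_EXPONENT:
        factor = cost_ratio(method, 200, 400)
        print(f"{method:9s} costs about {factor:.0f}x more")

    # Starting from the ~200 functions / ~8 hours EOM-CCSD timing mentioned above.
    print(extrapolate_hours("EOM-CCSD", 200, 8.0, 400), "hours (very rough)")
```

Doubling the basis multiplies the formal cost by 16 for the \(O(N^4)\) methods, 32 for CC2, and 64 for EOM-CCSD, which is why that 8 hour job turns into something on the order of weeks.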

In summary: use EOM-CCSD if you can, then CC2 if available, then TDDFT. I can’t see a practical use for TDHF/CIS, frankly. Of course, your accuracy requirements may not be that stringent, so in most cases TDDFT is your best bet. Just be sure to do your homework when choosing a functional!

Now a quick note about basis sets: when doing excited state calculations, always, always, ALWAYS use diffuse functions. If I see an excited state calculation, the first thing I’ll check is whether the basis includes diffuse functions. You can read more on the why in this article. While the choice of basis is situation dependent, I recommend starting with 6-31+G*. The + symbol means a set of diffuse functions is added to the heavy (non-hydrogen) atoms. (The * means polarization functions, which are in general very good to have.) You might also see diffuse functions in the Dunning sets as an ‘aug-’ prefix. Use these; say, start with aug-cc-pVDZ.
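
If you want to automate that first check, something like the following would do as a quick sanity test. This is only a name-based heuristic that I am sketching here, and it covers just the two conventions mentioned above (a + in Pople-style names and an aug- prefix in Dunning-style names). The function name is hypothetical, and other basis set families use naming schemes this will not catch.

```python
def has_diffuse_functions(basis: str) -> bool:
    """Heuristic: guess from the name alone whether a basis set includes diffuse functions.

    Only recognizes the two conventions discussed in the text:
      - Pople-style '+' (e.g. 6-31+G*)
      - Dunning-style 'aug-' prefix (e.g. aug-cc-pVDZ)
    """
    name = basis.strip().lower()
    return name.startswith("aug-") or "+" in name


if __name__ == "__main__":
    for basis in ["6-31G*", "6-31+G*", "cc-pVDZ", "aug-cc-pVDZ"]:
        verdict = "ok" if has_diffuse_functions(basis) else "no diffuse functions!"
        print(f"{basis:12s} {verdict}")
```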

Have any more questions, need some references, or find any errors? Let me know! I’m glad to help.

(NB. Astute readers might have noticed I only mentioned single reference methods. If you know enough to wonder if a multi-reference calculation is necessary for your problem, you probably don’t need my advice anyway :) ).