Understanding the Relativistic Dirac Equation for a Free Particle


According to Einstein’s special theory of relativity, the relation between energy (\({E}\)) and momentum (\({\mathbf{p}}\)) is given by the expression
\(E^2 = \mathbf{p}^2 c^2 + m_0^2 c^4 \ \ \ \ \ (1)\)

This relates a particle’s rest mass (\({m_0}\), which we will call \({m}\) from here on out), momentum, and total energy. It treats energy and momentum on an equal footing, which, once we substitute operators, puts the time and space coordinates on equal footing as well (a requirement for Lorentz invariance); this makes it a suitable starting point for developing a relativistic quantum mechanics. Now, when we derive non-relativistic quantum mechanics (the Schrödinger equation) we make use of the correspondence principle and replace classical momentum and energy with their quantum mechanical operators
\(E \rightarrow i\hbar \frac{\partial}{\partial t} \ \ \ \ \ (2)\)

and
\(\mathbf{p} \rightarrow -i \hbar \nabla \ \ \ \ \ (3)\)

It would make sense, then, to extend this idea and substitute the operators into the relativistic energy-momentum expression. Doing so gives

\[\left(i\hbar \frac{\partial}{\partial t}\right)^2 = \left(-i\hbar \nabla \right)^2 c^2 + m^2 c^4 \ \ \ \ \ (4)\]

which reduces to (and is more commonly seen as)

\[\left( \frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \nabla^2 + \frac{m^2 c^2}{\hbar^2} \right)\psi = 0 \ \ \ \ \ (5)\]

where \({\psi}\) is our wave function. This equation is known as the Klein-Gordon equation and was one of the first attempts to merge special relativity with the Schrödinger equation. A couple of things you might note about the Klein-Gordon equation right off the bat: first, it treats time and position equally, as second-order derivatives. This was one of the flaws in the non-relativistic Schrödinger equation, in that time was first order but position was second order. To be Lorentz invariant, and therefore a proper relativistic theory, both coordinates must be treated equally. The second thing you might notice is that the equation is spinless.

Now, it turns out that the Klein-Gordon equation fails as a fundamental equation on account of not having a positive-definite probability density. Probability can be negative in the Klein-Gordon equation! (Why? It turns out that the second time derivative is the culprit. When integrating with respect to time, you can choose two independent integration constants for the wave function and its time derivative, which allow for both positive and negative probability densities.) Feynman later re-interpreted the Klein-Gordon equation as an equation of motion for a spinless particle, so it isn’t completely useless. It also turns out that solutions of the Dirac equation (which is a fundamental equation) will always be solutions of the Klein-Gordon equation, just not the other way around.
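To see the problem concretely (this is standard textbook material, worth a quick aside): the Klein-Gordon continuity equation gives the density

\[\rho = \frac{i\hbar}{2mc^2}\left(\psi^* \frac{\partial \psi}{\partial t} - \psi \frac{\partial \psi^*}{\partial t}\right)\]

and for a plane wave \(\psi = N e^{i\left(\mathbf{p}\cdot\mathbf{x} - Et\right)/\hbar}\) this evaluates to \(\rho = \frac{E}{mc^2}\left|N\right|^2\). Since \(E = \pm\sqrt{\mathbf{p}^2 c^2 + m^2 c^4}\), the negative-energy solutions carry a negative density.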
Okay, so the first attempt at deriving a relativistic Schrödinger equation didn’t quite work out. We still want to use the energy-momentum relation, and we still want to use the correspondence principle, so what do we try next? The second-order time derivative caused the problems for Klein-Gordon, so let’s try an equation of the form

\[E = \sqrt{\mathbf{p}^2 c^2 + m^2 c^4} = c\sqrt{\mathbf{p}^2 + m^2 c^2} \ \ \ \ \ (6)\]

If we then insert the corresponding operators, we get
\(i\hbar\frac{\partial}{\partial t} = c \sqrt{-\hbar^2 \nabla^2 + m^2c^2} \ \ \ \ \ (7)\)

Now as you might have guessed, this form presents some problems because we can’t easily solve for the square root of the squared momentum operator. We could try a series expansion of the form
\(i\hbar\frac{\partial}{\partial t} = mc^2 \sqrt{1 - \left(\frac{\hbar \nabla}{mc}\right)^2} = mc^2 - \frac{\hbar^2 \nabla^2}{2m} - \frac{\hbar^4 \nabla^4}{8m^3c^2} - \cdots \ \ \ \ \ (8)\)

But this gets quite complicated. Moreover, to be soluble we have to truncate at some point, and any truncation will destroy our Lorentz invariance. So we must look elsewhere to solve this problem. Dirac’s insight was to go back to the argument of the square root operator and assume that it could be written as a perfect square. In other words, determine \({\mathbf{\alpha}}\) and \({\beta}\) such that
\(\mathbf{p}^2 + m^2c^2 = \left(\mathbf{\alpha} \cdot \mathbf{p} + \beta m c\right)^2 \ \ \ \ \ (9)\)

If we do this, the square root goes away, and we get the Dirac equation for a free particle.
\(i\hbar\frac{\partial}{\partial t} \psi = c \alpha \cdot \left( - i \hbar \nabla \right) \psi + \beta mc^2 \psi \ \ \ \ \ (10)\)

Of course, we have said absolutely nothing about what \({\alpha}\) and \({\beta}\) actually are. We can see that \({\alpha}\) is going to have three components, just like momentum has \({x,y,z}\), and that \({\beta}\) will have a single component. To satisfy the “perfect square” constraint, we can distill out three conditions.
\(\alpha_i^2 = \beta^2 = 1, \qquad \alpha_i\alpha_j = - \alpha_j \alpha_i, i \neq j, \qquad \alpha_i\beta = -\beta\alpha_i \ \ \ \ \ (11)\)
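Where do these conditions come from? Expand the square, being careful not to assume that the \({\alpha}\)’s and \({\beta}\) commute (the momentum components, on the other hand, commute with everything here):

\[\left(\mathbf{\alpha} \cdot \mathbf{p} + \beta m c\right)^2 = \sum_i \alpha_i^2 p_i^2 + \sum_{i<j}\left(\alpha_i\alpha_j + \alpha_j\alpha_i\right)p_i p_j + mc\sum_i \left(\alpha_i\beta + \beta\alpha_i\right)p_i + \beta^2 m^2c^2\]

Matching this term by term against \(\mathbf{p}^2 + m^2c^2\): the \(p_i^2\) coefficients force \(\alpha_i^2 = 1\), the mass term forces \(\beta^2 = 1\), and every cross term must vanish, which gives the anticommutation conditions.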

Well, there is no way we are going to satisfy the constraints with \({\alpha}\) and \({\beta}\) as scalars! So let’s look to 2×2 matrices, such as the Pauli matrices:
\(\sigma_x = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \ \ \ \ \ (12)\)

\[\sigma_y = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix} \ \ \ \ \ (13)\]

\[\sigma_z = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \ \ \ \ \ (14)\]

and the identity
\(\mathbf{I}_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \ \ \ \ \ (15)\)

Well, it might seem like these would work, but unfortunately we need four mutually anticommuting matrices (three \({\alpha}\)’s plus \({\beta}\)), and the only remaining 2×2 candidate is the identity. Because the identity commutes with all of the 2×2 matrices, we will never get the proper anticommutation relations to satisfy the constraints above. So we must look to higher dimensions. It turns out that the lowest dimension that satisfies the Dirac equation constraints is four, with the 4×4 matrices
\(\alpha_k = \begin{bmatrix} \mathbf{0}_2 & \sigma_k \\ \sigma_k & \mathbf{0}_2 \end{bmatrix}, k = x,y,z \ \ \ \ \ (16)\)

and
\(\beta = \begin{bmatrix} \mathbf{I}_2 & \mathbf{0}_2 \\ \mathbf{0}_2 & -\mathbf{I}_2 \end{bmatrix} \ \ \ \ \ (17)\)
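If you don’t feel like checking the algebra by hand, here is a quick numpy sketch (my own illustration, not part of Dirac’s derivation!) that builds the standard-representation matrices and verifies every constraint in Eq. (11):

import numpy as np

# Pauli matrices and the 2x2 identity/zero blocks
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),     # sigma_x
         np.array([[0, -1j], [1j, 0]], dtype=complex),  # sigma_y
         np.array([[1, 0], [0, -1]], dtype=complex)]    # sigma_z
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)

# Standard representation: alpha_k has sigma_k off the diagonal, beta = diag(I2, -I2)
alphas = [np.block([[Z2, s], [s, Z2]]) for s in sigma]
beta = np.block([[I2, Z2], [Z2, -I2]])

I4 = np.eye(4, dtype=complex)
assert np.allclose(beta @ beta, I4)                     # beta^2 = 1
for i, a_i in enumerate(alphas):
    assert np.allclose(a_i @ a_i, I4)                   # alpha_i^2 = 1
    assert np.allclose(a_i @ beta + beta @ a_i, 0.0)    # alpha_i beta = -beta alpha_i
    for a_j in alphas[i + 1:]:
        assert np.allclose(a_i @ a_j + a_j @ a_i, 0.0)  # alpha_i alpha_j = -alpha_j alpha_i
print("Dirac algebra verified!")

Every assert passes, which is just Eq. (11) written out numerically. Try swapping \({\beta}\) for the 2×2 identity and watch the anticommutation asserts fail: that is the argument above in numerical form.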

A few notes before we wrap things up here. First, these matrices are by no means unique. This representation of the \({\alpha}\) matrices is known as the standard representation. Any other representation is just a similarity transformation away! For example, another representation is known as the Weyl representation. We also got the matrices purely by mathematical means. The use of Pauli matrices surely hints at a connection to spin. In the words of Dirac (1928): “The \({\alpha}\)’s are new dynamical variables which it is necessary to introduce in order to satisfy the conditions of the problem. They may be regarded as describing some internal motions of the electron, which for most purposes may be taken as the spin of the electron postulated in previous theories.” You’ll also note that the \({\alpha}\)’s are connected to the momentum operator, which suggests a coupling between spin and momentum. As for \({\beta}\), it can be interpreted as the inverse of the Lorentz \({\gamma}\) factor in special relativity. Finally, you will also see that the solutions to the equation (which we haven’t solved yet) will have four components. The wave function will be a four-component object, leading to four-component methods. Single-electron wave functions of this type are known as “4-spinors”. (Spinor is the relativistic analog of orbital.)

Questions? Comments? Find any errors? Let me know!

"We can calculate everything"

“This dogma of modern numerical quantum chemistry (Clementi, 1973) has its own origin in the following famous statement by P.A.M. Dirac (1929a) dating from the pioneer time of quantum mechanics: ‘The underlying physical laws necessary for the mathematical theory of a large part of physics and the whole of chemistry are thus completely known, and the difficulty is only that the application of these laws leads to equations much too complicated to be soluble.’ As we know today, this claim is not correct in at least two respects. Firstly, the formalism of pioneer quantum mechanics of 1929 is a very special case of modern (generalized) quantum mechanics which we now regard as fundamental for every theory of molecular matter. Secondly, the interpretative problems had not been solved in a satisfactory way in 1929. At that time, the conceptual problems of reducing chemical theories to the epistemologically very differently structured quantum mechanics were not discussed, and Dirac viewed the problem of reduction only as a very complicated mathematical problem.

The discovery of the electric nature of chemical forces by quantum mechanics took the valence problem out of its chemical isolation and made it accessible to an exact mathematical treatment. The last fifty years taught us a lot about the relation between chemistry and the Schrödinger equation, and have led to a penetrating understanding of chemical structures. Since the advent of electronic computers, numerical quantum chemistry has been a tremendous success. Nowadays we have a number of masterful analyses of electronic wave functions. Many calculations have been extremely sophisticated, designed by some of the foremost researchers in this field to extract a maximum amount of insight from quantum theory. For simple molecules, outstanding agreement between calculated and measured data has been obtained. Yet, the concept of a chemical bond could not be found anywhere in these calculations. We can calculate bonding energies without even knowing what a chemical bond is!

There is a danger to be sidetracked into purely numerical problems and to forget the original impetus of our enterprise: understanding the behavior of matter. We should not confuse a useful theoretical tool with theory. Numerical quantum mechanics is a most important tool for chemistry but it cannot replace thinking. The final explanation of empirical facts is not achieved by merely calculating measurable quantities from first principles. The ultimate objective of a theory is not to determine numbers but to create a large, consistent abstract structure that mirrors the observable phenomena. We have understood a phenomenon only when we can show that it is an immediately obvious and necessary part of a larger context. The vision of some theoreticians has been narrowed down to problems that can be formulated numerically. Some even take no notice of genuine chemical and biological patterns or deny them scientific significance since they are not computable by their methods. Such a one-sided view of numerical quantum chemistry is by no means the inevitable penalty for the attempt to reduce chemistry to physics. Rather, it is the result of the utilitarian character of contemporary research which lost all philosophical ambitions and has only a very restricted insight into its own limitations.

The important concepts of chemistry have never been well-treated by ab-initio quantum chemistry so that quantum mechanics has not become the primary tool in the chemist’s understanding of matter. Brute-force numerical quantum chemistry can hardly do justice to the qualitative features of chemistry. But without insight into the qualitative concepts we are losing chemistry. The allegedly basic methods often fail to illuminate the essential function of a molecule or a reaction that is evident to the experimental scientists. As a result practical chemists had to look for inspiration elsewhere and generated the ad-hoc semiempirical methods of quantum chemistry. This approach ‘has become a part of the chemical structure theory, which has an essentially empirical (inductive) basis; it was suggested by quantum mechanics, but it is no longer just a branch of quantum mechanics’ (Linus Pauling, 1959, p. 388). Despite the erudition, imagination and common sense used to create the semiempirical methods of quantum chemistry, the success of this craft remains a central enigma for the theoreticians. The models of semiempirical quantum chemistry are built upon an inadequate conceptual basis, and their mathematical foundation is so wobbly that they are a source of frustration. Moreover, they give us an image of matter that does not conform to what we are led to expect from the first principles of quantum mechanics. But experimentalists are not at all impressed by such scruples. And properly so.

Some contemporary theoreticians have attempted to narrow the scope of scientific inquiry by requiring operational definitions and first-principle underpinnings for all concepts. In theoretical chemistry, there is a distinct tendency to throw out typically chemical variables, admitting that they have served a noble purpose in the past but asserting that they are now obsolete. The premise underlying such a view is that the only valid meaning of any chemical concept is one which can be unequivocally defined in terms of present-day ab-initio quantum chemistry. This method of problem solving by rejection has been proposed for such concepts as valence, bond, structure, localized orbitals, aromaticity, acidity, color, smell, water repellence, etc. A particularly powerful variant is the method of problem solving by dissolving definitions. Using this procedure we can easily solve even the ‘problem of life’ by claiming that the distinction between living and non-living entities is a pre-scientific idea, obliterated by modern research. But some people consider such a line of action as unfair and shallow. We need a creative approach to the rich field of chemistry, not just a critical one that bids us to dismiss most problems of chemistry as meaningless. The task of theoretical chemistry is to sharpen and explain chemical concepts and not to reject a whole area of inquiry.”

– Hans Primas, Chemistry, Quantum Mechanics and Reductionism (1983) [emphasis mine]

Over thirty years old, but Primas’ critique of theoretical chemistry remains relevant.

New paper published!

I have a new paper published in the latest issue of Advances in Quantum Chemistry, called “Self-Consistent Field using Direct Inversion in Iterative Subspace Method and Quasi-Newton Vectors”. Quite the mouthful, but hey, it is science. The idea behind the paper was to “steal” some ideas from geometry optimizations and apply them to the problem of converging an SCF calculation (that is, a Hartree-Fock or DFT calculation). The general acceleration method for converging SCF-type equations is an algorithm called DIIS (pronounced “dice”). Without going into the mathematics, DIIS is a way of improving your iterative guess at each step of the calculation. A really cool introduction can be found online at Dr. David Sherrill’s website. Generally, it brings a modest speed-up to your calculation, and some form of DIIS is always used nowadays in electronic structure calculations. Our twist was to use a different type of vector (a quasi-Newton vector, which gets a lot of use in geometry optimizations) to help the DIIS acceleration. Practically, we found that using a QN-DIIS method got a Hartree-Fock calculation to converge faster by a few iterations. That’s pretty cool!
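For the curious, the heart of DIIS is small enough to sketch here. At each SCF iteration you store a trial vector along with an error vector; DIIS then solves a small constrained least-squares problem for mixing coefficients and builds the next guess as a linear combination of the stored trials. A minimal numpy sketch of the standard Pulay DIIS extrapolation (a simplified illustration, not the actual code from the paper):

import numpy as np

def diis_extrapolate(trials, errors):
    """Pulay DIIS: find coefficients (summing to 1) that minimize the
    norm of the combined error vector, then mix the trials with them."""
    n = len(trials)
    B = -np.ones((n + 1, n + 1))   # bordered overlap matrix of error vectors
    B[n, n] = 0.0
    for i in range(n):
        for j in range(n):
            B[i, j] = np.dot(errors[i], errors[j])
    rhs = np.zeros(n + 1)
    rhs[n] = -1.0                  # enforces sum(c) = 1 via a Lagrange multiplier
    c = np.linalg.solve(B, rhs)[:n]
    return sum(ci * ti for ci, ti in zip(c, trials))

In a Hartree-Fock loop, trials would be the last few Fock matrices (flattened) and errors the corresponding FDS − SDF residuals; roughly speaking, our paper’s twist amounts to changing what gets fed in as the vectors.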

You can read the paper here, but it’s probably behind a paywall for most of you. If you are really interested (it won’t hurt my feelings if you are not), feel free to email me and I’ll send you a reprint.

Computing absorption spectra


Ab initio calculations of UV-Vis spectra can be kind of daunting, and there are many methods available. Here are a few of the most popular methods you’ll find in the literature (and in packages like Gaussian, CFOUR, etc.), along with my take on them.

  1. Configuration Interaction Singles (CIS). This is perhaps the most basic method available, and the basic idea is to treat an electronic excitation as a (linear) combination of singly excited determinants. In other words, we take the Hartree-Fock wavefunction and promote an electron from an occupied orbital into a virtual, unoccupied one. We do this for every possible single excitation out of our Hartree-Fock wavefunction. From these singly excited determinants, we make combinations such that the energy is minimized. Just like we take linear combinations of atomic orbitals to get our Hartree-Fock wavefunction, we take linear combinations of singly excited determinants to get our CIS wavefunction. It’s important to note that because of Brillouin’s theorem, the ground state and singly excited determinants don’t interact. The practical result is that CIS excitation energies don’t account for electron correlation. They are cheap to compute — about as much as a Hartree-Fock calculation — but you can often expect them to fail, even qualitatively, for many cases.
  2. Time Dependent Hartree Fock (TDHF). Since CIS fails to include electron correlation, can we find a way to include it? The simplest way is through TDHF, which treats the process of electron excitation as a perturbation. TDHF is derived using linear response theory. Practically, it includes some selected “off-diagonal” elements that mix with the CIS wavefunction (see the sketch after this list). It turns out that TDHF has more or less the same computational cost as CIS, so if you can afford to do a CIS, you are better off doing TDHF.
  3. Time Dependent Density Functional Theory (TDDFT). Similar to TDHF, if we do a linear response calculation on a DFT reference instead of a HF wavefunction, we get time dependent density functional theory. This is hands-down your best choice for most practical calculations. It has roughly the same cost as a CIS or TDHF calculation, so you are going to want to choose TDDFT over the other two. It is far from perfect, as I’ll explain, but the critical observation is that a DFT reference (it isn’t really a wavefunction…) contains electron correlation, unlike Hartree-Fock. Practically, this carries over to a TDDFT calculation, and it is possible to get absorption energies really close to experimental values. Of course, I say possible. The problem is that we have no idea whatsoever what functional to use to reproduce experimental results. This makes predictions with TDDFT notoriously tricky, and really you have to verify your results against experiment, so its use is limited. However, with a cleverly chosen functional, I have seen TDDFT results perform better than many other more complicated methods. The downside is that you don’t know the functional — so between that and the density-based foundation (as opposed to wavefunction-based methods like Hartree-Fock) there is no way to systematically improve TDDFT.
  4. Equation of Motion and Linear Response Coupled Cluster (EOM-CC, LR-CC). Recall how I just said that we can’t systematically improve TDDFT? Wavefunction-based methods like TDHF can be improved, and EOM methods are just those systematic improvements. Instead of a Hartree-Fock reference we use a coupled cluster reference, which does account for electron correlation (and often quite well, I might add). There are two methods in particular I want to note. The first is called CC2, or LR-CC2, which cost-wise is the next step up from TDHF. TDHF (and TDDFT and CIS) scale as \(O(N^4)\), which means that as you increase your basis set (N is the number of basis functions) the cost increases as that number to the fourth power. In contrast, CC2 scales as \(O(N^5)\). So whatever amount of electron correlation you get from the CC2 reference, you pay for in increased computational cost. If you are doing work with small- to medium-sized molecules, however, this is the method you want to use. Aside from a few failures, it generally blows TDDFT out of the water. TDDFT fails to describe charge-transfer states and systems where dispersion forces are prominent (say, clusters or DNA stacking), whereas CC2 does not have these failures. I haven’t seen this method used too much, but if it is available, I really recommend at least trying it out for your system, especially if it has the features I just mentioned. The other one is EOM-CCSD, which treats electron correlation in a much more balanced manner. A more detailed description is beyond the scope here (coupled cluster literature can be very challenging to decipher!), but suffice it to say that it is easily the best method I will describe here. Unfortunately it scales as \(O(N^6)\), which makes it unusable for many practical computations. You’re really limited to small- to medium-sized molecules with a decent basis set. For example, I can run an EOM-CCSD calculation on a molecule with ~200 basis functions in maybe 8 hours with a single processor. Your mileage may vary.
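Before the summary, here is the algebraic structure behind CIS and TDHF that I hand-waved in items 1 and 2 (standard linear response results, not specific to any one code). CIS is a Hermitian eigenvalue problem in the space of single excitations:

\[\mathbf{A}\mathbf{X} = \omega\mathbf{X}, \qquad A_{ia,jb} = \delta_{ij}\delta_{ab}\left(\epsilon_a - \epsilon_i\right) + \langle aj \| ib \rangle\]

where \(i,j\) label occupied and \(a,b\) virtual spin orbitals. TDHF couples the excitations \(\mathbf{X}\) to de-excitations \(\mathbf{Y}\) through a second block \(\mathbf{B}\):

\[\begin{bmatrix} \mathbf{A} & \mathbf{B} \\ -\mathbf{B}^* & -\mathbf{A}^* \end{bmatrix} \begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \end{bmatrix} = \omega \begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \end{bmatrix}\]

Setting \(\mathbf{B} = \mathbf{0}\) recovers CIS (the Tamm-Dancoff approximation), which is why the two cost essentially the same.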

In summary: use EOM-CCSD if you can, then CC2 if available, then TDDFT. I can’t see a practical use for TDHF/CIS, frankly. Of course, your required level of accuracy may not be as stringent, so in most cases, TDDFT is your best bet. Just be sure to do your homework when choosing a functional!

Now a quick note about basis sets: when doing excited state calculations, always, always, ALWAYS use diffuse functions. If I see an excited state calculation, the first thing I’ll check is if the basis includes diffuse functions. You can read more on the why in this article. While the choice of basis is situation dependent, I recommend starting with a 6-31+G*. The + symbol means a set of diffuse functions added to the heavy elements. (The * means polarization functions – in general very good to have). You might also see diffuse functions in the Dunning sets as an ‘aug-‘ prefix. Use these; say, start with aug-cc-pVDZ.
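To make the workflow concrete, here is what a minimal TDDFT run with diffuse functions might look like in PySCF (purely an illustration on my part; the molecule, functional, and number of states are arbitrary choices, and the caveats above about functional selection still apply):

from pyscf import gto, dft, tddft

# Water in a diffuse-augmented Dunning basis, per the advice above
mol = gto.M(
    atom="O 0.000 0.000 0.119; H 0.000 0.757 -0.477; H 0.000 -0.757 -0.477",
    basis="aug-cc-pvdz",
)

# Ground-state DFT reference; the functional choice is the hard part!
mf = dft.RKS(mol)
mf.xc = "b3lyp"
mf.kernel()

# Lowest five excitations from linear response TDDFT
td = tddft.TDDFT(mf)
td.nstates = 5
td.kernel()
print(td.e * 27.2114)  # excitation energies, hartree -> eV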

Have any more questions, need some references, or find any errors? Let me know! I’m glad to help.

(NB. Astute readers might have noticed I only mentioned single reference methods. If you know enough to wonder if a multi-reference calculation is necessary for your problem, you probably don’t need my advice anyway :) ).

Research code dump!

I’ve finally gotten around to cleaning up my home-brewed research code (at least a little bit), and I’ve hosted it here on my GitHub. I’m calling the project pyqchem, and it contains Hartree-Fock, MP2, coupled cluster, and a smattering of excited state methods. I’d love it if you check it out!

You’ll notice that I still don’t have a way of reading in molecular geometry and basis set, and so I am still reliant on parsing G09 output for the atomic orbital integrals. Oh well, it’s a future project :) I’d also love to learn how to make it object oriented, as well as optimize the lower level routines in C/C++. It is so far from optimized right now it is ridiculous. I think my EOM-CCSD code scales as N^8 instead of N^6, just because I haven’t factored the equations right (see the toy example below). But it works, and it has helped me learn so much about electronic structure methods. If you get the chance to look at it, I would love input for the future. I’m still developing as a programmer and I could use all the help I can get!
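To illustrate what “factoring the equations” means: when you contract three four-index tensors in a single shot, the cost is set by the total number of distinct indices, but building an intermediate first splits the work into much cheaper pieces. A toy numpy example (mine, not actual pyqchem code; the contraction pattern just mimics a typical coupled cluster term):

import numpy as np

n = 8  # toy dimension
W = np.random.rand(n, n, n, n)  # stand-in for two-electron integrals
T = np.random.rand(n, n, n, n)  # stand-in for T2 amplitudes

# All at once: 8 distinct indices -> O(N^8) work
slow = np.einsum("mnef,imae,jnbf->ijab", W, T, T, optimize=False)

# Factored through an intermediate: two contractions, each only O(N^6)
X = np.einsum("mnef,imae->nfia", W, T)
fast = np.einsum("nfia,jnbf->ijab", X, T)

assert np.allclose(slow, fast)

Same numbers, drastically different scaling; that difference is all the “factoring” amounts to.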

I hosted a lot of premade molecules (and their AO integrals), so all you have to do is execute pyqchem.py followed by the folder of interest.

For example:

 >>$ python pyqchem.py h2_3-21G 

Would execute the pyqchem.py script on the folder h2_3-21G, which contains all the precomputed atomic-orbital basis integrals. The type of calculation is changed by editing the pyqchem.py script itself. Say I wanted to perform an MP2 calculation on H2 in a 3-21G basis. I open pyqchem.py, and edit at the top:

  

""" Edit below to perform the calculation desired  
"""

do_DIIS = True         # DIIS acceleration (just keep on)  
do_ao2mo = True        # Set to true so we use our optimized (MO) orbitals from SCF  
do_mp2 = True          # Set to True so we do an MP2  
do_cistdhf = False     # Set to False so we don't do a CIS/TDHF  
do_ccsd = False        # Set to False so we don't do CCSD  
do_eomccsd = False     # Set to False so we don't do EOM-CCSD  
do_eommbpt2 = False    # Set to False so we don't do EOM-MBPT2  
do_eommbptp2 = False   # Set to False so we don't do EOM-MBPT(2)  
do_eommbptd = False    # Set to False so we don't do EOM-MBPT(D)  
printops = True        # True prints more stuff  
convergence = 1.0e-8   # Our iterative convergence criteria

Then run:

 >>$ python pyqchem.py h2_3-21G  

And you’ll see the pretty output dump to your terminal :)

That’s all there is to it! Enjoy!