Efficient Two-Electron Integral Transformations in Python, or, Adventures in Scaling
Low-scaling algorithms are important in any computational development, and this is never more true than in quantum chemistry. A golden example is the integral transformation routines in electronic structure packages. Without a smarter algorithm, the transformation scales as \(O(N^8)\) (yikes!!), making computations on large systems nearly impossible. But with a little thought, we can rein in that transformation to a (not nearly as bad) \(N^5\) method.
If you have performed a Hartree-Fock (HF) calculation, you are left with a matrix of coefficients. Mathematically, these coefficients are the eigenvectors of your HF Hamiltonian (the Fock matrix), and the eigenvalues are your orbital energies. So far so good. The real accurate (and interesting) work in electronic structure theory comes afterward, where we either refine the energies we have obtained or calculate molecular properties like UV-Vis spectra and so on. Most of these methods (actually, all of these methods) require transforming your two-electron integrals from an atomic orbital basis (your standard run-of-the-mill basis functions are your AO basis) into a molecular orbital basis. Put another way, now that you have solved for the wavefunction of the Hartree-Fock system, you need to project your two-electron integrals onto that wavefunction. The way you do this looks like so:
\[(pq\vert rs)=\sum_\mu\sum_\nu\sum_\lambda\sum_\sigma C^{p}_\mu C^{q}_\nu C^{r}_\lambda C^{s}_\sigma(\mu\nu\vert \lambda\sigma)\]In code, that looks something like:
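(A minimal sketch with NumPy; I'm assuming a coefficient matrix C stored so that C[mu, p] corresponds to \(C^{p}_\mu\), and a four-index array I holding the AO-basis integrals.)

```python
import numpy as np

def transform_naive(C, I):
    """Brute-force AO -> MO transformation: all four indices at once, O(N^8)."""
    N = C.shape[0]
    MO = np.zeros((N, N, N, N))
    for p in range(N):
        for q in range(N):
            for r in range(N):
                for s in range(N):
                    for mu in range(N):
                        for nu in range(N):
                            for lm in range(N):
                                for sg in range(N):
                                    MO[p, q, r, s] += C[mu, p] * C[nu, q] \
                                        * C[lm, r] * C[sg, s] * I[mu, nu, lm, sg]
    return MO
```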
Holy cow. EIGHT nested loops, each of dimension equal to the number of basis functions. That should scale on the order of \(N^8\). Few methods in electronic structure theory scale worse than that (though they do exist… here's lookin' at you, Full CI). This means that for most calculations, if you transform your integrals this way, the transformation will be your most expensive step. Thankfully, we can come up with a smarter way to do it. Because the coefficients are independent (i.e., \(C^{p}_\mu\) is independent of \(C^{q}_\nu\)), we can rewrite our first equation as
\[(pq\vert rs)=\sum_\mu C^{p}_\mu [\sum_\nu C^{q}_\nu [\sum_\lambda C^{r}_\lambda [\sum_\sigma C^{s}_\sigma(\mu\nu\vert \lambda\sigma)]]]\]Or, like this, which is a lot clearer to me:
\[(\mu\nu\vert \lambda s)=\sum_\sigma C^{s}_\sigma(\mu\nu\vert \lambda\sigma)\] \[(\mu\nu\vert rs)=\sum_\lambda C^{r}_\lambda(\mu\nu\vert \lambda s)\] \[(\mu q\vert rs)=\sum_\nu C^{q}_\nu(\mu\nu\vert rs)\] \[(pq\vert rs)=\sum_\mu C^{p}_\mu(\mu q\vert rs)\]Here we perform four "quarter-transformations", saving the result of each and using it in the next. Doing it this way gives us four steps of five nested loops each, so it should scale on the order of \(N^5\). In code, that looks like:
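(Again a sketch, under the same assumptions as above; the pre-allocated temp arrays hold the partially transformed integrals between quarter-transformations.)

```python
import numpy as np

def transform_smart(C, I):
    """AO -> MO transformation via four sequential quarter-transformations, O(N^5)."""
    N = C.shape[0]
    temp1 = np.zeros((N, N, N, N))
    temp2 = np.zeros((N, N, N, N))
    temp3 = np.zeros((N, N, N, N))
    MO = np.zeros((N, N, N, N))
    # (mu nu | lam s) = sum_sigma C[sigma, s] * (mu nu | lam sigma)
    for s in range(N):
        for sg in range(N):
            temp1[:, :, :, s] += C[sg, s] * I[:, :, :, sg]
    # (mu nu | r s) = sum_lambda C[lambda, r] * (mu nu | lam s)
    for r in range(N):
        for lm in range(N):
            temp2[:, :, r, :] += C[lm, r] * temp1[:, :, lm, :]
    # (mu q | r s) = sum_nu C[nu, q] * (mu nu | r s)
    for q in range(N):
        for nu in range(N):
            temp3[:, q, :, :] += C[nu, q] * temp2[:, nu, :, :]
    # (p q | r s) = sum_mu C[mu, p] * (mu q | r s)
    for p in range(N):
        for mu in range(N):
            MO[p, :, :, :] += C[mu, p] * temp3[mu, :, :, :]
    return MO
```

Each quarter-transformation is \(N^2\) passes over an \(N^3\) slice, so each step costs \(N^5\).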
Much nicer. You'll notice that we had to pre-allocate the 'temp' arrays to store the results between quarter-transformations. The transformation also makes use of the 'slice' notation in Python/NumPy. Using this, we perform a transformation over a whole dimension at once, instead of one index at a time. It's a little weird to be working with full dimensions instead of just indices, but it works well. Here is the full code, with random integer arrays built in to act as our toy four-dimensional integrals. The toy matrix of coefficients is, like all matrices, 2D. I built in a check so you can compare the two methods: it spits out two transformed integrals with randomly chosen indices, and if/when you run it, you should make sure that the values match. If they don't, something is wrong!
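(A sketch of such a driver, reusing the two functions above; the array sizes and timing calls are illustrative.)

```python
import time
import numpy as np

N = 4  # number of basis functions; try doubling to 8 to test the scaling

# Toy data: random integer "integrals" and a random 2D coefficient matrix
rng = np.random.default_rng(0)
I = rng.integers(0, 9, size=(N, N, N, N)).astype(float)
C = rng.integers(0, 9, size=(N, N)).astype(float)

t0 = time.time()
MO1 = transform_naive(C, I)
t1 = time.time()
MO2 = transform_smart(C, I)
t2 = time.time()

# Check: the same randomly chosen transformed integral from each method
p, q, r, s = rng.integers(0, N, size=4)
print(f"naive: ({p}{q}|{r}{s}) = {MO1[p, q, r, s]:.1f}  in {t1 - t0:.2f} s")
print(f"smart: ({p}{q}|{r}{s}) = {MO2[p, q, r, s]:.1f}  in {t2 - t1:.2f} s")
```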
When I ran the code, moving from a dimension of 4 to a dimension of 8 (i.e., I doubled the number of basis functions), the first method went from 0.29 seconds to 71.5 seconds, a jump of 246 times longer, versus the second method, which went from 0.01 seconds to 0.29 seconds, a jump of only 29 times. This is almost exactly as predicted: doubling the basis for an \(N^8\) method takes \(2^8 = 256\) times longer, and doubling the basis for an \(N^5\) algorithm takes \(2^5 = 32\) times longer.
The new method is also very amenable to parallelization. Most electronic structure computations need to be parallelized, as model systems get larger and larger. Note that while the four quarter-transformations of the \(N^5\) method must run in sequence, within each quarter-transformation the output slices are independent of one another. Because of this independence, we can make our code run in parallel and compute those slices on separate processors.
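(As an illustration only, and not part of the original code: a hypothetical sketch that farms out the output slices of the first quarter-transformation with Python's concurrent.futures. For toy sizes the process overhead will swamp any gain.)

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np

def first_quarter_slice(args):
    """One output slice (mu nu | lam s) of the first quarter-transformation,
    for a single fixed s. C_col is the s-th column of the coefficient matrix."""
    C_col, I = args
    out = np.zeros(I.shape[:3])
    for sg in range(len(C_col)):
        out += C_col[sg] * I[:, :, :, sg]
    return out

def first_quarter_parallel(C, I):
    """Compute all N output slices on separate processes, then stack them."""
    N = C.shape[0]
    with ProcessPoolExecutor() as pool:
        slices = pool.map(first_quarter_slice, [(C[:, s], I) for s in range(N)])
    return np.stack(list(slices), axis=-1)
```

In this naive scheme each worker receives a full copy of the AO integral array, so a real implementation would want shared memory or an MPI-style distribution; the point here is just that the slices don't depend on each other.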
A great example of why choosing your algorithm matters in quantum chemistry!