Automatic auxiliary basis set generation
The basis_set_exchange.auxgen subpackage implements the
pivoted-Cholesky procedure for automatically generating a primitive
auxiliary (density-fitting / resolution-of-the-identity) basis set
from a given orbital basis, following
S. Lehtola, Straightforward and accurate automatic auxiliary basis set generation for molecular calculations with atomic orbital basis sets, J. Chem. Theory Comput. 17, 6886 (2021), https://doi.org/10.1021/acs.jctc.1c00607.
S. Lehtola, Automatic generation of accurate and cost-efficient auxiliary basis sets, J. Chem. Theory Comput. 19, 6242 (2023), https://doi.org/10.1021/acs.jctc.3c00670.
The closed-form one-center Coulomb integrals are taken from R. M. Pitzer, Atomic self-consistent-field program by the basis set expansion method: Columbus version, Comput. Phys. Commun. 170, 239 (2005), https://doi.org/10.1016/j.cpc.2005.04.003.
The generated basis is written in the standard BSE component-format schema; any of the format writers available in the rest of the library can therefore be used to emit the result.
Algorithm overview
For each element the orbital basis is decontracted to primitives \(\chi_\mu(r) = r^{n_\mu} Y_{l_\mu m_\mu}(\hat r)\,e^{-\alpha_\mu r^2}\). Spherical and Cartesian shells are both accepted: a Cartesian shell of nominal angular momentum \(L\) contributes spherical components at \(l = L, L-2, \ldots,\) all sharing the same radial power \(n = L\). ECP entries on the element are ignored.
For every unordered pair of primitives \((\mu, \nu)\) (with the
diagonal \(\mu = \nu\) included) and every
angular momentum \(L \in \{|l_\mu - l_\nu|, |l_\mu - l_\nu| + 2,
\ldots, l_\mu + l_\nu\}\) (real-Gaunt parity), a candidate auxiliary
function is generated. The product radial form
\(r^{n_\mu + n_\nu} e^{-(\alpha_\mu + \alpha_\nu) r^2}\) is mapped
to a standard primitive \(r^L e^{-\alpha_{\rm eff} r^2}\) using the
<r>-matching transformation of the 2021 paper (Appendix II eq 16),
with \(n_{\rm rad} = n_\mu + n_\nu\). The candidates are then gathered per \(L\) into a Coulomb-overlap metric (paper eq 7), normalized to unit diagonal, and thinned by pivoted Cholesky.
Two variants of the four-index pre-screening are supported:
scheme='basic': every primitive pair is taken as a candidate.scheme='reduced'(default): shell-pair-driven pivoted Cholesky of the four-index(μν|ρσ)tensor is performed first to thin the list of contributing orbital shell-pairs. Within a chosen shell-pair, all m-resolved components are added together as in the ERKALE implementation.
To make the most of the unit-diagonal candidate metric (where the first pivot is degenerate), each \(L\) block runs pivoted Cholesky under three ordering families and keeps the shortest pivot set:
linear order (insertion order of the candidates),
increasing off-diagonal-norm presort (paper Sect. 3),
n_randomindependent random permutations (the Note Added in Proof of the 2021 paper).
The drop tolerance τ is applied directly to the residual diagonal
of the Cholesky factorisation (absolute threshold).
Two optional refinements from the 2023 paper are available (both on by default):
contract=Trueapplies the SVD-based general contraction (Section 2.1 of the 2023 paper). Per \(l\), the three-index integrals \(I_{\mu\nu, A} = (\mu\nu | A)\) and the two-index Coulomb metric \(V_{AB} = (A|B)\) are formed; the primitives are Coulomb-normalized (so the work is done in the well-conditioned unit-diagonal metric \(S = D^{-1} V D^{-1}\) rather than the ill-conditioned raw metric), \(S\) is symmetrically orthogonalized, and \(W = I^T I\) is diagonalized in that basis. Eigenvectors with eigenvalue above the thresholdεdefine general contractions, written in the overlap-normalized primitive convention used by basis-set libraries via ERKALE’s \(c = \mathrm{Wvec} \cdot \sqrt{z}\) rescaling.prune_lmax=Truedrops shells above the \(l_{\rm keep} = \max(2 l_{\rm occ}^{\max}, l_{\rm occ}^{\max} + l_{\rm OBS}^{\max} + l_{\rm inc})\) cap of the 2023 paper, eq 9. The default \(l_{\rm occ}^{\max}\) follows the row-based table of the paper (0 for H/He, 1 for \(Z \le 18\), 2 for \(Z \le 54\), 3 otherwise).
The size argument selects the standard accuracy presets of the 2023
paper, overriding contract_threshold (ε) and linc:
verylarge = \((\epsilon, l_{\rm inc}) = (10^{-6}, 1)\),
large = \((10^{-5}, 1)\), small = \((10^{-4}, 0)\).
Two further options control how each candidate orbital product is mapped to a standard auxiliary primitive, and how the orbital basis itself is fed into the selection candidate pool:
mappingselects the criterion for the \((n_{\rm rad}, \alpha_{\rm rad}) \to \alpha_{\rm eff}\) transformation in the candidate pool:'moment'(default) preserves the radial expectation value \(\langle r \rangle\) (the 2021 paper, Appendix II);'selfrepulsion'preserves the Coulomb self-energy \((i|i)\) of the (overlap-normalized) candidate and standard primitive. Both reduce to \(\alpha_{\rm eff} = \alpha_{\rm rad}\) when \(n_{\rm rad} = L\).collapse_contractions(opt-in, default ``False``) replaces each contracted orbital function by a single primitive before building the candidate pool (free primitives are kept; the contraction step still uses the true contracted AOs). Values:'moment'matches the radial expectation value of the contracted function;'selfrepulsion'matches its orbital Coulomb self-energy \((\chi\chi|\chi\chi)\). Both have closed-form solutions and reduce to \(\beta = \alpha\) for a single-primitive contraction.
Command-line interface
The bse autogen-aux subcommand reads an orbital basis from a file
and writes the generated auxiliary basis to a file:
bse autogen-aux <input_file> <output_file>
[--in-fmt FMT] [--out-fmt FMT]
[--threshold 1e-7]
[--scheme {basic,reduced}]
[--n-random 100] [--seed 0]
[--mapping {moment,selfrepulsion}]
[--collapse-contractions {moment,selfrepulsion}]
[--size {small,large,verylarge}]
[--contract | --no-contract] [--contract-threshold 1e-5]
[--prune-lmax | --no-prune-lmax] [--linc 1]
Input and output formats are auto-detected from the file extension unless overridden. All standard BSE writers (NWChem, Molcas, Psi4, Turbomole, JSON, …) are accepted on the output side.
Example: build an auxiliary basis for hydrogen, carbon, and oxygen from cc-pVTZ at a tight \(\tau = 10^{-7}\) tolerance:
bse get-basis cc-pVTZ nwchem --elements H,C,O > /tmp/ccpvtz.nw
bse autogen-aux /tmp/ccpvtz.nw /tmp/ccpvtz_aux.nw --threshold 1e-7
Contraction and lmax pruning are applied by default; build the
small standard preset instead:
bse autogen-aux /tmp/ccpvtz.nw /tmp/ccpvtz_aux_small.nw --size small
or generate the uncontracted primitive auxiliary basis:
bse autogen-aux /tmp/ccpvtz.nw /tmp/ccpvtz_aux_prim.nw \
--no-contract --no-prune-lmax
Note
The procedure derives the auxiliary set from the radial richness of the input orbital basis: minimal and contracted double-zeta sets only carry one or two distinct exponents per occupied angular momentum, which is too little information for the pivoted-Cholesky selection to produce a useful aux basis. Minimal-basis and double-zeta input (e.g. STO-3G, MINI, 6-31G, cc-pVDZ, def2-SVP) is therefore not recommended: the resulting auxiliary set will under-cover the radial space, leaving residual fitting errors that dwarf the orbital basis error itself. Use at least a triple-zeta orbital basis (e.g. cc-pVTZ, def2-TZVP, ANO-RCC-VTZP) – quadruple-zeta and larger parents give correspondingly better-conditioned auxiliary sets.
Python API
The high-level entry point produces a BSE component-format basis dict:
import basis_set_exchange as bse
from basis_set_exchange.auxgen import generate_auxiliary_basis
orbital = bse.get_basis('cc-pVTZ', elements=[1, 6, 8])
aux = generate_auxiliary_basis(
orbital,
threshold=1.0e-7,
scheme='reduced',
n_random=100,
# contract and prune_lmax default to True; pass False to disable.
)
# `aux` is a dict in the BSE component schema; emit any format:
print(bse.write_formatted_basis_str(aux, 'nwchem'))
The size presets can also be reached straight from
basis_set_exchange.get_basis() via the get_aux argument, which
goes through the same code path:
aux_small = bse.get_basis('cc-pVTZ', elements=[6], get_aux='cholesky-small')
aux_large = bse.get_basis('cc-pVTZ', elements=[6], get_aux='cholesky-large')
aux_verylarge = bse.get_basis('cc-pVTZ', elements=[6], get_aux='cholesky-verylarge')
get_aux=None (the default) returns the orbital basis; the strings
'autoaux' and 'autoabs' reach the legacy non-Cholesky generators.
For back-compatibility the integer aliases 0..``5`` are also accepted
(0 = orbital, 1 = autoaux, 2 = autoabs, 3..``5`` =
cholesky-small/large/verylarge). The cholesky-* modes require
numpy at runtime, plus wignernj (preferred) or sympy for the
Gaunt evaluator; the rest of the package has no optional-import
requirements.
The CLI mirrors this:
bse get-basis cc-pVTZ nwchem --elements C --get-aux cholesky-large
A per-element entry point is also exported for use cases that already have an element dict to hand:
from basis_set_exchange.auxgen.auxgen import (
generate_auxiliary_basis_for_element,
)
aux_c = generate_auxiliary_basis_for_element(
orbital['elements']['6'], # cc-pVTZ or larger
threshold=1.0e-7,
scheme='reduced',
)
Parameters
Parameter |
Default |
Description |
|---|---|---|
|
|
Pivoted-Cholesky drop tolerance \(\tau\) (absolute, applied to the residual diagonal). The same tolerance is used for the four-index pre-screening (reduced scheme) and the per-L candidate Cholesky. |
|
|
|
|
|
Number of random orderings tried in the per-L candidate
Cholesky. In addition the linear order and the off-diagonal-
norm presort are always tried; the smallest pivot set across
all attempts is kept. Setting |
|
|
Seed for the random orderings, for reproducibility. |
|
|
Apply the SVD-based general contraction of the 2023 paper. |
|
|
Eigenvalue cutoff \(\epsilon\) for keeping contractions
(matches the |
|
|
Drop shells with \(L > l_{\rm keep}\) per the 2023 paper eq 9. |
|
|
Increment \(l_{\rm inc}\) in the pruning rule. |
STO (Slater-type-orbital) driver
The companion module basis_set_exchange.auxgen.sto runs the same
pivoted-Cholesky procedure for STO orbital bases, using Pitzer’s
closed-form one-centre Slater two-electron radial integral instead of
the Gaussian one. Primitives are
\(\chi_{n L M}(r) = r^{n - 1} e^{-\zeta r}\,Y_{L M}(\hat r)\) with
integer n >= L + 1; the I/O format mirrors what ADF ships in
atomicdata/ZORA/.
Python:
from basis_set_exchange.auxgen.sto import (
generate_sto_auxiliary_basis, sto_diagonal_ri_error,
)
# Per-element STO orbital basis: {L: [(n, zeta), ...]}. As with the
# GTO driver, triple-zeta or larger parent sets are recommended; the
# ADF TZ2P hydrogen set is used here for illustration.
orbital = {
0: [(1, 0.76), (1, 1.28), (2, 1.97)], # H TZ2P S shells
1: [(2, 1.25), (2, 2.50)], # H TZ2P P shells
}
aux = generate_sto_auxiliary_basis(orbital, threshold=1.0e-7)
err = sto_diagonal_ri_error(orbital, aux)
print('total diagonal RI error =', err)
By default each candidate keeps its natural radial power
n_aux = n_a + n_b - 1 from the orbital product (so a Cholesky
selection on a typical orbital basis emits 1S/2S/3S/…, 2P/3P/…,
3D/4D/… at every L, like standard STO aux sets). Pass
compact=True to collapse every candidate onto the minimum-power
form n_aux = L + 1 via the <r>-matching map
sto_zeta_eff(). Pass
prune_lmax=True together with lmax_occ=<int> to apply the same
\(l_{\rm keep}\) cap as the GTO driver.
The module also includes a small ADF basis-file reader/writer and a
main() entry point:
# Run as a script (`-m` invocation also works):
python -m basis_set_exchange.auxgen.sto Fe Fe_new.adf --threshold 1e-7
# Available flags
# --threshold TAU Cholesky drop tolerance (default 1e-7)
# --scheme {basic,reduced} default reduced
# --n-random N --seed S randomised pivot orderings
# --prune-lmax [--linc N] [--lmax-occ N]
# --compact collapse to n = L+1 via <r>-matching
# --no-benchmark skip the diagonal-RI-error reports
The driver reports the diagonal RI error of both the old (incoming) and the new (regenerated) FIT block for direct comparison. The output file preserves the BASIS, CORE and DESCRIPTION sections from the source; the FITCOEFFICIENTS block is dropped because it refers to the old FIT exponents.
Notes
The generated auxiliary basis is always spherical; this matches the convention adopted by the 2021 paper and by most quantum chemistry programs. Cartesian input shells are supported (with their radial contamination correctly tracked) but produce spherical output.
Sympy is used only for the real-spherical Gaunt coefficients in
basis_set_exchange.auxgen.gaunt, and is lazy-imported. The radial integrals are evaluated with a pure closed form (no sympy at runtime).The PySCF-based test
basis_set_exchange/tests/test_auxgen.py::test_eri_full_tensor_vs_pyscf_spdcross-checks every one-center primitive(ab|cd)integral over an s/p/d basis againstlibcintto machine precision.The base package has no runtime dependencies. Using
auxgenrequiresnumpy, plus eitherwignernj(preferred, exact integer arithmetic) orsympyfor the Gaunt evaluator – both are lazy-imported on first call.