Developer Curation API

curate - Functions for helping to curate data

Functions for helping curate BSE basis set data

basis_set_exchange.curate.add_basis(bs_file, data_dir, subdir, file_base, name, family, role, description, version, revision_description, data_source, refs=None, file_fmt=None)

Add a basis set to this library

This takes in a single file containing the basis set in some format, parses it, and creates the component, element, and table basis set files in the given data_dir (and subdir). The metadata file for the basis is created if it doesn’t exist, and the main metadata file is also updated.

Parameters:
  • bs_file (str) – Path to the file with formatted basis set information

  • data_dir (str) – Path to the data directory to add the data to

  • subdir (str) – Subdirectory of the data directory to add the basis set to

  • file_base (str) – Base name for new files

  • name (str) – Name of the basis set

  • family (str) – Family to which this basis set belongs

  • role (str) – Role of the basis set (orbital, etc)

  • description (str) – Description of the basis set

  • version (str) – Version of the basis set

  • revision_description (str) – Description of this version of the basis set

  • data_source (str) – Description of where this data came from

  • refs (dict, list, or str) –

    Mapping of references to elements. This can be a dictionary with a compressed string of elements as keys and a list of reference strings as values. For example, {'H,Li-B,Kr': ['kumar2018a']}

    If a list or string is passed, then those reference(s) will be used for all elements.

    Elements that exist in the file but do not have a reference are given the usual ‘noref’ extension and the references entry is empty.

  • file_fmt (str) – Format of the input basis data (None = autodetect)
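
The compressed element strings used as keys in refs can be illustrated with a small, self-contained sketch. It does not call the library: the helper names expand_elements and refs_per_element are hypothetical, the symbol table stops at Kr for brevity, and the parsing the library actually performs may differ.

```python
# Element symbols for Z = 1..36; enough for this illustration.
SYMBOLS = [
    "H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne",
    "Na", "Mg", "Al", "Si", "P", "S", "Cl", "Ar", "K", "Ca",
    "Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn",
    "Ga", "Ge", "As", "Se", "Br", "Kr",
]
Z_OF = {sym: z for z, sym in enumerate(SYMBOLS, start=1)}


def expand_elements(compressed):
    """Expand a string such as 'H,Li-B,Kr' into a sorted list of atomic numbers."""
    result = set()
    for part in compressed.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as 'Li-B' covers every element from Li through B.
            first, last = (Z_OF[s.strip()] for s in part.split("-"))
            result.update(range(first, last + 1))
        else:
            result.add(Z_OF[part])
    return sorted(result)


def refs_per_element(refs):
    """Map each atomic number to its list of reference keys (hypothetical helper)."""
    per_element = {}
    for compressed, keys in refs.items():
        for z in expand_elements(compressed):
            per_element[z] = list(keys)
    return per_element


mapping = refs_per_element({"H,Li-B,Kr": ["kumar2018a"]})
print(sorted(mapping))  # atomic numbers covered by the mapping
```

Elements missing from the resulting mapping (He in this example) correspond to the case described above, where no reference is assigned.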

basis_set_exchange.curate.add_basis_from_dict(bs_data, data_dir, subdir, file_base, name, family, role, description, version, revision_description, data_source, refs=None)

Add a basis set to this library

This takes in a basis set dictionary and creates the component, element, and table basis set files in the given data_dir (and subdir). The metadata file for the basis is created if it doesn’t exist, and the main metadata file is also updated.

Parameters:
  • bs_data (dict) – Basis set dictionary

  • data_dir (str) – Path to the data directory to add the data to

  • subdir (str) – Subdirectory of the data directory to add the basis set to

  • file_base (str) – Base name for new files

  • name (str) – Name of the basis set

  • family (str) – Family to which this basis set belongs

  • role (str) – Role of the basis set (orbital, etc)

  • description (str) – Description of the basis set

  • version (str) – Version of the basis set

  • revision_description (str) – Description of this version of the basis set

  • data_source (str) – Description of where this data came from

  • refs (dict, list, or str) –

    Mapping of references to elements. This can be a dictionary with a compressed string of elements as keys and a list of reference strings as values. For example, {'H,Li-B,Kr': ['kumar2018a']}

    If a list or string is passed, then those reference(s) will be used for all elements.

    Elements that exist in the file but do not have a reference are given the usual ‘noref’ extension and the references entry is empty.

basis_set_exchange.curate.add_from_components(component_files, data_dir, subdir, file_base, name, family, role, description, version, revision_description)

Add a basis set to this library that is a combination of component files

This takes in a list of component basis files and creates a new basis set for the intersection of all the elements contained in those files. This creates the element and table basis set files in the given data_dir (and subdir). The metadata file for the basis is created if it doesn’t exist, and the main metadata file is also updated.

Parameters:
  • component_files (list of str) – Paths to component JSON files (already in BSE format)

  • data_dir (str) – Path to the data directory to add the data to

  • subdir (str) – Subdirectory of the data directory to add the basis set to

  • file_base (str) – Base name for new files

  • name (str) – Name of the basis set

  • family (str) – Family to which this basis set belongs

  • role (str) – Role of the basis set (orbital, etc)

  • description (str) – Description of the basis set

  • version (str) – Version of the basis set

  • revision_description (str) – Description of this version of the basis set
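
The "intersection of all the elements" mentioned above can be sketched as follows. This is only an illustration under assumptions: common_elements is a hypothetical helper, and the component dictionaries are assumed to store their data under an "elements" key indexed by atomic number strings.

```python
def common_elements(components):
    """Return the element keys present in every component dictionary."""
    if not components:
        return []
    shared = set(components[0]["elements"])
    for comp in components[1:]:
        # Keep only the elements that this component also defines.
        shared &= set(comp["elements"])
    # Keys are atomic numbers stored as strings, so sort numerically.
    return sorted(shared, key=int)


# Two toy components: one covers H, C, O and the other C, O, S.
comp_a = {"elements": {"1": {}, "6": {}, "8": {}}}
comp_b = {"elements": {"6": {}, "8": {}, "16": {}}}

print(common_elements([comp_a, comp_b]))
```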

basis_set_exchange.curate.basis_comparison_report(bs1, bs2, uncontract_general=False)

Compares two basis set dictionaries and prints a report about their differences

basis_set_exchange.curate.compare_basis(bs1, bs2, compare_electron_shells_meta=False, compare_ecp_pots_meta=False, compare_elements_meta=False, compare_meta=False, rel_tol=0.0)

Determine if two basis set dictionaries are the same

Parameters:
  • bs1 (dict) – Full basis information

  • bs2 (dict) – Full basis information

  • compare_electron_shells_meta (bool) – Compare the metadata of electron shells

  • compare_ecp_pots_meta (bool) – Compare the metadata of ECP potentials

  • compare_elements_meta (bool) – Compare the overall element metadata

  • compare_meta (bool) – Compare the metadata for the basis set (name, description, etc)

  • rel_tol (float) – Maximum relative error that is considered equal

basis_set_exchange.curate.compare_basis_against_file(basis_name, src_filepath, file_type=None, version=None, uncontract_general=False, data_dir=None)

Compare a basis set in the BSE against a reference file

basis_set_exchange.curate.compare_basis_files(file_path_1, file_path_2, file_type_1=None, file_type_2=None, uncontract_general=False)

Compare two files containing formatted basis sets

basis_set_exchange.curate.compare_basis_sets(basis_name_1, basis_name_2, version_1=None, version_2=None, uncontract_general=False, data_dir_1=None, data_dir_2=None)

Compare two basis sets in the BSE, given by name

basis_set_exchange.curate.compare_ecp_pots(potential1, potential2, compare_meta=False, rel_tol=0.0)

Compare two ECP potentials for approximate equality (exponents/coefficients are within a tolerance)

If compare_meta is True, the metadata is also compared for exact equality.

basis_set_exchange.curate.compare_electron_shells(shell1, shell2, compare_meta=False, rel_tol=0.0)

Compare two electron shells for approximate equality (exponents/coefficients are within a tolerance)

If compare_meta is True, the metadata is also compared for exact equality.
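
A tolerance-based comparison of this kind can be sketched without the library. Everything below is an assumption for illustration: the helper names are hypothetical, the relative error is taken as |a - b| / max(|a|, |b|), and shells are assumed to carry "angular_momentum", "exponents", and "coefficients" entries with numbers stored as strings. The library may define these details differently.

```python
def rel_error(a, b):
    """Relative error |a - b| / max(|a|, |b|); 0.0 when both values are zero."""
    a, b = float(a), float(b)
    denom = max(abs(a), abs(b))
    return 0.0 if denom == 0.0 else abs(a - b) / denom


def values_match(seq1, seq2, rel_tol):
    """True if both sequences have equal length and agree within rel_tol."""
    return len(seq1) == len(seq2) and all(
        rel_error(x, y) <= rel_tol for x, y in zip(seq1, seq2)
    )


def shells_match(shell1, shell2, rel_tol=0.0):
    """Compare angular momentum exactly, exponents/coefficients approximately."""
    if shell1["angular_momentum"] != shell2["angular_momentum"]:
        return False
    if not values_match(shell1["exponents"], shell2["exponents"], rel_tol):
        return False
    coef1, coef2 = shell1["coefficients"], shell2["coefficients"]
    if len(coef1) != len(coef2):
        return False
    return all(values_match(r1, r2, rel_tol) for r1, r2 in zip(coef1, coef2))


shell_a = {"angular_momentum": [0], "exponents": ["13.0100", "1.9620"],
           "coefficients": [["0.0334987", "0.2348008"]]}
shell_b = {"angular_momentum": [0], "exponents": ["13.0101", "1.9620"],
           "coefficients": [["0.0334987", "0.2348008"]]}

print(shells_match(shell_a, shell_b))                # exact comparison
print(shells_match(shell_a, shell_b, rel_tol=1e-4))  # tolerant comparison
```

With the default rel_tol of 0.0 any numerical difference makes the shells unequal, so a small positive tolerance is the natural choice when comparing data that has passed through different text formats.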

basis_set_exchange.curate.compare_elements(element1, element2, compare_electron_shells_meta=False, compare_ecp_pots_meta=False, compare_meta=False, rel_tol=0.0)

Determine if the basis information for two elements is the same

Exponents/coefficients are compared using a tolerance.

Parameters:
  • element1 (dict) – Basis information for an element

  • element2 (dict) – Basis information for another element

  • compare_electron_shells_meta (bool) – Compare the metadata of electron shells

  • compare_ecp_pots_meta (bool) – Compare the metadata of ECP potentials

  • compare_meta (bool) – Compare the overall element metadata

  • rel_tol (float) – Maximum relative error that is considered equal

basis_set_exchange.curate.component_file_refs(filelist)

Get a list of what elements/references exist in component JSON files

Parameters:

filelist (list) – A list of paths to JSON files

Returns:

Keys are the file paths; each value is a list of tuples (compacted element string, refs tuple)

Return type:

dict

basis_set_exchange.curate.create_metadata_file(output_path, data_dir)

Creates a METADATA.json file from a data directory

The file is written to output_path

basis_set_exchange.curate.diff_basis_dict(left_list, right_list)

Compute the difference between two sets of basis set dictionaries

The result is a list of dictionaries that correspond to each dictionary in left_list. Each resulting dictionary will contain only the elements/shells that exist in that entry and not in any of the dictionaries in right_list.

This only works on the shell level, and will only subtract entire shells that are identical. ECP potentials are not affected.

The return value contains deep copies of the input data

Parameters:
  • left_list (list of dict) – Dictionaries to use as the base

  • right_list (list of dict) – Dictionaries of basis data to subtract from each dictionary of left_list

Returns:

Each object in left_list containing data that does not appear in right_list

Return type:

list
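
The shell-level subtraction described above can be sketched as follows. This is a simplified illustration, not the library's code: subtract_shells is a hypothetical helper that handles a single left dictionary, shells are compared by exact equality, and elements left without shells are kept rather than removed.

```python
import copy


def subtract_shells(left, right_list):
    """Return a deep copy of left without shells that appear in right_list."""
    result = copy.deepcopy(left)
    for el, el_data in result["elements"].items():
        shells = el_data.get("electron_shells", [])
        for right in right_list:
            other = right["elements"].get(el, {}).get("electron_shells", [])
            # Only whole, identical shells are removed; ECP data is untouched.
            shells = [s for s in shells if s not in other]
        el_data["electron_shells"] = shells
    return result


s_shell = {"angular_momentum": [0], "exponents": ["1.0"], "coefficients": [["1.0"]]}
p_shell = {"angular_momentum": [1], "exponents": ["0.5"], "coefficients": [["1.0"]]}

left = {"elements": {"1": {"electron_shells": [s_shell, p_shell]}}}
right = {"elements": {"1": {"electron_shells": [s_shell]}}}

diff = subtract_shells(left, [right])
print(len(diff["elements"]["1"]["electron_shells"]))  # shells remaining for H
```

Because the result is a deep copy, the input dictionaries are left unmodified.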

basis_set_exchange.curate.diff_json_files(left_files, right_files)

Compute the difference between two sets of basis set JSON files

The output is a set of files that correspond to each file in left_files. Each resulting file will contain only the elements/shells that exist in that entry and not in any of the files in right_files.

This only works on the shell level, and will only subtract entire shells that are identical. ECP potentials are not affected.

left_files and right_files are lists of file paths. The output is written to files with the same names as those in left_files, but with .diff added to the end. If those files exist, they are overwritten.

Parameters:
  • left_files (list of str) – Paths to JSON files to use as the base

  • right_files (list of str) – Paths to JSON files to subtract from each file of left_files

Return type:

None

basis_set_exchange.curate.ecp_pots_are_equal(pots1, pots2, compare_meta=False, rel_tol=0.0)

Determine if a list of ECP potentials is the same as another

The potentials are compared approximately (exponents/coefficients are within a tolerance)

If compare_meta is True, the metadata is also compared for exact equality.

basis_set_exchange.curate.ecp_pots_are_subset(subset, superset, compare_meta=False, rel_tol=0.0)

Determine if a list of ECP potentials is a subset of another

If ‘subset’ is a subset of the ‘superset’, True is returned.

The potentials are compared approximately (exponents/coefficients are within a tolerance)

If compare_meta is True, the metadata is also compared for exact equality.

basis_set_exchange.curate.electron_shells_are_equal(shells1, shells2, compare_meta=False, rel_tol=0.0)

Determine if a list of electron shells is the same as another

The shells are compared approximately (exponents/coefficients are within a tolerance)

If compare_meta is True, the metadata is also compared for exact equality.

basis_set_exchange.curate.electron_shells_are_subset(subset, superset, compare_meta=False, rel_tol=0.0)

Determine if a list of electron shells is a subset of another

If ‘subset’ is a subset of the ‘superset’, True is returned.

The shells are compared approximately (exponents/coefficients are within a tolerance)

If compare_meta is True, the metadata is also compared for exact equality.
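
The relationship between the subset and equality checks can be sketched as below. This is an assumption-laden illustration: the helper names are hypothetical, the shell comparison defaults to exact equality for brevity, and treating list equality as "same length and mutual subsets" is an assumption about how such a check could be built, not a statement of the library's implementation.

```python
def exact(a, b):
    """Default shell comparison for this sketch: plain equality."""
    return a == b


def shells_are_subset(subset, superset, same=exact):
    """True if every shell in subset matches at least one shell in superset."""
    return all(any(same(s, t) for t in superset) for s in subset)


def shells_are_equal(shells1, shells2, same=exact):
    """True if the lists have equal length and each is a subset of the other."""
    return (
        len(shells1) == len(shells2)
        and shells_are_subset(shells1, shells2, same)
        and shells_are_subset(shells2, shells1, same)
    )


s_shell = {"angular_momentum": [0], "exponents": ["1.0"], "coefficients": [["1.0"]]}
p_shell = {"angular_momentum": [1], "exponents": ["0.5"], "coefficients": [["1.0"]]}

print(shells_are_subset([s_shell], [s_shell, p_shell]))
print(shells_are_equal([s_shell, p_shell], [p_shell, s_shell]))
```

The same predicate can be replaced by a tolerance-based comparison, in which case the order of shells in each list still does not matter.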

basis_set_exchange.curate.elements_in_files(filelist)

Get a list of what elements exist in JSON files

This works on table, element, and component data files

Parameters:

filelist (list) – A list of paths to JSON files

Returns:

Keys are the file paths; each value is a compacted element string listing the elements in that file

Return type:

dict

basis_set_exchange.curate.potentials_difference(p1, p2)

Computes and prints the differences between two lists of potentials

If the potentials contain a different number of primitives, or the lists are of different lengths, inf is returned. Otherwise, the maximum relative difference is returned.

basis_set_exchange.curate.shells_difference(s1, s2)

Computes and prints the differences between two lists of shells

If the shells contain a different number of primitives, or the lists are of different lengths, inf is returned. Otherwise, the maximum relative difference is returned.
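
The return convention described above (inf on a structural mismatch, otherwise the largest relative difference) can be sketched as follows. The helper max_rel_difference is hypothetical, the relative difference is assumed to be |a - b| / max(|a|, |b|), shells are compared pairwise in list order, and the printing done by the library function is omitted.

```python
import math


def max_rel_difference(shells1, shells2):
    """Largest relative difference between two shell lists, or inf on mismatch."""
    if len(shells1) != len(shells2):
        return math.inf
    worst = 0.0
    for s1, s2 in zip(shells1, shells2):
        # Treat the exponents and each coefficient row as rows of numbers.
        rows1 = [s1["exponents"]] + s1["coefficients"]
        rows2 = [s2["exponents"]] + s2["coefficients"]
        if len(rows1) != len(rows2):
            return math.inf
        for r1, r2 in zip(rows1, rows2):
            if len(r1) != len(r2):
                return math.inf
            for x, y in zip(r1, r2):
                x, y = float(x), float(y)
                denom = max(abs(x), abs(y))
                if denom > 0.0:
                    worst = max(worst, abs(x - y) / denom)
    return worst


base = [{"exponents": ["2.0", "1.0"], "coefficients": [["0.5", "0.5"]]}]
changed = [{"exponents": ["2.0", "0.5"], "coefficients": [["0.5", "0.5"]]}]

print(max_rel_difference(base, base))
print(max_rel_difference(base, changed))
print(max_rel_difference(base, []))
```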