Frequently Asked Questions (FAQ)

Questions

What keywords are recommended for Psi4 specifications?

Recommended Psi4 Program Specification Keywords

SCF Multipole Reference Point

  • QCPortal Input: QCSpecification(keywords={"properties_origin": ["NUCLEAR_CHARGE"]})
  • Purpose: Ensures that the SCF dipole, quadrupole, and higher-order moments are computed relative to the center of nuclear charge rather than the coordinate origin.
  • Warning: For datasets that did not include this keyword, only the dipole moments of neutral molecules can be used reliably, because a dipole moment is independent of the choice of origin only when the net charge is zero. Higher-order moments, and multipole moments of charged molecules, should be estimated from the partial charges instead.
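
The warning follows from how a dipole moment transforms when the origin is moved by a vector d: μ(origin + d) = μ(origin) − Q·d, where Q is the net charge. The plain-Python sketch below (no Psi4 involved) treats the molecule as point charges, e.g. partial charges, and compares the dipole about the coordinate origin with the dipole about the center of nuclear charge. The two-site geometry, charges, and nuclear charges are made up for illustration.

```python
# Minimal sketch: dipole moment of point charges (e.g. partial charges) about a chosen origin.
# All geometry and charge values below are hypothetical and only illustrate the origin shift.
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dipole(charges: Sequence[float], coords: Sequence[Vec3], origin: Vec3) -> Vec3:
    """Return sum_i q_i * (r_i - origin) for point charges q_i at positions r_i."""
    x, y, z = (
        sum(q * (r[k] - origin[k]) for q, r in zip(charges, coords))
        for k in range(3)
    )
    return (x, y, z)


def center_of_nuclear_charge(nuclear_charges: Sequence[float], coords: Sequence[Vec3]) -> Vec3:
    """Return sum_i Z_i r_i / sum_i Z_i."""
    total = sum(nuclear_charges)
    x, y, z = (
        sum(zi * r[k] for zi, r in zip(nuclear_charges, coords)) / total
        for k in range(3)
    )
    return (x, y, z)


coords = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)]  # hypothetical two-site geometry
neutral = [0.4, -0.4]                        # net charge 0
charged = [0.6, 0.4]                         # net charge +1
nuclear_charges = [1.0, 1.0]                 # hypothetical nuclear charges
cnc = center_of_nuclear_charge(nuclear_charges, coords)

for label, q in (("neutral", neutral), ("charged", charged)):
    # Compare the coordinate origin with the center of nuclear charge.
    print(label, dipole(q, coords, (0.0, 0.0, 0.0)), dipole(q, coords, cnc))
```

The neutral pair gives the same dipole for both origins, while the charged pair's dipole shifts by exactly Q times the distance between the two origins.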

Multipole Moments

  • QCPortal Input: QCSpecification(keywords={"scf_properties": ['dipole', 'quadrupole']})
  • QCSubmit Input: QCSpec(scf_properties=['dipole', 'quadrupole'])
  • Purpose: Compute the SCF dipole and quadrupole moments about the reference point set by "properties_origin"; if that keyword is not set, the coordinate origin is used.

Partial Charges

  • QCPortal Input: QCSpecification(keywords={"scf_properties": ['lowdin_charges', 'mulliken_charges', 'mbis_charges']})
  • QCSubmit Input: QCSpec(scf_properties=['lowdin_charges', 'mbis_charges'])
  • Purpose: Compute atomic partial charges with a variety of methods. Löwdin charges are a popular option that seem to offer the best balance between element coverage and chemically sensible values. MBIS charges are also popular but support only a limited set of elements. Mulliken charges are often produced by default in other QC packages but are not generally recommended.
  • Warning: Requesting MBIS charges causes an error for unsupported elements (e.g., I).
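
One way to avoid the MBIS failure is to check a molecule's elements before building its keywords. The helper below is hypothetical, and its unsupported-element set contains only iodine because that is the example given above; it is not Psi4's actual list of MBIS-supported elements.

```python
# Hypothetical helper: drop "mbis_charges" from a requested scf_properties list
# when the molecule contains an element assumed to be unsupported by MBIS.
from typing import Iterable, List, Set

# Assumption: only iodine is listed, taken from the example above.
# This is NOT Psi4's real list of unsupported elements; extend as needed.
MBIS_UNSUPPORTED: Set[str] = {"I"}


def filter_scf_properties(requested: Iterable[str], elements: Iterable[str]) -> List[str]:
    """Return `requested` without 'mbis_charges' if any element is unsupported."""
    drop_mbis = bool(set(elements) & MBIS_UNSUPPORTED)
    return [p for p in requested if not (drop_mbis and p == "mbis_charges")]


props = ["dipole", "quadrupole", "lowdin_charges", "mbis_charges"]
print(filter_scf_properties(props, ["C", "H", "I"]))  # MBIS removed
print(filter_scf_properties(props, ["C", "H", "O"]))  # unchanged
```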

Atomic Spin Population

  • QCPortal Input: QCSpecification(keywords={"scf_properties": ['lowdin_spin']})
  • Purpose: Compute the per-atom fractional number of unpaired electrons, $n_i$. These populations can be used as is, or used to estimate each atom's squared spin angular momentum as $S_i(S_i + 1)\hbar^2$, with $S_i = n_i/2$.
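
The estimate above can be sketched in a few lines. This assumes each population is the net number of unpaired electrons on the atom (alpha minus beta), so $S_i = |n_i|/2$; taking the magnitude so that atoms with excess beta spin are treated the same way is an assumption of this sketch, not something Psi4 prescribes.

```python
# Sketch: per-atom <S^2> estimate, in units of hbar^2, from atomic spin populations.
# Assumption: n_i = net unpaired electrons on atom i, so S_i = |n_i| / 2.
from typing import List, Sequence


def atomic_s_squared(spin_populations: Sequence[float]) -> List[float]:
    """Return S_i * (S_i + 1) for each atomic spin population n_i."""
    values = []
    for n in spin_populations:
        s = abs(n) / 2.0
        values.append(s * (s + 1.0))
    return values


# Hypothetical populations: one fully unpaired electron on atom 0, none on atom 1.
print(atomic_s_squared([1.0, 0.0]))
```

For a single fully unpaired electron this gives 3/4, the familiar spin-1/2 value.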

Bond Indices

  • QCPortal Input: QCSpecification(keywords={"scf_properties": ['wiberg_lowdin_indices', 'mayer_indices']})
  • QCSubmit Input: QCSpec(scf_properties=['wiberg_lowdin_indices', 'mayer_indices'])
  • Purpose: These indices measure bond order from the density matrix and are useful for analyzing the bonding in a molecule. The Wiberg bond indices are calculated using Löwdin-orthogonalized atomic orbitals. The Mayer bond indices are commonly used to quantify the strength of bonding interactions between atoms in a molecule.
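
To make "derived from the density matrix" concrete, here is a sketch of the textbook closed-shell Mayer bond index, $B_{AB} = \sum_{\mu \in A} \sum_{\nu \in B} (PS)_{\mu\nu} (PS)_{\nu\mu}$, with $P$ the total density matrix and $S$ the overlap matrix. It is not Psi4's implementation, and the toy H2 matrices (one s function per atom, overlap value chosen arbitrarily) are built by hand.

```python
# Sketch: closed-shell Mayer bond index from density (P) and overlap (S) matrices.
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Plain-Python matrix product (fine for tiny illustrative matrices)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def mayer_bond_index(
    P: Sequence[Sequence[float]],
    S: Sequence[Sequence[float]],
    basis_to_atom: Sequence[int],
    atom_a: int,
    atom_b: int,
) -> float:
    """Sum (PS)_{mu nu} (PS)_{nu mu} over basis functions mu on atom A and nu on atom B."""
    ps = matmul(P, S)
    on_a = [m for m, atom in enumerate(basis_to_atom) if atom == atom_a]
    on_b = [n for n, atom in enumerate(basis_to_atom) if atom == atom_b]
    return sum(ps[m][n] * ps[n][m] for m in on_a for n in on_b)


# Toy H2 in a minimal basis: overlap s between the two 1s functions (illustrative value).
s = 0.66
S = [[1.0, s], [s, 1.0]]
# Doubly occupied bonding orbital c = (1, 1) / sqrt(2 (1 + s)), so P = 2 c c^T.
P = [[1.0 / (1.0 + s)] * 2 for _ in range(2)]
print(mayer_bond_index(P, S, [0, 1], 0, 1))
```

For this single bond the index comes out as 1 regardless of the overlap value chosen.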

Dipole Polarizabilities

  • QCPortal Input: QCSpecification(keywords={'function_kwargs': {'properties': ['dipole_polarizabilities']}})
  • QCSubmit Input: QCSpec(scf_properties=['dipole_polarizabilities'])
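
In QCPortal, dipole polarizabilities go under "function_kwargs" while the other properties above go under "scf_properties". The hypothetical helper below routes each requested property accordingly and optionally adds the "properties_origin" keyword; the resulting dict is the kind of value one could pass as keywords= to QCSpecification. The helper name and the routing set are assumptions for illustration.

```python
# Hypothetical helper: build a QCPortal-style Psi4 keywords dict, routing
# 'dipole_polarizabilities' to function_kwargs and the rest to scf_properties.
from typing import Any, Dict, Iterable, List

FUNCTION_KWARGS_PROPERTIES = {"dipole_polarizabilities"}


def build_psi4_keywords(
    properties: Iterable[str], nuclear_charge_origin: bool = True
) -> Dict[str, Any]:
    scf_properties: List[str] = []
    function_properties: List[str] = []
    for prop in properties:
        if prop in FUNCTION_KWARGS_PROPERTIES:
            function_properties.append(prop)
        else:
            scf_properties.append(prop)

    keywords: Dict[str, Any] = {}
    if scf_properties:
        keywords["scf_properties"] = scf_properties
    if function_properties:
        keywords["function_kwargs"] = {"properties": function_properties}
    if nuclear_charge_origin:
        # Moments about the center of nuclear charge (see "SCF Multipole Reference Point").
        keywords["properties_origin"] = ["NUCLEAR_CHARGE"]
    return keywords


print(build_psi4_keywords(["dipole", "quadrupole", "lowdin_charges", "dipole_polarizabilities"]))
```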

Other Keywords

See the Psi4 keyword documentation for other options.

Example:

from qcportal.singlepoint import QCSpecification, SinglepointDriver

spec = QCSpecification(
    program='psi4',
    driver=SinglepointDriver.gradient,
    method='b3lyp-d3bj',
    basis='dzvp',
    keywords={
        'maxiter': 500,
        'scf_properties': [
            'dipole', 'quadrupole',
            'wiberg_lowdin_indices', 'mayer_indices',
            'lowdin_charges', 'mulliken_charges', 'mbis_charges',
        ],
        # Polarizabilities are requested through function_kwargs, not scf_properties
        'function_kwargs': {'properties': ['dipole_polarizabilities']},
        # Multipoles about the center of nuclear charge
        'properties_origin': ['NUCLEAR_CHARGE'],
    },
    protocols={'wavefunction': 'none'},
)

How to define a torsion drive?

Defining Torsion Drives

A torsion drive scans a single torsion in a molecule. The torsion is specified in the dataset submission file by the indices of the four atoms that define it.

WARNING: Molecule.connectivity must be defined so that dihedrals can be verified.

For working examples, see the datasets in the submissions folder, such as 2025-04-10-OpenFF-Additional-Generated-ChEMBL-TorsionDrives-4.0. The specification program is "torsiondrive".

In QCPortal:

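from qcportal.torsiondrive import TorsiondriveDatasetEntry

TorsiondriveDatasetEntry(
    name="my-torsion-entry",  # hypothetical entry name
    initial_molecules=[offmol.to_qcschema()],  # OpenFF Molecule -> QCElemental Molecule
    additional_keywords={
        "dihedrals": [[3, 0, 1, 2]],
        "grid_spacing": [15],
        "dihedral_ranges": [[-165, 180]],
        "energy_upper_limit": 0.05,
    },
)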
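
The "grid_spacing" and "dihedral_ranges" values together determine which torsion angles are scanned. The sketch below enumerates them under a stated assumption: grid points have the form -180 + k * grid_spacing within the half-open interval (-180, 180], and a point is kept when it lies inside the requested range. Under that assumption, a range of [-165, 180] with 15° spacing covers the full circle without visiting both -180 and 180.

```python
# Sketch: enumerate the 1-D torsion grid implied by "grid_spacing" and "dihedral_ranges".
# Assumption: grid points are -180 + k * grid_spacing in (-180, 180],
# kept when lower <= point <= upper.
from typing import List


def torsion_grid(grid_spacing: int, lower: int = -180, upper: int = 180) -> List[int]:
    points = range(-180 + grid_spacing, 181, grid_spacing)
    return [p for p in points if lower <= p <= upper]


grid = torsion_grid(15, -165, 180)
print(len(grid), grid[0], grid[-1])
```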

In QCSubmit:

from openff.qcsubmit.utils import get_symmetry_classes, get_symmetry_group
from openff.qcsubmit.workflow_components import TorsionIndexer

torsion_indexer = TorsionIndexer()
symmetry_classes = get_symmetry_classes(offmol)

atom_indices = (3, 0, 1, 2)
central_bond = tuple(sorted(atom_indices[1:-1]))
symmetry_group = get_symmetry_group(central_bond, symmetry_classes)
torsion_indexer.add_torsion(atom_indices, symmetry_group, (-165, 180))

offmol.properties["dihedrals"] = torsion_indexer

  • Note: Ensure that the indices correspond to the intended atoms in the molecule; the molecule's connectivity must be present for validation.

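
Connectivity matters because a proper dihedral (i, j, k, l) requires the bonds i-j, j-k, and k-l. The hypothetical check below shows the kind of validation this enables; it is a simplified sketch, not QCSubmit's actual validation code, and the bond list is made up to match the indices used above.

```python
# Hypothetical check: does (a, b, c, d) form a bonded chain a-b-c-d?
from typing import Iterable, Sequence, Tuple


def is_bonded_dihedral(indices: Sequence[int], bonds: Iterable[Tuple[int, int]]) -> bool:
    if len(indices) != 4 or len(set(indices)) != 4:
        return False
    bond_set = {frozenset(bond) for bond in bonds}  # bonds are unordered pairs
    a, b, c, d = indices
    return all(frozenset(pair) in bond_set for pair in ((a, b), (b, c), (c, d)))


bonds = [(0, 1), (1, 2), (0, 3)]  # hypothetical connectivity 3-0-1-2
print(is_bonded_dihedral((3, 0, 1, 2), bonds))  # valid chain
print(is_bonded_dihedral((3, 1, 0, 2), bonds))  # 3-1 is not bonded
```
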
How to implement constraints (in geomeTRIC)?

Implementing Constraints with program='geometric'

Most geometry optimizations here are done with geomeTRIC. To constrain a bond, angle, or dihedral during a geometry optimization, an additional "constraints" keyword must be set.

Example Dataset Reference

For a practical example of implementing constraints in geomeTRIC, refer to the dataset submission 2025-03-05-OpenFF-Protein-PDB-4mer-v4.0.

In QCPortal:

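from qcportal.optimization import OptimizationDatasetEntry

OptimizationDatasetEntry(
    name="my-optimization-entry",  # hypothetical entry name
    initial_molecule=offmol.to_qcschema(),  # OpenFF Molecule -> QCElemental Molecule
    additional_keywords={
        "constraints": {
            "freeze": [{"type": "dihedral", "indices": [1, 3, 4, 5]}],
        },
    },
)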

In QCSubmit:

offmol.add_constraint(
    constraint='freeze',
    constraint_type='dihedral',
    indices=[1, 3, 4, 5],
    bonded=True,
)

In either case, these constraints are passed to geomeTRIC, which has its own syntax for defining constraints that does not translate directly to the syntax of either QCPortal or QCSubmit.
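
To illustrate the difference in syntax, the sketch below renders a QCPortal-style "constraints" dict as geomeTRIC's text constraint-file syntax. It assumes that the dict uses 0-based atom indices, that geomeTRIC's text format uses 1-based indices, and that "set" entries carry a "value" field; only "freeze" and "set" are handled.

```python
# Sketch: render a QCPortal-style "constraints" dict in geomeTRIC's text syntax.
# Assumptions: dict indices are 0-based, text indices are 1-based,
# and "set" entries provide a "value".
from typing import Any, Dict, List


def to_geometric_text(constraints: Dict[str, List[Dict[str, Any]]]) -> str:
    lines: List[str] = []
    for section in ("freeze", "set"):
        entries = constraints.get(section, [])
        if not entries:
            continue
        lines.append(f"${section}")  # section header, e.g. "$freeze"
        for entry in entries:
            fields = [entry["type"]] + [str(i + 1) for i in entry["indices"]]
            if section == "set":
                fields.append(str(entry["value"]))
            lines.append(" ".join(fields))
    return "\n".join(lines)


print(to_geometric_text({"freeze": [{"type": "dihedral", "indices": [1, 3, 4, 5]}]}))
```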

Why do some directories have scaffold.json files instead of dataset.json files?

Some molecular systems, e.g., transition metal complexes, are not handled well by either OpenEye or RDKit (the toolkits leveraged in QCSubmit). The scaffold.json files are serialized QCPortal datasets, created in a FractalSnowflake server to bypass QCSubmit; the CI can still validate their contents and submit the dataset for you. If your dataset can be prepared with QCSubmit, it MUST be prepared with QCSubmit; otherwise we will ask you to update your PR.

Tagging records for compute, and tagging by molecular weight

Records can be assigned a specific compute tag with our GitHub Action CI. In the PR used to create the dataset, add a label compute-<my tag>.
The <my tag> portion becomes the compute tag used to find the records on QCArchive, for example by compute resources on NRP. The PR number is commonly chosen.

Once the GitHub label is in place, you must run the GitHub Action "Dataset Lifecycle - Reprioritize/Retag" to propagate the GitHub labels to QCArchive.

To more efficiently use compute, the CI can also automatically separate the molecules into molecular weight (MW) bins and tag accordingly.

A label such as compute-pr123_200-400-600 groups the molecules into the following tags:

  • pr123-200 has a MW of 200 Da or less,
  • pr123-400 is between 200 Da and 400 Da,
  • pr123-600 is similarly between 400 Da and 600 Da, and
  • pr123-large has a MW greater than 600 Da.

Any number of sequential bin boundaries can be strung together with hyphens.
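
The binning rule can be sketched as a small function. This is a hypothetical reimplementation, not the CI's code, and it assumes that the part after "compute-" has the form "<prefix>_<b1>-<b2>-...", that boundaries are listed in ascending order, and that each bin includes its upper boundary (consistent with "200 Da or less" above).

```python
# Sketch: map a molecular weight to the tag implied by a label like "compute-pr123_200-400-600".
# Assumptions: "<prefix>_<b1>-<b2>-..." format, ascending boundaries, upper bound inclusive.
from typing import List, Tuple


def parse_mw_label(label: str) -> Tuple[str, List[float]]:
    tag = label[len("compute-"):] if label.startswith("compute-") else label
    prefix, _, bins = tag.partition("_")
    return prefix, [float(b) for b in bins.split("-") if b]


def mw_tag(label: str, mw: float) -> str:
    prefix, boundaries = parse_mw_label(label)
    for upper in boundaries:
        if mw <= upper:
            return f"{prefix}-{upper:g}"
    return f"{prefix}-large"


label = "compute-pr123_200-400-600"
for mw in (150.0, 200.0, 350.0, 601.0):
    print(mw, mw_tag(label, mw))
```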