API Reference

TODO: split this file into respective files where both API reference and some more high level information is provided to the reader.

Atomic Data

TODO: describe the data structure and its use.

class mlcg.data.atomic_data.AtomicData(**kwargs)[source]

A data object holding atomic structures. The attribute names are defined in mlcg.data._keys

pos

set of atomic positions in each structures

Type:

[n_atoms * n_structures, 3]

atom_types

if atoms then it’s the atomic number, if it’s a CG bead then it’s a number defined by the CG mapping

Type:

[n_atoms * n_structures]

masses

Masses of each atom

Type:

[n_atoms * n_structures]

pbc

periodic boundary conditions

Type:

[n_structures, 3] (Optional)

cell

unit cell of the atomic structure. Lattice vectors are defined row wise

Type:

[n_structures, 3, 3] (Optional)

tag

metadata about each structure

Type:

[n_structures] (Optional)

energy

reference energy associated with each structures

Type:

[n_structures] (Optional)

forces

reference forces associated with each structures

Type:

[n_atoms * n_structures, 3] (Optional)

velocities

velocities associated with each structure

Type:

[n_atoms * n_structures, 3] (optional)

neighborlist

contains information about the connectivity formatted according to mlcg.neighbor_list.neighbor_list.make_neighbor_list.

Type:

Dict[str, Dict[str, Any]] (Optional)

batch

maps the atoms to their structure index (from 0 to n_structures-1)

Type:

[n_atoms]

static from_ase(frame, energy_tag='energy', force_tag='forces')[source]

Method for constructing an AtomicData instance from an ASE Atoms instance. Atom types are taken from atomic numbers, while periodic flags, masses, cell vectors, and configuration tags are taken from the ASE Atoms attributes.

Parameters:
  • frame – Input ase.atom.Atoms instance

  • energy_tag (str) – String value used to access the ASE energy

  • force_tag (str) – String value used to access the ASE forces

Returns:

Populated AtomicData instance

Return type:

data

static from_points(pos, atom_types, masses=None, pbc=None, cell=None, tag=None, energy=None, forces=None, velocities=None, neighborlist=None, **kwargs)[source]

Method for constructing an AtomicData instance directly from input tensors

Parameters:
  • pos (Tensor) – Cartesian coordinates of a single configuration, of shape (n_atoms, 3)

  • atom_types (Tensor) – Atom type labels of shape (n_atoms)

  • masses (Optional[Tensor]) – Atom masses of shape (n_atoms)

  • pbc (Optional[Tensor]) – Period boundary condition flags for the frame

  • cell (Optional[Tensor]) – Box vectors for the configuration, for computing periodic images, of shape (3,3)

  • tag (Optional[str]) – String tag for the configuration

  • energy (Optional[Tensor]) – Reference scalar energy for the configuration

  • forces (Optional[Tensor]) – Reference cartesian forces for the configuration, of shape (n_atoms, 3)

  • velocities (Optional[Tensor]) – Atomic velocities, of shape (n_atoms, 3)

  • neighborlist (Optional[Dict[str, Dict[str, Any]]]) – Neighborlist dictionary for tagged features. Eg,

Returns:

Populated AtomicData instance

Return type:

data

Coarse Graining Utilities

TODO: document the CG mapping format.

mlcg.cg.projection.build_cg_matrix(topology, cg_mapping=None, special_terminal=True)[source]

Function for producing coarse grain types, masses, and mapping matrices using a slicing strategy and a predetermined set of atoms to retain at the coarse grain resolution.

Parameters:
  • topology (Topology) – System topology instance

  • cg_mapping (Optional[Dict[Tuple[str, str], Tuple[str, int, int]]]) –

    Mapping dictionary with the following structure:

    {
        (residue name, atom name) : (compound name, type, mass)
        ...
    }
    

    Eg, a row for an alanine carbon alpha atom would be:

    {
        ("ALA", "CA") : ("CA_A", 1, 12)
        ...
    }
    

  • special_termini – If True, special types will be reserved for the first and last CG atoms

Return type:

Tuple[ndarray, ndarray, ndarray, OrderedDict]

Returns:

  • np.ndarray – Array of CG atom types

  • np.ndarray – Array of CG masses

  • np.ndarray – One-hot transformation matrix of shape (n_high_res_atoms, n_cg_atoms) that maps atoms for the high resolution repesentation to the coarse grain representation

  • OrderedDict – Ordered dictionary mapping each CG atom index (with respect to the CG topology) to a list containing the CG atom name, CG atom type and the CG atom mass

mlcg.cg.projection.build_cg_topology(topology, cg_mapping=None, special_terminal=True, bonds=<function add_chain_bonds>, angles=<function add_chain_angles>, dihedrals=<function add_chain_dihedrals>)[source]

Takes an mlcg.geometry.topology.Topology instance and returns another mlcg.geometry.topology.Topology instance conditioned on the supplied CG mapping

Parameters:
  • topology (Topology) – Original MLCG topology before coarse graining

  • cg_mapping (Optional[Dict[Tuple[str, str], Tuple[str, int, int]]]) – A suitable CG mapping. See mclg.cg._mapping.py for examples.

  • special_termini – If True, the first and last CG atoms recieve their own special types

  • bonds (Optional[Callable]) – Function to enumerate and define bonds in the final CG topology

  • angles (Optional[Callable]) – Function to enumerate and define angles in the final CG topology

  • dihedrals (Optional[Callable]) – Function to enumerate and define dihedrals in the final CG topology

Returns:

CG topoology

Return type:

Topology

mlcg.cg.projection.build_opeps_connectivity_matrix(resnames)[source]

Builds bonded connectivity matrix for (N,CA,CB,C,O) octapeptide CG mapping scheme, based on amino acid sequence. GLY residues are constructed as (N,CA,C,O).

Parameters:

resnames (List[str]) – List of three letter amino acid strings that specify the protein/peptide sequence.

Returns:

Undirected graph connectivity matrix for the specified sequence of residues.

Return type:

connectivity_matrix

mlcg.cg.projection.isolate_features(target_atoms_list, full_feature_set)[source]

Helper function for isolating molecular features from specified lists of beads.

Parameters:
  • target_atoms_list (List[List[int]]) – if a feature atom from full_feature_set is in this list, that corresponding feature is moved to the isolated feature list. If no feature atom is in this list, then the corresponding feature remains in the bulk feature list

  • full_feature_set (Tensor) – The set of full features that is checked against the target_atoms

Returns:

List of N feature sets, where the length of target_atoms_list is N-1 in length, the first N-1 features sets correspond to those isolated features, and the final feauture set is the list of bulk/non-isolated features.

Return type:

feature_sets

mlcg.cg.projection.make_non_bonded_set(topology, minimum_separation, residue_indices, residue_exclusion=True, edge_exclusion=None)[source]

Helper function for constructing non-bonded sets from a topology

Parameters:
  • topology; – input topology

  • minimum_separation (int) – minimum edge separation between pairs of atoms for the non-bonded set

  • residue_indices (List[int]) – list of indices mapping atoms to residue indices

  • residue_exclusion – if True, an extra exclusion rule is applied such that if any two atoms are within the same residue they are automatically removed from the non-bonded set.

  • edge_exclusion (Optional[List[Tensor]]) – if None, this list of pairwise edge indices will be excluded from the non-bonded set.

Returns:

The set of non-bonded edges

Return type:

non_bonded_edges

mlcg.cg.projection.make_opep_residue_indices(resnames)[source]

Helper method to make residue index lists, which specify the residue index that a given atom belongs to in an octapeptide topology using the octapeptide CG mapping, wherein non-glycine amino acids are given a 5 atom mapping (N,CA,CB,C,O) while glycines are given a 4 atom mapping (N,CA,C,O).

Parameters:

resnames (List[str]) – List of three letter amino acid codes that together specify the molecular sequence

Returns:

List of residue indices that specify which specify to which residue an atom belongs

Return type:

residue_indices

mlcg.cg.projection.swap_mapping_rows(topology, residue, target_atoms, types, masses, matrix, mapping)[source]

Helper function to swap atom rows systematically in the specified residue. The last four arguments are the outputs of mlcg.cg.projection.build_cg_matrix().

Parameters:
  • topology (Topology) – Topology of the all-atom (non-CG system).

  • residue (str) – Three letter amino acid string specifying the residue in which swaps must be made

  • target_atoms (List[str]) – List of atom types to swap places

  • types (ndarray) – The types array that will by swapped at the specified atoms for the specified residue

  • masses (ndarray) – The mass array that will be swapped at the specified atoms for the specified residue

  • matrix (ndarray) – The CG mapping matrix for which rows will be swapped at the specified atoms for the specifed residue

  • mapping (OrderedDict) – The mapping dictionary from the original topology to the CG atoms

Return type:

Tuple[ndarray, ndarray, ndarray]

Returns:

  • swapped_types

  • swapped_masses

  • swapped_matrix

  • generalized_mapping

Models

Neural network models for coarse grain property predictions

class mlcg.nn.schnet.AttentiveSchNet(rbf_layer, cutoff, output_hidden_layer_widths, num_features_in, num_features_out, num_residual_q, num_residual_k, num_residual_v, attention_block, hidden_channels=128, embedding_size=100, num_filters=128, num_interactions=3, activation_func=Tanh(), max_num_neighbors=1000, layer_widths=None, activation_first=False, aggr='add', attention_version='normal')[source]

Small wrapper class for SchNet to simplify the definition of the SchNet model with an Interaction block that includes attention through an input file. The upper distance cutoff attribute in is set by default to match the upper cutoff value in the cutoff function.

Parameters:
  • rbf_layer (Module) – radial basis function used to project the distances \(r_{ij}\).

  • cutoff (Module) – smooth cutoff function to supply to the CFConv

  • output_hidden_layer_widths (List[int]) – List giving the number of hidden nodes of each hidden layer of the MLP used to predict the target property from the learned representation.

  • num_features_in (int) – size of each input sample for linear layer

  • num_features_out (int) – size of each output sample for liner layer

  • num_residuals_q – Number of residual blocks applied to features via self-attention for queries, keys, and values

  • num_residuals_k – Number of residual blocks applied to features via self-attention for queries, keys, and values

  • num_residuals_v – Number of residual blocks applied to features via self-attention for queries, keys, and values

  • attention_block (Module) – Specify if you want to use softmax attention (input: ExactAttention) or favor+ (input: FavorAttention)

  • hidden_channels (int) – dimension of the learned representation, i.e. dimension of the embeding projection, convolution layers, and interaction block.

  • embedding_size (int) – dimension of the input embeddings (should be larger than AtomicData.atom_types.max()+1).

  • num_filters (int) – number of nodes of the networks used to filter the projected distances

  • num_interactions (int) – number of interaction blocks

  • activation – activation function

  • max_num_neighbors (int) – The maximum number of neighbors to return for each atom in data. If the number of actual neighbors is greater than max_num_neighbors, returned neighbors are picked randomly.

  • aggr (str) – Aggregation scheme for continuous filter output. For all options, see here for more options.

  • activation_first (bool) – Inverting the order of linear layers and activation functions.

  • attention_version (str) – Specifiy which interaction block architecture to choose. By default normal.

class mlcg.nn.schnet.CFConv(filter_network, cutoff, in_channels=128, out_channels=128, num_filters=128, aggr='add')[source]

Continuous filter convolutions for SchNet.

Parameters:
  • filter_net – Neural network for generating filters from expanded pairwise distances

  • cutoff (Module) – Cutoff envelope to apply to the output of the filter generating network.

  • in_channels (int) – Hidden input dimensions

  • out_channels (int) – Hidden output dimensions

  • num_filters (int) – Number of filters

  • aggr (str) – Aggregation scheme for continuous filter output. For all options, see here <https://pytorch-geometric.readthedocs.io/en/latest/notes/create_gnn.html?highlight=MessagePassing#the-messagepassing-base-class>.

forward(x, edge_index, edge_weight, edge_attr)[source]

Forward pass through the continuous filter convolution.

Parameters:
  • x (Tensor) – Embedded features of shape (num_examples * num_atoms, hidden_channels)

  • edge_index (Tensor) – Graph edge index tensor of shape (2, total_num_edges)

  • edge_weight (Tensor) – Graph edge weight (eg, distances), of shape (total_num_edges)

  • edge_attr (Tensor) – Graph edge attributes (eg, expanded distances), of shape (total_num_edges, num_rbf)

Returns:

Updated embedded features of shape (num_examples * num_atoms, hidden_channels)

Return type:

x

message(x_j, W)[source]

Message passing operation to perform the continuous filter convolution through element-wise multiplcation of embedded features with the output of the filter network.

Parameters:
  • x_j (Tensor) – Tensor of embedded features of shape (total_num_edges, hidden_channels)

  • W (Tensor) – Tensor of filter values of shape (total_num_edges, num_filters)

Returns:

Elementwise multiplication of the filters with embedded features.

Return type:

x_j * W

reset_parameters()[source]

Method for resetting the weights of the linear layers and filter network according the the Xavier uniform strategy. Biases are set to 0.

class mlcg.nn.schnet.InteractionBlock(cfconv_layer, hidden_channels=128, activation=Tanh())[source]

Interaction blocks for SchNet. Consists of atomwise transformations of embedded features that are continuously convolved with filters generated from radial basis function-expanded pairwise distances.

Parameters:
  • cfconv_layer (Module) – Continuous filter convolution layer for convolutions of radial basis function-expanded distances with embedded features

  • hidden_channels (int) – Hidden dimension of embedded features

  • activation (Module) – Activation function applied to linear layer outputs

forward(x, edge_index, edge_weight, edge_attr, *args)[source]

Forward pass through the interaction block.

Parameters:
  • x (Tensor) – Embedded features of shape (num_examples, num_atoms, hidden_channels)

  • edge_index (Tensor) – Graph edge index tensor of shape (2, total_num_edges)

  • edge_weight (Tensor) – Graph edge weight (eg, distances), of shape (total_num_edges)

  • edge_attr (Tensor) – Graph edge attributes (eg, expanded distances), of shape (total_num_edges, num_rbf)

Returns:

Updated embedded features of shape (num_examples * num_atoms, hidden_channels)

Return type:

x

class mlcg.nn.schnet.SchNet(embedding_layer, interaction_blocks, rbf_layer, output_network, self_interaction=False, max_num_neighbors=1000)[source]

PyTorch Geometric implementation of SchNet Code adapted from [PT_geom_schnet] which is based on the architecture described in [Schnet] .

Parameters:
  • embedding_layer (Module) – Initial embedding layer that transforms atoms/coarse grain bead types into embedded features

  • interaction_blocks (list of torch.nn.Module or torch.nn.Sequential) – Sequential interaction blocks of the model, where each interaction block applies

  • rbf_layer (Module) – The set of radial basis functions that expands pairwise distances between atoms/CG beads.

  • output_network (Module) – Output neural network that predicts scalar energies from SchNet features. This network should transform (num_examples * num_atoms, hidden_channels) to (num_examples * num atoms, 1).

  • upper_distance_cutoff – Upper distance cutoff used for making neighbor lists.

  • self_interaction (bool) – If True, self interactions/distancess are calculated. But it never had a function due to a bug in the implementation (see static method neighbor_list). Should be kept False. This option shall not be deleted for compatibility.

  • max_num_neighbors (int) – Maximum number of neighbors to return for a given node/atom when constructing the molecular graph during forward passes. This attribute is passed to the torch_cluster radius_graph routine keyword max_num_neighbors, which normally defaults to 32. Users should set this to higher values if they are using higher upper distance cutoffs and expect more than 32 neighbors per node/atom.

forward(data)[source]

Forward pass through the SchNet architecture.

Parameters:

data (AtomicData) – Input data object containing batch atom/bead positions and atom/bead types.

Returns:

Data dictionary, updated with predicted energy of shape (num_examples * num_atoms, 1), as well as neighbor list information.

Return type:

data

static neighbor_list(data, rcut, max_num_neighbors=1000)[source]

Computes the neighborlist for data using a strict cutoff of rcut.

Return type:

dict

reset_parameters()[source]

Method for resetting linear layers in each SchNet component

class mlcg.nn.schnet.StandardSchNet(rbf_layer, cutoff, output_hidden_layer_widths, hidden_channels=128, embedding_size=100, num_filters=128, num_interactions=3, activation=Tanh(), max_num_neighbors=1000, aggr='add')[source]

Small wrapper class for SchNet to simplify the definition of the SchNet model through an input file. The upper distance cutoff attribute in is set by default to match the upper cutoff value in the cutoff function.

Parameters:
  • rbf_layer (Module) – radial basis function used to project the distances \(r_{ij}\).

  • cutoff (Module) – smooth cutoff function to supply to the CFConv

  • output_hidden_layer_widths (List[int]) – List giving the number of hidden nodes of each hidden layer of the MLP used to predict the target property from the learned representation.

  • hidden_channels (int) – dimension of the learned representation, i.e. dimension of the embeding projection, convolution layers, and interaction block.

  • embedding_size (int) – dimension of the input embeddings (should be larger than AtomicData.atom_types.max()+1).

  • num_filters (int) – number of nodes of the networks used to filter the projected distances

  • num_interactions (int) – number of interaction blocks

  • activation (Module) – activation function

  • max_num_neighbors (int) – The maximum number of neighbors to return for each atom in data. If the number of actual neighbors is greater than max_num_neighbors, returned neighbors are picked randomly.

  • aggr (str) –

    Aggregation scheme for continuous filter output. For all options, see here for more options.

Radial basis functions

class mlcg.nn.radial_basis.radial_integral_gto.RIGTOBasis(cutoff, nmax=5, lmax=5, sigma=0.4, mesh_size=300)[source]

This radial basis is the effective basis when expanding an atomic density smeared by Gaussains of width \(\sigma\) on a set of \(n_{max}\) orthonormal Gaussian Type Orbitals (GTOs) and \(l_{max}+1\) Spherical Harmonics (SPHs) namely. This radial basis set is interpolated using natural cubic splines for efficiency and the cutoff is included into the splined functions.

The basis is defined as

\[R_{nl}(r) = f_c(r) \mathcal{N}_n \frac{\Gamma(\frac{n+l+3}{2})}{\Gamma(l+\frac{3}{2})} c^l r^l(c+b_n)^{-\frac{(n+l+3)}{2}} {}_1F_1\left(\frac{n+l+3}{2},l+\frac{3}{2};\frac{c^2 r^2}{c+b_n}\right),\]

where \({}_1F_1\) is the confluent hypergeometric function, \(\Gamma\) is the gamma function, \(f_c\) is a cutoff function, \(b_n=\frac{1}{2\sigma_n^2}\), \(c= 1 / (2\sigma^2\), \(\sigma_n = r_\text{cut} \max(\sqrt{n},1)/n_{max}\) and \(\mathcal{N}_n^2 = \frac{2}{\sigma_n^{2n + 3}\Gamma(n + 3/2)}\).

For more details on the derivation, refer to appendix A.

Parameters:
  • nmax (int) – number of radial basis

  • lmax (int) – maximum spherical order (lmax included so there are lmax+1 orders)

  • sigma (float) – smearing of the atomic density

  • cutoff (Union[int, float, _Cutoff]) – Defines the smooth cutoff function. If a float is provided, it will be interpreted as an upper cutoff and a CosineCutoff will be used between 0 and the provided float. Otherwise, a chosen _Cutoff instance can be supplied.

  • mesh_size (int) – number of points used to interpolate with splines the radial basis spanning uniformly the range difined by the cutoff \([0, r_c]\).

forward(dist)[source]

Expansion of distances through the radial basis function set.

Parameters:

dist (torch.Tensor) – Input pairwise distances of shape (total_num_edges)

Returns:

expanded_distances – Distances expanded in the radial basis with shape (total_num_edges, lmax + 1, nmax)

Return type:

torch.Tensor

plot()[source]

Plot the set of radial basis function.

class mlcg.nn.radial_basis.exp_normal.ExpNormalBasis(cutoff, num_rbf=50, trainable=True)[source]

Class for generating a set of exponential normal radial basis functions, as described in [Physnet] . The functions have the following form:

\[f_n(r_{ij};\alpha, r_{low},r_{high}) = f_{cut}(r_{ij},r_{low},r_{high}) \times \exp\left[-\beta_n \left(e^{\alpha (r_{ij} -r_{high}) } - \mu_n \right)^2\right]\]

where

\[\alpha = 5.0/(r_{high} - r_{low})\]

is a distance rescaling factor, and, by default

\[f_{cut} ( r_{ij},r_{low},r_{high} ) = \cos{\left( r_{ij} \times \pi / r_{high}\right)} + 1.0\]

represents a cosine cutoff function (though users can specify their own cutoff function if they desire). :type cutoff: Union[int, float, _Cutoff] :param cutoff: Defines the smooth cutoff function. If a float is provided, it will be interpreted as

an upper cutoff and a CosineCutoff will be used between 0 and the provided float. Otherwise, a chosen _Cutoff instance can be supplied.

Parameters:
  • num_rbf (int) – The number of functions in the basis set.

  • trainable (bool) – If True, the parameters of the basis (the centers and widths of each function) are registered as optimizable parameters that will be updated during backpropagation. If False, these parameters will be instead fixed in an unoptimizable buffer.

forward(dist)[source]

Expansion of distances through the radial basis function set. :type dist: Tensor :param dist: Input pairwise distances of shape (total_num_edges) :type dist: torch.Tensor

Returns:

expanded_distances – Distances expanded in the radial basis with shape (total_num_edges, num_rbf)

Return type:

torch.Tensor

reset_parameters()[source]

Method to reset the parameters of the basis functions to their initial values.

class mlcg.nn.radial_basis.exp_normal.ExtendedExpNormalBasis(*args, internal_cutoff_lower=None, trainable=False, **kwargs)[source]

ExpNormalBasis that allows for arbitrary lower cutoffs not tied to a supplied _Cutoff. This basis follows the definition in [Physnet]. :param internal_cutoff: if specified, this lower cutoff overrides any supplied _Cutoff lower

cutoff.

Parameters:

trainable – If True, the parameters of the basis (the centers and widths of each function) are registered as optimizable parameters that will be updated during backpropagation. If False, these parameters will be instead fixed in an unoptimizable buffer.

extended_initial_params()[source]

Method for initializing the basis function parameters, as described in https://pubs.acs.org/doi/10.1021/acs.jctc.9b00181 .

forward(dist)[source]

Expansion of distances through the radial basis function set. :type dist: Tensor :param dist: Input pairwise distances of shape (total_num_edges) :type dist: torch.Tensor

Returns:

expanded_distances – Distances expanded in the radial basis with shape (total_num_edges, num_rbf)

Return type:

torch.Tensor

class mlcg.nn.radial_basis.gaussian.GaussianBasis(cutoff, num_rbf=50, trainable=False)[source]

Class that generates a set of equidistant 1-D gaussian basis functions scattered between a specified lower and upper cutoff:

\[f_n = \exp{ \left( -\gamma(r-c_n)^2 \right) }\]
Parameters:
  • cutoff (Union[int, float, _Cutoff]) – Defines the smooth cutoff function. If a float is provided, it will be interpreted as an upper cutoff and an IdentityCutoff will be used between 0 and the provided float. Otherwise, a chosen _Cutoff instance can be supplied.

  • num_rbf (int) – The number of gaussian functions in the basis set.

  • trainable (bool) – If True, the parameters of the gaussian basis (the centers and widths of each function) are registered as optimizable parameters that will be updated during backpropagation. If False, these parameters will be instead fixed in an unoptimizable buffer.

forward(dist)[source]

Expansion of distances through the radial basis function set.

Parameters:

dist (Tensor) – Input pairwise distances of shape (total_num_edges)

Returns:

Distances expanded in the radial basis with shape (total_num_edges, num_rbf)

Return type:

expanded_distances

reset_parameters()[source]

Method for resetting the basis to its initial state

class mlcg.nn.radial_basis.exp_spaced.SpacedExpBasis(cutoff, sigma_min=0.25, sigma_factor=2.0, mean_spacing=2.0, trainable=True)[source]

Class for generating a set of exponential normal radial basis functions, with means and standard deviations designed to capture the physics around 2-* A. The functions have an exponential form with the following means and std:

\[ \begin{align}\begin{aligned}\sigma_0 = \sigma_f/s \sigma_1 = \sigma_${min} \sigma_2 = \sigma_f*\sigma_1 ... \sigma_n = \sigma_f*\sigma_${n-1}\\\mu_0 = 0 \mu_1 = \sigma_f \mu_2 = \mu_1 + s*\sigma_1 ... \mu_n = \mu_${n-1} + s*\sigma_${n-1}\end{aligned}\end{align} \]
Parameters:
  • cutoff (Union[int, float, _Cutoff]) – Defines the smooth cutoff function. If a float is provided, it will be interpreted as an IdentityCutoff ranging over [0, cutoff]. Otherwise, a chosen _Cutoff instance can be supplied.

  • sigma_min (float) – Width of first non-zero-centered basis function

  • sigma_factor (float) – Location of first non-zero-centered basis function and multiplicative factor to spread std of each new peak by

  • mean_spacing (float) – this time previous sigma indicates how much to distance the mean of subsequent gaussian by

  • trainable (bool) – If True, the parameters of the basis (the centers and widths of each function) are registered as optimizable parameters that will be updated during backpropagation. If False, these parameters will be instead fixed in an unoptimizable buffer.

forward(dist)[source]

Expansion of distances through the radial basis function set.

Parameters:

dist (torch.Tensor) – Input pairwise distances of shape (total_num_edges)

Returns:

expanded_distances – Distances expanded in the radial basis with shape (total_num_edges, num_rbf)

Return type:

torch.Tensor

reset_parameters()[source]

Method to reset the parameters of the basis functions to their initial values.

Spherical Harmonics as polynomials of x, y, z taken from the e3nn repo changing the layout of the output sph (https://github.com/e3nn/e3nn/tree/34ebae91b094392a9f26dfa153f9350d0e24d185)

class mlcg.nn.angular_basis.spherical_harmonics.SphericalHarmonics(lmax, normalize=False, normalization='integral', irreps_in=None)[source]

JITable module version of real Spherical harmonics

https://user-images.githubusercontent.com/333780/79220728-dbe82c00-7e54-11ea-82c7-b3acbd9b2246.gif
Polynomials defined on the 3d space \(Y^l: \mathbb{R}^3 \longrightarrow \mathbb{R}^{2l+1}\)
Usually restricted on the sphere (with normalize=True) \(Y^l: S^2 \longrightarrow \mathbb{R}^{2l+1}\)
who satisfies the following properties:
  • are polynomials of the cartesian coordinates x, y, z

  • is equivariant \(Y^l(R x) = D^l(R) Y^l(x)\)

  • are orthogonal \(\int_{S^2} Y^l_m(x) Y^j_n(x) dx = \text{cste} \; \delta_{lj} \delta_{mn}\)

The value of the constant depends on the choice of normalization.

It obeys the following property:

\[ \begin{align}\begin{aligned}Y^{l+1}_i(x) &= \text{cste}(l) \; & C_{ijk} Y^l_j(x) x_k\\\partial_k Y^{l+1}_i(x) &= \text{cste}(l) \; (l+1) & C_{ijk} Y^l_j(x)\end{aligned}\end{align} \]

Where \(C\) are the wigner_3j.

Note

This function matches with the following table of standard real spherical harmonics from Wikipedia when normalize=True, normalization='integral' and is called with the argument in the order y,z,x (instead of x,y,z).

Parameters:
  • l (int or list of int) – degree of the spherical harmonics.

  • x (torch.Tensor) – tensor \(x\) of shape (..., 3).

  • normalize (bool) – whether to normalize the x to vectors that lie on the unit sphere before projecting onto the spherical harmonics

  • normalization ({'integral', 'component', 'norm'}) – normalization of the output tensors — note that this option is independent of normalize, which controls the processing of the input, rather than the output. Valid options: * component: \(\|Y^l(x)\|^2 = 2l+1, x \in S^2\) * norm: \(\|Y^l(x)\| = 1, x \in S^2\), component / sqrt(2l+1) * integral: \(\int_{S^2} Y^l_m(x)^2 dx = 1\), component / sqrt(4pi)

Returns:

a list of tensors of shapes [lmax][n_neighbor, m] where \(m<=l\)

\[Y^l(x)\]

Return type:

List[torch.Tensor]

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses. :rtype: List[Tensor]

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Cutoff functions

class mlcg.nn.cutoff.CosineCutoff(cutoff_lower=0.0, cutoff_upper=5.0)[source]

Class implementing a cutoff envelope based a cosine signal in the interval [lower_cutoff, upper_cutoff]:

\[\cos{\left( r_{ij} \times \pi / r_{high}\right)} + 1.0\]

NOTE: The behavior of the cutoff is qualitatively different for lower cutoff values greater than zero when compared to the zero lower cutoff default. We recommend visualizing your basis to see if it makes physical sense.

\[0.5 \cos{ \left[ \pi \left(2 \frac{r_{ij} - r_{low}}{r_{high} - r_{low}} + 1.0 \right)\right]} + 0.5\]
forward(distances)[source]

Applies cutoff envelope to distances.

Parameters:

distances (Tensor) – Distances of shape (total_num_edges)

Returns:

Distances multiplied by the cutoff envelope, with shape (total_num_edges)

Return type:

cutoffs

class mlcg.nn.cutoff.IdentityCutoff(cutoff_lower=0, cutoff_upper=inf)[source]

Cutoff function that is one everywhere, but retains cutoff_lower and cutoff_upper attributes

Parameters:
  • cutoff_lower (float) – left bound for the radial cutoff distance

  • cutoff_upper (float) – right bound for the radial cutoff distance

forward(distances)[source]

Fowrad method that returns a cutoff enevlope where all values are one

Parameters:

distances (Tensor) – Input distances of shape (total_num_distances)

Return type:

Tensor

Returns:

Cutoff envelope filled with ones, of shape (total_num_edges)

class mlcg.nn.cutoff.ShiftedCosineCutoff(cutoff=5.0, smooth_width=0.5)[source]

Class of Behler cosine cutoff with an additional smoothing parameter.

\[0.5 + 0.5 \cos{ \left[ \pi \left( \frac{r_{ij} - r_{high} + \sigma}{\sigma}\right)\right]}\]

where \(\sigma\) is the smoothing width.

Parameters:
  • cutoff (Union[int, float]) – cutoff radius

  • smooth_width (Union[int, float]) – parameter that controls the extent of smoothing in the cutoff envelope.

forward(distances)[source]

Compute cutoff function.

Parameters:

distances (torch.Tensor) – values of interatomic distances.

Returns:

values of cutoff function.

Return type:

torch.Tensor

Datasets

class mlcg.datasets.chignolin.ChignolinDataset(root, transform=None, pre_transform=None, pre_filter=None, mapping='slice_aggregate', terminal_embeds=False, max_num_files=None)[source]

Dataset for training a CG model of the chignolin protein following a Cα CG mapping using the all-atom data from [CGnet].

This Dataset produces delta forces for model training, in which the CG prior forces (harmonic bonds and angles and a repulsive term) have been subtracted from the full CG forces.

download()[source]

Downloads the dataset to the self.raw_dir folder.

kB = 0.0019872041

Boltzmann constan in kcal/mol/K

process()[source]

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The name of the files in the self.processed_dir folder that must be present in order to skip processing.

property raw_file_names

The name of the files in the self.raw_dir folder that must be present in order to skip downloading.

temperature = 350

Temperature used to generate the underlying all-atom data in [K]

class mlcg.datasets.chignolin.ChignolinDatasetWithDihedralPriors(root, transform=None, pre_transform=None, pre_filter=None)[source]

Inherit from Chignolin Dataset and redefine priors to compute Modify _prior_cls to include priors to compute Must provide

class mlcg.datasets.alanine_dipeptide.AlanineDataset(root, stride=1, transform=None, pre_transform=None, pre_filter=None)[source]

Dataset for training a CG model of the alanine-dipeptide protein following a Cα + 1 Cbeta CG mapping

Alanine Dipeptide CG structure:

        CB(3)
          |
  N(1) - CA(2) - C(4)
 /                  \
C(0)                 N(5)

This Dataset produces delta forces for model training, in which the CG prior forces (harmonic bonds and angles) have been subtracted from the full CG forces.

The class works as follows:
  • If the raw data (coordinates, forces, and pdb file) for alanine dipeptide does not exist, the files will automatically be downloaded and processed

  • If the raw data exists, “root/processed/” will be created and the raw dataset will be processed

  • If the raw data and processed data exists, the class will load the processed dataset containing:
    • data : AtomicData object containing all the CG positions, forces, embeddings

    • slices : Slices of the AtomicData object

    • prior_models : Prior models (HarmonicBonds, HarmonicAngles) of the dataset with x_0 and sigma

    • topologies : Object of Topology class containing all topology information of the _CG_ molecule, including neighbor lists for bond and angle priors

Inputs:
root :

Location for AlanineDataset to be downloaded and processed

stride :

Stride length over dataset

Default priors:
  • HarmonicBonds, HarmonicAngles

Optional priors:
  • Repulsion
    • If repulsion prior is used, a custom neighbor list is created for the repulsion prior where all pairs of beads not interacting through bonds and angles are included in the interaction set

download()[source]

Downloads the dataset to the self.raw_dir folder.

kB = 0.0019872041

Boltzmann constant in kcal/mol/K

static load_original_topology(pdb_file)[source]

Method to load origin topology

Parameters:

pdb_file (str) – Path of all-atom PDB file

Returns:

Topology class object for all-atom molecule

Return type:

topology

static make_baseline_models(data, beta, priors_cls)[source]

Method to make all baseline models

Parameters:
  • data (AtomicData) – AtomicData object of entire trajectory

  • beta (float) – 1/(k_B * T)

  • priors_cls (List[_Prior]) – List of prior classes

Returns:

Dictionary of prior models fitted with the right harmonic restraint values

Return type:

baseline_models

static make_cg_topology(topology, cg_mapping={('ACE', 'C'): ('CA_C', 0, 12), ('ALA', 'C'): ('C_A', 4, 12), ('ALA', 'CA'): ('CA_A', 2, 12), ('ALA', 'CB'): ('CB_A', 3, 12), ('ALA', 'N'): ('N_A', 1, 14), ('NME', 'N'): ('N_N', 5, 14)}, special_terminal=False)[source]

Method to make Topology class object of CG molecule, creates custom bonds and angles to make a non-linear CG molecule

Parameters:
  • topology (Topology) – All-atom topology

  • cg_mapping ((optional)) – Dictionary containing CG mapping. The default is AL_CG_MAP.

  • special_terminal ((optional)) – True if termini beads are to be treated separately. The default is False.

Returns:

Topology class object for CG molecule

Return type:

cg_topo

make_data_slices_prior(coords_forces_file, topology, cg_mapping, cg_topo)[source]

Method to make collated AtomicData object, slices, and baseline models

Parameters:
  • coords_forces_file (str) – npz file containing the forces and coordinates from the all-atom simulation

  • topology (Topology) – Topology of all-atom model

  • cg_mapping (dict) – Dictionary containing CG mapping.

  • cg_topo (Topology) – Topology of CG model

Return type:

Tuple[AtomicData, dict, ModuleDict]

Returns:

  • data_list_coll – Collated AtomicData object

  • slices – Dictionary containing slices of the AtomicData object

  • baseline_models – Module dictionary containing fitted prior models

make_priors(priors_cls, cg_topo)[source]

Method to make prior neighbor lists

Parameters:
  • priors_cls (List[_Prior]) – List of prior classes

  • cg_topo (Topology)

Returns:

Dictionary containing all prior neighbor lists

Return type:

prior_nls

process(cg_mapping={('ACE', 'C'): ('CA_C', 0, 12), ('ALA', 'C'): ('C_A', 4, 12), ('ALA', 'CA'): ('CA_A', 2, 12), ('ALA', 'CB'): ('CB_A', 3, 12), ('ALA', 'N'): ('N_A', 1, 14), ('NME', 'N'): ('N_N', 5, 14)})[source]

Method for processing the raw data - this is where all processing function calls take place All outputs are stored in the relevant processed files

Parameters:

cg_mapping ((optional)) – CG mapping dictionary. The default is AL_CG_MAP.

property processed_file_names

The name of the files in the self.processed_dir folder that must be present in order to skip processing.

property raw_file_names

The name of the files in the self.raw_dir folder that must be present in order to skip downloading.

static repulsion_nls(name, cg_topo)[source]

Method for generating neighbor list for Repulsion prior - all interactions not included in bond or angle interactions

Parameters:
  • name (str) – Name of prior class

  • cg_topo (Topology) – Topology object of CG molecule

Returns:

Dictionary containing repulsion neighor list

Return type:

dict

save_dataset(pickle_name)[source]

Method for saving dataset given pickle name

Parameters:

pickle_name (str) – Name of pickle to store the dataset in

temperature = 300

Temperature used to generate the underlying all-atom data in [K]

class mlcg.datasets.h5_dataset.H5Dataset(h5_file_path, partition_options, loading_options, subsample_using_weights=False, exclude_listed_pairs=False)[source]

The top-level class for handling multiple datasets contained in a HDF5 file.

Parameters:
  • h5_file_path (str) – Path to the hdf5 file containing the dataset(s)

  • partition_options (Dict) – Dictionary of partition names mapping to collections of metaset information

  • loading_options (Dict) – Dictionary of options specifying how hdf5 attrs/datasets are named within hd5 groups

training_validation_splitting(input_detailed_indices, part_name, metaset_name, mol_list)[source]

Option to split molecule in metaset frame by frame into training or validation Inputs:

input_detailed_indices –
dictionary read in from yaml file about how data should be split

must contain 3 primary keys [val_ratio, test_ratio, seed] additional option to write out to filename if in dict.keys()

part_name –

which partition is currently being examined

metaset_name –

global name describing class of molecules

mol_list –

names of molecules

Outputs:
self._detailed_indices[part_name][metaset_name] –

pass back indices according to queried metaset and partition

class mlcg.datasets.h5_dataset.H5MetasetDataLoader(metaset, batch_size, collater_fn=<torch_geometric.loader.dataloader.Collater object>, shuffle=True, pin_memory=False)[source]

Load batches from one Metaset. For kwargs/options, see https://pytorch.org/docs/stable/data.html?highlight=dataset#torch.utils.data.Dataset

Parameters:
  • metaset (Dataset) – Dataset object for a single set of molecules

  • batch_size (int) – Size of the batches to draw from the metaset

class mlcg.datasets.h5_dataset.H5PartitionDataLoader(data_partition, collater_fn=<torch_geometric.loader.dataloader.Collater object>, pin_memory=False, subsample_using_weights=False)[source]

Load batches from one or multiple Metasets in a Partition. In multiple Metasets scenario, the order of data loaders will be alphabetically ascending with respect to the Metaset names.

class mlcg.datasets.h5_dataset.H5SimpleDataset(h5_file_path, stride=1, detailed_indices=None, metaset_name=None, mol_list=None, hdf_key_mapping={'coords': 'cg_coords', 'embeds': 'attrs:cg_embeds', 'forces': 'cg_delta_forces'}, parallel={'rank': 0, 'world_size': 1}, subsample_using_weights=False, exclude_listed_pairs=False)[source]

The top-level class for handling a single dataset contained in a HDF5 file. Will only load from one single type of molecules (i.e., one Metaset), and do not support partition splits. Use .get_dataloader for obtaining a dataloader for PyTorch training, etc.

Parameters:
  • h5_file_path (str) – Path to the hdf5 file containing the dataset(s)

  • stride (default 1) – Stride for loading the frames

  • detailed_indices (default None) – Set this to manually define which frames to be included for each molecule.

  • metaset_name (default None) – the name of the h5 group containing the molecule data. If kept None and the given file consists of only one metaset, this parameter will be inferred from the file.

  • mol_list (default None) – A list of the molecules to be loaded. When kept None, all molecules will be loaded.

  • hdf_key_mapping (default loading the delta forces) – Key mapping for reading the data from h5 file.

  • parallel (default for single process) – For DDP parallelism. Details see the head of this file.

get_dataloader(batch_size, collater_fn=<torch_geometric.loader.dataloader.Collater object>, shuffle=True, pin_memory=False)[source]

Parameters:

batch_size: Size of the batches to draw from the metaset collater_fn, shuffle, pin_memory: See PyTorch documentations for dataloader options.

class mlcg.datasets.h5_dataset.MetaSet(name)[source]

Set of MolData instances for molecules with similar characterstics

Parameters:

name – Name of the metaset

static create_from_hdf5_group(hdf5_group, mol_list, detailed_indices=None, stride=1, hdf_key_mapping={'coords': 'cg_coords', 'embeds': 'attrs:cg_embeds', 'forces': 'cg_delta_forces', 'weights': 'subsampling_weights'}, parallel={'rank': 0, 'world_size': 1}, subsample_using_weights=False, exclude_listed_pairs=False)[source]

initiate a Metaset by loading data from given HDF-group. mol_list, detailed_indices, hdf_key_mapping, stride and parallel can control which subset is loaded to the Metaset (details see the description of “H5Dataset”)

static grab_n_frames(hdf5_group, mol_name)[source]

Returns number of frames for each mol name

static retrieve_hdf(hdf_grp, hdf_key)[source]

Unified hdf retriever for attributes and dataset.

trim_down_to(target_n_samples, random_seed=42, verbose=True)[source]

Trimming all datasets randomly to reach the target number of samples. MolData attributes of the MetaSet are permanently subsampled by this method.

class mlcg.datasets.h5_dataset.MolData(name, embeds, coords, forces, weights=None, exclusion_pairs=None)[source]

Data-holder for coordinates, forces and embedding vector of a molecule, e.g., opep_0000.

Parameters:
  • name (str) – Name of the molecule

  • embeds (ndarray) – Type embeddings for each atom, of shape (n_atoms,)

  • coords (ndarray) – Cartesian coordinates of the molecule, of shape (n_frames, n_atoms, 3)

  • forces (ndarray) – Cartesian forces of the molecule, of shape (n_frames, n_atoms, 3)

class mlcg.datasets.h5_dataset.Partition(name)[source]

Contain several metasets for a certain purpose, e.g., training.

Parameters:

name (str) – name of the partition

sampling_setup(batch_sizes, max_epoch_samples=None, random_seed=42, verbose=True)[source]

Calculate the number of batches available for an epoch according to the batch size for each metaset and optionally the maximum number of samples in an epoch. The molecular dataset will be trimmed accordingly.

Neighbor List

Main interface to the computation of neighborlists with a finite spherical cutoff

mlcg.neighbor_list.neighbor_list.atomic_data2neighbor_list(data, rcut, self_interaction=False, max_num_neighbors=1000)[source]

Build a neighborlist from a mlcg.data.atomic_data.AtomicData by searching for neighboring atom within a maximum radius rcut.

Parameters:
  • data (mlcg.data.atomic_data.AtomicData) – define an atomic structure

  • rcut (float) – cutoff radius used to compute the connectivity

  • self_interaction (bool) – whether the mapping includes self referring mappings, e.g. mappings where i == j.

  • max_num_neighbors (int) – The maximum number of neighbors to return for each atom in data. If the number of actual neighbors is greater than max_num_neighbors, returned neighbors are picked randomly.

Returns:

Neighborlist dictionary

Return type:

Dict

mlcg.neighbor_list.neighbor_list.make_neighbor_list(tag, order, index_mapping, mapping_batch=None, cell_shifts=None, rcut=None, self_interaction=None)[source]

Build a neighbor_list dictionary.

Parameters:
  • tag (str) – quick identifier for compatibility checking

  • order (int) – an int providing the order of the neighborlist, e.g. order == 2 means that central atoms i have 1 neighbor j so distances can be computed, order == 3 means that central atoms i have 2 neighbors j and k so angles can be computed

  • index_mapping (Tensor) – The [order, n_edge] index tensor giving center -> neighbor relations. 1st column refers to the central atom index and the 2nd column to the neighbor atom index in the list of atoms (so it has to be shifted by a cell_shift to get the actual position of the neighboring atoms)

  • mapping_batch (Optional[Tensor]) – [n_edge] map for the neighbor -> structure index relation

  • cell_shifts (Optional[Tensor]) – A [n_edge, 3] tensor giving the periodic cell shift

  • rcut (Optional[float]) – cutoff radius used to compute the connectivity

  • self_interaction (Optional[bool]) – whether the mapping includes self referring mappings, e.g. mappings where i == j.

Returns:

Neighborlist dictionary

Return type:

Dict

mlcg.neighbor_list.neighbor_list.validate_neighborlist(inp)[source]

Tool to validate that the neighborlist dictionary has the required fields

Parameters:

inp (Dict) – Input neighborlist to be validated

Returns:

True if the supplied neighborlist is valid, false otherwise

Return type:

bool

Torch geometric implementation

mlcg.neighbor_list.torch_impl.compute_images(positions, cell, pbc, cutoff, batch, n_atoms)[source]

TODO: add doc

Return type:

Tuple[Tensor, Tensor, Tensor, Tensor]

mlcg.neighbor_list.torch_impl.get_j_idx(edge_index, batch_images, n_atoms)[source]

TODO: add docs

Return type:

Tensor

mlcg.neighbor_list.torch_impl.strides_of(v)[source]

TODO: add docs

Return type:

Tensor

mlcg.neighbor_list.torch_impl.torch_neighbor_list(data, rcut, self_interaction=True, num_workers=1, max_num_neighbors=1000)[source]

Function for computing neighbor lists from pytorch geometric data instances that may or may not use periodic boundary conditions

Parameters:
  • data (Data) – Pytorch geometric data instance

  • rcut (float) – upper distance cutoff, in which neighbors with distances larger than this cutoff will be excluded.

  • self_interaction (bool) – If True, each atom will be considered it’s own neighbor. This can be convenient for certain bases and representations

  • num_workers (int) – Number of threads to spawn and use for computing the neighbor list

  • max_number_neighbors – kwarg for radius_graph function from torch_cluster package, specifying the maximum number of neighbors for each atom

Return type:

Tuple[Tensor, Tensor, Tensor, Tensor]

Returns:

  • torch.Tensor – The atom indices of the first atoms in each neighbor pair

  • torch.Tensor – The atom indices of the second atoms in each neighbor pair

  • torch.Tensor – The cell shifts associated with minimum image distances in the presence of periodic boundary conditions

  • torch.Tensor – Mask for excluding self interactions

mlcg.neighbor_list.torch_impl.torch_neighbor_list_no_pbc(data, rcut, self_interaction=True, num_workers=1, max_num_neighbors=1000)[source]

Function for producing torch neighborlists without periodic boundary conditions

Parameters:
  • data (AtomicData) – AtomicData instance

  • rcut (float) – Upper distance cutoff for determining neighbor edges

  • self_interaction (Optional[bool]) – If True, self edges will added for each atom

  • num_workers (Optional[int]) – Number of threads to use for neighbor enumeration

  • max_num_neighbors (Optional[int]) – The maximum number of neighbors to return for each atom. For larger systems, it is important to make sure that this number is sufficiently large.

Return type:

Tuple[Tensor, Tensor, Tensor]

Returns:

  • torch.Tensor – The first atoms in each edge

  • torch.Tensor – The second atoms in each edge

  • torch.Tensor – Boolean tensor identifying self_interacting edges

mlcg.neighbor_list.torch_impl.torch_neighbor_list_pbc(data, rcut, self_interaction=True, num_workers=1, max_num_neighbors=1000)[source]

Function for returning torch neighborlists from AtomicData instances with periodic boundary conditions. The minimum image convention is used for resolving periodic shifts.

Parameters:
  • data (AtomicData) – AtomicData instance

  • rcut (float) – Upper distance cutoff for determining neighbor edges

  • self_interaction (Optional[bool]) – If True, self edges will added for each atom

  • num_workers (Optional[int]) – Number of threads to use for neighbor enumeration

  • max_num_neighbors (Optional[int]) – The maximum number of neighbors to return for each atom. For larger systems, it is important to make sure that this number is sufficiently large.

Return type:

Tuple[Tensor, Tensor, Tensor]

Returns:

  • torch.Tensor – The first atoms in each edge

  • torch.Tensor – The second atoms in each edge

  • torch.Tensor – Boolean tensor identifying self_interacting edges

mlcg.neighbor_list.torch_impl.wrap_positions(data, eps=1e-07)[source]

Wrap positions to unit cell.

Returns positions changed by a multiple of the unit cell vectors to fit inside the space spanned by these vectors.

Parameters:
  • data (Data) – torch_geometric.Data instance

  • eps (float) – Small number to prevent slightly negative coordinates from being wrapped.

Return type:

None

Utilities to compute the internal coordinates

TODO: write tests

mlcg.geometry.internal_coordinates.compute_angles(pos, mapping, cell_shifts=None)[source]

Compute the cosine of the angle between the positions in pos following the mapping assuming that the mapping indices follow:

  j--k
 /
i
\[\begin{split}\cos{\theta_{ijk}} &= \frac{\mathbf{r}_{ji} \mathbf{r}_{jk}}{r_{ji} r_{jk}} \\ r_{ji}&= ||\mathbf{r}_i - \mathbf{r}_j||_{2} \\ r_{jk}&= ||\mathbf{r}_k - \mathbf{r}_j||_{2}\end{split}\]

In the case of periodic boundary conditions, cell_shifts must be provided so that \(\mathbf{r}_j\) can be outside of the original unit cell.

mlcg.geometry.internal_coordinates.compute_distance_vectors(pos, mapping, cell_shifts=None)[source]

Compute the distance (or displacement) vectors between the positions in pos following the mapping assuming that that mapping indices follow:

i--j

such that:

\[\begin{split}r_{ij} &= ||\mathbf{r}_j - \mathbf{r}_i||_{2} \\ \hat{\mathbf{r}}_{ij} &= \frac{\mathbf{r}_{ij}}{r_{ij}}\end{split}\]

In the case of periodic boundary conditions, cell_shifts must be provided so that \(\mathbf{r}_j\) can be outside of the original unit cell.

mlcg.geometry.internal_coordinates.compute_distances(pos, mapping, cell_shifts=None)[source]

Compute the distance between the positions in pos following the mapping assuming that mapping indices follow:

i--j

such that:

\[r_{ij} = ||\mathbf{r}_j - \mathbf{r}_i||_{2}\]

In the case of periodic boundary conditions, cell_shifts must be provided so that \(\mathbf{r}_j\) can be outside of the original unit cell.

mlcg.geometry.internal_coordinates.compute_torsions(pos, mapping)[source]

Compute the angle between two planes from positions in :obj:’pos’ following the mapping:

For dihedrals: the angle w.r.t. position of i&l is positive if l i rotated clockwise when staring down bond jk

j–k–l

/

i

For impropers: the angle is positive if when looking in plane ikj, l is rotated clockwise

k

l–j

/

i

In the case of periodic boundary conditions, cell_shifts must be provided so that \(\mathbf{r}_j\) can be outside of the original unit cell.

mlcg.geometry.internal_coordinates.safe_norm(input, dim=None, keepdims=True, eps=1e-16)[source]

Compute Euclidean norm of input so that 0-norm vectors can be used in the backpropagation

Return type:

Tensor

mlcg.geometry.internal_coordinates.safe_normalization(input, norms)[source]

Normalizes input using norms avoiding divitions by zero

Return type:

Tensor

Simulations