API Reference
TODO: split this file into respective files where both API reference and some more high level information is provided to the reader.
Atomic Data
TODO: describe the data structure and its use.
- class mlcg.data.atomic_data.AtomicData(**kwargs)[source]
A data object holding atomic structures. The attribute names are defined in
mlcg.data._keys
- pos
set of atomic positions in each structures
- Type:
[n_atoms * n_structures, 3]
- atom_types
if atoms then it’s the atomic number, if it’s a CG bead then it’s a number defined by the CG mapping
- Type:
[n_atoms * n_structures]
- masses
Masses of each atom
- Type:
[n_atoms * n_structures]
- pbc
periodic boundary conditions
- Type:
[n_structures, 3] (Optional)
- cell
unit cell of the atomic structure. Lattice vectors are defined row wise
- Type:
[n_structures, 3, 3] (Optional)
- tag
metadata about each structure
- Type:
[n_structures] (Optional)
- energy
reference energy associated with each structures
- Type:
[n_structures] (Optional)
- forces
reference forces associated with each structures
- Type:
[n_atoms * n_structures, 3] (Optional)
- velocities
velocities associated with each structure
- Type:
[n_atoms * n_structures, 3] (optional)
- neighborlist
contains information about the connectivity formatted according to mlcg.neighbor_list.neighbor_list.make_neighbor_list.
- Type:
Dict[str, Dict[str, Any]] (Optional)
- batch
maps the atoms to their structure index (from 0 to n_structures-1)
- Type:
[n_atoms]
- static from_ase(frame, energy_tag='energy', force_tag='forces')[source]
Method for constructing an AtomicData instance from an ASE Atoms instance. Atom types are taken from atomic numbers, while periodic flags, masses, cell vectors, and configuration tags are taken from the ASE Atoms attributes.
- Parameters:
frame – Input ase.atom.Atoms instance
energy_tag (
str
) – String value used to access the ASE energyforce_tag (
str
) – String value used to access the ASE forces
- Returns:
Populated AtomicData instance
- Return type:
data
- static from_points(pos, atom_types, masses=None, pbc=None, cell=None, tag=None, energy=None, forces=None, velocities=None, neighborlist=None, **kwargs)[source]
Method for constructing an AtomicData instance directly from input tensors
- Parameters:
pos (
Tensor
) – Cartesian coordinates of a single configuration, of shape (n_atoms, 3)atom_types (
Tensor
) – Atom type labels of shape (n_atoms)masses (
Optional
[Tensor
]) – Atom masses of shape (n_atoms)pbc (
Optional
[Tensor
]) – Period boundary condition flags for the framecell (
Optional
[Tensor
]) – Box vectors for the configuration, for computing periodic images, of shape (3,3)tag (
Optional
[str
]) – String tag for the configurationenergy (
Optional
[Tensor
]) – Reference scalar energy for the configurationforces (
Optional
[Tensor
]) – Reference cartesian forces for the configuration, of shape (n_atoms, 3)velocities (
Optional
[Tensor
]) – Atomic velocities, of shape (n_atoms, 3)neighborlist (
Optional
[Dict
[str
,Dict
[str
,Any
]]]) – Neighborlist dictionary for tagged features. Eg,
- Returns:
Populated AtomicData instance
- Return type:
data
Coarse Graining Utilities
TODO: document the CG mapping format.
- mlcg.cg.projection.build_cg_matrix(topology, cg_mapping=None, special_terminal=True)[source]
Function for producing coarse grain types, masses, and mapping matrices using a slicing strategy and a predetermined set of atoms to retain at the coarse grain resolution.
- Parameters:
topology (
Topology
) – System topology instancecg_mapping (
Optional
[Dict
[Tuple
[str
,str
],Tuple
[str
,int
,int
]]]) –Mapping dictionary with the following structure:
{ (residue name, atom name) : (compound name, type, mass) ... }
Eg, a row for an alanine carbon alpha atom would be:
{ ("ALA", "CA") : ("CA_A", 1, 12) ... }
special_termini – If True, special types will be reserved for the first and last CG atoms
- Return type:
Tuple
[ndarray
,ndarray
,ndarray
,OrderedDict
]- Returns:
np.ndarray – Array of CG atom types
np.ndarray – Array of CG masses
np.ndarray – One-hot transformation matrix of shape (n_high_res_atoms, n_cg_atoms) that maps atoms for the high resolution repesentation to the coarse grain representation
OrderedDict – Ordered dictionary mapping each CG atom index (with respect to the CG topology) to a list containing the CG atom name, CG atom type and the CG atom mass
- mlcg.cg.projection.build_cg_topology(topology, cg_mapping=None, special_terminal=True, bonds=<function add_chain_bonds>, angles=<function add_chain_angles>, dihedrals=<function add_chain_dihedrals>)[source]
Takes an mlcg.geometry.topology.Topology instance and returns another mlcg.geometry.topology.Topology instance conditioned on the supplied CG mapping
- Parameters:
topology (
Topology
) – Original MLCG topology before coarse grainingcg_mapping (
Optional
[Dict
[Tuple
[str
,str
],Tuple
[str
,int
,int
]]]) – A suitable CG mapping. See mclg.cg._mapping.py for examples.special_termini – If True, the first and last CG atoms recieve their own special types
bonds (
Optional
[Callable
]) – Function to enumerate and define bonds in the final CG topologyangles (
Optional
[Callable
]) – Function to enumerate and define angles in the final CG topologydihedrals (
Optional
[Callable
]) – Function to enumerate and define dihedrals in the final CG topology
- Returns:
CG topoology
- Return type:
- mlcg.cg.projection.build_opeps_connectivity_matrix(resnames)[source]
Builds bonded connectivity matrix for (N,CA,CB,C,O) octapeptide CG mapping scheme, based on amino acid sequence. GLY residues are constructed as (N,CA,C,O).
- Parameters:
resnames (
List
[str
]) – List of three letter amino acid strings that specify the protein/peptide sequence.- Returns:
Undirected graph connectivity matrix for the specified sequence of residues.
- Return type:
connectivity_matrix
- mlcg.cg.projection.isolate_features(target_atoms_list, full_feature_set)[source]
Helper function for isolating molecular features from specified lists of beads.
- Parameters:
target_atoms_list (
List
[List
[int
]]) – if a feature atom from full_feature_set is in this list, that corresponding feature is moved to the isolated feature list. If no feature atom is in this list, then the corresponding feature remains in the bulk feature listfull_feature_set (
Tensor
) – The set of full features that is checked against the target_atoms
- Returns:
List of N feature sets, where the length of target_atoms_list is N-1 in length, the first N-1 features sets correspond to those isolated features, and the final feauture set is the list of bulk/non-isolated features.
- Return type:
feature_sets
- mlcg.cg.projection.make_non_bonded_set(topology, minimum_separation, residue_indices, residue_exclusion=True, edge_exclusion=None)[source]
Helper function for constructing non-bonded sets from a topology
- Parameters:
topology; – input topology
minimum_separation (
int
) – minimum edge separation between pairs of atoms for the non-bonded setresidue_indices (
List
[int
]) – list of indices mapping atoms to residue indicesresidue_exclusion – if True, an extra exclusion rule is applied such that if any two atoms are within the same residue they are automatically removed from the non-bonded set.
edge_exclusion (
Optional
[List
[Tensor
]]) – if None, this list of pairwise edge indices will be excluded from the non-bonded set.
- Returns:
The set of non-bonded edges
- Return type:
non_bonded_edges
- mlcg.cg.projection.make_opep_residue_indices(resnames)[source]
Helper method to make residue index lists, which specify the residue index that a given atom belongs to in an octapeptide topology using the octapeptide CG mapping, wherein non-glycine amino acids are given a 5 atom mapping (N,CA,CB,C,O) while glycines are given a 4 atom mapping (N,CA,C,O).
- Parameters:
resnames (
List
[str
]) – List of three letter amino acid codes that together specify the molecular sequence- Returns:
List of residue indices that specify which specify to which residue an atom belongs
- Return type:
residue_indices
- mlcg.cg.projection.swap_mapping_rows(topology, residue, target_atoms, types, masses, matrix, mapping)[source]
Helper function to swap atom rows systematically in the specified residue. The last four arguments are the outputs of mlcg.cg.projection.build_cg_matrix().
- Parameters:
topology (
Topology
) – Topology of the all-atom (non-CG system).residue (
str
) – Three letter amino acid string specifying the residue in which swaps must be madetarget_atoms (
List
[str
]) – List of atom types to swap placestypes (
ndarray
) – The types array that will by swapped at the specified atoms for the specified residuemasses (
ndarray
) – The mass array that will be swapped at the specified atoms for the specified residuematrix (
ndarray
) – The CG mapping matrix for which rows will be swapped at the specified atoms for the specifed residuemapping (
OrderedDict
) – The mapping dictionary from the original topology to the CG atoms
- Return type:
Tuple
[ndarray
,ndarray
,ndarray
]- Returns:
swapped_types
swapped_masses
swapped_matrix
generalized_mapping
Models
Neural network models for coarse grain property predictions
- class mlcg.nn.schnet.AttentiveSchNet(rbf_layer, cutoff, output_hidden_layer_widths, num_features_in, num_features_out, num_residual_q, num_residual_k, num_residual_v, attention_block, hidden_channels=128, embedding_size=100, num_filters=128, num_interactions=3, activation_func=Tanh(), max_num_neighbors=1000, layer_widths=None, activation_first=False, aggr='add', attention_version='normal')[source]
Small wrapper class for SchNet to simplify the definition of the SchNet model with an Interaction block that includes attention through an input file. The upper distance cutoff attribute in is set by default to match the upper cutoff value in the cutoff function.
- Parameters:
rbf_layer (
Module
) – radial basis function used to project the distances \(r_{ij}\).cutoff (
Module
) – smooth cutoff function to supply to the CFConvoutput_hidden_layer_widths (
List
[int
]) – List giving the number of hidden nodes of each hidden layer of the MLP used to predict the target property from the learned representation.num_features_in (
int
) – size of each input sample for linear layernum_features_out (
int
) – size of each output sample for liner layernum_residuals_q – Number of residual blocks applied to features via self-attention for queries, keys, and values
num_residuals_k – Number of residual blocks applied to features via self-attention for queries, keys, and values
num_residuals_v – Number of residual blocks applied to features via self-attention for queries, keys, and values
attention_block (
Module
) – Specify if you want to use softmax attention (input: ExactAttention) or favor+ (input: FavorAttention)hidden_channels (
int
) – dimension of the learned representation, i.e. dimension of the embeding projection, convolution layers, and interaction block.embedding_size (
int
) – dimension of the input embeddings (should be larger thanAtomicData.atom_types.max()+1
).num_filters (
int
) – number of nodes of the networks used to filter the projected distancesnum_interactions (
int
) – number of interaction blocksactivation – activation function
max_num_neighbors (
int
) – The maximum number of neighbors to return for each atom indata
. If the number of actual neighbors is greater thanmax_num_neighbors
, returned neighbors are picked randomly.aggr (
str
) – Aggregation scheme for continuous filter output. For all options, see here for more options.activation_first (
bool
) – Inverting the order of linear layers and activation functions.attention_version (
str
) – Specifiy which interaction block architecture to choose. By default normal.
- class mlcg.nn.schnet.CFConv(filter_network, cutoff, in_channels=128, out_channels=128, num_filters=128, aggr='add')[source]
Continuous filter convolutions for SchNet.
- Parameters:
filter_net – Neural network for generating filters from expanded pairwise distances
cutoff (
Module
) – Cutoff envelope to apply to the output of the filter generating network.in_channels (
int
) – Hidden input dimensionsout_channels (
int
) – Hidden output dimensionsnum_filters (
int
) – Number of filtersaggr (
str
) – Aggregation scheme for continuous filter output. For all options, see here <https://pytorch-geometric.readthedocs.io/en/latest/notes/create_gnn.html?highlight=MessagePassing#the-messagepassing-base-class>.
- forward(x, edge_index, edge_weight, edge_attr)[source]
Forward pass through the continuous filter convolution.
- Parameters:
x (
Tensor
) – Embedded features of shape (num_examples * num_atoms, hidden_channels)edge_index (
Tensor
) – Graph edge index tensor of shape (2, total_num_edges)edge_weight (
Tensor
) – Graph edge weight (eg, distances), of shape (total_num_edges)edge_attr (
Tensor
) – Graph edge attributes (eg, expanded distances), of shape (total_num_edges, num_rbf)
- Returns:
Updated embedded features of shape (num_examples * num_atoms, hidden_channels)
- Return type:
x
- message(x_j, W)[source]
Message passing operation to perform the continuous filter convolution through element-wise multiplcation of embedded features with the output of the filter network.
- Parameters:
x_j (
Tensor
) – Tensor of embedded features of shape (total_num_edges, hidden_channels)W (
Tensor
) – Tensor of filter values of shape (total_num_edges, num_filters)
- Returns:
Elementwise multiplication of the filters with embedded features.
- Return type:
x_j * W
- class mlcg.nn.schnet.InteractionBlock(cfconv_layer, hidden_channels=128, activation=Tanh())[source]
Interaction blocks for SchNet. Consists of atomwise transformations of embedded features that are continuously convolved with filters generated from radial basis function-expanded pairwise distances.
- Parameters:
cfconv_layer (
Module
) – Continuous filter convolution layer for convolutions of radial basis function-expanded distances with embedded featureshidden_channels (
int
) – Hidden dimension of embedded featuresactivation (
Module
) – Activation function applied to linear layer outputs
- forward(x, edge_index, edge_weight, edge_attr, *args)[source]
Forward pass through the interaction block.
- Parameters:
x (
Tensor
) – Embedded features of shape (num_examples, num_atoms, hidden_channels)edge_index (
Tensor
) – Graph edge index tensor of shape (2, total_num_edges)edge_weight (
Tensor
) – Graph edge weight (eg, distances), of shape (total_num_edges)edge_attr (
Tensor
) – Graph edge attributes (eg, expanded distances), of shape (total_num_edges, num_rbf)
- Returns:
Updated embedded features of shape (num_examples * num_atoms, hidden_channels)
- Return type:
x
- class mlcg.nn.schnet.SchNet(embedding_layer, interaction_blocks, rbf_layer, output_network, self_interaction=False, max_num_neighbors=1000)[source]
PyTorch Geometric implementation of SchNet Code adapted from [PT_geom_schnet] which is based on the architecture described in [Schnet] .
- Parameters:
embedding_layer (
Module
) – Initial embedding layer that transforms atoms/coarse grain bead types into embedded featuresinteraction_blocks (list of torch.nn.Module or torch.nn.Sequential) – Sequential interaction blocks of the model, where each interaction block applies
rbf_layer (
Module
) – The set of radial basis functions that expands pairwise distances between atoms/CG beads.output_network (
Module
) – Output neural network that predicts scalar energies from SchNet features. This network should transform (num_examples * num_atoms, hidden_channels) to (num_examples * num atoms, 1).upper_distance_cutoff – Upper distance cutoff used for making neighbor lists.
self_interaction (
bool
) – If True, self interactions/distancess are calculated. But it never had a function due to a bug in the implementation (see static method neighbor_list). Should be kept False. This option shall not be deleted for compatibility.max_num_neighbors (
int
) – Maximum number of neighbors to return for a given node/atom when constructing the molecular graph during forward passes. This attribute is passed to the torch_cluster radius_graph routine keyword max_num_neighbors, which normally defaults to 32. Users should set this to higher values if they are using higher upper distance cutoffs and expect more than 32 neighbors per node/atom.
- forward(data)[source]
Forward pass through the SchNet architecture.
- Parameters:
data (
AtomicData
) – Input data object containing batch atom/bead positions and atom/bead types.- Returns:
Data dictionary, updated with predicted energy of shape (num_examples * num_atoms, 1), as well as neighbor list information.
- Return type:
data
- class mlcg.nn.schnet.StandardSchNet(rbf_layer, cutoff, output_hidden_layer_widths, hidden_channels=128, embedding_size=100, num_filters=128, num_interactions=3, activation=Tanh(), max_num_neighbors=1000, aggr='add')[source]
Small wrapper class for SchNet to simplify the definition of the SchNet model through an input file. The upper distance cutoff attribute in is set by default to match the upper cutoff value in the cutoff function.
- Parameters:
rbf_layer (
Module
) – radial basis function used to project the distances \(r_{ij}\).cutoff (
Module
) – smooth cutoff function to supply to the CFConvoutput_hidden_layer_widths (
List
[int
]) – List giving the number of hidden nodes of each hidden layer of the MLP used to predict the target property from the learned representation.hidden_channels (
int
) – dimension of the learned representation, i.e. dimension of the embeding projection, convolution layers, and interaction block.embedding_size (
int
) – dimension of the input embeddings (should be larger thanAtomicData.atom_types.max()+1
).num_filters (
int
) – number of nodes of the networks used to filter the projected distancesnum_interactions (
int
) – number of interaction blocksactivation (
Module
) – activation functionmax_num_neighbors (
int
) – The maximum number of neighbors to return for each atom indata
. If the number of actual neighbors is greater thanmax_num_neighbors
, returned neighbors are picked randomly.aggr (
str
) –Aggregation scheme for continuous filter output. For all options, see here for more options.
Radial basis functions
- class mlcg.nn.radial_basis.radial_integral_gto.RIGTOBasis(cutoff, nmax=5, lmax=5, sigma=0.4, mesh_size=300)[source]
This radial basis is the effective basis when expanding an atomic density smeared by Gaussains of width \(\sigma\) on a set of \(n_{max}\) orthonormal Gaussian Type Orbitals (GTOs) and \(l_{max}+1\) Spherical Harmonics (SPHs) namely. This radial basis set is interpolated using natural cubic splines for efficiency and the cutoff is included into the splined functions.
The basis is defined as
\[R_{nl}(r) = f_c(r) \mathcal{N}_n \frac{\Gamma(\frac{n+l+3}{2})}{\Gamma(l+\frac{3}{2})} c^l r^l(c+b_n)^{-\frac{(n+l+3)}{2}} {}_1F_1\left(\frac{n+l+3}{2},l+\frac{3}{2};\frac{c^2 r^2}{c+b_n}\right),\]where \({}_1F_1\) is the confluent hypergeometric function, \(\Gamma\) is the gamma function, \(f_c\) is a cutoff function, \(b_n=\frac{1}{2\sigma_n^2}\), \(c= 1 / (2\sigma^2\), \(\sigma_n = r_\text{cut} \max(\sqrt{n},1)/n_{max}\) and \(\mathcal{N}_n^2 = \frac{2}{\sigma_n^{2n + 3}\Gamma(n + 3/2)}\).
For more details on the derivation, refer to appendix A.
- Parameters:
nmax (
int
) – number of radial basislmax (
int
) – maximum spherical order (lmax included so there are lmax+1 orders)sigma (
float
) – smearing of the atomic densitycutoff (
Union
[int
,float
,_Cutoff
]) – Defines the smooth cutoff function. If a float is provided, it will be interpreted as an upper cutoff and a CosineCutoff will be used between 0 and the provided float. Otherwise, a chosen _Cutoff instance can be supplied.mesh_size (
int
) – number of points used to interpolate with splines the radial basis spanning uniformly the range difined by the cutoff \([0, r_c]\).
- forward(dist)[source]
Expansion of distances through the radial basis function set.
- Parameters:
dist (torch.Tensor) – Input pairwise distances of shape (total_num_edges)
- Returns:
expanded_distances – Distances expanded in the radial basis with shape (total_num_edges, lmax + 1, nmax)
- Return type:
torch.Tensor
- class mlcg.nn.radial_basis.exp_normal.ExpNormalBasis(cutoff, num_rbf=50, trainable=True)[source]
Class for generating a set of exponential normal radial basis functions, as described in [Physnet] . The functions have the following form:
\[f_n(r_{ij};\alpha, r_{low},r_{high}) = f_{cut}(r_{ij},r_{low},r_{high}) \times \exp\left[-\beta_n \left(e^{\alpha (r_{ij} -r_{high}) } - \mu_n \right)^2\right]\]where
\[\alpha = 5.0/(r_{high} - r_{low})\]is a distance rescaling factor, and, by default
\[f_{cut} ( r_{ij},r_{low},r_{high} ) = \cos{\left( r_{ij} \times \pi / r_{high}\right)} + 1.0\]represents a cosine cutoff function (though users can specify their own cutoff function if they desire). :type cutoff:
Union
[int
,float
,_Cutoff
] :param cutoff: Defines the smooth cutoff function. If a float is provided, it will be interpreted asan upper cutoff and a CosineCutoff will be used between 0 and the provided float. Otherwise, a chosen _Cutoff instance can be supplied.
- Parameters:
num_rbf (
int
) – The number of functions in the basis set.trainable (
bool
) – If True, the parameters of the basis (the centers and widths of each function) are registered as optimizable parameters that will be updated during backpropagation. If False, these parameters will be instead fixed in an unoptimizable buffer.
- forward(dist)[source]
Expansion of distances through the radial basis function set. :type dist:
Tensor
:param dist: Input pairwise distances of shape (total_num_edges) :type dist: torch.Tensor- Returns:
expanded_distances – Distances expanded in the radial basis with shape (total_num_edges, num_rbf)
- Return type:
torch.Tensor
- class mlcg.nn.radial_basis.exp_normal.ExtendedExpNormalBasis(*args, internal_cutoff_lower=None, trainable=False, **kwargs)[source]
ExpNormalBasis that allows for arbitrary lower cutoffs not tied to a supplied _Cutoff. This basis follows the definition in [Physnet]. :param internal_cutoff: if specified, this lower cutoff overrides any supplied _Cutoff lower
cutoff.
- Parameters:
trainable – If True, the parameters of the basis (the centers and widths of each function) are registered as optimizable parameters that will be updated during backpropagation. If False, these parameters will be instead fixed in an unoptimizable buffer.
- extended_initial_params()[source]
Method for initializing the basis function parameters, as described in https://pubs.acs.org/doi/10.1021/acs.jctc.9b00181 .
- forward(dist)[source]
Expansion of distances through the radial basis function set. :type dist:
Tensor
:param dist: Input pairwise distances of shape (total_num_edges) :type dist: torch.Tensor- Returns:
expanded_distances – Distances expanded in the radial basis with shape (total_num_edges, num_rbf)
- Return type:
torch.Tensor
- class mlcg.nn.radial_basis.gaussian.GaussianBasis(cutoff, num_rbf=50, trainable=False)[source]
Class that generates a set of equidistant 1-D gaussian basis functions scattered between a specified lower and upper cutoff:
\[f_n = \exp{ \left( -\gamma(r-c_n)^2 \right) }\]- Parameters:
cutoff (
Union
[int
,float
,_Cutoff
]) – Defines the smooth cutoff function. If a float is provided, it will be interpreted as an upper cutoff and an IdentityCutoff will be used between 0 and the provided float. Otherwise, a chosen _Cutoff instance can be supplied.num_rbf (
int
) – The number of gaussian functions in the basis set.trainable (
bool
) – If True, the parameters of the gaussian basis (the centers and widths of each function) are registered as optimizable parameters that will be updated during backpropagation. If False, these parameters will be instead fixed in an unoptimizable buffer.
- class mlcg.nn.radial_basis.exp_spaced.SpacedExpBasis(cutoff, sigma_min=0.25, sigma_factor=2.0, mean_spacing=2.0, trainable=True)[source]
Class for generating a set of exponential normal radial basis functions, with means and standard deviations designed to capture the physics around 2-* A. The functions have an exponential form with the following means and std:
\[ \begin{align}\begin{aligned}\sigma_0 = \sigma_f/s \sigma_1 = \sigma_${min} \sigma_2 = \sigma_f*\sigma_1 ... \sigma_n = \sigma_f*\sigma_${n-1}\\\mu_0 = 0 \mu_1 = \sigma_f \mu_2 = \mu_1 + s*\sigma_1 ... \mu_n = \mu_${n-1} + s*\sigma_${n-1}\end{aligned}\end{align} \]- Parameters:
cutoff (
Union
[int
,float
,_Cutoff
]) – Defines the smooth cutoff function. If a float is provided, it will be interpreted as an IdentityCutoff ranging over [0, cutoff]. Otherwise, a chosen _Cutoff instance can be supplied.sigma_min (
float
) – Width of first non-zero-centered basis functionsigma_factor (
float
) – Location of first non-zero-centered basis function and multiplicative factor to spread std of each new peak bymean_spacing (
float
) – this time previous sigma indicates how much to distance the mean of subsequent gaussian bytrainable (
bool
) – If True, the parameters of the basis (the centers and widths of each function) are registered as optimizable parameters that will be updated during backpropagation. If False, these parameters will be instead fixed in an unoptimizable buffer.
- forward(dist)[source]
Expansion of distances through the radial basis function set.
- Parameters:
dist (torch.Tensor) – Input pairwise distances of shape (total_num_edges)
- Returns:
expanded_distances – Distances expanded in the radial basis with shape (total_num_edges, num_rbf)
- Return type:
torch.Tensor
Spherical Harmonics as polynomials of x, y, z taken from the e3nn repo changing the layout of the output sph (https://github.com/e3nn/e3nn/tree/34ebae91b094392a9f26dfa153f9350d0e24d185)
- class mlcg.nn.angular_basis.spherical_harmonics.SphericalHarmonics(lmax, normalize=False, normalization='integral', irreps_in=None)[source]
JITable module version of real Spherical harmonics
Polynomials defined on the 3d space \(Y^l: \mathbb{R}^3 \longrightarrow \mathbb{R}^{2l+1}\)Usually restricted on the sphere (withnormalize=True
) \(Y^l: S^2 \longrightarrow \mathbb{R}^{2l+1}\)who satisfies the following properties:are polynomials of the cartesian coordinates
x, y, z
is equivariant \(Y^l(R x) = D^l(R) Y^l(x)\)
are orthogonal \(\int_{S^2} Y^l_m(x) Y^j_n(x) dx = \text{cste} \; \delta_{lj} \delta_{mn}\)
The value of the constant depends on the choice of normalization.
It obeys the following property:
\[ \begin{align}\begin{aligned}Y^{l+1}_i(x) &= \text{cste}(l) \; & C_{ijk} Y^l_j(x) x_k\\\partial_k Y^{l+1}_i(x) &= \text{cste}(l) \; (l+1) & C_{ijk} Y^l_j(x)\end{aligned}\end{align} \]Where \(C\) are the wigner_3j.
Note
This function matches with the following table of standard real spherical harmonics from Wikipedia when
normalize=True
,normalization='integral'
and is called with the argument in the ordery,z,x
(instead ofx,y,z
).- Parameters:
l (int or list of int) – degree of the spherical harmonics.
x (torch.Tensor) – tensor \(x\) of shape
(..., 3)
.normalize (bool) – whether to normalize the
x
to vectors that lie on the unit sphere before projecting onto the spherical harmonicsnormalization ({'integral', 'component', 'norm'}) – normalization of the output tensors — note that this option is independent of
normalize
, which controls the processing of the input, rather than the output. Valid options: * component: \(\|Y^l(x)\|^2 = 2l+1, x \in S^2\) * norm: \(\|Y^l(x)\| = 1, x \in S^2\),component / sqrt(2l+1)
* integral: \(\int_{S^2} Y^l_m(x)^2 dx = 1\),component / sqrt(4pi)
- Returns:
a list of tensors of shapes
[lmax][n_neighbor, m]
where \(m<=l\)\[Y^l(x)\]- Return type:
List[torch.Tensor]
- forward(x)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses. :rtype:
List
[Tensor
]Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
Cutoff functions
- class mlcg.nn.cutoff.CosineCutoff(cutoff_lower=0.0, cutoff_upper=5.0)[source]
Class implementing a cutoff envelope based a cosine signal in the interval [lower_cutoff, upper_cutoff]:
\[\cos{\left( r_{ij} \times \pi / r_{high}\right)} + 1.0\]NOTE: The behavior of the cutoff is qualitatively different for lower cutoff values greater than zero when compared to the zero lower cutoff default. We recommend visualizing your basis to see if it makes physical sense.
\[0.5 \cos{ \left[ \pi \left(2 \frac{r_{ij} - r_{low}}{r_{high} - r_{low}} + 1.0 \right)\right]} + 0.5\]
- class mlcg.nn.cutoff.IdentityCutoff(cutoff_lower=0, cutoff_upper=inf)[source]
Cutoff function that is one everywhere, but retains cutoff_lower and cutoff_upper attributes
- Parameters:
cutoff_lower (
float
) – left bound for the radial cutoff distancecutoff_upper (
float
) – right bound for the radial cutoff distance
- class mlcg.nn.cutoff.ShiftedCosineCutoff(cutoff=5.0, smooth_width=0.5)[source]
Class of Behler cosine cutoff with an additional smoothing parameter.
\[0.5 + 0.5 \cos{ \left[ \pi \left( \frac{r_{ij} - r_{high} + \sigma}{\sigma}\right)\right]}\]where \(\sigma\) is the smoothing width.
- Parameters:
cutoff (
Union
[int
,float
]) – cutoff radiussmooth_width (
Union
[int
,float
]) – parameter that controls the extent of smoothing in the cutoff envelope.
Datasets
- class mlcg.datasets.chignolin.ChignolinDataset(root, transform=None, pre_transform=None, pre_filter=None, mapping='slice_aggregate', terminal_embeds=False, max_num_files=None)[source]
Dataset for training a CG model of the chignolin protein following a Cα CG mapping using the all-atom data from [CGnet].
This Dataset produces delta forces for model training, in which the CG prior forces (harmonic bonds and angles and a repulsive term) have been subtracted from the full CG forces.
- kB = 0.0019872041
Boltzmann constan in kcal/mol/K
- property processed_file_names
The name of the files in the
self.processed_dir
folder that must be present in order to skip processing.
- property raw_file_names
The name of the files in the
self.raw_dir
folder that must be present in order to skip downloading.
- temperature = 350
Temperature used to generate the underlying all-atom data in [K]
- class mlcg.datasets.chignolin.ChignolinDatasetWithDihedralPriors(root, transform=None, pre_transform=None, pre_filter=None)[source]
Inherit from Chignolin Dataset and redefine priors to compute Modify _prior_cls to include priors to compute Must provide
- class mlcg.datasets.alanine_dipeptide.AlanineDataset(root, stride=1, transform=None, pre_transform=None, pre_filter=None)[source]
Dataset for training a CG model of the alanine-dipeptide protein following a Cα + 1 Cbeta CG mapping
Alanine Dipeptide CG structure:
CB(3) | N(1) - CA(2) - C(4) / \ C(0) N(5)
This Dataset produces delta forces for model training, in which the CG prior forces (harmonic bonds and angles) have been subtracted from the full CG forces.
- The class works as follows:
If the raw data (coordinates, forces, and pdb file) for alanine dipeptide does not exist, the files will automatically be downloaded and processed
If the raw data exists, “root/processed/” will be created and the raw dataset will be processed
- If the raw data and processed data exists, the class will load the processed dataset containing:
data : AtomicData object containing all the CG positions, forces, embeddings
slices : Slices of the AtomicData object
prior_models : Prior models (HarmonicBonds, HarmonicAngles) of the dataset with x_0 and sigma
topologies : Object of Topology class containing all topology information of the _CG_ molecule, including neighbor lists for bond and angle priors
- Inputs:
- root :
Location for AlanineDataset to be downloaded and processed
- stride :
Stride length over dataset
- Default priors:
HarmonicBonds, HarmonicAngles
- Optional priors:
- Repulsion
If repulsion prior is used, a custom neighbor list is created for the repulsion prior where all pairs of beads not interacting through bonds and angles are included in the interaction set
- kB = 0.0019872041
Boltzmann constant in kcal/mol/K
- static load_original_topology(pdb_file)[source]
Method to load origin topology
- Parameters:
pdb_file (
str
) – Path of all-atom PDB file- Returns:
Topology class object for all-atom molecule
- Return type:
topology
- static make_baseline_models(data, beta, priors_cls)[source]
Method to make all baseline models
- Parameters:
data (
AtomicData
) – AtomicData object of entire trajectorybeta (
float
) – 1/(k_B * T)priors_cls (
List
[_Prior
]) – List of prior classes
- Returns:
Dictionary of prior models fitted with the right harmonic restraint values
- Return type:
baseline_models
- static make_cg_topology(topology, cg_mapping={('ACE', 'C'): ('CA_C', 0, 12), ('ALA', 'C'): ('C_A', 4, 12), ('ALA', 'CA'): ('CA_A', 2, 12), ('ALA', 'CB'): ('CB_A', 3, 12), ('ALA', 'N'): ('N_A', 1, 14), ('NME', 'N'): ('N_N', 5, 14)}, special_terminal=False)[source]
Method to make Topology class object of CG molecule, creates custom bonds and angles to make a non-linear CG molecule
- Parameters:
topology (
Topology
) – All-atom topologycg_mapping ((optional)) – Dictionary containing CG mapping. The default is AL_CG_MAP.
special_terminal ((optional)) – True if termini beads are to be treated separately. The default is False.
- Returns:
Topology class object for CG molecule
- Return type:
cg_topo
- make_data_slices_prior(coords_forces_file, topology, cg_mapping, cg_topo)[source]
Method to make collated AtomicData object, slices, and baseline models
- Parameters:
- Return type:
Tuple
[AtomicData
,dict
,ModuleDict
]- Returns:
data_list_coll – Collated AtomicData object
slices – Dictionary containing slices of the AtomicData object
baseline_models – Module dictionary containing fitted prior models
- make_priors(priors_cls, cg_topo)[source]
Method to make prior neighbor lists
- Parameters:
priors_cls (
List
[_Prior
]) – List of prior classescg_topo (
Topology
)
- Returns:
Dictionary containing all prior neighbor lists
- Return type:
prior_nls
- process(cg_mapping={('ACE', 'C'): ('CA_C', 0, 12), ('ALA', 'C'): ('C_A', 4, 12), ('ALA', 'CA'): ('CA_A', 2, 12), ('ALA', 'CB'): ('CB_A', 3, 12), ('ALA', 'N'): ('N_A', 1, 14), ('NME', 'N'): ('N_N', 5, 14)})[source]
Method for processing the raw data - this is where all processing function calls take place All outputs are stored in the relevant processed files
- Parameters:
cg_mapping ((optional)) – CG mapping dictionary. The default is AL_CG_MAP.
- property processed_file_names
The name of the files in the
self.processed_dir
folder that must be present in order to skip processing.
- property raw_file_names
The name of the files in the
self.raw_dir
folder that must be present in order to skip downloading.
- static repulsion_nls(name, cg_topo)[source]
Method for generating neighbor list for Repulsion prior - all interactions not included in bond or angle interactions
- Parameters:
name (
str
) – Name of prior classcg_topo (
Topology
) – Topology object of CG molecule
- Returns:
Dictionary containing repulsion neighor list
- Return type:
dict
- save_dataset(pickle_name)[source]
Method for saving dataset given pickle name
- Parameters:
pickle_name (
str
) – Name of pickle to store the dataset in
- temperature = 300
Temperature used to generate the underlying all-atom data in [K]
- class mlcg.datasets.h5_dataset.H5Dataset(h5_file_path, partition_options, loading_options, subsample_using_weights=False, exclude_listed_pairs=False)[source]
The top-level class for handling multiple datasets contained in a HDF5 file.
- Parameters:
h5_file_path (
str
) – Path to the hdf5 file containing the dataset(s)partition_options (
Dict
) – Dictionary of partition names mapping to collections of metaset informationloading_options (
Dict
) – Dictionary of options specifying how hdf5 attrs/datasets are named within hd5 groups
- training_validation_splitting(input_detailed_indices, part_name, metaset_name, mol_list)[source]
Option to split molecule in metaset frame by frame into training or validation Inputs:
- input_detailed_indices –
- dictionary read in from yaml file about how data should be split
must contain 3 primary keys [val_ratio, test_ratio, seed] additional option to write out to filename if in dict.keys()
- part_name –
which partition is currently being examined
- metaset_name –
global name describing class of molecules
- mol_list –
names of molecules
- Outputs:
- self._detailed_indices[part_name][metaset_name] –
pass back indices according to queried metaset and partition
- class mlcg.datasets.h5_dataset.H5MetasetDataLoader(metaset, batch_size, collater_fn=<torch_geometric.loader.dataloader.Collater object>, shuffle=True, pin_memory=False)[source]
Load batches from one Metaset. For kwargs/options, see https://pytorch.org/docs/stable/data.html?highlight=dataset#torch.utils.data.Dataset
- Parameters:
metaset (
Dataset
) – Dataset object for a single set of moleculesbatch_size (
int
) – Size of the batches to draw from the metaset
- class mlcg.datasets.h5_dataset.H5PartitionDataLoader(data_partition, collater_fn=<torch_geometric.loader.dataloader.Collater object>, pin_memory=False, subsample_using_weights=False)[source]
Load batches from one or multiple Metasets in a Partition. In multiple Metasets scenario, the order of data loaders will be alphabetically ascending with respect to the Metaset names.
- class mlcg.datasets.h5_dataset.H5SimpleDataset(h5_file_path, stride=1, detailed_indices=None, metaset_name=None, mol_list=None, hdf_key_mapping={'coords': 'cg_coords', 'embeds': 'attrs:cg_embeds', 'forces': 'cg_delta_forces'}, parallel={'rank': 0, 'world_size': 1}, subsample_using_weights=False, exclude_listed_pairs=False)[source]
The top-level class for handling a single dataset contained in a HDF5 file. Will only load from one single type of molecules (i.e., one Metaset), and do not support partition splits. Use .get_dataloader for obtaining a dataloader for PyTorch training, etc.
- Parameters:
h5_file_path (
str
) – Path to the hdf5 file containing the dataset(s)stride (default 1) – Stride for loading the frames
detailed_indices (default None) – Set this to manually define which frames to be included for each molecule.
metaset_name (default None) – the name of the h5 group containing the molecule data. If kept None and the given file consists of only one metaset, this parameter will be inferred from the file.
mol_list (default None) – A list of the molecules to be loaded. When kept None, all molecules will be loaded.
hdf_key_mapping (default loading the delta forces) – Key mapping for reading the data from h5 file.
parallel (default for single process) – For DDP parallelism. Details see the head of this file.
- class mlcg.datasets.h5_dataset.MetaSet(name)[source]
Set of MolData instances for molecules with similar characterstics
- Parameters:
name – Name of the metaset
- static create_from_hdf5_group(hdf5_group, mol_list, detailed_indices=None, stride=1, hdf_key_mapping={'coords': 'cg_coords', 'embeds': 'attrs:cg_embeds', 'forces': 'cg_delta_forces', 'weights': 'subsampling_weights'}, parallel={'rank': 0, 'world_size': 1}, subsample_using_weights=False, exclude_listed_pairs=False)[source]
initiate a Metaset by loading data from given HDF-group. mol_list, detailed_indices, hdf_key_mapping, stride and parallel can control which subset is loaded to the Metaset (details see the description of “H5Dataset”)
- class mlcg.datasets.h5_dataset.MolData(name, embeds, coords, forces, weights=None, exclusion_pairs=None)[source]
Data-holder for coordinates, forces and embedding vector of a molecule, e.g., opep_0000.
- Parameters:
name (
str
) – Name of the moleculeembeds (
ndarray
) – Type embeddings for each atom, of shape (n_atoms,)coords (
ndarray
) – Cartesian coordinates of the molecule, of shape (n_frames, n_atoms, 3)forces (
ndarray
) – Cartesian forces of the molecule, of shape (n_frames, n_atoms, 3)
Neighbor List
Main interface to the computation of neighborlists with a finite spherical cutoff
- mlcg.neighbor_list.neighbor_list.atomic_data2neighbor_list(data, rcut, self_interaction=False, max_num_neighbors=1000)[source]
Build a neighborlist from a mlcg.data.atomic_data.AtomicData by searching for neighboring atom within a maximum radius rcut.
- Parameters:
data (mlcg.data.atomic_data.AtomicData) – define an atomic structure
rcut (
float
) – cutoff radius used to compute the connectivityself_interaction (
bool
) – whether the mapping includes self referring mappings, e.g. mappings where i == j.max_num_neighbors (
int
) – The maximum number of neighbors to return for each atom indata
. If the number of actual neighbors is greater thanmax_num_neighbors
, returned neighbors are picked randomly.
- Returns:
Neighborlist dictionary
- Return type:
Dict
- mlcg.neighbor_list.neighbor_list.make_neighbor_list(tag, order, index_mapping, mapping_batch=None, cell_shifts=None, rcut=None, self_interaction=None)[source]
Build a neighbor_list dictionary.
- Parameters:
tag (
str
) – quick identifier for compatibility checkingorder (
int
) – an int providing the order of the neighborlist, e.g. order == 2 means that central atoms i have 1 neighbor j so distances can be computed, order == 3 means that central atoms i have 2 neighbors j and k so angles can be computedindex_mapping (
Tensor
) – The [order, n_edge] index tensor giving center -> neighbor relations. 1st column refers to the central atom index and the 2nd column to the neighbor atom index in the list of atoms (so it has to be shifted by a cell_shift to get the actual position of the neighboring atoms)mapping_batch (
Optional
[Tensor
]) – [n_edge] map for the neighbor -> structure index relationcell_shifts (
Optional
[Tensor
]) – A [n_edge, 3] tensor giving the periodic cell shiftrcut (
Optional
[float
]) – cutoff radius used to compute the connectivityself_interaction (
Optional
[bool
]) – whether the mapping includes self referring mappings, e.g. mappings where i == j.
- Returns:
Neighborlist dictionary
- Return type:
Dict
- mlcg.neighbor_list.neighbor_list.validate_neighborlist(inp)[source]
Tool to validate that the neighborlist dictionary has the required fields
- Parameters:
inp (
Dict
) – Input neighborlist to be validated- Returns:
True if the supplied neighborlist is valid, false otherwise
- Return type:
bool
Torch geometric implementation
- mlcg.neighbor_list.torch_impl.compute_images(positions, cell, pbc, cutoff, batch, n_atoms)[source]
TODO: add doc
- Return type:
Tuple
[Tensor
,Tensor
,Tensor
,Tensor
]
- mlcg.neighbor_list.torch_impl.get_j_idx(edge_index, batch_images, n_atoms)[source]
TODO: add docs
- Return type:
Tensor
- mlcg.neighbor_list.torch_impl.torch_neighbor_list(data, rcut, self_interaction=True, num_workers=1, max_num_neighbors=1000)[source]
Function for computing neighbor lists from pytorch geometric data instances that may or may not use periodic boundary conditions
- Parameters:
data (
Data
) – Pytorch geometric data instancercut (
float
) – upper distance cutoff, in which neighbors with distances larger than this cutoff will be excluded.self_interaction (
bool
) – If True, each atom will be considered it’s own neighbor. This can be convenient for certain bases and representationsnum_workers (
int
) – Number of threads to spawn and use for computing the neighbor listmax_number_neighbors – kwarg for radius_graph function from torch_cluster package, specifying the maximum number of neighbors for each atom
- Return type:
Tuple
[Tensor
,Tensor
,Tensor
,Tensor
]- Returns:
torch.Tensor – The atom indices of the first atoms in each neighbor pair
torch.Tensor – The atom indices of the second atoms in each neighbor pair
torch.Tensor – The cell shifts associated with minimum image distances in the presence of periodic boundary conditions
torch.Tensor – Mask for excluding self interactions
- mlcg.neighbor_list.torch_impl.torch_neighbor_list_no_pbc(data, rcut, self_interaction=True, num_workers=1, max_num_neighbors=1000)[source]
Function for producing torch neighborlists without periodic boundary conditions
- Parameters:
data (
AtomicData
) – AtomicData instancercut (
float
) – Upper distance cutoff for determining neighbor edgesself_interaction (
Optional
[bool
]) – If True, self edges will added for each atomnum_workers (
Optional
[int
]) – Number of threads to use for neighbor enumerationmax_num_neighbors (
Optional
[int
]) – The maximum number of neighbors to return for each atom. For larger systems, it is important to make sure that this number is sufficiently large.
- Return type:
Tuple
[Tensor
,Tensor
,Tensor
]- Returns:
torch.Tensor – The first atoms in each edge
torch.Tensor – The second atoms in each edge
torch.Tensor – Boolean tensor identifying self_interacting edges
- mlcg.neighbor_list.torch_impl.torch_neighbor_list_pbc(data, rcut, self_interaction=True, num_workers=1, max_num_neighbors=1000)[source]
Function for returning torch neighborlists from AtomicData instances with periodic boundary conditions. The minimum image convention is used for resolving periodic shifts.
- Parameters:
data (
AtomicData
) – AtomicData instancercut (
float
) – Upper distance cutoff for determining neighbor edgesself_interaction (
Optional
[bool
]) – If True, self edges will added for each atomnum_workers (
Optional
[int
]) – Number of threads to use for neighbor enumerationmax_num_neighbors (
Optional
[int
]) – The maximum number of neighbors to return for each atom. For larger systems, it is important to make sure that this number is sufficiently large.
- Return type:
Tuple
[Tensor
,Tensor
,Tensor
]- Returns:
torch.Tensor – The first atoms in each edge
torch.Tensor – The second atoms in each edge
torch.Tensor – Boolean tensor identifying self_interacting edges
- mlcg.neighbor_list.torch_impl.wrap_positions(data, eps=1e-07)[source]
Wrap positions to unit cell.
Returns positions changed by a multiple of the unit cell vectors to fit inside the space spanned by these vectors.
- Parameters:
data (
Data
) – torch_geometric.Data instanceeps (float) – Small number to prevent slightly negative coordinates from being wrapped.
- Return type:
None
Utilities to compute the internal coordinates
TODO: write tests
- mlcg.geometry.internal_coordinates.compute_angles(pos, mapping, cell_shifts=None)[source]
Compute the cosine of the angle between the positions in
pos
following themapping
assuming that the mapping indices follow:j--k / i
\[\begin{split}\cos{\theta_{ijk}} &= \frac{\mathbf{r}_{ji} \mathbf{r}_{jk}}{r_{ji} r_{jk}} \\ r_{ji}&= ||\mathbf{r}_i - \mathbf{r}_j||_{2} \\ r_{jk}&= ||\mathbf{r}_k - \mathbf{r}_j||_{2}\end{split}\]In the case of periodic boundary conditions,
cell_shifts
must be provided so that \(\mathbf{r}_j\) can be outside of the original unit cell.
- mlcg.geometry.internal_coordinates.compute_distance_vectors(pos, mapping, cell_shifts=None)[source]
Compute the distance (or displacement) vectors between the positions in
pos
following themapping
assuming that that mapping indices follow:i--j
such that:
\[\begin{split}r_{ij} &= ||\mathbf{r}_j - \mathbf{r}_i||_{2} \\ \hat{\mathbf{r}}_{ij} &= \frac{\mathbf{r}_{ij}}{r_{ij}}\end{split}\]In the case of periodic boundary conditions,
cell_shifts
must be provided so that \(\mathbf{r}_j\) can be outside of the original unit cell.
- mlcg.geometry.internal_coordinates.compute_distances(pos, mapping, cell_shifts=None)[source]
Compute the distance between the positions in
pos
following themapping
assuming that mapping indices follow:i--j
such that:
\[r_{ij} = ||\mathbf{r}_j - \mathbf{r}_i||_{2}\]In the case of periodic boundary conditions,
cell_shifts
must be provided so that \(\mathbf{r}_j\) can be outside of the original unit cell.
- mlcg.geometry.internal_coordinates.compute_torsions(pos, mapping)[source]
Compute the angle between two planes from positions in :obj:’pos’ following the
mapping
:For dihedrals: the angle w.r.t. position of i&l is positive if l i rotated clockwise when staring down bond jk
j–k–l
/
i
For impropers: the angle is positive if when looking in plane ikj, l is rotated clockwise
- k
l–j
/
i
In the case of periodic boundary conditions,
cell_shifts
must be provided so that \(\mathbf{r}_j\) can be outside of the original unit cell.