Models
mlcg.nn currently implements the SchNet graph neural network, along with utility classes for computing distance expansions and cutoffs. The nn subpackage also contains useful classes for extracting other properties (such as forces) from energy predictions and for aggregating the predictions of several different model types.
SchNet Utilities
These classes are used to define a SchNet graph neural network. For “typical” SchNet models, users may find the class StandardSchNet to be helpful in getting started quickly.
- class mlcg.nn.StandardSchNet(rbf_layer, cutoff, output_hidden_layer_widths, hidden_channels=128, embedding_size=100, num_filters=128, num_interactions=3, activation=Tanh(), max_num_neighbors=1000, aggr='add')[source]
Small wrapper class for SchNet to simplify the definition of the SchNet model through an input file. The upper distance cutoff attribute is set by default to match the upper cutoff value in the cutoff function.
- Parameters:
  - rbf_layer (Module) – radial basis function used to project the distances \(r_{ij}\).
  - cutoff (Module) – smooth cutoff function to supply to the CFConv.
  - output_hidden_layer_widths (List[int]) – List giving the number of hidden nodes of each hidden layer of the MLP used to predict the target property from the learned representation.
  - hidden_channels (int) – dimension of the learned representation, i.e. the dimension of the embedding projection, convolution layers, and interaction block.
  - embedding_size (int) – dimension of the input embeddings (should be larger than AtomicData.atom_types.max()+1).
  - num_filters (int) – number of nodes of the networks used to filter the projected distances.
  - num_interactions (int) – number of interaction blocks.
  - activation (Module) – activation function.
  - max_num_neighbors (int) – The maximum number of neighbors to return for each atom in data. If the number of actual neighbors is greater than max_num_neighbors, the returned neighbors are picked randomly.
  - aggr (str) – Aggregation scheme for the continuous filter output. See the PyTorch Geometric MessagePassing documentation for all options.
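As a quick start, the sketch below instantiates a StandardSchNet from the components documented on this page. The specific values (upper cutoff of 5.0, two hidden layers in the output MLP) are arbitrary illustration choices, and import paths may differ between mlcg versions:

import torch
from mlcg.nn import StandardSchNet
from mlcg.nn.cutoff import CosineCutoff
from mlcg.nn.radial_basis import GaussianBasis

# Smooth cutoff shared by the radial basis and the convolutions
cutoff = CosineCutoff(cutoff_lower=0.0, cutoff_upper=5.0)
rbf = GaussianBasis(cutoff, num_rbf=50)

model = StandardSchNet(
    rbf,            # rbf_layer
    cutoff,         # cutoff function supplied to the CFConv
    [128, 64],      # output_hidden_layer_widths
    hidden_channels=128,
    embedding_size=100,
    num_interactions=3,
    activation=torch.nn.Tanh(),
)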
- class mlcg.nn.schnet.SchNet(embedding_layer, interaction_blocks, rbf_layer, output_network, self_interaction=False, max_num_neighbors=1000)[source]
PyTorch Geometric implementation of SchNet. Code adapted from [PT_geom_schnet], which is based on the architecture described in [Schnet].
- Parameters:
  - embedding_layer (Module) – Initial embedding layer that transforms atom/coarse grain bead types into embedded features.
  - interaction_blocks (list of torch.nn.Module or torch.nn.Sequential) – Sequential interaction blocks of the model, where each interaction block applies a continuous filter convolution and atomwise transformations to the features.
  - rbf_layer (Module) – The set of radial basis functions that expands pairwise distances between atoms/CG beads.
  - output_network (Module) – Output neural network that predicts scalar energies from SchNet features. This network should transform (num_examples * num_atoms, hidden_channels) to (num_examples * num_atoms, 1).
  - self_interaction (bool) – If True, self interactions/distances are calculated. However, due to a bug in the implementation (see the static method neighbor_list), this option has never had an effect and should be kept False; it is retained only for compatibility.
  - max_num_neighbors (int) – Maximum number of neighbors to return for a given node/atom when constructing the molecular graph during forward passes. This attribute is passed to the torch_cluster radius_graph routine keyword max_num_neighbors, which normally defaults to 32. Users should set this to higher values if they are using higher upper distance cutoffs and expect more than 32 neighbors per node/atom.
- class mlcg.nn.schnet.InteractionBlock(cfconv_layer, hidden_channels=128, activation=Tanh())[source]
Interaction blocks for SchNet. Consists of atomwise transformations of embedded features that are continuously convolved with filters generated from radial basis function-expanded pairwise distances.
- Parameters:
  - cfconv_layer (Module) – Continuous filter convolution layer for convolutions of radial basis function-expanded distances with embedded features.
  - hidden_channels (int) – Hidden dimension of embedded features.
  - activation (Module) – Activation function applied to linear layer outputs.
- class mlcg.nn.schnet.CFConv(filter_network, cutoff, in_channels=128, out_channels=128, num_filters=128, aggr='add')[source]
Continuous filter convolutions for SchNet.
- Parameters:
  - filter_network (Module) – Neural network for generating filters from expanded pairwise distances.
  - cutoff (Module) – Cutoff envelope to apply to the output of the filter-generating network.
  - in_channels (int) – Hidden input dimension.
  - out_channels (int) – Hidden output dimension.
  - num_filters (int) – Number of filters.
  - aggr (str) – Aggregation scheme for the continuous filter output. For all options, see the PyTorch Geometric MessagePassing documentation: https://pytorch-geometric.readthedocs.io/en/latest/notes/create_gnn.html?highlight=MessagePassing#the-messagepassing-base-class
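To make the relationship between these classes concrete, the following sketch assembles a bare SchNet from its documented components. The filter network and the two-layer output head are hypothetical choices made for illustration; they are not prescribed by mlcg:

import torch
from mlcg.nn.cutoff import CosineCutoff
from mlcg.nn.radial_basis import GaussianBasis
from mlcg.nn.schnet import SchNet, InteractionBlock, CFConv

hidden_channels, num_filters, num_rbf = 128, 128, 50
cutoff = CosineCutoff(cutoff_lower=0.0, cutoff_upper=5.0)
rbf = GaussianBasis(cutoff, num_rbf=num_rbf)

def make_interaction_block():
    # Each block gets its own filter network and CFConv so that
    # weights are not shared between interaction blocks.
    filter_network = torch.nn.Sequential(
        torch.nn.Linear(num_rbf, num_filters),
        torch.nn.Tanh(),
        torch.nn.Linear(num_filters, num_filters),
    )
    cfconv = CFConv(filter_network, cutoff,
                    in_channels=hidden_channels,
                    out_channels=hidden_channels,
                    num_filters=num_filters)
    return InteractionBlock(cfconv, hidden_channels)

interaction_blocks = torch.nn.Sequential(
    *[make_interaction_block() for _ in range(3)]
)
embedding_layer = torch.nn.Embedding(100, hidden_channels)
output_network = torch.nn.Sequential(  # hypothetical output head
    torch.nn.Linear(hidden_channels, 64),
    torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)
model = SchNet(embedding_layer, interaction_blocks, rbf, output_network)

In practice, StandardSchNet performs this assembly automatically from a handful of hyperparameters.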
MACE Utilities
mlcg provides wrapper utilities for [MACE] models, the base implementation of which is available here in the develop branch.
Radial Basis Functions
Sets of radial basis functions are used to expand the distances (or other molecular features) between atoms onto a fixed-size vector. For instance, this is the main transformation applied to the distances in the SchNet model.
- class mlcg.nn.radial_basis.GaussianBasis(cutoff, num_rbf=50, trainable=False)[source]
Class that generates a set of equidistant 1-D gaussian basis functions scattered between a specified lower and upper cutoff:
\[f_n = \exp{ \left( -\gamma(r-c_n)^2 \right) }\]
- Parameters:
  - cutoff (Union[int, float, _Cutoff]) – Defines the smooth cutoff function. If a float is provided, it will be interpreted as an upper cutoff and an IdentityCutoff will be used between 0 and the provided float. Otherwise, a chosen _Cutoff instance can be supplied.
  - num_rbf (int) – The number of gaussian functions in the basis set.
  - trainable (bool) – If True, the parameters of the gaussian basis (the centers and widths of each function) are registered as optimizable parameters that will be updated during backpropagation. If False, these parameters will instead be fixed in an unoptimizable buffer.
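A short sketch of expanding a set of distances, assuming the basis module is called directly on a 1-D tensor of distances and returns a (n_distances, num_rbf) tensor:

import torch
from mlcg.nn.radial_basis import GaussianBasis

rbf = GaussianBasis(cutoff=5.0, num_rbf=50)  # float cutoff -> IdentityCutoff on [0, 5]
distances = torch.linspace(0.0, 5.0, steps=100)
expanded = rbf(distances)
print(expanded.shape)  # expected: torch.Size([100, 50])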
- class mlcg.nn.radial_basis.ExpNormalBasis(cutoff, num_rbf=50, trainable=True)[source]
Class for generating a set of exponential normal radial basis functions, as described in [Physnet] . The functions have the following form:
\[f_n(r_{ij};\alpha, r_{low},r_{high}) = f_{cut}(r_{ij},r_{low},r_{high}) \times \exp\left[-\beta_n \left(e^{\alpha (r_{ij} -r_{high})} - \mu_n \right)^2\right]\]
where
\[\alpha = 5.0/(r_{high} - r_{low})\]
is a distance rescaling factor and, by default,
\[f_{cut} ( r_{ij},r_{low},r_{high} ) = \cos{\left( r_{ij} \times \pi / r_{high}\right)} + 1.0\]
represents a cosine cutoff function (though users can specify their own cutoff function if they desire).
- Parameters:
  - cutoff (Union[int, float, _Cutoff]) – Defines the smooth cutoff function. If a float is provided, it will be interpreted as an upper cutoff and a CosineCutoff will be used between 0 and the provided float. Otherwise, a chosen _Cutoff instance can be supplied.
  - num_rbf (int) – The number of functions in the basis set.
  - trainable (bool) – If True, the parameters of the basis (the centers and widths of each function) are registered as optimizable parameters that will be updated during backpropagation. If False, these parameters will instead be fixed in an unoptimizable buffer.
- class mlcg.nn.radial_basis.RIGTOBasis(cutoff, nmax=5, lmax=5, sigma=0.4, mesh_size=300)[source]
This radial basis is the effective basis obtained when expanding an atomic density smeared by Gaussians of width \(\sigma\) on a set of \(n_{max}\) orthonormal Gaussian Type Orbitals (GTOs) and \(l_{max}+1\) Spherical Harmonics (SPHs). This radial basis set is interpolated using natural cubic splines for efficiency, and the cutoff is included in the splined functions.
The basis is defined as
\[R_{nl}(r) = f_c(r) \mathcal{N}_n \frac{\Gamma(\frac{n+l+3}{2})}{\Gamma(l+\frac{3}{2})} c^l r^l(c+b_n)^{-\frac{(n+l+3)}{2}} {}_1F_1\left(\frac{n+l+3}{2},l+\frac{3}{2};\frac{c^2 r^2}{c+b_n}\right),\]
where \({}_1F_1\) is the confluent hypergeometric function, \(\Gamma\) is the gamma function, \(f_c\) is a cutoff function, \(b_n=\frac{1}{2\sigma_n^2}\), \(c = 1/(2\sigma^2)\), \(\sigma_n = r_\text{cut} \max(\sqrt{n},1)/n_{max}\), and \(\mathcal{N}_n^2 = \frac{2}{\sigma_n^{2n + 3}\Gamma(n + 3/2)}\).
For more details on the derivation, refer to appendix A.
- Parameters:
  - nmax (int) – number of radial basis functions.
  - lmax (int) – maximum spherical order (lmax is included, so there are lmax+1 orders).
  - sigma (float) – smearing of the atomic density.
  - cutoff (Union[int, float, _Cutoff]) – Defines the smooth cutoff function. If a float is provided, it will be interpreted as an upper cutoff and a CosineCutoff will be used between 0 and the provided float. Otherwise, a chosen _Cutoff instance can be supplied.
  - mesh_size (int) – number of points used to interpolate the radial basis with splines, spanning the range defined by the cutoff, \([0, r_c]\), uniformly.
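Instantiation follows the documented signature; a minimal sketch with an upper cutoff of 5.0:

from mlcg.nn.radial_basis import RIGTOBasis

# A float cutoff is interpreted as a CosineCutoff over [0, 5.0]
basis = RIGTOBasis(cutoff=5.0, nmax=5, lmax=5, sigma=0.4, mesh_size=300)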
- class mlcg.nn.radial_basis.SpacedExpBasis(cutoff, sigma_min=0.25, sigma_factor=2.0, mean_spacing=2.0, trainable=True)[source]
Class for generating a set of exponential normal radial basis functions, with means and standard deviations designed to capture the physics around 2-* A. The functions have an exponential form with the following means and standard deviations:
\[\begin{aligned}
\sigma_0 &= \sigma_f / s, & \sigma_1 &= \sigma_{min}, & \sigma_2 &= \sigma_f \, \sigma_1, & \ldots, & \sigma_n &= \sigma_f \, \sigma_{n-1}\\
\mu_0 &= 0, & \mu_1 &= \sigma_f, & \mu_2 &= \mu_1 + s\,\sigma_1, & \ldots, & \mu_n &= \mu_{n-1} + s\,\sigma_{n-1}
\end{aligned}\]
where \(\sigma_f\) is the sigma_factor and \(s\) is the mean_spacing.
- Parameters:
  - cutoff (Union[int, float, _Cutoff]) – Defines the smooth cutoff function. If a float is provided, it will be interpreted as an IdentityCutoff ranging over [0, cutoff]. Otherwise, a chosen _Cutoff instance can be supplied.
  - sigma_min (float) – Width of the first non-zero-centered basis function.
  - sigma_factor (float) – Location of the first non-zero-centered basis function, and the multiplicative factor by which the standard deviation of each subsequent peak is spread.
  - mean_spacing (float) – This value times the previous sigma sets the distance between the means of subsequent gaussians.
  - trainable (bool) – If True, the parameters of the basis (the centers and widths of each function) are registered as optimizable parameters that will be updated during backpropagation. If False, these parameters will instead be fixed in an unoptimizable buffer.
- class mlcg.nn.angular_basis.SphericalHarmonics(lmax, normalize=False, normalization='integral', irreps_in=None)[source]
JITable module version of real Spherical harmonics
Polynomials defined on 3D space, \(Y^l: \mathbb{R}^3 \longrightarrow \mathbb{R}^{2l+1}\), and usually restricted to the sphere (with normalize=True), \(Y^l: S^2 \longrightarrow \mathbb{R}^{2l+1}\), which satisfy the following properties:
- they are polynomials of the cartesian coordinates x, y, z
- they are equivariant: \(Y^l(R x) = D^l(R) Y^l(x)\)
- they are orthogonal: \(\int_{S^2} Y^l_m(x) Y^j_n(x) dx = \text{cste} \; \delta_{lj} \delta_{mn}\)
The value of the constant depends on the choice of normalization.
They also obey the following recurrence property:
\[\begin{aligned}
Y^{l+1}_i(x) &= \text{cste}(l) \; C_{ijk} Y^l_j(x) x_k\\
\partial_k Y^{l+1}_i(x) &= \text{cste}(l) \, (l+1) \; C_{ijk} Y^l_j(x)
\end{aligned}\]
where \(C\) are the wigner_3j coefficients.
Note
This function matches the table of standard real spherical harmonics from Wikipedia when normalize=True, normalization='integral', and it is called with the arguments in the order y, z, x (instead of x, y, z).
- Parameters:
  - l (int or list of int) – degree of the spherical harmonics.
  - x (torch.Tensor) – tensor \(x\) of shape (..., 3).
  - normalize (bool) – whether to normalize x to vectors that lie on the unit sphere before projecting onto the spherical harmonics.
  - normalization ({'integral', 'component', 'norm'}) – normalization of the output tensors. Note that this option is independent of normalize, which controls the processing of the input rather than the output. Valid options:
    - component: \(\|Y^l(x)\|^2 = 2l+1, \; x \in S^2\)
    - norm: \(\|Y^l(x)\| = 1, \; x \in S^2\), i.e. component / sqrt(2l+1)
    - integral: \(\int_{S^2} Y^l_m(x)^2 dx = 1\), i.e. component / sqrt(4pi)
- Returns:
  A list of tensors \(Y^l(x)\) with shapes [lmax][n_neighbor, m], where \(m \le l\).
- Return type:
  List[torch.Tensor]
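A usage sketch, assuming the module is applied to a tensor of shape (..., 3) and returns the list of per-degree tensors described in the Returns entry above:

import torch
from mlcg.nn.angular_basis import SphericalHarmonics

sph = SphericalHarmonics(lmax=2, normalize=True, normalization='integral')
vectors = torch.randn(10, 3)  # e.g. displacement vectors to 10 neighbors
harmonics = sph(vectors)      # list of tensors, one entry per degree l
for l, y in enumerate(harmonics):
    print(l, y.shape)         # trailing dimension expected to be 2l+1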
Cutoff Functions
Cutoff functions are used to enforce the smoothness of the models with respect to neighbor insertion into, or removal from, an atomic environment. Some are also used to damp the signal from a neighbor that is "far" from the central atom, e.g. CosineCutoff. Cutoff functions are also used in the construction of radial basis functions.
- class mlcg.nn.cutoff.IdentityCutoff(cutoff_lower=0, cutoff_upper=inf)[source]
Cutoff function that is one everywhere, but retains cutoff_lower and cutoff_upper attributes
- Parameters:
  - cutoff_lower (float) – left bound for the radial cutoff distance.
  - cutoff_upper (float) – right bound for the radial cutoff distance.
- class mlcg.nn.cutoff.CosineCutoff(cutoff_lower=0.0, cutoff_upper=5.0)[source]
Class implementing a cutoff envelope based on a cosine signal in the interval [cutoff_lower, cutoff_upper]. With the default lower cutoff of zero, the envelope is
\[\cos{\left( r_{ij} \times \pi / r_{high}\right)} + 1.0\]
while for lower cutoff values greater than zero it takes the form
\[0.5 \cos{ \left[ \pi \left(2 \frac{r_{ij} - r_{low}}{r_{high} - r_{low}} + 1.0 \right)\right]} + 0.5\]
NOTE: The behavior of the cutoff is qualitatively different for lower cutoff values greater than zero when compared to the zero lower cutoff default. We recommend visualizing your basis to see if it makes physical sense.
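Since visualizing the envelope is recommended, here is a small sketch, assuming the cutoff module can be called directly on a 1-D tensor of distances:

import torch
import matplotlib.pyplot as plt
from mlcg.nn.cutoff import CosineCutoff

cutoff = CosineCutoff(cutoff_lower=0.0, cutoff_upper=5.0)
r = torch.linspace(0.0, 5.0, steps=200)
envelope = cutoff(r)

plt.plot(r.numpy(), envelope.numpy())
plt.xlabel("distance r")
plt.ylabel("cutoff envelope")
plt.show()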
- class mlcg.nn.cutoff.ShiftedCosineCutoff(cutoff=5.0, smooth_width=0.5)[source]
Class of Behler cosine cutoff with an additional smoothing parameter.
\[0.5 + 0.5 \cos{ \left[ \pi \left( \frac{r_{ij} - r_{high} + \sigma}{\sigma}\right)\right]}\]where \(\sigma\) is the smoothing width.
- Parameters:
  - cutoff (Union[int, float]) – cutoff radius.
  - smooth_width (Union[int, float]) – parameter that controls the extent of smoothing in the cutoff envelope.
Model Building Utilities
These classes are used to build more complicated models.
- class mlcg.nn.gradients.GradientsOut(model, targets='forces')[source]
Gradient wrapper for models.
- Parameters:
  - targets (str) – The gradient targets to produce from a model output. These can be any of the gradient properties referenced in mlcg.data._keys. At the moment only forces are implemented.
Example
To predict forces from an energy model, one would supply a model that predicts a scalar atom property (an energy) and specify the FORCE_KEY in the targets.
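A minimal sketch of this pattern, assuming energy_model is a placeholder for any module that predicts a scalar energy (for example, a StandardSchNet instance):

from mlcg.nn import GradientsOut
from mlcg.data._keys import FORCE_KEY

# energy_model: placeholder for any scalar-energy predictor
force_model = GradientsOut(energy_model, targets=FORCE_KEY)
# Forces are recovered during the forward pass as the negative gradient
# of the predicted energy with respect to the atomic positions.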
- class mlcg.nn.gradients.SumOut(models, targets=None)[source]
Property pooling wrapper for models.
- Parameters:
  - models (ModuleDict) – Dictionary of predictor models keyed by their name attribute.
  - targets (Optional[List[str]]) – List of prediction targets that will be pooled.
Example
To combine SchNet force predictions with prior interactions:
import torch
from mlcg.nn import (StandardSchNet, HarmonicBonds, HarmonicAngles,
                     GradientsOut, SumOut, CosineCutoff, GaussianBasis)
from mlcg.data._keys import FORCE_KEY, ENERGY_KEY

# bond_stats and angle_stats are precomputed interaction statistics
bond_terms = GradientsOut(HarmonicBonds(bond_stats), FORCE_KEY)
angle_terms = GradientsOut(HarmonicAngles(angle_stats), FORCE_KEY)

cutoff = CosineCutoff()
rbf = GaussianBasis(cutoff)
energy_network = StandardSchNet(rbf, cutoff, [128])
force_network = GradientsOut(energy_network, FORCE_KEY)

models = torch.nn.ModuleDict({
    "bonds": bond_terms,
    "angles": angle_terms,
    "SchNet": force_network,
})
full_model = SumOut(models, targets=[ENERGY_KEY, FORCE_KEY])
Loss Functions
These classes define loss functions for model optimization, as well as generalized losses that combine several losses of different types.
- class mlcg.nn.losses.Loss(losses, weights=None)[source]
Generalized loss function class that can be used to combine more than one loss function
- Parameters:
  - losses (List[_Loss]) – List of torch loss modules.
  - weights (Optional[List[float]]) – List of corresponding weights for each loss in the losses list. By default, each loss is weighted equally by 1.0.
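For instance, a brief sketch combining the two force losses documented below, with hypothetical weights:

from mlcg.nn.losses import Loss, ForceMSE, ForceRMSE

# total loss = 1.0 * ForceMSE + 0.5 * ForceRMSE (the weights are arbitrary here)
loss_fn = Loss([ForceMSE(), ForceRMSE()], weights=[1.0, 0.5])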
- class mlcg.nn.losses.ForceRMSE(force_kwd='forces', size_average=None, reduce=None, reduction='mean')[source]
Force root-mean-square error loss, as defined by:
\[L\left(f,\hat{f}\right) = \sqrt{ \frac{1}{Nd}\sum_{i}^{N} \left\Vert f_i - \hat{f}_i \right\Vert ^2 }\]
where \(f\) are predicted forces, \(\hat{f}\) are reference forces, \(N\) is the number of examples/structures, and \(d\) is the real space dimensionality (e.g., \(d=3\) for proteins).
- Parameters:
  - force_kwd (str) – string specifying the force key in an AtomicData instance.
  - size_average (Optional[bool]) – If True, the loss is normalized by the batch size.
  - reduce (Optional[bool]) – If True, the loss is reduced to a scalar.
  - reduction (str) – Specifies the method of reduction. See the torch.nn loss documentation for the available options.
- class mlcg.nn.losses.ForceMSE(force_kwd='forces', size_average=None, reduce=None, reduction='mean')[source]
Force mean square error loss, as defined by:
\[L\left(f,\hat{f}\right) = \frac{1}{Nd}\sum_{i}^{N} \left\Vert f_i - \hat{f}_i \right\Vert ^2\]
where \(f\) are predicted forces, \(\hat{f}\) are reference forces, \(N\) is the number of examples/structures, and \(d\) is the real space dimensionality (e.g., \(d=3\) for proteins).
- Parameters:
  - force_kwd (str) – string specifying the force key in an AtomicData instance.
  - size_average (Optional[bool]) – If True, the loss is normalized by the batch size.
  - reduce (Optional[bool]) – If True, the loss is reduced to a scalar.
  - reduction (str) – Specifies the method of reduction. See the torch.nn loss documentation for the available options.