Models

mlcg.nn currently implements the SchNet graph neural network, as well as several utility classes for computing distance expansions and cutoffs. The nn subpackage also contains useful classes for extracting other properties (e.g., forces) from energy predictions and for aggregating the predictions of several different model types.

SchNet Utilities

These classes are used to define a SchNet graph neural network. For “typical” SchNet models, users may find the class StandardSchNet to be helpful in getting started quickly.

class mlcg.nn.StandardSchNet(rbf_layer, cutoff, output_hidden_layer_widths, hidden_channels=128, embedding_size=100, num_filters=128, num_interactions=3, activation=Tanh(), max_num_neighbors=1000, aggr='add')[source]

Small wrapper class for SchNet that simplifies the definition of a SchNet model through an input file. The upper distance cutoff attribute is set by default to match the upper cutoff value of the cutoff function.

Parameters:
  • rbf_layer (Module) – radial basis function used to project the distances \(r_{ij}\).

  • cutoff (Module) – smooth cutoff function to supply to the CFConv

  • output_hidden_layer_widths (List[int]) – List giving the number of hidden nodes of each hidden layer of the MLP used to predict the target property from the learned representation.

  • hidden_channels (int) – dimension of the learned representation, i.e. dimension of the embedding projection, convolution layers, and interaction blocks.

  • embedding_size (int) – dimension of the input embeddings (should be larger than AtomicData.atom_types.max()+1).

  • num_filters (int) – number of nodes of the networks used to filter the projected distances

  • num_interactions (int) – number of interaction blocks

  • activation (Module) – activation function

  • max_num_neighbors (int) – The maximum number of neighbors to return for each atom in data. If the number of actual neighbors is greater than max_num_neighbors, returned neighbors are picked randomly.

  • aggr (str) – Aggregation scheme for the continuous filter output. For all options, see the PyTorch Geometric MessagePassing documentation: https://pytorch-geometric.readthedocs.io/en/latest/notes/create_gnn.html?highlight=MessagePassing#the-messagepassing-base-class
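For example, a typical model can be assembled in a few lines (a minimal sketch using the radial basis and cutoff classes documented below):

import torch
from mlcg.nn import StandardSchNet, GaussianBasis, CosineCutoff

# Smooth cutoff envelope acting on distances in [0, 5]
cutoff = CosineCutoff(cutoff_lower=0.0, cutoff_upper=5.0)
# Expand pairwise distances over 50 Gaussian basis functions;
# StandardSchNet takes its upper distance cutoff from `cutoff`
rbf = GaussianBasis(cutoff, num_rbf=50)
# MLP head with two hidden layers that predicts a scalar energy
model = StandardSchNet(rbf, cutoff, output_hidden_layer_widths=[128, 64])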

class mlcg.nn.schnet.SchNet(embedding_layer, interaction_blocks, rbf_layer, output_network, self_interaction=False, max_num_neighbors=1000)[source]

PyTorch Geometric implementation of SchNet. Code adapted from [PT_geom_schnet], which is based on the architecture described in [Schnet].

Parameters:
  • embedding_layer (Module) – Initial embedding layer that transforms atoms/coarse grain bead types into embedded features

  • interaction_blocks (list of torch.nn.Module or torch.nn.Sequential) – Sequential interaction blocks of the model, where each interaction block applies a continuous filter convolution and atomwise updates to the embedded features.

  • rbf_layer (Module) – The set of radial basis functions that expands pairwise distances between atoms/CG beads.

  • output_network (Module) – Output neural network that predicts scalar energies from SchNet features. This network should transform (num_examples * num_atoms, hidden_channels) to (num_examples * num_atoms, 1).

  • upper_distance_cutoff – Upper distance cutoff used for making neighbor lists.

  • self_interaction (bool) – If True, self interactions/distances are calculated. However, this option has never had any effect due to a bug in the implementation (see the static method neighbor_list), and it should be kept False. The option is retained for compatibility.

  • max_num_neighbors (int) – Maximum number of neighbors to return for a given node/atom when constructing the molecular graph during forward passes. This attribute is passed to the torch_cluster radius_graph routine keyword max_num_neighbors, which normally defaults to 32. Users should set this to higher values if they are using higher upper distance cutoffs and expect more than 32 neighbors per node/atom.

class mlcg.nn.schnet.InteractionBlock(cfconv_layer, hidden_channels=128, activation=Tanh())[source]

Interaction blocks for SchNet. Consists of atomwise transformations of embedded features that are continuously convolved with filters generated from radial basis function-expanded pairwise distances.

Parameters:
  • cfconv_layer (Module) – Continuous filter convolution layer for convolutions of radial basis function-expanded distances with embedded features

  • hidden_channels (int) – Hidden dimension of embedded features

  • activation (Module) – Activation function applied to linear layer outputs

class mlcg.nn.schnet.CFConv(filter_network, cutoff, in_channels=128, out_channels=128, num_filters=128, aggr='add')[source]

Continuous filter convolutions for SchNet.

Parameters:
  • filter_network (Module) – Neural network for generating filters from expanded pairwise distances

  • cutoff (Module) – Cutoff envelope to apply to the output of the filter generating network.

  • in_channels (int) – Hidden input dimensions

  • out_channels (int) – Hidden output dimensions

  • num_filters (int) – Number of filters

  • aggr (str) – Aggregation scheme for the continuous filter output. For all options, see the PyTorch Geometric MessagePassing documentation: https://pytorch-geometric.readthedocs.io/en/latest/notes/create_gnn.html?highlight=MessagePassing#the-messagepassing-base-class
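Putting SchNet, InteractionBlock, and CFConv together, a low-level model can be assembled by hand. The following is a sketch only: the exact module types chosen here for the embedding layer, filter network, and output network are assumptions picked to satisfy the shapes described above, not the package's canonical choices.

import torch
from mlcg.nn import GaussianBasis, CosineCutoff
from mlcg.nn.schnet import SchNet, InteractionBlock, CFConv

hidden_channels, num_filters, num_rbf = 128, 128, 50

cutoff = CosineCutoff(cutoff_lower=0.0, cutoff_upper=5.0)
rbf_layer = GaussianBasis(cutoff, num_rbf=num_rbf)

# Filter-generating network: maps expanded distances to filter weights
filter_network = torch.nn.Sequential(
    torch.nn.Linear(num_rbf, num_filters),
    torch.nn.Tanh(),
    torch.nn.Linear(num_filters, num_filters),
)
cfconv = CFConv(filter_network, cutoff, in_channels=hidden_channels,
                out_channels=hidden_channels, num_filters=num_filters,
                aggr="add")
interaction = InteractionBlock(cfconv, hidden_channels=hidden_channels,
                               activation=torch.nn.Tanh())

# Embedding of atom/CG bead types into hidden features (assumed layer type)
embedding_layer = torch.nn.Embedding(100, hidden_channels)
# Output head: (num_examples * num_atoms, hidden_channels) -> (..., 1)
output_network = torch.nn.Sequential(
    torch.nn.Linear(hidden_channels, 64),
    torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)
model = SchNet(embedding_layer, [interaction], rbf_layer, output_network)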

MACE Utilities

mlcg provides wrapper utilities for [MACE] models; the base implementation is available in the develop branch of the upstream MACE repository.

Radial Basis Functions

Sets of radial basis functions are used to expand the distances (or other molecular features) between atoms onto a fixed-size vector. For instance, this is the main transformation applied to the distances in the SchNet model.

class mlcg.nn.radial_basis.GaussianBasis(cutoff, num_rbf=50, trainable=False)[source]

Class that generates a set of equidistant 1-D Gaussian basis functions placed between a specified lower and upper cutoff:

\[f_n = \exp{ \left( -\gamma(r-c_n)^2 \right) }\]
Parameters:
  • cutoff (Union[int, float, _Cutoff]) – Defines the smooth cutoff function. If a float is provided, it will be interpreted as an upper cutoff and an IdentityCutoff will be used between 0 and the provided float. Otherwise, a chosen _Cutoff instance can be supplied.

  • num_rbf (int) – The number of gaussian functions in the basis set.

  • trainable (bool) – If True, the parameters of the gaussian basis (the centers and widths of each function) are registered as optimizable parameters that will be updated during backpropagation. If False, these parameters will be instead fixed in an unoptimizable buffer.
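As an illustration, the basis can be evaluated on a grid of distances (a sketch that assumes the module's forward pass maps a tensor of distances to one expansion per distance):

import torch
from mlcg.nn import GaussianBasis

# A float cutoff implies an IdentityCutoff over [0, 5]
rbf = GaussianBasis(cutoff=5.0, num_rbf=50)
distances = torch.linspace(0.0, 5.0, steps=200)
expanded = rbf(distances)  # expected shape: (200, 50)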

class mlcg.nn.radial_basis.ExpNormalBasis(cutoff, num_rbf=50, trainable=True)[source]

Class for generating a set of exponential normal radial basis functions, as described in [Physnet] . The functions have the following form:

\[f_n(r_{ij};\alpha, r_{low},r_{high}) = f_{cut}(r_{ij},r_{low},r_{high}) \times \exp\left[-\beta_n \left(e^{\alpha (r_{ij} -r_{high}) } - \mu_n \right)^2\right]\]

where

\[\alpha = 5.0/(r_{high} - r_{low})\]

is a distance rescaling factor, and, by default

\[f_{cut} ( r_{ij},r_{low},r_{high} ) = 0.5 \left[ \cos{\left( r_{ij} \times \pi / r_{high}\right)} + 1.0 \right]\]

represents a cosine cutoff function (though users can specify their own cutoff function if they desire).

Parameters:
  • cutoff (Union[int, float, _Cutoff]) – Defines the smooth cutoff function. If a float is provided, it will be interpreted as an upper cutoff and a CosineCutoff will be used between 0 and the provided float. Otherwise, a chosen _Cutoff instance can be supplied.

  • num_rbf (int) – The number of functions in the basis set.

  • trainable (bool) – If True, the parameters of the basis (the centers and widths of each function) are registered as optimizable parameters that will be updated during backpropagation. If False, these parameters will be instead fixed in an unoptimizable buffer.
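Note that a float cutoff implies a CosineCutoff here, whereas GaussianBasis defaults to an IdentityCutoff; passing an explicit _Cutoff instance makes the choice visible (a sketch):

from mlcg.nn import CosineCutoff
from mlcg.nn.radial_basis import ExpNormalBasis

# Equivalent to passing cutoff=5.0: a CosineCutoff over [0, 5]
rbf = ExpNormalBasis(CosineCutoff(cutoff_lower=0.0, cutoff_upper=5.0),
                     num_rbf=64, trainable=True)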

class mlcg.nn.radial_basis.RIGTOBasis(cutoff, nmax=5, lmax=5, sigma=0.4, mesh_size=300)[source]

This radial basis is the effective basis obtained when expanding an atomic density, smeared by Gaussians of width \(\sigma\), on a set of \(n_{max}\) orthonormal Gaussian Type Orbitals (GTOs) and \(l_{max}+1\) Spherical Harmonics (SPHs). For efficiency, this radial basis set is interpolated using natural cubic splines, and the cutoff is included in the splined functions.

The basis is defined as

\[R_{nl}(r) = f_c(r) \mathcal{N}_n \frac{\Gamma(\frac{n+l+3}{2})}{\Gamma(l+\frac{3}{2})} c^l r^l(c+b_n)^{-\frac{(n+l+3)}{2}} {}_1F_1\left(\frac{n+l+3}{2},l+\frac{3}{2};\frac{c^2 r^2}{c+b_n}\right),\]

where \({}_1F_1\) is the confluent hypergeometric function, \(\Gamma\) is the gamma function, \(f_c\) is a cutoff function, \(b_n=\frac{1}{2\sigma_n^2}\), \(c = \frac{1}{2\sigma^2}\), \(\sigma_n = r_\text{cut} \max(\sqrt{n},1)/n_{max}\), and \(\mathcal{N}_n^2 = \frac{2}{\sigma_n^{2n + 3}\Gamma(n + 3/2)}\).

For more details on the derivation, refer to appendix A.

Parameters:
  • nmax (int) – number of radial basis functions

  • lmax (int) – maximum spherical order (lmax included so there are lmax+1 orders)

  • sigma (float) – smearing of the atomic density

  • cutoff (Union[int, float, _Cutoff]) – Defines the smooth cutoff function. If a float is provided, it will be interpreted as an upper cutoff and a CosineCutoff will be used between 0 and the provided float. Otherwise, a chosen _Cutoff instance can be supplied.

  • mesh_size (int) – number of points, uniformly spanning the range \([0, r_c]\) defined by the cutoff, used to interpolate the radial basis with splines.

class mlcg.nn.radial_basis.SpacedExpBasis(cutoff, sigma_min=0.25, sigma_factor=2.0, mean_spacing=2.0, trainable=True)[source]

Class for generating a set of exponential normal radial basis functions, with means and standard deviations designed to capture the physics around 2-* A. The functions have an exponential form with the following means and standard deviations:

\[\sigma_0 = \sigma_f / s, \quad \sigma_1 = \sigma_{min}, \quad \sigma_2 = \sigma_f \sigma_1, \quad \dots, \quad \sigma_n = \sigma_f \sigma_{n-1}\]

\[\mu_0 = 0, \quad \mu_1 = \sigma_f, \quad \mu_2 = \mu_1 + s \sigma_1, \quad \dots, \quad \mu_n = \mu_{n-1} + s \sigma_{n-1}\]

where \(\sigma_{min}\) is sigma_min, \(\sigma_f\) is sigma_factor, and \(s\) is mean_spacing.
Parameters:
  • cutoff (Union[int, float, _Cutoff]) – Defines the smooth cutoff function. If a float is provided, it will be interpreted as an IdentityCutoff ranging over [0, cutoff]. Otherwise, a chosen _Cutoff instance can be supplied.

  • sigma_min (float) – Width of first non-zero-centered basis function

  • sigma_factor (float) – Location of first non-zero-centered basis function and multiplicative factor to spread std of each new peak by

  • mean_spacing (float) – multiplicative factor \(s\) applied to the previous width \(\sigma_{n-1}\) to set the distance between the means of consecutive Gaussians

  • trainable (bool) – If True, the parameters of the basis (the centers and widths of each function) are registered as optimizable parameters that will be updated during backpropagation. If False, these parameters will be instead fixed in an unoptimizable buffer.

class mlcg.nn.angular_basis.SphericalHarmonics(lmax, normalize=False, normalization='integral', irreps_in=None)[source]

JITable module version of real spherical harmonics.

Polynomials defined on 3D space, \(Y^l: \mathbb{R}^3 \longrightarrow \mathbb{R}^{2l+1}\), and usually restricted to the sphere (with normalize=True), \(Y^l: S^2 \longrightarrow \mathbb{R}^{2l+1}\), which satisfy the following properties:
  • they are polynomials of the Cartesian coordinates x, y, z

  • they are equivariant: \(Y^l(R x) = D^l(R) Y^l(x)\)

  • they are orthogonal: \(\int_{S^2} Y^l_m(x) Y^j_n(x) dx = \text{cste} \; \delta_{lj} \delta_{mn}\)

The value of the constant depends on the choice of normalization.

They obey the following recurrence property:

\[\begin{aligned}
Y^{l+1}_i(x) &= \text{cste}(l) \; C_{ijk} Y^l_j(x) x_k\\
\partial_k Y^{l+1}_i(x) &= \text{cste}(l) \, (l+1) \; C_{ijk} Y^l_j(x)
\end{aligned}\]

where \(C\) are the Wigner 3j symbols (wigner_3j).

Note

This function matches the following table of standard real spherical harmonics from Wikipedia when normalize=True, normalization='integral', and the function is called with arguments in the order y, z, x (instead of x, y, z).

Parameters:
  • l (int or list of int) – degree of the spherical harmonics.

  • x (torch.Tensor) – tensor \(x\) of shape (..., 3).

  • normalize (bool) – whether to normalize the x to vectors that lie on the unit sphere before projecting onto the spherical harmonics

  • normalization ({'integral', 'component', 'norm'}) – normalization of the output tensors; note that this option is independent of normalize, which controls the processing of the input rather than the output. Valid options:
      • component: \(\|Y^l(x)\|^2 = 2l+1,\; x \in S^2\)
      • norm: \(\|Y^l(x)\| = 1,\; x \in S^2\) (component divided by \(\sqrt{2l+1}\))
      • integral: \(\int_{S^2} Y^l_m(x)^2 dx = 1\) (component divided by \(\sqrt{4\pi}\))

Returns:

a list of tensors of shapes [lmax][n_neighbor, m], where \(m \leq l\)

\[Y^l(x)\]

Return type:

List[torch.Tensor]
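A usage sketch (assuming the module is called directly on a tensor of Cartesian vectors, as the parameter list above suggests):

import torch
from mlcg.nn.angular_basis import SphericalHarmonics

sph = SphericalHarmonics(lmax=2, normalize=True, normalization="integral")
vectors = torch.randn(10, 3)  # e.g., displacements to 10 neighbors
# One tensor per degree l = 0, 1, 2 with 2l+1 components each:
# expected shapes (10, 1), (10, 3), (10, 5)
ylm = sph(vectors)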

Cutoff Functions

Cutoff functions are used to enforce the smoothness of the models w.r.t. neighbor insertion/removal from an atomic environment. Some are also used to damp the signal from neighbors that are "far" from the central atom, e.g., CosineCutoff. Cutoff functions are also used in the construction of radial basis functions.

class mlcg.nn.cutoff.IdentityCutoff(cutoff_lower=0, cutoff_upper=inf)[source]

Cutoff function that is one everywhere, but retains cutoff_lower and cutoff_upper attributes

Parameters:
  • cutoff_lower (float) – left bound for the radial cutoff distance

  • cutoff_upper (float) – right bound for the radial cutoff distance

class mlcg.nn.cutoff.CosineCutoff(cutoff_lower=0.0, cutoff_upper=5.0)[source]

Class implementing a cutoff envelope based on a cosine signal in the interval [cutoff_lower, cutoff_upper]. For the default lower cutoff of zero:

\[0.5 \left[ \cos{\left( r_{ij} \times \pi / r_{high}\right)} + 1.0 \right]\]

For lower cutoff values greater than zero:

\[0.5 \cos{ \left[ \pi \left(2 \frac{r_{ij} - r_{low}}{r_{high} - r_{low}} + 1.0 \right)\right]} + 0.5\]

NOTE: The behavior of the cutoff is qualitatively different for lower cutoff values greater than zero when compared to the zero lower cutoff default. We recommend visualizing your basis to check that it makes physical sense.
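A quick way to inspect the envelope (a sketch assuming the module maps a tensor of distances to cutoff values in [0, 1]):

import torch
from mlcg.nn import CosineCutoff

cutoff = CosineCutoff(cutoff_lower=0.0, cutoff_upper=5.0)
distances = torch.linspace(0.0, 5.0, steps=100)
# Envelope equals 1 at r = 0 and decays smoothly to 0 at r = 5
envelope = cutoff(distances)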
class mlcg.nn.cutoff.ShiftedCosineCutoff(cutoff=5.0, smooth_width=0.5)[source]

Class implementing the Behler cosine cutoff with an additional smoothing parameter:

\[0.5 + 0.5 \cos{ \left[ \pi \left( \frac{r_{ij} - r_{high} + \sigma}{\sigma}\right)\right]}\]

where \(\sigma\) is the smoothing width.

Parameters:
  • cutoff (Union[int, float]) – cutoff radius

  • smooth_width (Union[int, float]) – parameter that controls the extent of smoothing in the cutoff envelope.

Model Building Utilities

These classes are used to build more complicated models.

class mlcg.nn.gradients.GradientsOut(model, targets='forces')[source]

Gradient wrapper for models.

Parameters:

targets (str) – The gradient targets to produce from a model output. These can be any of the gradient properties referenced in mlcg.data._keys. At the moment only forces are implemented.

Example

To predict forces from an energy model, one would supply a model that predicts a scalar atom property (an energy) and specify the FORCE_KEY in the targets.
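A minimal sketch:

from mlcg.nn import (StandardSchNet, GaussianBasis, CosineCutoff,
                     GradientsOut)
from mlcg.data._keys import FORCE_KEY

cutoff = CosineCutoff(cutoff_lower=0.0, cutoff_upper=5.0)
rbf = GaussianBasis(cutoff, num_rbf=50)
energy_model = StandardSchNet(rbf, cutoff, output_hidden_layer_widths=[128])
# Wrap the energy model so that forces (negative gradients of the
# predicted energy w.r.t. positions) are produced in the output
force_model = GradientsOut(energy_model, targets=FORCE_KEY)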

class mlcg.nn.gradients.SumOut(models, targets=None)[source]

Property pooling wrapper for models

Parameters:
  • models (ModuleDict) – Dictionary of predictor models keyed by their name attribute

  • targets (Optional[List[str]]) – List of prediction targets that will be pooled

Example

To combine SchNet force predictions with prior interactions:

import torch
from mlcg.nn import (StandardSchNet, HarmonicBonds, HarmonicAngles,
                     GradientsOut, SumOut, CosineCutoff,
                     GaussianBasis)
from mlcg.data._keys import FORCE_KEY, ENERGY_KEY

# bond_stats and angle_stats hold previously fitted harmonic prior
# parameters for the bonded interactions
bond_terms = GradientsOut(HarmonicBonds(bond_stats), FORCE_KEY)
angle_terms = GradientsOut(HarmonicAngles(angle_stats), FORCE_KEY)
cutoff = CosineCutoff()
rbf = GaussianBasis(cutoff)
energy_network = StandardSchNet(rbf, cutoff, [128])
force_network = GradientsOut(energy_network, FORCE_KEY)

models = torch.nn.ModuleDict({
             "bonds": bond_terms,
             "angles": angle_terms,
             "SchNet": force_network
         })
full_model = SumOut(models, targets=[ENERGY_KEY, FORCE_KEY])

Loss Functions

These classes define loss functions for model optimization, as well as generalized losses that combine several losses of different types.

class mlcg.nn.losses.Loss(losses, weights=None)[source]

Generalized loss function class that can be used to combine more than one loss function

Parameters:
  • losses (List[_Loss]) – List of torch loss modules

  • weights (Optional[List[float]]) – List of corresponding weights for each loss in the losses list. By default, each loss is weighted equally by 1.0.
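For instance, two of the force losses defined below can be combined with unequal weights (a minimal sketch):

from mlcg.nn.losses import Loss, ForceMSE, ForceRMSE

# Total loss = 1.0 * MSE(forces) + 0.5 * RMSE(forces)
loss_fn = Loss([ForceMSE(), ForceRMSE()], weights=[1.0, 0.5])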

class mlcg.nn.losses.ForceRMSE(force_kwd='forces', size_average=None, reduce=None, reduction='mean')[source]

Force root-mean-square error loss, as defined by:

\[L\left(f,\hat{f}\right) = \sqrt{ \frac{1}{Nd}\sum_{i}^{N} \left\Vert f_i - \hat{f}_i \right\Vert ^2 }\]

where \(f\) are predicted forces, \(\hat{f}\) are reference forces, \(N\) is the number of examples/structures, and \(d\) is the real space dimensionality (e.g., \(d=3\) for proteins)

Parameters:
  • force_kwd (str) – string to specify the force key in an AtomicData instance

  • size_average (Optional[bool]) – If True, the loss is normalized by the batch size

  • reduce (Optional[bool]) – If True, the loss is reduced to a scalar

  • reduction (str) – Specifies the method of reduction; valid options are 'none', 'mean', and 'sum'.

class mlcg.nn.losses.ForceMSE(force_kwd='forces', size_average=None, reduce=None, reduction='mean')[source]

Force mean square error loss, as defined by:

\[L\left(f,\hat{f}\right) = \frac{1}{Nd}\sum_{i}^{N} \left\Vert f_i - \hat{f}_i \right\Vert ^2\]

where \(f\) are predicted forces, \(\hat{f}\) are reference forces, \(N\) is the number of examples/structures, and \(d\) is the real space dimensionality (e.g., \(d=3\) for proteins)

Parameters:
  • force_kwd (str) – string to specify the force key in an AtomicData instance

  • size_average (Optional[bool]) – If True, the loss is normalized by the batch size

  • reduce (Optional[bool]) – If True, the loss is reduced to a scalar

  • reduction (str) – Specifies the method of reduction; valid options are 'none', 'mean', and 'sum'.