Simulations

mlcg provides some tools to run simulations with its models in the scripts folder and some example input files such as examples/langevin.yaml.

NVT ensemble

These classes define simulation integration schemes. After setting simulation options, attaching a model, and attaching initial configurations, they can additionally be used to run CG simulations on CPU or GPU.

class mlcg.simulation.LangevinSimulation(friction=0.001, **kwargs)[source]

Langevin simulation class for trained models.

The following [BAOAB] integration scheme is used, where:

B = deterministic velocity update
A = deterministic position update
O = stochastic velocity update

We have implemented the following update so as to only calculate forces once per timestep:

\[\begin{split}[B]\;& V_{t+1/2} = V_t + \frac{\Delta t}{2m} F(X_t) \\ [A]\;& X_{t+1/2} = X_t + \frac{\Delta t}{2}V_{t+1/2} \\ [O]\;& \tilde{V}_{t+1/2} = \epsilon V_{t+1/2} + \alpha dW_t \\ [A]\;& X_{t+1} = X_{t+1/2} + \frac{\Delta t}{2} \tilde{V}_{t+1/2} \\ [B]\;& V_{t+1} = \tilde{V}_{t+1/2} + \frac{\Delta t}{2m} F(X_{t+1})\end{split}\]

Where, \(dW_t\) is a noise drawn from \(\mathcal{N}(0,1)\), \(\eta\) is the friction, \(\epsilon\) is the velocity scale, \(\alpha\) is the noise scale, and:

\[\begin{split}F(X_t) =& - \nabla U(X_t) \\ & \\ \epsilon =& \exp(-\eta \; \Delta t) \\ & \\ \alpha =& \sqrt{(1 - \epsilon^2) / \beta m}\end{split}\]

Initial velocities are sampled from the Maxwell-Boltzmann distribution for the provided beta.

A diffusion constant \(D\) can be back-calculated using the Einstein relation:

\[D = 1 / (\beta \eta)\]

For a larger discussion on Langevin Dynamics integrators, please refer to [MDBook].

Parameters:

friction (float) – Friction value for Langevin scheme, in units of inverse time.
dt (float, default=5e-4) –
The integration time step for the dynamics. The simulation code is unit agnostic and will use the same distance and energy units that the network and initial configurations. However, for the two other parameters in a simulation, time and mass, only one of them is unit-agnostic: if you provide the time thinking in a particular unit, the unit of mass is defined, and viceversa. Be aware that only either mass or time can be provided in custom units.

The following relation must be satisfied by the units to ensure consistency:

\[[\text{Energy}] = [\text{Mass}]\frac{[\text{Length}]^2}{[\text{Time}]^2}\]

For example, if you are using kcal/mol for the energy, Angstrom from the distance and time in picoseconds, the masses will need to be provided in units of AMU/418.39.
save_forces (bool, default=False) – Whether to save forces at the same saved interval as the simulation coordinates
save_energies (bool, default=False) – Whether to save potential at the same saved interval as the simulation coordinates
create_checkpoints (bool, default=False) – Save the atomic data object so it can be reloaded in. Overwrites previous object.
read_checkpoint_file ([str,bool], default=None) – Whether to read in checkpoint file from. Can specify explicit file path or try to infer from self.filename
n_timesteps (int, default=100) – The length of the simulation in simulation timesteps
save_interval (int, default=10) – The interval at which simulation timesteps should be saved. Must be a factor of the simulation length
random_seed (int, default=None) – Seed for random number generator; if seeded, results always will be identical for the same random seed
device (str, default='cpu') – Device upon which simulation computation will be carried out
dtype (str, default='single') – precision to run the simulation with (single or double)
export_interval (int, default=None) – Interval at which .npy files will be saved. If an int is given, then the int specifies at what intervals numpy files will be saved per observable. This number must be an integer multiple of save_interval. All output files should be the same shape. Forces and potentials will also be saved according to the save_forces and save_energies arguments, respectively. If friction is not None, kinetic energies will also be saved. This method is only implemented for a maximum of 1000 files per observable due to file naming conventions. If None, export_interval will be set to n_timesteps to output one file for the entire simulation
log_interval (int, default=None) – If not None, a log will be generated indicating simulation start and end times as well as completion updates at regular intervals. If an int is given, then the int specifies how many log statements will be output. This number must be a multiple of save_interval.
log_type (str, default='write') – Only relevant if log_interval is not None. If ‘print’, a log statement will be printed. If ‘write’, the log will be written to a .txt file.
filename (str, default=None) – Specifies the location to which numpys and/or log files are saved. Must be provided if export_interval is not None and/or if log_interval is not None and log_type is ‘write’. This provides the base file name; for numpy outputs, ‘_coords_000.npy’ or similar is added. For log outputs, ‘_log.txt’ is added.
specialize_priors (bool, default=False) – use optimized version of the priors for a particular molecule
dtype – precision to run the simulation with (single or double)
sim_subroutine – Optional subroutine to run at at the interval specified by subroutine_interval after simulation updates. The subroutine should take only the internal collated AtomicData instance as an argument.
sim_subroutine_interval – Specifies the interval, in simulation steps, between successive calls to the subroutine, if specified.
save_subroutine – Specifies additional saving procedures for extra information at the same interval as export_interval. The subroutine should take only the internal collated AtomicData and the current timestep // save_interval as arguments.

class mlcg.simulation.OverdampedSimulation(friction=1.0, **kwargs)[source]

Overdamped Langevin simulation class for trained models.

The following Brownian motion scheme is used:

\[dX_t = - \nabla U( X_t ) D \Delta t + \sqrt{( 2 D \Delta t / \beta )} dW_t\]

for coordinates \(X_t\) at time \(t\), potential energy \(U\), diffusion \(D\), thermodynamic inverse temperature \(\beta\), time step \(\Delta t\), and stochastic Weiner process \(W\).

Due to the nature of Overdamped Langevin dynamics, the masses and velocities are not used.

Parameters:

diffusion – The constant diffusion parameter \(D\). By default, the diffusion is set to unity and is absorbed into the \(\Delta t\) argument. However, users may specify separate diffusion and \(\Delta t\) parameters in the case that they have some estimate of the diffusion.
dt (float, default=5e-4) –
The integration time step for the dynamics. The simulation code is unit agnostic and will use the same distance and energy units that the network and initial configurations. However, for the two other parameters in a simulation, time and mass, only one of them is unit-agnostic: if you provide the time thinking in a particular unit, the unit of mass is defined, and viceversa. Be aware that only either mass or time can be provided in custom units.

The following relation must be satisfied by the units to ensure consistency:

\[[\text{Energy}] = [\text{Mass}]\frac{[\text{Length}]^2}{[\text{Time}]^2}\]

For example, if you are using kcal/mol for the energy, Angstrom from the distance and time in picoseconds, the masses will need to be provided in units of AMU/418.39.
save_forces (bool, default=False) – Whether to save forces at the same saved interval as the simulation coordinates
save_energies (bool, default=False) – Whether to save potential at the same saved interval as the simulation coordinates
create_checkpoints (bool, default=False) – Save the atomic data object so it can be reloaded in. Overwrites previous object.
read_checkpoint_file ([str,bool], default=None) – Whether to read in checkpoint file from. Can specify explicit file path or try to infer from self.filename
n_timesteps (int, default=100) – The length of the simulation in simulation timesteps
save_interval (int, default=10) – The interval at which simulation timesteps should be saved. Must be a factor of the simulation length
random_seed (int, default=None) – Seed for random number generator; if seeded, results always will be identical for the same random seed
device (str, default='cpu') – Device upon which simulation computation will be carried out
dtype (str, default='single') – precision to run the simulation with (single or double)
export_interval (int, default=None) – Interval at which .npy files will be saved. If an int is given, then the int specifies at what intervals numpy files will be saved per observable. This number must be an integer multiple of save_interval. All output files should be the same shape. Forces and potentials will also be saved according to the save_forces and save_energies arguments, respectively. If friction is not None, kinetic energies will also be saved. This method is only implemented for a maximum of 1000 files per observable due to file naming conventions. If None, export_interval will be set to n_timesteps to output one file for the entire simulation
log_interval (int, default=None) – If not None, a log will be generated indicating simulation start and end times as well as completion updates at regular intervals. If an int is given, then the int specifies how many log statements will be output. This number must be a multiple of save_interval.
log_type (str, default='write') – Only relevant if log_interval is not None. If ‘print’, a log statement will be printed. If ‘write’, the log will be written to a .txt file.
filename (str, default=None) – Specifies the location to which numpys and/or log files are saved. Must be provided if export_interval is not None and/or if log_interval is not None and log_type is ‘write’. This provides the base file name; for numpy outputs, ‘_coords_000.npy’ or similar is added. For log outputs, ‘_log.txt’ is added.
specialize_priors (bool, default=False) – use optimized version of the priors for a particular molecule
dtype – precision to run the simulation with (single or double)
sim_subroutine – Optional subroutine to run at at the interval specified by subroutine_interval after simulation updates. The subroutine should take only the internal collated AtomicData instance as an argument.
sim_subroutine_interval – Specifies the interval, in simulation steps, between successive calls to the subroutine, if specified.
save_subroutine – Specifies additional saving procedures for extra information at the same interval as export_interval. The subroutine should take only the internal collated AtomicData and the current timestep // save_interval as arguments.

class mlcg.simulation.PTSimulation(friction=0.001, exchange_interval=100, **kwargs)[source]

Parallel tempering simulation using a Langevin update scheme. For theoretical details on replica exchange/parallel tempering, see https://github.com/noegroup/reform.

Briefly, a pair exchange is proposed, and the associated potential energies are used to compute a Metroplis-Hastings Boltzmann ratio. This ratio defines an acceptance threshold

aginst which approved exchanges are sampled according to a unit uniform distribution. For a pair of configurations \(A\) and \(B\), characterized by the respective potential energies \(U_A\) and \(U_B\) the the inverse thermodynamic temperatures \(\beta_A\) and \(\beta_B\), the acceptance rate for exchanging configurations is:

\[Acc = \exp{\left( (U_A - U_B) \times (\beta_A - \beta_B) \right)}\]

Pairs of candidate configurations undergo exchange if \(\rho \sim U(0,1) < Acc\). Note that the exchanged velocities for each configuration must further be rescaled according to the square root of their inverse beta ratios. See _perform_exchange for more details.

Currently we only implement parallel tempering for Langevin dynamics. Be aware that the output will contain information (e.g., coordinates) for all replicas.

In addition to the typical outputs of LangevinSimulation, PTSimulation also ouputs information about exchange acceptance between replicas at each export. This information takes the form of an acceptance_matrix, which has shape (n_betas, n_betas). The upper triangular portion of the matrix counts accepted exchanges between adjacent tempeartures, while the lower triangular portion of the matrix counts rejected exchanges between adjacent temperatures. For example, the entry [1,2] contains the number of accepted exchanges between replicas running at the temperatures associated with beta_1 and beta_2, while the entry [2,1] counts the number of rejected echanges at those two temperatures. The sum of such complementary entries across the matrix diagonal always sum to the total number of exhchanges proposed between each export interval.

Note: This implementation only allows for replica exchanges between directly adjecent temperatures implied by the user-supplied list of beta values.

Parameters:

friction (float) – Scalar friction to use for Langevin updates
exchange_interval (int) – Specifies the number of simulation steps to take before attempting replica exchange.
dt (float, default=5e-4) –
The integration time step for the dynamics. The simulation code is unit agnostic and will use the same distance and energy units that the network and initial configurations. However, for the two other parameters in a simulation, time and mass, only one of them is unit-agnostic: if you provide the time thinking in a particular unit, the unit of mass is defined, and viceversa. Be aware that only either mass or time can be provided in custom units.

The following relation must be satisfied by the units to ensure consistency:

\[[\text{Energy}] = [\text{Mass}]\frac{[\text{Length}]^2}{[\text{Time}]^2}\]

For example, if you are using kcal/mol for the energy, Angstrom from the distance and time in picoseconds, the masses will need to be provided in units of AMU/418.39.
save_forces (bool, default=False) – Whether to save forces at the same saved interval as the simulation coordinates
save_energies (bool, default=False) – Whether to save potential at the same saved interval as the simulation coordinates
create_checkpoints (bool, default=False) – Save the atomic data object so it can be reloaded in. Overwrites previous object.
read_checkpoint_file ([str,bool], default=None) – Whether to read in checkpoint file from. Can specify explicit file path or try to infer from self.filename
n_timesteps (int, default=100) – The length of the simulation in simulation timesteps
save_interval (int, default=10) – The interval at which simulation timesteps should be saved. Must be a factor of the simulation length
random_seed (int, default=None) – Seed for random number generator; if seeded, results always will be identical for the same random seed
device (str, default='cpu') – Device upon which simulation computation will be carried out
dtype (str, default='single') – precision to run the simulation with (single or double)
export_interval (int, default=None) – Interval at which .npy files will be saved. If an int is given, then the int specifies at what intervals numpy files will be saved per observable. This number must be an integer multiple of save_interval. All output files should be the same shape. Forces and potentials will also be saved according to the save_forces and save_energies arguments, respectively. If friction is not None, kinetic energies will also be saved. This method is only implemented for a maximum of 1000 files per observable due to file naming conventions. If None, export_interval will be set to n_timesteps to output one file for the entire simulation
log_interval (int, default=None) – If not None, a log will be generated indicating simulation start and end times as well as completion updates at regular intervals. If an int is given, then the int specifies how many log statements will be output. This number must be a multiple of save_interval.
log_type (str, default='write') – Only relevant if log_interval is not None. If ‘print’, a log statement will be printed. If ‘write’, the log will be written to a .txt file.
filename (str, default=None) – Specifies the location to which numpys and/or log files are saved. Must be provided if export_interval is not None and/or if log_interval is not None and log_type is ‘write’. This provides the base file name; for numpy outputs, ‘_coords_000.npy’ or similar is added. For log outputs, ‘_log.txt’ is added.
specialize_priors (bool, default=False) – use optimized version of the priors for a particular molecule
dtype – precision to run the simulation with (single or double)
sim_subroutine – Optional subroutine to run at at the interval specified by subroutine_interval after simulation updates. The subroutine should take only the internal collated AtomicData instance as an argument.
sim_subroutine_interval – Specifies the interval, in simulation steps, between successive calls to the subroutine, if specified.
save_subroutine – Specifies additional saving procedures for extra information at the same interval as export_interval. The subroutine should take only the internal collated AtomicData and the current timestep // save_interval as arguments.

Simulations

NVT ensemble

Utilities