Simulations

mlcg provides some tools to run simulations with its models in the scripts folder and some example input files such as examples/langevin.yaml.

NVT ensemble

These classes define simulation integration schemes. After setting simulation options, attaching a model, and attaching initial configurations, they can additionally be used to run CG simulations on CPU or GPU.

class mlcg.simulation.LangevinSimulation(friction=0.001, **kwargs)[source]

Langevin simulatin class for trained models.

The following BAOAB integration scheme is used, where:

B = deterministic velocity update
A = deterministic position update
O = stochastic velocity update

We have implemented the following update so as to only calculate forces once per timestep:

\[\begin{split}[B]\;& V_{t+1/2} = V_t + \frac{\Delta t}{2m} F(X_t) \\ [A]\;& X_{t+1/2} = X_t + \frac{\Delta t}{2}V_{t+1/2} \\ [O]\;& \tilde{V}_{t+1/2} = \epsilon V_{t+1/2} + \alpha dW_t \\ [A]\;& X_{t+1} = X_{t+1/2} + \frac{\Delta t}{2} \tilde{V}_{t+1/2} \\ [B]\;& V_{t+1} = \tilde{V}_{t+1/2} + \frac{\Delta t}{2m} F(X_{t+1})\end{split}\]

Where, \(dW_t\) is a noise drawn from \(\mathcal{N}(0,1)\), \(\eta\) is the friction, \(\epsilon\) is the velocity scale, \(\alpha\) is the noise scale, and:

\[\begin{split}F(X_t) =& - \nabla U(X_t) \\ \epsilon =& \exp(-\eta \; \Delta t) \\ \alpha =& \sqrt{1 - \epsilon^2}\end{split}\]

A diffusion constant \(D\) can be back-calculated using the Einstein relation:

\[D = 1 / (\beta \eta)\]

Initial velocities are set to zero if not provided.

Parameters:
  • friction (float) – Friction value for Langevin scheme, in units of inverse time.

  • dt (float, default=5e-4) – The integration time step for Langevin dynamics.

  • save_forces (bool, default=False) – Whether to save forces at the same saved interval as the simulation coordinates

  • save_potential (bool, default=False) – Whether to save potential at the same saved interval as the simulation coordinates

  • create_checkpoints (bool, default=False) – Save the atomic data object so it can be reloaded in. Overwrites previous object.

  • read_checkpoint_file ([str,bool], default=None) – Whether to read in checkpoint file from. Can specify explicit file path or try to infer from self.filename

  • n_timesteps (int, default=100) – The length of the simulation in simulation timesteps

  • save_interval (int, default=10) – The interval at which simulation timesteps should be saved. Must be a factor of the simulation length

  • random_seed (int, default=None) – Seed for random number generator; if seeded, results always will be identical for the same random seed

  • device (str, default='cpu') – Device upon which simulation compuation will be carried out

  • dtype (str, default='single') – precision to run the simulation with (single or double)

  • export_interval (int, default=None) – Interval at which .npy files will be saved. If an int is given, then the int specifies at what intervals numpy files will be saved per observable. This number must be an integer multiple of save_interval. All output files should be the same shape. Forces and potentials will also be saved according to the save_forces and save_potential arguments, respectively. If friction is not None, kinetic energies will also be saved. This method is only implemented for a maximum of 1000 files per observable due to file naming conventions. If None, export_interval will be set to n_timesteps to output one file for the entire simulation

  • log_interval (int, default=None) – If not None, a log will be generated indicating simulation start and end times as well as completion updates at regular intervals. If an int is given, then the int specifies how many log statements will be output. This number must be a multiple of save_interval.

  • log_type (str, default='write') – Only relevant if log_interval is not None. If ‘print’, a log statement will be printed. If ‘write’, the log will be written to a .txt file.

  • filename (str, default=None) – Specifies the location to which numpys and/or log files are saved. Must be provided if export_interval is not None and/or if log_interval is not None and log_type is ‘write’. This provides the base file name; for numpy outputs, ‘_coords_000.npy’ or similar is added. For log outputs, ‘_log.txt’ is added.

  • specialize_priors (bool, default=False) – use optimized version of the priors for a particular molecule

  • dtype – precision to run the simulation with (single or double)

  • sim_subroutine – Optional subroutine to run at at the interval specified by subroutine_interval after simulation updates. The subroutine should take only the internal collated AtomicData instance as an argument.

  • sim_subroutine_interval – Specifies the interval, in simulation steps, between successive calls to the subroutine, if specified.

  • save_subroutine – Specifies additional saving procedures for extra information at the same interval as export_interval. The subroutine should take only the internal collated AtomicData and the current timestep // save_interval as arguments.

class mlcg.simulation.OverdampedSimulation(friction=1.0, **kwargs)[source]

Overdamped Langevin simulation class for trained models.

The following Brownian motion scheme is used:

\[dX_t = - \nabla U( X_t ) D \Delta t + \sqrt{( 2 D *\Delta t / \beta )} dW_t\]

for coordinates \(X_t\) at time \(t\), potential energy \(U\), diffusion \(D\), thermodynamic inverse temperature \(\beta\), time step \(\Delta t\), and stochastic Weiner process \(W\).

Parameters:
  • diffusion – The constant diffusion parameter \(D\). By default, the diffusion is set to unity and is absorbed into the \(\Delta t\) argument. However, users may specify separate diffusion and \(\Delta t\) parameters in the case that they have some estimate of the diffusion.

  • dt (float, default=5e-4) – The integration time step for Langevin dynamics.

  • save_forces (bool, default=False) – Whether to save forces at the same saved interval as the simulation coordinates

  • save_potential (bool, default=False) – Whether to save potential at the same saved interval as the simulation coordinates

  • create_checkpoints (bool, default=False) – Save the atomic data object so it can be reloaded in. Overwrites previous object.

  • read_checkpoint_file ([str,bool], default=None) – Whether to read in checkpoint file from. Can specify explicit file path or try to infer from self.filename

  • n_timesteps (int, default=100) – The length of the simulation in simulation timesteps

  • save_interval (int, default=10) – The interval at which simulation timesteps should be saved. Must be a factor of the simulation length

  • random_seed (int, default=None) – Seed for random number generator; if seeded, results always will be identical for the same random seed

  • device (str, default='cpu') – Device upon which simulation compuation will be carried out

  • dtype (str, default='single') – precision to run the simulation with (single or double)

  • export_interval (int, default=None) – Interval at which .npy files will be saved. If an int is given, then the int specifies at what intervals numpy files will be saved per observable. This number must be an integer multiple of save_interval. All output files should be the same shape. Forces and potentials will also be saved according to the save_forces and save_potential arguments, respectively. If friction is not None, kinetic energies will also be saved. This method is only implemented for a maximum of 1000 files per observable due to file naming conventions. If None, export_interval will be set to n_timesteps to output one file for the entire simulation

  • log_interval (int, default=None) – If not None, a log will be generated indicating simulation start and end times as well as completion updates at regular intervals. If an int is given, then the int specifies how many log statements will be output. This number must be a multiple of save_interval.

  • log_type (str, default='write') – Only relevant if log_interval is not None. If ‘print’, a log statement will be printed. If ‘write’, the log will be written to a .txt file.

  • filename (str, default=None) – Specifies the location to which numpys and/or log files are saved. Must be provided if export_interval is not None and/or if log_interval is not None and log_type is ‘write’. This provides the base file name; for numpy outputs, ‘_coords_000.npy’ or similar is added. For log outputs, ‘_log.txt’ is added.

  • specialize_priors (bool, default=False) – use optimized version of the priors for a particular molecule

  • dtype – precision to run the simulation with (single or double)

  • sim_subroutine – Optional subroutine to run at at the interval specified by subroutine_interval after simulation updates. The subroutine should take only the internal collated AtomicData instance as an argument.

  • sim_subroutine_interval – Specifies the interval, in simulation steps, between successive calls to the subroutine, if specified.

  • save_subroutine – Specifies additional saving procedures for extra information at the same interval as export_interval. The subroutine should take only the internal collated AtomicData and the current timestep // save_interval as arguments.

class mlcg.simulation.PTSimulation(friction=0.001, exchange_interval=100, **kwargs)[source]

Parallel tempering simulation using a Langevin update scheme. For theoretical details on replica exchange/parallel tempering, see https://github.com/noegroup/reform.

Briefly, a pair exchange is proposed, and the associated potential energies are used to compute a Metroplis-Hastings Boltzmann ratio. This ratio defines an acceptance threshold

aginst which approved exchanges are sampled according to a unit uniform distribution. For a pair of configurations \(A\) and \(B\), characterized by the respective potential energies \(U_A\) and \(U_B\) the the inverse thermodynamic temperatures \(\beta_A\) and \(\beta_B\), the acceptance rate for exchanging configurations is:

\[Acc = \exp{\left( (U_A - U_B) \times (\beta_A - \beta_B) \right)}\]

Pairs of candidate configurations undergo exchange if \(\rho \sim U(0,1) < Acc\). Note that the exchanged velocities for each configuration must further be rescaled according to the square root of their inverse beta ratios. See _perform_exchange for more details.

Currently we only implement parallel tempering for Langevin dynamics. Be aware that the output will contain information (e.g., coordinates) for all replicas.

In addition to the typical outputs of LangevinSimulation, PTSimulation also ouputs information about exchange acceptance between replicas at each export. This information takes the form of an acceptance_matrix, which has shape (n_betas, n_betas). The upper triangular portion of the matrix counts accepted exchanges between adjacent tempeartures, while the lower triangular portion of the matrix counts rejected exchanges between adjacent temperatures. For example, the entry [1,2] contains the number of accepted exchanges between replicas running at the temperatures associated with beta_1 and beta_2, while the entry [2,1] counts the number of rejected echanges at those two temperatures. The sum of such complementary entries across the matrix diagonal always sum to the total number of exhchanges proposed between each export interval.

Note: This implementation only allows for replica exchanges between directly adjecent temperatures implied by the user-supplied list of beta values.

Parameters:
  • friction (float) – Scalar friction to use for Langevin updates

  • exchange_interval (int) – Specifies the number of simulation steps to take before attempting replica exchange.

  • dt (float, default=5e-4) – The integration time step for Langevin dynamics.

  • save_forces (bool, default=False) – Whether to save forces at the same saved interval as the simulation coordinates

  • save_potential (bool, default=False) – Whether to save potential at the same saved interval as the simulation coordinates

  • create_checkpoints (bool, default=False) – Save the atomic data object so it can be reloaded in. Overwrites previous object.

  • read_checkpoint_file ([str,bool], default=None) – Whether to read in checkpoint file from. Can specify explicit file path or try to infer from self.filename

  • n_timesteps (int, default=100) – The length of the simulation in simulation timesteps

  • save_interval (int, default=10) – The interval at which simulation timesteps should be saved. Must be a factor of the simulation length

  • random_seed (int, default=None) – Seed for random number generator; if seeded, results always will be identical for the same random seed

  • device (str, default='cpu') – Device upon which simulation compuation will be carried out

  • dtype (str, default='single') – precision to run the simulation with (single or double)

  • export_interval (int, default=None) – Interval at which .npy files will be saved. If an int is given, then the int specifies at what intervals numpy files will be saved per observable. This number must be an integer multiple of save_interval. All output files should be the same shape. Forces and potentials will also be saved according to the save_forces and save_potential arguments, respectively. If friction is not None, kinetic energies will also be saved. This method is only implemented for a maximum of 1000 files per observable due to file naming conventions. If None, export_interval will be set to n_timesteps to output one file for the entire simulation

  • log_interval (int, default=None) – If not None, a log will be generated indicating simulation start and end times as well as completion updates at regular intervals. If an int is given, then the int specifies how many log statements will be output. This number must be a multiple of save_interval.

  • log_type (str, default='write') – Only relevant if log_interval is not None. If ‘print’, a log statement will be printed. If ‘write’, the log will be written to a .txt file.

  • filename (str, default=None) – Specifies the location to which numpys and/or log files are saved. Must be provided if export_interval is not None and/or if log_interval is not None and log_type is ‘write’. This provides the base file name; for numpy outputs, ‘_coords_000.npy’ or similar is added. For log outputs, ‘_log.txt’ is added.

  • specialize_priors (bool, default=False) – use optimized version of the priors for a particular molecule

  • dtype – precision to run the simulation with (single or double)

  • sim_subroutine – Optional subroutine to run at at the interval specified by subroutine_interval after simulation updates. The subroutine should take only the internal collated AtomicData instance as an argument.

  • sim_subroutine_interval – Specifies the interval, in simulation steps, between successive calls to the subroutine, if specified.

  • save_subroutine – Specifies additional saving procedures for extra information at the same interval as export_interval. The subroutine should take only the internal collated AtomicData and the current timestep // save_interval as arguments.

Utilities