Statistics Utilities

Warning

The code presented here is legacy code that is no longer used. All prior-related code has moved to the mlcg-tk repository.

These functions gather statistics for further analysis or for parametrizing prior models.

mlcg.geometry.statistics.compute_statistics(data, target, beta, TargetPrior=<class 'mlcg.nn.prior.harmonic.Harmonic'>, nbins=100, bmin=None, bmax=None, fit_from_values=False, target_fit_kwargs=None)

Function for computing atom type-specific statistics for every combination of atom types present in a collated AtomicData structure.

Parameters:

data (AtomicData) – Input data, in the form of a collated list of individual AtomicData structures.
target (str) – The keyword specifying which neighbor_list sub-dictionary should be used to gather statistics
beta (float) –
Inverse thermodynamic temperature:

\[\beta = \frac{1}{k_B T}\]

where \(k_B\) is Boltzmann’s constant and \(T\) is the temperature.
TargetPrior (_Prior) – The class type of prior for which the statistics will be processed.
nbins (int) – The number of bins over which 1-D feature histograms are constructed in order to estimate distributions
bmin (Optional[float]) – If specified, the lower bound of bin edges. If not specified, the lower bound defaults to the lowest value in the input feature
bmax (Optional[float]) – If specified, the upper bound of bin edges. If not specified, the upper bound defaults to the greatest value in the input feature
fit_from_values (bool) – If True, the prior parameters are estimated directly from the feature values instead of from their implied empirical potentials
target_fit_kwargs (Optional[Dict]) – Extra fit options that are prior-specific

Returns:

Dictionary of gathered statistics and estimated parameters based on the TargetPrior. The following key/value pairs are common across all TargetPrior choices:

(*specific_types) : {

    ...

    "p" : torch.Tensor of shape [n_bins], containing the normalized bin counts
        of the 1-D feature corresponding to the atom_type group
        (*specific_types) = (specific_types[0], specific_types[1], ...)
    "p_bin" : torch.Tensor of shape [n_bins], containing the bin center values
    "V" : torch.Tensor of shape [n_bins], containing the empirically estimated
        free energy curve according to a direct Boltzmann inversion of the
        normalized probability distribution for the feature.
    "V_bin" : torch.Tensor of shape [n_bins], containing the bin center values
}

where … indicates other sub-key/value pairs apart from those enumerated above, which may appear depending on the chosen TargetPrior. For example, if TargetPrior is HarmonicBonds, there will also be keys/values associated with estimated bond constants and means.

Return type:

Dict
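The "p" and "V" entries described above can be illustrated with a minimal, dependency-free sketch of a direct Boltzmann inversion, \(V = -\frac{1}{\beta} \ln p\). This is not the library's implementation: the function name boltzmann_inversion is hypothetical, it works on a plain list of feature values rather than an AtomicData object, and it assumes that bin counts are normalized to sum to one and that empty bins are dropped before taking the logarithm.

```python
import math


def boltzmann_inversion(values, beta, nbins=100, bmin=None, bmax=None):
    """Histogram `values` and invert the distribution: V = -ln(p) / beta.

    Hypothetical helper for illustration only; not part of mlcg.
    """
    lo = min(values) if bmin is None else bmin
    hi = max(values) if bmax is None else bmax
    if hi <= lo:
        raise ValueError("the upper bin bound must exceed the lower bin bound")
    width = (hi - lo) / nbins

    # Count the values falling into each of the `nbins` equal-width bins.
    counts = [0] * nbins
    for v in values:
        if lo <= v <= hi:
            # The right-most edge belongs to the last bin.
            idx = min(int((v - lo) / width), nbins - 1)
            counts[idx] += 1

    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the requested bin range")

    centers = [lo + (i + 0.5) * width for i in range(nbins)]
    p = [c / total for c in counts]  # assumed normalization: sum(p) == 1

    # Assumption: empty bins are skipped, since ln(0) is undefined.
    populated = [i for i in range(nbins) if counts[i] > 0]
    return {
        "p": p,
        "p_bin": centers,
        "V": [-math.log(p[i]) / beta for i in populated],
        "V_bin": [centers[i] for i in populated],
    }


stats = boltzmann_inversion([1.0, 1.0, 1.0, 2.0], beta=1.0, nbins=2)
# stats["p"] == [0.75, 0.25] and stats["p_bin"] == [1.25, 1.75]
```

Because the free energy is only defined up to an additive constant, differences such as stats["V"][1] - stats["V"][0] are the meaningful quantities.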

Example

my_data = AtomicData(
    out={},
    pos=[769600, 3],
    atom_types=[769600],
    n_atoms=[20800],
    neighbor_list={
        bonds={
          tag=[20800],
          order=[20800],
          index_mapping=[2, 748800],
          cell_shifts=[20800],
          rcut=[20800],
          self_interaction=[20800]
        },
        angles={
          tag=[20800],
          order=[20800],
          index_mapping=[3, 977600],
          cell_shifts=[20800],
          rcut=[20800],
          self_interaction=[20800]
        }
    },
    batch=[769600],
    ptr=[20801]
)

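from mlcg.geometry.statistics import compute_statistics

# `beta` and the prior classes are assumed to be defined or imported already.
bond_stats = compute_statistics(
    my_data, 'bonds', beta=beta, TargetPrior=HarmonicBonds
)
# HarmonicAngles is assumed here as the matching prior class for angles.
angle_stats = compute_statistics(
    my_data, 'angles', beta=beta, TargetPrior=HarmonicAngles
)
# Assumes my_data.neighbor_list also has a 'dihedrals' entry (not shown above).
dihedral_stats = compute_statistics(
    my_data, 'dihedrals', beta=beta, TargetPrior=Dihedral
)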
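
To make the fit_from_values option more concrete, here is a rough sketch of how harmonic parameters could be estimated directly from feature values. It is an assumption-laden illustration rather than the library's fitting code: it assumes the convention \(V(x) = \frac{k}{2}(x - x_0)^2\), for which the Boltzmann distribution is Gaussian with mean \(x_0\) and variance \(1/(\beta k)\). The function name and the returned keys "x_0" and "k" are hypothetical.

```python
import statistics


def fit_harmonic_from_values(values, beta):
    """Estimate harmonic parameters from the sample mean and variance.

    Hypothetical helper; assumes V(x) = (k / 2) * (x - x_0) ** 2.
    """
    x_0 = statistics.fmean(values)
    variance = statistics.pvariance(values, mu=x_0)
    if variance <= 0.0:
        raise ValueError("a spring constant cannot be fit to zero variance")
    # For a Gaussian Boltzmann distribution, variance = 1 / (beta * k).
    k = 1.0 / (beta * variance)
    return {"x_0": x_0, "k": k}


params = fit_harmonic_from_values([0.9, 1.1], beta=2.0)
```

If the prior instead uses \(V(x) = k (x - x_0)^2\) without the factor of one half, the estimated constant would be half of the value computed here.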

mlcg.geometry.statistics.fit_baseline_models(data, beta, priors_cls, nbins=100, bmin=None, bmax=None)

Function for parametrizing a list of priors based on the type-specific interactions contained in a collated AtomicData structure.

Parameters:

data (AtomicData) – Input data, in the form of a collated list of individual AtomicData structures.
beta (float) –
Inverse thermodynamic temperature:

\[\beta = \frac{1}{k_B T}\]

where \(k_B\) is Boltzmann’s constant and \(T\) is the temperature.
priors_cls (List[_Prior]) – List of priors to parametrize based on the input data
nbins (int) – The number of bins over which 1-D feature histograms are constructed in order to estimate distributions
bmin (Optional[float]) – If specified, the lower bound of bin edges. If not specified, the lower bound defaults to the lowest value in the input feature
bmax (Optional[float]) – If specified, the upper bound of bin edges. If not specified, the upper bound defaults to the greatest value in the input feature

Return type:

Tuple[ModuleDict, Dict]

Returns:

ModuleDict – The parametrized priors
Dict – Corresponding statistics for the prior fits
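
Both functions take beta as an input. A small sketch of computing it from a temperature follows; it assumes energies are expressed in kcal/mol, which this page does not specify, so the constant should be swapped for the one matching the unit system of the data. The helper name inverse_temperature is hypothetical.

```python
# Boltzmann constant in kcal/(mol K); assumes energies are in kcal/mol.
K_B_KCAL_PER_MOL_K = 0.0019872041


def inverse_temperature(temperature_kelvin, k_b=K_B_KCAL_PER_MOL_K):
    """Return beta = 1 / (k_B * T) for a temperature given in Kelvin."""
    if temperature_kelvin <= 0.0:
        raise ValueError("temperature must be positive")
    return 1.0 / (k_b * temperature_kelvin)


beta = inverse_temperature(300.0)  # units: mol/kcal
```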