Statistics Utilities
Warning
The code presented here is legacy and is no longer used. All prior-related code has moved to the mlcg-tk repository.
These functions gather statistics for further analysis or for parametrizing prior models.
- mlcg.geometry.statistics.compute_statistics(data, target, beta, TargetPrior=<class 'mlcg.nn.prior.harmonic.Harmonic'>, nbins=100, bmin=None, bmax=None, fit_from_values=False, target_fit_kwargs=None)
Function for computing atom type-specific statistics for every combination of atom types present in a collated AtomicData structure.
- Parameters:
data (AtomicData) – Input data, in the form of a collated list of individual AtomicData structures.
target (str) – The keyword specifying which neighbor_list sub-dictionary should be used to gather statistics.
beta (float) – Inverse thermodynamic temperature:
\[\beta = \frac{1}{k_B T}\]
where \(k_B\) is Boltzmann’s constant and \(T\) is the temperature.
TargetPrior (_Prior) – The class type of prior for which the statistics will be processed.
nbins (int) – The number of bins over which 1-D feature histograms are constructed in order to estimate distributions.
bmin (Optional[float]) – If specified, the lower bound of bin edges. If not specified, the lower bound defaults to the lowest value in the input feature.
bmax (Optional[float]) – If specified, the upper bound of bin edges. If not specified, the upper bound defaults to the greatest value in the input feature.
fit_from_values (bool) – If True, the prior parameters are estimated directly from the feature values instead of from their implied empirical potentials.
target_fit_kwargs (Optional[Dict]) – Extra fit options that are prior-specific.
- Returns:
Dictionary of gathered statistics and estimated parameters based on the TargetPrior. The following key/value pairs are common across all TargetPrior choices:
(*specific_types) : {
    ...
    "p" : torch.Tensor of shape [n_bins], containing the normalized bin counts of the 1-D feature corresponding to the atom-type group (*specific_types) = (specific_types[0], specific_types[1], ...)
    "p_bin" : torch.Tensor of shape [n_bins], containing the bin center values
    "V" : torch.Tensor of shape [n_bins], containing the empirically estimated free energy curve according to a direct Boltzmann inversion of the normalized probability distribution for the feature
    "V_bin" : torch.Tensor of shape [n_bins], containing the bin center values
}
where … indicates other sub-key/value pairs apart from those enumerated above, which may appear depending on the chosen TargetPrior. For example, if TargetPrior is HarmonicBonds, there will also be keys/values associated with estimated bond constants and means.
- Return type:
Dict
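The "p" and "V" entries described above can be illustrated with a minimal, standalone sketch of a direct Boltzmann inversion, \(V(x) = -\frac{1}{\beta} \ln p(x)\). This is not the library's implementation: the function name `boltzmann_inversion` is hypothetical, plain Python lists stand in for tensors, and dropping empty bins to avoid \(\ln 0\) is an assumption about how unpopulated bins might be handled.

```python
import math


def boltzmann_inversion(values, beta, nbins=100, bmin=None, bmax=None):
    """Histogram `values` and invert the normalized counts into a free energy.

    Hypothetical helper for illustration only; it mirrors the documented
    keys ("p", "p_bin", "V", "V_bin") but not the library's internals.
    """
    lo = min(values) if bmin is None else bmin
    hi = max(values) if bmax is None else bmax
    if hi <= lo:
        raise ValueError("bin range must have positive width")
    width = (hi - lo) / nbins

    counts = [0] * nbins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi falls into the last bin.
            idx = min(int((v - lo) / width), nbins - 1)
            counts[idx] += 1

    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin range")

    centers = [lo + (i + 0.5) * width for i in range(nbins)]
    p = [c / total for c in counts]

    # Assumption: skip empty bins, since log(0) is undefined.
    V_bin = [x for x, pi in zip(centers, p) if pi > 0]
    V = [-math.log(pi) / beta for pi in p if pi > 0]
    return {"p": p, "p_bin": centers, "V": V, "V_bin": V_bin}
```

Because only differences in free energy are meaningful, the curve returned here is defined up to an additive constant.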
Example
my_data = AtomicData(
    out={},
    pos=[769600, 3],
    atom_types=[769600],
    n_atoms=[20800],
    neighbor_list={
        bonds={
            tag=[20800],
            order=[20800],
            index_mapping=[2, 748800],
            cell_shifts=[20800],
            rcut=[20800],
            self_interaction=[20800]
        },
        angles={
            tag=[20800],
            order=[20800],
            index_mapping=[3, 977600],
            cell_shifts=[20800],
            rcut=[20800],
            self_interaction=[20800]
        }
    },
    batch=[769600],
    ptr=[20801]
)

bond_stats = compute_statistics(
    my_data, 'bonds', beta=beta, TargetPrior=HarmonicBonds
)
dihedral_stats = compute_statistics(
    my_data, 'dihedrals', beta=beta, TargetPrior=Dihedral
)
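To make the idea of "atom type-specific statistics for every combination of atom types" concrete, the following sketch groups pair distances by the types of the two atoms involved. It assumes an `index_mapping` laid out as two rows of atom indices, matching the `[2, n_pairs]` shape shown in the example. The function name `group_distances_by_type` is hypothetical, and sorting each type pair so that (1, 2) and (2, 1) share one group is an assumption rather than documented library behavior.

```python
import math
from collections import defaultdict


def group_distances_by_type(pos, atom_types, index_mapping):
    """Group pair distances by the sorted atom types of each pair.

    pos           : sequence of (x, y, z) coordinates, one per atom
    atom_types    : sequence of integer types, one per atom
    index_mapping : two equal-length sequences of atom indices (shape [2, n_pairs])
    """
    groups = defaultdict(list)
    for i, j in zip(index_mapping[0], index_mapping[1]):
        # Assumption: the pair (a, b) is treated the same as (b, a).
        key = tuple(sorted((atom_types[i], atom_types[j])))
        groups[key].append(math.dist(pos[i], pos[j]))
    return dict(groups)
```

Each resulting list of distances is the kind of 1-D feature that would then be histogrammed per atom-type group.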
- mlcg.geometry.statistics.fit_baseline_models(data, beta, priors_cls, nbins=100, bmin=None, bmax=None)
Function for parametrizing a list of priors based on type-specific interactions contained in a collated AtomicData structure.
- Parameters:
data (AtomicData) – Input data, in the form of a collated list of individual AtomicData structures.
beta (float) – Inverse thermodynamic temperature:
\[\beta = \frac{1}{k_B T}\]
where \(k_B\) is Boltzmann’s constant and \(T\) is the temperature.
priors_cls (List[_Prior]) – List of priors to parametrize based on the input data.
nbins (int) – The number of bins over which 1-D feature histograms are constructed in order to estimate distributions.
bmin (Optional[float]) – If specified, the lower bound of bin edges. If not specified, the lower bound defaults to the lowest value in the input feature.
bmax (Optional[float]) – If specified, the upper bound of bin edges. If not specified, the upper bound defaults to the greatest value in the input feature.
- Return type:
Tuple[ModuleDict, Dict]
- Returns:
nn.Module – The list of parametrized priors
Dict – Corresponding statistics for prior fits
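As a rough illustration of what parametrizing a harmonic prior from data can look like, the sketch below estimates a force constant and equilibrium value by moment matching on raw feature values. It assumes the convention \(V(x) = \frac{k}{2}(x - x_0)^2\), for which the Boltzmann distribution is Gaussian with variance \(1/(\beta k)\). The function name `fit_harmonic_from_values`, the returned key names, and the prefactor convention are all assumptions and may not match what the library's harmonic priors use.

```python
from statistics import fmean, pvariance


def fit_harmonic_from_values(values, beta):
    """Estimate harmonic parameters from samples by moment matching.

    Assumes V(x) = (k / 2) * (x - x_0)**2, so that var(x) = 1 / (beta * k).
    Hypothetical helper; not the library's fitting routine.
    """
    x_0 = fmean(values)
    var = pvariance(values, mu=x_0)
    if var <= 0.0:
        raise ValueError("values must have non-zero variance")
    k = 1.0 / (beta * var)
    return {"k": k, "x_0": x_0}
```

A fit against the Boltzmann-inverted curve instead of the raw values would generally give slightly different parameters, since it depends on the binning.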