Metrics

Abstract interfaces

class proteka.metrics.calculator.IMetrics[source]

Abstract class defining interface for metrics calculators

abstract compute(Ensemble, metrics)[source]: Method to compute the metrics
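As a rough illustration of the interface pattern, a calculator can subclass an abstract base and implement compute. The sketch below does not import proteka: MetricsInterface and FrameCountMetrics are hypothetical stand-ins, the ensemble is a plain list of frames, and returning a dict of results is an assumption.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable


class MetricsInterface(ABC):
    """Hypothetical stand-in mirroring the shape of IMetrics."""

    @abstractmethod
    def compute(self, ensemble: Any, metrics: Iterable[str]) -> Dict[str, Any]:
        """Compute the requested metrics for the given ensemble."""


class FrameCountMetrics(MetricsInterface):
    """Toy calculator that treats the ensemble as a plain list of frames."""

    def compute(self, ensemble, metrics=("n_frames",)):
        # Map each supported metric name to the function that computes it.
        available = {"n_frames": lambda ens: len(ens)}
        return {name: available[name](ensemble) for name in metrics}


results = FrameCountMetrics().compute([[0.0], [1.0], [2.0]])
```

Instantiating MetricsInterface directly raises a TypeError, which is how the abstract base enforces that subclasses provide compute.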

Implementations

class proteka.metrics.calculator.StructuralIntegrityMetrics[source]

Takes a dataset and checks it for chemical integrity

static ca_clashes(ensemble)[source]

Computes the total number of clashes between CA atoms. A clash is defined as any two nonconsecutive CA atoms being closer than 0.4 nm.

Return type: Dict[str, int]
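The clash definition above can be sketched in plain Python. This is only an illustration of the counting rule, not the library's implementation: count_ca_clashes is a hypothetical helper, frames are lists of CA coordinates in nm ordered along the chain, and summing the count over all frames is an assumption.

```python
import math
from itertools import combinations


def count_ca_clashes(frames, threshold=0.4):
    """Count pairs of nonconsecutive CA atoms closer than `threshold` (nm).

    `frames` is a list of frames; each frame is a list of (x, y, z)
    CA coordinates in nm, ordered along the chain.
    """
    total = 0
    for frame in frames:
        for i, j in combinations(range(len(frame)), 2):
            # j - i > 1 skips consecutive CA atoms (bonded neighbours).
            if j - i > 1 and math.dist(frame[i], frame[j]) < threshold:
                total += 1
    return total


# Frame 1: extended chain, no clash. Frame 2: atom 2 folds back onto atom 0.
frames = [
    [(0.0, 0.0, 0.0), (0.38, 0.0, 0.0), (0.76, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.38, 0.0, 0.0), (0.30, 0.2, 0.0)],
]
n_clashes = count_ca_clashes(frames)
```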

static ca_pseudobonds(ensemble)[source]

Computes the maximum and RMS z-score of the d_ca_ca pseudobond lengths over the ensemble. The z-score is defined as (d_ca_ca - mean(d_ca_ca)) / std(d_ca_ca), where mean(d_ca_ca) and std(d_ca_ca) are parametrized based on analysis of the proteka and PDB databases. Beware that this metric gives high deviations for cis-proline peptide bonds.

Return type: Dict[str, float]
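A minimal sketch of the z-score calculation follows. REF_MEAN and REF_STD are hypothetical placeholders, since the fitted parameters are not listed here; the helper name, the result keys, and taking the maximum over absolute z-scores are likewise assumptions.

```python
import math

# Hypothetical reference parameters in nm; the library's fitted values are
# not given in this documentation.
REF_MEAN = 0.38
REF_STD = 0.005


def ca_pseudobond_zscores(frames, mean=REF_MEAN, std=REF_STD):
    """Return the max absolute and RMS z-score of consecutive CA-CA distances."""
    z = [
        (math.dist(frame[i], frame[i + 1]) - mean) / std
        for frame in frames
        for i in range(len(frame) - 1)
    ]
    return {
        "max_z": max(abs(v) for v in z),
        "rms_z": math.sqrt(sum(v * v for v in z) / len(z)),
    }


# One frame with pseudobond lengths of 0.38 nm and 0.39 nm.
frames = [[(0.0, 0.0, 0.0), (0.38, 0.0, 0.0), (0.77, 0.0, 0.0)]]
scores = ca_pseudobond_zscores(frames)
```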

compute(ensemble, metrics=['ca_clashes', 'ca_pseudobonds'])[source]: Method to compute the metrics

static general_clashes(ensemble, atom_name_pairs, thresholds=None, res_offset=1, stride=None, allowance=0.07, save_frames=False)[source]

Compute clashes between atoms of types atom_name_1/2 according to user-supplied thresholds or according to the method of allowance-modified VDW radii overlap described here:

https://www.cgl.ucsf.edu/chimera/docs/ContributedSoftware/findclash/findclash.html

with VDW radii in nm taken from mdtraj.core.element.Element. If the pair is composed of atom species that can potentially form hydrogen bonds (e.g., (“N”, “O”, “S”)), then an additional default allowance of 0.07 nm is permitted.

Parameters:

ensemble (Ensemble) – Ensemble over which clashes should be detected
atom_name_pairs (List[Tuple[str, str]]) – List of string tuples, each denoting a pair of atom types according to the MDTraj selection language
thresholds (Optional[List[float]]) –
List of clash thresholds for each type pair in atom_name_pairs. If None, the clash thresholds are calculated according to:

thresh = r_vdw_i + r_vdw_j - allowance

for atoms i,j.
allowance (float) – Additional distance tolerance for atoms involved in hydrogen bonding. Only used if thresholds is None. Set to 0.07 nm by default
res_offset (int) – int that determines the minimum residue separation for inclusion in distance calculations; two atoms that belong to residues i and j are included in the calculations if |i-j| > res_offset.
stride (Optional[int]) – If specified, this stride is applied to the trajectory before the distance calculations
save_frames (bool) – If True, the results also contain a keyword frames which denotes the frame indices in which clashes are detected.

Returns:

Dictionary with keys {name1}_{name2}_clashes and values reporting the number of clashes found for those name pairs

Return type:

Dict[str, int]
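The default threshold rule and the residue-separation filter described above might look like the following sketch. The radii are illustrative values hard-coded for this example rather than read from mdtraj.core.element.Element, both helper names are hypothetical, and applying the allowance only when both atoms are hydrogen-bond-capable is one assumed reading of the rule.

```python
# Illustrative van der Waals radii in nm (assumed values for this sketch,
# not read from mdtraj.core.element.Element).
VDW_RADII_NM = {"C": 0.170, "N": 0.155, "O": 0.152, "S": 0.180}
HBOND_ELEMENTS = {"N", "O", "S"}


def clash_threshold(elem_i, elem_j, allowance=0.07):
    """thresh = r_vdw_i + r_vdw_j - allowance.

    The allowance is subtracted only when both elements can take part in
    hydrogen bonds (an assumed reading of the documented rule).
    """
    thresh = VDW_RADII_NM[elem_i] + VDW_RADII_NM[elem_j]
    if elem_i in HBOND_ELEMENTS and elem_j in HBOND_ELEMENTS:
        thresh -= allowance
    return thresh


def is_candidate_pair(res_i, res_j, res_offset=1):
    """Residues i and j are compared only if |i - j| > res_offset."""
    return abs(res_i - res_j) > res_offset


n_o = clash_threshold("N", "O")  # hydrogen-bond-capable pair
c_c = clash_threshold("C", "C")  # no allowance applied
```

With res_offset=1, atoms in the same or adjacent residues are excluded, so only residue pairs at least two positions apart contribute to the clash count.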

class proteka.metrics.calculator.EnsembleQualityMetrics(metrics)[source]

Metrics to compare a target ensemble to the reference ensemble. Input metric configs must be a dictionary of the following form:

{
    "features": {
        "rg": {
            "feature_params": {"atom_selection": "name CA"},
            "metric_params": {"js_div": {"bins": 100}},
        },
        "ca_distances": {
            "feature_params": None,
            "metric_params": {"js_div": {"bins": 100}},
        },
        "dssp": {
            "feature_params": {"digitize": True},
            "metric_params": {
                "mse_ldist": {"bins": np.array([0, 1, 2, 3, 4])}
            },
        },
    }
}

This dictionary specifies the computation and metric parameters for each feature/metric pair used to compare the target and reference ensembles.
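To make the nesting of the config concrete, the sketch below walks a dictionary of this form and lists each feature/metric combination with its parameters. iter_metric_specs is a hypothetical helper written for this example, not part of the library, and treating a feature_params of None as an empty dict is an assumption.

```python
def iter_metric_specs(config):
    """Yield (feature, feature_params, metric, metric_params) tuples."""
    for feature, spec in config["features"].items():
        # A feature_params of None is treated as "no extra parameters".
        feature_params = spec.get("feature_params") or {}
        for metric, metric_params in spec.get("metric_params", {}).items():
            yield feature, feature_params, metric, metric_params


config = {
    "features": {
        "rg": {
            "feature_params": {"atom_selection": "name CA"},
            "metric_params": {"js_div": {"bins": 100}},
        },
        "ca_distances": {
            "feature_params": None,
            "metric_params": {"js_div": {"bins": 100}},
        },
    }
}
specs = list(iter_metric_specs(config))
```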

compute(target, reference)[source]

Computes the metrics that compare the target ensemble to the reference over the specified features. Computed metrics are stored in the EnsembleQualityMetrics.results attribute.

Parameters:

target (Ensemble) – the target ensemble
reference (Ensemble) – the reference ensemble, against which the target ensemble is compared

static compute_metric(target, reference, feature, metric='kl_div', bins=100, **kwargs)[source]

Computes the metric for the desired feature between two ensembles.

Parameters:

target (Ensemble) – target ensemble
reference (Ensemble) – reference ensemble
feature (str) – string specifying the feature over which the metric is computed between the target and reference ensembles. Valid features can be scalars (e.g., EnsembleQualityMetrics.scalar_features) or vector features (e.g., EnsembleQualityMetrics.vector_features)
metric (str) – string specifying the metric to compute for the desired feature between the target and reference ensembles. Valid metrics are contained in EnsembleQualityMetrics.metrics
bins (Union[int, ndarray]) – in the case that the metric is calculated over probability distributions, this integer number of bins or np.ndarray of bins is used to compute histograms for both the target and reference ensembles

Returns:

dict of the form {“{feature}, {metric}” : metric_result} for the specified feature and metric between the target and reference ensembles.

Return type:

result
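For histogram-based metrics, the comparison amounts to binning the feature values of both ensembles on shared bins and computing a divergence between the two distributions. The sketch below does this for the Jensen-Shannon divergence in plain Python. All helper names are hypothetical, the natural logarithm is an assumption, and the library's own normalization and edge handling may differ.

```python
import math


def histogram(values, edges):
    """Normalized histogram of `values` over the bin `edges` (last bin closed)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(counts)):
            last = k == len(counts) - 1
            if edges[k] <= v < edges[k + 1] or (last and v == edges[-1]):
                counts[k] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def kl_div(p, q):
    """Kullback-Leibler divergence, skipping bins where p is zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_div(p, q):
    """Jensen-Shannon divergence: mean KL divergence to the mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_div(p, m) + 0.5 * kl_div(q, m)


def compare_feature(target_vals, reference_vals, feature, edges):
    """Return a dict keyed by "{feature}, {metric}", as described above."""
    p = histogram(target_vals, edges)
    q = histogram(reference_vals, edges)
    return {f"{feature}, js_div": js_div(p, q)}


edges = [0.0, 1.0, 2.0, 3.0]
same = compare_feature([0.5, 1.5, 2.5], [0.2, 1.2, 2.2], "rg", edges)
apart = compare_feature([0.5, 0.6], [2.5, 2.6], "rg", edges)
```

Identical histograms give a divergence of zero, while fully disjoint ones reach the upper bound of ln 2 under the natural-log convention used here.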

classmethod from_config(config_file)[source]

Instantiates an EnsembleQualityMetrics from a config file. The config should follow the structure of this example:

EnsembleQualityMetrics:
  features:
    rmsd:
      feature_params:
        reference_structure: path_to_struct.pdb
        atom_selection: "name CA"
      metric_params:
        js_div:
          bins: 100
        mse_ldist:
          -bins:
            start: 0
            stop: 100
            num: 1000
    ...

For specific metrics, bins can be either an integer or a dictionary of key-value pairs corresponding to the kwargs of np.linspace, used to create equal-width bins over a specific range of values. For 2D metrics, a list of bin options can be specified through the “-” operator.

Parameters: config_file (Union[str, Dict]) – If str, a path to a YAML file specifying feature and config options. If Dict, a dictionary of a loaded YAML file

static parse_config(eqm_config)[source]

Parser for input configurations, loaded from YAML or passed as dictionaries, that contain unparsed bin options

Parameters: eqm_config (Dict) – Unparsed EnsembleQualityMetrics configuration options dictionary
Returns: Parsed EnsembleQualityMetrics configuration options dictionary that can be used for class instantiation.
Return type: eqm_config
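One way to picture the bin parsing is shown below: an integer stays a bin count, a dictionary is expanded as linspace keyword arguments, and a list is parsed entry by entry. This is a sketch under assumptions, not the library's parser: parse_bins is a hypothetical helper, and linspace is a pure-Python stand-in for np.linspace used to keep the example dependency-free.

```python
def linspace(start, stop, num):
    """Pure-Python stand-in for np.linspace (endpoint included)."""
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + k * step for k in range(num)]


def parse_bins(bins):
    """Expand a raw bin option into a bin count or bin edges.

    int  -> kept as a bin count
    dict -> treated as linspace keyword arguments
    list -> each entry parsed in turn (one entry per dimension)
    """
    if isinstance(bins, int):
        return bins
    if isinstance(bins, dict):
        return linspace(**bins)
    if isinstance(bins, list):
        return [parse_bins(b) for b in bins]
    raise TypeError(f"unsupported bin option: {bins!r}")


edges = parse_bins({"start": 0, "stop": 100, "num": 5})
count = parse_bins(100)
edges_2d = parse_bins([{"start": 0, "stop": 1, "num": 3}, 10])
```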