Geometry

mlcg.geometry implements geometry and topology tools for CG systems.

Internal Coordinate Functions

mlcg.geometry.internal_coordinates.compute_distances(pos, mapping, cell_shifts=None)[source]

Compute the distance between the positions in pos following the mapping assuming that mapping indices follow:

i--j

such that:

\[r_{ij} = ||\mathbf{r}_j - \mathbf{r}_i||_{2}\]

In the case of periodic boundary conditions, cell_shifts must be provided so that \(\mathbf{r}_j\) can be outside of the original unit cell.

mlcg.geometry.internal_coordinates.compute_distance_vectors(pos, mapping, cell_shifts=None)[source]

Compute the distance (or displacement) vectors between the positions in pos following the mapping assuming that that mapping indices follow:

i--j

such that:

\[\begin{split}r_{ij} &= ||\mathbf{r}_j - \mathbf{r}_i||_{2} \\ \hat{\mathbf{r}}_{ij} &= \frac{\mathbf{r}_{ij}}{r_{ij}}\end{split}\]

In the case of periodic boundary conditions, cell_shifts must be provided so that \(\mathbf{r}_j\) can be outside of the original unit cell.

mlcg.geometry.internal_coordinates.compute_angles_cos(pos, mapping, cell_shifts=None)[source]

Compute the cosine of the angle between the positions in pos following the mapping assuming that the mapping indices follow:

  j--k
 /
i

\[\begin{split}\cos{\theta_{ijk}} &= \frac{\mathbf{r}_{ij} \cdot \mathbf{r}_{jk}}{ \Vert \mathbf{r}_{ji} \Vert \Vert \mathbf{r}_{kj} \Vert} \\ \mathbf{r}_{ij} &= \mathbf{r}_i - \mathbf{r}_j \\ \mathbf{r}_{kj} &= \mathbf{r}_k - \mathbf{r}_j\end{split}\]

In the case of periodic boundary conditions, cell_shifts must be provided so that \(\mathbf{r}_j\) can be outside of the original unit cell.

mlcg.geometry.internal_coordinates.compute_angles_raw(pos, mapping, cell_shifts=None)[source]

Compute the raw angle (in radians) between the positions in pos following the mapping assuming that the mapping indices follow:

  j--k
 /
i

\[\begin{split}\theta_{ijk} = &\text{atan2}(\Vert \hat{\mathbf{n}} \vert, \mathbf{r}_{ij} \cdot \mathbf{r}_{kj} ) \\ \mathbf{r}_{ij} &= \mathbf{r}_i - \mathbf{r}_j \\ \mathbf{r}_{kj} &= \mathbf{r}_k - \mathbf{r}_j \\ \mathbf{\hat{n}} &= \frac{\mathbf{r}_{ij} \times \mathbf{r}_{kj}}{\Vert \mathbf{r}_{ij} \times \mathbf{r}_{kj} \Vert}\end{split}\]

mlcg.geometry.internal_coordinates.compute_torsions(pos, mapping)[source]

Compute the angle between two planes from positions in :obj:’pos’ following the mapping:

For dihedrals: the angle w.r.t. position of i&l is positive if l i rotated clockwise when staring down bond jk:

  j--k--l
 /
i

For impropers: the angle is positive if when looking in plane ikj, l is rotated clockwise:

k
 \\
  l--j
 /
i

The angle is computed using the formula:

\[\begin{split}\phi_{ijkl} &= \text{atan2}(-\mathbf{m} \cdot \mathbf{n}_2, \mathbf{n}_2 \cdot \mathbf{n}_1) \\ \mathbf{n}_1 &= \mathbf{\hat{r}}_{ji} \times \mathbf{\hat{r}}_{kj} \\ \mathbf{n}_2 &= \mathbf{\hat{r}}_{kj} \times \mathbf{\hat{r}}_{jl} \\ \mathbf{m} &= \mathbf{n}_{1} \times \mathbf{\hat{r}}_{kj}\end{split}\]

Where the \(\hat{u}\) indicates a normalized vector in direction of \(u\). The order and signs in the atan2 arguments are needed to obtain values consistent with mdtraj. In the case of periodic boundary conditions, cell_shifts must be provided so that \(\mathbf{r}_j\) can be outside of the original unit cell.

Topology Utilities

These classes and functions may be used to define, manipulate, and analyze CG topologies through graph representations.

class mlcg.geometry.topology.Atom(type: int, name: str | None = None, resname: str | None = None, resid: int | None = None, charge: None | float = None)[source]

Define an atom

type

Atom type/integer label

Type:: int

name

Atom name

Type:: Optional[str] = None

resname

Name of the residue containing the atom

Type:: Optional[str] = None

resid

Index of the desired residue

Type:: Optional[int] = None

class mlcg.geometry.topology.Topology[source]

Topology of an isolated protein.

add_angle(idx1, idx2, idx3)[source]

Define an angle between three atoms. idx2 represents the apex/central angles:

  2---3
 /
1

Parameters:

idx1 (int) – The index of the first atom defining the angle
idx2 (int) – The index of the central atom defining the angle
idx3 (int) – The index of the last atom defining the angle

Return type:

None

add_bond(idx1, idx2)[source]

Define a bond between two atoms.

Parameters:

idx1 (int) – The index of the first atom in the bond
idx2 (int) – The index of the second atom in the bond

Return type:

None

add_dihedral(idx1, idx2, idx3, idx4)[source]

The dihedral angle formed by a quadruplet of indices (1,2,3,4) is difined around the axis connecting index 2 and 3 (i.e., the angle between the planes spanned by indices (1,2,3) and (2,3,4)):

        4
        |
  2-----3
 /
1

Parameters:

idx1 (int) – The index of the first atom defining the dihedral
idx2 (int) – The index of the second atom defining the dihedral
idx3 (int) – The index of the third atom defining the dihedral
idx3 – The index of the last atom defining the dihedral

Return type:

None

angles: Tuple[List[int], List[int], List[int]]: list of angles formed by triplets of atoms

angles_from_edge_index(edge_index)[source]

Overwrites the internal angle list with the angles defined in the supplied angle edge_index

Parameters:: edge_index (Tensor) – Edge index tensor of shape (3, n_angles)
Return type:: None

bonds: Tuple[List[int], List[int]]: list of bonds between the atoms. Defines the bonded topology.

bonds_from_edge_index(edge_index)[source]

Overwrites the internal bond list with the bonds defined in the supplied bond edge_index

Parameters:: edge_index (Tensor) – Edge index tensor of shape (2, n_bonds)
Return type:: None

charges: List[int]: charge of the atoms

dihedrals: Tuple[List[int], List[int], List[int], List[int]]: list of dihedrals formed by quadruplets of atoms

dihedrals_from_edge_index(edge_index)[source]

Overwrites the internal dihedral list with the dihedral defined in the supplied dihedral edge_index

Parameters:: edge_index (Tensor) – Edge index tensor of shape (4, n_dihedrals)
Return type:: None

draw(layout=<function spring_layout>, layout_kwargs=None, drawing_kwargs=None)[source]

Use NetworkX to draw the current molecular topology. by default, node labels correspond to atom types.

Parameters:

layout (Callable) – NetworkX layout drawing function (from networkx.drawing.layout) that determines the positions of the nodes
layout_kwargs (Optional[Dict]) – keyword arguments for the node layout drawing function
drawing_kwargs (Optional[Dict]) – keyword arguments for nx.draw

Return type:

None

static from_ase(mol, unique=True)[source]

Build topology from an ASE Atoms instance

Warning

The minimum image convention is applied to build the topology.

Parameters:

mol (Atoms) – ASE atoms instance
unique – If True, only the unique bonds and angles will be added to the resulting Topology object. If False, all redundant (backwards) bonds and angles will be added as well.

Returns:

Topology instance based on the ASE input

Return type:

Topology

static from_file(filename)[source]

Uses mdtraj reader to read the input topology.

Return type:: Topology

static from_mdtraj(topology)[source]

Build topology from an existing mdtraj topology.

Parameters:: topology – Input MDTraj topology
Returns:: Topology instance created from the input MDTraj topology
Return type:: Topology

impropers: Tuple[List[int], List[int], List[int], List[int]]: list of impropers formed by quadruplets of atoms

impropers_from_edge_index(edge_index)[source]

Overwrites the internal improper list with the improper defined in the supplied improper edge_index

Parameters:: edge_index (Tensor) – Edge index tensor of shape (4, n_impropers)
Return type:: None

property n_atoms: int: Number of atoms in the topology.

names: List[str]: name of the atoms

neighbor_list(type, device='cpu')[source]

Build Neighborlist from a mlcg.neighbor_list.neighbor_list.Topology.

Parameters:

type (str) – kind of information to extract (should be in [“bonds”, “angles”, “dihedrals”, “fully connected”]).
device (str) – device upon which the neighborlist is returned

Returns:

Neighborlist dictionary

Return type:

Dict

remove_angle(angle_removal_list)[source]

Method to remove angles given list of angles to be removed. The changes are made in place to the Topology.angles attribute.

Warning

The order of the removal list matters, e.g., [1,2,3] and [3,2,1] are treated differently

Parameters:

angle_removal_list (list) –

List of angles, of shape (3, n_angles), to be removed from current angle list. Format: [[index1, index2, index3], …, [index1, index2, index3]]

where index1, index2, index3 are the indices of the first, second, and third atom involved in angle formation, respectively.

Return type:

None

remove_bond(bond_removal_list)[source]

Method to remove bonds given list of bonds to be removed. The changes are made in place to the Topology.bonds attribute.

Warning

The order of the removal list matters, e.g., [1,2] and [2,1] are treated differently.

Parameters:

bond_removal_list (list) –

List of bonds, of shape (2, n_bonds), to be removed from current bond list. Format: [[index1, index2], …, [index1, index2]]

where index1 and index are the indices of the first and second atom involved in bonding, respectively.

Return type:

None

resids: List[int]: number of the resid containing the atoms

resnames: List[str]: name of the residue containing the atoms

to_mdtraj()[source]

Convert to mdtraj format. If the topology does not have a resids attribute, the resids will be written incrementally for each atom.

Returns:: MDTraj topology instance from mlcg Topology
Return type:: mdtraj.Topology

types: List[int]: types of the atoms

mlcg.geometry.topology.add_chain_bonds(topology)[source]

Add bonds to the topology assuming a chain-like pattern, i.e. atoms are linked together following their insertion order. A four atoms chain will are linked like: 1-2-3-4.

Parameters:: topology (Topology) – Topology instance to which the bonds should be added
Return type:: None

mlcg.geometry.topology.add_chain_angles(topology)[source]

Add angles to the topology assuming a chain-like pattern, i.e. angles are defined following the insertion order of the atoms in the topology. A four atoms chain 1-2-3-4 will fine the angles: 1-2-3, 2-3-4.

Parameters:: topology (Topology) – Topology instance to which the angles should be added
Return type:: None

mlcg.geometry.topology.get_connectivity_matrix(topology, directed=False)[source]

Produces a full connectivity matrix from the graph structure implied by Topology.bonds

Parameters:

topology (Topology) – Topology for which a connectivity matrix will be constructed
directed (bool) – If True, an asymmetric connectivity matrix will be returned correspending to a directed graph. If false, the connectivity matrix will be symmetric and the corresponding graph will be undirected.

Returns:

Torch tensor of shape (n_atoms, n_atoms) representing the connectivity/adjacency matrix from the bonded graph.

Return type:

torch.Tensor

mlcg.geometry.topology.get_n_pairs(connectivity_matrix, n=3, unique=True)[source]

This function uses networkx to identify those pairs that are exactly n atoms away. Paths are found using Dijkstra’s algorithm.

Parameters:

connectivity_matrix (Union[Tensor, Graph, array]) – Connectivity/adjacency matrix of the molecular graph of shape (n_atoms, n_atoms) or a networkx graph object
n (int) – Number of atoms to count away from the starting atom, with the starting atom counting as n=1
unique (bool) – If True, the returned pairs will be unique and symmetrised.

Returns:

Edge index tensor of shape (2, n_pairs)

Return type:

torch.Tensor

mlcg.geometry.topology.get_n_paths(connectivity_matrix, n=3, unique=True)[source]

This function use networkx to grab all connected paths defined by n connecting edges. Paths are found using Dijkstra’s algorithm.

Parameters:

connectivity_matrix (Union[Tensor, Graph, array]) – Connectivity/adjacency matrix of the molecular graph of shape (n_atoms, n_atoms) or a networkx graph object
n (int) – Number of atoms to count away from the starting atom, with the starting atom counting as n=1
unique (bool) – If True, the returned pairs will be unique and symmetrised such that the lower bead index precedes the higher bead index in each pair.

Returns:

Path index tensor of shape (n, n_pairs)

Return type:

torch.Tensor

Statistics

These functions gather statistics for further analysis or for parametrizing prior models.

mlcg.geometry.statistics.compute_statistics(data, target, beta, TargetPrior=<class 'mlcg.nn.prior.Harmonic'>, nbins=100, bmin=None, bmax=None, fit_from_values=False, target_fit_kwargs=None)[source]

Function for computing atom type-specific statistics for every combination of atom types present in a collated AtomicData structure.

Parameters:

data (AtomicData) – Input data, in the form of a collated list of individual AtomicData structures.
target (str) – The keyword specifiying with neighbor_list sub_dictionary should be used to gather statisitics
beta (float) –
Inverse thermodynamic temperature:

\[\beta = \frac{1}{k_B T}\]

where \(k_B\) is Boltzmann’s constant and \(T\) is the temperature.
TargetPrior (_Prior) – The class type of prior for which the statistics will be processed.
nbins (int) – The number of bins over which 1-D feature histograms are constructed in order to estimate distributions
bmin (Optional[float]) – If specified, the lower bound of bin edges. If not specified, the lower bound defaults to the lowest value in the input feature
bmax (Optional[float]) – If specified, the upper bound of bin edges. If not specified, the upper bound defaults to the greatest value in the input feature
fit_from_values (bool) – If True, the prior parameters are estimated directly from features values instead of their implied emperical potentials
target_fit_kwargs (Optional[Dict]) – Extra fit options that are prior_specific

Returns:

Dictionary of gathered statistics and estimated parameters based on the TargetPrior. The following key/value pairs are common across all TargetPrior choices:

(*specific_types) : {

    ...

    "p" : torch.Tensor of shape [n_bins], containing the normalized bin counts
        of the of the 1-D feature corresponding to the atom_type group
        (*specific_types) = (specific_types[0], specific_types[1], ...)
    "p_bin": : torch.Tensor of shape [n_bins] containing the bin center values
    "V" : torch.tensor of shape [n_bins], containing the emperically estimated
        free energy curve according to a direct Boltzmann inversion of the
        normalized probability distribution for the feature.
    "V_bin" : torch_tensor of shape [n_bins], containing the bin center values
}

where … indicates other sub-key/value pairs apart from those enumerated above, which may appear depending on the chosen TargetPrior. For example, if TargetPrior is HarmonicBonds, there will also be keys/values associated with estimated bond constants and means.

Return type:

Dict

Example

my_data = AtomicData(
    out={},
    pos=[769600, 3],
    atom_types=[769600],
    n_atoms=[20800],
    neighbor_list={
        bonds={
          tag=[20800],
          order=[20800],
          index_mapping=[2, 748800],
          cell_shifts=[20800],
          rcut=[20800],
          self_interaction=[20800]
        },
        angles={
          tag=[20800],
          order=[20800],
          index_mapping=[3, 977600],
          cell_shifts=[20800],
          rcut=[20800],
          self_interaction=[20800]
        }
    },
    batch=[769600],
    ptr=[20801]
)

angle_stats = bond_stats = compute_statistics(my_data,
     'bonds', beta=beta,
     TargetPrior=HarmonicBonds
)
dihedral_stats = compute_statistics(my_data,
                                    'dihedrals',
                                    beta=beta,
                                    TargetPrior=Dihedral
)

mlcg.geometry.statistics.fit_baseline_models(data, beta, priors_cls, nbins=100, bmin=None, bmax=None)[source]

Function for parametrizing a list of priors based on type-specific interactions contained in a collated AtomicData structure

Parameters:

data (AtomicData) – Input data, in the form of a collated list of individual AtomicData structures.
beta (float) –
Inverse thermodynamic temperature:

\[\]

beta = frac{1}{k_B T}

where \(k_B\) is Boltzmann’s constant and \(T\) is the temperature.
priors_cls (List[_Prior]) – List of priors to parametrize based on the input data
nbins (int) – The number of bins over which 1-D feature histograms are constructed in order to estimate distributions
bmin (Optional[float]) – If specified, the lower bound of bin edges. If not specified, the lower bound defaults to the lowest value in the input feature
bmax (Optional[float]) – If specified, the upper bound of bin edges. If not specified, the upper bound defaults to the greatest value in the input feature

Return type:

Tuple[ModuleDict, Dict]

Returns:

nn.Module – The list of parametrized priors
Dict – Corresponding statistsics for prior fits