Geometry

mlcg.geometry implements geometry and topology tools for CG systems.

Internal Coordinate Functions

mlcg.geometry.internal_coordinates.compute_distances(pos, mapping, cell_shifts=None)[source]

Compute the distance between the positions in pos following the mapping assuming that mapping indices follow:

i--j

such that:

\[r_{ij} = ||\mathbf{r}_j - \mathbf{r}_i||_{2}\]

In the case of periodic boundary conditions, cell_shifts must be provided so that \(\mathbf{r}_j\) can be outside of the original unit cell.

mlcg.geometry.internal_coordinates.compute_distance_vectors(pos, mapping, cell_shifts=None)[source]

Compute the distance (or displacement) vectors between the positions in pos following the mapping assuming that that mapping indices follow:

i--j

such that:

\[\begin{split}r_{ij} &= ||\mathbf{r}_j - \mathbf{r}_i||_{2} \\ \hat{\mathbf{r}}_{ij} &= \frac{\mathbf{r}_{ij}}{r_{ij}}\end{split}\]

In the case of periodic boundary conditions, cell_shifts must be provided so that \(\mathbf{r}_j\) can be outside of the original unit cell.

mlcg.geometry.internal_coordinates.compute_angles(pos, mapping, cell_shifts=None)[source]

Compute the cosine of the angle between the positions in pos following the mapping assuming that the mapping indices follow:

  j--k
 /
i
\[\begin{split}\cos{\theta_{ijk}} &= \frac{\mathbf{r}_{ji} \mathbf{r}_{jk}}{r_{ji} r_{jk}} \\ r_{ji}&= ||\mathbf{r}_i - \mathbf{r}_j||_{2} \\ r_{jk}&= ||\mathbf{r}_k - \mathbf{r}_j||_{2}\end{split}\]

In the case of periodic boundary conditions, cell_shifts must be provided so that \(\mathbf{r}_j\) can be outside of the original unit cell.

mlcg.geometry.internal_coordinates.compute_torsions(pos, mapping)[source]

Compute the angle between two planes from positions in :obj:’pos’ following the mapping:

For dihedrals: the angle w.r.t. position of i&l is positive if l i rotated clockwise when staring down bond jk

j–k–l

/

i

For impropers: the angle is positive if when looking in plane ikj, l is rotated clockwise

k

l–j

/

i

In the case of periodic boundary conditions, cell_shifts must be provided so that \(\mathbf{r}_j\) can be outside of the original unit cell.

Topology Utilities

These classes and functions may be used to define, manipulate, and analyze CG topologies through graph representations.

class mlcg.geometry.topology.Atom(type: int, name: str | None = None, resname: str | None = None, resid: int | None = None, charge: None | float = None)[source]

Define an atom

type

Atom type/integer label

Type:

int

name

Atom name

Type:

Optional[str] = None

resname

Name of the residue containing the atom

Type:

Optional[str] = None

resid

Index of the desired residue

Type:

Optional[int] = None

class mlcg.geometry.topology.Topology[source]

Topology of an isolated protein.

add_angle(idx1, idx2, idx3)[source]

Define an angle between three atoms. idx2 represents the apex/central angles:

  2---3
 /
1
Parameters:
  • idx1 (int) – The index of the first atom defining the angle

  • idx2 (int) – The index of the central atom defining the angle

  • idx3 (int) – The index of the last atom defining the angle

Return type:

None

add_bond(idx1, idx2)[source]

Define a bond between two atoms.

Parameters:
  • idx1 (int) – The index of the first atom in the bond

  • idx2 (int) – The index of the second atom in the bond

Return type:

None

add_dihedral(idx1, idx2, idx3, idx4)[source]

The dihedral angle formed by a quadruplet of indices (1,2,3,4) is difined around the axis connecting index 2 and 3 (i.e., the angle between the planes spanned by indices (1,2,3) and (2,3,4)):

        4
        |
  2-----3
 /
1
Parameters:
  • idx1 (int) – The index of the first atom defining the dihedral

  • idx2 (int) – The index of the second atom defining the dihedral

  • idx3 (int) – The index of the third atom defining the dihedral

  • idx3 – The index of the last atom defining the dihedral

Return type:

None

angles: Tuple[List[int], List[int], List[int]]

list of angles formed by triplets of atoms

angles_from_edge_index(edge_index)[source]

Overwrites the internal angle list with the angles defined in the supplied angle edge_index

Parameters:

edge_index (Tensor) – Edge index tensor of shape (3, n_angles)

Return type:

None

bonds: Tuple[List[int], List[int]]

list of bonds between the atoms. Defines the bonded topology.

bonds_from_edge_index(edge_index)[source]

Overwrites the internal bond list with the bonds defined in the supplied bond edge_index

Parameters:

edge_index (Tensor) – Edge index tensor of shape (2, n_bonds)

Return type:

None

charges: List[int]

charge of the atoms

dihedrals: Tuple[List[int], List[int], List[int], List[int]]

list of dihedrals formed by quadruplets of atoms

dihedrals_from_edge_index(edge_index)[source]

Overwrites the internal dihedral list with the dihedral defined in the supplied dihedral edge_index

Parameters:

edge_index (Tensor) – Edge index tensor of shape (4, n_dihedrals)

Return type:

None

draw(layout=<function spring_layout>, layout_kwargs=None, drawing_kwargs=None)[source]

Use NetworkX to draw the current molecular topology. by default, node labels correspond to atom types.

Parameters:
  • layout (Callable) – NetworkX layout drawing function (from networkx.drawing.layout) that determines the positions of the nodes

  • layout_kwargs (Optional[Dict]) – keyword arguments for the node layout drawing function

  • drawing_kwargs (Optional[Dict]) – keyword arguments for nx.draw

Return type:

None

static from_ase(mol, unique=True)[source]

Build topology from an ASE Atoms instance

Warning

The minimum image convention is applied to build the topology.

Parameters:
  • mol (Atoms) – ASE atoms instance

  • unique – If True, only the unique bonds and angles will be added to the resulting Topology object. If False, all redundant (backwards) bonds and angles will be added as well.

Returns:

Topology instance based on the ASE input

Return type:

Topology

static from_file(filename)[source]

Uses mdtraj reader to read the input topology.

Return type:

Topology

static from_mdtraj(topology)[source]

Build topology from an existing mdtraj topology.

Parameters:

topology – Input MDTraj topology

Returns:

Topology instance created from the input MDTraj topology

Return type:

Topology

impropers: Tuple[List[int], List[int], List[int], List[int]]

list of impropers formed by quadruplets of atoms

impropers_from_edge_index(edge_index)[source]

Overwrites the internal improper list with the improper defined in the supplied improper edge_index

Parameters:

edge_index (Tensor) – Edge index tensor of shape (4, n_impropers)

Return type:

None

property n_atoms: int

Number of atoms in the topology.

names: List[str]

name of the atoms

neighbor_list(type, device='cpu')[source]

Build Neighborlist from a mlcg.neighbor_list.neighbor_list.Topology.

Parameters:
  • type (str) – kind of information to extract (should be in [“bonds”, “angles”, “dihedrals”, “fully connected”]).

  • device (str) – device upon which the neighborlist is returned

Returns:

Neighborlist dictionary

Return type:

Dict

remove_angle(angle_removal_list)[source]

Method to remove angles given list of angles to be removed. The changes are made in place to the Topology.angles attribute.

Warning

The order of the removal list matters, e.g., [1,2,3] and [3,2,1] are treated differently

Parameters:

angle_removal_list (list) –

List of angles, of shape (3, n_angles), to be removed from current angle list. Format: [[index1, index2, index3], …, [index1, index2, index3]]

where index1, index2, index3 are the indices of the first, second, and third atom involved in angle formation, respectively.

Return type:

None

remove_bond(bond_removal_list)[source]

Method to remove bonds given list of bonds to be removed. The changes are made in place to the Topology.bonds attribute.

Warning

The order of the removal list matters, e.g., [1,2] and [2,1] are treated differently.

Parameters:

bond_removal_list (list) –

List of bonds, of shape (2, n_bonds), to be removed from current bond list. Format: [[index1, index2], …, [index1, index2]]

where index1 and index are the indices of the first and second atom involved in bonding, respectively.

Return type:

None

resids: List[int]

number of the resid containing the atoms

resnames: List[str]

name of the residue containing the atoms

to_mdtraj()[source]

Convert to mdtraj format. If the topology does not have a resids attribute, the resids will be written incrementally for each atom.

Returns:

MDTraj topology instance from mlcg Topology

Return type:

mdtraj.Topology

types: List[int]

types of the atoms

mlcg.geometry.topology.add_chain_bonds(topology)[source]

Add bonds to the topology assuming a chain-like pattern, i.e. atoms are linked together following their insertion order. A four atoms chain will are linked like: 1-2-3-4.

Parameters:

topology (Topology) – Topology instance to which the bonds should be added

Return type:

None

mlcg.geometry.topology.add_chain_angles(topology)[source]

Add angles to the topology assuming a chain-like pattern, i.e. angles are defined following the insertion order of the atoms in the topology. A four atoms chain 1-2-3-4 will fine the angles: 1-2-3, 2-3-4.

Parameters:

topology (Topology) – Topology instance to which the angles should be added

Return type:

None

mlcg.geometry.topology.get_connectivity_matrix(topology, directed=False)[source]

Produces a full connectivity matrix from the graph structure implied by Topology.bonds

Parameters:
  • topology (Topology) – Topology for which a connectivity matrix will be constructed

  • directed (bool) – If True, an asymmetric connectivity matrix will be returned correspending to a directed graph. If false, the connectivity matrix will be symmetric and the corresponding graph will be undirected.

Returns:

Torch tensor of shape (n_atoms, n_atoms) representing the connectivity/adjacency matrix from the bonded graph.

Return type:

torch.Tensor

mlcg.geometry.topology.get_n_pairs(connectivity_matrix, n=3, unique=True)[source]

This function uses networkx to identify those pairs that are exactly n atoms away. Paths are found using Dijkstra’s algorithm.

Parameters:
  • connectivity_matrix (Union[Tensor, Graph, array]) – Connectivity/adjacency matrix of the molecular graph of shape (n_atoms, n_atoms) or a networkx graph object

  • n (int) – Number of atoms to count away from the starting atom, with the starting atom counting as n=1

  • unique (bool) – If True, the returned pairs will be unique and symmetrised.

Returns:

Edge index tensor of shape (2, n_pairs)

Return type:

torch.Tensor

mlcg.geometry.topology.get_n_paths(connectivity_matrix, n=3, unique=True)[source]

This function use networkx to grab all connected paths defined by n connecting edges. Paths are found using Dijkstra’s algorithm.

Parameters:
  • connectivity_matrix (Union[Tensor, Graph, array]) – Connectivity/adjacency matrix of the molecular graph of shape (n_atoms, n_atoms) or a networkx graph object

  • n (int) – Number of atoms to count away from the starting atom, with the starting atom counting as n=1

  • unique (bool) – If True, the returned pairs will be unique and symmetrised such that the lower bead index precedes the higher bead index in each pair.

Returns:

Path index tensor of shape (n, n_pairs)

Return type:

torch.Tensor

Statistics

These functions gather statistics for further analysis or for parametrizing prior models.

mlcg.geometry.statistics.compute_statistics(data, target, beta, TargetPrior=<class 'mlcg.nn.prior.Harmonic'>, nbins=100, bmin=None, bmax=None, fit_from_values=False, target_fit_kwargs=None)[source]

Function for computing atom type-specific statistics for every combination of atom types present in a collated AtomicData structure.

Parameters:
  • data (AtomicData) – Input data, in the form of a collated list of individual AtomicData structures.

  • target (str) – The keyword specifiying with neighbor_list sub_dictionary should be used to gather statisitics

  • beta (float) –

    Inverse thermodynamic temperature:

    \[\beta = \frac{1}{k_B T}\]

    where \(k_B\) is Boltzmann’s constant and \(T\) is the temperature.

  • TargetPrior (_Prior) – The class type of prior for which the statistics will be processed.

  • nbins (int) – The number of bins over which 1-D feature histograms are constructed in order to estimate distributions

  • bmin (Optional[float]) – If specified, the lower bound of bin edges. If not specified, the lower bound defaults to the lowest value in the input feature

  • bmax (Optional[float]) – If specified, the upper bound of bin edges. If not specified, the upper bound defaults to the greatest value in the input feature

  • fit_from_values (bool) – If True, the prior parameters are estimated directly from features values instead of their implied emperical potentials

  • target_fit_kwargs (Optional[Dict]) – Extra fit options that are prior_specific

Returns:

Dictionary of gathered statistics and estimated parameters based on the TargetPrior. The following key/value pairs are common across all TargetPrior choices:

(*specific_types) : {

    ...

    "p" : torch.Tensor of shape [n_bins], containing the normalized bin counts
        of the of the 1-D feature corresponding to the atom_type group
        (*specific_types) = (specific_types[0], specific_types[1], ...)
    "p_bin": : torch.Tensor of shape [n_bins] containing the bin center values
    "V" : torch.tensor of shape [n_bins], containing the emperically estimated
        free energy curve according to a direct Boltzmann inversion of the
        normalized probability distribution for the feature.
    "V_bin" : torch_tensor of shape [n_bins], containing the bin center values
}

where indicates other sub-key/value pairs apart from those enumerated above, which may appear depending on the chosen TargetPrior. For example, if TargetPrior is HarmonicBonds, there will also be keys/values associated with estimated bond constants and means.

Return type:

Dict

Example

my_data = AtomicData(
    out={},
    pos=[769600, 3],
    atom_types=[769600],
    n_atoms=[20800],
    neighbor_list={
        bonds={
          tag=[20800],
          order=[20800],
          index_mapping=[2, 748800],
          cell_shifts=[20800],
          rcut=[20800],
          self_interaction=[20800]
        },
        angles={
          tag=[20800],
          order=[20800],
          index_mapping=[3, 977600],
          cell_shifts=[20800],
          rcut=[20800],
          self_interaction=[20800]
        }
    },
    batch=[769600],
    ptr=[20801]
)

angle_stats = bond_stats = compute_statistics(my_data,
     'bonds', beta=beta,
     TargetPrior=HarmonicBonds
)
dihedral_stats = compute_statistics(my_data,
                                    'dihedrals',
                                    beta=beta,
                                    TargetPrior=Dihedral
)
mlcg.geometry.statistics.fit_baseline_models(data, beta, priors_cls, nbins=100, bmin=None, bmax=None)[source]

Function for parametrizing a list of priors based on type-specific interactions contained in a collated AtomicData structure

Parameters:
  • data (AtomicData) – Input data, in the form of a collated list of individual AtomicData structures.

  • beta (float) –

    Inverse thermodynamic temperature:

    \[\]

    beta = frac{1}{k_B T}

    where \(k_B\) is Boltzmann’s constant and \(T\) is the temperature.

  • priors_cls (List[_Prior]) – List of priors to parametrize based on the input data

  • nbins (int) – The number of bins over which 1-D feature histograms are constructed in order to estimate distributions

  • bmin (Optional[float]) – If specified, the lower bound of bin edges. If not specified, the lower bound defaults to the lowest value in the input feature

  • bmax (Optional[float]) – If specified, the upper bound of bin edges. If not specified, the upper bound defaults to the greatest value in the input feature

Return type:

Tuple[ModuleDict, Dict]

Returns:

  • nn.Module – The list of parametrized priors

  • Dict – Corresponding statistsics for prior fits