Geometry
mlcg.geometry implements geometry and topology tools for CG systems.
Internal Coordinate Functions
- mlcg.geometry.internal_coordinates.compute_distances(pos, mapping, cell_shifts=None)[source]
Compute the distance between the positions in
pos
following themapping
assuming that mapping indices follow:i--j
such that:
\[r_{ij} = ||\mathbf{r}_j - \mathbf{r}_i||_{2}\]In the case of periodic boundary conditions,
cell_shifts
must be provided so that \(\mathbf{r}_j\) can be outside of the original unit cell.
- mlcg.geometry.internal_coordinates.compute_distance_vectors(pos, mapping, cell_shifts=None)[source]
Compute the distance (or displacement) vectors between the positions in
pos
following themapping
assuming that that mapping indices follow:i--j
such that:
\[\begin{split}r_{ij} &= ||\mathbf{r}_j - \mathbf{r}_i||_{2} \\ \hat{\mathbf{r}}_{ij} &= \frac{\mathbf{r}_{ij}}{r_{ij}}\end{split}\]In the case of periodic boundary conditions,
cell_shifts
must be provided so that \(\mathbf{r}_j\) can be outside of the original unit cell.
- mlcg.geometry.internal_coordinates.compute_angles(pos, mapping, cell_shifts=None)[source]
Compute the cosine of the angle between the positions in
pos
following themapping
assuming that the mapping indices follow:j--k / i
\[\begin{split}\cos{\theta_{ijk}} &= \frac{\mathbf{r}_{ji} \mathbf{r}_{jk}}{r_{ji} r_{jk}} \\ r_{ji}&= ||\mathbf{r}_i - \mathbf{r}_j||_{2} \\ r_{jk}&= ||\mathbf{r}_k - \mathbf{r}_j||_{2}\end{split}\]In the case of periodic boundary conditions,
cell_shifts
must be provided so that \(\mathbf{r}_j\) can be outside of the original unit cell.
- mlcg.geometry.internal_coordinates.compute_torsions(pos, mapping)[source]
Compute the angle between two planes from positions in :obj:’pos’ following the
mapping
:For dihedrals: the angle w.r.t. position of i&l is positive if l i rotated clockwise when staring down bond jk
j–k–l
/
i
For impropers: the angle is positive if when looking in plane ikj, l is rotated clockwise
- k
l–j
/
i
In the case of periodic boundary conditions,
cell_shifts
must be provided so that \(\mathbf{r}_j\) can be outside of the original unit cell.
Topology Utilities
These classes and functions may be used to define, manipulate, and analyze CG topologies through graph representations.
- class mlcg.geometry.topology.Atom(type: int, name: str | None = None, resname: str | None = None, resid: int | None = None, charge: None | float = None)[source]
Define an atom
- type
Atom type/integer label
- Type:
int
- name
Atom name
- Type:
Optional[str] = None
- resname
Name of the residue containing the atom
- Type:
Optional[str] = None
- resid
Index of the desired residue
- Type:
Optional[int] = None
- class mlcg.geometry.topology.Topology[source]
Topology of an isolated protein.
- add_angle(idx1, idx2, idx3)[source]
Define an angle between three atoms. idx2 represents the apex/central angles:
2---3 / 1
- Parameters:
idx1 (
int
) – The index of the first atom defining the angleidx2 (
int
) – The index of the central atom defining the angleidx3 (
int
) – The index of the last atom defining the angle
- Return type:
None
- add_bond(idx1, idx2)[source]
Define a bond between two atoms.
- Parameters:
idx1 (
int
) – The index of the first atom in the bondidx2 (
int
) – The index of the second atom in the bond
- Return type:
None
- add_dihedral(idx1, idx2, idx3, idx4)[source]
The dihedral angle formed by a quadruplet of indices (1,2,3,4) is difined around the axis connecting index 2 and 3 (i.e., the angle between the planes spanned by indices (1,2,3) and (2,3,4)):
4 | 2-----3 / 1
- Parameters:
idx1 (
int
) – The index of the first atom defining the dihedralidx2 (
int
) – The index of the second atom defining the dihedralidx3 (
int
) – The index of the third atom defining the dihedralidx3 – The index of the last atom defining the dihedral
- Return type:
None
-
angles:
Tuple
[List
[int
],List
[int
],List
[int
]] list of angles formed by triplets of atoms
- angles_from_edge_index(edge_index)[source]
Overwrites the internal angle list with the angles defined in the supplied angle edge_index
- Parameters:
edge_index (
Tensor
) – Edge index tensor of shape (3, n_angles)- Return type:
None
-
bonds:
Tuple
[List
[int
],List
[int
]] list of bonds between the atoms. Defines the bonded topology.
- bonds_from_edge_index(edge_index)[source]
Overwrites the internal bond list with the bonds defined in the supplied bond edge_index
- Parameters:
edge_index (
Tensor
) – Edge index tensor of shape (2, n_bonds)- Return type:
None
-
charges:
List
[int
] charge of the atoms
-
dihedrals:
Tuple
[List
[int
],List
[int
],List
[int
],List
[int
]] list of dihedrals formed by quadruplets of atoms
- dihedrals_from_edge_index(edge_index)[source]
Overwrites the internal dihedral list with the dihedral defined in the supplied dihedral edge_index
- Parameters:
edge_index (
Tensor
) – Edge index tensor of shape (4, n_dihedrals)- Return type:
None
- draw(layout=<function spring_layout>, layout_kwargs=None, drawing_kwargs=None)[source]
Use NetworkX to draw the current molecular topology. by default, node labels correspond to atom types.
- Parameters:
layout (
Callable
) – NetworkX layout drawing function (from networkx.drawing.layout) that determines the positions of the nodeslayout_kwargs (
Optional
[Dict
]) – keyword arguments for the node layout drawing functiondrawing_kwargs (
Optional
[Dict
]) – keyword arguments for nx.draw
- Return type:
None
- static from_ase(mol, unique=True)[source]
Build topology from an ASE Atoms instance
Warning
The minimum image convention is applied to build the topology.
- Parameters:
mol (
Atoms
) – ASE atoms instanceunique – If True, only the unique bonds and angles will be added to the resulting Topology object. If False, all redundant (backwards) bonds and angles will be added as well.
- Returns:
Topology instance based on the ASE input
- Return type:
- static from_mdtraj(topology)[source]
Build topology from an existing mdtraj topology.
- Parameters:
topology – Input MDTraj topology
- Returns:
Topology instance created from the input MDTraj topology
- Return type:
-
impropers:
Tuple
[List
[int
],List
[int
],List
[int
],List
[int
]] list of impropers formed by quadruplets of atoms
- impropers_from_edge_index(edge_index)[source]
Overwrites the internal improper list with the improper defined in the supplied improper edge_index
- Parameters:
edge_index (
Tensor
) – Edge index tensor of shape (4, n_impropers)- Return type:
None
- property n_atoms: int
Number of atoms in the topology.
-
names:
List
[str
] name of the atoms
- neighbor_list(type, device='cpu')[source]
Build Neighborlist from a mlcg.neighbor_list.neighbor_list.Topology.
- Parameters:
type (
str
) – kind of information to extract (should be in [“bonds”, “angles”, “dihedrals”, “fully connected”]).device (
str
) – device upon which the neighborlist is returned
- Returns:
Neighborlist dictionary
- Return type:
Dict
- remove_angle(angle_removal_list)[source]
Method to remove angles given list of angles to be removed. The changes are made in place to the Topology.angles attribute.
Warning
The order of the removal list matters, e.g., [1,2,3] and [3,2,1] are treated differently
- Parameters:
angle_removal_list (list) –
List of angles, of shape (3, n_angles), to be removed from current angle list. Format: [[index1, index2, index3], …, [index1, index2, index3]]
where index1, index2, index3 are the indices of the first, second, and third atom involved in angle formation, respectively.
- Return type:
None
- remove_bond(bond_removal_list)[source]
Method to remove bonds given list of bonds to be removed. The changes are made in place to the Topology.bonds attribute.
Warning
The order of the removal list matters, e.g., [1,2] and [2,1] are treated differently.
- Parameters:
bond_removal_list (list) –
List of bonds, of shape (2, n_bonds), to be removed from current bond list. Format: [[index1, index2], …, [index1, index2]]
where index1 and index are the indices of the first and second atom involved in bonding, respectively.
- Return type:
None
-
resids:
List
[int
] number of the resid containing the atoms
-
resnames:
List
[str
] name of the residue containing the atoms
- to_mdtraj()[source]
Convert to mdtraj format. If the topology does not have a resids attribute, the resids will be written incrementally for each atom.
- Returns:
MDTraj topology instance from mlcg Topology
- Return type:
mdtraj.Topology
-
types:
List
[int
] types of the atoms
- mlcg.geometry.topology.add_chain_bonds(topology)[source]
Add bonds to the topology assuming a chain-like pattern, i.e. atoms are linked together following their insertion order. A four atoms chain will are linked like: 1-2-3-4.
- Parameters:
topology (
Topology
) – Topology instance to which the bonds should be added- Return type:
None
- mlcg.geometry.topology.add_chain_angles(topology)[source]
Add angles to the topology assuming a chain-like pattern, i.e. angles are defined following the insertion order of the atoms in the topology. A four atoms chain 1-2-3-4 will fine the angles: 1-2-3, 2-3-4.
- Parameters:
topology (
Topology
) – Topology instance to which the angles should be added- Return type:
None
- mlcg.geometry.topology.get_connectivity_matrix(topology, directed=False)[source]
Produces a full connectivity matrix from the graph structure implied by Topology.bonds
- Parameters:
topology (
Topology
) – Topology for which a connectivity matrix will be constructeddirected (
bool
) – If True, an asymmetric connectivity matrix will be returned correspending to a directed graph. If false, the connectivity matrix will be symmetric and the corresponding graph will be undirected.
- Returns:
Torch tensor of shape (n_atoms, n_atoms) representing the connectivity/adjacency matrix from the bonded graph.
- Return type:
torch.Tensor
- mlcg.geometry.topology.get_n_pairs(connectivity_matrix, n=3, unique=True)[source]
This function uses networkx to identify those pairs that are exactly n atoms away. Paths are found using Dijkstra’s algorithm.
- Parameters:
connectivity_matrix (
Union
[Tensor
,Graph
,array
]) – Connectivity/adjacency matrix of the molecular graph of shape (n_atoms, n_atoms) or a networkx graph objectn (
int
) – Number of atoms to count away from the starting atom, with the starting atom counting as n=1unique (
bool
) – If True, the returned pairs will be unique and symmetrised.
- Returns:
Edge index tensor of shape (2, n_pairs)
- Return type:
torch.Tensor
- mlcg.geometry.topology.get_n_paths(connectivity_matrix, n=3, unique=True)[source]
This function use networkx to grab all connected paths defined by n connecting edges. Paths are found using Dijkstra’s algorithm.
- Parameters:
connectivity_matrix (
Union
[Tensor
,Graph
,array
]) – Connectivity/adjacency matrix of the molecular graph of shape (n_atoms, n_atoms) or a networkx graph objectn (
int
) – Number of atoms to count away from the starting atom, with the starting atom counting as n=1unique (
bool
) – If True, the returned pairs will be unique and symmetrised such that the lower bead index precedes the higher bead index in each pair.
- Returns:
Path index tensor of shape (n, n_pairs)
- Return type:
torch.Tensor
Statistics
These functions gather statistics for further analysis or for parametrizing prior models.
- mlcg.geometry.statistics.compute_statistics(data, target, beta, TargetPrior=<class 'mlcg.nn.prior.Harmonic'>, nbins=100, bmin=None, bmax=None, fit_from_values=False, target_fit_kwargs=None)[source]
Function for computing atom type-specific statistics for every combination of atom types present in a collated AtomicData structure.
- Parameters:
data (
AtomicData
) – Input data, in the form of a collated list of individual AtomicData structures.target (
str
) – The keyword specifiying with neighbor_list sub_dictionary should be used to gather statisiticsbeta (
float
) –Inverse thermodynamic temperature:
\[\beta = \frac{1}{k_B T}\]where \(k_B\) is Boltzmann’s constant and \(T\) is the temperature.
TargetPrior (
_Prior
) – The class type of prior for which the statistics will be processed.nbins (
int
) – The number of bins over which 1-D feature histograms are constructed in order to estimate distributionsbmin (
Optional
[float
]) – If specified, the lower bound of bin edges. If not specified, the lower bound defaults to the lowest value in the input featurebmax (
Optional
[float
]) – If specified, the upper bound of bin edges. If not specified, the upper bound defaults to the greatest value in the input featurefit_from_values (
bool
) – If True, the prior parameters are estimated directly from features values instead of their implied emperical potentialstarget_fit_kwargs (
Optional
[Dict
]) – Extra fit options that are prior_specific
- Returns:
Dictionary of gathered statistics and estimated parameters based on the TargetPrior. The following key/value pairs are common across all TargetPrior choices:
(*specific_types) : { ... "p" : torch.Tensor of shape [n_bins], containing the normalized bin counts of the of the 1-D feature corresponding to the atom_type group (*specific_types) = (specific_types[0], specific_types[1], ...) "p_bin": : torch.Tensor of shape [n_bins] containing the bin center values "V" : torch.tensor of shape [n_bins], containing the emperically estimated free energy curve according to a direct Boltzmann inversion of the normalized probability distribution for the feature. "V_bin" : torch_tensor of shape [n_bins], containing the bin center values }
where … indicates other sub-key/value pairs apart from those enumerated above, which may appear depending on the chosen TargetPrior. For example, if TargetPrior is HarmonicBonds, there will also be keys/values associated with estimated bond constants and means.
- Return type:
Dict
Example
my_data = AtomicData( out={}, pos=[769600, 3], atom_types=[769600], n_atoms=[20800], neighbor_list={ bonds={ tag=[20800], order=[20800], index_mapping=[2, 748800], cell_shifts=[20800], rcut=[20800], self_interaction=[20800] }, angles={ tag=[20800], order=[20800], index_mapping=[3, 977600], cell_shifts=[20800], rcut=[20800], self_interaction=[20800] } }, batch=[769600], ptr=[20801] ) angle_stats = bond_stats = compute_statistics(my_data, 'bonds', beta=beta, TargetPrior=HarmonicBonds ) dihedral_stats = compute_statistics(my_data, 'dihedrals', beta=beta, TargetPrior=Dihedral )
- mlcg.geometry.statistics.fit_baseline_models(data, beta, priors_cls, nbins=100, bmin=None, bmax=None)[source]
Function for parametrizing a list of priors based on type-specific interactions contained in a collated AtomicData structure
- Parameters:
data (
AtomicData
) – Input data, in the form of a collated list of individual AtomicData structures.beta (
float
) –Inverse thermodynamic temperature:
\[\]beta = frac{1}{k_B T}
where \(k_B\) is Boltzmann’s constant and \(T\) is the temperature.
priors_cls (
List
[_Prior
]) – List of priors to parametrize based on the input datanbins (
int
) – The number of bins over which 1-D feature histograms are constructed in order to estimate distributionsbmin (
Optional
[float
]) – If specified, the lower bound of bin edges. If not specified, the lower bound defaults to the lowest value in the input featurebmax (
Optional
[float
]) – If specified, the upper bound of bin edges. If not specified, the upper bound defaults to the greatest value in the input feature
- Return type:
Tuple
[ModuleDict
,Dict
]- Returns:
nn.Module – The list of parametrized priors
Dict – Corresponding statistsics for prior fits