utils module
- mlcg_tk.input_generator.utils.batch_matmul(map_matrix, X, batch_size)
Perform matrix multiplication in chunks.
- Parameters:
map_matrix – Union[np.ndarray,sparray] of shape (N_CG_ats, N_FG_ats)
X – np.ndarray of shape (M_frames, N_FG_ats, 3)
batch_size – int, the number of rows (from the M dimension) to process at a time.
- Returns:
result – np.ndarray of shape (M_frames, N_CG_ats, 3)
- Return type:
np.ndarray
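The chunked multiplication described above can be sketched in plain NumPy. This is a minimal illustration, not the library's implementation: `batched_cg_matmul` is a hypothetical name, and the sketch assumes a dense `map_matrix` (a sparse map would need a different inner product).

```python
import numpy as np

def batched_cg_matmul(map_matrix, X, batch_size):
    """Hypothetical stand-in: apply a dense (N_CG, N_FG) map to X in frame batches."""
    n_frames = X.shape[0]
    out = np.empty(
        (n_frames, map_matrix.shape[0], X.shape[2]),
        dtype=np.result_type(map_matrix, X),
    )
    for start in range(0, n_frames, batch_size):
        stop = min(start + batch_size, n_frames)
        # (N_CG, N_FG) @ (b, N_FG, 3) broadcasts over the batch to (b, N_CG, 3)
        out[start:stop] = map_matrix @ X[start:stop]
    return out

rng = np.random.default_rng(0)
cg_map = rng.random((2, 5))      # 2 CG beads, 5 fine-grained atoms
frames = rng.random((7, 5, 3))   # 7 frames
batched = batched_cg_matmul(cg_map, frames, batch_size=3)
```

Processing a few frames at a time bounds the size of the temporary arrays, which matters when M_frames is large.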
- mlcg_tk.input_generator.utils.cg_matmul(map_arr, timeseries_arr)
Performs array multiplication for both NumPy arrays and SciPy sparse arrays.
- Parameters:
map_arr (Union[np.ndarray,sparray]) – array of shape (n_beads,n_atoms) representing a linear CG mapping
timeseries_arr (np.ndarray) – array of shape (n_frames,n_atoms,3) holding coordinate or force information
- Return type:
ndarray
- Returns:
np.ndarray of shape (n_frames,n_beads,3) after applying the CG map to every entry of timeseries_arr
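One way to apply a 2D map to a 3D time series with a single 2D product is to flatten the frame and Cartesian axes first. The sketch below is an assumption about how such a product could be written, not the library's code; `apply_cg_map` is a hypothetical name. It relies only on `map_arr @ 2D_array` being defined, which holds for NumPy arrays and should also hold for SciPy sparse arrays.

```python
import numpy as np

def apply_cg_map(map_arr, timeseries_arr):
    """Hypothetical helper: (n_beads, n_atoms) map applied to (n_frames, n_atoms, 3)."""
    n_frames, n_atoms, n_dim = timeseries_arr.shape
    # (n_frames, n_atoms, 3) -> (n_atoms, n_frames * 3) so one 2D product suffices
    flat = timeseries_arr.transpose(1, 0, 2).reshape(n_atoms, n_frames * n_dim)
    mapped = np.asarray(map_arr @ flat)  # (n_beads, n_frames * 3)
    # Undo the flattening: (n_beads, n_frames, 3) -> (n_frames, n_beads, 3)
    return mapped.reshape(-1, n_frames, n_dim).transpose(1, 0, 2)

rng = np.random.default_rng(1)
map_arr = rng.random((3, 6))
timeseries_arr = rng.random((4, 6, 3))
cg_series = apply_cg_map(map_arr, timeseries_arr)
```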
- mlcg_tk.input_generator.utils.chunker(array, n_batches)
Chunks an input array into a specified number of batches.
This function divides the input array into approximately equal-sized chunks. The last chunk may contain more elements if the array length is not perfectly divisible by the number of batches.
Parameters:
- array : np.ndarray or List
The input array to be chunked.
- n_batches : int
The number of batches to divide the array into. Must be a positive integer and less than or equal to the length of the array.
Returns:
- batched_array : List
A list of lists/arrays, where each inner list/array is a chunk of the original array.
Examples:
>>> chunker([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> chunker([1, 2, 3, 4, 5], 2)
[[1, 2], [3, 4, 5]]
>>> chunker([1, 2, 3, 4, 5], 5)
[[1], [2], [3], [4], [5]]
>>> chunker([1, 2, 3, 4, 5], 1)
[[1, 2, 3, 4, 5]]
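The documented behavior (equal-sized chunks, with the remainder going to the last chunk) can be reproduced with integer division. The function below is a hypothetical re-implementation written to match the examples above; the library's own code may differ.

```python
def chunk_into_batches(array, n_batches):
    """Hypothetical sketch: split `array` into n_batches, remainder goes to the last chunk."""
    if not 1 <= n_batches <= len(array):
        raise ValueError("n_batches must be between 1 and len(array)")
    size = len(array) // n_batches
    # The first n_batches - 1 chunks have exactly `size` elements
    chunks = [array[i * size:(i + 1) * size] for i in range(n_batches - 1)]
    # The last chunk takes everything that is left
    chunks.append(array[(n_batches - 1) * size:])
    return chunks

even = chunk_into_batches([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
uneven = chunk_into_batches([1, 2, 3, 4, 5], 2)
```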
- mlcg_tk.input_generator.utils.filter_cis_frames(coords, forces, topology, verbose=True)
Filters out frames containing cis-omega angles.
- Parameters:
coords ([n_frames, n_atoms, 3]) – Non-filtered atomistic coordinates
forces ([n_frames, n_atoms, 3]) – Non-filtered atomistic forces
topology (Topology) – MDTraj topology to load the coordinates with
verbose (bool) – If True, prints a warning containing the number of discarded frames for this sample
- Return type:
Tuple of np.ndarray’s for filtered coarse grained coordinates and forces
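The filtering step can be illustrated with a boolean frame mask. This sketch assumes the omega angles are already available as an array of shape (n_frames, n_omega) in radians (they could be computed with MDTraj, for example), and it uses |omega| < pi/2 as the cis criterion. Both the cutoff and the name `keep_trans_frames` are assumptions made for illustration, not the library's definition.

```python
import numpy as np

def keep_trans_frames(coords, forces, omega, cis_cutoff=np.pi / 2):
    """Hypothetical sketch: drop every frame that contains at least one cis omega angle."""
    # Assumed criterion: an omega angle within cis_cutoff of 0 rad counts as cis
    has_cis = (np.abs(omega) < cis_cutoff).any(axis=1)
    return coords[~has_cis], forces[~has_cis], int(has_cis.sum())

# Three frames, two omega angles each; only the middle frame has a cis angle (0.1 rad)
omega = np.array([[3.1, -3.0], [0.1, 3.1], [-3.1, 3.0]])
coords = np.arange(3 * 4 * 3, dtype=float).reshape(3, 4, 3)
forces = -coords
kept_coords, kept_forces, n_dropped = keep_trans_frames(coords, forces, omega)
```

Applying the same mask to coordinates and forces keeps the two arrays aligned frame by frame.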
- mlcg_tk.input_generator.utils.get_dihedral_groups(top, atoms_needed, offset, tag)
- Parameters:
top (Topology) – MDTraj topology object.
atoms_needed ([4]) – Names of atoms forming dihedrals; each should correspond to an existing atom name in the topology.
offset ([4]) – Residue offset of each atom in atoms_needed from starting point.
tag (Optional[str]) – Dihedral prior tag.
- Return type:
Dictionary of atom groups for each residue corresponding to dihedrals.
Example
To obtain all phi dihedral atom groups for a backbone-preserving resolution:
>>> dihedral_dict = get_dihedral_groups(
...     topology, atoms_needed=["C", "N", "CA", "C"], offset=[-1.,0.,0.,0.], tag="_phi"
... )
For a one-bead-per-residue mapping with only CA atoms preserved:
>>> dihedral_dict = get_dihedral_groups(
...     topology, atoms_needed=["CA", "CA", "CA", "CA"], offset=[-3.,-2.,-1.,0.]
... )
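To make the role of `offset` concrete, the sketch below builds dihedral groups from a plain list of (residue_index, atom_name) pairs instead of an MDTraj topology. Everything here is hypothetical: the function name, the input layout, and the choice to key the result by residue index are illustration choices, and the real function's return structure may differ.

```python
def dihedral_groups_from_atoms(atoms, atoms_needed, offset):
    """Hypothetical sketch. atoms: list of (residue_index, atom_name); position = atom index."""
    index_of = {(res, name): i for i, (res, name) in enumerate(atoms)}
    residues = sorted({res for res, _ in atoms})
    groups = {}
    for res in residues:
        # Each of the four atoms is looked up in the residue shifted by its offset
        keys = [(res + int(off), name) for name, off in zip(atoms_needed, offset)]
        # Skip residues for which a neighbouring residue or atom is missing
        if all(key in index_of for key in keys):
            groups[res] = [index_of[key] for key in keys]
    return groups

# Three residues with backbone atoms N, CA, C (atom indices 0..8)
atoms = [(r, n) for r in range(3) for n in ("N", "CA", "C")]
phi = dihedral_groups_from_atoms(atoms, ["C", "N", "CA", "C"], [-1.0, 0.0, 0.0, 0.0])
```

With an offset of -1 on the first atom, residue 0 yields no phi group because there is no preceding residue to supply the C atom.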
- mlcg_tk.input_generator.utils.get_edges_and_orders(prior_builders, topology)
- Parameters:
prior_builders (List[PriorBuilder]) – List of PriorBuilders to use for defining neighbour lists
topology (Topology) – MDTraj topology object from which atom groups defining each prior term will be created.
cg_dataframe – Dataframe of CG topology (from MDTraj topology object).
- Return type:
List of edges, orders, and tag for each prior term specified in prior_dict.
- mlcg_tk.input_generator.utils.get_output_tag(tag_label, placement='before')
Helper function for combining output tag labels neatly. Fixes issues of connecting/preceding ‘_’ being included in some labels but not others.
- Parameters:
tag_label (List, str) – Either a list of labels to include (ex: for datasets, delta force computation) or individual label item.
placement (str) – Placement of tag in output name. One of: ‘before’, ‘after’.
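The underscore clean-up described above might look like the following. This is a guess at the intent, not the library's behavior: the name `join_tag`, the decision to strip underscores from each label before joining, and the reading of ‘before’ as "the tag precedes the output name, so it ends with an underscore" are all assumptions.

```python
def join_tag(tag_label, placement="before"):
    """Hypothetical sketch: join labels with single underscores, independent of input style."""
    labels = [tag_label] if isinstance(tag_label, str) else list(tag_label)
    # Drop leading/trailing underscores so "_a" and "a_" are treated alike
    cleaned = [label.strip("_") for label in labels if label and label.strip("_")]
    if not cleaned:
        return ""
    joined = "_".join(cleaned)
    if placement == "before":   # assumed: tag comes before the name -> trailing underscore
        return joined + "_"
    if placement == "after":    # assumed: tag comes after the name -> leading underscore
        return "_" + joined
    raise ValueError("placement must be 'before' or 'after'")

prefix = join_tag(["_dataset", "delta_"])
suffix = join_tag("prior", placement="after")
```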
- mlcg_tk.input_generator.utils.get_terminal_atoms(prior_builder, cg_dataframe, N_term=None, C_term=None)
- Parameters:
prior_builder (PriorBuilder)
cg_dataframe (DataFrame) – Dataframe of CG topology (from MDTraj topology object).
N_term (Optional) – Atom used in definition of N-terminus embedding.
C_term (Optional) – Atom used in definition of C-terminus embedding.
- Return type:
Dict
- mlcg_tk.input_generator.utils.map_cg_topology(atom_df, cg_atoms, embedding_function, skip_residues=None)
- Parameters:
atom_df (DataFrame) – Pandas DataFrame row from MDTraj topology.
cg_atoms (List[str]) – List of atoms needed in CG mapping.
embedding_function (str) – Function that slices coordinates; if not provided, the call will fail.
special_typing – Optional dictionary of alternative atom properties to use in assigning types instead of atom names.
skip_residues (Union[List,str,None]) – Optional list of residues to skip when assigning CG atoms (can be used to skip caps, for example). As of now, this skips all instances of a given residue.
- Return type:
New DataFrame columns indicating atom involvement in CG mapping and type assignment.
Example
First obtain a Pandas DataFrame object using the built-in MDTraj function:
>>> top_df = aa_traj.topology.to_dataframe()[0]
For a five-bead resolution mapping without including caps:
>>> cg_atoms = ["N", "CA", "CB", "C", "O"]
>>> embedding_function = embedding_fivebead
>>> skip_residues = ["ACE", "NME"]
Apply row-wise function:
>>> top_df = top_df.apply(map_cg_topology, axis=1, args=(cg_atoms, embedding_function, skip_residues))
- mlcg_tk.input_generator.utils.slice_coord_forces(coords, forces, cg_map, mapping='slice_aggregate', force_stride=100, batch_size=None, atoms_batch_size=None)
- Parameters:
coords ([n_frames, n_atoms, 3]) – Numpy array of atomistic coordinates
forces ([n_frames, n_atoms, 3]) – Numpy array of atomistic forces
cg_map ([n_cg_atoms, n_atomistic_atoms]) – Linear map characterizing the atomistic-to-CG configurational map.
mapping (str) – Mapping scheme to be used. Either a string, which must be ‘slice_aggregate’ or ‘slice_optimize’, or a NumPy array to use directly for the projection.
force_stride (int) – Striding to use for force projection results
batch_size (Optional[int]) – Optional batch length used to divide the AA-to-CG mapping of coords and forces
atoms_batch_size (Optional[int]) – Optional batch size for dividing atoms in coordinates to estimate pairwise constraints
- Return type:
Coarse-grained coordinates and forces
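As an illustration of what a `cg_map` of shape [n_cg_atoms, n_atomistic_atoms] looks like, the sketch below builds a one-hot slice map and applies it to coordinates and forces. It deliberately shows only the plain slice projection: the ‘slice_aggregate’ and ‘slice_optimize’ force schemes, `force_stride`, and the batching options are not reproduced, and `one_hot_slice_map` is a hypothetical helper.

```python
import numpy as np

def one_hot_slice_map(kept_atom_indices, n_atoms):
    """Hypothetical helper: row i selects atom kept_atom_indices[i]."""
    cg_map = np.zeros((len(kept_atom_indices), n_atoms))
    cg_map[np.arange(len(kept_atom_indices)), kept_atom_indices] = 1.0
    return cg_map

rng = np.random.default_rng(2)
coords = rng.random((10, 6, 3))   # 10 frames, 6 atoms
forces = rng.random((10, 6, 3))
cg_map = one_hot_slice_map([1, 4], n_atoms=6)

# Linear projection: contract the atom axis of every frame with the map
cg_coords = np.einsum("ca,fad->fcd", cg_map, coords)
cg_forces = np.einsum("ca,fad->fcd", cg_map, forces)
```

For a one-hot map the projection reduces to selecting the kept atoms, which makes the result easy to check by hand.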
- mlcg_tk.input_generator.utils.split_bulk_termini(N_term, C_term, all_edges)
- Parameters:
N_term – List of atom indices to be split as part of the N-terminal.
C_term – List of atom indices to be split as part of the C-terminal.
all_edges – All atom groups forming part of prior term.
- Return type:
Separated edges for bulk and terminal groups
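The separation into terminal and bulk groups can be sketched with `np.isin`. The layout used here, one atom group per row with shape (n_groups, order), is an assumption made for the example, as is the name `split_groups`; the library may store edges differently and may return the pieces in another order.

```python
import numpy as np

def split_groups(n_term, c_term, all_groups):
    """Hypothetical sketch: split atom groups by whether they touch a terminal atom."""
    all_groups = np.asarray(all_groups)
    # A group belongs to a terminus if any of its atoms is in that terminus list
    in_n = np.isin(all_groups, n_term).any(axis=1)
    in_c = np.isin(all_groups, c_term).any(axis=1)
    bulk = ~(in_n | in_c)
    return all_groups[in_n], all_groups[in_c], all_groups[bulk]

# Bonds along a six-atom chain; atom 0 is N-terminal, atom 5 is C-terminal
bonds = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]])
n_edges, c_edges, bulk_edges = split_groups([0], [5], bonds)
```

In this sketch a group that touches both termini would be returned in both terminal sets.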