utils module

mlcg_tk.input_generator.utils.batch_matmul(map_matrix, X, batch_size)

Apply map_matrix to every frame of X (map_matrix @ X[i]), processing the frames in chunks of batch_size.

Parameters:
  • map_matrix – Union[np.ndarray, sparray] of shape (N_CG_ats, N_FG_ats)

  • X – np.ndarray of shape (M_frames, N_FG_ats, 3)

  • batch_size – int, the number of frames (rows along the M_frames dimension) to process at a time.

Returns:

result – np.ndarray of shape (M_frames, N_CG_ats, 3)

Return type:

np.ndarray
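The chunked product can be sketched in NumPy as follows. `batch_matmul_sketch` is a hypothetical stand-in, not part of mlcg_tk, and it assumes a dense `map_matrix`: the 3D broadcasting of `np.matmul` used here applies to dense NumPy arrays, so a SciPy sparse map would need a different product. It has not been checked against the library's implementation.

```python
import numpy as np


def batch_matmul_sketch(map_matrix, X, batch_size):
    """Hypothetical stand-in for batch_matmul (dense NumPy maps only).

    map_matrix: (N_CG_ats, N_FG_ats); X: (M_frames, N_FG_ats, 3).
    Returns an array of shape (M_frames, N_CG_ats, 3).
    """
    n_frames = X.shape[0]
    result = np.empty(
        (n_frames, map_matrix.shape[0], X.shape[2]),
        dtype=np.result_type(map_matrix, X),
    )
    # Only batch_size frames are multiplied per iteration
    for start in range(0, n_frames, batch_size):
        stop = min(start + batch_size, n_frames)
        # np.matmul broadcasts the 2D map over the leading frame axis:
        # (N_CG, N_FG) @ (b, N_FG, 3) -> (b, N_CG, 3)
        result[start:stop] = np.matmul(map_matrix, X[start:stop])
    return result


# Usage: each CG bead is the mean of two consecutive atoms
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 6, 3))
map_matrix = np.kron(np.eye(3), np.full((1, 2), 0.5))  # shape (3, 6)
chunked = batch_matmul_sketch(map_matrix, X, batch_size=4)
reference = np.einsum("ij,mjk->mik", map_matrix, X)
print(np.allclose(chunked, reference))  # True
```

The last chunk is simply shorter when M_frames is not a multiple of batch_size (here 4, 4 and 2 frames).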

mlcg_tk.input_generator.utils.cg_matmul(map_arr, timeseries_arr)

Apply a linear CG map to every frame of a coordinate or force time series. The map may be either a NumPy array or a SciPy sparse array.

Parameters:
  • map_arr (Union[np.ndarray,sparray]) – array of shape (n_beads,n_atoms) representing a linear CG mapping

  • timeseries_arr (np.ndarray) – array of shape (n_frames,n_atoms,3) holding coordinate or force information

Return type:

ndarray

Returns:

  • np.ndarray of shape (n_frames, n_beads, 3) obtained by applying the CG map to every frame of timeseries_arr
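One way to support both NumPy and SciPy sparse maps is to fold the frame and xyz axes into a single dimension, so that the map is applied through one 2D product, which SciPy sparse arrays handle with `@`. The sketch below (`cg_matmul_sketch`, a hypothetical name) illustrates that idea with a dense map; it is an assumption about how such a function can be written, not a copy of mlcg_tk's code.

```python
import numpy as np


def cg_matmul_sketch(map_arr, timeseries_arr):
    """Hypothetical stand-in for cg_matmul.

    map_arr: (n_beads, n_atoms), a NumPy array or any 2D operand that
    supports `@` with a dense 2D array (e.g. a SciPy sparse array).
    timeseries_arr: (n_frames, n_atoms, 3) coordinates or forces.
    """
    n_frames, n_atoms, n_dims = timeseries_arr.shape
    # (n_frames, n_atoms, 3) -> (n_atoms, n_frames * 3)
    flat = timeseries_arr.transpose(1, 0, 2).reshape(n_atoms, n_frames * n_dims)
    # A single 2D product: (n_beads, n_atoms) @ (n_atoms, n_frames * 3)
    mapped = np.asarray(map_arr @ flat)
    # (n_beads, n_frames * 3) -> (n_beads, n_frames, 3) -> (n_frames, n_beads, 3)
    return mapped.reshape(-1, n_frames, n_dims).transpose(1, 0, 2)


# Usage: a "slice" map with one-hot rows simply selects atoms 1 and 3
coords = np.arange(2 * 4 * 3, dtype=float).reshape(2, 4, 3)
slice_map = np.zeros((2, 4))
slice_map[0, 1] = 1.0
slice_map[1, 3] = 1.0
cg = cg_matmul_sketch(slice_map, coords)
print(cg.shape)  # (2, 2, 3)
print(np.array_equal(cg, coords[:, [1, 3]]))  # True
```

The reshape trick keeps every operation two-dimensional, which is the form sparse matrix products support; the result is the same as the dense einsum "ij,mjk->mik".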

mlcg_tk.input_generator.utils.chunker(array, n_batches)

Chunks an input array into a specified number of batches.

This function divides the input array into approximately equal-sized chunks. The last chunk may contain more elements if the array length is not perfectly divisible by the number of batches.

Parameters:

array : np.ndarray or List

The input array to be chunked.

n_batches : int

The number of batches to divide the array into. Must be a positive integer and less than or equal to the length of the array.

Returns:

batched_array : List

A list of lists/arrays, where each inner list/array is a chunk of the original array.

Examples:

>>> chunker([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> chunker([1, 2, 3, 4, 5], 2)
[[1, 2], [3, 4, 5]]
>>> chunker([1, 2, 3, 4, 5], 5)
[[1], [2], [3], [4], [5]]
>>> chunker([1, 2, 3, 4, 5], 1)
[[1, 2, 3, 4, 5]]
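One reading of this behaviour, consistent with all four examples, is: the first n_batches - 1 chunks hold len(array) // n_batches elements each and the last chunk absorbs the remainder. `chunker_sketch` below is an illustrative re-implementation of that reading, not the library source.

```python
from typing import List, Sequence, Union

import numpy as np


def chunker_sketch(array: Union[np.ndarray, Sequence], n_batches: int) -> List:
    """Split `array` into n_batches consecutive chunks.

    The first n_batches - 1 chunks hold len(array) // n_batches items;
    the last chunk also absorbs the remainder.
    """
    if not 1 <= n_batches <= len(array):
        raise ValueError("n_batches must be between 1 and len(array)")
    size = len(array) // n_batches
    chunks = [array[i * size:(i + 1) * size] for i in range(n_batches - 1)]
    chunks.append(array[(n_batches - 1) * size:])
    return chunks


print(chunker_sketch([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4, 5]]
print([len(c) for c in chunker_sketch(np.arange(10), 4)])  # [2, 2, 2, 4]
```

This differs from `np.array_split`, which spreads the remainder over the first chunks: for 10 elements in 4 batches it gives sizes 3, 3, 2 and 2 instead of 2, 2, 2 and 4.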

mlcg_tk.input_generator.utils.filter_cis_frames(coords, forces, topology, verbose=True)

Filter out frames containing cis omega (peptide bond) dihedral angles.

Parameters:
  • coords ([n_frames, n_atoms, 3]) – Non-filtered atomistic coordinates

  • forces ([n_frames, n_atoms, 3]) – Non-filtered atomistic forces

  • topology (Topology) – mdtraj topology to load the coordinates with

  • verbose (bool) – If True, will print a warning containing the number of discarded frames for this sample

Return type:

Tuple of np.ndarray objects holding the filtered coordinates and forces
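The cis/trans test reduces to computing each omega dihedral and comparing it with a cutoff. The sketch below assumes the omega atom quadruples (CA(i), C(i), N(i+1), CA(i+1)) have already been extracted, for instance from the MDTraj topology (MDTraj also provides `mdtraj.compute_omega`), and treats |omega| < 90° as cis. The function names and the 90° cutoff are assumptions of this sketch; the cutoff used by filter_cis_frames is not documented here.

```python
import numpy as np


def dihedrals(coords, quads):
    """Dihedral angles in radians, shape (n_frames, n_quads).

    coords: (n_frames, n_atoms, 3); quads: (n_quads, 4) atom indices.
    """
    p0, p1, p2, p3 = (coords[:, quads[:, k]] for k in range(4))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1, axis=-1, keepdims=True)
    # Components of b0 and b2 perpendicular to the central bond b1
    v = b0 - np.sum(b0 * b1, axis=-1, keepdims=True) * b1
    w = b2 - np.sum(b2 * b1, axis=-1, keepdims=True) * b1
    x = np.sum(v * w, axis=-1)
    y = np.sum(np.cross(b1, v) * w, axis=-1)
    return np.arctan2(y, x)


def filter_cis_frames_sketch(coords, forces, omega_quads,
                             cis_cutoff=np.pi / 2, verbose=True):
    """Keep frames whose omega angles all satisfy |omega| > cis_cutoff."""
    omega = dihedrals(coords, omega_quads)
    keep = np.all(np.abs(omega) > cis_cutoff, axis=1)
    n_dropped = int((~keep).sum())
    if verbose and n_dropped:
        print(f"Discarding {n_dropped} frame(s) with cis omega angles")
    return coords[keep], forces[keep]


# Usage: frame 0 is trans (omega = 180 deg), frame 1 is cis (omega = 0 deg)
coords = np.array(
    [
        [[0, 1, 0], [0, 0, 0], [1, 0, 0], [1, -1, 0]],
        [[0, 1, 0], [0, 0, 0], [1, 0, 0], [1, 1, 0]],
    ],
    dtype=float,
)
forces = np.zeros_like(coords)
omega_quads = np.array([[0, 1, 2, 3]])
kept_coords, kept_forces = filter_cis_frames_sketch(coords, forces, omega_quads)
print(kept_coords.shape)  # (1, 4, 3)
```

Coordinates and forces are filtered with the same boolean mask, so the remaining frames stay paired.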

mlcg_tk.input_generator.utils.get_dihedral_groups(top, atoms_needed, offset, tag)
Parameters:
  • top (Topology) – MDTraj topology object.

  • atoms_needed ([4]) – Names of the four atoms forming the dihedral; each should match an existing atom name in the topology.

  • offset ([4]) – Residue offset of each atom in atoms_needed from starting point.

  • tag (Optional[str]) – Dihedral prior tag.

Return type:

Dictionary of atom groups for each residue corresponding to dihedrals.

Example

To obtain all phi dihedral atom groups for a backbone-preserving resolution:

>>> dihedral_dict = get_dihedral_groups(
...     topology, atoms_needed=["C", "N", "CA", "C"], offset=[-1., 0., 0., 0.], tag="_phi"
... )

For a one-bead-per-residue mapping with only CA atoms preserved:

>>> dihedral_dict = get_dihedral_groups(
...     topology, atoms_needed=["CA", "CA", "CA", "CA"], offset=[-3., -2., -1., 0.], tag=None
... )
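The role of `offset` can be illustrated without MDTraj. In the sketch below, a hypothetical topology is a list of residues, each a dict from atom name to atom index; for every residue i, atom atoms_needed[k] is taken from residue i + offset[k], and residues for which any required atom is missing are skipped. Keying the result by residue index and ignoring `tag` are simplifications of this sketch; the dictionary returned by get_dihedral_groups may be laid out differently.

```python
from typing import Dict, List, Sequence, Tuple


def dihedral_groups_sketch(
    residues: List[Dict[str, int]],
    atoms_needed: Sequence[str],
    offset: Sequence[float],
) -> Dict[int, Tuple[int, ...]]:
    """For each residue i, take atom atoms_needed[k] from residue
    i + offset[k]; skip residues where any required atom is missing."""
    groups = {}
    for i in range(len(residues)):
        group = []
        for name, off in zip(atoms_needed, offset):
            j = i + int(off)
            # Stop if the offset residue is out of range or lacks the atom
            if not 0 <= j < len(residues) or name not in residues[j]:
                break
            group.append(residues[j][name])
        else:  # runs only when no atom was missing
            groups[i] = tuple(group)
    return groups


# Hypothetical three-residue backbone: atom name -> atom index
residues = [
    {"N": 0, "CA": 1, "C": 2},
    {"N": 3, "CA": 4, "C": 5},
    {"N": 6, "CA": 7, "C": 8},
]
phi = dihedral_groups_sketch(residues, ["C", "N", "CA", "C"], [-1.0, 0.0, 0.0, 0.0])
psi = dihedral_groups_sketch(residues, ["N", "CA", "C", "N"], [0.0, 0.0, 0.0, 1.0])
print(phi)  # {1: (2, 3, 4, 5), 2: (5, 6, 7, 8)}
print(psi)  # {0: (0, 1, 2, 3), 1: (3, 4, 5, 6)}
```

The first residue has no phi group because it has no preceding C atom, and the last has no psi group because it has no following N atom.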

mlcg_tk.input_generator.utils.get_edges_and_orders(prior_builders, topology)
Parameters:
  • prior_builders (List[PriorBuilder]) – List of PriorBuilder objects used to define the neighbour lists

  • topology (Topology) – MDTraj topology object from which atom groups defining each prior term will be created.

  • cg_dataframe – Dataframe of CG topology (from MDTraj topology object).

Return type:

List of edges, orders, and tags for each prior term defined by prior_builders.

mlcg_tk.input_generator.utils.get_output_tag(tag_label, placement='before')

Helper function for combining output tag labels neatly. It handles the ‘_’ separator consistently, since some labels already include a leading or trailing ‘_’ and others do not.

Parameters:
  • tag_label (List, str) – Either a list of labels to include (e.g. for datasets or delta force computation) or an individual label.

  • placement (str) – Placement of tag in output name. One of: ‘before’, ‘after’.

mlcg_tk.input_generator.utils.get_terminal_atoms(prior_builder, cg_dataframe, N_term=None, C_term=None)
Parameters:
  • prior_builder (PriorBuilder)

  • cg_dataframe (DataFrame) – Dataframe of CG topology (from MDTraj topology object).

  • N_term (Optional) – Atom used in the definition of the N-terminus embedding.

  • C_term (Optional) – Atom used in the definition of the C-terminus embedding.

Return type:

Dict

mlcg_tk.input_generator.utils.map_cg_topology(atom_df, cg_atoms, embedding_function, skip_residues=None)
Parameters:
  • atom_df (DataFrame) – Pandas DataFrame row from an MDTraj topology.

  • cg_atoms (List[str]) – List of atoms needed in the CG mapping.

  • embedding_function (str) – Function that slices coordinates; required, as the mapping fails without it.

  • special_typing – Optional dictionary of alternative atom properties to use in assigning types instead of atom names.

  • skip_residues (Union[List, str, None]) – Optional list of residues to skip when assigning CG atoms (can be used to skip caps for example); As of now, this skips all instances of a given residue.

Return type:

New DataFrame columns indicating atom involvement in CG mapping and type assignment.

Example

First obtain a Pandas DataFrame object using the built-in MDTraj function:

>>> top_df = aa_traj.topology.to_dataframe()[0]

For a five-bead resolution mapping without including caps:

>>> cg_atoms = ["N", "CA", "CB", "C", "O"]
>>> embedding_function = embedding_fivebead
>>> skip_residues = ["ACE", "NME"]

Apply the function row-wise, passing the extra arguments through args:

>>> top_df = top_df.apply(
...     map_cg_topology, axis=1, args=(cg_atoms, embedding_function, skip_residues)
... )

mlcg_tk.input_generator.utils.slice_coord_forces(coords, forces, cg_map, mapping='slice_aggregate', force_stride=100, batch_size=None, atoms_batch_size=None)
Parameters:
  • coords ([n_frames, n_atoms, 3]) – Numpy array of atomistic coordinates

  • forces ([n_frames, n_atoms, 3]) – Numpy array of atomistic forces

  • cg_map ([n_cg_atoms, n_atomistic_atoms]) – Linear map characterizing the atomistic-to-CG configurational map.

  • mapping (str) – Mapping scheme to use. Either a string, which must be ‘slice_aggregate’ or ‘slice_optimize’, or a NumPy array used directly for the projection.

  • force_stride (int) – Striding to use for force projection results

  • batch_size (Optional[int]) – Optional number of frames per batch when mapping the atomistic (AA) coordinates and forces to CG ones

  • atoms_batch_size (Optional[int]) – Optional batch size for dividing the atoms in the coordinates when estimating pairwise constraints

Return type:

Coarse-grained coordinates and forces

mlcg_tk.input_generator.utils.split_bulk_termini(N_term, C_term, all_edges)
Parameters:
  • N_term – List of atom indices to be split as part of the N-terminal.

  • C_term – List of atom indices to be split as part of the C-terminal.

  • all_edges – All atom groups forming part of prior term.

Return type:

Separated edges for bulk and terminal groups
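A possible way to separate terminal groups from bulk ones is a membership test on each atom group. The sketch assumes `all_edges` is an integer array of shape (order, n_groups) with one atom group per column, and that a group counts as N- or C-terminal when it contains any atom listed in N_term or C_term. Both the layout and that rule are assumptions, as is the hypothetical name `split_bulk_termini_sketch` and its three-way return value.

```python
import numpy as np


def split_bulk_termini_sketch(N_term, C_term, all_edges):
    """Split atom groups (columns of all_edges) into bulk, N-terminal and
    C-terminal sets. Assumes all_edges has shape (order, n_groups)."""
    all_edges = np.asarray(all_edges)
    # A column is terminal if any of its atoms is listed in N_term / C_term
    in_n = np.isin(all_edges, N_term).any(axis=0)
    in_c = np.isin(all_edges, C_term).any(axis=0)
    bulk = all_edges[:, ~(in_n | in_c)]
    return bulk, all_edges[:, in_n], all_edges[:, in_c]


# Bonds of a five-atom chain 0-1-2-3-4, one bond per column
edges = np.array([[0, 1, 2, 3], [1, 2, 3, 4]])
bulk, n_edges, c_edges = split_bulk_termini_sketch([0], [4], edges)
print(bulk.tolist())     # [[1, 2], [2, 3]]
print(n_edges.tolist())  # [[0], [1]]
print(c_edges.tolist())  # [[3], [4]]
```

Under this rule a group containing atoms from both termini (possible in very short chains) would appear in both terminal sets.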

mlcg_tk.input_generator.utils.with_attrs(**func_attrs)

Set attributes on the decorated function at definition time. Only accepts keyword arguments.
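A decorator factory with this behaviour needs only a `**func_attrs` signature and `setattr`. The sketch below (`with_attrs_sketch`, with made-up attribute names) is a minimal re-implementation of the documented behaviour, not the library source.

```python
def with_attrs_sketch(**func_attrs):
    """Decorator factory: attach each keyword argument as an attribute
    of the decorated function when the function is defined."""

    def decorator(func):
        for name, value in func_attrs.items():
            setattr(func, name, value)
        return func  # the function itself is returned unchanged

    return decorator


# Hypothetical attribute names, for illustration only
@with_attrs_sketch(prior_name="bonds", order=2)
def my_prior():
    return None


print(my_prior.prior_name, my_prior.order)  # bonds 2
```

Because the factory takes only `**func_attrs`, a positional call such as `with_attrs_sketch("bonds")` raises a TypeError, which is what "only accepts keyword arguments" amounts to.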