raw_dataset module¶
- class mlcg_tk.input_generator.raw_dataset.CGDataBatch(cg_coords, cg_forces, cg_embeds, cg_prior_nls, batch_size, stride, weights=None, concat_forces=False)[source]¶
Bases: object
Splits input CG data into batches for further memory-efficient processing
- cg_coords¶
Coarse grained coordinates
- cg_forces¶
Coarse grained forces
- cg_embeds¶
Atom embeddings
- cg_prior_nls¶
Dictionary of prior neighbour lists
- batch_size¶
Number of frames to use in each batch
- stride¶
Integer by which to stride frames
- concat_forces¶
Boolean indicating whether forces should be added to batch
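The interplay of stride and batch_size described above can be illustrated with a minimal sketch. It is not the CGDataBatch implementation: the helper name `iter_strided_batches` is hypothetical, and the sketch assumes that frames are strided first and then split into consecutive batches, with a shorter final batch when the frame count is not a multiple of batch_size.

```python
import numpy as np


def iter_strided_batches(coords, forces, batch_size, stride=1):
    """Yield (coords, forces) batches after keeping every `stride`-th frame.

    Hypothetical helper for illustration only; not part of mlcg_tk.
    """
    # Keep every `stride`-th frame along the first (frame) axis.
    coords = coords[::stride]
    forces = forces[::stride]
    # Split the remaining frames into consecutive batches.
    for start in range(0, len(coords), batch_size):
        stop = start + batch_size
        yield coords[start:stop], forces[start:stop]


# Toy data: 10 frames, 4 CG beads, 3 Cartesian components.
coords = np.arange(10 * 4 * 3, dtype=float).reshape(10, 4, 3)
forces = -coords

batches = list(iter_strided_batches(coords, forces, batch_size=2, stride=2))
batch_shapes = [c.shape for c, _ in batches]
```

With 10 frames, a stride of 2 leaves 5 frames, so a batch size of 2 gives two full batches and one batch holding the single remaining frame.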
- class mlcg_tk.input_generator.raw_dataset.RawDataset(dataset_name, names, tag, n_batches=1, collection_cls=<class 'mlcg_tk.input_generator.raw_dataset.SampleCollection'>)[source]¶
Bases: object
Generates a list of data samples for a specified dataset
- dataset_name¶
Name given to dataset
- names¶
List of sample names
- tag¶
Label given to all output files produced from dataset
- dataset¶
List of SampleCollection objects for all samples in dataset
- class mlcg_tk.input_generator.raw_dataset.SampleCollection(name, tag, n_batches=1)[source]¶
Bases: object
Input generation object for loading, manipulating, and saving training data samples.
- name¶
String associated with atomistic trajectory output.
- tag¶
String to identify dataset in output files.
- pdb_fn¶
File location of atomistic structure to be used for topology.
- add_terminal_embeddings(N_term='N', C_term='C')[source]¶
Adds separate embeddings to the terminal atoms (these do not need to be defined in the original embedding_dict).
- Parameters:
N_term (Optional[str]) – Atom of the N-terminus to which the N_term embedding will be assigned.
C_term (Optional[str]) – Atom of the C-terminus to which the C_term embedding will be assigned.
Either or both of N_term and C_term can be None; in that case only one (or no) terminal embedding will be assigned.
- apply_cg_mapping(cg_atoms, embedding_function, embedding_dict, skip_residues=None)[source]¶
Applies mapping function to atomistic topology to obtain CG representation.
- Parameters:
cg_atoms (List[str]) – List of atom names to preserve in CG representation.
embedding_function (str) – Name of function (should be defined in embedding_maps) to apply CG mapping.
embedding_dict (str) – Name of dictionary (should be defined in embedding_maps) to define embeddings of CG beads.
skip_residues (Optional) – List of residue names to skip (can be used to skip terminal caps, for example). Currently, can only be used to skip all residues with a given name.
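The effect of cg_atoms and skip_residues can be sketched as below. This is an illustrative stand-in, not the mlcg_tk implementation: the function `select_cg_atoms` is hypothetical, the topology is reduced to a list of (residue name, atom name) pairs, and the embedding dictionary is passed directly as a residue-to-integer mapping rather than being looked up by name in embedding_maps.

```python
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, str]  # (residue name, atom name); simplified stand-in for a topology


def select_cg_atoms(
    atoms: List[Atom],
    cg_atoms: List[str],
    embedding_dict: Dict[str, int],
    skip_residues: Optional[List[str]] = None,
) -> Tuple[List[int], List[int]]:
    """Return atomistic indices of retained beads and their embeddings.

    Hypothetical helper for illustration only; not part of mlcg_tk.
    Assumes one embedding per residue name.
    """
    skip = set(skip_residues or [])
    indices, embeddings = [], []
    for idx, (res_name, atom_name) in enumerate(atoms):
        # Drop every atom of a skipped residue (e.g. terminal caps),
        # and every atom whose name is not in the CG selection.
        if res_name in skip or atom_name not in cg_atoms:
            continue
        indices.append(idx)
        embeddings.append(embedding_dict[res_name])
    return indices, embeddings


# Capped dipeptide ACE-ALA-GLY-NME, reduced to a few atoms per residue.
atoms = [
    ("ACE", "CH3"),
    ("ALA", "N"), ("ALA", "CA"), ("ALA", "C"),
    ("GLY", "N"), ("GLY", "CA"), ("GLY", "C"),
    ("NME", "CH3"),
]
indices, embeddings = select_cg_atoms(
    atoms,
    cg_atoms=["CA"],
    embedding_dict={"ALA": 1, "GLY": 2},  # made-up embedding values
    skip_residues=["ACE", "NME"],
)
```

Here only the two alpha carbons survive the mapping, and the caps are removed by residue name, which mirrors the documented limitation that skipping applies to all residues with a given name.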
- get_prior_nls(prior_builders, save_nls=True, **kwargs)[source]¶
Creates neighbourlists for all prior terms specified in the prior_dict.
- Parameters:
prior_builders (List[PriorBuilder]) – List of PriorBuilder objects and their corresponding parameters. Input config file must minimally contain the following information for each builder:
    class_path: class specifying PriorBuilder object implemented in prior_gen.py
    init_args:
        name: string specifying type as one of ‘bonds’, ‘angles’, ‘dihedrals’, ‘non_bonded’
        nl_builder: name of class implemented in prior_nls.py which will be used to collect atom groups associated with the prior term.
save_nls (bool) – If true, will save an output of the molecule’s neighbourlist.
kwargs –
- save_dir:
If save_nls = True, the neighbourlist will be saved to this directory.
- prior_tag:
String identifying the specific combination of prior terms.
- Return type:
Dictionary of prior terms with specific index mapping for the given molecule.
Example
To build neighbour lists for a system with priors for bonds, angles, nonbonded pairs, and phi and psi dihedral angles:

    class_path: input_generator.Bonds
    init_args:
        name: bonds
        separate_termini: true
        nl_builder: input_generator.StandardBonds

    class_path: input_generator.Angles
    init_args:
        name: angles
        separate_termini: true
        nl_builder: input_generator.StandardAngles

    class_path: input_generator.NonBonded
    init_args:
        name: non_bonded
        min_pair: 6
        res_exclusion: 1
        separate_termini: false
        nl_builder: input_generator.Non_Bonded

    class_path: input_generator.Dihedrals
    init_args:
        name: phi
        nl_builder: input_generator.Phi

    class_path: input_generator.Dihedrals
    init_args:
        name: psi
        nl_builder: input_generator.Psi
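As a rough aid for checking such a config before passing it on, the minimal fields named in the parameter description (class_path, plus name and nl_builder under init_args) can be verified with a short sketch. The function `missing_prior_fields` is hypothetical and not part of mlcg_tk, and the sketch assumes the config has already been loaded into a list of plain Python dictionaries, one per builder.

```python
REQUIRED_INIT_ARGS = ("name", "nl_builder")


def missing_prior_fields(builders):
    """Return a list of human-readable problems found in the builder entries.

    Hypothetical helper for illustration only; not part of mlcg_tk.
    """
    problems = []
    for i, entry in enumerate(builders):
        if "class_path" not in entry:
            problems.append(f"entry {i}: missing class_path")
        init_args = entry.get("init_args", {})
        for key in REQUIRED_INIT_ARGS:
            if key not in init_args:
                problems.append(f"entry {i}: missing init_args.{key}")
    return problems


# Two of the entries from the example, written as Python dictionaries.
prior_builders = [
    {
        "class_path": "input_generator.Bonds",
        "init_args": {
            "name": "bonds",
            "separate_termini": True,
            "nl_builder": "input_generator.StandardBonds",
        },
    },
    {
        "class_path": "input_generator.Dihedrals",
        "init_args": {"name": "phi", "nl_builder": "input_generator.Phi"},
    },
]

problems = missing_prior_fields(prior_builders)
```

An empty list means every entry carries the minimal fields; any other optional arguments (such as separate_termini or min_pair) are left unchecked.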
- has_delta_forces_output(training_data_dir, force_tag='', mol_num_batches=1)[source]¶
Returns True if cg data exists for this SampleCollection
Used to skip processing of molecules where all frames have been removed by cis conformation filtering
- Parameters:
training_data_dir (str) – Location of saved cg data
force_tag (str) – String identifying the produced delta forces
mol_num_batches (int) – Number of batches in which the molecule is supposed to be saved
- Return type:
bool
- Returns:
True if cg output for the sample corresponding to force_tag is present in training_data_dir, False otherwise
- has_saved_cg_output(save_dir, prior_tag='')[source]¶
Returns True if cg data exists for this SampleCollection
Used to skip processing of molecules where all frames have been removed by cis conformation filtering
- Parameters:
save_dir (str) – Location of saved cg data
prior_tag (str) – String identifying the specific combination of prior terms
- Return type:
bool
- Returns:
True if cg output for the sample corresponding to prior_tag is present in save_dir, False otherwise
- load_all_batches_training_inputs(training_data_dir, force_tag='', mol_num_batches=1, stride=1)[source]¶
- load_cg_force_map(save_dir)[source]¶
Helper function to load a previously saved force map for the molecule in the sample
- Parameters:
save_dir (str) – Path to the directory where the force map was saved in the first batch of the molecule in the sample
- Return type:
ndarray
- Returns:
force_map, the force map corresponding to the molecule in self
- load_cg_output(save_dir, prior_tag='')[source]¶
Loads all cg data produced by save_cg_output and get_prior_nls
- Parameters:
save_dir (str) – Location of saved cg data
prior_tag (str) – String identifying the specific combination of prior terms
- Return type:
Tuple
- Returns:
Tuple of np.ndarrays containing coarse grained coordinates, forces, embeddings, structure, and prior neighbour list
- load_cg_output_into_batches(save_dir, prior_tag, batch_size, stride, weights_template_fn)[source]¶
Loads saved CG data and splits these into batches for further processing
- Parameters:
save_dir (str) – Location of saved cg data
prior_tag (str) – String identifying the specific combination of prior terms
batch_size (int) – Number of frames to use in each batch
stride (int) – Integer by which to stride frames
- Return type:
Loaded CG data split into list of batches
- load_training_inputs(training_data_dir, force_tag='', stride=1)[source]¶
Loads all cg data produced by save_cg_output and get_prior_nls
- Parameters:
training_data_dir (str) – Location of saved cg data including delta forces
force_tag (str) – String identifying the produced delta forces
- Return type:
Tuple of np.ndarrays containing coarse grained coordinates, delta forces, and embeddings
- process_coords_forces(coords, forces, topology, mapping='slice_aggregate', filter_cis=False, force_stride=100, batch_size=None, atoms_batch_size=None)[source]¶
Maps coordinates and forces to CG resolution
- Parameters:
coords ([n_frames, n_atoms, 3]) – Atomistic coordinates
forces ([n_frames, n_atoms, 3]) – Atomistic forces
topology (Topology) – mdtraj topology to load atomistic coordinates (used for cis-omega angle filtering)
mapping (str) – Mapping scheme to be used, must be either ‘slice_aggregate’ or ‘slice_optimize’.
filter_cis (bool) – If True, frames containing a cis-omega angle will be filtered out
force_stride (int) – Striding to use for force projection results
batch_size (Optional[int]) – Batch size used when projecting the coords and forces to CG resolution
atoms_batch_size (Optional[int]) – Batch size for processing atoms when inferring constrained atoms
- Return type:
Tuple of np.ndarrays for coarse grained coordinates and forces
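The idea of mapping atomistic arrays of shape [n_frames, n_atoms, 3] to CG resolution can be sketched with a plain slice map, where each CG bead simply copies one atom. This is a simplified illustration and not the ‘slice_aggregate’ or ‘slice_optimize’ schemes as implemented in the library: the helpers `slice_map` and `project` are hypothetical, and applying the same slice matrix to the forces is an assumption made for brevity.

```python
import numpy as np


def slice_map(n_atoms, cg_indices):
    """Build a [n_beads, n_atoms] matrix with a single 1 per row.

    Hypothetical helper for illustration only; not part of mlcg_tk.
    """
    cg_map = np.zeros((len(cg_indices), n_atoms))
    cg_map[np.arange(len(cg_indices)), cg_indices] = 1.0
    return cg_map


def project(cg_map, frames):
    """Apply a linear map to every frame: [n_frames, n_atoms, 3] -> [n_frames, n_beads, 3]."""
    return np.einsum("ba,fad->fbd", cg_map, frames)


# Toy data: 5 frames, 6 atoms.
rng = np.random.default_rng(0)
coords = rng.normal(size=(5, 6, 3))
forces = rng.normal(size=(5, 6, 3))

# Keep atoms 1 and 4 as CG beads.
cg_map = slice_map(n_atoms=6, cg_indices=[1, 4])
cg_coords = project(cg_map, coords)
cg_forces = project(cg_map, forces)  # simplification: same slice map used for forces
```

Writing the mapping as a matrix keeps the coordinate and force projections in the same form, so a different force map could be swapped in without changing `project`.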
- save_cg_output(save_dir, save_coord_force=True, save_cg_maps=True, cg_coords=None, cg_forces=None)[source]¶
Saves processed CG data.
- Parameters:
save_dir (str) – Path of directory to which output will be saved.
save_coord_force (bool) – Whether coordinates and forces should also be saved.
cg_coords (Optional[ndarray]) – CG coordinates; if None, will check whether these are saved as an object attribute.
cg_forces (Optional[ndarray]) – CG forces; if None, will check whether these are saved as an object attribute.
- class mlcg_tk.input_generator.raw_dataset.SimInput(dataset_name, tag, pdb_fns, collection_cls=<class 'mlcg_tk.input_generator.raw_dataset.SampleCollection'>)[source]¶
Bases: object
Generates a list of samples from pdb structures to be used in simulation
- dataset_name¶
Name given to dataset
- tag¶
Label given to all output files produced from dataset
- pdb_fns¶
List of pdb filenames from which samples will be generated
- dataset¶
List of SampleCollection objects for all structures