General utilities

mlcg.utils contains tools for common tasks across the mlcg ecosystem, such as reading and writing YAML files, converting tensors to tuples, building dataset splits, and downloading files.

mlcg.utils.load_yaml(fn): Load a YAML file using ruamel.yaml

mlcg.utils.dump_yaml(fn, data): Dump a dictionary into a YAML file using ruamel.yaml

mlcg.utils.tensor2tuple(x)

Helper function that flattens a tensor and returns its elements as a tuple.

Parameters: x (Tensor) – Input tensor
Returns: Output tuple
Return type: tuple

mlcg.utils.make_splits(dataset_len, val_ratio, test_ratio, seed=None, filename=None, splits=None, order=None)

Function for making train, validation, and test sets and then optionally saving them to disk using numpy.savez. Splits are returned as torch tensors.

Parameters:

dataset_len (int) – Dataset length
val_ratio (float) – Ratio of validation set size to dataset size
test_ratio (float) – Ratio of test set size to dataset size
filename (Optional[str]) – Filename for the numpy zipped archive to save the splits, with the keys “idx_train”, “idx_val”, and “idx_test”. If None, the splits are not saved.
splits (Optional[str]) – Filename from which pre-specified splits may be loaded. Must be a valid numpy zipped archive file with the keys “idx_train”, “idx_val”, “idx_test”.
order (Optional[List[int]]) – If specified, the dataset is not shuffled and the sets are taken sequentially from the order list, in the order (train, validation, test)

Return type:

Tuple[Tensor, Tensor, Tensor]

Returns:

idx_train – The indices of training examples in the dataset
idx_val – The indices of validation examples in the dataset
idx_test – The indices of test examples in the dataset
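The split logic described above can be illustrated with a minimal standard-library sketch. It is not the mlcg implementation: `sketch_splits` is a hypothetical name, it returns plain lists instead of torch tensors, it omits the `filename` and `splits` arguments, and the rounding rule (sizes rounded down, remainder assigned to training) is an assumption.

```python
import random
from typing import List, Optional, Tuple


def sketch_splits(
    dataset_len: int,
    val_ratio: float,
    test_ratio: float,
    seed: Optional[int] = None,
    order: Optional[List[int]] = None,
) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical stand-in for the split arithmetic; not mlcg's code."""
    if not 0.0 <= val_ratio + test_ratio <= 1.0:
        raise ValueError("val_ratio + test_ratio must lie in [0, 1]")

    # Assumption: set sizes are rounded down and the remainder is training data.
    n_val = int(val_ratio * dataset_len)
    n_test = int(test_ratio * dataset_len)
    n_train = dataset_len - n_val - n_test

    if order is None:
        # No explicit order: shuffle all indices, reproducibly if a seed is given.
        indices = list(range(dataset_len))
        random.Random(seed).shuffle(indices)
    else:
        # Explicit order: no shuffling, sets are cut sequentially from the list.
        indices = list(order)

    idx_train = indices[:n_train]
    idx_val = indices[n_train:n_train + n_val]
    idx_test = indices[n_train + n_val:n_train + n_val + n_test]
    return idx_train, idx_val, idx_test


# Shuffled split of 100 samples: 70 train, 10 validation, 20 test.
idx_train, idx_val, idx_test = sketch_splits(100, 0.1, 0.2, seed=0)

# Ordered split: the sets follow the given list as (train, validation, test).
ordered = sketch_splits(10, 0.2, 0.2, order=list(range(10)))
```

With `order` given, the seed has no effect in this sketch, which mirrors the documented behaviour that an ordered dataset is not shuffled.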

mlcg.utils.download_url(url, folder, log=True)

Downloads the content of a URL to a specific folder.

Parameters:

url (string) – The URL to download from.
folder (string) – The folder in which the downloaded content is saved.
log (bool, optional) – If False, will not print anything to the console. (default: True)

Adapted from torch_geometric.data.download.py
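For orientation, a helper of this kind might look roughly like the standard-library sketch below. It is not the mlcg or torch_geometric code: `sketch_download_url` is a hypothetical name, and deriving the file name from the last URL component, skipping files that already exist, and returning the local path are all assumptions.

```python
import os
import urllib.request


def sketch_download_url(url: str, folder: str, log: bool = True) -> str:
    """Hypothetical stand-in; not the mlcg or torch_geometric implementation."""
    # Assumption: the local file name is the last path component of the URL,
    # with any query string dropped.
    filename = url.rpartition("/")[2].split("?")[0]
    path = os.path.join(folder, filename)

    # Assumption: an existing file is reused instead of being downloaded again.
    if os.path.exists(path):
        if log:
            print(f"Using existing file {filename}")
        return path

    if log:
        print(f"Downloading {url}")

    # Create the target folder if needed, then copy the response body to disk.
    os.makedirs(folder, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(path, "wb") as handle:
        handle.write(response.read())
    return path
```

Passing `log=False` suppresses both messages, matching the documented meaning of the `log` flag.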