2) Computing statistics and fitting priors

If a prior model has not already been created for a given set of samples, one can be generated by first computing the features defined in the prior terms and then collecting statistics of these features from the input data, using the following command:

mlcg-tk-fit_priors compute_statistics --config configuration_files/trpcage_stats.yaml --config configuration_files/trpcage_priors.yaml

It is also possible to save statistics for individual samples of a dataset by specifying save_sample_statistics=True in the configuration file trpcage_stats.yaml, in which case statistics for the entire dataset will NOT be accumulated. Individual sample statistics can be merged as outlined in 2.1.

2.1*) Merging statistics from different datasets

For situations where priors are to be fitted using simulation data from multiple datasets, statistics are computed individually for each dataset. These statistics can then be combined before fitting priors using the following:

mlcg-tk-merge_statistics --config configuration_files/trpcage_priors.yaml --save_dir path_to_output_directory --names '[dataset_tag_1, dataset_tag_2, etc]'

The above command merges statistics from multiple datasets. If, however, individual sample statistics have been computed by specifying save_sample_statistics=True as detailed above, these can be merged by providing the sample names (the same as the names options used in Step 1) and either including a dataset tag in the configuration file or passing the dataset name using --tag dataset_tag. This option gives finer control and helps with debugging when individual samples in the dataset produce problematic statistics.
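
As a conceptual illustration of what merging statistics amounts to, the sketch below sums per-dataset histogram counts. It assumes that the statistics of a feature are stored as bin counts over bin edges shared by all datasets, which is an assumption made for illustration and not a description of the mlcg-tk file format. The function name merge_histograms is hypothetical.

```python
import numpy as np


def merge_histograms(histograms):
    """Sum bin counts from several datasets (hypothetical helper).

    `histograms` is a list of (counts, edges) pairs. All pairs are assumed
    to use identical bin edges; otherwise the counts cannot be added.
    """
    reference_edges = np.asarray(histograms[0][1])
    total = np.zeros(len(reference_edges) - 1, dtype=np.int64)
    for counts, edges in histograms:
        edges = np.asarray(edges)
        same_shape = edges.shape == reference_edges.shape
        if not (same_shape and np.allclose(edges, reference_edges)):
            raise ValueError("all datasets must share the same bin edges")
        total += np.asarray(counts, dtype=np.int64)
    return total, reference_edges


# Two synthetic "datasets" of the same feature, binned on a common grid.
rng = np.random.default_rng(0)
bin_edges = np.linspace(0.0, 1.0, 11)
dataset_a = rng.uniform(0.0, 1.0, size=1000)
dataset_b = rng.uniform(0.0, 1.0, size=500)

hist_a = np.histogram(dataset_a, bins=bin_edges)
hist_b = np.histogram(dataset_b, bins=bin_edges)
merged_counts, merged_edges = merge_histograms([hist_a, hist_b])
```

Under this assumption, summing counts gives the same result as histogramming the pooled data, so each dataset contributes in proportion to its number of frames.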

2.2) Fitting priors

After gathering statistics, we can estimate the free energy \(\tilde{U}(x)\) using a simple Boltzmann inversion:

\[\tilde{U}(x) = -\frac{1}{\beta} \log{\left( \mathbb{P}(x)\right)}\]

where \(\mathbb{P}(x)\) is the distribution of the feature \(x\) associated with a prior term and \(\beta\) is the inverse temperature. The prior term, a curve with a predefined functional form, is then fitted to match \(\tilde{U}(x)\) with the following command:

mlcg-tk-fit_priors fit_priors --config configuration_files/trpcage_fit.yaml

This will produce a *prior_model.pt file in the save directory. For more details on the functional form and fitting algorithms of each prior, please check the API documentation of mlcg_tk.input_generator.prior_fit.
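
To make the Boltzmann inversion and the subsequent curve fit concrete, here is a minimal NumPy sketch. It is not the mlcg-tk implementation: the function names, the harmonic form \(k (x - x_0)^2 + V_0\), the kcal/mol units, and the minimum-count cutoff are all assumptions chosen for illustration. The data are synthetic samples drawn from a known harmonic potential, so the recovered parameters can be compared with the true ones.

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K); unit choice is an assumption


def boltzmann_inversion(samples, temperature=300.0, n_bins=50, min_count=50):
    """Estimate U(x) = -(1/beta) log P(x) from a histogram of the feature.

    Bins with fewer than `min_count` samples are dropped: empty bins have
    no defined logarithm and sparsely populated bins are very noisy.
    """
    beta = 1.0 / (KB * temperature)
    counts, edges = np.histogram(samples, bins=n_bins)
    widths = np.diff(edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = counts >= min_count
    density = counts[keep] / (counts.sum() * widths[keep])
    return centers[keep], -np.log(density) / beta


def fit_harmonic(x, energy):
    """Least-squares fit of k * (x - x0)**2 + v0 through a quadratic polynomial."""
    a, b, c = np.polyfit(x, energy, deg=2)
    x0 = -b / (2.0 * a)
    v0 = c - a * x0**2
    return a, x0, v0


# Synthetic feature: samples of exp(-beta * k * (x - x0)**2) are Gaussian
# with variance kB * T / (2 * k).
rng = np.random.default_rng(0)
k_true, x0_true, temperature = 100.0, 3.8, 300.0
sigma = np.sqrt(KB * temperature / (2.0 * k_true))
samples = rng.normal(x0_true, sigma, size=200_000)

x, energy = boltzmann_inversion(samples, temperature)
k_fit, x0_fit, v0_fit = fit_harmonic(x, energy)
print(f"k = {k_fit:.1f} (true {k_true}), x0 = {x0_fit:.3f} (true {x0_true})")
```

Dropping poorly populated bins before the fit is one simple way to keep noisy tails from dominating an unweighted least-squares fit; a weighted fit would be an alternative.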

Note

Prior fitting, like any curve-fitting procedure, can be prone to problems due to the nature of the statistical data from simulation. We strongly recommend checking the correctness of the fit by inspecting the parameters manually.

The notebook in examples/prior_analysis/prior_check.ipynb is an example of how to inspect a prior manually.