2) Computing statistics and fitting priors
If a prior model has not already been created for a given set of samples, this can be generated by first computing features defined in the prior terms and then collecting statistics of these features from the input data, using the following command:
mlcg-tk-fit_priors compute_statistics --config configuration_files/trpcage_stats.yaml --config configuration_files/trpcage_priors.yaml
It is also possible to save statistics for individual samples of a dataset by specifying
save_sample_statistics=True in the configuration file trpcage_stats.yaml, in which case statistics for the
entire dataset will NOT be accumulated. Individual sample statistics can be merged as
outlined in 2.1.
2.1*) Merging statistics from different datasets
For situations where priors are to be fitted using simulation data from multiple datasets, statistics are computed individually for each dataset. These statistics can then be combined before fitting priors using the following:
mlcg-tk-merge_statistics --config configuration_files/trpcage_priors.yaml --save_dir path_to_output_directory --names '[dataset_tag_1, dataset_tag_2, etc]'
The above command merges statistics from multiple datasets. If, however, individual
sample statistics have been computed by specifying save_sample_statistics=True as
detailed above, these can be merged by providing the sample names (the same names used
for the names option in Step 1) and either including a dataset tag in the configuration
file or passing the dataset name using --tag dataset_tag. This option allows for more
control and easier debugging in case individual samples in the dataset produce problematic
statistics.
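Conceptually, when statistics are stored as histogram counts on a shared set of bins, merging them amounts to adding the counts bin by bin. The following is a minimal sketch of that idea only. It is not the mlcg-tk implementation, and the function name `merge_histograms`, the dictionary layout, and the example counts are all hypothetical.

```python
import numpy as np


def merge_histograms(per_dataset_counts):
    """Sum histogram counts from several datasets that share the same bins.

    `per_dataset_counts` maps a (hypothetical) dataset tag to a 1D sequence
    of bin counts. All sequences must have the same length.
    """
    arrays = [np.asarray(c, dtype=float) for c in per_dataset_counts.values()]
    if not arrays:
        raise ValueError("no statistics to merge")
    shape = arrays[0].shape
    if any(a.shape != shape for a in arrays):
        raise ValueError("all histograms must use the same binning")
    # Counts are additive, so the merged histogram is the element-wise sum.
    return np.sum(arrays, axis=0)


# Made-up counts for two datasets, for illustration only.
counts = {
    "dataset_tag_1": [0, 3, 5, 2],
    "dataset_tag_2": [1, 4, 4, 1],
}
merged = merge_histograms(counts)
```

Summing raw counts rather than averaging normalized distributions keeps each dataset weighted by the number of frames it contributes, which is usually the desired behavior when pooling simulation data.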
2.2) Fitting priors
After gathering statistics, we can estimate a free energy \(\tilde{U}(x)\) using a simple Boltzmann inversion:
\[\tilde{U}(x) = -k_B T \ln \mathbb{P}(x)\]
where \(\mathbb{P}(x)\) is the distribution of the feature \(x\) associated with a prior term, \(k_B\) is the Boltzmann constant, and \(T\) is the temperature. The prior term, a curve with a predefined functional form, is then fitted to match \(\tilde{U}(x)\) with the following command:
mlcg-tk-fit_priors fit_priors --config configuration_files/trpcage_fit.yaml
This will produce a *prior_model.pt file in the save directory. For more details on the
functional form and fitting algorithms of each prior, please check the API documentation of
mlcg_tk.input_generator.prior_fit.
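To make the Boltzmann inversion step concrete, here is a minimal NumPy sketch that turns samples of a feature into a free energy estimate. It is an illustration under stated assumptions, not the code used by mlcg-tk: the function name `boltzmann_inversion`, the energy unit (kcal/mol), the temperature of 300 K, the number of bins, and the synthetic bond-length data are all chosen for the example.

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal/(mol*K); unit choice is an assumption


def boltzmann_inversion(samples, bins=50, temperature=300.0):
    """Estimate a free energy U(x) = -kB * T * ln P(x) from feature samples.

    Returns the centers of the populated bins and the free energy there,
    shifted so that its minimum is zero.
    """
    counts, edges = np.histogram(samples, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mask = counts > 0  # empty bins would give log(0), so they are skipped
    prob = counts[mask] / counts.sum()
    free_energy = -KB * temperature * np.log(prob)
    free_energy -= free_energy.min()  # only differences in U are meaningful
    return centers[mask], free_energy


# Synthetic "bond length" samples, for illustration only.
rng = np.random.default_rng(0)
bond_lengths = rng.normal(loc=3.8, scale=0.05, size=100_000)
x, u = boltzmann_inversion(bond_lengths)
x_min = x[np.argmin(u)]  # location of the free energy minimum
```

Bins with zero counts are dropped rather than assigned an infinite energy. Sparsely populated bins at the edges of the distribution still carry large statistical noise, which is one reason a fit to \(\tilde{U}(x)\) can go wrong.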
Note
Prior fitting, like any curve-fitting procedure, can be prone to problems due to the nature of the statistical data from simulation. We strongly recommend checking the correctness of the fit by inspecting the parameters manually.
The notebook in examples/prior_analysis/prior_check.ipynb is an example of how to inspect
a prior manually.
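As a rough idea of what such a manual check can look like, the sketch below fits a harmonic curve to a free energy profile and reports a few sanity checks on the fitted parameters. It does not reproduce the notebook or the mlcg-tk fitting routines. The harmonic form, the helper names `fit_harmonic` and `check_harmonic_fit`, the tolerance, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fit_harmonic(x, u):
    """Least-squares fit of u ~ 0.5 * k * (x - x0)**2 + offset.

    A quadratic a*x**2 + b*x + c is fitted and converted to (k, x0, offset).
    """
    a, b, c = np.polyfit(x, u, deg=2)
    k = 2.0 * a
    x0 = -b / (2.0 * a)
    offset = c - a * x0**2
    return k, x0, offset


def check_harmonic_fit(x, u, k, x0, offset, tol=0.1):
    """Return simple diagnostics for a fitted harmonic prior."""
    fitted = 0.5 * k * (x - x0) ** 2 + offset
    rmse = float(np.sqrt(np.mean((fitted - u) ** 2)))
    return {
        # A negative force constant means the curve has no stable minimum.
        "positive_force_constant": bool(k > 0),
        # The minimum should lie where data was actually sampled.
        "minimum_inside_data": bool(x.min() <= x0 <= x.max()),
        "rmse": rmse,
        "rmse_below_tol": bool(rmse < tol),
    }


# Synthetic free energy profile with a small perturbation, for illustration only.
x = np.linspace(3.6, 4.0, 41)
u = 0.5 * 200.0 * (x - 3.8) ** 2 + 0.01 * np.sin(40 * x)
k, x0, offset = fit_harmonic(x, u)
report = check_harmonic_fit(x, u, k, x0, offset)
```

Checks of this kind catch the most common failures, such as a fitted minimum outside the sampled range or a force constant with the wrong sign. Plotting the fitted curve over the free energy estimate remains the most direct way to judge a fit.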