Inference with EMCEE
The major changes are:
Under the 'inference' section, add a method: emcee entry, and
Under the 'statistics' section, move the resample-specific configuration under a 'resample' subsection, as shown below:
In addition to those configuration changes, new likelihood statistics are now offered: pois, norm/norm_homoskedastic, norm_cov/norm_heteroskedastic, nbinom, rmse, and absolute_error, as well as new regularizations: forecast and allsubpops.
You can test your updated config by running:
If it works, it should produce:
Plots of simulation directly from your config,
Plots after the fits with the fits and the parameter chains,
An h5 file with all the chains, and
The usual model_output/ directory.
It will also immediately print output to standard out similar to the following (the details depend on your config):
Here, it says the config fits 92 parameters. Keep that in mind and choose a number of walkers greater than (ideally 2 times) this number of parameters.
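The rule of thumb above can be sketched as a small helper. This is only an illustration: suggest_nwalkers is a hypothetical name, not part of flepiMoP or gempyor, and the default factor of 2 is taken from the "ideally 2 times" guidance in this guide.

```python
import math


def suggest_nwalkers(n_params: int, factor: float = 2.0) -> int:
    """Suggest a walker count for a config that fits `n_params` parameters.

    Hypothetical helper (not a flepiMoP API). It returns `factor` times the
    parameter count, rounded up, and never less than n_params + 1 so that
    the result is strictly greater than the number of parameters.
    """
    if n_params < 1:
        raise ValueError("n_params must be a positive integer")
    scaled = math.ceil(factor * n_params)
    return max(scaled, n_params + 1)


if __name__ == "__main__":
    # 92 is the parameter count reported for the example config above.
    print(suggest_nwalkers(92))
```

Rounding the result up to a convenient number (for example, a multiple of the CPU count) is a reasonable choice as long as it stays above this minimum.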
First, install flepiMoP on the cluster following the Running On A HPC With Slurm guide. Then manually create a batch file to submit to Slurm like so:
Breaking down what each of these lines does:
#SBATCH --ntasks 1: Requests that this be run as a single task,
#SBATCH --nodes 1: Requests that the job be run on 1 node; as of right now EMCEE only supports a single node,
#SBATCH --mem 450g: Requests 450GB of memory for the whole job; plan for ~2-3GB per walker,
#SBATCH --cpus-per-task 256: Requests 256 CPUs for the whole job (technically 256 per task, but ntasks should be set to 1 for EMCEE),
#SBATCH --time 20:00:00: Specifies a time limit of 20hrs for this job to complete in, and
flepimop-calibrate ...: Runs the calibration with the following options:
--config config_NC_emcee.yml: Use the config_NC_emcee.yml file for this calibration run,
--nwalkers 500: Use 500 walkers (or chains) for this calibration; this should be about 2x the number of parameters,
--jobs 256: The number of walkers to run in parallel; this should be either 1x or 0.5x the number of CPUs,
--niterations: The number of iterations to run for each walker,
--nsamples: The number of posterior samples (taken from the end of each walker) to save to the model_output/ directory, and
--id: An optional short but unique job name; if not explicitly provided, one will be generated from the config.
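Before submitting, it can help to check the settings against the rules of thumb in the list above. The sketch below is an illustration only: check_settings and build_command are hypothetical helpers, not part of flepiMoP, and they use only the flags named in this guide. The --niterations and --nsamples values in the usage example are placeholders, not recommendations.

```python
from typing import List, Optional


def check_settings(n_params: int, nwalkers: int, jobs: int, cpus: int) -> List[str]:
    """Return warnings for settings that break this guide's rules of thumb.

    Hypothetical helper. An empty list means every check passed.
    """
    warnings = []
    if nwalkers <= n_params:
        warnings.append("nwalkers should be greater than the number of parameters")
    elif nwalkers < 2 * n_params:
        warnings.append("nwalkers is below the suggested 2x the number of parameters")
    if jobs != cpus and 2 * jobs != cpus:
        warnings.append("jobs should be either 1x or 0.5x the number of CPUs")
    return warnings


def build_command(config: str, nwalkers: int, jobs: int, niterations: int,
                  nsamples: int, run_id: Optional[str] = None) -> List[str]:
    """Assemble the flepimop-calibrate argument list from the flags above."""
    cmd = [
        "flepimop-calibrate",
        "--config", config,
        "--nwalkers", str(nwalkers),
        "--jobs", str(jobs),
        "--niterations", str(niterations),
        "--nsamples", str(nsamples),
    ]
    if run_id is not None:
        # --id is optional; a name is generated from the config when omitted.
        cmd += ["--id", run_id]
    return cmd


if __name__ == "__main__":
    # 92 parameters, 500 walkers, 256 jobs and 256 CPUs come from this guide.
    for message in check_settings(n_params=92, nwalkers=500, jobs=256, cpus=256):
        print("warning:", message)
    # niterations=1000 and nsamples=100 are placeholder values (assumptions).
    print(" ".join(build_command("config_NC_emcee.yml", 500, 256, 1000, 100)))
```

Keeping the checks separate from the command builder means the same checks can be run against an existing batch file's values without regenerating it.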
For more details on other options provided by gempyor for calibration, please see flepimop-calibrate --help.
At this stage, postprocessing for EMCEE outputs is fairly manual. A good starting point can be found in postprocessing/emcee_postprocess.ipynb, which plots the chains and can run forward projections from the samples drawn during calibration.