
Inference with EMCEE


Last updated 4 months ago

Config Changes Relative To Classical Inference

The major changes are:

  1. Under the 'inference' section, add a method: emcee entry, and

  2. Under the 'statistics' section, move the resample-specific configuration under a 'resample' subsection, as shown below:

In addition to those configuration changes, new likelihood statistics are now offered: pois, norm/norm_homoskedastic, norm_cov/norm_heteroskedastic, nbinom, rmse, and absolute_error, as well as new regularizations: forecast and allsubpops.

Running Locally

You can test your updated config by running:

flepimop-calibrate -c config_emcee.yml --nwalkers 5  --jobs 5 --niterations 10 --nsamples 5 --id my_run_id

If it works, it should produce:

  • Plots of simulation directly from your config,

  • Plots after fitting, showing the fits and the parameter chains,

  • An h5 file with all the chains, and

  • The usual model_output/ directory.
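A quick way to confirm that the test run produced its outputs is to check for them from the shell. The sketch below assumes that the h5 file and model_output/ are written to the directory the command was run from. The check_emcee_outputs helper is hypothetical and not part of flepiMoP. It does not check for the plots, because their file names are not given here.

```bash
#!/bin/bash
# Hypothetical helper (not part of flepiMoP): report whether the expected
# calibration outputs exist in a directory (default: current directory).
check_emcee_outputs() {
  local dir="${1:-.}"
  local rc=0

  # The usual model_output/ directory.
  if [ -d "${dir}/model_output" ]; then
    echo "found: model_output/"
  else
    echo "missing: model_output/"
    rc=1
  fi

  # At least one h5 file holding the chains (no exact file name is assumed).
  if ls "${dir}"/*.h5 >/dev/null 2>&1; then
    echo "found: h5 chain file"
  else
    echo "missing: h5 chain file"
    rc=1
  fi

  return "${rc}"
}
```

For example, `check_emcee_outputs .` prints one line per expected output and returns a non-zero status if either is missing.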

It will also immediately write output to standard out similar to the following (the details depend on your config):

  gempyor >> Running ***DETERMINISTIC*** simulation;
  gempyor >> ModelInfo USA_inference_all; index: 1; run_id: SMH_Rdisparity_phase_one_phase1_blk1_fixprojnpis_CA-NC_emcee,
  gempyor >> prefix: USA_inference_all/SMH_Rdisparity_phase_one_phase1_blk1_fixprojnpis_CA-NC_emcee/;
Loaded subpops in loaded relative probablity file: 51 Intersect with seir simulation:  2 kept
Running Gempyor Inference

LogLoss: 6 statistics and 92 data points,number of NA for each statistic: 
incidD_latino    46
incidD_other      0
incidD_asian      0
incidD_black      0
incidD_white      0
incidC_white     24
incidC_black     24
incidC_other     24
incidC_asian     24
incidC_latino    61
incidC           24
incidD            0
dtype: int64
InferenceParameters: with 92 parameters: 
    seir_modifiers: 84 parameters
    outcome_modifiers: 8 parameters

Here, the output says the config fits 92 parameters. Keep that in mind and choose a number of walkers greater than this number of parameters, ideally about 2 times as many.
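The rule of thumb above can be scripted. This is a minimal sketch, assuming the standard out of the test run is available as text and that the InferenceParameters line has the format shown above. count_parameters and suggest_nwalkers are hypothetical helper names, not flepiMoP commands.

```bash
#!/bin/bash
# Hypothetical helpers (not part of flepiMoP) for picking --nwalkers.

# Print the parameter count found in calibration log text read on stdin.
# Assumes a line such as "InferenceParameters: with 92 parameters: ".
count_parameters() {
  sed -n 's/.*InferenceParameters: with \([0-9][0-9]*\) parameters.*/\1/p' | head -n 1
}

# Print a suggested walker count: the parameter count times a factor
# (default 2, following the "ideally about 2 times" guidance).
suggest_nwalkers() {
  local n_params="$1"
  local factor="${2:-2}"
  echo $(( n_params * factor ))
}

# Example using the log line shown above.
n_params="$(printf 'InferenceParameters: with 92 parameters: \n' | count_parameters)"
echo "parameters: ${n_params}, suggested --nwalkers: $(suggest_nwalkers "${n_params}")"
```

For the 92-parameter config above, this suggests 184 walkers. To read the count from a saved log instead, pipe the file into count_parameters, for example `count_parameters < my_test_run.out`, where the file name is a placeholder.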

Running On An HPC Environment With Slurm

#!/bin/bash
#SBATCH --ntasks 1
#SBATCH --nodes 1
#SBATCH --mem 450g
#SBATCH --cpus-per-task 256
#SBATCH --time 20:00:00
flepimop-calibrate --config config_NC_emcee.yml \
  --nwalkers 500  \
  --jobs 256 \
  --niterations 2000 \
  --nsamples 250 \
  --id my_id  > out_fit256.out 2>&1

Breaking down what each of these lines does:

  • #SBATCH --ntasks 1: Requests that this be run as a single task,

  • #SBATCH --nodes 1: Requests that the job be run on 1 node; as of right now EMCEE only supports a single node,

  • #SBATCH --mem 450g: Requests that the whole job get 450GB of memory, which should be ~2-3GB per walker,

  • #SBATCH --cpus-per-task 256: Requests that the whole job get 256 CPUs (technically 256 per task, but ntasks should be set to 1 for EMCEE),

  • #SBATCH --time 20:00:00: Specifies a time limit of 20 hours for this job to complete, and

  • flepimop-calibrate ...:

    • --config config_NC_emcee.yml: Use config_NC_emcee.yml for this calibration run,

    • --nwalkers 500: Use 500 walkers (or chains) for this calibration; this should be about 2x the number of parameters,

    • --jobs 256: The number of parallel walkers to run; this should be either 1x or 0.5x the number of CPUs,

    • --niterations 2000: The number of iterations to run for each walker,

    • --nsamples 250: The number of posterior samples (taken from the end of each walker) to save to the model_output/ directory, and

    • --id my_id: An optional short but unique job name; if not explicitly provided, one will be generated from the config.

For more details on the other options provided by gempyor for calibration, please see flepimop-calibrate --help.
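One way to keep --jobs consistent with the CPU request is to read it from Slurm inside the batch script instead of repeating the number. The sketch below relies on the SLURM_CPUS_PER_TASK environment variable, which Slurm sets when --cpus-per-task is given. The fallback value, the n_params value, and the variable names are illustrative assumptions. The script only prints the command so it can be inspected before being run.

```bash
#!/bin/bash
# Sketch only: derive the calibration options rather than hard-coding them.

# Set by Slurm when --cpus-per-task is requested; the fallback of 4 is an
# arbitrary value for trying the script outside of a Slurm job.
cpus="${SLURM_CPUS_PER_TASK:-4}"

# Number of fitted parameters reported by a test run. 250 is an assumed
# value chosen so the result matches the 500 walkers used above; replace it
# with the count reported for your own config.
n_params=250

nwalkers=$(( 2 * n_params ))  # about 2x the number of parameters
jobs="${cpus}"                # 1x the number of CPUs (0.5x is the other suggested choice)

# Print the command for inspection instead of running it.
echo "flepimop-calibrate --config config_NC_emcee.yml" \
  "--nwalkers ${nwalkers} --jobs ${jobs}" \
  "--niterations 2000 --nsamples 250 --id my_id"
```

To run the calibration for real, replace the echo with the command itself and keep the output redirection from the batch script above.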

Postprocessing EMCEE

At this stage, postprocessing for EMCEE outputs is fairly manual. A good starting point can be found in postprocessing/emcee_postprocess.ipynb, which plots the chains and can run forward projections from the samples drawn during calibration.

To run on a cluster, first install flepiMoP there following the Running On A HPC With Slurm guide. Then manually create a batch file to submit to Slurm, like the one in the Running On An HPC Environment With Slurm section above.
Figure: left, classical inference config; right, new EMCEE config.