Setting up the model and post-processing data
We provide helper scripts to help users understand model outputs and diagnose simulations and iterations. These scripts can be set to run automatically after a model run, and they depend on the model defined in the user's config file.
The script postprocess_snapshot.R requires the following command line inputs:
a user-defined config, $CONFIG_PATH
a run index, $FLEPI_RUN_INDEX
a path to the model output results, $FS_RESULTS_PATH
a path to the flepiMoP repository, $FLEPI_PATH
a list of outputs to plot, $OUTPUTS
By default, the script provides diagnostics for the model output files described below.
Plots of hosp output files show confidence intervals of model runs for each metapopulation node; for inference runs, these are plotted against the provided ground truth data. Plots of hnpi and snpi output files are violin plots of parameter values for each slot.
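As a rough illustration of the kind of summary behind the hosp plots, the sketch below computes a quantile band per time step across simulation slots for a single node. The `quantile_band` helper, the input layout, and the 5%/95% levels are assumptions made for this example; this is not the implementation used by postprocess_snapshot.R.

```python
from statistics import median, quantiles


def quantile_band(values_by_slot, lower=0.05, upper=0.95):
    """Return a (lower, median, upper) tuple per time step across slots.

    values_by_slot is a hypothetical layout: one list of values per slot,
    all of the same length (one entry per time step). Needs >= 2 slots.
    """
    bands = []
    for step_values in zip(*values_by_slot):
        # 99 cut points; cuts[i - 1] is the i-th percentile.
        # "inclusive" keeps results within the observed min..max.
        cuts = quantiles(step_values, n=100, method="inclusive")
        lo = cuts[round(lower * 100) - 1]
        hi = cuts[round(upper * 100) - 1]
        bands.append((lo, median(step_values), hi))
    return bands


# Made-up values: three slots, two time steps, one node.
example_slots = [[10, 20], [12, 22], [14, 24]]
example_bands = quantile_band(example_slots)
```

Each tuple in `example_bands` is what a plot would draw as the lower edge, center line, and upper edge of the band at that time step.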
Other scripts are included as more specific examples of post-processing and serve as diagnostic tools. The processing_diagnostics.R script provides a detailed diagnosis of inference model runs and fits.
The model needs a configuration file to run (described in previous sections). These configs become lengthy and are sometimes difficult to write by hand. The config writer helps generate configs, provided the relevant files are present.
These functions are used to print specific sections of the configuration files.
Used to generate the global header. For more information, see the documentation on global headers.
Name of the configuration file to be generated. Generally based on the type of simulation
Optional (SMH)
Type of run - a Scenario Modeling Hub ("SMH") or Forecasting Hub ("FCH") Simulation.
Optional (covid19)
Pathogen or disease being simulated
Optional (NA)
Round number for Scenario Modeling Hub Submission
Optional (data)
Path to the folder where population data (size, mobility, etc.) and ground truth data files are stored
Optional (model_output)
Path to the folder where the outputs of the simulated model are stored
Start date for model simulation
End date for model simulation
Optional (NA)
Start date for fitting data for inference runs
Optional (NA)
End date for fitting data for inference runs
Number of independent simulations to run
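To make the header parameters concrete, here is a minimal sketch of a function that checks them and emits a block of `key: value` lines. It is only an analogy for what a header-printing function might do: the function name `build_header`, every key name in the output, and the rule that the fitting window must lie inside the simulation window are assumptions for illustration, not the config writer's actual interface. The defaults `data` and `model_output` are the ones listed above.

```python
from datetime import date


def build_header(name, start_date, end_date, nslots,
                 fit_start=None, fit_end=None,
                 data_dir="data", output_dir="model_output"):
    """Return header text; all key names below are hypothetical."""
    if end_date <= start_date:
        raise ValueError("end_date must be after start_date")
    if nslots < 1:
        raise ValueError("nslots must be at least 1")
    if (fit_start is None) != (fit_end is None):
        raise ValueError("give both fitting dates or neither")
    if fit_start is not None:
        # Assumed rule: fit only on dates that are actually simulated.
        if not (start_date <= fit_start < fit_end <= end_date):
            raise ValueError("fitting window must lie inside the simulation window")

    lines = [
        f"name: {name}",
        f"data_dir: {data_dir}",
        f"output_dir: {output_dir}",
        f"start_date: {start_date}",
        f"end_date: {end_date}",
    ]
    if fit_start is not None:
        lines += [f"fit_start: {fit_start}", f"fit_end: {fit_end}"]
    lines.append(f"nslots: {nslots}")
    return "\n".join(lines)


header_text = build_header("sample_run", date(2020, 1, 1), date(2020, 12, 31), nslots=10)
```

Validating the dates before printing keeps an inconsistent header from reaching a long model run.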
Used to generate the spatial setup section of the configuration. For more information, see the documentation on spatial setup.
Optional (2019)
The year of the data used to generate the geodata files for US simulations [unsure about this]
Vector of locations that will be modeled (US-specific?)
Optional (geodata.csv)
Name of the geodata file which is imported
Optional (mobility.csv)
Name of the mobility file which is imported
Optional (pop2019est)
Name of the column in the geodata file that specifies the population of each subpopulation
Optional (subpop)
Name of a column in the geodata file that specifies the name of the subpopulation
Optional (TRUE)
Specifies if the subpopulations are US states
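The two column-name parameters above can be illustrated with a short sketch that reads a geodata table and returns the population of each subpopulation. The `load_subpopulations` helper and the population values are made up for this example; only the default column names (`subpop`, `pop2019est`) come from the parameter list.

```python
import csv
import io


def load_subpopulations(geodata_text, subpop_col="subpop", pop_col="pop2019est"):
    """Map each subpopulation name to its population (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(geodata_text))
    missing = {subpop_col, pop_col} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"geodata is missing column(s): {sorted(missing)}")
    return {row[subpop_col]: int(row[pop_col]) for row in reader}


# Made-up geodata contents with two subpopulations.
geodata_text = "subpop,pop2019est\nsmall_province,1000\nlarge_province,10000\n"
populations = load_subpopulations(geodata_text)
```

If a geodata file used different column names, the same call would take them through `subpop_col` and `pop_col`.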
Used to generate the compartment list for each way a population can be divided.
Optional (S,E,I1,I2,I3,R,W)
Various infection stages an individual can be in
Optional (unvaccinated, 1dose, 2dose, waned)
Various levels of vaccinations an individual can have
Variants of the pathogen
Optional (age0to17, age18to64, age65to100)
Age groups into which the population is stratified
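One way to picture how these lists combine is as a Cartesian product: every infection stage crossed with every vaccination level, variant, and age group. The sketch below assumes that reading. The `enumerate_compartments` helper, the underscore-joined names, and the two variant labels are hypothetical (the parameter list gives no default for variants); the other lists are the defaults shown above.

```python
from itertools import product

# Defaults taken from the parameter descriptions.
infection_stages = ["S", "E", "I1", "I2", "I3", "R", "W"]
vaccination_levels = ["unvaccinated", "1dose", "2dose", "waned"]
age_groups = ["age0to17", "age18to64", "age65to100"]
# Hypothetical variant labels, used only for illustration.
variants = ["variantA", "variantB"]


def enumerate_compartments(*dimensions):
    """Return one underscore-joined name per combination of levels."""
    return ["_".join(combo) for combo in product(*dimensions)]


compartments = enumerate_compartments(
    infection_stages, vaccination_levels, variants, age_groups
)
print(len(compartments))  # 7 * 4 * 2 * 3 = 168
print(compartments[0])    # S_unvaccinated_variantA_age0to17
```

The count grows multiplicatively with each added dimension, which is one reason generated configs quickly become too long to write by hand.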
census year: year of geodata files
modeled states (sim_states): US state abbreviations. Do we include the names of the subpopulations in the geodata file? E.g. small_province, large_province
state_level: Specifies whether the runs are for US states
These scripts are run automatically after an inference run
Some information to consider if you'd like your script to be run automatically after an inference run:
Most R/Python packages are already installed. Try running your script in the conda environment defined on the submission page (or, if you are not set up on MARCC, it is easier to ask me).
There will be some variables set in the environment. These variables are:
the path to the configuration file
the run id for this run (e.g. CH_R3_highVE_pesImm_2022_Jan29)
this job name (e.g. USA-20230130T163847_inference_med)
the path where the model results are stored. It is a folder that contains model_output/ as a subfolder
the path of the flepiMoP repository
the path of the data directory (e.g. Flu_USA or COVID19_USA)
Anything you ask can theoretically be provided here.
The script must run without any user intervention.
The script is run from $DATA_PATH.
Your script should preferably live in the flepiMoP directory, but it can live in a data directory if that makes sense.
It is run on a multicore machine with 64 GB of RAM. All scripts combined must complete in under 4 hours, and you can use multiprocessing (48 cores).
Outputs (pdf, csv, html, txt, png ...) must be saved in a directory named pplot/ (you can assume that it exists) in order to be sent to Slack by FlepiBot 🤖 after the run.
An example postprocessing script (in Python) is here.
You can test your script on MARCC on a run that is already saved in /data/struelo1/flepimop-runs, or I can do it for you.
Once your script works, add (or ask to add) the command line that runs it to the file batch/ (here), between the START and END lines, with a short comment about what your script does.
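The requirements above can be sketched as a small Python script that runs without user input, reads the results location from the environment, and writes its output into pplot/. This is a minimal sketch under stated assumptions: it assumes the results path is exposed as an environment variable named `FS_RESULTS_PATH` and that pplot/ is relative to the working directory; the `summarize_run` helper and the report file name are hypothetical.

```python
import os
from pathlib import Path


def summarize_run(results_path, out_dir):
    """Write a text listing of the files under model_output/ and return its path."""
    results = Path(results_path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # harmless if pplot/ already exists

    model_output = results / "model_output"
    files = []
    if model_output.is_dir():
        files = sorted(p for p in model_output.rglob("*") if p.is_file())

    lines = [f"files found under model_output/: {len(files)}"]
    lines += [f"{p.relative_to(results)}\t{p.stat().st_size} bytes" for p in files]

    report = out / "output_file_summary.txt"  # hypothetical file name
    report.write_text("\n".join(lines) + "\n")
    return report


if __name__ == "__main__":
    # Assumed variable name; no prompts, so the script can run unattended.
    results_path = os.environ.get("FS_RESULTS_PATH")
    if results_path:
        print(summarize_run(results_path, "pplot"))
    else:
        print("FS_RESULTS_PATH is not set; nothing to do")
```

A real script would replace the file listing with its own plots or tables, but the shape stays the same: read paths from the environment, do the work, save everything under pplot/.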