
Setting up the model and post-processing data

Config writer

The model needs a configuration file to run (described in previous sections). These configs can become lengthy and are sometimes difficult to type manually. The config writer helps generate configs, provided the relevant files are present.

Print Functions:

These functions are used to print specific sections of the configuration files.

print_header

Used to generate the global header. For more information on global headers click HERE.

| Variable name | Required (default value if optional) | Description |
| --- | --- | --- |
| sim_name | Required | Name of the configuration file to be generated, generally based on the type of simulation |
| setup_name | Optional (SMH) | Type of run: a Scenario Modeling Hub ("SMH") or Forecasting Hub ("FCH") simulation |
| disease | Optional (covid19) | Pathogen or disease being simulated |
| smh_round | Optional (NA) | Round number for the Scenario Modeling Hub submission |
| data_path | Optional (data) | Path of the folder where population data (size, mobility, etc.) and ground truth data files are stored |
| model_output_dir_name | Optional (model_output) | Path of the folder where the outputs of the simulated model are stored |
| sim_start_date | Required | Start date for the model simulation |
| sim_end_date | Required | End date for the model simulation |
| start_date_groundtruth | Optional (NA) | Start date of the data used for fitting in inference runs |
| end_date_groundtruth | Optional (NA) | End date of the data used for fitting in inference runs |
| nslots | Required | Number of independent simulations to run |
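To make the split between required and optional arguments concrete, here is a minimal Python sketch. It is not the config writer itself: `HeaderArgs` is a hypothetical container that only mirrors the argument names and defaults listed above, with `NA` represented as `None`. The simulation name, the dates, and the ISO date format are illustrative assumptions.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class HeaderArgs:
    """Hypothetical container mirroring the print_header arguments above."""

    # Required arguments: no default, so they must always be supplied.
    sim_name: str
    sim_start_date: str
    sim_end_date: str
    nslots: int
    # Optional arguments, with the defaults listed for print_header.
    setup_name: str = "SMH"
    disease: str = "covid19"
    smh_round: Optional[str] = None  # "NA" in the list above
    data_path: str = "data"
    model_output_dir_name: str = "model_output"
    start_date_groundtruth: Optional[str] = None  # "NA" in the list above
    end_date_groundtruth: Optional[str] = None  # "NA" in the list above


# Only the four required values are given; everything else falls back to its default.
args = HeaderArgs(
    sim_name="example_sim",  # illustrative name
    sim_start_date="2022-01-01",  # date format is an assumption
    sim_end_date="2022-12-31",
    nslots=100,
)
header = asdict(args)
print(header["setup_name"], header["data_path"])
```

Omitting any of the four required values raises a `TypeError`, which matches the intent of the "Required" entries.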

print_spatial_setup

Used to generate the spatial setup section of the configuration. For more information on spatial setup click HERE.

| Variable name | Required (default value if optional) | Description |
| --- | --- | --- |
| census_year | Optional (2019) | Year of the data used to generate the geodata files for US simulations [unsure about this] |
| sim_states | Required | Vector of locations that will be modeled (US specific?) |
| geodata_file | Optional (geodata.csv) | Name of the geodata file that is imported |
| mobility_file | Optional (mobility.csv) | Name of the mobility file that is imported |
| popnodes | Optional (pop2019est) | Name of the column in the geodata file that specifies the population of each subpopulation |
| nodenames | Optional (subpop) | Name of the column in the geodata file that specifies the name of each subpopulation |
| state_level | Optional (TRUE) | Specifies whether the subpopulations are US states |

print_compartments

Used to generate the compartment list for each way a population can be divided.

| Variable name | Required (default value if optional) | Description |
| --- | --- | --- |
| inf_stages | Optional (S, E, I1, I2, I3, R, W) | Infection stages an individual can be in |
| vaccine_compartments | Optional (unvaccinated, 1dose, 2dose, waned) | Vaccination levels an individual can have |
| variant_compartments | Optional (WILD, ALPHA, DELTA, OMICRON) | Variants of the pathogen |
| age_strata | Optional (age0to17, age18to64, age65to100) | Age groups into which the population is stratified |
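As a rough illustration of how these four divisions combine, the Python sketch below enumerates every combination of the default values. It assumes that each combination corresponds to one compartment. The underscore-joined names are a hypothetical naming scheme used for display only, not the format the config writer produces.

```python
from itertools import product

# Default values listed for print_compartments.
inf_stages = ["S", "E", "I1", "I2", "I3", "R", "W"]
vaccine_compartments = ["unvaccinated", "1dose", "2dose", "waned"]
variant_compartments = ["WILD", "ALPHA", "DELTA", "OMICRON"]
age_strata = ["age0to17", "age18to64", "age65to100"]

# One entry per combination of (infection stage, vaccination, variant, age).
compartments = [
    "_".join(combo)  # hypothetical naming scheme, for display only
    for combo in product(
        inf_stages, vaccine_compartments, variant_compartments, age_strata
    )
]

print(len(compartments))  # 7 * 4 * 4 * 3 = 336
print(compartments[0])  # S_unvaccinated_WILD_age0to17
```

Even with the defaults, the number of combinations grows quickly, which is one reason these sections are tedious to write by hand.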

Parts of the configuration files that are printed but not needed for flepiMoP runs (do these need to be mentioned for US or COVID-19 specific runs?)

Spatial Setup:

census year: year of the geodata files
modeled states (sim_states): this contains US state abbreviations. Do we include the names of the subpopulations in the geodata file, e.g. small_province, large_province?
state_level: specifies whether the runs are for US states

Diagnostic plotting scripts

We provide helper scripts to aid users in understanding model outputs and diagnosing simulations and iterations. These scripts may be set to run automatically after a model run, and they depend on the model defined in the user's config file.

The script postprocess_snapshot.R requires the following command line inputs:

a user-defined config, $CONFIG_PATH
a run index, $FLEPI_RUN_INDEX
a path to the model output results, $FS_RESULTS_PATH
a path to the flepiMoP repository, $FLEPI_PATH, and
a list of outputs to plot, $OUTPUTS (by default, the script provides diagnostics for a preset list of model output files)

Plots of hosp output files show confidence intervals of the model runs for each metapopulation node, plotted against the provided ground truth data for inference runs. hnpi and snpi plots show violin plots of parameter values for each slot.

Other scripts are included as more specific examples of post-processing, used as diagnostic tools. The processing_diagnostics.R script provides a detailed diagnosis of inference model runs and fits.

Create a post-processing script

These scripts are run automatically after an inference run.

Some points to consider if you would like your script to be run automatically after an inference run:

Most R and Python packages are already installed. Try to run your script in the conda environment defined on the submission page (or, if you are not set up on MARCC, ask me, which is easier).
Some variables are set in the environment. These variables are:
- $CONFIG_PATH: the path to the configuration file.
- $FLEPI_RUN_INDEX: the run id for this run (e.g. `CH_R3_highVE_pesImm_2022_Jan29`).
- $JOB_NAME: the name of this job (e.g. USA-20230130T163847_inference_med).
- $FS_RESULTS_PATH: the path where the model results are stored. It is a folder that contains model_output/ as a subfolder.
- $FLEPI_PATH: the path of the flepiMoP repository.
- $DATA_PATH: the path of the data directory (e.g. Flu_USA or COVID19_USA).
- Anything else you ask for can theoretically be provided here.
The script must run without any user intervention.
The script is run from $DATA_PATH.
Your script should preferably be located in the flepiMoP directory; it can also be in a data directory if that makes sense.
It is run on a multicore machine with 64 GB of RAM. All scripts combined must complete in under 4 hours, and you can use multiprocessing (48 cores).
Outputs (pdf, csv, html, txt, png ...) must be saved in a directory named pplot/ (you can assume that it exists) in order to be sent to Slack by FlepiBot 🤖 after the run.
An example post-processing script (in Python) is here.
You can test your script on MARCC on a run that is already saved in /data/struelo1/flepimop-runs, or I can do it for you.
Once your script works, add (or ask to add) the command line that runs it to the file batch/postprocessing_scripts.sh (here), between the START and END lines, with a short comment about what your script does.
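The requirements above can be sketched as a small Python skeleton. This is only an illustration, not the example script mentioned above. The function names `read_environment`, `write_summary`, and `main` are hypothetical, and the summary file it writes is a placeholder for real plots or tables. It reads the environment variables, fails early with a clear message if one is missing (since the script must run without user intervention), and writes its output to pplot/ relative to the working directory.

```python
import os
from pathlib import Path

# Environment variables listed above as being set for post-processing scripts.
REQUIRED_VARIABLES = [
    "CONFIG_PATH",
    "FLEPI_RUN_INDEX",
    "JOB_NAME",
    "FS_RESULTS_PATH",
    "FLEPI_PATH",
    "DATA_PATH",
]


def read_environment(env=os.environ):
    """Collect the expected variables, failing early if any is missing."""
    missing = [name for name in REQUIRED_VARIABLES if name not in env]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARIABLES}


def write_summary(settings, out_dir="pplot"):
    """Write a placeholder text summary of the run into the output directory."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # harmless if pplot/ already exists
    model_output = Path(settings["FS_RESULTS_PATH"]) / "model_output"
    # Count result files; a real script would read and plot them instead.
    n_files = 0
    if model_output.is_dir():
        n_files = sum(1 for path in model_output.rglob("*") if path.is_file())
    out_file = out_dir / f"summary_{settings['FLEPI_RUN_INDEX']}.txt"
    out_file.write_text(
        f"run: {settings['FLEPI_RUN_INDEX']}\n"
        f"job: {settings['JOB_NAME']}\n"
        f"config: {settings['CONFIG_PATH']}\n"
        f"files in model_output: {n_files}\n"
    )
    return out_file


def main():
    """Entry point: no prompts and no arguments, only the environment."""
    return write_summary(read_environment())
```

A real script would end with a call to `main()`. Because the script is run from $DATA_PATH, the relative path pplot/ resolves inside the data directory.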

Reporting
