Setting up the model and post-processing data
We provide helper scripts to help users understand model outputs and diagnose simulations and iterations. These scripts can be set to run automatically after a model run, and they depend on the model defined in the user's config file.
The script postprocess_snapshot.R requires the following command line inputs:
a user-defined config, $CONFIG_PATH
a run index, $FLEPI_RUN_INDEX
a path to the model output results, $FS_RESULTS_PATH
a path to the flepiMoP repository, $FLEPI_PATH
a list of outputs to plot, $OUTPUTS
By default, the script provides diagnostics for the following model output files:
Plots of hosp output files show confidence intervals of model runs for each metapopulation node, plotted against the provided ground truth data for inference runs.
Plots of hnpi and snpi output files are violin plots of parameter values for each slot.
Other scripts are included as more specific examples of post-processing and serve as diagnostic tools. The processing_diagnostics.R script provides a detailed diagnosis of inference model runs and fits.
The model needs a configuration file to run (described in previous sections). These configs become lengthy and are sometimes difficult to write manually. The config writer helps generate configs, provided the relevant files are present.
These functions are used to print specific sections of the configuration files.
Used to generate the global header. For more information on global headers click HERE.
sim_name (required): Name of the configuration file to be generated, generally based on the type of simulation.
setup_name (optional, default SMH): Type of run, either a Scenario Modeling Hub ("SMH") or a Forecasting Hub ("FCH") simulation.
disease (optional, default covid19): Pathogen or disease being simulated.
smh_round (optional, default NA): Round number for the Scenario Modeling Hub submission.
data_path (optional, default data): Path to the folder where population data (size, mobility, etc.) and ground truth data files are stored.
model_output_dir_name (optional, default model_output): Path to the folder where the outputs of the simulated model are stored.
sim_start_date (required): Start date for the model simulation.
sim_end_date (required): End date for the model simulation.
start_date_groundtruth (optional, default NA): Start date of the data used for fitting in inference runs.
end_date_groundtruth (optional, default NA): End date of the data used for fitting in inference runs.
nslots (required): Number of independent simulations to run.
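The split between required and optional header arguments can be illustrated with a small sketch. This is not the config writer itself: `build_header_args` is a hypothetical helper written for this example, and it only mirrors the argument names and defaults documented above. The date check (end after start) and the ISO date format are assumptions made for illustration.

```python
from datetime import date

# Hypothetical helper for illustration only; it is NOT part of the config writer.
# Argument names and defaults are copied from the documentation above.
REQUIRED = ("sim_name", "sim_start_date", "sim_end_date", "nslots")
DEFAULTS = {
    "setup_name": "SMH",
    "disease": "covid19",
    "smh_round": None,               # documented default is NA
    "data_path": "data",
    "model_output_dir_name": "model_output",
    "start_date_groundtruth": None,  # documented default is NA
    "end_date_groundtruth": None,    # documented default is NA
}


def build_header_args(**kwargs):
    """Merge user arguments with the documented defaults and run basic checks."""
    missing = [name for name in REQUIRED if name not in kwargs]
    if missing:
        raise ValueError("missing required arguments: " + ", ".join(missing))
    unknown = sorted(set(kwargs) - set(REQUIRED) - set(DEFAULTS))
    if unknown:
        raise ValueError("unknown arguments: " + ", ".join(unknown))
    args = {**DEFAULTS, **kwargs}
    # Assumption: dates are given as ISO strings (YYYY-MM-DD).
    start = date.fromisoformat(args["sim_start_date"])
    end = date.fromisoformat(args["sim_end_date"])
    if end <= start:
        raise ValueError("sim_end_date must be after sim_start_date")
    return args


header = build_header_args(
    sim_name="example_sim",          # made-up name
    sim_start_date="2022-01-01",     # made-up dates
    sim_end_date="2022-06-30",
    nslots=100,
)
print(header["setup_name"], header["data_path"])
```

Only the four required arguments are passed here; every other value falls back to its documented default.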
Used to generate the spatial setup section of the configuration. For more information on spatial setup click HERE.
census_year (optional, default 2019): Year of the data used to generate the geodata files for US simulations (to be confirmed).
sim_states (required): Vector of locations that will be modeled (possibly US-specific).
geodata_file (optional, default geodata.csv): Name of the geodata file to import.
mobility_file (optional, default mobility.csv): Name of the mobility file to import.
popnodes (optional, default pop2019est): Name of the column in the geodata file that gives the population of each subpopulation.
nodenames (optional, default subpop): Name of the column in the geodata file that gives the name of each subpopulation.
state_level (optional, default TRUE): Specifies whether the subpopulations are US states.
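Because popnodes and nodenames must name columns that actually exist in the geodata file, it can help to check the file before generating a config. The sketch below is a minimal example under stated assumptions: `check_geodata_columns` is a hypothetical helper (not part of flepiMoP), the geodata file is assumed to be a comma-separated file with a header row, and the sample populations are made up.

```python
import csv
import io


def check_geodata_columns(handle, popnodes="pop2019est", nodenames="subpop"):
    """Return {subpopulation name: population} after checking both columns exist.

    Hypothetical helper for illustration; the defaults mirror the documented
    defaults of the popnodes and nodenames arguments.
    """
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    missing = [col for col in (popnodes, nodenames) if col not in columns]
    if missing:
        raise ValueError("geodata file is missing columns: " + ", ".join(missing))
    return {row[nodenames]: int(row[popnodes]) for row in reader}


# In-memory stand-in for geodata.csv; the population values are made up.
sample_geodata = io.StringIO(
    "subpop,pop2019est\n"
    "small_province,1000\n"
    "large_province,10000\n"
)
populations = check_geodata_columns(sample_geodata)
print(populations)
```

With a real file, the same function can be called on `open("geodata.csv", newline="")`, adjusting the path to wherever the geodata file is stored.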
Used to generate the compartment list for each way a population can be divided.
inf_stages (optional, default S, E, I1, I2, I3, R, W): Infection stages an individual can be in.
vaccine_compartments (optional, default unvaccinated, 1dose, 2dose, waned): Vaccination levels an individual can have.
variant_compartments (optional, default WILD, ALPHA, DELTA, OMICRON): Variants of the pathogen.
age_strata (optional, default age0to17, age18to64, age65to100): Age groups into which the population has been stratified.
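To get a feel for how quickly the number of compartments grows when the population is divided along several dimensions at once, the default values above can be crossed with each other. This is only an illustration: whether the model uses every combination, and the underscore-joined naming used here, are assumptions made for this sketch.

```python
from itertools import product

# Default values copied from the documentation above.
compartment_options = {
    "inf_stages": ["S", "E", "I1", "I2", "I3", "R", "W"],
    "vaccine_compartments": ["unvaccinated", "1dose", "2dose", "waned"],
    "variant_compartments": ["WILD", "ALPHA", "DELTA", "OMICRON"],
    "age_strata": ["age0to17", "age18to64", "age65to100"],
}

# Cartesian product of all dimensions; the "_" separator is an assumption.
combined = ["_".join(levels) for levels in product(*compartment_options.values())]

print(len(combined))   # 7 * 4 * 4 * 3 = 336
print(combined[0])     # S_unvaccinated_WILD_age0to17
```

Trimming any one dimension (for example, modeling a single variant) reduces the total proportionally, which is worth keeping in mind when choosing non-default values.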
census_year: year of the geodata files.
sim_states (modeled states): this takes US state abbreviations. Open question: do we include the names of the subpopulations in the geodata file (e.g. small_province, large_province)?
state_level: specifies whether the runs are for US states.
These scripts are run automatically after an inference run.
Some information to consider if you'd like your script to be run automatically after an inference run:
Most R/Python packages are already installed. Try to run your script in the conda environment defined on the submission page (or, if you are not set up on MARCC, it is easier to ask me).
There will be some variables set in the environment. These variables are:
$CONFIG_PATH: the path to the configuration file.
$FLEPI_RUN_INDEX: the run id for this run (e.g. CH_R3_highVE_pesImm_2022_Jan29).
$JOB_NAME: this job's name (e.g. USA-20230130T163847_inference_med).
$FS_RESULTS_PATH: the path to the model results. It is a folder that contains model_output/ as a subfolder.
$FLEPI_PATH: the path of the flepiMoP repository.
$DATA_PATH: the path of the data directory (e.g. Flu_USA or COVID19_USA).
Anything you ask can theoretically be provided here.
The script must run without any user intervention.
The script is run from $DATA_PATH.
Your script should preferably live in the flepiMoP directory; it is also fine for it to live in a data directory if that makes sense.
It is run on a multicore machine with 64 GB of RAM. All scripts combined must complete in under 4 hours, and you can use multiprocessing (48 cores).
Outputs (pdf, csv, html, txt, png ...) must be saved in a directory named pplot/ (you can assume that it exists) in order to be sent to Slack by FlepiBot 🤖 after the run.
An example postprocessing script (in Python) is here.
You can test your script on MARCC on a run that is already saved in /data/struelo1/flepimop-runs, or I can do it for you.
Once your script works, add (or ask to add) the command line that runs it to the file batch/postprocessing_scripts.sh (here), between the START and END lines, with a short comment about what your script does.
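The requirements above can be sketched as a minimal Python skeleton: read the environment variables, do the work without any prompts, and write results under pplot/. This is not the example script linked above. The function names, the summary file name, and the demo values for the paths are all hypothetical; only the variable names and the pplot/ directory come from this page.

```python
import tempfile
from pathlib import Path

# Environment variables documented above.
RUN_VARIABLES = (
    "CONFIG_PATH",
    "FLEPI_RUN_INDEX",
    "JOB_NAME",
    "FS_RESULTS_PATH",
    "FLEPI_PATH",
    "DATA_PATH",
)


def read_run_environment(env):
    """Collect the documented variables from a mapping such as os.environ."""
    missing = [name for name in RUN_VARIABLES if name not in env]
    if missing:
        # Fail loudly instead of prompting: the script must run unattended.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in RUN_VARIABLES}


def write_summary(run, output_dir="pplot"):
    """Write a small text summary into the output directory and return its path."""
    out = Path(output_dir)
    out.mkdir(exist_ok=True)  # harmless if the directory already exists
    summary = out / (run["FLEPI_RUN_INDEX"] + "_summary.txt")  # hypothetical file name
    lines = [name + "=" + value for name, value in run.items()]
    summary.write_text("\n".join(lines) + "\n")
    return summary


# Demo with stand-in values so the sketch runs anywhere. In a real script,
# pass os.environ and keep the default output_dir ("pplot").
demo_env = {
    "CONFIG_PATH": "config.yml",                      # hypothetical path
    "FLEPI_RUN_INDEX": "CH_R3_highVE_pesImm_2022_Jan29",
    "JOB_NAME": "USA-20230130T163847_inference_med",
    "FS_RESULTS_PATH": "results",                     # hypothetical path
    "FLEPI_PATH": "flepiMoP",                         # hypothetical path
    "DATA_PATH": "COVID19_USA",
}
with tempfile.TemporaryDirectory() as tmp:
    summary_path = write_summary(read_run_environment(demo_env), Path(tmp) / "pplot")
    summary_text = summary_path.read_text()
print(summary_text)
```

Taking the environment as an argument, rather than reading it inside the functions, makes the script easy to test locally on a saved run before adding it to the post-processing batch file.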