1 of 7

Advanced

(Any more advanced mathematical or computational methods used, or possible configuration options, that only specialized users would need to change)

File descriptions

flepiMoP

https://github.com/HopkinsIDD/flepiMoP

Current branch: main

This repository contains all the code underlying the mathematical model and the data fitting procedure, as well as ...

To actually run the model, this repository folder must be located inside a location folder (e.g. COVID19_USA) which contains additional files describing the specifics of the model to be run (i.e. the config file), all the necessary input data (i.e. the population structure), and any data to which the model will be fit (i.e. cases and death counts each day)

/gempyor_pkg

This directory contains the core Python code that creates and simulates generic compartmental models and additionally simulates observed variables. This code is called gempyor for General Epidemics Modeling Pipeline with Yterventions and Outcome Reporting. The code in gempyor is called from R scripts (see /main_scripts and /R sections below) that read the config, run the model simulation via gempyor as required, read in data, and run the model inference algorithms.

pyproject.toml - contains the build system requirements and dependencies for the gempyor package; used during package installation
setup.cfg - contains information used by Python's setuptools to build the gempyor package. Contains the definitions of command line shortcuts for running simulations directly from gempyor (bypassing R interface) if desired

/gempyor_pkg/src/gempyor/

seir.py - Contains the core code for simulating the mathematical model. Takes in the model definition and parameters from the config, and outputs a file with a timeseries of the value of each state variable (# of individuals in each compartment)
simulate_seir.py -
steps_rk.py -
steps_source.py -
outcomes.py - Contains the core code for generating the outcome variables. Takes in the output of the mathematical model and parameters from the config, and outputs a file with a timeseries of the value of each outcome (observed) variable
simulate_outcomes.py -
setup.py
file_paths.py -
compartments.py
parameters.py
results.py
seeding_ic.py
/NPI/
- base.py -
- SinglePeriodModifier.py -
- MultiPeriodModifier.py -
- SinglePeriodModifierInterven.py -
/dev - contains functions that are still in development
/data - ?

/gempyor_pkg/docs

Contains notebooks with some gempyor-specific documentation and examples

Rinterface.Rmd - And R notebook that provides some background on gempyor and describes how to run it as a standalone package in python, without the R wrapper scripts or the Docker.
Rinterface.html - HTML output of Rinterface.Rmd

/R

/main_scripts

This directory contains the R scripts that takes the specifications in the configuration file and sets up the model simulation, reads the data, and performs inference.

inference_main.R - This is the master R script used to run the model. It distributes the model runs across computer cores, setting up runs for all the scenarios specified in the config, and for each model iteration used in the parameter inference. Note that despite the name "inference" in this file, this script must be used to run the model even if no parameter inference is conducted
inference_slot.R - This script contains the main code of the inference algorithm.
create_seeding.R -

/R_packages

This directory contains the core R code - organized into functions within packages - that handle the model setup, data pulling and processing, conducting parameter inference for the model, and manipulating model output.

flepicommon
- config.R
- DataUtils.R
- file_paths.R
- safe_eval.R
- compartments.R
inference - contains code to
- groundtruth.R - contains functions for pulling ground truth data from various sources. Calls functions in the flepicommon package
- functions.R - contains many functions used in running the inference algorithm
- inference_slot_runner_funcs.R - contains many functions used in running the inference algorithm
- inference_to_forecast.R -
- documentation.Rmd - Summarizes the documentation relevant to the inference package, including the configuration file options relevant to model fitting
- InferenceTest.R -
- /tests/ -
config.writer
- create_config_data.R
- process_npi_list.R
- yaml_utils.R
report.generation
- DataLoadFuncs.R
- ReportBuildUtils.R
- ReportLoadData.R
- setup_testing_environment.R

/test

/data

Depreciated? Should be removed

/vignettes

Depreciated? Should be removed

/doc

Depreciated? Should be removed

/batch

/slurm_batch

COVID19_USA Repository

https://github.com/HopkinsIDD/COVID19_USA

Current branch: main

/R

Contains R scripts for generating model input parameters from data, writing config files, or processing model output. Most of the files in here are historic (specific to a particular model run) and not frequently used. Important scripts include:

get_vacc_rate_and_outcomes_R13.R - this pulls vaccination coverage and variant prevalence data specific to rounds (either empirical, or specified by the scenario), and adjusts these data to the formats required for the model. Several data files are created in this process: variant proportions for each scenario, vaccination rates by age and dose. A file is also generated that defines the outcome ratios (taking in to account immune escape, cross protection and VE).

/R/scripts/config_writers

Scripts to generate config files for particular submissions to the Scenario Modeling Hub. Most of this functionality has now been replaced by the config writer package ()

R/scripts/postprocess

Scripts to process the output of model runs into data formats and plots used for Scenario Modeling Hub and Forecast Hub. These scripts pull runs from AWS S3 buckets and processes and formats them to specifications for submissions to Scenario Modeling Hubs, Forecast Hubs and FluSight. These formatted files are saved and the results visualized. This script uses functions defined in /COVIDScenarioPipeline/R/scripts/postprocess.

run_sum_processing.R

/data

Contains data files used in parameterizing the model for COVID-19 in the US (such as creating the population structure, describing vaccine efficacy, describing parameter alterations due to variants, etc). Some data files are re-downloaded frequently using scripts in the pipeline (us_data.csv) while others are more static (geodata, mobility)

Important files and folders include

geodata.csv
geodata_2019_statelevel.csv
mobility.csv
mobility_territories_2011-2015_statelevel.csv
outcomes_ratios.csv
US_CFR_shift_dates_v3.csv
US_hosp_ratio_corrections.cs
seeding_agestrat_RX.csv

/data/shp

"Shape-files" (.shp) that .....

/data/outcomes

usa-subpop-params-output_V2.parquet

/data/intervention_tracking

Data files containing the dates that different non pharmaceutical interventions (like mask mandates, stay-at-home orders, school closures) were implemented by state

/data/vaccination

Files used to create the config elements related to vaccination, such as vaccination rates by state by age and vaccine efficacy by dose

/data/variant

Files created in the process of downloading and analyzing data on variant proportions

/manuscripts

Contains files for scientific manuscripts using results from the pipeline. Not up to date

/config

Contains an archive of configuration files used for previous model runs

/old_configs

Same as above. Contains an archive of configuration files used for previous model runs

/scripts

Depreciated - to be removed? - contains rarely used scripts

/notebook

Depreciated - to be removed? - contains rarely used notebooks to check model input. Might be used in some unit tests?

/NPI

empty?

/ScenarioHub

Depreciated - to be removed?

Numerical methods

Additional parameter options

Aka “magic numbers” - fixed parameters that may or may not be in config, like MCMC step size, dt, etc . . .

MCMC step size
Numerical integration step size
Mobility proportion

Swapping model modules

(Ie using a totally different compartmental model or outcomes model)

Resuming inference runs

Communication Between Iterations

The pipeline uses files to communicate between different iterations. Currently, the following file types exist:

seed
init
snpi
spar
seir
hpar
hnpi
hosp
llik

During each iteration, inference uses these files to communicate with the compartmental model and outcomes. The intent is that inference should only need to read and write these files, and that the compartmental model can handle everything else. In addition to the global versions of these files actually passed to the compartmental/reporting model, there exist chimeric versions used internally by inference and stored in memory. These copies are what inference interacts with when it needs to perturb values. While this design was chosen primarily to support modularity (a fixed communication boundary makes it easy to swap out the compartmental model), it has had a number of additional benefits.

Bootstrapping

The first iteration of an MCMC algorithm is a special case, because we need to pull initial conditions for our parameters. We originally developed the model without inference in mind, so the compartmental model is already set up to read parameter distributions from the configuration file, and to draw values from those distributions, and record those parameters to file. We take advantage of this to bootstrap our MCMC parameters by running the model one time, and reading the parameters it generated from file.

Resume from previous run

We can, instead of bootstrapping our first iteration, read in final values of a previous iteration. This allows us to resume from runs to save computational time and effectively continue iterating on the same chain. We call these resumes: inferred parameters are taken from a previous run and allowed to continue being inferred ;

Resumes take the following files (if they exist) from previous runs and uses them as the starting point of a new run:

hnpi
snpi
seed

Continue from previous run

Using plug-ins 🧩[experimental]

How to plug-in your code/data directly into flepiMoP

Sometimes, the default modules, such as seeding, or initial condition, do not provide the desired functionality. Thankfully, it is possible to replace a gempyor module with your own code, using plug-ins. This works only for initial conditions and seeding at the moment, reach out to us if you are interested in having it works on parameters, modifiers, ...

Here is an example, to set a random initial condition, where each subpopulation a random proportion of individuals is infected. For this, simply set the method of a block to plugin and provide the path of your file.

initial_conditions:
  method: plugin
  plugin_file_path: model_input/my_initial_conditions.py
  # you can also include some configuration for your plugin:
  ub_prop_infected: 0.001 # upper bound of the uniform distribution

This file contains a class that inherits from a gempyor class, which means that everything already defined in gempyor is available but you can overwrite any single method. Here, we will rewrite the load and draw methods of the initial conditions methods

import gempyor.seeding_ic
import numpy as np

class InitialConditions(gempyor.seeding_ic.InitialConditions):

    def get_from_config(self, sim_id: int, setup) -> np.ndarray:
        y0 = np.zeros((setup.compartments.compartments.shape[0], setup.nsubpops))
        S_idx = setup.compartments.get_comp_idx({"infection_stage":"S"})
        I_idx = setup.compartments.get_comp_idx({"infection_stage":"I"})
        prop_inf = np.random.uniform(low=0,high=self.config["ub_prop_infected"].get(), size=setup.nsubpops)
        y0[S_idx, :] = setup.subpop_pop * (1-prop_inf)
        y0[I_idx, :] = setup.subpop_pop * prop_inf
        
        return y0
    
    def get_from_file(self, sim_id: int, setup) -> np.ndarray:
        return self.get_from_config(sim_id=sim_id, setup=setup)

You can use any code within these functions, as long as the return object has the shape and type that gempyor expect (and that is undocumented and still subject to change, but as you see in this case gempyor except an array (a matrix) of shape: number of compartments X number of subpopulations). You can e.g call bash functions or excute R scripts such as below

import gempyor.seeding_ic
import numpy as np

class InitialConditions(gempyor.seeding_ic.InitialConditions):

    def get_from_config(self, sim_id: int, setup) -> np.ndarray:
        import rpy2.robjects as robjects
        robjects.r.source("path_to_your_Rscript.R", encoding="utf-8")
        y0 = robjects.r["initial_condition_fromR"]
        return y0
    
    def get_from_file(self, sim_id: int, setup) -> np.ndarray:
        return self.get_from_config(sim_id=sim_id, setup=setup)

File descriptions

flepiMoP

https://github.com/HopkinsIDD/flepiMoP

Current branch: main

This repository contains all the code underlying the mathematical model and the data fitting procedure, as well as ...

/gempyor_pkg

pyproject.toml - contains the build system requirements and dependencies for the gempyor package; used during package installation
setup.cfg - contains information used by Python's setuptools to build the gempyor package. Contains the definitions of command line shortcuts for running simulations directly from gempyor (bypassing R interface) if desired

/gempyor_pkg/src/gempyor/

seir.py - Contains the core code for simulating the mathematical model. Takes in the model definition and parameters from the config, and outputs a file with a timeseries of the value of each state variable (# of individuals in each compartment)
simulate_seir.py -
steps_rk.py -
steps_source.py -
outcomes.py - Contains the core code for generating the outcome variables. Takes in the output of the mathematical model and parameters from the config, and outputs a file with a timeseries of the value of each outcome (observed) variable
simulate_outcomes.py -
setup.py
file_paths.py -
compartments.py
parameters.py
results.py
seeding_ic.py
/NPI/
- base.py -
- SinglePeriodModifier.py -
- MultiPeriodModifier.py -
- SinglePeriodModifierInterven.py -
/dev - contains functions that are still in development
/data - ?

/gempyor_pkg/docs

Contains notebooks with some gempyor-specific documentation and examples

Rinterface.Rmd - And R notebook that provides some background on gempyor and describes how to run it as a standalone package in python, without the R wrapper scripts or the Docker.
Rinterface.html - HTML output of Rinterface.Rmd

/R

/main_scripts

This directory contains the R scripts that takes the specifications in the configuration file and sets up the model simulation, reads the data, and performs inference.

inference_main.R - This is the master R script used to run the model. It distributes the model runs across computer cores, setting up runs for all the scenarios specified in the config, and for each model iteration used in the parameter inference. Note that despite the name "inference" in this file, this script must be used to run the model even if no parameter inference is conducted
inference_slot.R - This script contains the main code of the inference algorithm.
create_seeding.R -

/R_packages

flepicommon
- config.R
- DataUtils.R
- file_paths.R
- safe_eval.R
- compartments.R
inference - contains code to
- groundtruth.R - contains functions for pulling ground truth data from various sources. Calls functions in the flepicommon package
- functions.R - contains many functions used in running the inference algorithm
- inference_slot_runner_funcs.R - contains many functions used in running the inference algorithm
- inference_to_forecast.R -
- documentation.Rmd - Summarizes the documentation relevant to the inference package, including the configuration file options relevant to model fitting
- InferenceTest.R -
- /tests/ -
config.writer
- create_config_data.R
- process_npi_list.R
- yaml_utils.R
report.generation
- DataLoadFuncs.R
- ReportBuildUtils.R
- ReportLoadData.R
- setup_testing_environment.R

/test

/data

Depreciated? Should be removed

/vignettes

Depreciated? Should be removed

/doc

Depreciated? Should be removed

/batch

/slurm_batch

COVID19_USA Repository

https://github.com/HopkinsIDD/COVID19_USA

Current branch: main

/R

get_vacc_rate_and_outcomes_R13.R - this pulls vaccination coverage and variant prevalence data specific to rounds (either empirical, or specified by the scenario), and adjusts these data to the formats required for the model. Several data files are created in this process: variant proportions for each scenario, vaccination rates by age and dose. A file is also generated that defines the outcome ratios (taking in to account immune escape, cross protection and VE).

/R/scripts/config_writers

Scripts to generate config files for particular submissions to the Scenario Modeling Hub. Most of this functionality has now been replaced by the config writer package ()

R/scripts/postprocess

run_sum_processing.R

/data

Important files and folders include

geodata.csv
geodata_2019_statelevel.csv
mobility.csv
mobility_territories_2011-2015_statelevel.csv
outcomes_ratios.csv
US_CFR_shift_dates_v3.csv
US_hosp_ratio_corrections.cs
seeding_agestrat_RX.csv

/data/shp

"Shape-files" (.shp) that .....

/data/outcomes

usa-subpop-params-output_V2.parquet

/data/intervention_tracking

Data files containing the dates that different non pharmaceutical interventions (like mask mandates, stay-at-home orders, school closures) were implemented by state

/data/vaccination

Files used to create the config elements related to vaccination, such as vaccination rates by state by age and vaccine efficacy by dose

/data/variant

Files created in the process of downloading and analyzing data on variant proportions

/manuscripts

Contains files for scientific manuscripts using results from the pipeline. Not up to date

/config

Contains an archive of configuration files used for previous model runs

/old_configs

Same as above. Contains an archive of configuration files used for previous model runs

/scripts

Depreciated - to be removed? - contains rarely used scripts

/notebook

Depreciated - to be removed? - contains rarely used notebooks to check model input. Might be used in some unit tests?

/NPI

empty?

/ScenarioHub

Depreciated - to be removed?