1 of 66

FlepiMoP Documentation

Home

Welcome to flepiMoP documentation!

The “FLexible EPIdemic MOdeling Pipeline” (flepiMoP; formerly known as the COVID Scenario Modeling Pipeline or CSP) is an open-source software suite designed by researchers in the Johns Hopkins Infectious Disease Dynamics Group and at UNC Chapel Hill to simulate a wide range of compartmental models of infectious disease transmission. The disease transmission and observation models are defined by a no-code configuration file, which allows models of varying complexity to be specified quickly and consistently, from simple problems described by SIR-style models in a single population to more complicated models of multiple pathogen strains transmitting between thousands of connected spatial divisions and age groups.

It was initially designed in early 2020 and was routinely used to provide projections of the emerging COVID-19 epidemic to health authorities worldwide. Currently, flepiMoP provides COVID-19 projections to the US CDC-funded model aggregation sites, the COVID-19 Forecast Hub and the COVID-19 Scenario Modeling Hub, influenza projections to FluSight and to the Flu Scenario Modeling Hub, and RSV projections to the RSV Scenario Modeling Hub.

However, the pipeline is much more general and can be used to simulate the dynamics of any infection that can be expressed as a compartmental epidemic model. These include applications in chemical reaction kinetics, pharmacokinetics, within-host disease dynamics, or applications in the social sciences.

In addition to producing forward simulations given a specified model and parameter values, the pipeline can also attempt to optimize unknown parameters (e.g., transmission rate, case detection rate, intervention efficacy) to fit the model to datasets the user provides (e.g., hospitalizations due to severe disease) using a Bayesian inference framework. This feature allows the pipeline to be utilized for short-term forecasting or longer-term scenario projections for ongoing epidemics, since it can simultaneously be fit to data for dates in the past and then use best-fit parameters to make projections into the future.

General description of flepiMoP

The main features of flepiMoP are:

Open-source (GPL v3.0) infectious dynamics modeling software, written in R and Python
Versatile, no-code design applicable for most compartmental models and outcome observation models, allowing for quick iteration in reaction to epidemic events (e.g., emergence of new variants, vaccines, non-pharmaceutical interventions (NPIs))
Powerful, just-in-time compiled disease transmission model and distributed inference engine ready for large scale simulations on high-performance computing clusters or cloud workflows
Adapted to small- and large-scale problems, from a simple SIR model to a complex model structure with hundreds of compartments on thousands of connected populations
Strong emphasis on mechanistic processes, with a design aimed at leveraging domain knowledge in conjunction with statistical inference
Portable for Windows WSL, MacOS, and Linux with the provided Docker image and an Anaconda environment

The mathematical model within the pipeline is a compartmental epidemic model embedded within a well-mixed metapopulation. A compartmental epidemic model is a model that divides all individuals in a population into a discrete set of states (e.g. “infected”, “recovered”) and tracks – over time – the number of individuals in each state and the rates at which individuals transition between these states. The well-known SIR model is a classic example of such a model, and much more complex versions of this model type have been simulated with this framework (for example, an SEIR-style model in which individuals are further subdivided into multiple age groups and vaccination statuses).

The structure of the desired model, as well as the parameter values and initial conditions, can be specified flexibly by the user in a no-code fashion. The pipeline allows for parameter values to change over time at discrete intervals, which can be used to specify time-dependent aspects of disease transmission and control (such as seasonality or vaccination campaigns).

The model is embedded within a meta-population structure, which consists of a series of distinct subpopulations (e.g. states, provinces, or other communities) in which the model structure is repeated, albeit with potentially different parameter values. The subpopulations can interact, either through the movement of individuals or the influence of individuals in one subpopulation on the transition rate of individuals in another ;

Within each subpopulation, the population is assumed to be well-mixed, meaning that interactions are assumed to be equally likely between any pair of individuals (since unique identities of individuals are not explicitly tracked). The same model structure can be simulated in a continuous-time deterministic or discrete-time stochastic manner ;

In addition to the variables described by the compartmental model, the model can track other observable variables (“outcomes”) that are functions of the basic model variables but do not themselves influence the dynamics (i.e., some portion of infections are reported as cases, depending on a testing rate). The model can be run iteratively to tune the values of certain parameters so that these outcome variables best match timeseries data provided by the user for a certain time period ;

Fitting is done using a Bayesian-like framework, where the user can specify the likelihood of observed outcomes in data given modeled outcomes, and the priors on any parameters to be fit. Multiple data streams (e.g., cases and deaths) can be fit simultaneously. A custom Markov Chain Monte Carlo method is used to sequentially propose and accept or reject parameter values based on the model fit to data, in a way that balances fit quality within each individual subpopulation with that of the total aggregate population, and that takes advantage of parallel computing environments.

The code is written in a combination of R and Python, and the vast majority of users only need to interact with the pipeline via the components written in R. It is structured in a modular fashion, such that individual components – such as the epidemic model, the observable variables, the population structure, or the parameters – can be edited or completely replaced without any handling of other parts of the code ;

When model simulation is combined with fitting to data, the code is designed to run most efficiently on a supercomputing cluster with many cores. We most commonly run the code on Amazon Web Services or on high-performance computers using SLURM. However, even relatively large models can be run efficiently on most personal computers. Typically, the memory of the machine will limit the number of compartments (i.e., variables) that can be included in the epidemic model, while the machine’s CPU will determine the speed at which each model run is completed and the number of iterations of the model that can be run during parameter searches when fitting the model to data. While the pipeline can be installed on any computer, it is sometime easier to use an Anaconda environment or the provided Docker container, where all the software dependencies (e.g., standardized R and Python versions along with required packages) are included, independent of the user’s local machine. All the code is maintained on our GitHub and shared with the GNU General Public License v3.0 license. It is build on top of a fully open-source software stack.

This documentation is organized as follows. The Model Description section describes the mathematical framework for the compartmental epidemic models that can be simulated forward in time by the pipeline. The Model Inference section describes the statistical framework for fitting the model to data. The Data and Parameter section describes the inputs the user must provide to the pipeline, in terms of the model structure and parameters, the population characteristics, the initial conditions, time-varying interventions, data to be fit, and more. The How to Run section provides concrete guidance on setting up and running the model and analyzing the output. The Quick Start Guide provides a simple example model setup. The Advanced section goes into more detail on specific features of the model and the code that are likely to only be of interest to users who want to run more complex models or data fitting routines or substantially edit the code. It includes a subsection describing each file and package used in the pipeline and their interactions during a model run.

Users who wish to jump to running the model themselves can see Quick Start Guide.

For questions about the pipeline or to report a bug, please use the “Issues” or "Discussions" feature on our GitHub.

Acknowledgments

flepiMoP is actively developed by its current contributors, including Joseph C Lemaitre, Sara L Loo, Emily Przykucki, Clifton McKee, Claire Smith, Sung-mok Jung, Koji Sato, Pengcheng Fang, Erica Carcelen, Alison Hill, Justin Lessler, and Shaun Truelove, affiliated with the ;

Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA for (JCL, JL)
Johns Hopkins University International Vaccine Access Center, Department of International Health, Baltimore, MD, USA for (SLL, KJ, EC, ST)
Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA for (CM, CS, JL, ST)
Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA for (S-m.J, JL)
Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD, USA for (AH).

The development of this model was supported by from funds the National Science Foundation (2127976; ST, CPS, JK, ECL, AH), Centers for Disease Control and Prevention (200-2016-91781; ST, CPS, JK, AH, JL, JCL, SL, CM, EC, KS, S-m.J), US Department of Health and Human Services / Department of Homeland Security (ST, CPS, JK, ECL, AH, JL), California Department of Public Health (ST, CPS, JK, ECL, JL), Johns Hopkins University (ST, CPS, JK, ECL, JL), Amazon Web Services (ST, CPS, JK, ECL, AH, JL, JCL), National Institutes of Health (R01GM140564; JL, 5R01AI102939; JCL), and the Swiss National Science Foundation (200021-172578; JCL)

We need to also acknowledge past contributions to the development of the COVID Scenario Pipeline, which evolved into flepiMoP. These include contributions by Heramb Gupta, Kyra H. Grantz, Hannah R. Meredith, Stephen A. Lauer, Lindsay T. Keegan, Sam Shah, Josh Wills, Kathryn Kaminsky, Javier Perez-Saez, Joshua Kaminsky, and Elizabeth C. Lee.

gempyor: modeling infectious disease dynamics

Modeling infectious disease dynamics

Within flepiMoP, gempyor is an open-source Python package that constructs and simulates compartmental infectious disease dynamics models. gempyor is meant to be used within flepiMoP, where it integrates with parameter inference and data processing scripts, but can also be run standalone with a command-line interface, generating simulations of disease incidence under different scenario assumptions.

To simulate an infectious disease dynamics problem, the following building blocks needs to be defined:

The population structure over which the disease is transmitted
The transmission model, defining the compartments and the transitions between compartments
An observation model, defining different observable outcomes (serology, hospitalization, deaths, cases) from the transmission model
The parameters and modifiers that apply to them

Generalized compartmental infection model

At the core of our pipeline is a dynamic mathematical model that categorizes individuals in the population into a discrete set of states ('compartments') and describes the rates at which transitions between states can occur. Our modeling approach was developed to describe classic infectious disease transmission models, like the SIR model, but is much more general. It can encode any compartmental model in which transitions between states are of the form

X \xrightarrow{b X Z^{a}} Y,

where $X$ , $Y$ , and $Z$ are time-dependent variables describing the number of individuals in each state, $b$ is a rate parameter (units of time $^{-1}$ ) and $a$ is a scaling parameter (unitless). $Z$ may be $X$ , $Y$ , a different variable, or 1, and the rate may also be the sum of terms of this form. Rates that involve non-linear functions or more than two variables are currently not possible. For simplicity, we omitted the time dependencies on parameters (e.g $X$ is in fact $X(t)$ and $a$ , $b$ are $a(t)$ , $b(t)$ ).

The model can be simulated as a continuous-time, deterministic process (i.e., a set of ordinary differential equations), which in this example would be in the form

\frac{dX}{dt} = b X Z^a.

Details on the numerical integration procedure for simulating such an equation is given in the Advanced section.

Alternatively, the model can be simulated as a discrete-time stochastic process, where the number of individuals transitioning between states $X$ and $Y$ at time $t$ is a binomial random variable

N_{X\rightarrow Y}(t) = \textrm{Binom}(X,1-e^{-\Delta{t} \cdot bZ(t)^a}),

where the second term is the expected fraction of individuals in the $X$ state at time $t$ who would transition to $Y$ by time $t+\Delta t$ if there were no other changes to $X$ in this time, and time step $\Delta{t}$ is a chosen parameter that must be small for equivalence between continuous- and discrete-time versions of the model.

SEIR model

For example, an SEIR model – which describes an infection for which susceptible individuals ( $S$ ) who are infected first pass through a latent or exposed ( $E$ ) phase before becoming infectious ( $I$ ) and that confers perfect lifelong immunity after recovery ( $R$ ) – could be encoded as

S \xrightarrow{\beta S I/N} E \xrightarrow{\sigma E} I \xrightarrow{\gamma I} R,

where $\beta$ is the transmission rate (rate of infectious contact per infectious individual), $\sigma$ is the rate of progression ( $1/\sigma$ is the average latent/incubation period), and $\gamma$ is the recovery rate ( $1/\gamma$ is the average duration of the infectious period), and $N$ is the total population size ( $N=S+E+I+R$ ). In differential equation form, this model is

\frac{dS}{dt} = - \beta S \frac{I}{N} ,

\frac{dE}{dt} = \beta S \frac{I}{N} - \sigma E,

\frac{dI}{dt} = \sigma E - \gamma I,

\frac{dR}{dt} = \gamma I,

and as a stochastic process, it is

N_{S\rightarrow E}(t) = \textrm{Binom}(S(t),1-e^{-\Delta{t} \cdot \beta I(t)/N}),

N_{E\rightarrow I}(t) = \textrm{Binom}(E(t),1-e^{-\Delta{t} \cdot \sigma}),

N_{I\rightarrow R}(t) = \textrm{Binom}(I(t),1-e^{-\Delta{t} \cdot \gamma }).

A common COVID-19 model is a variation of this SEIR model that incorporates:

multiple identical stages of the infectious period, which allows us to model gamma-distributed durations of infectiousness, and
an infection rate modified by a 'mixing coefficient', $\alpha \in [0,1]$ , which is a rough heuristic for the slowdown in disease spread that occurs in realistically heterogeneous populations where more well-connected individuals are infected first.

A three-stage infectious period model is given by

S \xrightarrow{\beta S (I_1+I_2+I_3)^\alpha/N} E \xrightarrow{\sigma E} I_1 \xrightarrow{3\gamma I_1} I_2 \xrightarrow{3\gamma I_2} I_3 \xrightarrow{3\gamma I_3} R.

The flepiMoP model structure is specifically designed to make it simple to encode the type of more complex "stratified'' models that often arise in infectious disease dynamics. The following are some examples of possible stratifications.

Age groups

To describe an SEIR-type disease that spreads and progresses differently among children versus adults, one may want to repeat each compartment of the model for each of the two age groups (C – Children, A – Adults), creating an age-stratified model

S_C \xrightarrow{S_C (\beta_{CC} I_C/N_C + \beta_{AC} I_A/N_A)} E_C \xrightarrow{\sigma_C E_C} I_C \xrightarrow{\gamma_C I_C} R_C,

S_A \xrightarrow{S_A (\beta_{AA} I_A/N_A + \beta_{CA} I_C/N_C)} E_A \xrightarrow{\sigma_A E_A} I_ A \xrightarrow{\gamma_A I_A} R_A,

where $\beta_{XY}$ is the transmission rate between age X and Y, and we have assumed individuals do not age on the timescale relevant to the model.

Vaccination status

Vaccination status could influence disease progression and infectiousness, and could also change over time as individuals choose to get the vaccine (V – vaccinated, U – unvaccinated)

S_U \xrightarrow{\beta S_U (I_U + I_V)/N} E_U \xrightarrow{\sigma_U E_U} I_U \xrightarrow{\gamma_U I_U} R_U,

S_V \xrightarrow{\beta (1-\theta) S_V (I_U + I_V)/N} E_V \xrightarrow{\sigma_V E_V} I_V \xrightarrow{\gamma_V I_V} R_V,

S_U \xrightarrow{\nu S_U} S_V,

R_U \xrightarrow{\nu R_U} R_V,

where $u$ is the vaccination rate (we assume that individuals do not receive the vaccine while they are exposed or infectious) and $\theta$ is the vaccine efficacy against infection. Similar structures could be used for other sources of prior immunity or other dynamic risk groups.

Pathogen strain

Another common stratification would be pathogen strain, such as COVID-19 variants. Individuals may be infected with one of several variants, strains, or serotypes. Our framework can easily create multistrain models, for example

S_A \xrightarrow{\beta_A S_A I_A/N_A} E_A \xrightarrow{\sigma_A E_A} I_ A \xrightarrow{\gamma_A I_A} R_A,

S_A \xrightarrow{\beta_B S_B I_B/N_B} E_B \xrightarrow{\sigma_B E_B} I_B \xrightarrow{\gamma_B I_B} R_B,

R_{A} \xrightarrow{\beta_B(1-\phi_{AB}) R_A I_B/N_B} E_{AB} \xrightarrow{\sigma_{AB} E_{AB}} I_{AB} \xrightarrow{\gamma_{AB} I_{AB}} R_{AB},

R_{B} \xrightarrow{\beta_A (1-\phi_{BA}) R_B I_A/N_B} E_{AB} \xrightarrow{\sigma_{AB} E_{AB}} I_{AB} \xrightarrow{\gamma_{AB} I_{AB}} R_{AB},

where $\phi_{AB}$ is the immune cross-protection conferred from infection with strain A to subsequent infection with strain B. Co-infection is ignored. All individuals are assumed to be initially equally susceptible to both infections and are just categorized as $S_A$ (vs $S_B$ ) for convenience.

All combinations of these situations can be quickly specified in flepiMoP. Details on how to encode these models is provided in the Model Implementation section, with examples given in the Tutorials section.

Clinical outcomes and observations model

The pipeline allows for an additional type of dynamic state variable beyond those included in the mathematical model. We refer to these extra variables as "Outcomes" or "Observations". Outcome variables can be functions of model variables, but do not feed back into the model by influencing other state variables. Typically, we use outcome variables to describe the process through which some subset of individuals in a compartment are "observed'' and become part of the data to which models are compared and attempt to predict. For example, in the context of a model for an infectious disease like COVID-19, outcome variables include reported cases, hospitalizations, and deaths.

An outcome variable $H(t)$ can be generated from a state variable of the mathematical model $X(t)$ using the following properties:

The proportion of all individuals in $X$ who will be observed as $H$ , $p$
The delay between when an individual enters state $X$ and when they are observed as $H$ , which can follow a class of probability distributions $f(\Delta t;\theta)$ where $\theta$ is the parameters of the distribution (e.g., the mean and standard deviation of a normal distribution)
(optional) the duration spent in observable $H$ , in which case the output will also contain the prevalence (number of individuals currently in $H$ in addition to the incidence into $H$

In addition to single values (drawn from a distribution), the duration and delay can be inputted as distributions, producing a convolution of the output.

The number of individuals in $X$ at time $t_1$ who become part of the outcome variable $H(t_2)$ is a random variable, and individuals who are observed in $H$ at time $t$ could have entered $X$ at different times in the past.

Formally, for a deterministic, continuous-time model

H(t) = \int_{\tau} p X(\tau) f(t-\tau, \theta) d\tau

For a discrete-time, stochastic model

H(t) = \sum_{\tau_i=0}^{t}\text{Multinomial} (\text{Binomial}(X(\tau_i), p), \{f(t-\tau_i, \theta)\}).

Note that outcomes $H(t)$ constructed in this way always represent incidence values; meaning they describe the number of individuals newly entering this state at time $t$ . If the model state $X(t)$ is also an incidence, then $p$ is a unitless probability, whereas if $X(t)$ is a prevalence (number of individuals currently in state at time $t$ ), then $p$ is instead a probability per time unit.

Outcomes can also be constructed as functions of other outcomes. For example, a fraction of hospitalized patients may end up in the intensive care unit (ICU).

There are several benefits to separating outcome variables from the mathematical model. Firstly, these variables can be calculated after the model is run, and only at the timepoints of interest, which can dramatically reduce the memory needed during model simulation. Secondly, outcome variables can be fully stochastic even when the mathematical model is simulated deterministically. This becomes useful when an infection might be at high enough prevalence that a deterministic simulation is appropriate, but when there is a rare and therefore quite stochastic outcome reported in the data (e.g., severe cases) that the model is tasked with predicting. Thirdly, outcome variables can have arbitrary delay distributions, to take into account the complexities of health reporting practices, whereas our mathematical modeling framework is designed mainly for exponentially distributed delays and only easily permits extensions to gamma-distributed delays. Finally, this separation keeps the pipeline modular and allow for easy editing of one component of the model without disrupting the other.

Details on how to specify these outcomes in the model configuration files is provided in the Model Implementation section, with examples given in the Tutorials section.

Population structure

The pipeline was designed specifically to simulate infection dynamics in a set of connected subpopulations. These subpopulations could represent geographic divisions, like countries, states, provinces, or neighborhoods, or demographic groups, or potentially even different host species. The equations and parameters of the transmission and outcomes models are repeated for each subpopulation, but the values of the parameters can differ by location. Within each subpopulation, infection is equally likely to spread between any pair of susceptible/infected individuals after accounting for their infection class, whereas between subpopulations there may be varying levels of mixing.

Formally, this type of population structure is often referred to as a “metapopulation”, and each subpopulation may be called a “deme”.

The following properties may be different between subpopulations:

the population size
the parameters of the transmission model (see LINK)
the parameters of the outcomes model (see LINK)
the amount of transmission that occurs within this subpopulation versus from any other subpopulation (see LINK)
the timing and extent of any interventions that modify these parameters (see LINK)
the initial timing and number of external introductions of infections into the population (see LINK)
the ground truth timeseries data used to compare to model output and infer model parameters (see LINK)

Currently, the following properties must be the same across all subpopulations:

the compartmental model structure
the form of the likelihood function used to estimate parameters by fitting the model to data (LINK)
...

Mixing between subpopulations

The generalized compartmental model allows for second order “interaction” terms that describe transitions between model states that depend on interactions between pairs of individuals. For example, in the context of a classical SIR model, the rate of new infections depends on interactions between susceptible and infectious individuals and the transmission rate

\frac{dI}{dt} = \beta S I - \gamma I

For a model with multiple subpopulations, each of these interactions can occur either between individuals in the same or different subpopulations, with specific rate parameters for each combination of individual locations

\frac{dI_i}{dt} = \sum_j \beta_{ji} I_j S_i - \gamma I_i

where $\beta_{ji}$ is the per-contact per-time rate of disease transmission between an infected individual residing in subpopulation $j$ and a susceptible individual from subpopulation $i$ .

In general for infection models in connected subpopulations, the transmission rates $\beta_{ji}$ can take on arbitrary values. In this pipeline, however, we impose an additional structure on these terms. We assume that interactions between subpopulations occur when individuals temporarily relocate to another subpopulation, where they interact with locals. We call this movement “mobility”, and it could be due to regular commuting, special travel, etc. There is a transmission rate ( $\beta_j$ ) associated with each subpopulation $j$ , and individuals physically in that subpopulation – permanently or temporarily – are exposed and infected with this local rate whenever they encounter local susceptible individuals.

The transmission matrix is then

\beta_{ji} = \begin{cases} p_a \frac{M_{ij}}{N_i} \beta_j &\text{if } j \neq i \\ \left( 1- \sum_{j \neq i} p_a \frac{M_{ij}}{N_i} \right) &\text{if } j = i \end{cases}

where $\beta_j$ is the onward transmission rate from infected individuals in subpopulation $j$ , $M_{ij}$ is the number of individuals in subpopulation i who are interacting with individuals in subpopulation $j$ at any given time (for example, fraction who commute each day), and $p_a$ is a fractional scaling factor for the strength of inter-population contacts (for example, representing the fraction of hours in a day commuting individuals spend outside vs. inside their subpopulation).

The list of all pairwise mobility values and the interaction scaling factor are model input parameters. Details on how to specify them are given in the Model Implementation section.

If an alternative compartmental disease model is created that has other interactions (second order terms), then the same mobility values are used to determine the degree of interaction between each pair of subpopulations.

Initial conditions

Initial conditions can be specified by setting the values of the compartments in the disease transmission model at time zero, or the start of the simulation. For example, we might assume that for day zero of an outbreak the whole population is susceptible except for one single infected individual, i.e. $S(0) = N-1$ and $I(0) = 1$ . Alternatively, we might assume that a certain proportion of the population has prior immunity from previous infection or vaccination.

It might also be necessary to model instantaneous changes in values of model variables at any time during a simulation. We call this 'seeding'. For example, individuals may import infection from other external populations, or instantaneous mutations may occur, leading to new variants of the pathogen. These processes can be modeled with seeding, allowing individuals to change state at specified times independently of model equations.

We also note that seeding can also be used as a convenient way to specify initial conditions, particularly ealy in an outbreak where the outbreak is triggered by a few 'seedings'.

Time-dependent interventions

Parameters in the disease transmission model or the observation model may change over time. These changes could be, for example: environmental drivers of disease seasonality; “non-pharmaceutical interventions” like social distancing, isolation policies, or wearing of personal protective equipment; “pharmaceutical interventions” like vaccination, prophylaxis, or therapeutics; changes in healthcare seeking behavior like testing and diagnosis; changes in case reporting, etc.

The model allows for any parameter of the disease transmission model or the observation model to change to a new value for a time interval specified by start and end times (or multiple start and end times, for interventions that are recurring). Each change may be subpopulation-specific or apply to the entire population. Changes may be overlapping in time.

The magnitude of these changes are themselves model parameters, and thus may inferred along with other parameters when the model is fit to data. Currently, the start and end times of interventions must be fixed and cannot be varied or inferred.

For example, the rate of transmission in subpopulation $i$ , $\beta_i$ , may be reduced by an intervention $r_k$ that acts between times $t_{k,\text{start}}$ and $t_{k,\text{end}}$ , and another intervention $r_l$ that acts between times $t_{l,\text{start}}$ and $t_{l,\text{end}}$

\beta_j'(t) = (1-r_k(t;t_{k,\text{start}},t_{k,\text{end}}))(1-r_l(t;t_{l,\text{start}},t_{l,\text{end}})))\beta_j^0

In this case, $r_k(t)$ and $r_l(t)$ are both considered simple SinglePeriodModifier interventions. There are four possible types of interventions that can be included in the model

SinglePeriodModifier - an intervention $r_j$ that leads to a fractional reduction in a parameter value in subpopulation $j$ (i.e., $\beta_j$ ) between two timepoints

\beta_j'(t) = (1-r_j(t;t_{j,\text{start}},t_{j,\text{end}}))\beta_j^0

r_j(t;t_{j,\text{start}},t_{j,\text{end}}) = \begin{cases} r_j &\text{if } t_{j,\text{start}} < t <t_{j,\text{end}} \\ 0 &\text{otherwise} \end{cases}

MultiPeriodModifier - an intervention $r_j$ that leads to a fractional reduction in a parameter value in subpopulation $j$ (i.e., $\beta_j$ ) value between multiple sets of timepoints

\beta_j'(t) = (1-r_j(t; \{t_{j,k,\text{start}},t_{j,k,\text{end}}\}_k))\beta_j^0

r_j(t;\{t_{j,k,\text{start}},t_{j,k,\text{end}}\}_k) = \begin{dcases} r_ j&\text{if } t_{j,k1,\text{start}} < t <t_{j,k1,\text{end}} \\ r_j &\text{if } t_{j,k2,\text{start}} < t <t_{j,k2,\text{end}} \\ & ... \\ r_j &\text{if } t_{j,kn,\text{start}} < t <t_{j,kn,\text{end}} \\ 0 &\text{otherwise} \end{dcases}

ModifierModifier- an intervention $\pi_j$ that leads to a fractional reduction in the value of another intervention $r_j$ between two timepoints

\beta_j'(t) = (1-r_j(t;t_{j,\text{start}},t_{j,\text{end}})(1-\pi_{r,j}(t;t_{r,j,\text{start}},t_{r,j,\text{end}})))\beta_j^0

\pi_{r,j}(t;t_{r,j,\text{start}},t_{r,j,\text{end}}) = \begin{cases} \pi_{r,j} &\text{if } t_{r,j,\text{start}} < t <t_{r,j,\text{end}} \\ 0 &\text{otherwise} \end{cases}

StackedModifier - TBA

Model Implementation

flepiMoP's configuration file

About configuration files

flepiMop is set up so that all parameters and other options for running the pipeline can be specified in a single "configuration" file (aka "config"). Users do not need to edit any other code files, or even be aware of their contents, to create and run complex model scenarios. Configuration files also provide a convenient record of model options and promote reproducibility of model results.

We use the YAML language syntax to write config files, which are typically named something like config.yml. The file has simple plain text contents and follows a tabbed outline structure. When config files are read by the model code, a data structure encoding the model options is created.

Comments can be added to the config file by starting with the hash key (#) then a space. Comments can start anywhere on a line and continue until the end, but if they run over to a new line, a new # must be used at the start of the new line.

Example

(give a simple configuration for a toy model with two subpopulations, SEIR, single "cases" outcome, single seeded infection, single NPI that starts after some time? this page is currently under development, please see our _for some simple configurations) ;

When referring to config items (individual parameters), we use their full position in the outline. For example, in the sample config file above, we denote

as subpop_setup::geodata having a value of minimal

Notation

Parameters and other options specified in the configuration files can take on a variety of types of values, using the following notations:

dates are specified as [year]-[month]-[day]. (e.g., 2020-01-31)
boolean values are either "TRUE" or "FALSE"
files names are strings
probability is a float between 0 and 1
distribution is a probability distribution from which a random value for the parameter is drawn each time a new simulation is run (or chain, if doing inference). See for the require schema.

Configuration files sections

Global header

Required section

These global configuration options typically sit at the top of the configuration file.

Item

Required?

Type/Format

Description

For example, for a configuration file to simulate the spread of COVID-19 in the US during 2020 and compare to data from March 1 onwards, with 1000 independent simulations, the header of the config might read:

`subpop_setup` section

Required section

This section specifies the population structure on which the model will be simulated, including the names and sizes of each subpopulation and the connectivity between them. More details .

`compartments` section

Required section

This section is where users can specify the variables (infection states) that will be tracked in the infectious disease transmission model. More details can be found . The other details of the model are specified in the seir section, including transitions between these compartments (seir::transitions), the names of the parameters governing the transitions (seir::parameters), and the numerical method used to simulate the equations over time (seir::integration). The initial conditions of the model can be specified in the initial_conditions section, and any other inputs into the model from external populations or instantaneous transitions between states that occur at later times can be specified in the seeding section. ;

`seir` section

Required section

This section is where users can specify the details of the infectious disease transmission model they wish to simulate (e.g., SEIR). This model describes the allowed transitions (seir::transitions) between the compartments that were specified in the compartments section, the values of the parameters involved in these transitions (seir::parameters), and the numerical method used to simulate the equations over time (seir::integration). More details . The initial conditions of the model can be specified in the separate initial_conditions section, and any other inputs into the model from external populations or instantaneous transitions between states that occur at later times can be specified in the seeding section. ;

`initial_conditions` section

Optional section

This section is used to specify the initial conditions of the model, which define how individuals are distributed between the model compartments at the time the model simulation begins. Importantly, the initial conditions specify the time and location where infection is first introduced. If this section is omitted, default values are used. If users want to add infections to the population at later times, or add or remove individuals from compartments separately from the model rules, they can do so via the related seeding section. More details ;

`seeding` section

Optional section

This section is used to specify how individuals are instantaneously "seeded" from one compartment to another, where they then continue to be governed by the model equations. For example, this seeding could be used to represent importations of infected individuals from an outside population, mutation events that create new strains, or vaccinations that alter disease susceptibility. Seeding events can occur at any time in the simulation. The seeding section specifies the numeric values added to or removed from any compartment of the model. More details ;

`outcomes` section

Optional section

This section is where users can define new variables representing the observed quantities and how they are related to the underlying state variables in the model (e.g., the fraction of infections that are detected as cases). More details ;

`interventions` section

Required section

This section is where users can specify time-varying changes to parameters governing either the infectious disease model or the observational model. More details ;

`inference` section

Optional section

This section is where users can specify the details of how the model is fit to data, including what data streams they will be included and which outcome variables they represent and the likelihood functions describing the probability of the data given the model. More details . ;

Specifying population structure

This page describes how users specify the names, sizes, and connectivities of the different subpopulations comprising the total population to be modeled

Overview

The subpop_setup section of the configuration file is where users can input the information required to define a population structure on which to simulate the model. The options allow the user to determine the population size of each subpopulation that makes up the overall population, and to specify the amount of mixing that occurs between each pair of subpopulations.

An example configuration file with the global header and the spatial_setup section is below:

Items and options

Config Item

Required?

Type/Format

Description

`geodata` file and `selected` option

geodata is a .csv with column headers, with at least two columns: subpop and population.
selected if provided, is the subset of locations in geodata file (as determined by subpop column) to be modeled. Requesting subpopulation(s) that are not present will lead to an error.

Example geodata file format

`mobility` file

The mobility file is a .csv file (it has to contain .csv as extension) with long form comma separated values. Columns have to be named ori, dest, amount, with amount being the average number individuals moving from the origin subpopulation ori to destination subpopulation dest on any given day. Details on the mathematics of this model of contact are explained in the . Unassigned relations are assumed to be zero. The location entries in the ori and dest columns should correspond to an entry in the subpop column in geodata.csv. When using selected, the mobility data will also be filtered.

Example mobility file format

It is also possible, but not recommended to specify the mobility file as a .txt with space-separated values in the shape of a matrix. This matrix is symmetric and of size K x K, with K being the number of rows in geodata. The above example corresponds to

Examples

Example 1

To simulate a simple population structure with two subpopulations, a large province with 10,000 individuals and a small province with only 1,000 individuals, where every day 100 residents of the large province travel to the small province and interact with residents there, and 50 residents of the small province visit the large province

geodata.csv contains the population structure (with columns subpop and population)

mobility.csv contains

Specifying compartmental model

This section describes how to specify the compartmental model of infectious disease transmission.

We want to allow users to work with a wide variety of infectious diseases or, one infectious disease under a wide variety of modeling assumptions. To facilitate this, we allow the user to specify their compartmental model of disease dynamics via the configuration file.

We originally considered asking users to specify each compartment and transition manually. However, we quickly found that this created long, confusing configuration files, and so we created a shorthand to more succinctly specify both compartments and transitions between them. This works especially well for models where individuals are stratified by other properties (like age, vaccination status, etc.) in addition to their infection status.

The model is specified in two separate sections of the configuration file. In the compartments section, users define the possible states individuals can be categorized into. Then in the seir section, users define the possible transitions between states, the values of parameters that govern the rates of these transitions, and the numerical method used to simulate the model.

An example section of a configuration file defining a simple SIR model is below.

compartments:
  infection_stage: ["S", "I", "R"]
  
seir:
  transitions:
    # infection
    - source: [S]
      destination: [I]
      proportional_to: [[S], [I]]
      rate: [beta]
      proportion_exponent: 1
    # recovery
    - source: [I]
      destination: [R]
      proportional_to: [[I]]
      rate: [gamma]
      proportion_exponent: 1
  parameters:
    beta: 0.1
    gamma: 0.2
  integration:
     method: rk4
     dt: 1.00

Specifying model compartments (`compartments`)

The first stage of specifying the model is to define the infection states (variables) that the model will track. These "compartments" are defined first in the compartments section of the config file, before describing the processes that lead to transitions between them. The compartments are defined separately from the rest of the model because they are also used by the seeding section that defines initial conditions and importations.

For simple disease models, the compartments can simply be listed with whatever notation the user chooses. For example, for a simple SIR model, the compartments could be ["S", "I", "R"]. The config also requires that there be a variable name for the property of the individual that these compartments describe, which for example in this case could be infection_stage

compartments:
  infection_stage: ["S", "I", "R"]

Our syntax allows for more complex models to be specified without much additional notation. For example, consider a model of a disease that followed SIR dynamics but for which individuals could receive vaccination, which might change how they experience infection.

In this case we can specify compartments as the cross product of multiple states of interest. For example:

 compartments:
   infection_stage: ["S", "I", "R"]
   vaccination_status: ["unvaccinated", "vaccinated"]

Corresponds to 6 compartments, which the code internally converts to this data frame

infection_stage, vaccination_status, compartment_name
S,               unvaccinated,       S_unvaccinated
I,               unvaccinated,       I_unvaccinated
R,               unvaccinated,       R_unvaccinated
S,               vaccinated,         S_vaccinated
I,               vaccinated,         I_vaccinated
R,               vaccinated,         R_vaccinated

In order to more easily describe transitions, we want to be able to refer to a compartment by its components, but then use it by its compartment name.

If the user wants to specify a model in which some compartments are repeated across states but others are not, there will be pros and cons of how the model is specified. Specifying it using the cross product notation is simpler, less error prone, and makes config files easier to read, and there is no issue with having compartments that have zero individuals in them throughout the model. However, for very large models, extra compartments increase the memory required to conduct the simulation, and so having unnecessary compartments tracked may not be desired.

For example, consider a model of a disease that follows SI dynamics in two separate age groups (children and adults), but for which only adults receive vaccination, with one or two doses of vaccine. With the simplified notation, this model could be specified as:

 compartments:
   infection_stage: ["S", "I"]
   age_group: ["child", "adult"]
   vaccination_status: ["unvaccinated", "1dose", "2dose"]

corresponding to 12 compartments, 4 of which are unnecessary to the model

infection_stage, age_group, vaccination_status, compartment_name
S,		 child,	    unvaccinated,	S_child_unvaccinated	
I,		 child,	    unvaccinated,	I_child_unvaccinated
S,		 adult,	    unvaccinated,	S_adult_unvaccinated
I,		 adult,	    unvaccinated,	I_adult_unvaccinated
S,		 child,	    1dose,		S_child_1dose
I,		 child,	    1dose,		I_child_1dose
S,		 adult,     1dose,		S_adult_1dose
I,		 adult,     1dose,		I_adult_1dose
S,		 child,     2dose,		S_child_2dose	
I,		 child,     2dose,		I_child_2dose
S,		 adult,	    2dose,		S_adult_2dose
I,		 adult,	    2dose,		I_adult_2dose

Or, it could be specified with the less concise notation

compartments:
   overall_state: ["S_child", "I_child", "S_adult_unvaccinated", "I_adult_unvaccinated", "S_adult_1dose", "I_adult_1dose", "S_adult_2dose", "I_adult_2dose"]

which does not result in any unnecessary compartments being included.

These compartments are referenced in multiple different subsequent sections of the config. In the seeding (LINK TBA) section the user can specify how the initial (or later imported) infections are distributed across compartments; in the seir section the user can specify the form and rate of the transitions between these compartments encoded by the model; in the outcomes section the user can specify how the observed variables are generated from the underlying model states.

Notation must be consistent between these sections.

Specifying compartmental model transitions (`seir::transitions`)

The way we specify transitions between compartments in the model is a bit more complicated than how the compartments themselves are specified, but allows users to specify complex stratified infectious disease models with minimal code. This makes checking, sharing, and updating models more efficient and less error-prone.

We specify one or more transition globs, each of which corresponds to one or more transitions. Since transition globs are shorthand for collections of transitions, we will first explain how to specify a single transition before discussing transition globs.

A transition has 5 pieces of associated information that a user can specify:

source
destination
rate
proportional_to
proportion_exponent

For more details on the mathematical forms possible for transitions in our models, read the Model Description section.

We first consider a simple example of an SI model where individuals may either be vaccinated (v) or unvaccinated (u), but the vaccine does not change the susceptibility to infection nor the infectiousness of infected individuals.

We will focus on describing the first transition of this model, the rate at which unvaccinated individuals move from the susceptible to infected state.

Specifying a single transition

Source

The compartment the transition moves individuals out of (e.g., the source compartment) is an array. For example, to describe a transition that moves unvaccinated susceptible individuals to another state, we would write

[S,unvaccinated]

which corresponds to the compartment S_unvaccinated

Destination

The compartment the transition moves individuals into (e.g. the destination compartment) is an array. For example, to describe a transition that moves individuals into the unvaccinated but infected state, we would write

[I,unvaccinated]

which corresponds to the compartment I_unvaccinated

Rate

The rate constant specifies the probability per time that an individual in the source compartment changes state and moves to the destination compartment. For example, to describe a transition that occurs with rate 5/time, we would write:

instead, we could describe the rate using a parameter beta, which can be given a numeric value later:

beta

The interpretation and unit of the rate constant depend on the model details, as the rate may potentially also be per number (or proportion) of individuals in other compartments (see below).

Proportional to

A vector of groups of compartments (each of which is an array) that modify the overall rate of transition between the source and destination compartment. Each separate group of compartments in the vector are first summed, and then all entries of the vector are multiplied to get the rate modifier. For example, to specify that the transition rate depends on the product of the number of unvaccinated susceptible individuals and the total infected individuals (vaccinated and unvaccinated), we would write:

[[[S,unvaccinated]], [[I,unvaccinated], [I, vaccinated]]]

To understand this term, consider the compartments written out as strings

[[S_unvaccinated], [I_unvaccinated, I_vaccinated]]

and then sum the terms in each group

[S_unvaccinated, I_unvaccinated + I_vaccinated]

From here, we can say that the transition we are describing is proportional to S_unvaccinated and I_unvaccinated + I_vaccinated, i.e., the rate depends on the product S_unvaccinated * (I_unvaccinated + I_vaccinated).

For transitions that occur at a constant per-capita rate (ie, E -> I at rate $\gamma$ in an SEIR model), it is possible to simply write proportional_to: ["source"].

Proportion exponent

This is an exponent modifying each group of compartments that contribute to the rate. It is equivalent to the "order" term in chemical kinetics. For example, if the reaction rate for the model above depends linearly on the number of unvaccinated susceptible individuals but on the total infected individuals sub-linearly, for example to a power 0.9, we would write:

[1, 0.9]

or a power parameter alpha, which can be given a numeric value later:

[1, alpha]

The (top level) length of the proportion_exponent vector must be the same as the (top level) length of the proportional_to vector, even if the desire of the user is to have the same exponent for all terms being multiplied together to get the rate.

Summary

Putting it all together, the model transition is specified as

source: [S, unvaccinated]
destination: [I, unvaccinated]
proportional_to: [[[S,unvaccinated]], [[I,unvaccinated], [I,vaccinated]]]
rate: [5]
proportion_exponent: [1, 0.9]

would correspond to the following model if expressed as an ordinary differential equation

\frac{\delta \text{S}_\text{unvaccinated}}{\delta t} = - \beta \text{S}_\text{unvaccinated}^1 (\text{I}_\text{unvaccinated}+\text{I}_\text{vaccinated})^{\alpha}

\frac{\delta \text{I}_\text{unvaccinated}}{\delta t} = \beta \text{S}_\text{unvaccinated}^1 (\text{I}_\text{unvaccinated}+\text{I}_\text{vaccinated})^{\alpha}

with parameter and parameter (we will describe how to use parameter symbols in the transitions and specify their numeric values separately in the section Specifying compartmental model parameters).

Transition globs

We now explain a shorthand we have developed for specifying multiple transitions that have similar forms all at once, via transition globs. The basic idea is that for each component of the single transitions described above where a term corresponded to a single model compartment, we can instead specify one or more compartment. Similarly, multiple rate values can be specified at once, for each involved compartment. From one transition glob, multiple individual transitions are created, by broadcasting across the specified compartments.

For transition globs, any time you could specify multiple arguments as a list, you may instead specify one argument as a non-list, which will be used for every broadcast. So [1,1,1] is equivalent to 1 if the dimension of that broadcast is 3.

We continue with the same SI model example, where individuals are stratified by vaccination status, but expand it to allow infection to occur at different rates in vaccinated and unvaccinated individuals:

Source

We allow one or more arguments to be specified for each compartment. So to specify the transitions out of both susceptible compartments (S_unvaccinated and S_unvaccinated), we would use

[[S], [unvaccinated,vaccinated]]

Destination

The destination variable should be the same shape as the source, and in the same relative order. So to specify a transition from S_unvaccinated to I_unvaccinated and S_vaccinated to I_vaccinated, we would write the destination as:

[[I], [unvaccinated,vaccinated]]

If instead we wrote:

[[I], [vaccinated,unvaccinated]]

we would have a transition from S_unvaccinated to I_vaccinated and S_vaccinated to I_unvaccinated.

Rate

The rate vector allows users to specify the rate constant for all the source -> destination transitions that are defined in a shorthand way, by instead specifying how the rate is altered depending on the compartment type. For example, the rate of transmission between a susceptible (S) and an infected (I) individual may vary depending on whether the susceptible individual is vaccinated or not and whether the infected individual is vaccinated or not. The overall rate constant is constructed by multiplying together or "broadcasting" all the compartment type-specific terms that are relevant to a given compartment.

For example,

rate: [[3], [0.6,0.5]]

This would mean our transition from S_unvaccinated to I_unvaccinated would have a rate of 3 * 0.6 while our transition from S_vaccinated to I_vaccinated would have a rate of 3 * 0.5.

The rate vector should be the same shape as source and destination and in the same relative order.

Note that if the desire is to make a model where the difference in the rate constants varies in a more complicated than multiplicative way between different compartment types, it would be better to specify separate transitions for each compartment type instead of using this shorthand.

Proportional to

The broadcasting here is a bit more complicated. In other cases, each broadcast is over a single component. However, in this case, we have a broadcast over a group of components. We allow a different group to be chosen for each broadcast.

[
  [[S,unvaccinated], [S,vaccinated]],
  [[I,unvaccinated],[I, vaccinated]], [[I,unvaccinated],[I, vaccinated]]
]

Again, let's unpack what it says. Since the broadcast is over groups, let's split the config back up

into those groups

[
  [S,unvaccinated],
  [[I,unvaccinated],[I, vaccinated]]
]
[
  [S,vaccinated],
  [[I,unvaccinated],[I, vaccinated]]
]

From here, we can say that we are describing two transitions. Both occur proportionally to the same compartments: S_unvaccinated and the total number of infections (I_unvaccinated+I_vaccinated).

If, for example, we want to model a situation where vaccinated susceptibles cannot be infected by unvaccinated individuals, we would instead write:

[
  [[S,unvaccinated], [S,vaccinated]],
  [[I,unvaccinated],[I, vaccinated]], [[I, vaccinated]]
]

Proportion exponent

Similarly to rate and proportional_to, we provide an exponent for each component and every group across the broadcast. So we could for example use:

[[1,1], [0.9,0.8]]

Summary

Putting it all together, the transition glob

seir:
  transitions:
    source: [[S],[unvaccinated,vaccinated]]
    destination: [[I],[unvaccinated,vaccinated]]
    proportional_to: [
                       [[S,unvaccinated], [S,vaccinated]],
                       [[I,unvaccinated],[I, vaccinated]], [[I, vaccinated]]
                     ]
    rate: [[3], [0.6,0.5]]
    proportion_exponent: [[1,1], [0.9,0.8]]

is equivalent to the following transitions

seir:
  transitions:
    - source: [S,unvaccinated]
      destination: [I,unvaccinated]
      proportional_to: [[[S,unvaccinated]], [[I,unvaccinated],[I, vaccinated]]]
      proportion_exponent: [1 * 0.9]
      rate: [3*0.6]
    - source: [S,vaccinated]
      destination: [I,vaccinated]
      proportional_to: [[[S,vaccinated]], [[I, vaccinated]]]
      proportion_exponent: [1 * 0.8]
      rate: [3*0.5]

Warning

We warn the user that with this shorthand, it is possible to specify large models with few lines of code in the configuration file. The more compartments and transitions you specify, the longer the model will take to run, and the more memory it will require.

Specifying compartmental model parameters (`seir::parameters`)

When the transitions of the compartmental model are specified as described above, they can either be entered as numeric values (e.g., 0.1) or as strings which can be assigned numeric values later (e.g., beta). We recommend the latter method for all but the simplest models, since parameters may recur in multiple transitions and so that parameter values may be edited without risk of editing the model structure itself. It also improves readability of the configuration files.

Parameters can take on three types of values:

Fixed values
Value drawn from distributions
Values read from timeseries specified in a data file

Specifying fixed parameter values

Parameters can be assigned values by using the value argument after their name and then simply stating their numeric argument. For example, in a config describing a simple SIR model with transmission rate $\beta$ (beta) = 0.1/day and recovery rate $\gamma$ (gamma) = 0.2/day. This could be specified as

seir:
  parameters:
    beta: 
      value: 0.1
    gamma: 
      value: 0.2

The full model section of the config could then read

compartments:
  infection_state: ["S", "I", "R"]
  
seir:
  transitions:
    # infection
    - source: [S]
      destination: [I]
      proportional_to: [[S], [I]]
      rate: [beta]
      proportion_exponent: 1
    # recovery
    - source: [I]
      destination: [R]
      proportional_to: [[I]]
      rate: [gamma]
      proportion_exponent: [1,1]
  parameters:
    beta: 
      value: 0.1
    gamma: 
      value: 0.2

For the stratified SI model described above, this portion of the config would read

compartments:
  infection_stage: ["S", "I", "R"]
  vaccination_status: ["unvaccinated", "vaccinated"]
  
seir:
  transitions:
    source: [[S],[unvaccinated,vaccinated]]
    destination: [[I],[unvaccinated,vaccinated]]
    proportional_to: [
                       [[S,unvaccinated], [S,vaccinated]],
                       [[I,unvaccinated],[I, vaccinated]], [[I, vaccinated]]
                     ]
    rate: [[beta], [theta_u,theta_v]]
    proportion_exponent: [[1,1], [alpha_u,alpha_v]]
  parameters:
    beta: 
      value: 0.1
    theta_u: 
      value: 0.6
    theta_v: 
      value: 0.5
    alpha_u: 
      value: 0.9
    alpha_v: 
      value: 0.8

If there are no parameter values that need to be specified (all rates given numeric values when defining model transitions), the seir::parameters section of the config can be left blank or omitted.

Specifying parameters values from distributions

Parameter values can also be specified as random values drawn from a distribution, as a way of including uncertainty in parameters in the model output. In this case, every time the model is run independently, a new random value of the parameter is drawn. For example, to choose the same value of beta = 0.1 each time the model is run but to choose a random values of gamma with mean on a log scale of $e^{-1.6} = 0.2$ and standard deviation on a log scale of $e^{0.2} = 1.2$ (e.g., 1.2-fold variation):

seir:
  parameters:
    beta: 
      value:
        distribution: fixed
        value: 0.1
    gamma: 
      value:
        distribution: lognorm
        logmean: -1.6
        logsd: 0.2

Details on the possible distributions that are currently available, and how to specify their parameters, is provided in the Distributions section.

Note that understanding when a new parameter values from this distribution is drawn becomes more complicated when the model is run in Inference mode. In Inference mode, we distinguish model runs as occurring in different "slots" – i.e., completely independent model instances that could be run on different processing cores in a parallel computing environment – and different "iterations" of the model that occur sequentially when the model is being fit to data and update fitted parameters each time based on the fit quality found in the previous iteration. A new parameter values is only drawn from the above distribution once per slot. Within a slot, at each iteration during an inference run, the parameter is only changed if it is being fit and the inference algorithm decides to perturb it to test a possible improved fit. Otherwise, it would maintain the same value no matter how many times the model was run within a slot.

Specifying parameter values as timeseries from data files

Sometimes, we want to be able to specify model parameters that have different values at different timepoints. For example, the relative transmissibility may vary throughout the year based on the weather conditions, or the rate at which individuals are vaccinated may vary as vaccine programs are rolled out. One way to do this is to instead specify the parameter values as a timeseries.

This can be done by providing a data file in .csv or .parquet format that has a list of values of the parameter for a corresponding timepoint and subpopulation name. One column should be date, which should have an entry for every calendar day of the simulation, with the first and last date corresponding to the start_date and end_date for the simulation specified in the header of the config. There should be another column for each subpopulation, where the column name is the subpop name used in other files and the values are the desired parameter values for that subpopulation for the corresponding day. If any day or subpopulation is missing, an error will occur. However, if you want all subpopulations to have the same parameter value for every day, then only a single column in addition to date is needed, which can have any name, and will be applied to every subpop ;

For example, for an SIR model with a simple two-province population structure where the relative transmissibility peaks on January 1 then decreases linearly to a minimal value on June 1 then increases linearly again, but varies more in the small province than the large province, the theta parameter could be constructed from the file seasonal_transmission_2pop.csv with contents including

date,        small_province,    large_province
2022-01-01,  1.5,               1.3
.....
2022-05-01,  0.5,               0.7 
....
2022-12-31,  1.5,               1.3

as a part of a configuration file with the model sections:

compartments:
  infection_stage: ["S", "I", "R"]

seir:
  transitions:
    # infection
    - source: [S]
      destination: [I]
      proportional_to: [[S], [I]]
      rate: [beta*theta]
      proportion_exponent: 1
    # recovery
    - source: [I]
      destination: [R]
      proportional_to: [[I]]
      rate: [gamma]
      proportion_exponent: 1
  parameters:
    beta: 
      value: 0.1
    gamma: 
      value: 0.2
    theta:
       timeseries: data/seasonal_transmission.csv

Note that there is an alternative way to specify time dependence in parameter values that is described in the Specifying time-varying parameter modifications section. That method allows the user to define intervention parameters that apply specific additive or multiplicative shifts to other parameter values for a defined time interval. Interventions are useful if the parameter doesn't vary frequently and if the values of the shift is unknown and it is desired to either sample over uncertainty in it or try to estimate its value by fitting the model to data. If the parameter varies frequently and its value or relative value over time is known, specifying it as a timeseries is more efficient.

Compartmental model parameters can have an additional attribute beyond value or timeseries, which is called stacked_modifier_method. This value is explained in the section on coding time-dependent parameter modifications (also known as "modifiers") as it determines what happens when two different modifiers act on the same parameter at the same time (are they combined additively or multiplicatively?) ;

Config item

Required?

Type/Format

Description

value

either value or timeseries is required

numerical, or distribution

This defines the value of the parameter, as described above.

timeseries

either value or timeseries is required

path to a csv file

This defines a timeseries for each day, as above.

stacked_modifier_method

optional

string: sum, product, reduction_product

This option defines the method used when modifiers are applied. The default is product.

rolling_mean_windows

optional

integer

The size of the rolling mean window if a rolling mean is applied.

Specifying model simulation method `(seir::integration)`

A compartmental model defined using the notation in the previous sections describes rules for classifying individuals in the population based on infection state dynamically, but does not uniquely specify the mathematical framework that should be used to simulate the model.

Our framework allows for two major methods for implementing compartmental models of disease transmission:

ordinary differential equations, which are completely deterministic, operate in continuous time (consider infinitesimally small timesteps), and allow for arbitrary fractions of the population (i.e., not just discrete individuals) to move between model compartments
discrete-time stochastic process, which tracks discrete individuals and produces random variation in the number of individuals transitioning between states for any given rate, and which allows transitions between states only to occur at discrete time intervals

The mathematics behind each implementation is described in the Model Description section

Config item

Required?

Type/format

Description

method

optional

string: rk4 (default),euler, stochastic

The algorithm used to simulate the model equations. If rk4, model is simulated deterministically by numerical integration using a 4th order Runge-Kutta algorithm. If euler or stochastic, uses a discrete-time process, with steps proceeding either deterministically (at the average rate) or stochastically. For both of these cases, the algorithm ensures no compartment goes below zero for the requested time step. The -(-m)ethod option can be used (see ) to override this configuration option.

dt

optional

positive real number (default: 2)

The timestep used for the numerical integration or discrete time stochastic update; for rk4 method, this is a reasonable value, but for other options, this should be 0.2 or less.

For example, to simulate a model deterministically using the 4th order Runge-Kutta algorithm for numerical integration with a timestep of 1 day:

seir:
  integration:
     method: rk4
     dt: 1.00

Alternatively, to simulate a model stochastically with a timestep of 0.1 days

seir:
  integration:
     method: stochastic
     dt: 0.1

For any method, the results of the model will be more accurate when the timestep is smaller (i.e., output will more precisely match the mathematics of the model description and be invariant to the choice of timestep). However, the computing time required to simulate the model for a certain time range of interest increases with the number of timesteps required (i.e., with smaller timesteps). In our experience, the 4th order Runge-Kutta algorithm (for details see Advanced section) is a very accurate method of numerically integrating such models and can handle timesteps as large as roughly a day for models with the maximum per capita transition rates in this same order of magnitude. However, both of the discrete time engines require smaller timesteps to be accurate (around 0.1 for COVID-19-like dynamics in our experience).

Specifying initial conditions

This section describes how to specify the values of each model state at the time the simulation starts, and how to make instantaneous changes to state values at other times (e.g., due to importations)

Overview

In order for the models specified previously to be dynamically simulated, the user must provide initial conditions, in addition to the model structure and parameter values. Initial conditions describe the value of each variable in the model at the time point that the simulation is to start. For example, on day zero of an outbreak, we may assume that the entire population is susceptible except for one single infected individual. Alternatively, we could assume that some portion of the population already has prior immunity due to vaccination or previous infection. Different initial conditions lead to different model trajectories.

The initial_conditions section of the configuration file is detailed below. Note that in some cases, can replace or complement the initial condition, the table below provides a quick comparison of these sections.

Feature

initial_conditions

seeding

Specifying model initial conditions

The configuration items in the initial_conditions section of the config file are

initial_conditions:method Must be either "Default", "SetInitialConditions", or "FromFile".

initial_conditions:initial_conditions_fileRequired for methods “SetInitialConditions” and “FromFile” . Path to a .csv or .parquet file containing the list of initial conditions for each compartment.

initial_conditions:initial_file_type Only required for method: “FolderDraw”. Description TBA

initial_conditions::allow_missing_subpops Optional for all methods, determines what will happen if initial_conditions_file is missing values for some subpopulations. If FALSE, the default behavior, or unspecified, an error will occur if subpopulations are missing. If TRUE, then for subpopulations missing from the initial_conditions file, it will be assumed that all individuals begin in the first compartment (the “first” compartment depends on how the model was specified, and will be the compartment that contains the first named category in each compartment group), unless another compartment is designated to hold the rest of the individuals ;

initial_conditions::allow_missing_compartments Optional for all methods. If FALSE, the default behavior, or unspecified, an error will occur if any compartments are missing for any subpopulation. If TRUE, then it will be assumed there are zero individuals in compartments missing from the initial_conditions file.

initial_conditions::proportional If TRUE, assume that the user has specified all input initial conditions as fractions of the population, instead of numbers of individuals (the default behavior, or if set to FALSE). Code will check that initial values in all compartments sum to 1.0 and throw an error if not, and then will multiply all values by the total population size for that subpopulation ;

Details on implementing each initial conditions method and the options that go along with it are below.

`initial_conditions::method`

Default

The default initial conditions are that the initial value of all compartments for each subpopulation will be zero, except for the first compartment, whose value will be the population size. The “first” compartment depends on how the model was specified, and will be the compartment that contains the first named category in each compartment group.

For example, a model with the following compartments

with the accompanying geodata file

will be started with 1000 individuals in the S_child_unvaxxed in the "small province" and 10,000 in that compartment in the "large province".

SetInitialConditions

With this method users can specify arbitrary initial conditions in a convenient formatted input .csv or .parquet file.

For example, for a model with the following compartments and initial_conditions sections

with the accompanying geodata file

where initial_conditions.csv contains

the model will be started with half of the population of both subpopulations, consisting of children and the other half of adults, everyone unvaccinated, and 5 infections (in exposed-but-not-yet-infectious class) among the unvaccinated adults in the large province, with the remaining individuals susceptible (4995). All other compartments will contain zero individuals initially ;

initial_conditions::initial_conditions_file must contain the following columns:

subpop – the name of the subpopulation for which the initial condition is being specified. By default, all subpopulations must be listed in this file, unless the allow_missing_subpops option is set to TRUE.
mc_name – the concatenated name of the compartment for which an initial condition is being specified. The order of the compartment groups in the name must be the same as the order in which these groups are defined in the config for the model, e.g., you cannot say unvaccinated_S.
amount – the value of the initial condition; either a numeric value or the string "rest".

For each subpopulation, if there are compartments that are not listed in SetInitialConditions, an error will be thrown unless allow_missing_compartments is set to TRUE, in which case it will be assumed there are zero individuals in them. If the sum of the values of the initial conditions in all compartments in a location does not add up to the total population of that location (specified in the geodata file), an error will be thrown. To allocate all remaining individuals in a subpopulation (the difference between the total population size and those allocated by defined initial conditions) to a single pre-specified compartment, include this compartment in the initial_conditions_file but instead of a number in the amount column, put the word "rest" ;

If allow_missing_subpops is FALSE or unspecified, an error will occur if initial conditions for some subpopulations are missing. If TRUE, then for subpopulations missing from the initial_conditions file, it will be assumed that all individuals begin in the first compartment. (The “first” compartment depends on how the model was specified, and will be the compartment that contains the first named category in each compartment group.)

FromFile

Similar to "SetInitialConditions", with this method users can specify arbitrary initial conditions in a formatted .csv or .parquet input file. However, the format of the input file is different. The required file format is consistent with the from the compartmental model, so the user could take output from one simulation and use it as input into another simulation with the same model structure ;

For example, for an input configuration file containing

with the accompanying geodata file

where initial_conditions_from_previous.csv contains

The simulation would be initiated on 2021-06-01 with these values in each compartment (no children vaccinated, only adults in the small province vaccinated, some past and current infection in both compartments but ).

initial_conditions::initial_conditions_file must contain the following columns:

mc_value_type – in model output files, this is either prevalence or incidence. Prevalence values only are selected to be used as initial conditions, since compartmental models described the prevalence (number of individuals at any given time) in each compartment. Prevalence is taken to be the value measured instantaneously at the start of the day
mc_name – The name of the compartment for which the value is reported, which is a concatenation of the compartment status in each state type, e.g. "S_adult_unvaxxed" and must be in the same order as these groups are defined in the config for the model, e.g., you cannot say unvaxxed_S_adult.
subpop_1, subpop_2, etc. – one column for each different subpopulation, containing the value of the number of individuals in the described compartment in that subpopulation at the given date. Note that these are named after the nodenames defined by the user in the geodata file.
date – The calendar date in the simulation, in YYYY-MM-DD format. Only values with a date that matches to the simulation start_date will be used ;

SetInitialConditionsFolderDraw, FromFileFolderDraw

The way that initial conditions is specified with SetInitialConditions and FromFile results in a single value for each compartment and does not easily allow the user to instead specify a distribution (like is possible for compartmental or outcome model parameters). If a user wants to use different possible initial condition values each time the model is run, the way to do this is to instead specify a folder containing a set of file with initial condition values for each simulation that will be run. The user can do this using files with the format described in initial_conditions::method::SetInitialConditions using instead method::SetInitialConditionsFolder draw. Similarly, to provide a folder of initial condition files with the format described in initial_conditions::method:FromFile using instead method::FromFileFolderDraw ;

Each file in the folder needs to be named according to the same naming conventions as the model output files: run_number.runID.file_type.[csv or parquet] where ....[DESCRIBE] as it is now taking the place of the seeding files the model would normally outpu ;

Only one additional config argument is needed to use a FolderDraw method for initial conditions:

initial_file_type: either seir or seed

When using FolderDraw methods, initial_conditions_file should now be the path to the directory that contains the folder with all the initial conditions files. For example, if you are using output from another model run and so the files are in an seir folder within a model_output folder which is in within your project directory, you would use initial_conditions_file: model_outpu ;

Specifying seeding

Overview

flepiMoP allows users to specify instantaneous changes in values of model variables, at any time during the simulation. We call this "seeding". For example, some individuals in the population may travel or otherwise acquire infection from outside the population throughout the epidemic, and this importation of infection could be specified with the seeding option. As another example, new genetic variants of the pathogen may arise due to mutation and selection that occurs within infected individuals, and this generation of new strains can also be modeled with seeding. Seeding allows individuals to change state at specified times in ways that do not depend on the model equations. In the first example, the individuals would be "seeded" into the infected compartment from the susceptible compartment, and in the second example, individuals would be seeded into the "infected with new variant" compartment from the "infected with wild type" compartment.

The seeding option can also be used as a convenient alternative way to specify . By default, flepiMoP initiates models by putting the entire population size (specified in the geodata file) in the first model compartment. If the desired initial condition is only slightly different than the default state, it may be more convenient to specify it with a few "seedings" that occur on the first day of the simulation. For example, for a simple SIR model where the desired initial condition is just a small number of infected individuals, this could be specified by a single seeding into the infected compartment from the susceptible compartment at time zero, instead of specifying the initial values of three separate compartments. For larger models, the difference becomes more relevant.

Specifying model seeding

The configuration items in the seeding section of the config file are

seeding:method Must be either "NoSeeding", "FromFile", "PoissonDistributed", "NegativeBinomialDistributed", or "FolderDraw".

seeding::seeding_file Only required for method: “FromFile”. Path to a .csv file containing the list of seeding events

seeding::lambda_file Only required for methods "PoissonDistributed" or "NegativeBinomialDistributed". Path to a .csv file containing the list of the events from which the actual seeding will be randomly drawn.

seeding::seeding_file_type Only required for method "FolderDraw". Either seir or seed

Details on implementing each seeding method and the options that go along with it are below.

seeding::method

NoSeeding

If there is no seeding, then the amount of individuals in each compartment will be initiated using the values specified in theinitial_conditions section and will only be changed at later times based on the equations defined in the seir section. No other arguments are needed in the seeding section in this case

Example

FromFile

This seeding method reads in a user-defined file with a list of seeding events (instantaneous transitions of individuals between compartments) including the time of the event and subpopulation where it occurs, and the source and destination compartment of the individuals. For example, for the simple two-subpopulation SIR model where the outbreak starts with 5 individuals in the small province being infected from a source outside the population, the seeding section of the config could be specified as

Where seeding.csv contains

seeding::seeding_file must contain the following columns:

subpop – the name of the subpopulation in which the seeding event takes place. Seeding cannot move individuals between different subpopulations.
date – the date the seeding event occurs, in YYYY-MM-DD format
amount – an integer value for the amount of individuals who transition between states in the seeding event
source_* and destination_* – For each compartment group (i.e., infection stage, vaccination stage, age group), a different column describes the status of individuals before and after the transition described by the seeding event. For example, for a model where individuals are stratified by age and vaccination status, and a 1-day vaccination campaign for young children and the elderly moves a large number of individuals into a vaccinated state, this file could be something like

PoissonDistributed or NegativeBinomialDistributed

These methods are very similar to FromFile, except the seeding value used in the simulation is randomly drawn from the seeding value specified in the file, with an average value equal to the file value. These methods can be useful when the true seeded value is unknown, and only an observed value is available which is assumed to be observed with some uncertainty. The input requirements are the same for both distributions

and the lambda_file has the same format requirements as the seeding_file for the FromFile method described above.

For method::PoissonDistributed, the seeding value for each seeding event is drawn from a Poisson distribution with mean and variance equal to the value in the amount column. Formethod::NegativeBinomialDistributed, seeding is drawn from a negative binomial distribution with mean amount and variance amount+5 (so identical to "PoissonDistributed" for large values of amount but has higher variance for small values).

FolderDraw

TB ;

Specifying observational model

This page describes how to specify the outcomes section of the configuration file

Thinking about `outcomes` variables

Our pipeline allows users to encode state variables describing the infection status of individuals in the population in two different ways. The first way is via the state variables and transitions of the compartmental model of disease transmission, which are specified in the compartments and seir sections of the config. This model should include all variables that influence the natural course of the epidemic (i.e., all variables that feed back into the model by influencing the rate of change of other variables). For example, the number of infected individuals influences the rate at which new infections occur, and the number of immune individuals influences the number of individuals at risk of acquiring infection.

However, these intrinsic model variables may be difficult to observe in the real world and so directly comparing model predictions about the values of these variables to data might not make sense. Instead, the observable outcomes of infection may include only a subset of individuals in any state, and may only be observed with a time delay. Thus, we allow users to define new outcome variables that are functions of the underlying model variables. Commonly used examples include detected cases or hospitalizations ;

Variables should not be included as outcomes if they influence the infection trajectory. The choice of what variables to include in the compartmental disease model vs. the outcomes section may be very model specific. For example, hospitalizations due to infection could be encoded as an outcome variable that is some fraction of infections, but if we believe hospitalized individuals are isolated from the population and don't contribute to onward infection, or that the number of hospitalizations feeds back into the population's perception of risk of infection and influences everyone's contact behavior, this would not be the best choice. Similarly, we could include deaths due to infection as an outcome variable that is also some fraction of infections, but unless death is a very rare outcome of infection and we aren't worried about actually removing deceased individuals from the modeled populations, deaths should be in the compartmental model instead.

The outcomes section is not required in the config. However, there are benefits to including it, even if the only outcome variable is set to be equivalent to one of the infection model variables. If the compartmental model is complicated but you only want to visualize a few output variables, the outcomes output file will be much easier to work with. Outcome variables always occur with some fixed delay from their source infection model variable, which can be more convenient than the exponential distribution underlying the infection model. Outcome variables can be created to automatically sum over multiple compartments of the infection model, removing the need for post-processing code to do this. If the model is being fit to data, then the outcomes section is required, as only outcome variables can be compared to data.

As an example, imagine we are simulating an SIR-style model and want to compare it to real epidemic data in which cases of infection and death from infection are reported. Our model doesn't explicitly include death, but suppose we know that 1% of all infections eventually lead to hospitalization, and that hospitalization occurs on average 1 week after infection. We know that not all infections are reported as cases, and assume that only 50% are detected and are reported 2 days after infection begins. The model and outcomes section of the config for these outcomes, which we call incidC (daily incidence of cases) and incidH (daily incidence of hospital admission) would be

compartments:
  infection_stage: ["S", "I", "R"]
  
seir:
  transitions:
    # infection
    - source: [S]
      destination: [I]
      proportional_to: [[S], [I]]
      rate: [beta]
      proportion_exponent: 1
    # recovery
    - source: [I]
      destination: [R]
      proportional_to: [[I]]
      rate: [gamma]
      proportion_exponent: 1
  parameters:
    beta: 
      value: 0.2
    gamma: 
      value: 0.1

outcomes:
  settings:
    method: delayframe
  outcomes:
    incidC:
      source:
        incidence:
          infection_stage: "I"
      probability: 
        value: 0.5
      delay: 
        value: 2
    incidH:
      source:
        incidence:
          infection_stage: "I"
      probability: 
        value: 0.01
      delay: 
        value: 21

in the following sections we describe in more detail how this specification works

Specifying `outcomes` in the configuration file

The outcomes config section consists of a list of defined outcome variables (observables), which are defined by a user-created name (e.g., "incidH"). For each of these outcome variables, the user defines the source compartment(s) in the infectious disease model that they draw from and whether they draw from the incidence (new individuals entering into that compartment) or prevalence (total current individuals in that compartment). Each new outcome variable is always associated with two mandatory parameters ;

probability of being counted in this outcome variable if in the source compartment
;delay between when an individual enters the source compartment and when they are counted in the outcome variable

and one optional parameter

duration after entering that an individual is counted as part of the outcome variable

The value of the probability, delay, and duration parameters can be a single value or come from distribution ;

Outcome model parameters probability, delay, and distribution can have an additional attribute beyond value called modifier_key. This value is explained in the section on coding time-dependent parameter modifications (also known as "modifiers") as it provides a way to have the same modifier act on multiple different outcomes ;

Just like the case for compartment model parameters, when outcome parameters are drawn from a distribution, each time the model is run, a different value for this parameter will be drawn from this distribution, but that value will be used for all calculations within this model run. Note that understanding when a new parameter values from this distribution is drawn becomes more complicated when the model is run in Inference mode. In Inference mode, we distinguish model runs as occurring in different "slots" – i.e., completely independent model instances that could be run on different processing cores in a parallel computing environment – and different "iterations" of the model that occur sequentially when the model is being fit to data and update fitted parameters each time based on the fit quality found in the previous iteration. A new parameter values is only drawn from the above distribution once per slot. Within a slot, at each iteration during an inference run, the parameter is only changed if it is being fit and the inference algorithm decides to perturb it to test a possible improved fit. Otherwise, it would maintain the same value no matter how many times the model was run within a slot.

Example

// Some code

Config item

Required?

Type/format

Description

source

Yes

Varies

The infection model variable or outcome variable from which the named outcome variable is created

probability

Yes, unless sum option is used instead

value or distribution

The probability that an individual in the source variable appears in the named outcome variable

delay

Yes, unless sum option is used instead

value or distribution

The time delay between individual's appearance in source variable and appearance in named outcome variable

duration

value or distribution

The duration of time an individual remains counted within the named outcome variablet

sum

List

A list of other outcome variables to sum into the current outcome variable

`source ;

Required, unless sum option is used instead. This sub-section describes the compartment(s) in the infectious disease model from which this outcome variable is drawn. Outcome variables can be drawn from the incidence of a variable - meaning that some fraction of new individuals entering the infection model state each day are chosen to contribute to the outcome variable - or from the prevalence, meaning that each day some fraction of individuals currently in the infection state are chosen to contribute to the outcome variable. Note that whatever the source type, the named outcome variable itself is always a measure of incidence ;

To specify which compartment(s) contribute the user must specify the state(s) within each model stratification. For stratifications not mentioned, the outcome will sum over that states in all strata ;

For example, consider a configuration in which the compartmental model was constructed to track infection status stratified by vaccination status and age group. The following code would be used to create an outcome called incidH_child (incidence of hospitalization for children) and incidH_adult (incidence of hospitalization for adults) where some fraction of infected individuals would become hospitalized and we wanted to track separately track pediatric vs adult hospitalizations, but did not care about tracking the vaccination status of hospitalized individuals as in reality it was not tracked by the hospitals ;

 compartments:
   infection_state: ["S", "I", "R"]
   age_group: ["child", "adult"]
   vaccination_status: ["unvaxxed", "vaxxed"]
   
outcomes:
  incidH_child:
    source:
      incidence:
        infection_state: "I"
        age_group: "child"
    ...
  incidH_adult:
    source:
      incidence:
        infection_state: "I"
        age_group: "adult"
    ...
  incidH_all:
    source:
      incidence:
        infection_state: "I"
    ...

to instead create an outcome variable for cases where on each day of infection there is some probability of testing positive (for example, for the situation of an asymptomatic infection where testing is administered totally randomly), the following code would be used

 compartments:
   infection_state: ["S", "I", "R"]
   age_group: ["child", "adult"]
   vaccination_status: ["unvaxxed", "vaxxed"]
   
outcomes:
  incidC:
    source:
      prevalence:
        infection_state: "I"
    ...

The source of an outcome variable can also be a previous defined outcome variable. For example, t to create a new variable for the number of individuals recruited to be part of a contact tracing program (incidT), which is just some fraction of diagnosed cases ;

outcomes:
  incidC:
    source:
      prevalence:
        infection_state: "I"
    ...
  incidT:
    source: incidC
    ...

`probability ;

Required, unless sum option is used instead. Probability is the fraction of individuals in the source compartment who are counted as part of this outcome variable (if the source is incidence; if the source is prevalence it is the fraction of individuals per day). It must be between 0 and 1 ;

Specifying the probability creates a parameter called outcome_name::probability that can be referred to in the outcome_modifiers section of the config. The value of this parameter can be changed using the probability::intervention_param_name option ;

For example, to track the incidence of hospitalization when 5% of children but only 1% of adults infected require hospitalization, and to create a modifier_key such that both of these rates could be modified by the same amount during some time period using the outcomes_modifier section:

outcomes:
  incidH_child:
    source:
      incidence:
        infection_state: "I"
        age_group: "child"
    probability: 
      value: 0.05
      modifier_key: hosp_rate
  incidH_adult:
    source:
      incidence:
        infection_state: "I"
        age_group: "adult"
    probability: 
      value: 0.01
      modifier_key: hosp_rate

To track the incidence of diagnosed cases iterating over uncertainty in the case detection rate (ranging 20% to 30%), and naming this parameter "case_detect_rate"

outcomes:
  incidC:
    source:
      prevalence:
        infection_state: "I"
    probability:
      value:
        distribution: uniform
        low: 
          value: 0.2
        high: 
          value: 0.3
      intervention_param_name: "case_detect_rate"

Each time the model is run a new random value for the probability of case detection will be chosen ;

Delay

Required, unless sum option is used instead. delay is the time delay between when individuals are chosen from the source compartment and when they are counted as part of this outcome variable ;

For example, to track the incidence of hospitalization when 5% of children are hospitalized and hospitalization occurs 7 days after infection:

outcomes:
  incidH_child:
    source:
      incidence:
        infection_state: "I"
        age_group: "child"
    probability: 
      value: 0.05
    delay: 
      value: 7

To iterate over uncertainty in the exact delay time, we could include some variation between simulations in the delay time using a normal distribution with standard deviation of 2 (truncating to make sure the delay does not become negative). Note that a delay distribution here does not mean that the delay time varies between individuals - it is identical) ;

outcomes:
  incidH_child:
    source:
      incidence:
        infection_state: "I"
        age_group: "child"
    probability: 
      value: 0.05
    delay: 
      value: 
        distribution: truncnorm
        mean: 7
        sd: 2
        a: 0
        b: Inf

Duration

By default, all outcome variables describe incidence (new individuals entering each day). However, they can also track an associated "prevalence" if the user specifies how long individuals will stay classified as the outcome state the outcome variable describes. This is the duration parameter ;

When the duration parameter is set, a new outcome variable is automatically created and named with the name of the original outcome variable + "_curr". This name can be changed using the duration::name option ;

For example, to track the incidence and prevalence of hospitalization when 5% of children are hospitalized, hospitalization occurs 7 days after infection, and the duration of hospitalization is 3 days:

outcomes:
  incidH_child:
    source:
      incidence:
        infection_state: "I"
        age_group: "child"
    probability: 
      value: 0.05
    delay: 
      value: 7
    duration: 
      value: 3

which creates the variable "incidH_child_curr" to track all currently hospitalized children. Since it doesn't make sense to call this new outcome variable an incidence, as it is a prevalence, we could instead rename it:

outcomes:
  incidH_child:
    source:
      incidence:
        infection_state: "I"
        age_group: "child"
    probability: 
      value: 0.05
    delay: 
      value: 7
    duration: 
      value: 3
      name: "hosp_child_curr"

Sum

Optional. sum is used to create new outcome variables that are sums over other previously defined outcome variables ;

If sum is included, source, probability, delay, and duration will be ignored ;

For example, to track new hospital admissions and current hospitalizations separately for children and adults, as well as for all ages combined

outcomes:
  incidH_child:
    source:
      incidence:
        infection_state: "I"
        age_group: "child"
    probability: 0.05
    delay: 6
    duration: 
      value: 14
      name: "hosp_child_curr"
  incidH_adult:
    source:
      incidence:
        infection_state: "I"
        age_group: "adult"
    probability: 0.01
    delay: 8
    duration:
      value: 7
      name: "hosp_adult_curr"
  incidH_total: 
    sum: ["incidH_child","incidH_adult"]
  hosp_curr_total:   
    sum: ["hosp_child_curr","hosp_adult_curr"]

outcomes::settings

There are other required and optional configuration items for the outcomes section which can be specified under outcomes::settings:

method: delayframe.This is the mathematical method used to create the outcomes variable values from the transmission model variables. Currently, the only model supported is delayframe, which .. ;

param_from_file: Optional, TRUE or FALSE. It is possible to allow any of the outcomes variables to have values that vary across the subpopulations. For example, disease severity rates or diagnosis rates may differ by demographic group. In this case, all the outcome parameter values defined in outcomes::outcomes will represent baseline values, and then you can define a relative change from this baseline for any particular subpopulation using the paths section. If params_from_file: TRUE is specified, then these relative values will be read from the params_subpop_file. Otherwise, if params_from_file: FALSE or is not listed at all, all subpopulations will have the same values for the outcome parameters, defined below ;

param_subpop_file: Required if params_from_file: TRUE. The path to a .csv or .parquet file that contains the relative amount by which a given outcome variable is shifted relative to baseline in each subpopulation. File must contain the following columns:

subpop: The subpopulation for which the parameter change applies. Must be a subpopulation defined in the geodata file. For example, small_province
parameter: The outcomes parameter which will be altered for this subpopulation. For example, incidH_child: probability
value: The amount by which the baseline value will be multiplied, for example, 0.75 or 1.1

Examples

Consider a disease described by an SIR model in a population that is divided into two age groups, adults and children, which experience the disease separately. We are interested in comparing the predictions of the model to real world data, but we know we cannot observe every infected individual. Instead, we have two types of outcomes that are observed.

First, via syndromic surveillance, we have a database that records how many individuals in the population are experiencing symptoms from the disease at any given time. Suppose careful cohort studies have shown that 50% of infected adults and 80% of infected children will develop symptoms, and that symptoms occur in both age groups around 3 days after infection (following a log-normal distribution with log mean X and log standard deviation of Y). The duration that symptoms persist is also a variable, following a ...

Secondly, via laboratory surveillance we have a database of every positive test result for the infection. We assume the test is 100% sensitive and specific. Only individuals with symptoms are tested, and they are always tested exactly 1 day after their symptom onset. We are unsure what portion of symptomatic individuals are seeking out testing, but are interested in considering two extreme scenarios: 95% of symptomatic individuals are tested, or only 75% of individuals are tested.

The configuration file we could use to model this situation includes

// Some code

Distributions

This page describes the configuration schema for specifying distributions

Distribution

Parameters

Type/Format

Description

fixed

value

Any real number

Draws all values exactly equal to value

uniform

low

Any real number

Draws all values randomly from a uniform distribution with range [low, high]

high

Any real number greater than low

poisson

lam

Any positive real number

Draws all values randomly from a Poisson distribution with rate parameter (mean) lam (lambda)

binomial

size

Any non-negative integer

Draws all values randomly from a binomial distribution with number of trials (n) = size and probability of success on each trial (p) = prob

prob

Any number in [0,1]

lognormal

meanlog

Any real number

Draws all values randomly from a lognormal distribution (natural log, base e) with mean on a log scale of meanlog and standard deviation on a log scale of sdlog

sdlog

Any non-negative real number

truncnorm

mean

Any real number

Draws all values randomly from a truncated normal distribution with mean mean and standard deviation sd, truncated to have a maximum value of a and a minimum value of b

sd

Any non-negative real number

a

Any real number, or -Inf

b

Any real number greater than a, or Inf

Specifying time-varying parameter modifications

This section describes how to specify modifications to any of the parameters of the transmission model or observational model during certain time periods.

Modifiers are a powerful feature in flepiMoP to enable users to modify any of the parameters being specified in the model during particular time periods. They can be used, for example, to mirror public health control interventions, like non-pharmaceutical interventions (NPIs) or increased access to diagnosis or care, or annual seasonal variations in disease parameters. Modifiers can act on any of the transmission model parameters or observation model parameters ;

In the seir_modifiers and outcome_modifiers sections of the configuration file the user can specify several possible types of modifiers which will then be implemented in the model. Each modifier changes a parameter during one or multiple time periods and for one or multiple specified subpopulations.

We currently support the following intervention types. Each of these is described in detail below:

"SinglePeriodModifier" – Modifies a parameter during a single time period
"MultiPeriodModifier" – Modifies a parameter by the same amount during a multiple time periods
"ModifierModifier" – Modifies another intervention during a single time period
"StackedModifier" – Combines two or more interventions additively or multiplicatively, and is used to be able to turn on and off groups of interventions easily for different runs ;

Note that if you want a parameter to vary continuously over time (for example, a daily transmission rate that is influenced by temperature and humidity), then it is easier to do this by using a "timeseries" parameter value than by combining many separate modifiers. Timeseries parameter values are described in the section. Timeseries parameters for parameters (e.g., a testing rate that fluctuates rapidly due to test availability) are in development but not currently available ;

Within flepiMoP, modifiers can be run as "scenarios". With scenarios, we can use the same configuration file to run multiple versions of the model where only the modifiers applied differ.

The modifiers section contains two sub-sections: modifiers::scenarios, which lists the name of the modifiers that will run in each separate scenario, and modifiers::modifiers, where the details of each modifier are specified (e.g., the parameter it acts on, the time it is active, and the subpopulation it is applied to). An example is outlined below

In this example, each scenario runs a single intervention, but more complicated examples are possible. ;

The major benefit of specifying both "scenarios" and "modifiers" is that the user can use "StackedModifier" option to combine other modifiers in different ways, and then run either the individual or combined modifiers as scenarios. This way, each scenario may consist of one or more individual parameter modifications, and each modification may be part of multiple scenarios. This provides a shorthand to quickly consider multiple different versions of a model that have different combinations of parameter modifications occurring. For example, during an outbreak we could evaluate the impact of school closures, case isolation, and masking, or any one or two of these three measures. An example of a configuration file combining modifiers to create new scenarios is given below

The seir_modifiers::scenarios andoutcome_modifiers::scenarios sections are optional. If the scenariossection is not included, the model will run with all of the modifiers turned "on" ;

If thescenariossection is included for either seir or outcomes, then each time a configuration file is run, the user much specify which modifier scenarios will be run. If not specified, the model will be run one time for each combination of seir and outcome scenario ;

Example

[Give a configuration file that tries to use all the possible option available. Based on simple SIR model with parameters beta and gamma in 2 subpopulations. Maybe a SinglePeriodModifier on beta for a lockdown and gamma for isolation, one having a fixed value and one from a distribution, MultiPeriodModifier for school year in different places, ModifierModifer for ..., StackedModifier for .... ]

`modifiers::scenarios`

A optional list consisting of a subset of the modifiers that are described in modifiers::settings, each of which will be run as a separate scenario. For example

`modifiers::settings`

A formatted list consisting of the description of each modifier, including its name, the parameter it acts on, the duration and amount of the change to that parameter, and the subset of subpopulations in which the parameter modification takes place. The list items are summarized in the table below and detailed in the sections below.

Config item

Required

Type/format

Description

SinglePeriodModifier

SinglePeriodModifier interventions enable the user to specify a multiplicative reduction to a parameter of interest. It take a parameter, and reduces it's value by value (new = (1-value) * old) for the subpopulations listed insubpop during the time interval [period_start_date, period_end_date]

For example, if you would like to create an SEIR modifier called lockdown that reduces transmission by 70% in the state of California and the District of Columbia between two dates, you could specify this with a SinglePeriodModifier, as in the example below

Example

Or to create an outcome variable modifier called enhanced_testing during which the case detection rate double ;

Configuration options

method: SinglePeriodModifier

parameter: The name of the parameter that will be modified. This could be a parameter defined for the transmission model in or for the observational model in . If the parameter is used in multiple transitions in the model then all those transitions will be modified by this amount ;

period_start_date: The date when the modification starts, in YYYY-MM-DD format. The modification will only reduce the value of the parameter after (inclusive of) this date.

period_end_date: The date when the modification ends, in YYYY-MM-DD format. The modification will only reduce the value of the parameter before (inclusive of) this date.

subpop:A list of subpopulation names/ids in which the specified modification will be applied. This can be a single subpop, a list, or the word "all" (specifying the modification applies to all existing subpopulations in the model). The modification will do nothing for any subpopulations not listed here.

value:The fractional reduction of the parameter during the time period the modification is active. This can be a scalar number, or a distribution using the notation described in the section. The new parameter value will be

subpop_groups: An optional list of lists specifying which subsets of subpopulations in subpop should share parameter values; when parameters are drawn from a distribution or fit to data. See section below for more details ;

MultiPeriodModifier

MultiPeriodModifier interventions enable the user to specify a multiplicative reduction to the parameter of interest by value (new = (1-value) * old) for the subpopulations listed in subpop during multiple different time intervals each defined by a start_date and end_date.

For example, if you would like to describe the impact that transmission in schools has on overall disease spread, you could create a modifier that increases transmission by 30% during the dates that K-12 schools are in session in different regions (e.g., Massachusetts and Florida):

Example

Configuration options

method: MultiPeriodModifier

groups: A list of subpopulations (subpops) or groups of them, and time periods the modification will be active in each of them

groups:subpop A list of subpopulation names/ids in which the specified modification will be applied. This can be a single subpop, a list, or the word "all" (specifying the modification applies to all existing subpopulations in the model). The modification will do nothing for any subpopulations not listed here.
groups: periods A list of time periods, each defined by a start and end date, when the modification will be applied
- groups:periods:start_date The date when the modification starts, in YYYY-MM-DD format. The modification will only reduce the value of the parameter after (inclusive of) this date.
- groups:periods:end_date The date when the modification ends, in YYYY-MM-DD format. The modification will only reduce the value of the parameter before (inclusive of) this date.

ModifierModifier

ModifierModifier interventions allow the user to specify an intervention that acts to modify the value of another intervention, as opposed to modifying a baseline parameter value. The intervention multiplicatively reduces the modifier of interest by value (new = (1-value) * old) for the subpopulations listed in subpop during the time interval [period_start_date, period_end_date].

Example

For example, ModifierModifier could be used to describe a social distancing policy that is in effect between two dates and reduces transmission by 60% if followed by the whole population, but part way through this period, adherence to the policy drops to only 50% of in one of the subpopulations population:

Note that this configuration is identical to the following alternative specification

However, there are situations when the ModiferModifier notation is more convenient, especially when doing parameter fitting. ;

Configuration options

method: ModifierModifier

baseline_modifier: The name of the original parameter modification which will be further modified.

parameter: The name of the parameter in the baseline_scenario that will be modified ;

period_start_date: The date when the intervention modifier starts, in YYYY-MM-DD format. The intervention modifier will only reduce the value of the other intervention after (inclusive of) this date.

period_end_date: The date when the intervention modifier ends, in YYYY-MM-DD format. The intervention modifier will only reduce the value of the other intervention before (inclusive of) this date.

subpop:A list of subpopulation names/ids in which the specified intervention modifier will be applied. This can be a single subpop, a list, or the word "all" (specifying the interventions applies to all existing subpopulations in the model). The intervention will do nothing for any subpopulations not listed here.

value:The fractional reduction of the baseline intervention during the time period the modifier intervention is active. This can be a scalar number, or a distribution using the notation described in the section. The new parameter value will be

and so the value of the underlying parameter that was modified by the baseline intervention will be

StackedModifier

Combine two or more modifiers into a scenario, so that they can easily be singled out to be run together without the other modifiers. If multiply modifiers act during the same time period in the same subpopulation, their effects are combined multiplicatively. Modifiers of different types (i.e. SinglePeriodModifier, MultiPeriodModifier, ModifierModifier, other StackedModifiers) can be combined ;

Examples

Configuration options

method: StackedModifier

modifiers: A list of names of the other modifiers (specified above) that will be combined to create the new modifier (which we typically refer to as a "scenario")

modifiers::modifiers::groups

subpop_groups: For any of the modifier types, subpop_groups is an optional list of lists specifying which subsets of subpopulations in subpop should share parameter values; when parameters are drawn from a distribution or fit to data. All other subpopulations not listed will have unique intervention values unlinked to other areas. If the value is 'all', then all subpopulations will be assumed to have the same modifier value. When the subpop_groups option is not specified, all subpopulations will be assumed to have unique values of the modifier ;

For example, for a model of disease spread in Canada where we want to specify that the (to be varied) value of a modification to the transmission rate should be the same in all the Atlantic provinces (Nova Scotia, Newfoundland, Prince Edward Island, and New Brunswick), the same in all the prairie provinces (Manitoba, Saskatchewan, Alberta), the same in the three territories (Nunavut, Northwest Territories, and Yukon), and yet take unique values in Ontario, Quebec, and British Columbia, we could write

Other configuration options

Command line inputs

flepiMoP allows some input parameters/options to be specified in the command line at the time of model submission, in addition to or instead of in the configuration file. This can be helpful for users who want to quickly run different versions of the model – typically a different number of simulations or a different intervention scenario from among all those specified in the config – without having to edit or create a new configuration file every time. In addition, some arguments can only be specified via the command line.

In addition to the configuration file and the command line, the inputs described below can also be specified as environmental variables.

In all cases, command line arguments override configuration file entries which override environmental variables. The order of command line arguments does not matter.

Details on how to run the model, including how to add command line arguments or environmental variables, are in the section .

Command-line only inputs

Argument

Env. Variable

Value type

Description

Required?

Default

Command-line versions of configuration file inputs

Argument

Config item

Env. Variable

Value type

Description

Required?

Default

Example

As an example, consider running the following configuration file

To run this model directly in Python (it can alternatively be run from R, for all details see section ), we could use the command line entry

Alternatively, to run 100 simulations using only 4 of the available processors on our computer, but only running the "" scenario with a deterministic model, and to save the files as .csv (since the model is relatively simple), we could call the model using the command line entry

Environmental variables

TBA

US-specific configuration file options

Things below here are very out of date. Put here as place holder but not updated recently.

global: smh_round, setup_name, disease

spatial_setup: census_year, modeled_states, state_level

For US-specific population structures

For creating US-based population structures using the helper script build_US_setup.R which is run before the main model simulation script, the following extra parameters can be specified

Config Item

Required?

Type/Format

Description

Example 2

To simulate an epidemic across all 50 states of the US or a subset of them, users can take advantage of built in machinery to create geodata and mobility files for the US based on the population size and number of daily commuting trips reported in the US Census.

Before running the simulation, the script build_US_setup.R can be run to get the required population data files from online census data and filter out only states/territories of interest for the model. More details are provided in the How to Run section.

This example simulates COVID-19 in the New England states, assuming no transmission from other states, using 2019 census data for the population sizes and a pre-created file for estimated interstate commutes during the 2011-2015 period.

geodata.csv contains

mobility_2011-2015_statelevel.csv contains

`importation` section (optional)

This section is optional. It is used by the to import global air importation data for seeding infections into the United States.

If you wish to include it, here are the options.

Config Item

Required?

Type/Format

Description

`importation::param_list`

Config Item

Required?

Type/Format

Description

`report` section

The report section is completely optional and provides settings for making an R Markdown report. For an example of a report, see the Supplementary Material of our

If you wish to include it, here are the options.

Config Item

Required?

Type/Format

Description

Code structure

Model Output

(This section describes the location and contents of each of the output files produced during a non-inference model run)

The model will output 2–6 different types of files depending on whether the configuration file contains optional sections (such , , and outcomes interventions) and whether is conducted ;

These files contain the values of the variables for both the infection and (if included) observational model at each point in time and for each subpopulation. A new file of the same type is produced for each independent simulation and each intervention scenario. Other files report the values of the initial conditions, seeding, and model parameters for each subpopulation and independent simulation (since parameters may be chosen to vary randomly between simulations). When is run, there are also file types reporting the model likelihood (relative to the provided data) and files for each iteration of the inference algorithm.

Within the model_output directory in the project's directory, the files will be organized into folders named for the file types: seir, spar, snpi, hpar, hnpi, seed, init, or llik (see descriptions below). Within each file type folder, files will further be organized by the simulation name (setup_name in config), the modifier scenario names - if scenarios exist for either seir or outcome parameters (specified with seir_modifiers::scenarios and outcome_modifiers::scenarios in config), and the run_id (the date and time of the simulation, by default). For example:

The name of each individual file contains (in order) the slot, run_id and file type. The first index indicates the slot (chain, in MCMC language). If multiple iterations or blocks are run, the filename will look like 000000001.000000001.000000001.run_id.seir.parquet indicating slot.block.iteration ;

Each file is a data table that is by default saved as a (a compressed representation that can be opened and manipulated with minimal memory) but can alternatively be saved as a csv file. See options for specifying output type in

The example files outputs we show were generated with the following configuration file

The types and contents of the model output files changes slightly depending on whether the model is run as a forward simulation only, or is run in inference mode, in which parameter values are estimated by comparing the model to data. Output specific to model inference is described in a ;

SEIR (infection model output)

Files in the seir folder contain the output of the infection model over time. They contain the value of every variable for each day of the simulation for every subpopulation.

For the example configuration file shown above, the seir file is

The meanings of the columns are:

mc_value_type – either prevalence or incidence. Variable values are reported both as a prevalence (number of individuals in that state measured instantaneously at the start of the day, equivalent to the meaning of the S, I, or R variable in the differential equations or their stochastic representation) and as incidence (total number of individuals who newly entered this state, from all other states, over the course of the 24-hour period comprising that calendar day).

mc_infection_stage, mc_vaccination_status, etc. – The name of the compartment for which the value is reported, broken down into the infection stage for each state type (eg. vaccination, age).

mc_name – The name of the compartment for which the value is reported, which is a concatenation of the compartment status in each state type.

subpop_1, subpop_2, etc. – one column for each different subpopulation, containing the value of the number of individuals in the described compartment in that subpopulation at the given date. Note that these are named after the nodenames defined by the user in the geodata file.

date – The calendar date in the simulation, in YYYY-MM-DD format.

There will be a separate seir file output for each slot (independent simulation) and for each iteration of the simulation if is conducted.

SPAR (infection model parameter values)

The files in the spar folder contain the parameters that define the transitions in the compartmental model of disease transmission, defined in the seir::parameters section of the config ;

The value column gives the numerical values of the parameters defined in the corresponding column parameter.

SNPI (infection model parameter intervention values)

Files in the snpi folder contain the time-dependent modifications to the transmission model parameter values (defined in seir_modifiers section of the config) for each subpopulation. They contain the modifiers that apply to a given subpopulation and the dates within which they apply, and the value of the reduction to the given parameter.

The meanings of the columns are:

subpop – The subpopulation to which this intervention parameter applies.

modifier_name – The name of the intervention parameter.

start_date – The start date of this intervention, as defined in the configuration file.

end_date – The end date of this intervention, as defined in the configuration file.

parameter – The parameter to which the intervention applies, as defined in the configuration file.

value – The size of the modifier to the parameter either from the config, or fit by inference if that is run.

HPAR (observation model parameter values)

Files in the hpar folder contain the output parameters of the observational model. They contain the values of the probabilities, delays or durations for each outcome in a given subpopulation.

The meanings of the columns are:

subpop – Values in this column are the names of the nodes as defined in the geodata file given by the user.

quantity – The values in this column are the types of parameter values described in the config. The options are probability, delay, and duration. These are the quantities to which there is some parameter defined in the config.

outcome – The values here are the outcomes to which this parameter applies. These are names of the outcome compartments defined in the model.

value – The values in this column are the parameter values of the quantity that apply to the given subpopulation and outcome.

HOSP (observation model output)

Files in the hosp folder contain the output of the infection model over time. They contain the value of every outcome variable for each day of the simulation for every subpopulation.

Columns are:

date – The calendar date in the simulation, in YYYY-MM-DD format.

subpop – Values in this column are the names of the nodes as defined in the geodata file given by the user.

outcome_variable_1, outcome_variable_2, ... - one column for each different outcome variable as defined in the config, containing the value of the number of individuals in the described compartment in that subpopulation at the given date ;

HNPI (observation model parameter intervention values)

Files in the hnpi folder contain any parameter modifier values that apply to the outcomes model, defined in the outcome_modifiers section of the config. They contain the values of the outcome parameter modifiers, and the dates to which they apply in a given subpopulation.

The meanings of the columns are:

subpop – The values of this column are the names of the nodes from the geodata file.

modifier_name – The names/labels of the modifier parameters, defined by the user in the config file, which applies to the given node and time period.

start_date – The start date of this intervention, as defined in the configuration file.

end_date – The end date of this intervention, as defined in the configuration file.

parameter – The outcome parameter to which the intervention applies.

value – The values in this column are the modifier values of the intervention parameters, which apply to the given parameter in a given subpopulation. Note that these are strictly reductions; thus a negative value corresponds to an increase in the parameter, while a positive value corresponds to a decrease in the parameter.

SEED (model seeding values)

Files in the seed folder contain the seeded values of the infection model. They contain the amounts seeded into each variable, the variable they are seeded from, and the time at which the seeding occurs. The user can provide a single seeding file (which will be used across all simulations), or, if multiple simulations are being run the user can provide a separate file for each simulation ;

The meanings of the columns are:

subpop - The values of this column are the names of the nodes from the geodata file.

date - The values in this column are the dates of seeding.

amount - The amount seeded in the given subpopulation from source variables to destination variables, at the given date ;

source_infection_stage, source_vaccination_status, etc. - The name of the compartment from which the amount is seeded, broken down into the infection stage for each state type (eg. vaccination, age).

destination_infection_stage, destination_vaccination_status, etc. - The name of the compartment into which the amount is seeded, broken down into the infection stage for each state type (eg. vaccination, age).

no_perturb - The values in this column can be either true or false. If true, then the amount and/or date can be perturbed if running an inference run. Whether the amount or date is perturbed is defined in the config using perturb_amount and perturb_date ;

INIT (model initial conditions)

Files in the init folder contain the initial values of the infection model. Either seed or init files will be present, depending on the configuration of the model . These files contain the initial conditions of the infection model at the start date defined in the configuration file. As with seeding, the user can provide a single initial conditions file (which will be used across all simulations), or, if multiple simulations are being run the user can provide a separate file for each simulation ;

The meanings of the columns are:

subpop - The values of this column are the names of the nodes from the geodata file.

mc_infection_stage, mc_vaccination_status, etc. - The name of the compartment for which the value is reported, broken down into the infection stage for each state type (eg. vaccination, age).

amount - The amount initialized seeded in the given subpopulation at the start date defined in the configuration file ;

Model Inference

Inference Description

Methods for fitting model to data

flepiMoP can be used to conduct forward simulations of a model with user-defined parameter values, or, it can be used to iteratively run a model with some unknown parameters, compare the model output to ground truth data, and find parameter values that optimize the fit of the model to data (i.e., conduct model "inference"). We have developed a custom model inference method that is based on standard Markov Chain Monte Carlo (MCMC)-based approaches to Bayesian inference for dynamic models, but is adapted to deal with some of the particular challenges of large-scale epidemic models, including i) long times and high computational resources required to simulate single model runs, ii) multiple subpopulations with location-specific parameters but inter-location transmission, iii) a high-dimensional parameter space, iv) the need to produce real-time epidemic projections, and v) the availability of parallel computing resources.

Notation

– A set of unknown model parameters to be estimated by fitting the model output to data. For a model with subpopulations each with their own parameters, this set includes all location-specific parameters .
– The timeseries output of one or more of the state variables of the model under parameters For simplicity, we will often just use the notation . The value at a timepoint is . For a model with subpopulations for which there are different state variables, this becomes . (Note that for the general case when the dynamics in one location can effect the dynamics in another, the model state in one location depends on the full set of parameters, not just the location-specific parameters.)
– The timeseries for the observed data (also referred to as "ground truth") that the model attempts to recreate. For a model with subpopulations each with their own observed data for variable , this becomes .
– The likelihood of the observed data being produced by the model for an input parameter set . This is a probability density function over all possible values of the data being produced by the model, conditional on a fixed model parameter value ;
– The prior probability distribution, which in Bayesian inference encodes beliefs about the possible values of the unknown parameter before any observed data is formally compared to the model.
– The posterior probability distribution, which in Bayesian inference describes the updated probability of the parameters conditional on the observed data .
– The proposal density, used in Metropolis-Hastings algorithms for Markov Chain Monte Carlo (MCMC) techniques for sampling the posterior distribution, describes the probability of proposing a new parameter set from a current accepted parameter set .

Background

This section can be skipped by those familiar with Markov Chain Monte Carlo approaches to Bayesian inference.

Bayesian inference

Our model fitting framework is based on the principles of Bayesian inference. Instead of estimating a single "best-fit" value of the unknown model parameters, our goal is to evaluate the consistency of every possible parameter value with the observed data, or in other words, to construct a distribution that describes the probability that a parameter has a certain value given the observations. This output is referred to as the posterior probability. This framework assumes that the model structure accurately describes the underlying generative process which created the data, but that the underlying parameters are unknown and that there can be some error in the observation of the data.

Bayes' Rule states that the posterior probability of a set of model parameters given observed data can be expressed as a function of the likelihood of observing the data under the model with those parameters () and the prior probability ascribed to those parameters before any data was observed ()

where the denominator is a constant factor – independent of – that only serves to normalize the posterior and thus can be ignored ;

The likelihood function can be defined for a model/data combination based on an understanding of both a) the distribution of model outcomes for a given set of input parameters (if output is stochastic), and b) the nature of the measurement error in observing the data (if relevant) ;

For complex models with many parameters like those used to simulate epidemic spread, it is generally impossible to construct the full posterior distribution either analytically or numerically. Instead, we rely on a class of methods called "Markov Chain Monte Carlo" (MCMC) that allows us to draw a random sample of parameters from the posterior distribution. Ideally, the statistics of the parameters drawn from this sample should be an unbiased estimate of those from the complete posterior.

Markov Chain Monte Carlo methods

In many Bayesian inference problems that arise in scientific model fitting, it is impossible to directly evaluate the full posterior distribution, since there are many parameters to be inferred (high dimensionality) and it is computationally costly to evaluate the model at any individual parameter set. Instead, it is common to employ Markov Chain Monte Carlo (MCMC) methods, which provide a way to iteratively construct a sequence of values that when taken together represent a sample from a desired probability distribution. In the limit of infinitely long sequences ("chains") of values, these methods are mathematically proven to converge to an unbiased sample from the distribution. There are many different MCMC algorithms, but each of them relies on some type of rule for generating a new "sampled" parameter set from an existing one. Our parameter inference method is based on the popular Metropolis-Hastings algorithm. Briefly, at every step of this iterative algorithm, a new set of parameters is jointly proposed, the model is evaluated at that proposed set, the value of the posterior (e.g., likelihood and prior) is evaluated at the proposed set, and if the posterior is improved compared to the previous step, the proposed parameters are "accepted" and become the next entry in the sequence, whereas if the value of the posterior is decreased, the proposed parameters are only accepted with some probability and otherwise rejected (in which case the next entry in the sequences becomes a repeat of the previous parameter set).

The full algorithm for Metropolis-Hastings Markov Chain Monte Carlo is:

Generate initial set of parameters
Evaluate the likelihood () and prior () at this parameter set
For where is the length of the MCMC chain, add to the sequence of parameter values :
- Generate a proposed set of parameters based on an arbitrary proposal distribution
- Evaluate the likelihood and prior at the proposed parameter set
- Generate a uniform random number
- Calculate the acceptance ratio
- If , ACCEPT the proposed parameters to the parameter chain. Set ;
- Else, REJECT the proposed parameters for the chimeric parameter chain. Set

Inference algorithm

Likelihood

In our algorithm, model fitting involves comparing timeseries of variables produced by the model (either transmission model state variables or observable outcomes constructed from those variables) to timeseries of observed "ground truth" data with the same time points. For timeseries data that arises from a deterministic, dynamic model, then the overall likelihood can be calculated as the product of the likelihood of the model output at each timepoint (since we assume the data at each timepoint was measured independently). If there are multiple observed datastreams corresponding to multiple model outputs (e.g., cases and deaths) ;

For each subpopulation in the model, the likelihood of observing the "ground truth" data given the model parameters is

where describes the process by which the data is assumed to be observed/measured from the underlying true vales. For example, observations may be assumed to be normally distributed around the truth with a known variance, or, count data may be assumed to be generated by a Poisson process ;

And the overall likelihood taking into account all subpopulations, is the product of the individual likelihoods

Note that the likelihood for each subpopulation depends not only on the parameter values that act within that subpopulation, but on the entire parameter set , since in general the infection dynamics in one subpopulation are also affected by those in each other region. Also note that we assume that the parameters only impact the likelihood through the single model output timeseries . While this is exactly true for a deterministic model, we make the simplifying assumption that it is also true for stochastic models, instead of attempting to calculate the full distribution of possible trajectories for a given parameter set and include that in the likelihood as well.

Fitting algorithm

The method we use for estimating model parameters is based on the Metropolis-Hastings algorithm, which is a class of Markov Chain Monte Carlo (MCMC) methods for obtaining samples from a posterior probability distribution. We developed a custom version of this algorithm to deal with some of the particular mathematical properties and computational challenges of fitting large disease transmission models ;

There are to major unique features of our adapted algorithm:

Parallelization – Generally MCMC methods starting from a single initial parameter set and generating an extremely long sequence of parameter samples such that the Markov process is acceptably close to a stationary state where it represents an unbiased sample from the posterior. Instead, we simulate multiple shorter chains in parallel, starting from different initial conditions, and pool the results. Due to the computational time required to simulate the epidemic model, and the timescale on which forecasts of epidemic trajectories are often needed (~weeks), it is not possible to sequentially simulate the model millions of times. However, modern supercomputers allow massively parallel computation. The hope of this algorithm is that the parallel chains sample different subspaces of the posterior distribution, and together represent a reasonable sample from the full posterior. To maximize the chance of at least local stationarity of these subsamples, we pool only the final values of each of the parallel chains.
Multi-level – Our pipeline, and the fitting algorithm in particular, were designed to be able to simulate disease dynamics in a collection of linked subpopulations. This population structure creates challenges for model fitting. We want the model to be able to recreate the dynamics in each subpopulation, not just the overall summed dynamics. Each subpopulation has unique parameters, but due to the coupling between them (), the model outcomes in one subpopulation also depend on the parameter values in other subpopulations. For some subpopulations (), this coupling may effectively be weak and have little impact on dynamics, but for others (), spillover from another closely connected subpopulation may be the primary driver of the local dynamics. Thus, the model cannot be separately fit to each subpopulation, but must consider the combined likelihood. However, such an algorithm may be very slow to find parameters that optimize fits in all locations simultaneously, and may be predominantly drawn to fitting to the largest/most connected subpopulations. The avoid these issues, we simultaneously generate two communicating parameter chains: a "chimeric" chain that allows the parameters for each subpopulation to evolve quasi-independently based on local fit quality, and a "global" chain that evolves only based on the overall fit quality (for all subpopulations combined).

Note that while the traditional Metropolis-Hastings algorithm for MCMC will provably converge to a stationary distribution where the sequence of parameters represents a sample from the posterior distribution, no such claim has been mathematically proven for our method.

For , where is the number of parallel MCMC chains (also known as slots)
- Generate initial state
  - Generate an initial set of parameters , and copy this to both the global () and chimeric () parameter chain (sequence ;
  - Generate an initial epidemic trajectory
  - Calculate and record the initial likelihood for each subpopulation, $$\mathcal{L_i}(D_i|Z_i(\Theta_{m,0}))$ ;
- For where is the length of the MCMC chain, add to the sequence of parameter values :
  - Generate a proposed set of parameters from the current chimeric parameters using the proposal distribution $$g(\Theta^*|\Theta^C_{m,k-1})$ ;
  - Generate an epidemic trajectory with these proposed parameters,
  - Calculate the likelihood of the data given the proposed parameters for each subpopulation,
  - Calculate the overall likelihood with the proposed parameters,
  - Make "global" decision about proposed parameters
    Generate a uniform random number
    Calculate the overall likelihood with the current global parameters,
    Calculate the acceptance ratio
    If : ACCEPT the proposed parameters to the global and chimeric parameter chains
    Set
    Set
    Update the recorded subpopulation-specific likelihood values (chimeric and global) with the likelihoods calculated using the proposed parameter ;
    Else: REJECT the proposed parameters for the global chain and make subpopulation-specific decisions for the chimeric chain
    Set
    Make "chimeric" decision:
    For
    Generate a uniform random number
    Calculate the acceptance ratio
    If : ACCEPT the proposed parameters to the chimeric parameter chain for this location
    Set
    Update the recorded chimeric likelihood value for subpopulation to that calculated with the proposed parameter
    Else: REJECT the proposed parameters for the chimeric parameter chain for this location
    Set
    `End if ;
    End for subpopulations
    End making chimeric decisions
    End if
  - End making global decision
- End for iterations of each MCMC chain
End for parallel MCMC chains
Collect the final global parameter values for each parallel chain

We consider the sequence to represent a sample from the posterior probability distribution, and use it to calculate statistics about the inferred parameter values and the epidemic trajectories resulting from them (e.g., mean, median, 95% credible intervals).

There are a few important notes/limitations about our method currently:

All parameters to be fit must be location-specific. There is currently no way to fit a parameter that has the identical value across all locations
The pipeline currently does not allow for fitting of the basic parameters of the compartmental epidemic model. Instead, these must be fixed, and the value of location-specific "interventions" acting to increase/reduce these parameters can be fit. All parameters related to the observational/outcomes model can be fit, as well as "interventions" acting to increase or reduce them ;
At no point is the parameter fitting optimizing the fit of the summed total population data to total population model predictions. The "overall" likelihood function used to make "global" parameter acceptance decisions is the product of the individual subpopulations likelihoods (which are based on comparing location-specific data to location-specific model output), which is not equivalent to likelihood for the total population. For example, if overestimates of the model in some subpopulations were exactly balanced by underestimates in others, the total population estimate could be very accurate and the total population likelihood high, but the overall likelihood we use here would still be low.
There is no model simulation run or record that corresponds to the combined parameters recorded in the chimeric parameter chain (). For entry in the chain, some of these parameter values were recently accepted from the last proposal and were used in the simulation produced by that proposal, while for other subpopulations, the most recent proposed parameters were rejected so contains parameters accepted – and part of the simulations produced – in a previous iteration.
It is currently not possible to infer parameters of the measurement process encoded in the likelihood function. For example, if the likelihood is chosen to be a normal distribution, which implies an assumption that the observed data is generated from the underlying truth according to a normal distribution with mean zero, then the standard deviation must be specified, and cannot be inferred along with the other model parameters ;
There is an option to use a slightly different version of our algorithm, in which globally accepted parameter values are not pushed back into the chimeric likelihood, but the chimeric likelihood is instead allowed to continue to evolve independently. In this variation, the chimeric acceptance decision is always made, not only if a global rejection happens ;
The proposal distribution for generating new parameter sets is currently constrained to be a joint distribution in which the the value of each new proposed parameter is chosen independently of any other parameters.
While in general in Metropolis-Hasting algorithms the formula for the the acceptance ratio includes the proposal distribution , those terms cancel out if the proposal distribution is symmetrical. Our algorithm assumes such symmetry and thus does not include in the formula, so the user must be careful to only select symmetric distributions.

Hierarchical parameters

The baseline likelihood function used in the fitting algorithm described above allows for parameter values to differ arbitrarily between different subpopulations. However, it may be desired to instead impose constraints on the best-fit parameters, such that subpopulations that are similar in some way, or belong to some pre-defined group, have parameters that are close to one another. Formally, this is typically done with group-level or hierarchical models that fit meta-parameters from which individual subpopulation parameters are assumed to draw. Here, we instead impose this group-level structure by adding an additional term to the likelihood that describes the probability that the set of parameters proposed for a group of subpopulations comes from a normal distribution. This term of the likelihood will be larger when the variance of this parameter set is smaller. Formally

where is a group of subpopulations, is one of the parameters in the set , is the probability density function of the normal distribution, and are the mean and standard deviation of all values of the parameter in the group . There is also the option to use a logit-normal distribution instead of a standard normal, which may be more appropriate if the parameter is a proportion bounded in [0,1].

Inference Implementation

Specifying data source and fitted variables

inference settings

iterations_per_slot

do_inference

gt_data_path

With inference model runs, the number of simulations nsimulations refers to the number of final model simulations that will be produced. The filtering$simulations_per_slot setting refers to the number of iterative simulations that will be run in order to produce a single final simulation (i.e., number of simulations in a single MCMC chain).

Item

Required?

Type/Format

Description

`f`

inference::statistics options

required options

name

aggregator

period

sim_var

data_var

likelihood

The statistics specified here are used to calibrate the model to empirical data. If multiple statistics are specified, this inference is performed jointly and they are weighted in the likelihood according to the number of data points and the variance of the proposal distribution.

Item

Required?

Type/Format

Description

`f`

optional options ?

remove_na

add_one

gt_start_date

gt_end_date

Optional sections

`inference::hierarchical_stats_geo`

The hierarchical settings specified here are used to group the inference of certain parameters together (similar to inference in "hierarchical" or "fixed/group effects" models). For example, users may desire to group all counties in a given state because they are geograhically proximate and impacted by the same statewide policies. The effect should be to make these inferred parameters follow a normal distribution and to observe shrinkage among the variance in these grouped estimates.

Item

Required?

Type/Format

`inference::priors`

It is now possible to specify prior values for inferred parameters. This will have the effect of speeding up model convergence.

Item

Required?

Type/Format

Ground truth data

name

module

geo_group_col

transform

inference:::priors

inference::

(OLD) Configuration options

`filtering` section

The filtering section configures the settings for the inference algorithm. The below example shows the settings for some typical default settings, where the model is calibrated to the weekly incident deaths and weekly incident confirmed cases for each subpop. Statistics, hierarchical_stats_geo, and priors each have scenario names (e.g., sum_deaths, local_var_hierarchy, and local_var_prior, respectively).

`filtering` settings

Item

Required?

Type/Format

`filtering::statistics`

Item

Required?

Type/Format

`filtering::hierarchical_stats_geo`

Item

Required?

Type/Format

`filtering::priors`

It is now possible to specify prior values for inferred parameters. This will have the effect of speeding up model convergence.

Item

Required?

Type/Format

Ground truth data

Likelihood function

Fitting parameters

Ground truth data

How To Run

Development

Deprecated pages

JHU Internal

Technical Reference

Specifying compartmental model

This section describes how to specify the compartmental model of infectious disease transmission.

An example section of a configuration file defining a simple SIR model is below.

compartments:
  infection_stage: ["S", "I", "R"]
  
seir:
  transitions:
    # infection
    - source: [S]
      destination: [I]
      proportional_to: [[S], [I]]
      rate: [beta]
      proportion_exponent: 1
    # recovery
    - source: [I]
      destination: [R]
      proportional_to: [[I]]
      rate: [gamma]
      proportion_exponent: 1
  parameters:
    beta: 0.1
    gamma: 0.2
  integration:
     method: rk4
     dt: 1.00

Specifying model compartments (`compartments`)

compartments:
  infection_stage: ["S", "I", "R"]

In this case we can specify compartments as the cross product of multiple states of interest. For example:

 compartments:
   infection_stage: ["S", "I", "R"]
   vaccination_status: ["unvaccinated", "vaccinated"]

Corresponds to 6 compartments, which the code internally converts to this data frame

infection_stage, vaccination_status, compartment_name
S,               unvaccinated,       S_unvaccinated
I,               unvaccinated,       I_unvaccinated
R,               unvaccinated,       R_unvaccinated
S,               vaccinated,         S_vaccinated
I,               vaccinated,         I_vaccinated
R,               vaccinated,         R_vaccinated

In order to more easily describe transitions, we want to be able to refer to a compartment by its components, but then use it by its compartment name.

 compartments:
   infection_stage: ["S", "I"]
   age_group: ["child", "adult"]
   vaccination_status: ["unvaccinated", "1dose", "2dose"]

corresponding to 12 compartments, 4 of which are unnecessary to the model

infection_stage, age_group, vaccination_status, compartment_name
S,		 child,	    unvaccinated,	S_child_unvaccinated	
I,		 child,	    unvaccinated,	I_child_unvaccinated
S,		 adult,	    unvaccinated,	S_adult_unvaccinated
I,		 adult,	    unvaccinated,	I_adult_unvaccinated
S,		 child,	    1dose,		S_child_1dose
I,		 child,	    1dose,		I_child_1dose
S,		 adult,     1dose,		S_adult_1dose
I,		 adult,     1dose,		I_adult_1dose
S,		 child,     2dose,		S_child_2dose	
I,		 child,     2dose,		I_child_2dose
S,		 adult,	    2dose,		S_adult_2dose
I,		 adult,	    2dose,		I_adult_2dose

Or, it could be specified with the less concise notation

compartments:
   overall_state: ["S_child", "I_child", "S_adult_unvaccinated", "I_adult_unvaccinated", "S_adult_1dose", "I_adult_1dose", "S_adult_2dose", "I_adult_2dose"]

which does not result in any unnecessary compartments being included.

Notation must be consistent between these sections.

Specifying compartmental model transitions (`seir::transitions`)

A transition has 5 pieces of associated information that a user can specify:

source
destination
rate
proportional_to
proportion_exponent

For more details on the mathematical forms possible for transitions in our models, read the Model Description section.

We will focus on describing the first transition of this model, the rate at which unvaccinated individuals move from the susceptible to infected state.

Specifying a single transition

Source

[S,unvaccinated]

which corresponds to the compartment S_unvaccinated

Destination

[I,unvaccinated]

which corresponds to the compartment I_unvaccinated

Rate

instead, we could describe the rate using a parameter beta, which can be given a numeric value later:

beta

The interpretation and unit of the rate constant depend on the model details, as the rate may potentially also be per number (or proportion) of individuals in other compartments (see below).

Proportional to

[[[S,unvaccinated]], [[I,unvaccinated], [I, vaccinated]]]

To understand this term, consider the compartments written out as strings

[[S_unvaccinated], [I_unvaccinated, I_vaccinated]]

and then sum the terms in each group

[S_unvaccinated, I_unvaccinated + I_vaccinated]

For transitions that occur at a constant per-capita rate (ie, E -> I at rate $\gamma$ in an SEIR model), it is possible to simply write proportional_to: ["source"].

Proportion exponent

[1, 0.9]

or a power parameter alpha, which can be given a numeric value later:

[1, alpha]

Summary

Putting it all together, the model transition is specified as

source: [S, unvaccinated]
destination: [I, unvaccinated]
proportional_to: [[[S,unvaccinated]], [[I,unvaccinated], [I,vaccinated]]]
rate: [5]
proportion_exponent: [1, 0.9]

would correspond to the following model if expressed as an ordinary differential equation

\frac{\delta \text{S}_\text{unvaccinated}}{\delta t} = - \beta \text{S}_\text{unvaccinated}^1 (\text{I}_\text{unvaccinated}+\text{I}_\text{vaccinated})^{\alpha}

\frac{\delta \text{I}_\text{unvaccinated}}{\delta t} = \beta \text{S}_\text{unvaccinated}^1 (\text{I}_\text{unvaccinated}+\text{I}_\text{vaccinated})^{\alpha}

with parameter and parameter (we will describe how to use parameter symbols in the transitions and specify their numeric values separately in the section Specifying compartmental model parameters).

Transition globs

Source

We allow one or more arguments to be specified for each compartment. So to specify the transitions out of both susceptible compartments (S_unvaccinated and S_unvaccinated), we would use

[[S], [unvaccinated,vaccinated]]

Destination

[[I], [unvaccinated,vaccinated]]

If instead we wrote:

[[I], [vaccinated,unvaccinated]]

we would have a transition from S_unvaccinated to I_vaccinated and S_vaccinated to I_unvaccinated.

Rate

For example,

rate: [[3], [0.6,0.5]]

This would mean our transition from S_unvaccinated to I_unvaccinated would have a rate of 3 * 0.6 while our transition from S_vaccinated to I_vaccinated would have a rate of 3 * 0.5.

The rate vector should be the same shape as source and destination and in the same relative order.

Proportional to

[
  [[S,unvaccinated], [S,vaccinated]],
  [[I,unvaccinated],[I, vaccinated]], [[I,unvaccinated],[I, vaccinated]]
]

Again, let's unpack what it says. Since the broadcast is over groups, let's split the config back up

into those groups

[
  [S,unvaccinated],
  [[I,unvaccinated],[I, vaccinated]]
]
[
  [S,vaccinated],
  [[I,unvaccinated],[I, vaccinated]]
]

From here, we can say that we are describing two transitions. Both occur proportionally to the same compartments: S_unvaccinated and the total number of infections (I_unvaccinated+I_vaccinated).

If, for example, we want to model a situation where vaccinated susceptibles cannot be infected by unvaccinated individuals, we would instead write:

[
  [[S,unvaccinated], [S,vaccinated]],
  [[I,unvaccinated],[I, vaccinated]], [[I, vaccinated]]
]

Proportion exponent

Similarly to rate and proportional_to, we provide an exponent for each component and every group across the broadcast. So we could for example use:

[[1,1], [0.9,0.8]]

Summary

Putting it all together, the transition glob

seir:
  transitions:
    source: [[S],[unvaccinated,vaccinated]]
    destination: [[I],[unvaccinated,vaccinated]]
    proportional_to: [
                       [[S,unvaccinated], [S,vaccinated]],
                       [[I,unvaccinated],[I, vaccinated]], [[I, vaccinated]]
                     ]
    rate: [[3], [0.6,0.5]]
    proportion_exponent: [[1,1], [0.9,0.8]]

is equivalent to the following transitions

seir:
  transitions:
    - source: [S,unvaccinated]
      destination: [I,unvaccinated]
      proportional_to: [[[S,unvaccinated]], [[I,unvaccinated],[I, vaccinated]]]
      proportion_exponent: [1 * 0.9]
      rate: [3*0.6]
    - source: [S,vaccinated]
      destination: [I,vaccinated]
      proportional_to: [[[S,vaccinated]], [[I, vaccinated]]]
      proportion_exponent: [1 * 0.8]
      rate: [3*0.5]

Warning

Specifying compartmental model parameters (`seir::parameters`)

Parameters can take on three types of values:

Fixed values
Value drawn from distributions
Values read from timeseries specified in a data file

Specifying fixed parameter values

seir:
  parameters:
    beta: 
      value: 0.1
    gamma: 
      value: 0.2

The full model section of the config could then read

compartments:
  infection_state: ["S", "I", "R"]
  
seir:
  transitions:
    # infection
    - source: [S]
      destination: [I]
      proportional_to: [[S], [I]]
      rate: [beta]
      proportion_exponent: 1
    # recovery
    - source: [I]
      destination: [R]
      proportional_to: [[I]]
      rate: [gamma]
      proportion_exponent: [1,1]
  parameters:
    beta: 
      value: 0.1
    gamma: 
      value: 0.2

For the stratified SI model described above, this portion of the config would read

compartments:
  infection_stage: ["S", "I", "R"]
  vaccination_status: ["unvaccinated", "vaccinated"]
  
seir:
  transitions:
    source: [[S],[unvaccinated,vaccinated]]
    destination: [[I],[unvaccinated,vaccinated]]
    proportional_to: [
                       [[S,unvaccinated], [S,vaccinated]],
                       [[I,unvaccinated],[I, vaccinated]], [[I, vaccinated]]
                     ]
    rate: [[beta], [theta_u,theta_v]]
    proportion_exponent: [[1,1], [alpha_u,alpha_v]]
  parameters:
    beta: 
      value: 0.1
    theta_u: 
      value: 0.6
    theta_v: 
      value: 0.5
    alpha_u: 
      value: 0.9
    alpha_v: 
      value: 0.8

If there are no parameter values that need to be specified (all rates given numeric values when defining model transitions), the seir::parameters section of the config can be left blank or omitted.

Specifying parameters values from distributions

seir:
  parameters:
    beta: 
      value:
        distribution: fixed
        value: 0.1
    gamma: 
      value:
        distribution: lognorm
        logmean: -1.6
        logsd: 0.2

Details on the possible distributions that are currently available, and how to specify their parameters, is provided in the Distributions section.

Specifying parameter values as timeseries from data files

date,        small_province,    large_province
2022-01-01,  1.5,               1.3
.....
2022-05-01,  0.5,               0.7 
....
2022-12-31,  1.5,               1.3

as a part of a configuration file with the model sections:

compartments:
  infection_stage: ["S", "I", "R"]

seir:
  transitions:
    # infection
    - source: [S]
      destination: [I]
      proportional_to: [[S], [I]]
      rate: [beta*theta]
      proportion_exponent: 1
    # recovery
    - source: [I]
      destination: [R]
      proportional_to: [[I]]
      rate: [gamma]
      proportion_exponent: 1
  parameters:
    beta: 
      value: 0.1
    gamma: 
      value: 0.2
    theta:
       timeseries: data/seasonal_transmission.csv

Config item

Required?

Type/Format

Description

value

either value or timeseries is required

numerical, or distribution

This defines the value of the parameter, as described above.

timeseries

either value or timeseries is required

path to a csv file

This defines a timeseries for each day, as above.

stacked_modifier_method

optional

string: sum, product, reduction_product

This option defines the method used when modifiers are applied. The default is product.

rolling_mean_windows

optional

integer

The size of the rolling mean window if a rolling mean is applied.

Specifying model simulation method `(seir::integration)`

Our framework allows for two major methods for implementing compartmental models of disease transmission:

ordinary differential equations, which are completely deterministic, operate in continuous time (consider infinitesimally small timesteps), and allow for arbitrary fractions of the population (i.e., not just discrete individuals) to move between model compartments
discrete-time stochastic process, which tracks discrete individuals and produces random variation in the number of individuals transitioning between states for any given rate, and which allows transitions between states only to occur at discrete time intervals

The mathematics behind each implementation is described in the Model Description section

Config item

Required?

Type/format

Description

method

optional

string: rk4 (default),euler, stochastic

dt

optional

positive real number (default: 2)

The timestep used for the numerical integration or discrete time stochastic update; for rk4 method, this is a reasonable value, but for other options, this should be 0.2 or less.

For example, to simulate a model deterministically using the 4th order Runge-Kutta algorithm for numerical integration with a timestep of 1 day:

seir:
  integration:
     method: rk4
     dt: 1.00

Alternatively, to simulate a model stochastically with a timestep of 0.1 days

seir:
  integration:
     method: stochastic
     dt: 0.1

Environment Variables

description: >- A library of environment variables in the flepiMoP codebase. These variables may be updated or deprecated as the project evolves.

Environment Variables

Below you will find a list of environment variables (envvars) defined throughout the flepiMoP codebase. Often, these variables are set in response to command-line argument input. Though, some are set by flepiMoP without direct user input (these are denoted by a 'Not a CLI option' note in the Argument column.)

Envvar.

Argument

Description

Default

Valid values

Key file locations (inexhaustive)

BATCH_SYSTEM

Not a CLI option.

System you are running on (e.g., aws, SLURM, local).

N/A

e.g., aws, slurm

inference_job_launcher.py

CENSUS_API_KEY

Not a CLI option.

A unique key to the API for census data.

N/A

slurm_init.sh, build_US_setup.R

CONFIG_PATH

-c, --config

Path to a configuration file.

your/path/to/config_file

build_covid_data.R, build_US_setup.R, build_initial_seeding.R, build_flu_data.R, config.R, preprocessing/ files

DELPHI_API_KEY

-d, --delhpi_api_key

Your personalized key for the Delphi Epidata API. Alternatively, this key can go in the config inference section as gt_api_key.

build_covid_data.R

DIAGNOSTICS

-n, --run-diagnostics

Flag for whether or not diagnostic tests should be run during execution.

TRUE

--run-diagnostics FALSE for FALSE, --run-diagnostics or no mention for TRUE

run_sim_processing_SLURM.R

DISEASE

-i, --disease

Which disease is being simulated in the prsent run.

flu

e.g., rsv, covid

run_sim_processing_SLURM.R/td>

DVC_OUTPUTS

Not a CLI option, but defined using --output

The names of the directories with outputs to save in S3 (separated by a space).

model_output model_parameters importation hospitalization

e.g., model_output model_parameters importation hospitalization

scenario_job.py, AWS_scenario_runner.sh

FILENAME

Not a CLI option.

Filenames for output files, determined dynamically during inference.

N/A

file.parquet, plot.pdf

AWS_postprocess_runner.sh, SLURM_inference_job.run, AWS_inference_runner.sh

FIRST_SIM_INDEX

-i, --first_sim_index

The index of the first simulation.

1

int

shared_cli.py

FLEPI_BLOCK_INDEX

-b, --this_block

Index of current block.

1

int

flepimop-inference-main.R, utils.py, AWS_postprocess_runner.sh, AWS_inference_runner.sh, SLURM_inference_job.run, inference_job_launcher.py

FLEPI_CONTINUATION

--continuation/--no-continuation

Flag for whether or not to use the resumed run seir files (or provided initial files bucket) as initial conditions for the next run.

FALSE

--continuation TRUE for TRUE, --continuation or no mention for FALSE

SLURM_inference_job.run, inference_job_launcher.py

FLEPI_CONTINUATION_FTYPE

Not a CLI option.

If running a continuation, the file type of the initial condition files.

config['initial_conditions']['initial_file_type']

e.g., .csv

SLURM_inference_job.run, inference_job_launcher.py

FLEPI_CONTINUATION_LOCATION

--continuation-location

The location (folder or an S3 bucket) from which to pull the /init/ files (if not set, uses the resume location seir files).

path/to/your/location

SLURM_inference_job.run, inference_job_launcher.py

FLEPI_CONTINUATION_RUN_ID

--continuation-run-id

The ID of run to continue at, if doing a continuation.

int

SLURM_inference_job.run, inference_job_launcher.py

FLEPI_INFO_PATH

Not a CLI option.

pending

info.py

FLEPI_ITERATIONS_PER_SLOT

-k, --iterations_per_slot

Number of iterations to run per slot.

int

flepimop-inference-slot.R, flepimop-inference-main.R, SLURM_inference_job.run, inference_job_launcher.py

FLEPI_MAX_STACK_SIZE

--stacked-max

Maximum number of iterventions to allow in a stacked intervention.

5000

int >=350

StackedModifier.py, inference_job_launcher.py

FLEPI_MEM_PROFILE

-M, --memory_profiling

Flag for whether or not memory profile should be run during iterations.

FALSE

--memory_profiling TRUE for TRUE, --memory_profiling or no mention for FALSE

flepimop-inference-slot.R, flepimop-inference-main.R, inference_job_launcher.py

FLEPI_MEM_PROF_ITERS

-P, --memory_profiling_iters

If doing memory profiling, after every X iterations, run the profiling.

100

int

flepimop-inference-slot.R, flepimop-inference-main.R, inference_job_launcher.py

FLEPI_NJOBS

-j, --jobs

Number of parallel processors used to run the simulation. If there are more slots than jobs, slots will be divided up between processors and run in series on each.

Number of cores detected as available at computing cluster.

int

flepimop-inference-slot.R, flepimop-inference-main.R, calibrate.py

FLEPI_NUM_SLOTS

-n, --slots

Number of independent simulations of the model to be run.

int >=1

flepimop-inference-slot.R, flepimop-inference-main.R, calibrate.py, inference_job_launcher.py

FLEPI_OUTCOME_SCENARIOS

-d, --outcome_modifiers_scenarios

Name of the outcome scenario to run.

'all'

pending

flepimop-inference-slot.R, flepimop-inference-main.R, SLURM_inference_job.run, inference_job_launcher.py

FLEPI_PATH

-p, --flepi_path

Path to the flepiMoP directory.

'flepiMoP'

path/to/flepiMoP

several postprocessing/ files, several batch/ files, several preprocessing/ files, info.py, utils.py, _cli.py

FLEPI_PREFIX

--in-prefix

Unique name for the run.

e.g., project_scenario1_outcomeA, etc.

SLURM_inference_job.run, inference_job_launcher.py, AWS_postprocess_runner.sh, calibrate.py, several preprocessing/ files, several postprocessing/ files, several batch/ files

FLEPI_RESET_CHIMERICS

-L, --reset_chimeric_on_accept

Flag for whether or not chimeric parameters should be reset to global parameters whena global acceptance occurs.

TRUE

--reset_chimeric_on_accept FALSE for FALSE, --reset_chimeric_on_accept or no mention for TRUE

flepimop-inference-slot.R, flepimop-inference-main.R, slurm_init.sh, hpc_init, inference_job_launcher.py

FLEPI_RESUME

--resume/--no-resume

Flag for whether or not to resume the current calibration.

FALSE

--resume TRUE for TRUE, --resume or no mention for FALSE

flepimop-inference-slot.R, flepimop-inference-main.R, slurm_init.sh, hpc_init, inference_job_launcher.py

FLEPI_RUN_INDEX

-u, --run_id

Unique ID given to the model run. If the same config is run multiple times, you can avoid the output being overwritten by using unique model run IDs.

Auto-assigned run ID

int

copy_for_continuation.py, flepimop-inference-slot.R, flepimop-inference-main.R, shared_cli.py, base.py, calibrate.py, several batch/ files, several postprocessing/ files

FLEPI_SEIR_SCENARIOS

-s, --seir_modifier_scenarios

Names of the intervention scenarios to run.

'all'

pending

flepimop-inference-slot.R, flepimop-inference-main.R, inference_job_launcher.py

FLEPI_SLOT_INDEX

-i, --this_slot

Index for current slots.

1

int

flepimop-inference-slot.R, several batch/ files

FS_RESULTS_PATH

-R, --results-path

A path to the model results.

your/path/to/model_results

prune_by_llik.py, prune_by_llik_and_proj.py, several postprocessing/ files, several batch/ files, model_output_notebook.Rmd

FULL_FIT

-F, --full-fit

Whether or not to process the full fit.

FALSE

--full-fit TRUE for TRUE, --full-fit or no mention for FALSE

run_sim_processing_SLURM.R

GT_DATA_SOURCE

-s, --gt_data_source

Sources of groundtruth data.

'csse_case, fluview_death, hhs_hosp'

See default

build_covid_data.R

GT_END_DATE

--ground_truth_end

Last date to include ground truth for.

YYYY-MM-DD format

flepimop-inference-slot.R, flepimop-inference-main.R

GT_START_DATE

--ground_truth_start

First date to include ground truth for.

YYYY-MM-DD format

flepimop-inference-slot.R, flepimop-inference-main.R

IMM_ESC_PROP

--imm_esc_prop

Annual percent of immune escape.

0.35

float between 0.00 - 1.00

several preprocessing/ files

INCL_AGGR_LIKELIHOOD

-a, --incl_aggr_likelihood

Whether or not the likelihood should be calculated with aggregate estimates.

FALSE

--incl_aggr_likelihood TRUE for TRUE, --incl_aggr_likelihood or no mention for FALSE

flepimop-inference-slot.R

IN_FILENAME

Not a CLI option.

Name of input files.

N/A

file_1.csv file_2.csv, etc.

several batch/ files

INIT_FILENAME

--init_file_name

Initial file global intermediate name.

file.csv

seir_init_immuneladder.R, inference_job.run, several preprocessing/ files

INTERACTIVE_RUN

-I, --is-interactive

Whether or not the current run is interactive.

FALSE

--is-interactive TRUE for TRUE, --is-interactive or no mention for FALSE

flepimop-inference-slot.R, flepimop-inference-main.R

JOB_NAME

--job-name

Unique job name (intended for use when submitting to SLURM).

Convention: {config['name']}-{timestamp} (str)

several batch/ files

LAST_JOB_OUTPUT

Not a CLI option.

Path to output of last job.

N/A

path/to/last_job/output

utils.py, several batch/ files

OLD_FLEPI_RUN_INDEX

Not a CLI option.

Run ID of old flepiMoP run.

N/A

int

several batch/ files

OUT_FILENAME

Not a CLI option.

Name of output files.

N/A

file_1.csv file_2.csv, etc.

several batch/ files

OUT_FILENAME_DIR

Not a CLI option.

Directory for output files.

N/A

path/to/output/files

SLURM_inference_job.run

OUTPUTS

-o, --select-outputs

A list of outputs to plot.

'hosp, hnpi, snpi, llik'

hosp, hnpi, snpi, llik

postprocess_snapshot.R

PARQUET_TYPES

Not a CLI option.

Parquet files.

'seed spar snpi seir hpar hnpi hosp llik init'

seed spar snpi seir hpar hnpi hosp llik init

AWS_postprocess_runner.sh, SLURM_inference_job.run, AWS_inference_runner.sh

PATH

Not a CLI option.

Path relating to AWS installation. Used during SLURM runs.

N/A

set with export PATH=~/aws-cli/bin:$PATH in SLURM_inference_job.run

schema.yml, utils.py, info.py, AWS_postprocess_runner.sh, SLURM_inference_job.run

PROCESS

-r, --run-processing

Whether or not to process the run.

FALSE

--run-processing TRUE for TRUE, --run-processing or no mention for FALSE

run_sim_processing_SLURM.R

PROJECT_PATH

-d, --data_path

Path to the folder with configs and model output.

path/to/configs_and_model-output

base.py, _cli.py, calibrate.py, several postprocessing/ files, several batch/ files

PULL_GT

-g, --pull-gt

Whether or not to pull ground truth data.

FALSE

--pull-gt TRUE for TRUE, --pull-gt or no mention for FALSE

run_sm_processing_SLURM.R

PYTHON_PATH

-y, --python

Path to Python executable.

'python3'

path/to/your_python

flepimop-inference-slot.R, flepimop-inference-main.R

RESUMED_CONFIG_PATH

--res_config

Path to previous config file, if using resumes.

NA

path/to/past_config

seir_init_immuneladder.R, several preprocessing/ files

RESUME_DISCARD_SEEDING

--resume-discard-seeding, --resume-carry-seeding

Whether or not to keep seeding in resume runs.

FALSE

--resume-carry-seeding TRUE for TRUE, --resume-carry-seeding or no mention for FALSE

several batch/ files

RESUME_LOCATION

-r, --restart-from-location

The location (folder or an S3 bucket) where the previous run is stored.

path/to/last_job/output

built_initial_seeding.R, calibrate.py, slurm_init.sh, hpc_init, inference_job_launcher.py

RESUME_RUN

-R, --is-resume

Whether or not this run is a resume.

FALSE

--is-a-resume TRUE for TRUE, --is-a-resume or no mention for FALSE

flepimop-inference-slot.R, flepimop-inference-main.R

RESUME_RUN_INDEX

Not a CLI option.

Index of resumed run.

set by OLD_FLEPI_RUN_INDEX

int

SLURM_inference_job.run

RSCRIPT_PATH

-r, --rpath

Path to R executable.

'Rscript'

path/to/your_R

build_initial_seeding.R, flepimop-inference-slot.R, flepimop-inference-main.R

RUN_INTERACTIVE

-I, --is-interactive

Whether or not the current run is interactive.

FALSE

--is-interactive for TRUE, --is-interactive or no mention for FALSE

flepimop-inference-slot.R, flepimop-inference-main.R

SAVE_HOSP

-H, --save_hosp

Whether or not the HOSP output files should be saved for each iteration.

TRUE

--save_hosp FALSE for FALSE, --save_hosp or no mention for TRUE

flepimop-inference-slot.R, flepimop-inference-main.R

SAVE_SEIR

-S, --save_seir

Whether or not the SEIR output files should be saved for each iteration.

FALSE

--save_seir TRUE for TRUE, --save_seir or no mention for FALSE

flepimop-inference-slot.R, flepimop-inference-main.R

SEED_VARIANTS

-s, --seed_variants

Whether or not to add variants/subtypes to outcomes in seeding.

FALSE, TRUE

create_seeding.R

SIMS_PER_JOB

Not a CLI option.

Simulations per job.

N/A

int >=1

AWS_postprocess_runner.sh, inference_job_launcher.py, AWS_inference_runner.sh

SLACK_CHANNEL

-s, --slack-channel

Slack channel, either 'csp-production' or 'debug'; or 'noslack' to disable slack.

csp-production, debug, or noslack

postrpocess_auto.py, postprocessing-scripts.sh, inference_job_launcher.py

SLACK_TOKEN

-s, --slack-token

Slack token.

postprocess_auto.py, SLURM_postprocess_runner.run

SUBPOP_LENGTH

-g, --subpop_len

Number of digits in subpops.

5

int

flepimop-inference-slot.R, flepimop-inference-main.R

S3_MODEL_PROJECT_PATH

Not a CLI option.

Location in S3 bucket with the code, data, and dvc pipeline.

N/A

path/to/code_data_dvc

several batch/ files

S3_RESULTS_PATH

Not a CLI option.

Location in S3 to store results.

N/A

path/to/s3/results

several batch/ files

S3_UPLOAD

Not a CLI option.

Whether or not we also save runs to S3 for slurm runs

TRUE

TRUE, FALSE

SLURM_postprocess_runner.run, SLURM_inference_job.run, inference_job_launcher.py

VALIDATION_DATE

--validation-end-date

First date of projection/forecast (first date without ground truth data).

date.today()

YYYY-MM-DD format

data_setup_source.R, DataUtils.R, groundtruth_source.R, slurm_init.sh, hpc_init, inference_job_launcher.py

FlepiMoP Documentation

Home

General description of flepiMoP

Acknowledgments

gempyor: modeling infectious disease dynamics

Modeling infectious disease dynamics

Generalized compartmental infection model

SEIR model

Age groups

Vaccination status

Pathogen strain

Clinical outcomes and observations model

Population structure

Mixing between subpopulations

Initial conditions

Time-dependent interventions

Model Implementation

flepiMoP's configuration file

About configuration files

Example

Notation

Configuration files sections

Global header

subpop_setup section

compartments section

seir section

initial_conditions section

seeding section

outcomes section

interventions section

inference section

Specifying population structure

Overview

Items and options

geodata file and selected option

Example geodata file format

mobility file

Example mobility file format

Examples

Example 1

Specifying compartmental model

Specifying model compartments (compartments)

Specifying compartmental model transitions (seir::transitions)

Specifying a single transition

Source

Destination

Rate

Proportional to

Proportion exponent

Summary

Transition globs

Source

Destination

Rate

Proportional to

Proportion exponent

Summary

Warning

Specifying compartmental model parameters (seir::parameters)

Specifying fixed parameter values

Specifying parameters values from distributions

Specifying parameter values as timeseries from data files

Specifying model simulation method (seir::integration)

Specifying initial conditions

Overview

Specifying model initial conditions

initial_conditions::method

Default

SetInitialConditions

FromFile

SetInitialConditionsFolderDraw, FromFileFolderDraw

Specifying seeding

Overview

Specifying model seeding

seeding::method

NoSeeding

FromFile

PoissonDistributed or NegativeBinomialDistributed

FolderDraw

Specifying observational model

`subpop_setup` section

`compartments` section

`seir` section

`initial_conditions` section

`seeding` section

`outcomes` section

`interventions` section

`inference` section

`geodata` file and `selected` option

`mobility` file

Specifying model compartments (`compartments`)

Specifying compartmental model transitions (`seir::transitions`)

Specifying compartmental model parameters (`seir::parameters`)

Specifying model simulation method `(seir::integration)`

`initial_conditions::method`

Thinking about `outcomes` variables

Specifying `outcomes` in the configuration file

`modifiers::scenarios`

`modifiers::settings`

`importation` section (optional)

`importation::param_list`

`report` section

`f`

`f`