Modeling infectious disease dynamics
Last updated
Last updated
Within flepiMoP, gempyor is an open-source Python package that constructs and simulates compartmental infectious disease dynamics models. gempyor is meant to be used within flepiMoP, where it integrates with parameter inference and data processing scripts, but can also be run standalone with a command-line interface, generating simulations of disease incidence under different scenario assumptions.
To simulate an infectious disease dynamics problem, the following building blocks needs to be defined:
The population structure over which the disease is transmitted
The transmission model, defining the compartments and the transitions between compartments
An observation model, defining different observable outcomes (serology, hospitalization, deaths, cases) from the transmission model
The parameters and modifiers that apply to them
At the core of our pipeline is a dynamic mathematical model that categorizes individuals in the population into a discrete set of states ('compartments') and describes the rates at which transitions between states can occur. Our modeling approach was developed to describe classic infectious disease transmission models, like the SIR model, but is much more general. It can encode any compartmental model in which transitions between states are of the form
where , , and are time-dependent variables describing the number of individuals in each state, is a rate parameter (units of time) and is a scaling parameter (unitless). may be , , a different variable, or 1, and the rate may also be the sum of terms of this form. Rates that involve non-linear functions or more than two variables are currently not possible. For simplicity, we omitted the time dependencies on parameters (e.g is in fact and , are ,).
The model can be simulated as a continuous-time, deterministic process (i.e., a set of ordinary differential equations), which in this example would be in the form
Details on the numerical integration procedure for simulating such an equation is given in the Advanced section.
and as a stochastic process, it is
A common COVID-19 model is a variation of this SEIR model that incorporates:
multiple identical stages of the infectious period, which allows us to model gamma-distributed durations of infectiousness, and
A three-stage infectious period model is given by
The flepiMoP model structure is specifically designed to make it simple to encode the type of more complex "stratified'' models that often arise in infectious disease dynamics. The following are some examples of possible stratifications.
To describe an SEIR-type disease that spreads and progresses differently among children versus adults, one may want to repeat each compartment of the model for each of the two age groups (C – Children, A – Adults), creating an age-stratified model
Vaccination status could influence disease progression and infectiousness, and could also change over time as individuals choose to get the vaccine (V – vaccinated, U – unvaccinated)
Another common stratification would be pathogen strain, such as COVID-19 variants. Individuals may be infected with one of several variants, strains, or serotypes. Our framework can easily create multistrain models, for example
All combinations of these situations can be quickly specified in flepiMoP. Details on how to encode these models is provided in the Model Implementation section, with examples given in the Tutorials section.
The pipeline allows for an additional type of dynamic state variable beyond those included in the mathematical model. We refer to these extra variables as "Outcomes" or "Observations". Outcome variables can be functions of model variables, but do not feed back into the model by influencing other state variables. Typically, we use outcome variables to describe the process through which some subset of individuals in a compartment are "observed'' and become part of the data to which models are compared and attempt to predict. For example, in the context of a model for an infectious disease like COVID-19, outcome variables include reported cases, hospitalizations, and deaths.
In addition to single values (drawn from a distribution), the duration and delay can be inputted as distributions, producing a convolution of the output.
Formally, for a deterministic, continuous-time model
For a discrete-time, stochastic model
Outcomes can also be constructed as functions of other outcomes. For example, a fraction of hospitalized patients may end up in the intensive care unit (ICU).
There are several benefits to separating outcome variables from the mathematical model. Firstly, these variables can be calculated after the model is run, and only at the timepoints of interest, which can dramatically reduce the memory needed during model simulation. Secondly, outcome variables can be fully stochastic even when the mathematical model is simulated deterministically. This becomes useful when an infection might be at high enough prevalence that a deterministic simulation is appropriate, but when there is a rare and therefore quite stochastic outcome reported in the data (e.g., severe cases) that the model is tasked with predicting. Thirdly, outcome variables can have arbitrary delay distributions, to take into account the complexities of health reporting practices, whereas our mathematical modeling framework is designed mainly for exponentially distributed delays and only easily permits extensions to gamma-distributed delays. Finally, this separation keeps the pipeline modular and allow for easy editing of one component of the model without disrupting the other.
Details on how to specify these outcomes in the model configuration files is provided in the Model Implementation section, with examples given in the Tutorials section.
The pipeline was designed specifically to simulate infection dynamics in a set of connected subpopulations. These subpopulations could represent geographic divisions, like countries, states, provinces, or neighborhoods, or demographic groups, or potentially even different host species. The equations and parameters of the transmission and outcomes models are repeated for each subpopulation, but the values of the parameters can differ by location. Within each subpopulation, infection is equally likely to spread between any pair of susceptible/infected individuals after accounting for their infection class, whereas between subpopulations there may be varying levels of mixing.
Formally, this type of population structure is often referred to as a “metapopulation”, and each subpopulation may be called a “deme”.
The following properties may be different between subpopulations:
the population size
the parameters of the transmission model (see LINK)
the parameters of the outcomes model (see LINK)
the amount of transmission that occurs within this subpopulation versus from any other subpopulation (see LINK)
the timing and extent of any interventions that modify these parameters (see LINK)
the initial timing and number of external introductions of infections into the population (see LINK)
the ground truth timeseries data used to compare to model output and infer model parameters (see LINK)
Currently, the following properties must be the same across all subpopulations:
the compartmental model structure
the form of the likelihood function used to estimate parameters by fitting the model to data (LINK)
...
The generalized compartmental model allows for second order “interaction” terms that describe transitions between model states that depend on interactions between pairs of individuals. For example, in the context of a classical SIR model, the rate of new infections depends on interactions between susceptible and infectious individuals and the transmission rate
For a model with multiple subpopulations, each of these interactions can occur either between individuals in the same or different subpopulations, with specific rate parameters for each combination of individual locations
The transmission matrix is then
The list of all pairwise mobility values and the interaction scaling factor are model input parameters. Details on how to specify them are given in the Model Implementation section.
If an alternative compartmental disease model is created that has other interactions (second order terms), then the same mobility values are used to determine the degree of interaction between each pair of subpopulations.
It might also be necessary to model instantaneous changes in values of model variables at any time during a simulation. We call this 'seeding'. For example, individuals may import infection from other external populations, or instantaneous mutations may occur, leading to new variants of the pathogen. These processes can be modeled with seeding, allowing individuals to change state at specified times independently of model equations.
We also note that seeding can also be used as a convenient way to specify initial conditions, particularly ealy in an outbreak where the outbreak is triggered by a few 'seedings'.
Parameters in the disease transmission model or the observation model may change over time. These changes could be, for example: environmental drivers of disease seasonality; “non-pharmaceutical interventions” like social distancing, isolation policies, or wearing of personal protective equipment; “pharmaceutical interventions” like vaccination, prophylaxis, or therapeutics; changes in healthcare seeking behavior like testing and diagnosis; changes in case reporting, etc.
The model allows for any parameter of the disease transmission model or the observation model to change to a new value for a time interval specified by start and end times (or multiple start and end times, for interventions that are recurring). Each change may be subpopulation-specific or apply to the entire population. Changes may be overlapping in time.
The magnitude of these changes are themselves model parameters, and thus may inferred along with other parameters when the model is fit to data. Currently, the start and end times of interventions must be fixed and cannot be varied or inferred.
StackedModifier
- TBA
Alternatively, the model can be simulated as a discrete-time stochastic process, where the number of individuals transitioning between states and at time is a binomial random variable
where the second term is the expected fraction of individuals in the state at time who would transition to by time if there were no other changes to in this time, and time step is a chosen parameter that must be small for equivalence between continuous- and discrete-time versions of the model.
For example, an SEIR model – which describes an infection for which susceptible individuals () who are infected first pass through a latent or exposed () phase before becoming infectious () and that confers perfect lifelong immunity after recovery () – could be encoded as
where is the transmission rate (rate of infectious contact per infectious individual), is the rate of progression ( is the average latent/incubation period), and is the recovery rate ( is the average duration of the infectious period), and is the total population size (). In differential equation form, this model is
an infection rate modified by a 'mixing coefficient', , which is a rough heuristic for the slowdown in disease spread that occurs in realistically heterogeneous populations where more well-connected individuals are infected first.
where is the transmission rate between age X and Y, and we have assumed individuals do not age on the timescale relevant to the model.
where is the vaccination rate (we assume that individuals do not receive the vaccine while they are exposed or infectious) and is the vaccine efficacy against infection. Similar structures could be used for other sources of prior immunity or other dynamic risk groups.
where is the immune cross-protection conferred from infection with strain A to subsequent infection with strain B. Co-infection is ignored. All individuals are assumed to be initially equally susceptible to both infections and are just categorized as (vs ) for convenience.
An outcome variable can be generated from a state variable of the mathematical model using the following properties:
The proportion of all individuals in who will be observed as ,
The delay between when an individual enters state and when they are observed as , which can follow a class of probability distributions where is the parameters of the distribution (e.g., the mean and standard deviation of a normal distribution)
(optional) the duration spent in observable , in which case the output will also contain the prevalence (number of individuals currently in in addition to the incidence into
The number of individuals in at time who become part of the outcome variable is a random variable, and individuals who are observed in at time could have entered at different times in the past.
Note that outcomes constructed in this way always represent incidence values; meaning they describe the number of individuals newly entering this state at time . If the model state is also an incidence, then is a unitless probability, whereas if is a prevalence (number of individuals currently in state at time ), then is instead a probability per time unit.
where is the per-contact per-time rate of disease transmission between an infected individual residing in subpopulation and a susceptible individual from subpopulation .
In general for infection models in connected subpopulations, the transmission rates can take on arbitrary values. In this pipeline, however, we impose an additional structure on these terms. We assume that interactions between subpopulations occur when individuals temporarily relocate to another subpopulation, where they interact with locals. We call this movement “mobility”, and it could be due to regular commuting, special travel, etc. There is a transmission rate () associated with each subpopulation , and individuals physically in that subpopulation – permanently or temporarily – are exposed and infected with this local rate whenever they encounter local susceptible individuals.
where is the onward transmission rate from infected individuals in subpopulation , is the number of individuals in subpopulation i who are interacting with individuals in subpopulation at any given time (for example, fraction who commute each day), and is a fractional scaling factor for the strength of inter-population contacts (for example, representing the fraction of hours in a day commuting individuals spend outside vs. inside their subpopulation).
Initial conditions can be specified by setting the values of the compartments in the disease transmission model at time zero, or the start of the simulation. For example, we might assume that for day zero of an outbreak the whole population is susceptible except for one single infected individual, i.e. and . Alternatively, we might assume that a certain proportion of the population has prior immunity from previous infection or vaccination.
For example, the rate of transmission in subpopulation , , may be reduced by an intervention that acts between times and , and another intervention that acts between times and
In this case, and are both considered simple SinglePeriodModifier
interventions. There are four possible types of interventions that can be included in the model
SinglePeriodModifier
- an intervention that leads to a fractional reduction in a parameter value in subpopulation (i.e., ) between two timepoints
MultiPeriodModifier
- an intervention that leads to a fractional reduction in a parameter value in subpopulation (i.e., ) value between multiple sets of timepoints
ModifierModifier
- an intervention that leads to a fractional reduction in the value of another intervention between two timepoints