Modeling infectious disease dynamics

Within flepiMoP, gempyor is an open-source Python package that constructs and simulates compartmental infectious disease dynamics models. gempyor is meant to be used within flepiMoP, where it integrates with parameter inference and data processing scripts, but can also be run standalone with a command-line interface, generating simulations of disease incidence under different scenario assumptions.

To simulate an infectious disease dynamics problem, the following building blocks needs to be defined:

  • The population structure over which the disease is transmitted

  • The transmission model, defining the compartments and the transitions between compartments

  • An observation model, defining different observable outcomes (serology, hospitalization, deaths, cases) from the transmission model

  • The parameters and modifiers that apply to them

Generalized compartmental infection model

At the core of our pipeline is a dynamic mathematical model that categorizes individuals in the population into a discrete set of states ('compartments') and describes the rates at which transitions between states can occur. Our modeling approach was developed to describe classic infectious disease transmission models, like the SIR model, but is much more general. It can encode any compartmental model in which transitions between states are of the form

XbXZaY,X \xrightarrow{b X Z^{a}} Y,

where XX, YY, and ZZ are time-dependent variables describing the number of individuals in each state, bb is a rate parameter (units of time1^{-1}) and aa is a scaling parameter (unitless). ZZ may be XX, YY, a different variable, or 1, and the rate may also be the sum of terms of this form. Rates that involve non-linear functions or more than two variables are currently not possible. For simplicity, we omitted the time dependencies on parameters (e.g XX is in fact X(t)X(t) and aa,bb are a(t)a(t),b(t)b(t)).

The model can be simulated as a continuous-time, deterministic process (i.e., a set of ordinary differential equations), which in this example would be in the form

dXdt=bXZa.\frac{dX}{dt} = b X Z^a.

Details on the numerical integration procedure for simulating such an equation is given in the Advanced section.

Alternatively, the model can be simulated as a discrete-time stochastic process, where the number of individuals transitioning between states XX and YY at time tt is a binomial random variable

NXY(t)=Binom(X,1eΔtbZ(t)a),N_{X\rightarrow Y}(t) = \textrm{Binom}(X,1-e^{-\Delta{t} \cdot bZ(t)^a}),

where the second term is the expected fraction of individuals in the XX state at time tt who would transition to YY by time t+Δtt+\Delta t if there were no other changes to XX in this time, and time step Δt\Delta{t} is a chosen parameter that must be small for equivalence between continuous- and discrete-time versions of the model.

SEIR model

For example, an SEIR model – which describes an infection for which susceptible individuals (SS) who are infected first pass through a latent or exposed (EE) phase before becoming infectious (II) and that confers perfect lifelong immunity after recovery (RR) – could be encoded as

SβSI/NEσEIγIR,S \xrightarrow{\beta S I/N} E \xrightarrow{\sigma E} I \xrightarrow{\gamma I} R,

where β\beta is the transmission rate (rate of infectious contact per infectious individual), σ\sigma is the rate of progression (1/σ1/\sigma is the average latent/incubation period), and γ\gamma is the recovery rate (1/γ1/\gamma is the average duration of the infectious period), and NN is the total population size (N=S+E+I+RN=S+E+I+R). In differential equation form, this model is

dSdt=βSIN,\frac{dS}{dt} = - \beta S \frac{I}{N} ,
dEdt=βSINσE,\frac{dE}{dt} = \beta S \frac{I}{N} - \sigma E,
dIdt=σEγI,\frac{dI}{dt} = \sigma E - \gamma I,
dRdt=γI,\frac{dR}{dt} = \gamma I,

and as a stochastic process, it is

NSE(t)=Binom(S(t),1eΔtβI(t)/N),N_{S\rightarrow E}(t) = \textrm{Binom}(S(t),1-e^{-\Delta{t} \cdot \beta I(t)/N}),
NEI(t)=Binom(E(t),1eΔtσ),N_{E\rightarrow I}(t) = \textrm{Binom}(E(t),1-e^{-\Delta{t} \cdot \sigma}),
NIR(t)=Binom(I(t),1eΔtγ).N_{I\rightarrow R}(t) = \textrm{Binom}(I(t),1-e^{-\Delta{t} \cdot \gamma }).

A common COVID-19 model is a variation of this SEIR model that incorporates:

  1. multiple identical stages of the infectious period, which allows us to model gamma-distributed durations of infectiousness, and

  2. an infection rate modified by a 'mixing coefficient', α[0,1]\alpha \in [0,1], which is a rough heuristic for the slowdown in disease spread that occurs in realistically heterogeneous populations where more well-connected individuals are infected first.

A three-stage infectious period model is given by

SβS(I1+I2+I3)α/NEσEI13γI1I23γI2I33γI3R.S \xrightarrow{\beta S (I_1+I_2+I_3)^\alpha/N} E \xrightarrow{\sigma E} I_1 \xrightarrow{3\gamma I_1} I_2 \xrightarrow{3\gamma I_2} I_3 \xrightarrow{3\gamma I_3} R.

The flepiMoP model structure is specifically designed to make it simple to encode the type of more complex "stratified'' models that often arise in infectious disease dynamics. The following are some examples of possible stratifications.

Age groups

To describe an SEIR-type disease that spreads and progresses differently among children versus adults, one may want to repeat each compartment of the model for each of the two age groups (C – Children, A – Adults), creating an age-stratified model

SCSC(βCCIC/NC+βACIA/NA)ECσCECICγCICRC,S_C \xrightarrow{S_C (\beta_{CC} I_C/N_C + \beta_{AC} I_A/N_A)} E_C \xrightarrow{\sigma_C E_C} I_C \xrightarrow{\gamma_C I_C} R_C,
SASA(βAAIA/NA+βCAIC/NC)EAσAEAIAγAIARA,S_A \xrightarrow{S_A (\beta_{AA} I_A/N_A + \beta_{CA} I_C/N_C)} E_A \xrightarrow{\sigma_A E_A} I_ A \xrightarrow{\gamma_A I_A} R_A,

where βXY\beta_{XY} is the transmission rate between age X and Y, and we have assumed individuals do not age on the timescale relevant to the model.

Vaccination status

Vaccination status could influence disease progression and infectiousness, and could also change over time as individuals choose to get the vaccine (V – vaccinated, U – unvaccinated)

SUβSU(IU+IV)/NEUσUEUIUγUIURU,S_U \xrightarrow{\beta S_U (I_U + I_V)/N} E_U \xrightarrow{\sigma_U E_U} I_U \xrightarrow{\gamma_U I_U} R_U,
SVβ(1θ)SV(IU+IV)/NEVσVEVIVγVIVRV,S_V \xrightarrow{\beta (1-\theta) S_V (I_U + I_V)/N} E_V \xrightarrow{\sigma_V E_V} I_V \xrightarrow{\gamma_V I_V} R_V,
SUνSUSV,S_U \xrightarrow{\nu S_U} S_V,
RUνRURV,R_U \xrightarrow{\nu R_U} R_V,

where uu is the vaccination rate (we assume that individuals do not receive the vaccine while they are exposed or infectious) and θ\theta is the vaccine efficacy against infection. Similar structures could be used for other sources of prior immunity or other dynamic risk groups.

Pathogen strain

Another common stratification would be pathogen strain, such as COVID-19 variants. Individuals may be infected with one of several variants, strains, or serotypes. Our framework can easily create multistrain models, for example

SAβASAIA/NAEAσAEAIAγAIARA,S_A \xrightarrow{\beta_A S_A I_A/N_A} E_A \xrightarrow{\sigma_A E_A} I_ A \xrightarrow{\gamma_A I_A} R_A,
SAβBSBIB/NBEBσBEBIBγBIBRB,S_A \xrightarrow{\beta_B S_B I_B/N_B} E_B \xrightarrow{\sigma_B E_B} I_B \xrightarrow{\gamma_B I_B} R_B,
RAβB(1ϕAB)RAIB/NBEABσABEABIABγABIABRAB,R_{A} \xrightarrow{\beta_B(1-\phi_{AB}) R_A I_B/N_B} E_{AB} \xrightarrow{\sigma_{AB} E_{AB}} I_{AB} \xrightarrow{\gamma_{AB} I_{AB}} R_{AB},
RBβA(1ϕBA)RBIA/NBEABσABEABIABγABIABRAB,R_{B} \xrightarrow{\beta_A (1-\phi_{BA}) R_B I_A/N_B} E_{AB} \xrightarrow{\sigma_{AB} E_{AB}} I_{AB} \xrightarrow{\gamma_{AB} I_{AB}} R_{AB},

where ϕAB\phi_{AB} is the immune cross-protection conferred from infection with strain A to subsequent infection with strain B. Co-infection is ignored. All individuals are assumed to be initially equally susceptible to both infections and are just categorized as SAS_A (vs SBS_B) for convenience.

All combinations of these situations can be quickly specified in flepiMoP. Details on how to encode these models is provided in the Model Implementation section, with examples given in the Tutorials section.

Clinical outcomes and observations model

The pipeline allows for an additional type of dynamic state variable beyond those included in the mathematical model. We refer to these extra variables as "Outcomes" or "Observations". Outcome variables can be functions of model variables, but do not feed back into the model by influencing other state variables. Typically, we use outcome variables to describe the process through which some subset of individuals in a compartment are "observed'' and become part of the data to which models are compared and attempt to predict. For example, in the context of a model for an infectious disease like COVID-19, outcome variables include reported cases, hospitalizations, and deaths.

An outcome variable H(t)H(t) can be generated from a state variable of the mathematical model X(t)X(t) using the following properties:

  • The proportion of all individuals in XX who will be observed as HH, pp

  • The delay between when an individual enters state XX and when they are observed as HH, which can follow a class of probability distributions f(Δt;θ)f(\Delta t;\theta) where θ\theta is the parameters of the distribution (e.g., the mean and standard deviation of a normal distribution)

  • (optional) the duration spent in observable HH, in which case the output will also contain the prevalence (number of individuals currently in HH in addition to the incidence into HH

In addition to single values (drawn from a distribution), the duration and delay can be inputted as distributions, producing a convolution of the output.

The number of individuals in XX at time t1t_1 who become part of the outcome variable H(t2)H(t_2) is a random variable, and individuals who are observed in HH at time tt could have entered XX at different times in the past.

Formally, for a deterministic, continuous-time model

H(t)=τpX(τ)f(tτ,θ)dτH(t) = \int_{\tau} p X(\tau) f(t-\tau, \theta) d\tau

For a discrete-time, stochastic model

H(t)=τi=0tMultinomial(Binomial(X(τi),p),{f(tτi,θ)}).H(t) = \sum_{\tau_i=0}^{t}\text{Multinomial} (\text{Binomial}(X(\tau_i), p), \{f(t-\tau_i, \theta)\}).

Note that outcomes H(t)H(t) constructed in this way always represent incidence values; meaning they describe the number of individuals newly entering this state at time tt. If the model state X(t)X(t) is also an incidence, then pp is a unitless probability, whereas if X(t)X(t) is a prevalence (number of individuals currently in state at time tt), then pp is instead a probability per time unit.

Outcomes can also be constructed as functions of other outcomes. For example, a fraction of hospitalized patients may end up in the intensive care unit (ICU).

There are several benefits to separating outcome variables from the mathematical model. Firstly, these variables can be calculated after the model is run, and only at the timepoints of interest, which can dramatically reduce the memory needed during model simulation. Secondly, outcome variables can be fully stochastic even when the mathematical model is simulated deterministically. This becomes useful when an infection might be at high enough prevalence that a deterministic simulation is appropriate, but when there is a rare and therefore quite stochastic outcome reported in the data (e.g., severe cases) that the model is tasked with predicting. Thirdly, outcome variables can have arbitrary delay distributions, to take into account the complexities of health reporting practices, whereas our mathematical modeling framework is designed mainly for exponentially distributed delays and only easily permits extensions to gamma-distributed delays. Finally, this separation keeps the pipeline modular and allow for easy editing of one component of the model without disrupting the other.

Details on how to specify these outcomes in the model configuration files is provided in the Model Implementation section, with examples given in the Tutorials section.

Population structure

The pipeline was designed specifically to simulate infection dynamics in a set of connected subpopulations. These subpopulations could represent geographic divisions, like countries, states, provinces, or neighborhoods, or demographic groups, or potentially even different host species. The equations and parameters of the transmission and outcomes models are repeated for each subpopulation, but the values of the parameters can differ by location. Within each subpopulation, infection is equally likely to spread between any pair of susceptible/infected individuals after accounting for their infection class, whereas between subpopulations there may be varying levels of mixing.

Formally, this type of population structure is often referred to as a “metapopulation”, and each subpopulation may be called a “deme”.

The following properties may be different between subpopulations:

  • the population size

  • the parameters of the transmission model (see LINK)

  • the parameters of the outcomes model (see LINK)

  • the amount of transmission that occurs within this subpopulation versus from any other subpopulation (see LINK)

  • the timing and extent of any interventions that modify these parameters (see LINK)

  • the initial timing and number of external introductions of infections into the population (see LINK)

  • the ground truth timeseries data used to compare to model output and infer model parameters (see LINK)

Currently, the following properties must be the same across all subpopulations:

  • the compartmental model structure

  • the form of the likelihood function used to estimate parameters by fitting the model to data (LINK)

  • MORE

Mixing between subpopulations

The generalized compartmental model allows for second order “interaction” terms that describe transitions between model states that depend on interactions between pairs of individuals. For example, in the context of a classical SIR model, the rate of new infections depends on interactions between susceptible and infectious individuals and the transmission rate

dIdt=βSIγI\frac{dI}{dt} = \beta S I - \gamma I

For a model with multiple subpopulations, each of these interactions can occur either between individuals in the same or different subpopulations, with specific rate parameters for each combination of individual locations

dIidt=jβjiIjSiγIi\frac{dI_i}{dt} = \sum_j \beta_{ji} I_j S_i - \gamma I_i

where βji\beta_{ji} is the per-contact per-time rate of disease transmission between an infected individual residing in subpopulation jj and a susceptible individual from subpopulation ii.

In general for infection models in connected subpopulations, the transmission rates βji\beta_{ji} can take on arbitrary values. In this pipeline, however, we impose an additional structure on these terms. We assume that interactions between subpopulations occur when individuals temporarily relocate to another subpopulation, where they interact with locals. We call this movement “mobility”, and it could be due to regular commuting, special travel, etc. There is a transmission rate (βj\beta_j) associated with each subpopulation jj, and individuals physically in that subpopulation – permanently or temporarily – are exposed and infected with this local rate whenever they encounter local susceptible individuals.

The transmission matrix is then

βji={paMijNiβjif ji(1jipaMijNi)if j=i\beta_{ji} = \begin{cases} p_a \frac{M_{ij}}{N_i} \beta_j &\text{if } j \neq i \\ \left( 1- \sum_{j \neq i} p_a \frac{M_{ij}}{N_i} \right) &\text{if } j = i \end{cases}

where βj\beta_j is the onward transmission rate from infected individuals in subpopulation jj, MijM_{ij} is the number of individuals in subpopulation i who are interacting with individuals in subpopulation jj at any given time (for example, fraction who commute each day), and pap_a is a fractional scaling factor for the strength of inter-population contacts (for example, representing the fraction of hours in a day commuting individuals spend outside vs. inside their subpopulation).

The list of all pairwise mobility values and the interaction scaling factor are model input parameters. Details on how to specify them are given in the Model Implementation section.

If an alternative compartmental disease model is created that has other interactions (second order terms), then the same mobility values are used to determine the degree of interaction between each pair of subpopulations.

Initial conditions

(Different types of initial conditions aka “seeding”). Still needs to be filled in.

Time-dependent interventions

Parameters in the disease transmission model or the observation model may change over time. These changes could be, for example: environmental drivers of disease seasonality; “non-pharmaceutical interventions” like social distancing, isolation policies, or wearing of personal protective equipment; “pharmaceutical interventions” like vaccination, prophylaxis, or therapeutics; changes in healthcare seeking behavior like testing and diagnosis; changes in case reporting, etc.

The model allows for any parameter of the disease transmission model or the observation model to change to a new value for a time interval specified by start and end times (or multiple start and end times, for interventions that are recurring). Each change may be subpopulation-specific or apply to the entire population. Changes may be overlapping in time.

The magnitude of these changes are themselves model parameters, and thus may inferred along with other parameters when the model is fit to data. Currently, the start and end times of interventions must be fixed and cannot be varied or inferred.

For example, the rate of transmission in subpopulation ii, βi\beta_i, may be reduced by an intervention rkr_kthat acts between times tk,startt_{k,\text{start}} and tk,endt_{k,\text{end}}, and another intervention rlr_l that acts between times tl,startt_{l,\text{start}}and tl,endt_{l,\text{end}}

βj(t)=(1rk(t;tk,start,tk,end))(1rl(t;tl,start,tl,end)))βj0\beta_j'(t) = (1-r_k(t;t_{k,\text{start}},t_{k,\text{end}}))(1-r_l(t;t_{l,\text{start}},t_{l,\text{end}})))\beta_j^0

In this case, rk(t)r_k(t) and rl(t)r_l(t) are both considered simple SinglePeriodModifier interventions. There are four possible types of interventions that can be included in the model

  • SinglePeriodModifier - an intervention rjr_j that leads to a fractional reduction in a parameter value in subpopulation jj (i.e., βj\beta_j) between two timepoints

βj(t)=(1rj(t;tj,start,tj,end))βj0\beta_j'(t) = (1-r_j(t;t_{j,\text{start}},t_{j,\text{end}}))\beta_j^0
rj(t;tj,start,tj,end)={rjif tj,start<t<tj,end0otherwiser_j(t;t_{j,\text{start}},t_{j,\text{end}}) = \begin{cases} r_j &\text{if } t_{j,\text{start}} < t <t_{j,\text{end}} \\ 0 &\text{otherwise} \end{cases}
  • MultiPeriodModifier - an intervention rjr_j that leads to a fractional reduction in a parameter value in subpopulation jj (i.e., βj\beta_j) value between multiple sets of timepoints

βj(t)=(1rj(t;{tj,k,start,tj,k,end}k))βj0\beta_j'(t) = (1-r_j(t; \{t_{j,k,\text{start}},t_{j,k,\text{end}}\}_k))\beta_j^0
rj(t;{tj,k,start,tj,k,end}k)={rjif tj,k1,start<t<tj,k1,endrjif tj,k2,start<t<tj,k2,end...rjif tj,kn,start<t<tj,kn,end0otherwiser_j(t;\{t_{j,k,\text{start}},t_{j,k,\text{end}}\}_k) = \begin{dcases} r_ j&\text{if } t_{j,k1,\text{start}} < t <t_{j,k1,\text{end}} \\ r_j &\text{if } t_{j,k2,\text{start}} < t <t_{j,k2,\text{end}} \\ & ... \\ r_j &\text{if } t_{j,kn,\text{start}} < t <t_{j,kn,\text{end}} \\ 0 &\text{otherwise} \end{dcases}
  • ModifierModifier- an intervention πj\pi_j that leads to a fractional reduction in the value of another intervention rjr_j between two timepoints

βj(t)=(1rj(t;tj,start,tj,end)(1πr,j(t;tr,j,start,tr,j,end)))βj0\beta_j'(t) = (1-r_j(t;t_{j,\text{start}},t_{j,\text{end}})(1-\pi_{r,j}(t;t_{r,j,\text{start}},t_{r,j,\text{end}})))\beta_j^0
πr,j(t;tr,j,start,tr,j,end)={πr,jif tr,j,start<t<tr,j,end0otherwise\pi_{r,j}(t;t_{r,j,\text{start}},t_{r,j,\text{end}}) = \begin{cases} \pi_{r,j} &\text{if } t_{r,j,\text{start}} < t <t_{r,j,\text{end}} \\ 0 &\text{otherwise} \end{cases}
  • StackedModifier - TBA

Last updated