# Specifying compartmental model

This section describes how to specify the compartmental model of infectious disease transmission.

We want to allow users to work with a wide variety of infectious diseases or, one infectious disease under a wide variety of modeling assumptions. To facilitate this, we allow the user to specify their compartmental model of disease dynamics via the configuration file.

We originally considered asking users to specify each compartment and transition manually. However, we quickly found that this created long, confusing configuration files, and so we created a shorthand to more succinctly specify both compartments and transitions between them. This works especially well for models where individuals are stratified by other properties (like age, vaccination status, etc.) in addition to their infection status.

The model is specified in two separate sections of the configuration file. In the `compartments`

section, users define the possible states individuals can be categorized into. Then in the `seir`

section, users define the possible transitions between states, the values of parameters that govern the rates of these transitions, and the numerical method used to simulate the model.

An example section of a configuration file defining a simple SIR model is below.

## Specifying model compartments (`compartments`

)

`compartments`

)The first stage of specifying the model is to define the infection states (variables) that the model will track. These "compartments" are defined first in the `compartments`

section of the config file, before describing the processes that lead to transitions between them. The compartments are defined separately from the rest of the model because they are also used by the `seeding`

section that defines initial conditions and importations.

For simple disease models, the compartments can simply be listed with whatever notation the user chooses. For example, for a simple SIR model, the compartments could be `["S", "I", "R"]`

. The config also requires that there be a variable name for the property of the individual that these compartments describe, which for example in this case could be `infection_stage`

Our syntax allows for more complex models to be specified without much additional notation. For example, consider a model of a disease that followed SIR dynamics but for which individuals could receive vaccination, which might change how they experience infection.

In this case we can specify compartments as the cross product of multiple states of interest. For example:

Corresponds to 6 compartments, which the code internally converts to this data frame

In order to more easily describe transitions, we want to be able to refer to a compartment by its components, but then use it by its compartment name.

If the user wants to specify a model in which some compartments are repeated across states but others are not, there will be pros and cons of how the model is specified. Specifying it using the cross product notation is simpler, less error prone, and makes config files easier to read, and there is no issue with having compartments that have zero individuals in them throughout the model. However, for very large models, extra compartments increase the memory required to conduct the simulation, and so having unnecessary compartments tracked may not be desired.

For example, consider a model of a disease that follows SI dynamics in two separate age groups (children and adults), but for which only adults receive vaccination, with one or two doses of vaccine. With the simplified notation, this model could be specified as:

corresponding to 12 compartments, 4 of which are unnecessary to the model

Or, it could be specified with the less concise notation

which does not result in any unnecessary compartments being included.

These compartments are referenced in multiple different subsequent sections of the config. In the `seeding (LINK TBA)`

section the user can specify how the initial (or later imported) infections are distributed across compartments; in the `seir`

section the user can specify the form and rate of the transitions between these compartments encoded by the model; in the `outcomes`

section the user can specify how the observed variables are generated from the underlying model states.

Notation must be consistent between these sections.

## Specifying compartmental model transitions (`seir::transitions`

)

`seir::transitions`

)The way we specify transitions between compartments in the model is a bit more complicated than how the compartments themselves are specified, but allows users to specify complex stratified infectious disease models with minimal code. This makes checking, sharing, and updating models more efficient and less error-prone.

We specify one or more *transition globs*, each of which corresponds to one or more transitions. Since transition globs are shorthand for collections of transitions, we will first explain how to **specify a single transition** before discussing transition globs.

A transition has 5 pieces of associated information that a user can specify:

`source`

`destination`

`rate`

`proportional_to`

`proportion_exponent`

For more details on the mathematical forms possible for transitions in our models, read the Model Description section.

We first consider a simple example of an SI model where individuals may either be vaccinated (*v*) or unvaccinated (*u*), but the vaccine does not change the susceptibility to infection nor the infectiousness of infected individuals.

We will focus on describing the first transition of this model, the rate at which unvaccinated individuals move from the susceptible to infected state.

### Specifying a single transition

#### Source

The compartment the transition moves individuals *out of* (e.g., the *source* compartment) is an array. For example, to describe a transition that moves unvaccinated susceptible individuals to another state, we would write

which corresponds to the compartment `S_unvaccinated`

#### Destination

The compartment the transition moves individuals *into* (e.g. the *destination* compartment) is an array. For example, to describe a transition that moves individuals into the unvaccinated but infected state, we would write

which corresponds to the compartment `I_unvaccinated`

#### Rate

The rate constant specifies the probability per time that an individual in the source compartment changes state and moves to the destination compartment. For example, to describe a transition that occurs with rate 5/time, we would write:

instead, we could describe the rate using a parameter `beta`

, which can be given a numeric value later:

The interpretation and unit of the rate constant depend on the model details, as the rate may potentially also be per number (or proportion) of individuals in other compartments (see below).

#### Proportional to

A vector of groups of compartments (each of which is an array) that modify the overall rate of transition between the source and destination compartment. Each separate group of compartments in the vector are first summed, and then all entries of the vector are multiplied to get the rate modifier. For example, to specify that the transition rate depends on the product of the number of unvaccinated susceptible individuals and the total infected individuals (vaccinated and unvaccinated), we would write:

To understand this term, consider the compartments written out as strings

and then sum the terms in each group

From here, we can say that the transition we are describing is proportional to `S_unvaccinated`

and `I_unvaccinated + I_vaccinated,`

i.e., the rate depends on the product `S_unvaccinated * (I_unvaccinated + I_vaccinated)`

.

For transitions that occur at a constant per-capita rate (ie, E -> I at rate $\gamma$ in an SEIR model), it is possible to simply write `proportional_to: ["source"]`

.

#### Proportion exponent

This is an exponent modifying each group of compartments that contribute to the rate. It is equivalent to the "order" term in chemical kinetics. For example, if the reaction rate for the model above depends linearly on the number of unvaccinated susceptible individuals but on the total infected individuals sub-linearly, for example to a power 0.9, we would write:

or a power parameter `alpha`

, which can be given a numeric value later:

The (top level) length of the `proportion_exponent`

vector must be the same as the (top level) length of the `proportional_to`

vector, even if the desire of the user is to have the same exponent for all terms being multiplied together to get the rate.

#### Summary

Putting it all together, the model transition is specified as

would correspond to the following model if expressed as an ordinary differential equation

with parameter and parameter (we will describe how to use parameter symbols in the transitions and specify their numeric values separately in the section Specifying compartmental model parameters).

### Transition globs

We now explain a shorthand we have developed for specifying multiple transitions that have similar forms all at once, via *transition globs*. The basic idea is that for each component of the single transitions described above where a term corresponded to a single model compartment, we can instead specify *one or more* compartment. Similarly, multiple rate values can be specified at once, for each involved compartment. From one transition glob, multiple individual transitions are created, by *broadcasting* across the specified compartments.

For transition globs, any time you could specify multiple arguments as a list, you may instead specify one argument as a non-list, which will be used for every broadcast. So [1,1,1] is equivalent to 1 if the dimension of that broadcast is 3.

We continue with the same SI model example, where individuals are stratified by vaccination status, but expand it to allow infection to occur at different rates in vaccinated and unvaccinated individuals:

#### Source

We allow one or more arguments to be specified for each compartment. So to specify the transitions out of both susceptible compartments (`S_unvaccinated`

and `S_unvaccinated`

), we would use

#### Destination

The destination variable should be the same shape as the `source`

, and in the same relative order. So to specify a transition from `S_unvaccinated`

to `I_unvaccinated`

and `S_vaccinated`

to `I_vaccinated`

, we would write the `destination`

as:

If instead we wrote:

we would have a transition from `S_unvaccinated`

to `I_vaccinated`

and `S_vaccinated`

to `I_unvaccinated`

.

#### Rate

The rate vector allows users to specify the rate constant for all the source -> destination transitions that are defined in a shorthand way, by instead specifying how the rate is altered depending on the compartment type. For example, the rate of transmission between a susceptible (S) and an infected (I) individual may vary depending on whether the susceptible individual is vaccinated or not **and** whether the infected individual is vaccinated or not. The overall rate constant is constructed by multiplying together or "broadcasting" all the compartment type-specific terms that are relevant to a given compartment.

For example,

This would mean our transition from `S_unvaccinated`

to `I_unvaccinated`

would have a rate of `3 * 0.6`

while our transition from `S_vaccinated`

to I`_vaccinated`

would have a rate of `3 * 0.5`

.

The rate vector should be the same shape as `source`

and `destination`

and in the same relative order.

Note that if the desire is to make a model where the difference in the rate constants varies in a more complicated than multiplicative way between different compartment types, it would be better to specify separate transitions for each compartment type instead of using this shorthand.

#### Proportional to

The broadcasting here is a bit more complicated. In other cases, each broadcast is over a single component. However, in this case, we have a broadcast over a group of components. We allow a different group to be chosen for each broadcast.

Again, let's unpack what it says. Since the broadcast is over groups, let's split the config back up

into those groups

From here, we can say that we are describing two transitions. Both occur proportionally to the same compartments: `S_unvaccinated`

and the total number of infections (`I_unvaccinated+I_vaccinated`

).

If, for example, we want to model a situation where vaccinated susceptibles cannot be infected by unvaccinated individuals, we would instead write:

#### Proportion exponent

Similarly to `rate`

and `proportional_to`

, we provide an exponent for each component and every group across the broadcast. So we could for example use:

The (top level) length of the `proportion_exponent`

vector must be the same as the (top level) length of the `proportional_to`

vector, even if the desire of the user is to have the same exponent for all terms being multiplied together to get the rate. Within each vector entry, the arrays must have the same length as the `source`

and `destination`

vectors.

#### Summary

Putting it all together, the transition glob

is equivalent to the following transitions

#### Warning

We warn the user that with this shorthand, it is possible to specify large models with few lines of code in the configuration file. The more compartments and transitions you specify, the longer the model will take to run, and the more memory it will require.

## Specifying compartmental model parameters (`seir::parameters`

)

`seir::parameters`

)When the transitions of the compartmental model are specified as described above, they can either be entered as numeric values (e.g., `0.1`

) or as strings which can be assigned numeric values later (e.g., `beta`

). We recommend the latter method for all but the simplest models, since parameters may recur in multiple transitions and so that parameter values may be edited without risk of editing the model structure itself. It also improves readability of the configuration files.

Parameters can take on three types of values:

Fixed values

Value drawn from distributions

Values read from timeseries specified in a data file

### Specifying fixed parameter values

Parameters can be assigned values by using the `value`

argument after their name and then simply stating their numeric argument. For example, in a config describing a simple SIR model with transmission rate $\beta$ (`beta`

) = 0.1/day and recovery rate $\gamma$ (`gamma`

) = 0.2/day. This could be specified as

The full model section of the config could then read

For the stratified SI model described above, this portion of the config would read

If there are no parameter values that need to be specified (all rates given numeric values when defining model transitions), the `seir::parameters`

section of the config can be left blank or omitted.

### Specifying parameters values from distributions

Parameter values can also be specified as random values drawn from a distribution, as a way of including uncertainty in parameters in the model output. In this case, every time the model is run independently, a new random value of the parameter is drawn. For example, to choose the same value of `beta`

= 0.1 each time the model is run but to choose a random values of `gamma`

with mean on a log scale of $e^{-1.6} = 0.2$ and standard deviation on a log scale of $e^{0.2} = 1.2$ (e.g., 1.2-fold variation):

Details on the possible distributions that are currently available, and how to specify their parameters, is provided in the Distributions section.

Note that understanding when a new parameter values from this distribution is drawn becomes more complicated when the model is run in Inference mode. In Inference mode, we distinguish model runs as occurring in different "slots" – i.e., completely independent model instances that could be run on different processing cores in a parallel computing environment – and different "iterations" of the model that occur sequentially when the model is being fit to data and update fitted parameters each time based on the fit quality found in the previous iteration. A new parameter values is only drawn from the above distribution **once per slot**. Within a slot, at each iteration during an inference run, the parameter is only changed if it is being fit and the inference algorithm decides to perturb it to test a possible improved fit. Otherwise, it would maintain the same value no matter how many times the model was run within a slot.

### Specifying parameter values as timeseries from data files

Sometimes, we want to be able to specify model parameters that have different values at different timepoints. For example, the relative transmissibility may vary throughout the year based on the weather conditions, or the rate at which individuals are vaccinated may vary as vaccine programs are rolled out. One way to do this is to instead specify the parameter values as a timeseries.

This can be done by providing a data file in .csv or .parquet format that has a list of values of the parameter for a corresponding timepoint and subpopulation name. One column should be `date`

, which should have an entry for every calendar day of the simulation, with the first and last date corresponding to the `start_date`

and `end_date`

for the simulation specified in the header of the config. There should be another column for each subpopulation, where the column name is the subpop name used in other files and the values are the desired parameter values for that subpopulation for the corresponding day. If any day or subpopulation is missing, an error will occur. However, if you want all subpopulations to have the same parameter value for every day, then only a single column in addition to date is needed, which can have any name, and will be applied to every subpop.

For example, for an SIR model with a simple two-province population structure where the relative transmissibility peaks on January 1 then decreases linearly to a minimal value on June 1 then increases linearly again, but varies more in the small province than the large province, the `theta`

parameter could be constructed from the file **seasonal_transmission_2pop.csv** with contents including

as a part of a configuration file with the model sections:

Note that there is an alternative way to specify time dependence in parameter values that is described in the Specifying time-varying parameter modifications section. That method allows the user to define intervention parameters that apply specific additive or multiplicative shifts to other parameter values for a defined time interval. Interventions are useful if the parameter doesn't vary frequently and if the values of the shift is unknown and it is desired to either sample over uncertainty in it or try to estimate its value by fitting the model to data. If the parameter varies frequently and its value or relative value over time is known, specifying it as a timeseries is more efficient.

Compartmental model parameters can have an additional attribute beyond `value`

or `timeseries`

, which is called `stacked_modifier_method`

. This value is explained in the section on coding `time-dependent parameter modifications`

(also known as "modifiers") as it determines what happens when two different modifiers act on the same parameter at the same time (are they combined additively or multiplicatively?).

Config item | Required? | Type/Format | Description |
---|---|---|---|

| either value or timeseries is required | numerical, or distribution | This defines the value of the parameter, as described above. |

| either value or timeseries is required | path to a csv file | This defines a timeseries for each day, as above. |

| optional | string: | This option defines the method used when modifiers are applied. The default is |

| optional | integer | The size of the rolling mean window if a rolling mean is applied. |

## Specifying model simulation method `(seir::integration)`

`(seir::integration)`

A compartmental model defined using the notation in the previous sections describes rules for classifying individuals in the population based on infection state dynamically, but does not uniquely specify the mathematical framework that should be used to simulate the model.

Our framework allows for two major methods for implementing compartmental models of disease transmission:

ordinary differential equations, which are completely deterministic, operate in continuous time (consider infinitesimally small timesteps), and allow for arbitrary fractions of the population (i.e., not just discrete individuals) to move between model compartments

discrete-time stochastic process, which tracks discrete individuals and produces random variation in the number of individuals transitioning between states for any given rate, and which allows transitions between states only to occur at discrete time intervals

The mathematics behind each implementation is described in the Model Description section

Config item | Required? | Type/Format | Description |
---|---|---|---|

| optional | string: | The algorithm used to simulate the mode equations. If |

| optional | Any positive real number | The timestep used for the numerical integration or discrete time stochastic update. Default is |

For example, to simulate a model deterministically using the 4th order Runge-Kutta algorithm for numerical integration with a timestep of 1 day:

Alternatively, to simulate a model stochastically with a timestep of 0.1 days

For any method, the results of the model will be more accurate when the timestep is smaller (i.e., output will more precisely match the mathematics of the model description and be invariant to the choice of timestep). However, the computing time required to simulate the model for a certain time range of interest increases with the number of timesteps required (i.e., with smaller timesteps). In our experience, the 4th order Runge-Kutta algorithm (for details see Advanced section) is a very accurate method of numerically integrating such models and can handle timesteps as large as roughly a day for models with the maximum per capita transition rates in this same order of magnitude. However, the discrete time stochastic model or the legacy method for integrating the model in deterministic mode require smaller timesteps to be accurate (around 0.1 for COVID-19-like dynamics in our experience.

Last updated