This section describes how to specify the values of each model state at the time the simulation starts, and how to make instantaneous changes to state values at other times (e.g., due to importations).
flepiMoP allows users to specify instantaneous changes in values of model variables, at any time during the simulation. We call this "seeding". For example, some individuals in the population may travel or otherwise acquire infection from outside the population throughout the epidemic, and this importation of infection could be specified with the seeding option. As another example, new genetic variants of the pathogen may arise due to mutation and selection that occurs within infected individuals, and this generation of new strains can also be modeled with seeding. Seeding allows individuals to change state at specified times in ways that do not depend on the model equations. In the first example, the individuals would be "seeded" into the infected compartment from the susceptible compartment, and in the second example, individuals would be seeded into the "infected with new variant" compartment from the "infected with wild type" compartment.
The seeding option can also be used as a convenient alternative way to specify initial conditions. By default, flepiMoP initiates models by putting the entire population (specified in the geodata file) in the first model compartment. If the desired initial condition is only slightly different from the default state, it may be more convenient to specify it with a few "seedings" that occur on the first day of the simulation. For example, for a simple SIR model where the desired initial condition is just a small number of infected individuals, this could be specified by a single seeding into the infected compartment from the susceptible compartment at time zero, instead of specifying the initial values of three separate compartments. For larger models, the difference becomes more relevant.
The configuration items in the seeding section of the config file are:
seeding::method
Must be either "NoSeeding", "FromFile", "PoissonDistributed", "NegativeBinomialDistributed", or "FolderDraw".
seeding::seeding_file
Only required for method "FromFile". Path to a .csv file containing the list of seeding events.
seeding::lambda_file
Only required for methods "PoissonDistributed" or "NegativeBinomialDistributed". Path to a .csv file containing the list of the events from which the actual seeding will be randomly drawn.
seeding::seeding_file_type
Only required for method "FolderDraw". Either seir or seed.
Details on implementing each seeding method and the options that go along with it are below.
If there is no seeding, then the number of individuals in each compartment will be initiated using the values specified in the initial_conditions section and will only be changed at later times based on the equations defined in the seir section. No other arguments are needed in the seeding section in this case.
Example
This seeding method reads in a user-defined file with a list of seeding events (instantaneous transitions of individuals between compartments) including the time of the event and subpopulation where it occurs, and the source and destination compartment of the individuals. For example, for the simple two-subpopulation SIR model where the outbreak starts with 5 individuals in the small province being infected from a source outside the population, the seeding section of the config could be specified as
Where seeding.csv contains
seeding::seeding_file must contain the following columns:
subpop – the name of the subpopulation in which the seeding event takes place. Seeding cannot move individuals between different subpopulations.
date – the date the seeding event occurs, in YYYY-MM-DD format
amount – an integer value for the number of individuals who transition between states in the seeding event
source_* and destination_* – For each compartment group (i.e., infection stage, vaccination stage, age group), a different column describes the status of individuals before and after the transition described by the seeding event. For example, for a model where individuals are stratified by age and vaccination status, and a 1-day vaccination campaign for young children and the elderly moves a large number of individuals into a vaccinated state, this file could be something like
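As a rough illustration of the column layout described above, the following Python sketch parses a small seeding table and applies its events to a dictionary of compartment counts. It is not flepiMoP's implementation: the group names (infection_stage, vaccination_stage, age_group), the category names, the dates, the amounts, and the helper functions are all hypothetical and chosen only to mirror the vaccination campaign example.

```python
import csv
import io

# Hypothetical compartment groups, in the order they would be defined in the config.
GROUPS = ["infection_stage", "vaccination_stage", "age_group"]

# Hypothetical seeding table: a one-day campaign vaccinating children and the elderly.
SEEDING_CSV = """\
subpop,date,amount,source_infection_stage,source_vaccination_stage,source_age_group,destination_infection_stage,destination_vaccination_stage,destination_age_group
large_province,2021-02-01,500,S,unvaccinated,child,S,vaccinated,child
large_province,2021-02-01,800,S,unvaccinated,elderly,S,vaccinated,elderly
"""


def compartment_name(row, prefix):
    """Concatenate the per-group columns (e.g. source_*) into one compartment name."""
    return "_".join(row[f"{prefix}_{group}"] for group in GROUPS)


def apply_seeding(state, seeding_text, day):
    """Return a copy of `state` after applying all seeding events dated `day`.

    `state` maps (subpop, compartment name) to a number of individuals.
    """
    new_state = dict(state)
    for row in csv.DictReader(io.StringIO(seeding_text)):
        if row["date"] != day:
            continue
        amount = int(row["amount"])
        source = (row["subpop"], compartment_name(row, "source"))
        destination = (row["subpop"], compartment_name(row, "destination"))
        # Seeding stays within one subpopulation: only the compartment changes.
        new_state[source] = new_state.get(source, 0) - amount
        new_state[destination] = new_state.get(destination, 0) + amount
    return new_state


state = {
    ("large_province", "S_unvaccinated_child"): 2000,
    ("large_province", "S_unvaccinated_elderly"): 1500,
}
seeded = apply_seeding(state, SEEDING_CSV, "2021-02-01")
print(seeded)
```

Each event only moves individuals between compartments of the same subpopulation, so the total population is unchanged.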
These methods are very similar to FromFile, except that the seeding value used in the simulation is randomly drawn from a distribution whose mean equals the value specified in the file. These methods can be useful when the true seeded value is unknown and only an observed value, assumed to be measured with some uncertainty, is available. The input requirements are the same for both distributions, and the lambda_file has the same format requirements as the seeding_file for the FromFile method described above.
For method::PoissonDistributed, the seeding value for each seeding event is drawn from a Poisson distribution with mean and variance equal to the value in the amount column. For method::NegativeBinomialDistributed, seeding is drawn from a negative binomial distribution with mean amount and variance amount+5 (so it is nearly identical to "PoissonDistributed" for large values of amount but has relatively more variance for small values).
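To make the mean and variance relationship concrete, the sketch below converts a seeding amount into the (r, p) parameters of a negative binomial distribution with mean amount and variance amount+5. The parameterization used here (mean = r(1-p)/p, variance = r(1-p)/p^2) and the function names are assumptions made for illustration; the source does not say how flepiMoP parameterizes the distribution internally.

```python
def negative_binomial_params(amount, extra_variance=5.0):
    """Return (r, p) giving mean `amount` and variance `amount + extra_variance`.

    Uses the convention mean = r * (1 - p) / p and variance = r * (1 - p) / p**2,
    where r may be non-integer.
    """
    if amount <= 0:
        raise ValueError("amount must be positive")
    variance = amount + extra_variance
    p = amount / variance          # follows from mean / variance = p
    r = amount * p / (1 - p)       # equivalently amount**2 / extra_variance
    return r, p


def moments(r, p):
    """Mean and variance implied by (r, p) under the convention above."""
    mean = r * (1 - p) / p
    variance = r * (1 - p) / p ** 2
    return mean, variance


for amount in (1, 10, 1000):
    r, p = negative_binomial_params(amount)
    mean, variance = moments(r, p)
    # The variance-to-mean ratio is (amount + 5) / amount, which approaches
    # the Poisson value of 1 as amount grows.
    print(amount, round(variance / mean, 3))
```

The ratio printed for each amount shows why the two methods behave alike for large seeding values and differ mostly for small ones.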
This section describes how to specify the values of each model state at the time the simulation starts, and how to make instantaneous changes to state values at other times (e.g., due to importations)
In order for the models specified previously to be dynamically simulated, the user must provide initial conditions, in addition to the model structure and parameter values. Initial conditions describe the value of each variable in the model at the time point that the simulation is to start. For example, on day zero of an outbreak, we may assume that the entire population is susceptible except for one single infected individual. Alternatively, we could assume that some portion of the population already has prior immunity due to vaccination or previous infection. Different initial conditions lead to different model trajectories.
The initial_conditions section of the configuration file is detailed below. Note that in some cases the seeding section can replace or complement the initial conditions; the table below provides a quick comparison of these sections.
| Feature | initial_conditions | seeding |
| --- | --- | --- |
| Config section optional or required? | Optional | Optional |
| Function of section | Specify number of individuals in each compartment at time zero | Allow for instantaneous changes in individuals' states |
| Default | Entire population in first compartment, zero in all other compartments | No seeding events |
| Requires input file? | Yes, .csv | Yes, .csv |
| Input description | Input is a list of compartment names, location names, and amounts of individuals in that compartment location. All compartments must be listed unless a setting to default missing compartments to zero is turned on. | Input is a list of seeding events defined by source compartment, destination compartment, number of individuals transitioning, and date of movement. Compartments without seeding events don't need to be listed. |
| Specifies an incidence or prevalence? | Amounts specified are prevalence values | Amounts specified are instantaneous incidence values |
| Useful for? | Specifying initial conditions, especially if simulation does not start with a single infection introduced into a naive population. | Modeling importations, evolution of new strains, and specifying initial conditions |
The configuration items in the initial_conditions section of the config file are:
initial_conditions::method
Must be either "Default", "SetInitialConditions", or "FromFile".
initial_conditions::initial_conditions_file
Required for methods "SetInitialConditions" and "FromFile". Path to a .csv or .parquet file containing the list of initial conditions for each compartment.
initial_conditions::initial_file_type
Only required for method "FolderDraw". Description TBA
initial_conditions::allow_missing_subpops
Optional for all methods. Determines what will happen if initial_conditions_file is missing values for some subpopulations. If FALSE (the default behavior) or unspecified, an error will occur if subpopulations are missing. If TRUE, then for subpopulations missing from the initial_conditions file, it will be assumed that all individuals begin in the first compartment (the "first" compartment depends on how the model was specified, and will be the compartment that contains the first named category in each compartment group), unless another compartment is designated to hold the rest of the individuals.
initial_conditions::allow_missing_compartments
Optional for all methods. If FALSE (the default behavior) or unspecified, an error will occur if any compartments are missing for any subpopulation. If TRUE, then it will be assumed there are zero individuals in compartments missing from the initial_conditions file.
initial_conditions::proportional
If TRUE, assume that the user has specified all input initial conditions as fractions of the population, instead of numbers of individuals (the default behavior, or if set to FALSE). The code will check that the initial values in all compartments sum to 1.0 and throw an error if not, and will then multiply all values by the total population size for that subpopulation.
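The check-then-scale behaviour described for the proportional option can be sketched as follows. This is an illustrative Python function, not flepiMoP code; the function name, the compartment names, and the tolerance used when comparing the sum to 1.0 are assumptions.

```python
def scale_proportions(fractions, population, tol=1e-6):
    """Convert per-compartment fractions into counts for one subpopulation.

    Raises ValueError if the fractions do not sum to 1.0 (within `tol`,
    which is an arbitrary choice made for this sketch).
    """
    total = sum(fractions.values())
    if abs(total - 1.0) > tol:
        raise ValueError(f"fractions sum to {total}, expected 1.0")
    return {name: fraction * population for name, fraction in fractions.items()}


counts = scale_proportions({"S": 0.995, "I": 0.005, "R": 0.0}, population=1000)
print(counts)
```

Because the same fractions can be reused for subpopulations of different sizes, this form is convenient when every subpopulation starts with the same relative composition.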
Details on implementing each initial conditions method and the options that go along with it are below.
initial_conditions::method
The default initial conditions are that the initial value of all compartments for each subpopulation will be zero, except for the first compartment, whose value will be the population size. The “first” compartment depends on how the model was specified, and will be the compartment that contains the first named category in each compartment group.
For example, a model with the following compartments
with the accompanying geodata file
will be started with 1,000 individuals in the S_child_unvaxxed compartment in the "small province" and 10,000 in that compartment in the "large province".
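The rule for choosing the "first" compartment can be sketched in a few lines of Python. This is only an illustration of the rule as stated above: the group names and all category names other than S, child, and unvaxxed are assumptions, as is the function name.

```python
def default_initial_conditions(compartment_groups, populations):
    """Put each subpopulation's whole population in the first compartment.

    The first compartment is built from the first named category of each
    compartment group, joined in the order the groups are defined.
    Compartments that are not listed hold zero individuals.
    """
    first = "_".join(categories[0] for categories in compartment_groups.values())
    return {subpop: {first: size} for subpop, size in populations.items()}


# Hypothetical model structure; only the first category of each group matters here.
groups = {
    "infection_stage": ["S", "I", "R"],
    "age_group": ["child", "adult"],
    "vaccination_stage": ["unvaxxed", "vaxxed"],
}
populations = {"small_province": 1000, "large_province": 10000}

initial = default_initial_conditions(groups, populations)
print(initial)
```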
With this method users can specify arbitrary initial conditions in a conveniently formatted input .csv or .parquet file.
For example, for a model with the following compartments
and initial_conditions sections
with the accompanying geodata file
where initial_conditions.csv contains
the model will be started with half of the population of each subpopulation consisting of children and the other half of adults, everyone unvaccinated, and 5 infections (in the exposed-but-not-yet-infectious class) among the unvaccinated adults in the large province, with the remaining individuals susceptible (4995). All other compartments will contain zero individuals initially.
initial_conditions::initial_conditions_file must contain the following columns:
subpop – the name of the subpopulation for which the initial condition is being specified. By default, all subpopulations must be listed in this file, unless the allow_missing_subpops option is set to TRUE.
mc_name – the concatenated name of the compartment for which an initial condition is being specified. The order of the compartment groups in the name must be the same as the order in which these groups are defined in the config for the model, e.g., you cannot say unvaccinated_S.
amount – the value of the initial condition; either a numeric value or the string "rest".
For each subpopulation, if there are compartments that are not listed in the SetInitialConditions file, an error will be thrown unless allow_missing_compartments is set to TRUE, in which case it will be assumed there are zero individuals in them. If the sum of the values of the initial conditions in all compartments in a location does not add up to the total population of that location (specified in the geodata file), an error will be thrown. To allocate all remaining individuals in a subpopulation (the difference between the total population size and those allocated by defined initial conditions) to a single pre-specified compartment, include this compartment in the initial_conditions_file but instead of a number in the amount column, put the word "rest".
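The "rest" allocation and the population-sum check can be sketched as below, using the large province from the example above (10,000 individuals, half of them children, and 5 exposed unvaccinated adults). The function is a hypothetical illustration rather than flepiMoP's code; in particular, rejecting more than one "rest" entry per subpopulation is an assumption, as is the compartment name E_adult_unvaxxed.

```python
def resolve_rest(amounts, population):
    """Replace a "rest" entry with the individuals not allocated elsewhere.

    `amounts` maps compartment names to a number or the string "rest".
    """
    rest_keys = [name for name, value in amounts.items() if value == "rest"]
    if len(rest_keys) > 1:
        # Assumption for this sketch: only one compartment may hold the rest.
        raise ValueError('only one compartment may be marked "rest"')
    allocated = sum(value for value in amounts.values() if value != "rest")
    resolved = dict(amounts)
    if rest_keys:
        if allocated > population:
            raise ValueError("allocated individuals exceed the population")
        resolved[rest_keys[0]] = population - allocated
    elif allocated != population:
        raise ValueError("initial conditions do not sum to the population")
    return resolved


large_province = {
    "S_child_unvaxxed": 5000,
    "E_adult_unvaxxed": 5,
    "S_adult_unvaxxed": "rest",
}
resolved = resolve_rest(large_province, population=10000)
print(resolved)
```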
If allow_missing_subpops is FALSE or unspecified, an error will occur if initial conditions for some subpopulations are missing. If TRUE, then for subpopulations missing from the initial_conditions file, it will be assumed that all individuals begin in the first compartment. (The "first" compartment depends on how the model was specified, and will be the compartment that contains the first named category in each compartment group.)
Similar to "SetInitialConditions", with this method users can specify arbitrary initial conditions in a formatted .csv or .parquet input file. However, the format of the input file is different. The required file format is consistent with the output "seir" file from the compartmental model, so the user could take output from one simulation and use it as input into another simulation with the same model structure.
For example, for an input configuration file containing
with the accompanying geodata file
where initial_conditions_from_previous.csv contains
The simulation would be initiated on 2021-06-01 with these values in each compartment (no children vaccinated, only adults in the small province vaccinated, some past and current infection in both compartments).
initial_conditions::initial_conditions_file must contain the following columns:
mc_value_type – in model output files, this is either prevalence or incidence. Only prevalence values are selected to be used as initial conditions, since compartmental models describe the prevalence (number of individuals at any given time) in each compartment. Prevalence is taken to be the value measured instantaneously at the start of the day.
mc_name – the name of the compartment for which the value is reported, which is a concatenation of the compartment status in each state type, e.g. "S_adult_unvaxxed", and must be in the same order as these groups are defined in the config for the model, e.g., you cannot say unvaxxed_S_adult.
subpop_1, subpop_2, etc. – one column for each different subpopulation, containing the number of individuals in the described compartment in that subpopulation at the given date. Note that these are named after the nodenames defined by the user in the geodata file.
date – the calendar date in the simulation, in YYYY-MM-DD format. Only values with a date that matches the simulation start_date will be used.
The way that initial conditions are specified with SetInitialConditions and FromFile results in a single value for each compartment and does not easily allow the user to instead specify a distribution (as is possible for compartmental or outcome model parameters). If a user wants to use different possible initial condition values each time the model is run, the way to do this is to instead specify a folder containing a set of files with initial condition values for each simulation that will be run. The user can do this with files in the format described in initial_conditions::method::SetInitialConditions by instead using method::SetInitialConditionsFolderDraw. Similarly, to provide a folder of initial condition files in the format described in initial_conditions::method::FromFile, use instead method::FromFileFolderDraw.
Each file in the folder needs to be named according to the same naming conventions as the model output files: run_number.runID.file_type.[csv or parquet] where ....[DESCRIBE] as it is now taking the place of the seeding files the model would normally outpu ;
Only one additional config argument is needed to use a FolderDraw method for initial conditions:
initial_file_type: either seir or seed
When using FolderDraw methods, initial_conditions_file should now be the path to the directory that contains the folder with all the initial conditions files. For example, if you are using output from another model run and so the files are in an seir folder within a model_output folder which is within your project directory, you would use initial_conditions_file: model_outpu ;
This section describes how to specify modifications to any of the parameters of the transmission model or observational model during certain time periods.
Modifiers are a powerful feature in flepiMoP that enable users to modify any of the parameters specified in the model during particular time periods. They can be used, for example, to mirror public health control interventions, like non-pharmaceutical interventions (NPIs) or increased access to diagnosis or care, or annual seasonal variations in disease parameters. Modifiers can act on any of the transmission model parameters or observation model parameters.
In the seir_modifiers and outcome_modifiers sections of the configuration file the user can specify several possible types of modifiers which will then be implemented in the model. Each modifier changes a parameter during one or multiple time periods and for one or multiple specified subpopulations.
We currently support the following intervention types. Each of these is described in detail below:
"SinglePeriodModifier" – Modifies a parameter during a single time period
"MultiPeriodModifier" – Modifies a parameter by the same amount during multiple time periods
"ModifierModifier" – Modifies another intervention during a single time period
"StackedModifier" – Combines two or more interventions additively or multiplicatively, and is used to be able to turn on and off groups of interventions easily for different runs
Note that if you want a parameter to vary continuously over time (for example, a daily transmission rate that is influenced by temperature and humidity), then it is easier to do this by using a "timeseries" parameter value than by combining many separate modifiers. Timeseries parameter values are described in the section. Timeseries values for outcome parameters (e.g., a testing rate that fluctuates rapidly due to test availability) are in development but not currently available.
Within flepiMoP, modifiers can be run as "scenarios". With scenarios, we can use the same configuration file to run multiple versions of the model where only the modifiers applied differ.
The modifiers section contains two sub-sections: modifiers::scenarios, which lists the names of the modifiers that will run in each separate scenario, and modifiers::modifiers, where the details of each modifier are specified (e.g., the parameter it acts on, the time it is active, and the subpopulation it is applied to). An example is outlined below
In this example, each scenario runs a single intervention, but more complicated examples are possible.
The major benefit of specifying both "scenarios" and "modifiers" is that the user can use the "StackedModifier" option to combine other modifiers in different ways, and then run either the individual or combined modifiers as scenarios. This way, each scenario may consist of one or more individual parameter modifications, and each modification may be part of multiple scenarios. This provides a shorthand to quickly consider multiple different versions of a model that have different combinations of parameter modifications occurring. For example, during an outbreak we could evaluate the impact of school closures, case isolation, and masking, or any one or two of these three measures. An example of a configuration file combining modifiers to create new scenarios is given below
The seir_modifiers::scenarios and outcome_modifiers::scenarios sections are optional. If the scenarios section is not included, the model will run with all of the modifiers turned "on".
If the scenarios section is included for either seir or outcomes, then each time a configuration file is run, the user must specify which modifier scenarios will be run. If not specified, the model will be run one time for each combination of seir and outcome scenario.
[Give a configuration file that tries to use all the possible option available. Based on simple SIR model with parameters beta
and gamma
in 2 subpopulations. Maybe a SinglePeriodModifier on beta
for a lockdown and gamma
for isolation, one having a fixed value and one from a distribution, MultiPeriodModifier for school year in different places, ModifierModifer for ..., StackedModifier for .... ]
modifiers::scenarios
An optional list consisting of a subset of the modifiers that are described in modifiers::settings, each of which will be run as a separate scenario. For example
or
modifiers::settings
A formatted list consisting of the description of each modifier, including its name, the parameter it acts on, the duration and amount of the change to that parameter, and the subset of subpopulations in which the parameter modification takes place. The list items are summarized in the table below and detailed in the sections below.
SinglePeriodModifier interventions enable the user to specify a multiplicative reduction to a parameter of interest. It takes a parameter and reduces its value by value (new = (1-value) * old) for the subpopulations listed in subpop during the time interval [period_start_date, period_end_date].
For example, if you would like to create an SEIR modifier called lockdown that reduces transmission by 70% in the state of California and the District of Columbia between two dates, you could specify this with a SinglePeriodModifier, as in the example below
Or to create an outcome variable modifier called enhanced_testing during which the case detection rate doubles:
method: SinglePeriodModifier
period_start_date: The date when the modification starts, in YYYY-MM-DD format. The modification will only reduce the value of the parameter after (inclusive of) this date.
period_end_date: The date when the modification ends, in YYYY-MM-DD format. The modification will only reduce the value of the parameter before (inclusive of) this date.
subpop: A list of subpopulation names/ids in which the specified modification will be applied. This can be a single subpop, a list, or the word "all" (specifying the modification applies to all existing subpopulations in the model). The modification will do nothing for any subpopulations not listed here.
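The effect of a SinglePeriodModifier on a parameter can be sketched with the formula new = (1-value) * old and inclusive start and end dates. The Python below is a minimal illustration under those stated rules, not flepiMoP's implementation; the function name, the baseline value of beta, and the dates are hypothetical.

```python
from datetime import date


def apply_single_period(old_value, value, period_start_date, period_end_date, day):
    """Return the parameter value on `day` under a single-period modifier.

    Inside [period_start_date, period_end_date] (both inclusive) the
    parameter becomes (1 - value) * old_value; outside it is unchanged.
    """
    if period_start_date <= day <= period_end_date:
        return (1 - value) * old_value
    return old_value


beta = 0.5  # hypothetical baseline transmission rate
start, end = date(2020, 3, 15), date(2020, 5, 1)  # hypothetical lockdown dates

during = apply_single_period(beta, 0.7, start, end, date(2020, 4, 1))
after = apply_single_period(beta, 0.7, start, end, date(2020, 5, 2))
print(during, after)
```

A value of 0.7 corresponds to the 70% reduction in the lockdown example, leaving 30% of the baseline value while the modifier is active.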
MultiPeriodModifier interventions enable the user to specify a multiplicative reduction to the parameter of interest by value (new = (1-value) * old) for the subpopulations listed in subpop during multiple different time intervals, each defined by a start_date and end_date.
For example, if you would like to describe the impact that transmission in schools has on overall disease spread, you could create a modifier that increases transmission by 30% during the dates that K-12 schools are in session in different regions (e.g., Massachusetts and Florida):
method: MultiPeriodModifier
groups:
A list of subpopulations (subpops) or groups of them, and the time periods during which the modification will be active in each of them
groups:subpop
A list of subpopulation names/ids in which the specified modification will be applied. This can be a single subpop, a list, or the word "all" (specifying the modification applies to all existing subpopulations in the model). The modification will do nothing for any subpopulations not listed here.
groups:periods
A list of time periods, each defined by a start and end date, when the modification will be applied
groups:periods:start_date
The date when the modification starts, in YYYY-MM-DD format. The modification will only reduce the value of the parameter after (inclusive of) this date.
groups:periods:end_date
The date when the modification ends, in YYYY-MM-DD format. The modification will only reduce the value of the parameter before (inclusive of) this date.
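The groups and periods structure can be sketched as a nested Python data structure with a lookup function. This is an illustration only: the dictionary layout, subpopulation names, and dates are hypothetical. Under the formula new = (1-value) * old, a 30% increase such as the school-term example corresponds to value = -0.3; whether a given configuration accepts negative values is not stated in the text above.

```python
from datetime import date

# Hypothetical modifier: transmission is 30% higher while schools are in session.
school_year = {
    "value": -0.3,
    "groups": [
        {
            "subpop": ["massachusetts"],
            "periods": [
                (date(2021, 1, 4), date(2021, 6, 18)),
                (date(2021, 9, 1), date(2021, 12, 22)),
            ],
        },
        {
            "subpop": ["florida"],
            "periods": [
                (date(2021, 1, 4), date(2021, 5, 28)),
                (date(2021, 8, 10), date(2021, 12, 17)),
            ],
        },
    ],
}


def modified_value(old_value, modifier, subpop, day):
    """Apply the modifier if `day` falls in any period listed for `subpop`."""
    for group in modifier["groups"]:
        applies = group["subpop"] == "all" or subpop in group["subpop"]
        in_period = any(start <= day <= end for start, end in group["periods"])
        if applies and in_period:
            return (1 - modifier["value"]) * old_value
    return old_value


print(modified_value(1.0, school_year, "massachusetts", date(2021, 3, 1)))
print(modified_value(1.0, school_year, "massachusetts", date(2021, 7, 15)))
```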
ModifierModifier interventions allow the user to specify an intervention that acts to modify the value of another intervention, as opposed to modifying a baseline parameter value. The intervention multiplicatively reduces the modifier of interest by value (new = (1-value) * old) for the subpopulations listed in subpop during the time interval [period_start_date, period_end_date].
For example, ModifierModifier could be used to describe a social distancing policy that is in effect between two dates and reduces transmission by 60% if followed by the whole population, but part way through this period, adherence to the policy drops to only 50% in one of the subpopulations:
Note that this configuration is identical to the following alternative specification
However, there are situations when the ModifierModifier notation is more convenient, especially when doing parameter fitting.
method: ModifierModifier
baseline_modifier:
The name of the original parameter modification which will be further modified.
parameter: The name of the parameter in the baseline_scenario that will be modified.
period_start_date: The date when the intervention modifier starts, in YYYY-MM-DD format. The intervention modifier will only reduce the value of the other intervention after (inclusive of) this date.
period_end_date: The date when the intervention modifier ends, in YYYY-MM-DD format. The intervention modifier will only reduce the value of the other intervention before (inclusive of) this date.
subpop:
A list of subpopulation names/ids in which the specified intervention modifier will be applied. This can be a single subpop, a list, or the word "all" (specifying the intervention applies to all existing subpopulations in the model). The intervention will do nothing for any subpopulations not listed here.
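While the ModifierModifier is active, the value of the baseline modifier becomes (new modifier value = (1-value) * old modifier value), and so the value of the underlying parameter that was modified by the baseline intervention will be (new parameter value = (1 - new modifier value) * old parameter value).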
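As a numerical check of the social distancing example above, the sketch below applies the (1-value) rule twice: once to the baseline modifier and once to the parameter. The function names are hypothetical, and the value of 0.5 for the ModifierModifier is inferred from the "adherence drops to only 50%" wording rather than taken from an original configuration.

```python
def modifier_modifier(baseline_value, value):
    """Reduce the value of a baseline modifier: new = (1 - value) * old."""
    return (1 - value) * baseline_value


def apply_modifier(old_parameter, modifier_value):
    """Reduce a parameter by a modifier value: new = (1 - value) * old."""
    return (1 - modifier_value) * old_parameter


baseline_reduction = 0.6  # policy reduces transmission by 60% under full adherence
reduced = modifier_modifier(baseline_reduction, 0.5)  # adherence falls to 50%
relative_transmission = apply_modifier(1.0, reduced)
print(reduced, relative_transmission)
```

With full adherence the parameter is multiplied by 0.4; with the ModifierModifier active the effective reduction is 0.3, so the parameter is multiplied by 0.7 instead.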
Combine two or more modifiers into a scenario, so that they can easily be singled out to be run together without the other modifiers. If multiple modifiers act during the same time period in the same subpopulation, their effects are combined multiplicatively. Modifiers of different types (i.e. SinglePeriodModifier, MultiPeriodModifier, ModifierModifier, other StackedModifiers) can be combined.
or
method: StackedModifier
modifiers: A list of names of the other modifiers (specified above) that will be combined to create the new modifier (which we typically refer to as a "scenario")
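Multiplicative stacking of modifiers that overlap in time and subpopulation can be sketched as repeated application of the (1-value) rule. The Python below is illustrative only: the modifier names and values are hypothetical, and it ignores the bookkeeping of periods and subpopulations that a full implementation would need.

```python
def stack_multiplicative(old_parameter, values):
    """Combine overlapping modifiers: new = old * product of (1 - value_i)."""
    result = old_parameter
    for value in values:
        result *= 1 - value
    return result


# Hypothetical modifiers that are all active on the same day in the same subpopulation.
active = {"school_closure": 0.3, "masking": 0.2}

combined = stack_multiplicative(1.0, active.values())
print(combined)
```

Two reductions of 30% and 20% combined this way leave 56% of the baseline value, rather than the 50% that simple addition of the reductions would give.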
subpop_groups:
For any of the modifier types, subpop_groups is an optional list of lists specifying which subsets of subpopulations in subpop should share parameter values when parameters are drawn from a distribution or fit to data. All other subpopulations not listed will have unique intervention values unlinked to other areas. If the value is 'all', then all subpopulations will be assumed to have the same modifier value. When the subpop_groups option is not specified, all subpopulations will be assumed to have unique values of the modifier.
For example, for a model of disease spread in Canada where we want to specify that the (to be varied) value of a modification to the transmission rate should be the same in all the Atlantic provinces (Nova Scotia, Newfoundland, Prince Edward Island, and New Brunswick), the same in all the prairie provinces (Manitoba, Saskatchewan, Alberta), the same in the three territories (Nunavut, Northwest Territories, and Yukon), and yet take unique values in Ontario, Quebec, and British Columbia, we could write
This page describes how users specify the names, sizes, and connectivities of the different subpopulations comprising the total population to be modeled.
The subpop_setup section of the configuration file is where users can input the information required to define a population structure on which to simulate the model. The options allow the user to determine the population size of each subpopulation that makes up the overall population, and to specify the amount of mixing that occurs between each pair of subpopulations.
An example configuration file with the global header and the subpop_setup section is below:
geodata file
geodata is a .csv with column headers, with at least two columns: subpop and population.
nodenames is the name of a column in geodata that specifies unique geographical identification strings for each subpopulation.
selected is the list of selected locations in geodata to be modeled.
mobility file
It is also possible, but not recommended, to specify the mobility file as a .txt with space-separated values in the shape of a matrix. This matrix is square, of size K x K, with K being the number of rows in geodata. The above example corresponds to
To simulate a simple population structure with two subpopulations, a large province with 10,000 individuals and a small province with only 1,000 individuals, where every day 100 residents of the large province travel to the small province and interact with residents there, and 50 residents of the small province visit the large province
geodata.csv contains the population structure (with columns subpop and population)
mobility.csv contains
This page describes the configuration schema for specifying distributions
This section describes how to specify the compartmental model of infectious disease transmission.
We want to allow users to work with a wide variety of infectious diseases, or with one infectious disease under a wide variety of modeling assumptions. To facilitate this, we allow the user to specify their compartmental model of disease dynamics via the configuration file.
We originally considered asking users to specify each compartment and transition manually. However, we quickly found that this created long, confusing configuration files, and so we created a shorthand to more succinctly specify both compartments and transitions between them. This works especially well for models where individuals are stratified by other properties (like age, vaccination status, etc.) in addition to their infection status.
The model is specified in two separate sections of the configuration file. In the compartments section, users define the possible states individuals can be categorized into. Then in the seir section, users define the possible transitions between states, the values of parameters that govern the rates of these transitions, and the numerical method used to simulate the model.
An example section of a configuration file defining a simple SIR model is below.
compartments
The first stage of specifying the model is to define the infection states (variables) that the model will track. These "compartments" are defined first in the compartments section of the config file, before describing the processes that lead to transitions between them. The compartments are defined separately from the rest of the model because they are also used by the seeding section that defines initial conditions and importations.
For simple disease models, the compartments can simply be listed with whatever notation the user chooses. For example, for a simple SIR model, the compartments could be ["S", "I", "R"]. The config also requires that there be a variable name for the property of the individual that these compartments describe, which for example in this case could be infection_stage.
Our syntax allows for more complex models to be specified without much additional notation. For example, consider a model of a disease that followed SIR dynamics but for which individuals could receive vaccination, which might change how they experience infection.
In this case we can specify compartments as the cross product of multiple states of interest. For example:
Corresponds to 6 compartments, which the code internally converts to this data frame
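The cross product can be reproduced with a short Python sketch. The group names and the exact category labels below are assumptions made for illustration, and the underscore-joined naming follows the compartment names used elsewhere in this documentation (e.g. S_unvaccinated); the internal data frame itself is not reproduced here.

```python
from itertools import product

# Hypothetical compartment groups for an SIR model with vaccination.
compartments = {
    "infection_stage": ["S", "I", "R"],
    "vaccination_stage": ["unvaccinated", "vaccinated"],
}

# One row per combination, keeping both the components and the joined name.
rows = [
    {**dict(zip(compartments, combo)), "name": "_".join(combo)}
    for combo in product(*compartments.values())
]

for row in rows:
    print(row)
```

Keeping the per-group components alongside the joined name is what allows a compartment to be referred to by its components while still being used by its name.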
In order to more easily describe transitions, we want to be able to refer to a compartment by its components, but then use it by its compartment name.
If the user wants to specify a model in which some compartments are repeated across states but others are not, there will be pros and cons of how the model is specified. Specifying it using the cross product notation is simpler, less error prone, and makes config files easier to read, and there is no issue with having compartments that have zero individuals in them throughout the model. However, for very large models, extra compartments increase the memory required to conduct the simulation, and so having unnecessary compartments tracked may not be desired.
For example, consider a model of a disease that follows SI dynamics in two separate age groups (children and adults), but for which only adults receive vaccination, with one or two doses of vaccine. With the simplified notation, this model could be specified as:
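One way this simplified specification might be written is sketched below. The property names age_group and vaccination_status and all labels are hypothetical.

```yaml
# 2 infection stages x 2 age groups x 3 vaccination states = 12 compartments.
# The 4 child compartments with 1 or 2 doses are never populated.
compartments:
  infection_stage: ["S", "I"]
  age_group: ["child", "adult"]
  vaccination_status: ["unvaccinated", "1dose", "2dose"]
```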
corresponding to 12 compartments, 4 of which are unnecessary to the model
Or, it could be specified with the less concise notation
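A sketch of the explicit alternative, in which every compartment is listed by hand under a single property, could look like this. The property name overall_state and the compartment labels are hypothetical.

```yaml
# 8 compartments listed explicitly: 2 for children, 6 for adults.
compartments:
  overall_state:
    - "S_child"
    - "I_child"
    - "S_adult_unvaccinated"
    - "I_adult_unvaccinated"
    - "S_adult_1dose"
    - "I_adult_1dose"
    - "S_adult_2dose"
    - "I_adult_2dose"
```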
which does not result in any unnecessary compartments being included.
Notation must be consistent between these sections.
seir::transitions
The way we specify transitions between compartments in the model is a bit more complicated than how the compartments themselves are specified, but allows users to specify complex stratified infectious disease models with minimal code. This makes checking, sharing, and updating models more efficient and less error-prone.
We specify one or more transition globs, each of which corresponds to one or more transitions. Since transition globs are shorthand for collections of transitions, we will first explain how to specify a single transition before discussing transition globs.
A transition has 5 pieces of associated information that a user can specify:
source
destination
rate
proportional_to
proportion_exponent
We first consider a simple example of an SI model where individuals may either be vaccinated (v) or unvaccinated (u), but the vaccine does not change the susceptibility to infection nor the infectiousness of infected individuals.
We will focus on describing the first transition of this model, the rate at which unvaccinated individuals move from the susceptible to infected state.
The compartment the transition moves individuals out of (i.e., the source compartment) is an array. For example, to describe a transition that moves unvaccinated susceptible individuals to another state, we would write
which corresponds to the compartment S_unvaccinated
The compartment the transition moves individuals into (i.e., the destination compartment) is an array. For example, to describe a transition that moves individuals into the unvaccinated but infected state, we would write
which corresponds to the compartment I_unvaccinated
The rate constant specifies the probability per time that an individual in the source compartment changes state and moves to the destination compartment. For example, to describe a transition that occurs with rate 5/time, we would write:
instead, we could describe the rate using a parameter beta
, which can be given a numeric value later:
The interpretation and unit of the rate constant depend on the model details, as the rate may potentially also be per number (or proportion) of individuals in other compartments (see below).
The proportional_to entry is a vector of groups of compartments (each of which is an array) that modify the overall rate of transition between the source and destination compartments. Each separate group of compartments in the vector is first summed, and then all entries of the vector are multiplied to get the rate modifier. For example, to specify that the transition rate depends on the product of the number of unvaccinated susceptible individuals and the total infected individuals (vaccinated and unvaccinated), we would write:
To understand this term, consider the compartments written out as strings
and then sum the terms in each group
From here, we can say that the transition we are describing is proportional to S_unvaccinated
and I_unvaccinated + I_vaccinated,
i.e., the rate depends on the product S_unvaccinated * (I_unvaccinated + I_vaccinated)
.
The proportion_exponent entry is an exponent modifying each group of compartments that contribute to the rate. It is equivalent to the "order" term in chemical kinetics. For example, if the reaction rate for the model above depends linearly on the number of unvaccinated susceptible individuals but sub-linearly on the total infected individuals, for example to a power 0.9, we would write:
or a power parameter alpha
, which can be given a numeric value later:
The (top level) length of the proportion_exponent
vector must be the same as the (top level) length of the proportional_to
vector, even if the desire of the user is to have the same exponent for all terms being multiplied together to get the rate.
Putting it all together, the model transition is specified as
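A sketch of the assembled single transition, using the five fields described above, is given here. The values (rate 5, exponents 1 and 0.9) come from the running example, while the exact bracket nesting of proportional_to is an assumption and should be checked against a working config.

```yaml
seir:
  transitions:
    - source: ["S", "unvaccinated"]
      destination: ["I", "unvaccinated"]
      rate: [5]
      # first group: S_unvaccinated; second group: I_unvaccinated + I_vaccinated
      proportional_to: [[["S", "unvaccinated"]], [["I", "unvaccinated"], ["I", "vaccinated"]]]
      # one exponent per proportional_to group
      proportion_exponent: [1, 0.9]
```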
would correspond to the following model if expressed as an ordinary differential equation
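Written out for the running example (rate constant 5, exponent 1 on the susceptible group and 0.9 on the summed infected group), the equations would take roughly the following form. Here $S_u$, $I_u$, and $I_v$ are shorthand introduced only for this sketch for S_unvaccinated, I_unvaccinated, and I_vaccinated, and any population-size normalization applied by the simulator is deliberately left out.

```latex
\begin{aligned}
\frac{dS_u}{dt} &= -5\, S_u \,(I_u + I_v)^{0.9} \\
\frac{dI_u}{dt} &= +5\, S_u \,(I_u + I_v)^{0.9}
\end{aligned}
```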
We now explain a shorthand we have developed for specifying multiple transitions that have similar forms all at once, via transition globs. The basic idea is that for each component of the single transitions described above where a term corresponded to a single model compartment, we can instead specify one or more compartments. Similarly, multiple rate values can be specified at once, one for each involved compartment. From one transition glob, multiple individual transitions are created by broadcasting across the specified compartments.
For transition globs, any time you could specify multiple arguments as a list, you may instead specify one argument as a non-list, which will be used for every broadcast. So [1,1,1] is equivalent to 1 if the dimension of that broadcast is 3.
We continue with the same SI model example, where individuals are stratified by vaccination status, but expand it to allow infection to occur at different rates in vaccinated and unvaccinated individuals:
We allow one or more arguments to be specified for each compartment. So to specify the transitions out of both susceptible compartments (S_unvaccinated
and S_vaccinated
), we would use
The destination variable should be the same shape as the source
, and in the same relative order. So to specify a transition from S_unvaccinated
to I_unvaccinated
and S_vaccinated
to I_vaccinated
, we would write the destination
as:
If instead we wrote:
we would have a transition from S_unvaccinated
to I_vaccinated
and S_vaccinated
to I_unvaccinated
.
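The source and destination pairing described above can be sketched as follows. The bracket nesting is an assumption, and the compartment labels follow the running example.

```yaml
# Fragment of a single transition glob (other fields omitted).
# source: both susceptible compartments
source: [["S"], ["unvaccinated", "vaccinated"]]
# destination in the same relative order:
#   S_unvaccinated -> I_unvaccinated, S_vaccinated -> I_vaccinated
destination: [["I"], ["unvaccinated", "vaccinated"]]
# Reversing the order would instead pair
#   S_unvaccinated -> I_vaccinated, S_vaccinated -> I_unvaccinated:
# destination: [["I"], ["vaccinated", "unvaccinated"]]
```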
The rate vector allows users to specify the rate constant for all the source -> destination transitions that are defined in a shorthand way, by instead specifying how the rate is altered depending on the compartment type. For example, the rate of transmission between a susceptible (S) and an infected (I) individual may vary depending on whether the susceptible individual is vaccinated or not and whether the infected individual is vaccinated or not. The overall rate constant is constructed by multiplying together or "broadcasting" all the compartment type-specific terms that are relevant to a given compartment.
For example,
This would mean our transition from S_unvaccinated
to I_unvaccinated
would have a rate of 3 * 0.6
while our transition from S_vaccinated
to I_vaccinated
would have a rate of 3 * 0.5
.
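The rates quoted above (3 * 0.6 and 3 * 0.5) are consistent with a rate entry of the following shape, where the first group carries the factor shared by the infection stage and the second group carries one factor per vaccination state. The nesting is an assumption.

```yaml
# Fragment of a transition glob (other fields omitted).
# S_unvaccinated -> I_unvaccinated: 3 * 0.6
# S_vaccinated   -> I_vaccinated:   3 * 0.5
rate: [[3], [0.6, 0.5]]
```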
The rate vector should be the same shape as source
and destination
and in the same relative order.
Note that if the goal is a model in which the rate constants differ between compartment types in a way that is more complicated than multiplicative, it is better to specify separate transitions for each compartment type instead of using this shorthand.
The broadcasting for proportional_to is a bit more complicated. For the other fields, each broadcast is over a single component. Here, however, the broadcast is over a group of components, and we allow a different group to be chosen for each broadcast.
Again, let's unpack what it says. Since the broadcast is over groups, let's split the config back up
into those groups
From here, we can say that we are describing two transitions. Both occur proportionally to the same compartments: S_unvaccinated
and the total number of infections (I_unvaccinated+I_vaccinated
).
If, for example, we want to model a situation where vaccinated susceptibles cannot be infected by unvaccinated individuals, we would instead write:
Similarly to rate
and proportional_to
, we provide an exponent for each component and every group across the broadcast. So we could for example use:
The (top level) length of the proportion_exponent
vector must be the same as the (top level) length of the proportional_to
vector, even if the desire of the user is to have the same exponent for all terms being multiplied together to get the rate. Within each vector entry, the arrays must have the same length as the source
and destination
vectors.
Putting it all together, the transition glob
is equivalent to the following transitions
We warn the user that with this shorthand, it is possible to specify large models with few lines of code in the configuration file. The more compartments and transitions you specify, the longer the model will take to run, and the more memory it will require.
seir::parameters
When the transitions of the compartmental model are specified as described above, their rates can either be entered as numeric values (e.g., 0.1
) or as strings which can be assigned numeric values later (e.g., beta
). We recommend the latter method for all but the simplest models, since parameters may recur in multiple transitions and so that parameter values may be edited without risk of editing the model structure itself. It also improves readability of the configuration files.
Parameters can take on three types of values:
Fixed values
Value drawn from distributions
Values read from timeseries specified in a data file
The full model section of the config could then read
If there are no parameter values that need to be specified (all rates given numeric values when defining model transitions), the seir::parameters
section of the config can be left blank or omitted.
Sometimes, we want to be able to specify model parameters that have different values at different timepoints. For example, the relative transmissibility may vary throughout the year based on the weather conditions, or the rate at which individuals are vaccinated may vary as vaccine programs are rolled out. One way to do this is to instead specify the parameter values as a timeseries.
This can be done by providing a data file in .csv or .parquet format that has a list of values of the parameter for a corresponding timepoint and subpopulation name. One column should be date
, which should have an entry for every calendar day of the simulation, with the first and last date corresponding to the start_date
and end_date
for the simulation specified in the header of the config. There should be another column for each subpopulation, where the column name is the subpop name used in other files and the values are the desired parameter values for that subpopulation for the corresponding day. If any day or subpopulation is missing, an error will occur. However, if you want all subpopulations to have the same parameter value for every day, then only a single column in addition to date is needed, which can have any name, and will be applied to every subpop.
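A parameter might then point to such a file through the timeseries attribute, roughly as sketched here. The parameter name theta and the file name are taken from the seasonal example elsewhere on this page, and the data/ directory is hypothetical.

```yaml
seir:
  parameters:
    theta:
      # path to a .csv (or .parquet) file with a date column
      # and one column per subpopulation
      timeseries: data/seasonal_transmission_2pop.csv
```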
as a part of a configuration file with the model sections:
(seir::integration)
A compartmental model defined using the notation in the previous sections describes rules for classifying individuals in the population based on infection state dynamically, but does not uniquely specify the mathematical framework that should be used to simulate the model.
Our framework allows for two major methods for implementing compartmental models of disease transmission:
ordinary differential equations, which are completely deterministic, operate in continuous time (consider infinitesimally small timesteps), and allow for arbitrary fractions of the population (i.e., not just discrete individuals) to move between model compartments
discrete-time stochastic process, which tracks discrete individuals and produces random variation in the number of individuals transitioning between states for any given rate, and which allows transitions between states only to occur at discrete time intervals
For example, to simulate a model deterministically using the 4th order Runge-Kutta algorithm for numerical integration with a timestep of 1 day:
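A minimal sketch of that setting, using the method and dt options of seir::integration:

```yaml
seir:
  integration:
    method: rk4   # deterministic, 4th order Runge-Kutta
    dt: 1.0       # timestep in days
```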
Alternatively, to simulate a model stochastically with a timestep of 0.1 days
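A corresponding sketch for the stochastic case:

```yaml
seir:
  integration:
    method: stochastic   # discrete-time stochastic process
    dt: 0.1              # timestep in days
```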
parameter
: The name of the parameter that will be modified. This could be a parameter defined for the transmission model in seir::parameters or for the observational model in outcomes. If the parameter is used in multiple transitions in the model then all those transitions will be modified by this amount.
value:
The fractional reduction of the parameter during the time period the modification is active. This can be a scalar number, or a distribution using the notation described in the section. The new parameter value will be old_value * (1 - value).
subpop_groups:
An optional list of lists specifying which subsets of subpopulations in subpop should share parameter values when parameters are drawn from a distribution or fit to data. See section below for more details.
value:
The fractional reduction of the baseline intervention during the time period the modifier intervention is active. This can be a scalar number, or a distribution using the notation described in the section. The new parameter value will be
subpop_groups:
An optional list of lists specifying which subsets of subpopulations in subpop should share parameter values when parameters are drawn from a distribution or fit to data. See section below for more details.
The mobility
file is a .csv file (it has to contain .csv as extension) with long form comma separated values. Columns have to be named ori
, dest
, amount,
with amount being the average number of individuals moving from the origin subpopulation ori
to destination subpopulation dest
on any given day. Details on the mathematics of this model of contact are explained in the . Unassigned relations are assumed to be zero. The location entries in the ori
and dest
columns should match exactly the subpop
column in geodata.csv
These compartments are referenced in multiple different subsequent sections of the config. In the seeding
section the user can specify how the initial (or later imported) infections are distributed across compartments; in the seir section the user can specify the form and rate of the transitions between these compartments encoded by the model; in the outcomes section the user can specify how the observed variables are generated from the underlying model states.
For more details on the mathematical forms possible for transitions in our models, read the .
For transitions that occur at a constant per-capita rate (i.e., E -> I at a constant rate in an SEIR model), it is possible to simply write proportional_to: ["source"]
.
with parameter and parameter (we will describe how to use parameter symbols in the transitions and specify their numeric values separately in the section ).
Parameters can be assigned values by using the value
argument after their name and then simply stating their numeric argument. For example, in a config describing a simple SIR model with transmission rate (beta
) = 0.1/day and recovery rate (gamma
) = 0.2/day. This could be specified as
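A sketch of that parameters block, using the value attribute with the numbers given in the text:

```yaml
seir:
  parameters:
    beta:
      value: 0.1   # transmission rate, per day
    gamma:
      value: 0.2   # recovery rate, per day
```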
For the stratified SI model described above, this portion of the config would read
Parameter values can also be specified as random values drawn from a distribution, as a way of including uncertainty in parameters in the model output. In this case, every time the model is run independently, a new random value of the parameter is drawn. For example, to choose the same value of beta
= 0.1 each time the model is run but to choose a random value of gamma
with mean on a log scale of and standard deviation on a log scale of (e.g., 1.2-fold variation):
Details on the possible distributions that are currently available, and how to specify their parameters, are provided in the .
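To illustrate the general shape of a distribution-valued parameter, here is a sketch that uses a uniform distribution rather than the lognormal one discussed above. The distribution key name is an assumption, low and high follow the distribution reference on this page, and the numeric bounds are hypothetical.

```yaml
seir:
  parameters:
    beta:
      value: 0.1            # fixed: same value on every run
    gamma:
      value:
        distribution: uniform   # a new value is drawn for each independent run
        low: 0.1                # hypothetical lower bound
        high: 0.3               # hypothetical upper bound
```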
Note that understanding when a new parameter value from this distribution is drawn becomes more complicated when the model is run in inference mode. In inference mode, we distinguish model runs as occurring in different "slots" (i.e., completely independent model instances that could be run on different processing cores in a parallel computing environment) and different "iterations" of the model that occur sequentially when the model is being fit to data and that update fitted parameters each time based on the fit quality found in the previous iteration. A new parameter value is only drawn from the above distribution once per slot. Within a slot, at each iteration during an inference run, the parameter is only changed if it is being fit and the inference algorithm decides to perturb it to test a possible improved fit. Otherwise, it maintains the same value no matter how many times the model is run within a slot.
For example, for an SIR model with two subpopulations (a small province and a large province) in which the relative transmissibility peaks on January 1, decreases linearly to a minimal value on June 1, and then increases linearly again, but varies more in the small province than in the large province, the theta
parameter could be constructed from the file seasonal_transmission_2pop.csv with contents including
Note that there is an alternative way to specify time dependence in parameter values that is described in the interventions section. That method allows the user to define intervention parameters that apply specific additive or multiplicative shifts to other parameter values for a defined time interval. Interventions are useful if the parameter doesn't vary frequently and if the value of the shift is unknown and it is desired to either sample over uncertainty in it or try to estimate its value by fitting the model to data. If the parameter varies frequently and its value or relative value over time is known, specifying it as a timeseries is more efficient.
Compartmental model parameters can have an additional attribute beyond value
or timeseries
, which is called stacked_modifier_method
. This value is explained in the section on time-dependent parameter modifications (also known as "modifiers"), as it determines what happens when two different modifiers act on the same parameter at the same time (are they combined additively or multiplicatively?).
The mathematics behind each implementation is described in the section
For any method, the results of the model will be more accurate when the timestep is smaller (i.e., output will more precisely match the mathematics of the model description and be invariant to the choice of timestep). However, the computing time required to simulate the model for a certain time range of interest increases with the number of timesteps required (i.e., with smaller timesteps). In our experience, the 4th order Runge-Kutta algorithm (for details see section) is a very accurate method of numerically integrating such models and can handle timesteps as large as roughly a day for models whose maximum per capita transition rates are of this same order of magnitude. However, both the discrete-time stochastic model and the legacy method for integrating the model in deterministic mode require smaller timesteps to be accurate (around 0.1 for COVID-19-like dynamics, in our experience).
| Config item | Required? | Type/format | Description |
|---|---|---|---|
| method | required | string | One of SinglePeriodModifier, MultiPeriodModifier, ModifierModifier, or StackedModifier |
| parameter | required | string | The parameter on which the modification is acting. Must be a parameter defined in seir::parameters or outcomes |
| period_start_date or periods::start_date | required | numeric, YYYY-MM-DD | The date when the modification starts. Notation depends on value of method. |
| period_end_date or periods::end_date | required | numeric, YYYY-MM-DD | The date when the modification ends. Notation depends on value of method. |
| subpop | required | string, or list of strings | The subpopulations to which the modifications will be applied, or "all". Subpopulations must appear in the geodata file. |
| value | required | distribution, or single value | The relative amount by which a modification reduces the value of a parameter. |
| subpop_groups | optional | string, or a list of lists of strings | A list of lists defining groupings of subpopulations, which defines how modification values should be shared between them, or 'all', in which case all subpopulations are put into one group with identical modification values. By default, if parameters are chosen randomly from a distribution or fit based on data, they can have unique values in each subpopulation. |
| baseline_scenario | used only for ModifierModifier | string | Name of the original modification which will be further modified |
| modifiers | used only for StackedModifier | list of strings | List of modifier names to be grouped into the new combined modifier/scenario name |
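As an illustration of how these fields fit together, a single modifier entry might look roughly like the sketch below. The modifier name lockdown, the dates, and the value are hypothetical, and the enclosing config section is omitted because its exact layout is not shown here.

```yaml
# Hypothetical modifier entry: reduces beta by 70% in all subpopulations
# between the two dates.
lockdown:
  method: SinglePeriodModifier
  parameter: beta
  period_start_date: 2020-03-15
  period_end_date: 2020-05-01
  subpop: "all"
  value: 0.7
```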
| Distribution | Parameter | Type/format | Description |
|---|---|---|---|
| fixed | value | Any real number | Draws all values exactly equal to value |
| uniform | low | Any real number | Draws all values randomly from a uniform distribution with range [low, high] |
| | high | Any real number greater than low | |
| poisson | lam | Any positive real number | Draws all values randomly from a Poisson distribution with rate parameter (mean) lam (lambda) |
| binomial | size | Any non-negative integer | Draws all values randomly from a binomial distribution with number of trials (n) = size and probability of success on each trial (p) = prob |
| | prob | Any number in [0,1] | |
| lognormal | meanlog | Any real number | Draws all values randomly from a lognormal distribution (natural log, base e) with mean on a log scale of meanlog and standard deviation on a log scale of sdlog |
| | sdlog | Any non-negative real number | |
| truncnorm | mean | Any real number | Draws all values randomly from a truncated normal distribution with mean mean and standard deviation sd, truncated to have a minimum value of a and a maximum value of b |
| | sd | Any non-negative real number | |
| | a | Any real number, or -Inf | |
| | b | Any real number greater than a, or Inf | |
| Config item | Required? | Type/format | Description |
|---|---|---|---|
| value | either value or timeseries is required | numerical, or distribution | This defines the value of the parameter, as described above. |
| timeseries | either value or timeseries is required | path to a csv file | This defines a timeseries for each day, as above. |
| stacked_modifier_method | optional | string: sum, product, reduction_product | This option defines the method used when modifiers are applied. The default is product. |
| rolling_mean_windows | optional | integer | The size of the rolling mean window if a rolling mean is applied. |
| Config item | Required? | Type/format | Description |
|---|---|---|---|
| method | optional | string: stochastic, rk4, or legacy | The algorithm used to simulate the model equations. If stochastic, uses a discrete-time stochastic process with a rate-saturation correction. If rk4, the model is simulated deterministically by numerical integration using a 4th order Runge-Kutta algorithm. If legacy (default), uses the transition rates for the stochastic model but always chooses the average rate (an Euler-style update). |
| dt | optional | Any positive real number | The timestep used for the numerical integration or discrete-time stochastic update. Default is dt = 2 |
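| Config item | Required? | Type/format | Description |
|---|---|---|---|
| geodata | required | path to file | path to file relative to data_path |
| mobility | required | path to file | path to file relative to data_path |
| selected | optional | string | name of selected location in geodata |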
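A sketch of a subpop_setup section built from these items follows. The file name geodata.csv appears earlier on this page, and mobility.csv is a hypothetical name.

```yaml
subpop_setup:
  geodata: geodata.csv     # path relative to data_path
  mobility: mobility.csv   # long-form csv with columns ori, dest, amount
```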
flepiMoP is set up so that all parameters and other options for running the pipeline can be specified in a single "configuration" file (aka "config"). Users do not need to edit any other code files, or even be aware of their contents, to create and run complex model scenarios. Configuration files also provide a convenient record of model options and promote reproducibility of model results.
We use the YAML
language syntax to write config files, which are typically named something like config.yml
. The file has simple plain text contents and follows an indented outline structure (YAML uses spaces, not tab characters, for indentation). When config files are read by the model code, a data structure encoding the model options is created.
Comments can be added to the config file by starting with the hash key (#
) then a space. Comments can start anywhere on a line and continue until the end, but if they run over to a new line, a new # must be used at the start of the new line.
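A short sketch of these comment rules (the name value is hypothetical):

```yaml
# A comment on its own line
name: example_config   # a comment after a value runs to the end of the line
# A longer comment that spills over onto another line
# needs a new hash at the start of that line
```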
(To do: give a simple configuration for a toy model with two subpopulations, SEIR, a single "cases" outcome, a single seeded infection, and a single NPI that starts after some time. This page is currently under development; please see our example repo for some simple configurations.)
When referring to config items (individual parameters), we use their full position in the outline. For example, in the sample config file above, we denote
as subpop_setup::geodata
having a value of minimal
Parameters and other options specified in the configuration files can take on a variety of types of values, using the following notations:
dates are specified as [year]-[month]-[day]. (e.g., 2020-01-31)
boolean values are either "TRUE" or "FALSE"
file names are strings
probability is a float between 0 and 1
distribution is a probability distribution from which a random value for the parameter is drawn each time a new simulation is run (or chain, if doing inference). See here for the required schema.
Required section
These global configuration options typically sit at the top of the configuration file.
| Config item | Required? | Type/format | Description |
|---|---|---|---|
| name | required | string | Name of this configuration. Will be used in file names created to store model output. |
| start_date | required | date | model simulation start date |
| end_date | required | date | model simulation end date |
| start_date_groundtruth | optional for non-inference runs, required for inference runs | date | start date for comparing model to data |
| end_date_groundtruth | optional for non-inference runs, required for inference runs | date | end date for comparing model to data |
| nslots | optional (can also be defined by an environmental variable) | int | number of independent simulations to run |
| setup_name | optional | string | setup name used to describe the run, used in setting up file names |
| model_output_dirname | optional | folder path | path to folder where all the outputs created by the model are stored; if not specified, default is model_output |
For example, for a configuration file to simulate the spread of COVID-19 in the US during 2020 and compare to data from March 1 onwards, with 1000 independent simulations, the header of the config might read:
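A header along those lines might be sketched as follows. The name is hypothetical, and the dates assume the simulation covers the full 2020 calendar year.

```yaml
name: USA_covid19_2020              # hypothetical configuration name
start_date: 2020-01-01
end_date: 2020-12-31
start_date_groundtruth: 2020-03-01  # compare to data from March 1 onwards
end_date_groundtruth: 2020-12-31
nslots: 1000                        # number of independent simulations
```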
subpop_setup section
Required section
This section specifies the population structure on which the model will be simulated, including the names and sizes of each subpopulation and the connectivity between them. More details here.
compartments section
Required section
This section is where users can specify the variables (infection states) that will be tracked in the infectious disease transmission model. More details can be found here. The other details of the model are specified in the seir
section, including transitions between these compartments (seir::transitions
), the names of the parameters governing the transitions (seir::parameters
), and the numerical method used to simulate the equations over time (seir::integration
). The initial conditions of the model can be specified in the initial_conditions
section, and any other inputs into the model from external populations or instantaneous transitions between states that occur at later times can be specified in the seeding
section.
seir section
Required section
This section is where users can specify the details of the infectious disease transmission model they wish to simulate (e.g., SEIR). This model describes the allowed transitions (seir::transitions
) between the compartments that were specified in the compartments
section, the values of the parameters involved in these transitions (seir::parameters
), and the numerical method used to simulate the equations over time (seir::integration
). More details here. The initial conditions of the model can be specified in the separate initial_conditions
section, and any other inputs into the model from external populations or instantaneous transitions between states that occur at later times can be specified in the seeding
section.
initial_conditions section
Optional section
This section is used to specify the initial conditions of the model, which define how individuals are distributed between the model compartments at the time the model simulation begins. Importantly, the initial conditions specify the time and location where infection is first introduced. If this section is omitted, default values are used. If users want to add infections to the population at later times, or add or remove individuals from compartments separately from the model rules, they can do so via the related seeding
section. More details here.
seeding section
Optional section
This section is used to specify how individuals are instantaneously "seeded" from one compartment to another, where they then continue to be governed by the model equations. For example, this seeding could be used to represent importations of infected individuals from an outside population, mutation events that create new strains, or vaccinations that alter disease susceptibility. Seeding events can occur at any time in the simulation. The seeding section specifies the numeric values added to or removed from any compartment of the model. More details here.
outcomes section
Optional section
This section is where users can define new variables representing the observed quantities and how they are related to the underlying state variables in the model (e.g., the fraction of infections that are detected as cases). More details here.
interventions section
Required section
This section is where users can specify time-varying changes to parameters governing either the infectious disease model or the observational model. More details here.
inference section
Optional section
This section is where users can specify the details of how the model is fit to data, including which data streams will be included, which outcome variables they represent, and the likelihood functions describing the probability of the data given the model. More details here.
This page describes how to specify the outcomes section of the configuration file
outcomes variables
Our pipeline allows users to encode state variables describing the infection status of individuals in the population in two different ways. The first way is via the state variables and transitions of the compartmental model of disease transmission, which are specified in the compartments
and seir
sections of the config. This model should include all variables that influence the natural course of the epidemic (i.e., all variables that feed back into the model by influencing the rate of change of other variables). For example, the number of infected individuals influences the rate at which new infections occur, and the number of immune individuals influences the number of individuals at risk of acquiring infection.
However, these intrinsic model variables may be difficult to observe in the real world and so directly comparing model predictions about the values of these variables to data might not make sense. Instead, the observable outcomes of infection may include only a subset of individuals in any state, and may only be observed with a time delay. Thus, we allow users to define new outcome
variables that are functions of the underlying model variables. Commonly used examples include detected cases or hospitalizations.
Variables should not be included as outcomes if they influence the infection trajectory. The choice of what variables to include in the compartmental disease model vs. the outcomes section may be very model specific. For example, hospitalizations due to infection could be encoded as an outcome variable that is some fraction of infections, but if we believe hospitalized individuals are isolated from the population and don't contribute to onward infection, or that the number of hospitalizations feeds back into the population's perception of risk of infection and influences everyone's contact behavior, this would not be the best choice. Similarly, we could include deaths due to infection as an outcome variable that is also some fraction of infections, but unless death is a very rare outcome of infection and we aren't worried about actually removing deceased individuals from the modeled populations, deaths should be in the compartmental model instead.
The outcomes section is not required in the config. However, there are benefits to including it, even if the only outcome variable is set to be equivalent to one of the infection model variables. If the compartmental model is complicated but you only want to visualize a few output variables, the outcomes output file will be much easier to work with. Outcome variables always occur with some fixed delay from their source infection model variable, which can be more convenient than the exponential distribution underlying the infection model. Outcome variables can be created to automatically sum over multiple compartments of the infection model, removing the need for post-processing code to do this. If the model is being fit to data, then the outcomes section is required, as only outcome variables can be compared to data.
As an example, imagine we are simulating an SIR-style model and want to compare it to real epidemic data in which cases of infection and hospitalizations due to infection are reported. Our model doesn't explicitly include hospitalization, but suppose we know that 1% of all infections eventually lead to hospitalization, and that hospitalization occurs on average 1 week after infection. We know that not all infections are reported as cases, and assume that only 50% are detected and are reported 2 days after infection begins. The model and outcomes section of the config for these outcomes, which we call incidC (daily incidence of cases) and incidH (daily incidence of hospital admission), would be:
In the following sections we describe in more detail how this specification works.
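To make the arithmetic of this example concrete, here is a minimal Python sketch that derives expected incidC and incidH values from a daily series of new infections by scaling by the probability and shifting by the delay. It is an illustration only, not flepiMoP's implementation: the function name delayed_fraction and the input series incidI are hypothetical, and the sketch uses deterministic expected values rather than random draws.

```python
def delayed_fraction(incidence, probability, delay_days):
    """Scale a daily incidence series by `probability` and shift it by `delay_days`.

    Illustrative only: returns expected counts, with zeros before the delay elapses.
    """
    n = len(incidence)
    out = [0.0] * n
    for day, new_infections in enumerate(incidence):
        target = day + delay_days
        if target < n:
            out[target] = probability * new_infections
    return out


# Hypothetical daily new infections from the SIR model (10 days)
incidI = [100, 200, 400, 800, 1000, 800, 400, 200, 100, 50]

incidC = delayed_fraction(incidI, probability=0.5, delay_days=2)   # 50% detected, 2-day delay
incidH = delayed_fraction(incidI, probability=0.01, delay_days=7)  # 1% hospitalized, 7-day delay

print(incidC)
print(incidH)
```

In this sketch the first two entries of incidC and the first seven entries of incidH are zero, because no one infected during the simulated window has yet reached the reporting or admission delay.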
outcomes in the configuration file
The outcomes config section consists of a list of defined outcome variables (observables), each identified by a user-created name (e.g., "incidH"). For each of these outcome variables, the user defines the source compartment(s) in the infectious disease model that they draw from and whether they draw from the incidence (new individuals entering that compartment) or the prevalence (total individuals currently in that compartment). Each new outcome variable is always associated with two mandatory parameters:
probability of being counted in this outcome variable if in the source compartment
delay between when an individual enters the source compartment and when they are counted in the outcome variable
and one optional parameter:
duration after entering that an individual is counted as part of the outcome variable
The value of the probability, delay, and duration parameters can be a single value or come from a distribution.
Outcome model parameters probability, delay, and duration can have an additional attribute beyond value called modifier_key. This attribute is explained in the section on coding time-dependent parameter modifications (also known as "modifiers"), as it provides a way to have the same modifier act on multiple different outcomes.
Just as for compartment model parameters, when outcome parameters are drawn from a distribution, a different value for the parameter is drawn each time the model is run, and that value is then used for all calculations within that model run. Note that understanding when a new parameter value is drawn from this distribution becomes more complicated when the model is run in Inference mode. In Inference mode, we distinguish model runs as occurring in different "slots" (i.e., completely independent model instances that could be run on different processing cores in a parallel computing environment) and different "iterations" of the model, which occur sequentially when the model is being fit to data and update fitted parameters each time based on the fit quality found in the previous iteration. A new parameter value is only drawn from the above distribution once per slot. Within a slot, at each iteration during an inference run, the parameter is only changed if it is being fit and the inference algorithm decides to perturb it to test a possible improved fit. Otherwise, it maintains the same value no matter how many times the model is run within a slot.
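The draw-once-per-slot behavior can be illustrated with a small Python sketch. It is a schematic of the logic described above, not flepiMoP's code: the function run_slot, the uniform distribution bounds, and the per-slot seeding scheme are assumptions made for illustration, and the parameter is treated as not being fit, so it is never perturbed.

```python
import random


def run_slot(slot_index, n_iterations, low=0.2, high=0.3):
    """Draw a parameter once for a slot and reuse it across all iterations.

    Hypothetical helper: the parameter is assumed not to be fit, so it is
    never perturbed between iterations.
    """
    rng = random.Random(slot_index)  # independent random stream per slot (assumption)
    value = rng.uniform(low, high)   # single draw at the start of the slot
    return [value for _ in range(n_iterations)]  # same value in every iteration


slots = {slot: run_slot(slot, n_iterations=3) for slot in range(1, 4)}
for slot, values in slots.items():
    print(slot, values)
```

Each slot prints three identical values, while the value differs from slot to slot.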
Example
| Item | Required? | Type/Format | Description |
|---|---|---|---|
| source | Yes | Varies | The infection model variable or outcome variable from which the named outcome variable is created |
| probability | Yes, unless sum option is used instead | value or distribution | The probability that an individual in the source variable appears in the named outcome variable |
| delay | Yes, unless sum option is used instead | value or distribution | The time delay between an individual's appearance in the source variable and appearance in the named outcome variable |
| duration | No | value or distribution | The duration of time an individual remains counted within the named outcome variable |
| sum | No | List | A list of other outcome variables to sum into the current outcome variable |
Required, unless the sum option is used instead. The source sub-section describes the compartment(s) in the infectious disease model from which this outcome variable is drawn. Outcome variables can be drawn from the incidence of a variable, meaning that some fraction of new individuals entering the infection model state each day are chosen to contribute to the outcome variable, or from the prevalence, meaning that each day some fraction of individuals currently in the infection state are chosen to contribute to the outcome variable. Note that whatever the source type, the named outcome variable itself is always a measure of incidence.
To specify which compartment(s) contribute, the user must specify the state(s) within each model stratification. For stratifications not mentioned, the outcome will sum over the states in all strata.
For example, consider a configuration in which the compartmental model was constructed to track infection status stratified by vaccination status and age group. The following code would be used to create outcomes called incidH_child (incidence of hospitalization for children) and incidH_adult (incidence of hospitalization for adults), where some fraction of infected individuals would become hospitalized and we wanted to track pediatric and adult hospitalizations separately, but did not care about tracking the vaccination status of hospitalized individuals, as in reality it was not tracked by the hospitals:
To instead create an outcome variable for cases where on each day of infection there is some probability of testing positive (for example, for the situation of an asymptomatic infection where testing is administered totally randomly), the following code would be used:
The source of an outcome variable can also be a previously defined outcome variable. For example, to create a new variable for the number of individuals recruited to be part of a contact tracing program (incidT), which is just some fraction of diagnosed cases:
Required, unless the sum option is used instead. Probability is the fraction of individuals in the source compartment who are counted as part of this outcome variable if the source is incidence; if the source is prevalence, it is the fraction of individuals counted per day. It must be between 0 and 1.
Specifying the probability creates a parameter called outcome_name::probability that can be referred to in the outcome_modifiers section of the config. The value of this parameter can be changed using the probability::intervention_param_name option.
For example, to track the incidence of hospitalization when 5% of children but only 1% of adults infected require hospitalization, and to create a modifier_key such that both of these rates could be modified by the same amount during some time period using the outcome_modifiers section:
To track the incidence of diagnosed cases while iterating over uncertainty in the case detection rate (ranging from 20% to 30%), and naming this parameter "case_detect_rate":
Each time the model is run, a new random value for the probability of case detection will be chosen.
Required, unless the sum option is used instead. delay is the time delay between when individuals are chosen from the source compartment and when they are counted as part of this outcome variable.
For example, to track the incidence of hospitalization when 5% of children are hospitalized and hospitalization occurs 7 days after infection:
To iterate over uncertainty in the exact delay time, we could include some variation between simulations in the delay time using a normal distribution with a standard deviation of 2 (truncated to make sure the delay does not become negative). Note that a delay distribution here does not mean that the delay time varies between individuals; within a simulation it is identical for all individuals.
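To illustrate what variation between simulations means here, the following Python sketch draws one delay per simulation from a truncated normal distribution using rejection sampling. It is not flepiMoP's sampling code: the helper draw_delay, the mean of 7 days, the upper bound of 14 days, and the seed are assumptions chosen for the example.

```python
import random


def draw_delay(mean, sd, lower, upper, rng):
    """Draw one delay from a normal distribution truncated to [lower, upper].

    Uses simple rejection sampling: redraw until the value falls in range.
    """
    while True:
        delay = rng.gauss(mean, sd)
        if lower <= delay <= upper:
            return delay


rng = random.Random(42)  # fixed seed so the sketch is reproducible
# One draw per simulation; every individual in a simulation shares that delay.
delays_by_simulation = [draw_delay(7, 2, 0, 14, rng) for _ in range(5)]
for sim, delay in enumerate(delays_by_simulation, start=1):
    print(f"simulation {sim}: delay = {delay:.2f} days")
```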
By default, all outcome variables describe incidence (new individuals entering each day). However, they can also track an associated "prevalence" if the user specifies how long individuals will stay classified as being in the state the outcome variable describes. This is the duration parameter.
When the duration parameter is set, a new outcome variable is automatically created and named with the name of the original outcome variable + "_curr". This name can be changed using the duration::name option.
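The relationship between an incidence outcome and its associated "_curr" prevalence can be sketched in Python as follows. This is an illustration under the assumption that every individual stays in the outcome state for exactly duration_days days, starting on the day they are counted; the function name prevalence_from_incidence and the admission series are hypothetical, and this is not flepiMoP's own code.

```python
def prevalence_from_incidence(incidence, duration_days):
    """Count individuals currently in an outcome state.

    Assumes each individual is counted from the day they enter the outcome
    for exactly `duration_days` consecutive days.
    """
    prevalence = []
    for day in range(len(incidence)):
        start = max(0, day - duration_days + 1)
        prevalence.append(sum(incidence[start:day + 1]))
    return prevalence


# Hypothetical daily hospital admissions for children
incidH_child = [0, 2, 3, 1, 0, 0]
incidH_child_curr = prevalence_from_incidence(incidH_child, duration_days=3)
print(incidH_child_curr)  # [0, 2, 5, 6, 4, 1]
```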
For example, to track the incidence and prevalence of hospitalization when 5% of children are hospitalized, hospitalization occurs 7 days after infection, and the duration of hospitalization is 3 days:
which creates the variable "incidH_child_curr" to track all currently hospitalized children. Since it doesn't make sense to call this new outcome variable an incidence, as it is a prevalence, we could instead rename it:
Optional. sum is used to create new outcome variables that are sums over other previously defined outcome variables.
If sum is included, source, probability, delay, and duration will be ignored.
For example, to track new hospital admissions and current hospitalizations separately for children and adults, as well as for all ages combined:
There are other required and optional configuration items for the outcomes section, which can be specified under outcomes::settings:
method: delayframe. This is the mathematical method used to create the outcome variable values from the transmission model variables. Currently, the only method supported is delayframe, which ...
param_from_file: Optional, TRUE or FALSE. It is possible to allow any of the outcome parameters to have values that vary across the subpopulations. For example, disease severity rates or diagnosis rates may differ by demographic group. In this case, all the outcome parameter values defined in outcomes::outcomes will represent baseline values, and then you can define a relative change from this baseline for any particular subpopulation using the paths section. If param_from_file: TRUE is specified, then these relative values will be read from the param_subpop_file. Otherwise, if param_from_file: FALSE is specified or the option is not listed at all, all subpopulations will have the same values for the outcome parameters, defined below.
param_subpop_file: Required if param_from_file: TRUE. The path to a .csv or .parquet file that contains the relative amount by which a given outcome parameter is shifted relative to baseline in each subpopulation. The file must contain the following columns:
subpop: The subpopulation to which the parameter change applies. Must be a subpopulation defined in the geodata file. For example, small_province
parameter: The outcomes parameter which will be altered for this subpopulation. For example, incidH_child: probability
value: The amount by which the baseline value will be multiplied, for example, 0.75 or 1.1
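As an illustration of how such a file relates baseline values to subpopulation-specific values, the following Python sketch applies multipliers from rows with the three columns described above. It is a sketch, not flepiMoP's implementation: the baseline dictionary, the in-memory CSV text, the second subpopulation name (large_province), and the helper subpop_value are hypothetical.

```python
import csv
import io

# Hypothetical baseline outcome parameter values
baseline = {"incidH_child: probability": 0.05}

# Hypothetical file contents with the columns subpop, parameter, value
csv_text = """subpop,parameter,value
small_province,incidH_child: probability,0.75
"""

multipliers = {
    (row["subpop"], row["parameter"]): float(row["value"])
    for row in csv.DictReader(io.StringIO(csv_text))
}


def subpop_value(subpop, parameter):
    """Baseline value times the subpopulation multiplier (1.0 if not listed)."""
    return baseline[parameter] * multipliers.get((subpop, parameter), 1.0)


print(subpop_value("small_province", "incidH_child: probability"))
print(subpop_value("large_province", "incidH_child: probability"))
```

Subpopulations that do not appear in the file keep the baseline value, which matches the description of the values as relative changes from baseline.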
Consider a disease described by an SIR model in a population that is divided into two age groups, adults and children, which experience the disease separately. We are interested in comparing the predictions of the model to real world data, but we know we cannot observe every infected individual. Instead, we have two types of outcomes that are observed.
First, via syndromic surveillance, we have a database that records how many individuals in the population are experiencing symptoms from the disease at any given time. Suppose careful cohort studies have shown that 50% of infected adults and 80% of infected children will develop symptoms, and that symptoms occur in both age groups around 3 days after infection (following a log-normal distribution with log mean X and log standard deviation of Y). The duration that symptoms persist is also a variable, following a ...
Secondly, via laboratory surveillance we have a database of every positive test result for the infection. We assume the test is 100% sensitive and specific. Only individuals with symptoms are tested, and they are always tested exactly 1 day after their symptom onset. We are unsure what portion of symptomatic individuals are seeking out testing, but are interested in considering two extreme scenarios: 95% of symptomatic individuals are tested, or only 75% of individuals are tested.
The configuration file we could use to model this situation includes
flepiMoP allows some input parameters/options to be specified in the command line at the time of model submission, in addition to or instead of in the configuration file. This can be helpful for users who want to quickly run different versions of the model – typically a different number of simulations or a different intervention scenario from among all those specified in the config – without having to edit or create a new configuration file every time. In addition, some arguments can only be specified via the command line.
In addition to the configuration file and the command line, the inputs described below can also be specified as environment variables.
In all cases, command line arguments override configuration file entries, which in turn override environment variables. The order of command line arguments does not matter.
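The precedence rule can be sketched as a simple lookup. This Python sketch is illustrative and does not reproduce flepiMoP's argument parsing; the function resolve_option and the example values are assumptions.

```python
def resolve_option(cli_value=None, config_value=None, env_value=None, default=None):
    """Return the first value that is set, in order of precedence:
    command line, then configuration file, then environment variable."""
    for value in (cli_value, config_value, env_value):
        if value is not None:
            return value
    return default


# Hypothetical example: number of slots given in up to three places
print(resolve_option(cli_value=100, config_value=50, env_value=10))  # 100
print(resolve_option(config_value=50, env_value=10))                 # 50
print(resolve_option(env_value=10))                                  # 10
```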
Details on how to run the model, including how to add command line arguments or environment variables, are in the section How to Run.
| Argument | Config item | Environment variable | Value type | Description | Required? | Default |
|---|---|---|---|---|---|---|
| -c or --config | | CONFIG_PATH | file path | Name of configuration file. Must be located in the current working directory, or else a relative or absolute file path must be provided. | Yes | NA |
| -i or --first_sim_index | | FIRST_SIM_INDEX | integer ≥ 1 | The index of the first simulation | No | 1 |
| -j or --jobs | | FLEPI_NJOBS | integer ≥ 1 | Number of parallel processors used to run the simulation. If there are more slots than jobs, slots will be divided up between processors and run in series on each. | No | Number of processors on the computer used to run the simulation |
| --interactive or --batch | | NA | Choose either option | Run simulation in interactive or batch mode | No | batch |
| --write-csv or --no-write-csv | | NA | Choose either option | Whether model output will be saved as .csv files | No | no_write_csv |
| --write-parquet or --no-write-parquet | | NA | Choose either option | Whether model output will be saved as .parquet files (a compressed representation that can be opened and manipulated with minimal memory; may be required for large simulations) | No | write_parquet |
| -s or --npi_scenario | interventions: scenarios | FLEPI_NPI_SCENARIOS | list of strings | Names of the intervention scenarios described in the config file that will be run. Must be a subset of scenarios defined. | No | All scenarios described in config |
| -n or --nslots | nslots | FLEPI_NUM_SLOTS | integer ≥ 1 | Number of independent simulations of the model to be run | No | Config value |
| --stochastic or --non-stochastic | seir: integration: method | FLEPI_STOCHASTIC_RUN | Choose either option | Whether the model will be run stochastically or non-stochastically (deterministic numerical integration of equations using the RK4 algorithm) | No | Config value |
| --in-id | | FLEPI_RUN_INDEX | string | Unique ID given to the model runs. If the same config is run multiple times, you can avoid the output being overwritten by using unique model run IDs. | No | Constructed from current date and time as YYYY.MM.DD.HH/MM/SS |
| --out-id | | FLEPI_RUN_INDEX | string | Unique ID given to the model runs. If the same config is run multiple times, you can avoid the output being overwritten by using unique model run IDs. | No | Constructed from current date and time as YYYY.MM.DD.HH/MM/SS |
As an example, consider running the following configuration file
To run this model directly in Python (it can alternatively be run from R; for details, see the How to Run section), we could use the command line entry
Alternatively, to run 100 simulations using only 4 of the available processors on our computer, but only running the "" scenario with a deterministic model, and to save the files as .csv (since the model is relatively simple), we could call the model using the command line entry
TBA
The material below is very out of date. It is kept here as a placeholder but has not been updated recently.
global: smh_round, setup_name, disease
spatial_setup: census_year, modeled_states, state_level
For creating US-based population structures using the helper script build_US_setup.R, which is run before the main model simulation script, the following extra parameters can be specified:

| Config item | Required? | Type/Format | Description |
|---|---|---|---|
| census_year | optional | integer (year) | Determines the year for which census population size data is pulled. |
| state_level | optional | boolean | Determines whether county-level population-size data is instead grouped into state-level data (TRUE). Default FALSE |
| modeled_states | optional | list of location codes | A vector of locations that will be modeled; others will be ignored |
To simulate an epidemic across all 50 states of the US or a subset of them, users can take advantage of built in machinery to create geodata and mobility files for the US based on the population size and number of daily commuting trips reported in the US Census.
Before running the simulation, the script build_US_setup.R can be run to get the required population data files from online census data and filter out only states/territories of interest for the model. More details are provided in the How to Run section.
This example simulates COVID-19 in the New England states, assuming no transmission from other states, using 2019 census data for the population sizes and a pre-created file for estimated interstate commutes during the 2011-2015 period.
geodata.csv contains
mobility_2011-2015_statelevel.csv contains
importation section (optional)
This section is optional. It is used by the covidImportation package to import global air importation data for seeding infections into the United States.
If you wish to include it, here are the options.
| Config item | Required? | Type/Format | Description |
|---|---|---|---|
| census_api_key | required | string | |
| travel_dispersion | required | number | how dispersed daily travel data is; default = 3 |
| maximum_destinations | required | integer | number of airports to limit importation to |
| dest_type | required | categorical | location type |
| dest_country | required | string (Country) | ISO3 code for country of importation. Currently only USA is supported |
| aggregate_to | required | categorical | location type to aggregate to |
| cache_work | required | boolean | whether to save case data |
| update_case_data | required | boolean | deprecated; whether to update the case data or use saved data |
| draw_travel_from_distribution | required | boolean | whether to add additional stochasticity to travel data; default is FALSE |
| print_progress | required | boolean | whether to print progress of importation model simulations |
| travelers_threshold | required | integer | include airports with at least the travelers_threshold mean daily number of travelers |
| airport_cluster_distance | required | numeric | cluster airports within airport_cluster_distance km |
| param_list | required | See section below | see below |
importation::param_list
| Config item | Required? | Type/Format | Description |
|---|---|---|---|
| incub_mean_log | required | numeric | incubation period, log mean |
| incub_sd_log | required | numeric | incubation period, log standard deviation |
| inf_period_nohosp_mean | required | numeric | infectious period, non-hospitalized, mean |
| inf_period_nohosp_sd | required | numeric | infectious period, non-hospitalized, sd |
| inf_period_hosp_mean_log | required | numeric | infectious period, hospitalized, log-normal mean |
| inf_period_hosp_sd_log | required | numeric | infectious period, hospitalized, log-normal sd |
| p_report_source | required | numeric | reporting probability, Hubei and elsewhere |
| shift_incid_days | required | numeric | mean delay from infection to reporting of cases; default = -10 |
| delta | required | numeric | days per estimation period |
report section
The report section is completely optional and provides settings for making an R Markdown report. For an example of a report, see the Supplementary Material of our preprint.
If you wish to include it, here are the options.
| Config item | Required? | Type/Format | Description |
|---|---|---|---|
| data_settings::pop_year | | integer | |
| plot_settings::plot_intervention | | boolean | |
| formatting::scenario_labels_short | | list of strings; one for each scenario in interventions::scenarios | |
| formatting::scenario_labels | | list of strings; one for each scenario in interventions::scenarios | |
| formatting::scenario_colors | | list of strings; one for each scenario in interventions::scenarios | |
| formatting::pdeath_labels | | list of strings | |
| formatting::display_dates | | list of dates | |
| formatting::display_dates2 | optional | list of dates | a 2nd string of display dates that can optionally be supplied to specific report functions |
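Value type for -i or --first_sim_index: integer ≥ 1
Value type for -j or --jobs: integer ≥ 1
Description for --write-parquet or --no-write-parquet: Whether model output will be saved as .parquet files (a compressed representation that can be opened and manipulated with minimal memory; may be required for large simulations). Read more about .
Value type for -n or --nslots: integer ≥ 1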