
Inference Implementation

Specifying data source and fitted variables

inference settings

iterations_per_slot

do_inference

gt_data_path

With inference model runs, the number of simulations `nsimulations` refers to the number of final model simulations that will be produced. The `iterations_per_slot` setting (referred to as `filtering$simulations_per_slot` in older configurations) is the number of iterative simulations run to produce a single final simulation (i.e., the number of simulations in a single MCMC chain).
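A minimal config sketch of how these two settings relate. It assumes `nsimulations` is a top-level key and `iterations_per_slot` sits under `inference`; the numbers are illustrative, not recommended defaults.

```yaml
nsimulations: 10             # number of final simulations produced (one per slot); illustrative value
inference:
  iterations_per_slot: 300   # iterations in each slot's MCMC chain; illustrative value
```

With these values, the run would produce 10 final simulations, each from its own 300-iteration chain, i.e., on the order of 10 × 300 = 3,000 simulations in total.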

| Item | Required? | Type/Format | Description |
| --- | --- | --- | --- |
| `iterations_per_slot` | required | integer | Number of iterations in a single MCMC inference chain |
| `do_inference` | required | TRUE/FALSE | TRUE if inference should be performed. If FALSE, performs a single run per slot without perturbing parameters. |
| `gt_data_path` | required | file path | Path to files containing "ground truth" data to which model output will be compared |
| `statistics` | required | config subsection | Specifies how each model output variable will be compared to data during fitting. See the `inference::statistics` section. |
| `hierarchical_stats_geo` | optional | config subsection | Specifies whether a hierarchical structure should be applied to the likelihood function for any of the fitted parameters. See `inference::hierarchical_stats_geo` for details. |
| `priors` | optional | config subsection | Specifies prior distributions on fitted parameters. See `inference::priors` for details. |
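A skeleton of the top-level `inference` block, sketched under the assumption that the keys nest exactly as listed in the table above. The ground truth path is hypothetical, and `iterations_per_slot` and the contents of the subsections are left out here.

```yaml
inference:
  do_inference: TRUE                    # if FALSE, one unperturbed run per slot
  gt_data_path: data/ground_truth.csv   # hypothetical path to the ground truth file
  statistics: {}                        # required; one entry per fitted statistic (see inference::statistics)
  hierarchical_stats_geo: {}            # optional; see inference::hierarchical_stats_geo
  priors: {}                            # optional; see inference::priors
```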


inference::statistics options

required options

name

aggregator

period

sim_var

data_var

likelihood

The statistics specified here are used to calibrate the model to empirical data. If multiple statistics are specified, inference is performed on them jointly, and they are weighted in the likelihood according to the number of data points and the variance of the proposal distribution.

| Item | Required? | Type/Format | Description |
| --- | --- | --- | --- |
| `name` | required | string | Name of the statistic, user defined |
| `period` | required | days, weeks, or months | Duration of time over which data and model output are aggregated before being used in the likelihood. If weeks, epiweeks are used. |
| `aggregator` | required | string, name of any R function | Function used to aggregate data over the period, usually `sum` or `mean` |
| `sim_var` | required | string | Name of the outcome variable, as defined in the `outcomes` section of the config, that will be compared to data when calculating the likelihood. This is also the column name of this variable in the hosp files in the `model_output` directory. |
| `data_var` | required | string | Name of the data variable that will be compared to the model output variable when calculating the likelihood. This should be the name of a column in the file specified in the `inference::gt_data_path` config option. |
| `remove_na` | required | logical | If TRUE …; if FALSE … |
| `add_one` | required | logical | If TRUE …; if FALSE … Will be overwritten to TRUE if the likelihood distribution is chosen to be log. |
| `likelihood::dist` | required | | Distribution of the likelihood |
| `likelihood::param` | required | | Parameter value(s) for the likelihood distribution. These differ by distribution, so check the code of the `logLikStat` function (inference/R/functions.R/logLikStat). |
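A sketch of two jointly fitted statistics, matching the typical default of calibrating to weekly incident deaths and weekly incident confirmed cases. The `data_var` column names are hypothetical, and the `period` string format and the `pois` distribution name are assumptions; check `logLikStat` for the supported distributions and the `param` values each one expects.

```yaml
inference:
  statistics:
    sum_deaths:                     # statistic name, user defined
      name: sum_deaths
      period: "1 weeks"             # assumed string format; weekly periods use epiweeks
      aggregator: sum               # R function applied over each period
      sim_var: incidD               # outcome name from the outcomes section
      data_var: death_incid         # hypothetical column in the gt_data_path file
      remove_na: TRUE
      add_one: FALSE
      likelihood:
        dist: pois                  # assumed distribution name; param omitted (see logLikStat)
    sum_confirmed:
      name: sum_confirmed
      period: "1 weeks"
      aggregator: sum
      sim_var: incidC
      data_var: confirmed_incid     # hypothetical column name
      remove_na: TRUE
      add_one: FALSE
      likelihood:
        dist: pois                  # assumed distribution name; param omitted (see logLikStat)
```

Because both entries sit under `statistics`, they enter the likelihood jointly rather than being fit one at a time.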


optional options ?

remove_na

add_one

gt_start_date

gt_end_date

Optional sections

`inference::hierarchical_stats_geo`

The hierarchical settings specified here are used to group the inference of certain parameters together (similar to inference in "hierarchical" or "fixed/group effects" models). For example, users may want to group all counties in a given state because they are geographically proximate and affected by the same statewide policies. The intended effect is that these inferred parameters follow a normal distribution, shrinking the grouped estimates toward one another and reducing the variance among them.

| Item | Required? | Description |
| --- | --- | --- |
| scenario name | required | Name of the hierarchical scenario, user defined |
| `name` | required | Name of the estimated parameter that will be grouped (e.g., the NPI scenario name or a standardized, combined health outcome name like `probability_incidI_incidC`) |
| `module` | required | Name of the module where this parameter is estimated (important for finding the appropriate files) |
| `geo_group_col` | required | geodata column name that should be used to group parameter estimation |
| `transform` | required | Type of transform that should be applied to the likelihood: "none" or "logit" |
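A sketch of two hierarchical groupings, reusing the `local_var_hierarchy` scenario name mentioned in the `filtering` description. The `USPS` geodata column, the second scenario name, and the use of `hospitalization` as the module for a health outcome parameter are assumptions for illustration.

```yaml
inference:
  hierarchical_stats_geo:
    local_var_hierarchy:                # scenario name, user defined
      name: local_variance              # estimated parameter to group (an NPI scenario name)
      module: seir                      # module where this parameter is estimated
      geo_group_col: USPS               # hypothetical geodata column (e.g., state)
      transform: none
    local_conf_hierarchy:               # hypothetical scenario name
      name: probability_incidI_incidC   # combined health outcome parameter
      module: hospitalization           # assumed module for outcome parameters
      geo_group_col: USPS
      transform: logit                  # probabilities live on (0, 1), so a logit scale is a natural choice
```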

`inference::priors`

It is now possible to specify prior values for inferred parameters, which can speed up model convergence.

| Item | Required? | Description |
| --- | --- | --- |
| scenario name | required | Name of the prior scenario, user defined |
| `name` | required | Name of the NPI scenario or parameter that will have the prior |
| `module` | required | Name of the module where this parameter is estimated |
| `likelihood` | required | Specifies the distribution of the prior |
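A sketch of a single prior, reusing the `local_var_prior` scenario name mentioned in the `filtering` description. It assumes the prior's `likelihood` takes the same `dist`/`param` layout as in `inference::statistics`; the `normal` name and the parameter values are hypothetical.

```yaml
inference:
  priors:
    local_var_prior:        # scenario name, user defined
      name: local_variance  # NPI scenario or parameter that receives the prior
      module: seir          # module where this parameter is estimated
      likelihood:
        dist: normal        # assumed distribution name
        param:              # assumed to be mean and standard deviation
          - 0
          - 1
```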

Ground truth data


(OLD) Configuration options

`filtering` section

The filtering section configures the settings for the inference algorithm. Typical default settings calibrate the model to weekly incident deaths and weekly incident confirmed cases for each subpop. Statistics, hierarchical_stats_geo, and priors each have scenario names (e.g., sum_deaths, local_var_hierarchy, and local_var_prior, respectively).

`filtering` settings

`filtering::statistics`

`filtering::hierarchical_stats_geo`

`filtering::priors`

It is now possible to specify prior values for inferred parameters, which can speed up model convergence.

Ground truth data

Likelihood function

Fitting parameters

Ground truth data

(OLD) Configuration setup

Need to add MultiPeriodModifier and hospitalization interventions

Overview

This documentation describes the new YAML configuration file options that may be used when performing inference on model runs. Compared to previous model releases, the seeding and interventions sections have additions, the outcomes section replaces the hospitalization section, and a new filtering section has been added.

Importantly, the pipeline modules are now named (seeding, seir, and hospitalization), and these names are relevant to some of the new filtering specifications.

Models may be calibrated to any available time series data that is also a model outcome (COVID-19 confirmed cases, deaths, hospital or ICU admissions, hospital or ICU occupancy, and ventilator use). In typical usage, we have calibrated the model to deaths, confirmed cases, or both. We can also perform inference on intervention effectiveness, county-specific baseline R0, and the risk of specific health outcomes.

We describe these options below and present default values in the example configuration sections.

Modifications to `seeding`

The model can perform inference on the seeding date and initial number of seeding infections in each subpop. An example of this new config section is:

Config Item

Required?

Type/Format

Description

The method for determining the proposal distribution for the seeding amount is hard-coded in the inference package (R/pkgs/inference/R/functions/perturb_seeding.R). The seeding amount is perturbed with a normal distribution whose mean is 10 times the number of confirmed cases on a given date and whose standard deviation is 1.

Modifications to `interventions`

The model can perform inference on the effectiveness of interventions as long as some calibration health outcome data overlap with the intervention period. For example, if calibrating to deaths, there should be data from time points at which deaths from infections that occurred during the intervention period could be observed (e.g., assuming an average delay of 10-18 days between infection and death).

An example configuration file where inference is performed on scenario planning interventions is as follows:

```yaml
interventions:
  scenarios:
    - Scenario1
  settings:
    local_variance:                 # subpop-level modifier of the baseline R0
      template: SinglePeriodModifierR0
      value:                        # prior distribution and range of support
        distribution: truncnorm
        mean: 0
        sd: .1
        a: -1
        b: 1
      perturbation:                 # step size between accepted and proposed values
        distribution: truncnorm
        mean: 0
        sd: .1
        a: -1
        b: 1
    stayhome:                       # intervention whose effectiveness is inferred
      template: SinglePeriodModifierR0
      period_start_date: 2020-04-04
      period_end_date: 2020-04-30
      value:
        distribution: truncnorm
        mean: 0.6
        sd: 0.3
        a: 0
        b: 0.9
      perturbation:
        distribution: truncnorm
        mean: 0
        sd: .1
        a: -1
        b: 1
    Scenario1:                      # stacks both modifiers into one scenario
      template: StackedModifier
      scenarios:
        - local_variance
        - stayhome
```

`interventions::settings::[setting_name]`

Interventions may be specified in the same way as before, or with an added perturbation section indicating that inference should be performed on a given intervention's effectiveness. As before, interventions with perturbations may be specified for all modeled locations or for explicit subpops. In this setup, the value section specifies both the prior distribution and the range of support of the final inferred value. In the configuration above, the inference algorithm will search 0 to 0.9 for all subpops to estimate the effectiveness of the stayhome intervention period. The prior distribution on intervention effectiveness follows a truncated normal distribution with a mean of 0.6 and a standard deviation of 0.3. The perturbation section specifies the perturbation (step size) between the previously accepted value and the next proposed value.

This configuration also allows us to infer subpop-level baseline R0 estimates by adding a local_variance intervention. The baseline subpop-specific R0 estimate may be calculated as

$$R_{0,\text{subpop}} = R_0 \times (1 - \text{local\_variance})$$

where R0 is the baseline simulation R0 value and local_variance is an estimated subpop-specific value.

New `outcomes` section

This section is now structured more like the interventions section of the config, in that it has scenarios and settings. We envision that separate scenarios will be specified for each IFR assumption.

`outcomes::settings::[setting_name]`

The settings for each scenario correspond to a set of different health outcome risks, most often just differences in the probability of death given infection (Pr(incidD|incidI)) and the probability of hospitalization given infection (Pr(incidH|incidI)). Each health outcome risk is defined relative to the outcome indicated in `source`. For example, the probability and delay of becoming a confirmed case (incidC) are typically indexed to the number and timing of infections (incidI).

Note that incidI is automatically defined from the SEIR transmission model outputs, while the other compartment sources must be defined in the config before they are used.

Users must specify two metrics for each health outcome, probability and delay; a duration (e.g., time spent in the hospital) is optional. A perturbation section (similar to the perturbations specified in the NPI section) may also be specified for a given health outcome and metric. To perform inference on a metric (i.e., if a perturbation is specified), that metric must be specified as a distribution (not a fixed value), and the range of support of that distribution defines the parameter space explored during inference.

New `filtering` section

This section configures the settings for the inference algorithm. Typical default settings calibrate the model to weekly incident deaths and weekly incident confirmed cases for each subpop. Statistics, hierarchical_stats_geo, and priors each have scenario names (e.g., sum_deaths, local_var_hierarchy, and local_var_prior, respectively).

`filtering` settings

`filtering::statistics`

`filtering::hierarchical_stats_geo`

`filtering::priors`

It is now possible to specify prior values for inferred parameters, which can speed up model convergence.
