Risk Calculation Explanation

What is the probability of encountering someone with SARS-CoV-2 given a gathering size and the rolling seven day average case rate for a each county?

Introduction

The purpose of this analysis is to estimate the risk of encountering someone who is SARS-CoV-2 positive given a an average number of cases per day. Risk is subjective (i.e. each person has their own tolerance for risk) and thus this analysis provides a probability which can then be used to calibrate one’s risk tolerance.

Methodology

Average Daily Cases to Active Infections

The first step in the analysis is to determine the number of likely infected persons given a case rate. This is done through a stochastic compartmental model that models the number of susceptible, the number of infected/infectious, and the removed persons (either through recovery or death). Births/ exits were not modeled in this analysis and can be ignored given the short time horizon.

This model can be represented as follows:

\[ \frac{\partial S}{\partial t} \sim -Poi(\epsilon) \\ \frac{\partial I}{\partial t} \sim Poi(\epsilon) - Bin(n_I,\text{P(Recovery)}) \\ \frac{\partial R}{\partial t} \sim Bin(n_I, \text{P(Recovery)}) \\ \]

Where:

\[ \epsilon = \text{ Cases per Day} \\ P(Recovery) = 1 - e^{-\text{Recovery Rate}} \\ \text{Recovery Rate} = \text{Average Infectious Time (days)} \]

A Poisson distribution was chosen for simplicity and generally represents a consistent incidence of new cases over the short-term horizon.1 Other distributions could be a lognormal or Erlang distribution in order to captured the case rate distribution. Regardless, for approximation, the Poisson distribution is a good approximation for the entry rate into this system.

This stochastic system of ordinary differential equations is run 100 iterations and the results are outputs for the number of infected persons on average given the user specified rate. Additionally, there is an option to increase the estimated number of infected by the testing positivity rate. Because of the relatively high positivity rate in the state of North Carolina (>5%) we can estimate that there is a degree of under-counting of cases in the community. A quick rule of thumb is to multiple the reported cases by the positive testing rate when the rate is greater than 1%, especially for infections with a high proportion of asymptomatic cases. For instance if there are 10 cases reported with a 5% positive testing rate, that means that there are likely 50 “true” cases in the community. An assumed average of 5% positivity rate is assumed in the below example.

Probability of a Positive Contact

With a given number of infected persons calculated above, we can solve the equivalent of the “Birthday Problem” which is given a series of Bernoulli trials (with a binary outcome of success or failure), what is the probability that you come in contact with a person that is in the “infected/infectious” compartment.

\[ P(\text{Covid+ Encounter}) \sim Bin(n, \theta)\\ \]

Where

\[ n = \text{Number of Contacts} \\ \\ \theta = \frac{\text{Number Infected}}{\text{Population of County}} \]

This can be solved with a Taylor approximation:

\[ P(\text{COVID}^+\text{ Encounter}) \approx 1 - e^{-n*\theta} \]

This assumes that the population is well-mixed (e.g. one group is not any more likely to be infected than another) and that diligent quarantining is not taking place.

Simulation

Now we can illustration this methodology by applying this approach to a single county.

Data Preparation for Alamance

We can pull the daily cases using the nccovid package which is an R package that uses the scraped records from the North Carolina Department of Health and Human Services Covid-19 dashboard developed by the Cone Health Enterprise Analytics team.

alamance_cases <- nccovid::get_covid_state(select_county = "Alamance")

We can also pull the population from the nccovid package as well which has the latest estimates from the North Carolina State Demographer.

alamance_population <- nccovid::nc_population

(alamance_population <- alamance_population[county=="Alamance"]$july_2020)
[1] 174055

Now we can calculate the 7 day rolling average cases per 100k.

alamance_complete <- alamance_cases[, avg_daily_cases_per_100k := cases_daily/(alamance_population/100000)] %>% 
  .[,avg_daily_cases_per_100k_rolling := zoo::rollmean(avg_daily_cases_per_100k, 
                                                       k = 7, 
                                                       align = "right",
                                                       na.pad = TRUE)] %>% 
  .[,avg_daily_cases_per_100k_rolling:=round(avg_daily_cases_per_100k_rolling,1)]

Generate Simulation Grid

Given the above methodology, we can create a simulation grid which represent the different scenarios for which we wish to solve.

gathering_size <- seq(10,50,5)
cases_per_100 <- alamance_complete[date==max(date)]$avg_daily_cases_per_100k_rolling

sim_options <- tidyr::crossing(gathering_size, 
                               cases_per_100, 
                               pop = alamance_population)

We can then use our created functions to iterate over this grid of scenarios.

results <-purrr::pmap_dbl(.l = list(..1=sim_options$gathering_size,
                                    ..2=sim_options$cases_per_100,
                                    ..3 = sim_options$pop),
                          ~calc_risk(I = mean(
                            run_infections_simulation(n_people = 10000,
                                                    contact_rate_per_day = ..2,
                                                    positive_test_rate = .05,
                                                    infectious_period =
                                                      12)[["average_q"]])*..3/100000,
                            n = ..1, pop = ..3))

Put the outputs together:

output <- tibble::tibble(
  gathering_size = sim_options$gathering_size,
  cases_per_100 = sim_options$cases_per_100,
  prob_someone_having_covid = results
) %>% 
  unique()

We can then examine our results (at a case rate of 5 cases per 100k people) to see that as we increase the number of contacts, we will increase the probability that one of those contacts is SARS-CoV-2 positive.

output %>% 
  dplyr::select(1,3) %>% 
  setNames(c("Contacts per Day", "Probability of COVID+ Contact%")) %>% 
  kable()
Contacts per Day Probability of COVID+ Contact%
10 3.0
15 4.4
20 5.8
25 7.3
30 8.7
35 10.0
40 11.4
45 12.7
50 14.0

  1. As the number of susceptible persons (those who do not have any acquired immunity from infection) decreases in the long term, this approximation will not be as valid.↩︎

References

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".