Methods

This section discusses the methodology used to calculate the reproduction number for the county and the state. This includes the data sources, data processing, and final reproduction number calculation.

Why Calculate the Reproduction Number?

The reproduction number and the basic reproduction number are often discussed with a high degree of certainty, but these values are not observed directly. The reproduction number must be estimated from available data that are often imperfect.1 The sources for these data include case counts, death counts, hospitalizations, and results from large-scale serological surveys.

Regardless of this uncertainty, understanding the current reproduction number is important for the public and especially for policy-makers (Metcalf, Morris, and Park 2020; Di Domenico et al. 2020). The current reproduction number helps us understand the rate of spread and whether the pandemic is approaching a point where it will die out or, conversely, is accelerating. This information can be used to understand whether more stringent non-pharmaceutical interventions are needed (such as a tighter lockdown) or whether policies can begin to be relaxed. The reproduction number should never be taken in isolation; it should be compared with the average number of daily cases in order to understand both the rate of spread and the existing number of infected people in a given area.

Constraints Around the Reproduction Number

During an emerging outbreak such as that of SARS-CoV-2, large-scale surveys are typically unavailable, and we rely on case counts and perhaps hospitalizations and death counts to help inform our models.2

There are a variety of ways to calculate the reproduction number, but each method typically requires some measure of incidence (e.g. the number of new cases or deaths per day) and the time between successive cases. For incidence we can use information on daily new infections, new deaths, and/or new hospitalizations. The time between cases is trickier. Ideally we would like to know the generation time, or the time between one infection and the next infection it causes. However, we rarely observe the generation time directly and instead measure the serial interval, or the time between symptom onset in two successive cases.3 In either case the availability of testing plays a key role as well, both through the delay distribution (i.e. the rate at which tests are resulted and appear as new cases) and through testing availability (i.e. when testing availability is low and the test positivity rate is high, the incidence likely underestimates the infection burden).

How We Calculate the Reproduction Number

Data Acquisition

The daily case counts are pulled and aggregated from the North Carolina Department of Health and Human Services Covid-19 Dashboard and are deposited in a GitHub repository at https://github.com/conedatascience/covid-data. The case counts by county can then be easily accessed via an R package developed to access these data (DeWitt 2020).

Data Treatment

As discussed above, the impact of testing availability and the positive testing rate needs to be accounted for in our reported case counts. The positive testing rate, or the proportion of tests that come back positive, is shown in Figure 1. Ideally the percentage of positives would be as low as possible. Higher positive testing rates indicate that we are likely not capturing all the cases within a community.

Figure 1: North Carolina Positive Testing Rate

In order to correct the case count, we utilize methods proposed by Boyce et al. (2016) to correct for under-testing. First, a weighted generalised additive model is applied to the positive testing rate in order to smooth it. This method is applied to the overall state measures; if data at the county level were available, they would be superior to using the state data uniformly across all counties. County-specific data would capture the spatial heterogeneity of spread and would provide greater insight into the impact of testing on case counts on a county-by-county basis.

Using the method described by Boyce et al. (2016) we can increase the cases using the following formula:4

\[ \text{Adjusted Cases} = \text{Observed Cases} * \text{Positive Rate}^{k*m} \]

Figure 2: Adjusted vs Reported Cases Counts by Day for North Carolina

The values of k and m are calibrated against national and internal data and are reviewed systematically.

Additionally, the overall average testing lag time is measured using the reported cases and the specimen collection dates. While this method is not perfect, it provides insight into the likely delay across the state.

Figure 3: Estimated North Carolina Testing Lag

Calculating R

Finally, with the adjusted case data we can start estimating the reproduction number. For this analysis we use the methods put forward by Abbott et al. (2020) and operationalized in their R package, EpiNow2. All analysis is done in the R statistical environment (R Core Team 2019), and the package uses the Stan probabilistic programming language for Bayesian inference (Carpenter et al. 2017). To estimate the reproduction number, the package applies convolutions to account for the testing delay and the observed serial interval, given literature values of the generation time.
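To make the convolution idea concrete, here is a minimal sketch in Python. It is not the EpiNow2 implementation, which is an R package built on Stan. It only shows how a series of daily infections can be pushed forward through a reporting-delay distribution to give the expected number of reported cases per day. The function name `expected_reports`, the infection counts, and the delay probabilities are hypothetical values chosen for illustration.

```python
def expected_reports(infections, delay_pmf):
    """Forward-convolve daily infections with a reporting-delay distribution.

    infections: new infections per day (hypothetical values).
    delay_pmf:  delay_pmf[lag] is the assumed probability that an infection
                is reported `lag` days after it occurs.
    """
    reports = [0.0] * len(infections)
    for day, n_infected in enumerate(infections):
        for lag, prob in enumerate(delay_pmf):
            # Reports that would fall after the end of the series are dropped.
            if day + lag < len(reports):
                reports[day + lag] += n_infected * prob
    return reports


# Hypothetical example: 100 infections on day 0, reported 1 to 3 days later.
infections = [100, 0, 0, 0, 0]
delay_pmf = [0.0, 0.5, 0.3, 0.2]
print(expected_reports(infections, delay_pmf))
```

In estimation the problem runs the other way: the reported cases are observed and the underlying infections must be inferred, which is why the delay distribution has to be specified.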

Reporting the R

Uncertainty intervals should always be shown with the reproduction number estimates!5

It is important that we always express the range of possible values of the reproduction number. This is because the reproduction number is an estimate, and uncertainty around its different components has been propagated throughout. This is especially important when the reproduction number is near 1: if R is above 1, even slightly, then infection is growing, while if it is below 1 then the epidemic is dying out.
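A rough worked example shows why values near 1 matter. Under the simplifying assumption that R stays constant and each generation of cases produces R times as many new cases, the number of cases after n generations is the starting count multiplied by R to the power n. The function name and the starting values below are illustrative assumptions, not part of the analysis.

```python
def cases_after(initial_cases, r, generations):
    """Cases in a later generation, assuming a constant reproduction number."""
    return initial_cases * r ** generations


# Hypothetical starting point: 100 cases, followed for 10 generations.
for r in (0.9, 1.0, 1.1):
    print(r, round(cases_after(100, r, 10)))
```

Starting from 100 cases, ten generations at R = 0.9 leave about 35 cases, while R = 1.1 gives about 259. An interval that spans 1 is therefore consistent with both a shrinking and a growing epidemic.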

Definition of Indicator Thresholds

The following section details the criteria used for the indicator cards shown for each county. The color guidelines should be used as references and not as absolutes or the ideal.

Reproduction Number

For each county the “Reproduction Number” card will be green if the upper bound of the reproduction number estimate is below 1. This indicates that there is a high probability that the spread of the disease is slowing down. Conversely, if the lower bound of the estimated reproduction number is above 1, then the card is red, as there is a high probability that the infection is spreading. If the interval between the lower and upper bounds of the estimated reproduction number includes 1, then the card is marked yellow. The target for the state and each county is for the card to be green, indicating a high probability that the spread of the virus is slowing.
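The rule above can be sketched as a small function. This is an illustrative Python version, not the code used to build the cards; the function name `reproduction_card_color` and the example intervals are hypothetical.

```python
def reproduction_card_color(lower, upper):
    """Return the card color for the uncertainty interval of an R estimate."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if upper < 1:
        return "green"   # whole interval below 1: spread is likely slowing
    if lower > 1:
        return "red"     # whole interval above 1: infection is likely spreading
    return "yellow"      # interval includes 1: direction is uncertain


# Hypothetical intervals for three counties.
print(reproduction_card_color(0.80, 0.95))
print(reproduction_card_color(0.90, 1.10))
print(reproduction_card_color(1.05, 1.30))
```

Because the color depends on the interval bounds rather than the point estimate, a county with a central estimate below 1 can still be yellow if its interval is wide.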

Average Number of Daily Cases per 100k

The average number of daily cases per 100k uses the following thresholds:

Average Number of Daily Cases per 100k    Color
Less Than 10                              Green
10 - 15                                   Yellow
> 15                                      Red

Average Number of Daily Deaths per 100k

The average number of daily deaths per 100k uses the following thresholds:

Average Number of Daily Deaths per 100k    Color
Less Than 1                                Green
1 - 5                                      Yellow
> 5                                        Red
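As a hedged illustration of how the case and death thresholds above could be applied, the Python sketch below computes an average daily rate per 100k and maps it to a color. The averaging window is not stated in this section, so the seven-day window in the example is an assumption, as are the function names, the counts, and the population.

```python
def average_per_100k(daily_counts, population):
    """Average daily count over the window, scaled to a rate per 100,000 residents."""
    return sum(daily_counts) / len(daily_counts) / population * 100_000


def threshold_color(rate, green_below, red_above):
    """Green below the lower threshold, Red above the upper one, Yellow otherwise."""
    if rate < green_below:
        return "Green"
    if rate > red_above:
        return "Red"
    return "Yellow"


def cases_card_color(rate):
    return threshold_color(rate, green_below=10, red_above=15)


def deaths_card_color(rate):
    return threshold_color(rate, green_below=1, red_above=5)


# Hypothetical county: seven days of case counts (assumed window), 400,000 residents.
week_of_cases = [40, 55, 38, 61, 47, 52, 57]
rate = average_per_100k(week_of_cases, 400_000)
print(round(rate, 1), cases_card_color(rate))
```

In this made-up example the county averages 50 cases per day, or about 12.5 per 100k, which falls in the yellow band.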

Abbott, Sam, Joel Hellewell, Robin N. Thompson, Katharine Sherratt, Hamish P. Gibbs, Nikos I. Bosse, James D. Munday, et al. 2020. “Estimating the Time-Varying Reproduction Number of SARS-CoV-2 Using National and Subnational Case Counts.” Wellcome Open Research 5 (June): 112. https://doi.org/10.12688/wellcomeopenres.16006.1.

Boyce, Ross M., Raquel Reyes, Michael Matte, Moses Ntaro, Edgar Mulogo, Feng-Chang Lin, and Mark J. Siedner. 2016. “Practical Implications of the Non-Linear Relationship Between the Test Positivity Rate and Malaria Incidence.” PLoS ONE 11 (3). https://doi.org/10.1371/journal.pone.0152410.

Carpenter, Bob, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1): 1–32. https://doi.org/10.18637/jss.v076.i01.

DeWitt, Michael. 2020. Nccovid: Pull North Carolina Covid-19 Outbreak Information. https://github.com/conedatascience/nccovid.

Di Domenico, Laura, Giulia Pullano, Chiara E. Sabbatini, Pierre-Yves Boëlle, and Vittoria Colizza. 2020. “Impact of Lockdown on COVID-19 Epidemic in Île-de-France and Possible Exit Strategies.” BMC Medicine 18 (1): 240. https://doi.org/10.1186/s12916-020-01698-4.

Gostic, Katelyn M, Lauren McGough, Ed Baskerville, Sam Abbott, Keya Joshi, Christine Tedijanto, Rebecca Kahn, et al. n.d. “Practical Considerations for Measuring the Effective Reproductive Number, Rt,” 21.

Metcalf, C. Jessica E., Dylan H. Morris, and Sang Woo Park. 2020. “Mathematical Models to Guide Pandemic Response.” Science 369 (6502): 368–69. https://doi.org/10.1126/science.abd1668.

Pollán, Marina, Beatriz Pérez-Gómez, Roberto Pastor-Barriuso, Jesús Oteo, Miguel A. Hernán, Mayte Pérez-Olmeda, Jose L. Sanmartín, et al. 2020. “Prevalence of SARS-CoV-2 in Spain (ENE-COVID): A Nationwide, Population-Based Seroepidemiological Study.” The Lancet 0 (0). https://doi.org/10.1016/S0140-6736(20)31483-5.

R Core Team. 2019. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Stringhini, Silvia, Ania Wisniak, Giovanni Piumatti, Andrew S Azman, Stephen A Lauer, Helene Baysson, David De Ridder, et al. 2020. “Repeated Seroprevalence of Anti-SARS-CoV-2 IgG Antibodies in a Population-Based Sample from Geneva, Switzerland.” Preprint. Infectious Diseases (except HIV/AIDS). https://doi.org/10.1101/2020.05.02.20088898.

Thompson, Robin N., T. Déirdre Hollingsworth, Valerie Isham, Daniel Arribas-Bel, Ben Ashby, Tom Britton, Peter Challenor, et al. 2020. “Key Questions for Modelling COVID-19 Exit Strategies.” Proceedings of the Royal Society B: Biological Sciences 287 (1932): 20201405. https://doi.org/10.1098/rspb.2020.1405.

Zelner, Jon, Julien Riou, Ruth Etzioni, and Andrew Gelman. 2020. “Accounting for Uncertainty During a Pandemic.” arXiv:2006.08745 [Physics, Q-Bio, Stat], June. http://arxiv.org/abs/2006.08745.


  1. See Thompson et al. (2020) for a much larger discussion around these questions and models during a pandemic.

  2. There have been some large-scale seroprevalence surveys that have provided great insight into the heterogeneous spread of SARS-CoV-2, as detailed for Switzerland in Stringhini et al. (2020) and for Spain in Pollán et al. (2020). However, such studies are not available for North Carolina at this time, though an active study run by Wake Forest University Baptist Health is trying to answer this question.

  3. See Gostic et al. (n.d.) for a much longer discussion of some of these constraints and the differences in modeling methodologies.

  4. Peter Ellis has a fantastic blog post that illustrates this method and the associated calibration requirements. See http://freerangestats.info/blog/2020/05/09/covid-population-incidence for details.

  5. See Zelner et al. (2020) on why reinforcing uncertainty is so important.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".