Methods

This section discusses the methodology used to calculate the reproduction number for the county and the state. This includes the data sources, data processing, and final reproduction number calculation.

Why Calculate the Reproduction Number?

The reproduction number and the basic reproduction number are often discussed with a high degree of uncertainty because neither value is observed directly. The reproduction number must be estimated from available data that are often imperfect.1 The sources for these data include case counts, death counts, hospitalizations, and results from large-scale serological surveys.

Regardless of this uncertainty, understanding the current reproduction number is important for the public and especially for policy-makers (Metcalf, Morris, and Park 2020; Di Domenico et al. 2020). The current reproduction number helps us understand the rate of spread and whether we are reaching a point where the pandemic will die out or, conversely, whether it is accelerating. This information can be used to understand whether more stringent non-pharmaceutical interventions are needed (such as a tighter lockdown) or whether we can begin to relax policies. The reproduction number should never be taken in isolation; it should be compared with the average number of daily cases in order to understand both the rate of spread and the existing number of infected people in any given area.

Constraints Around the Reproduction Number

During an emerging outbreak such as that of SARS-CoV-2, large-scale surveys are unavailable and we typically rely on case counts, and perhaps hospitalizations and death counts, to help inform our models.2

There are a variety of ways to calculate the reproduction number, but each method typically requires some measure of incidence (e.g. the number of new cases or deaths per day) and the time between successive cases. For incidence we can use information on daily new infections, new deaths, and/or new hospitalizations. The time between cases is trickier. Ideally we would like to know the generation time, the time between the infection of one person and the infection of the next person in the chain of transmission. However, we rarely observe the generation time directly and instead measure the serial interval, the time between symptom onset in two successive cases.3 In either case the availability of testing plays a key role as well, both in the delay distribution (i.e. the rate at which tests are resulted and appear as new cases) and in the testing availability (i.e. when testing availability is low and the test positivity rate is high, the incidence is likely underestimating the infection burden).

How We Calculate the Reproduction Number

Data Acquisition

The daily case counts are pulled and aggregated from the North Carolina Department of Health and Human Services (NCDHHS) Covid-19 Dashboard and are deposited in a GitHub repository at https://github.com/conedatascience/covid-data. The case counts by county can then be easily accessed via an R package developed to access these data (DeWitt 2020).

Data Treatment

We continue to evolve our methodology given the data available for estimation. As data reporting by NCDHHS has transitioned to a weekly cadence and community reporting has decreased (with the increase in at-home testing availability and the reduction of mandatory testing procedures), the interpretation of positivity rates varies. We have therefore transitioned away from using positivity to adjust case rates in favor of estimation using only known weekly cases. This is an underestimate of infection burden; however, upward and downward trends in reported cases still likely provide a useful indicator of infection trends in the short run.

Calculating R

For this analysis we use the methods put forward by Cori et al. (2013) and operationalized in the R package EpiEstim (Cori et al. 2021). All analysis is done in the R statistical environment (R Core Team 2019). These estimates can be accessed via the r-estimates-cori GitHub repository.
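
To give a sense of what the Cori et al. (2013) approach computes, the following is a minimal Python sketch, not the EpiEstim code used for the published estimates. It assumes a sliding window in which R is constant, a Gamma prior on R, and a discretized serial interval supplied as a list of weights. The function names, the window length of 7 days, the prior values, and the use of random draws to approximate the interval are all assumptions made for illustration.

```python
import random


def total_infectiousness(incidence, si_weights):
    """Lambda_t: past incidence weighted by the serial interval.

    si_weights[s - 1] is the assumed probability that the serial
    interval equals s days (s = 1, 2, ...).
    """
    lam = []
    for t in range(len(incidence)):
        lam.append(sum(incidence[t - s] * w
                       for s, w in enumerate(si_weights, start=1)
                       if t - s >= 0))
    return lam


def estimate_r(incidence, si_weights, window=7,
               prior_shape=1.0, prior_scale=5.0,
               n_draws=4000, seed=1):
    """Posterior mean and approximate 95% interval of R per window.

    With a Gamma(prior_shape, prior_scale) prior, the posterior for R over
    a window is Gamma with
      shape = prior_shape + sum of incidence in the window
      rate  = 1 / prior_scale + sum of Lambda in the window.
    The interval is approximated from random draws (illustrative choice).
    """
    rng = random.Random(seed)
    lam = total_infectiousness(incidence, si_weights)
    results = []
    for end in range(window, len(incidence)):
        start = end - window + 1
        shape = prior_shape + sum(incidence[start:end + 1])
        rate = 1.0 / prior_scale + sum(lam[start:end + 1])
        scale = 1.0 / rate
        draws = sorted(rng.gammavariate(shape, scale) for _ in range(n_draws))
        results.append({
            "day": end,
            "mean": shape * scale,
            "lower": draws[int(0.025 * n_draws)],
            "upper": draws[int(0.975 * n_draws)],
        })
    return results


# Hypothetical flat epidemic: constant incidence should give R close to 1.
estimates = estimate_r([100] * 30, [0.2, 0.5, 0.3])
print(estimates[-1])
```

A flat incidence series is a useful sanity check because each case is, on average, replaced by exactly one new case, so the estimate should sit near 1 with an interval on either side.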

Reporting the R

Uncertainty intervals should always be shown with the reproduction number estimates!4

It is important that we always express the range of possible values of the reproduction number. This is because the reproduction number is an estimate, and uncertainty around its different components has been propagated throughout. This is especially important when the reproduction number is near 1. If R is slightly above 1, then the infection is growing, while if it is less than 1 then the epidemic is dying out, so an interval that spans 1 cannot distinguish between the two.

Definition of Indicator Thresholds

The following section details the criteria used for the indicator cards shown for each county. The color guidelines should be used as references and not as absolutes or the ideal.

Reproduction Number

For each county the “Reproduction Number” card will be green if the upper bound of the reproduction number estimate is below 1. This indicates that there is a high probability that the spread of the disease is slowing down. Conversely, if the lower bound of the estimated reproduction number is above 1, then the card is red, as there is a high probability that the infection is spreading. If the interval between the lower and upper bounds of the estimated reproduction number includes 1, then the card is marked yellow. The target for the state and each county is for the card to be green, indicating a high probability that the spread of the virus is slowing.
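
The card rule can be written as a small function. This is an illustrative Python sketch rather than the dashboard's own code; the function name and the lowercase color strings are assumptions.

```python
def reproduction_card_color(lower, upper):
    """Color of the Reproduction Number card from the interval bounds."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if upper < 1:
        return "green"   # whole interval below 1: spread likely slowing
    if lower > 1:
        return "red"     # whole interval above 1: spread likely growing
    return "yellow"      # interval includes 1: direction is unclear


# Hypothetical interval bounds
print(reproduction_card_color(0.80, 0.95))
print(reproduction_card_color(0.90, 1.10))
print(reproduction_card_color(1.05, 1.30))
```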

Average Number of Daily Cases per 100k

The average number of daily cases per 100k uses the following thresholds:

| Average Number of Daily Cases per 100k | Color |
|---|---|
| Less than 10 | Green |
| 10 - 15 | Yellow |
| > 15 | Red |
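
As a worked illustration of the case thresholds, the Python sketch below computes an average daily rate per 100k from a list of daily counts and maps it to a color. The function names, the example counts, and the population figure are hypothetical, and the averaging period is simply whatever list of days is passed in.

```python
def daily_cases_per_100k(daily_cases, population):
    """Average daily cases over the supplied days per 100,000 residents."""
    if not daily_cases or population <= 0:
        raise ValueError("need at least one day and a positive population")
    return 100_000 * sum(daily_cases) / (len(daily_cases) * population)


def case_card_color(rate_per_100k):
    """Apply the thresholds: below 10 green, 10 to 15 yellow, above 15 red."""
    if rate_per_100k < 10:
        return "green"
    if rate_per_100k <= 15:
        return "yellow"
    return "red"


# Hypothetical county: 7 days of counts and 500,000 residents
rate = daily_cases_per_100k([60, 55, 70, 65, 50, 62, 58], 500_000)
print(rate, case_card_color(rate))
```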

Average Number of Daily Deaths per 100k

The average number of daily deaths per 100k uses the following thresholds, which are benchmarked against the reported number of daily cases:

| Average Number of Daily Deaths per 100k (as Fatality Rate Relative to Reported Cases) | Color |
|---|---|
| Less than 0.8% | Green |
| 0.8% - 1.2% | Yellow |
| > 1.2% | Red |

See Russell et al. (2020); Meyerowitz-Katz and Merone (2020); Verity et al. (2020) for more details regarding the case fatality and infection fatality rates.
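
The death thresholds can be sketched in the same way. The example below assumes that "benchmarked against the reported number of daily cases" means the average daily deaths divided by the average daily cases, expressed as a percentage. That reading, the function names, and the example values are assumptions for illustration.

```python
def fatality_rate_percent(avg_daily_deaths, avg_daily_cases):
    """Deaths as a percentage of reported cases (assumed benchmark)."""
    if avg_daily_cases <= 0:
        raise ValueError("average daily cases must be positive")
    return 100 * avg_daily_deaths / avg_daily_cases


def death_card_color(rate_percent):
    """Apply the thresholds: below 0.8% green, 0.8% to 1.2% yellow, above red."""
    if rate_percent < 0.8:
        return "green"
    if rate_percent <= 1.2:
        return "yellow"
    return "red"


# Hypothetical averages: 1 death per day against 100 cases per day
rate = fatality_rate_percent(1, 100)
print(rate, death_card_color(rate))
```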

Prior R Estimation Methodology

Prior to August 2022

Data Treatment

As discussed above, the impact of testing availability and the test positivity rate needs to be accounted for in our reported case counts. Ideally the percentage of positive tests would be as low as possible. A higher positivity rate indicates that we are likely not capturing all the cases within a community.

In order to correct the case count, we utilize methods proposed by Boyce et al. (2016) to correct for under-testing. First, a weighted generalised additive model is applied to the test positivity rate in order to smooth it. This method is applied to the overall state measures, though if the data were available at the county level, using them would be superior to applying the state data uniformly across all counties. County-specific data would capture the spatial heterogeneity of spread and would provide greater insight into the impact of testing on case counts on a county-by-county basis.

Using the method described by Boyce et al. (2016) we can increase the cases using the following formula:5

\[ \text{Adjusted Cases} = \text{Observed Cases} * \text{Positive Rate}^{k*m} \]

The values of k and m are calibrated against national and internal data and reviewed systematically.

Additionally, the overall average testing lag time is measured using the reported cases and specimen collection dates. While this method is not perfect, it provides insight into the likely delay across the state.

Calculating R

Finally, with the adjusted case data we can start estimating the reproduction number. For this analysis we use the methods put forward by Abbott et al. (2020) and operationalized in their R package EpiNow2. All analysis is done in the R statistical environment (R Core Team 2019). Additionally, this package utilizes the Stan probabilistic programming language for Bayesian inference (Carpenter et al. 2017). This package applies convolutions to account for testing delay and the observed serial interval, given literature values of the generation time, in order to estimate the reproduction number.
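
The role of the convolution can be illustrated with a short Python sketch. It shows only the forward direction: given a series of infections and a delay distribution, it computes how many reports would be expected on each day. EpiNow2 handles this inside a Bayesian model, so this is not its implementation. The function name and the example delay probabilities are hypothetical.

```python
def expected_reports(infections, delay_pmf):
    """Convolve daily infections with a reporting-delay distribution.

    delay_pmf[d] is the assumed probability that an infection is
    reported d days after it occurs (d = 0, 1, 2, ...).
    Reports that would fall after the last day are dropped.
    """
    n = len(infections)
    reports = [0.0] * n
    for t, new_infections in enumerate(infections):
        for d, p in enumerate(delay_pmf):
            if t + d < n:
                reports[t + d] += new_infections * p
    return reports


# Hypothetical example: 100 infections on day 0, reported over days 1 to 3
print(expected_reports([100, 0, 0, 0, 0], [0.0, 0.5, 0.3, 0.2]))
```

Because reports trail infections in this way, a change in transmission shows up in the case counts only after the delay has elapsed, which is why the delay has to be modeled when estimating the reproduction number from reported cases.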

Abbott, Sam, Joel Hellewell, Robin N. Thompson, Katharine Sherratt, Hamish P. Gibbs, Nikos I. Bosse, James D. Munday, et al. 2020. “Estimating the Time-Varying Reproduction Number of SARS-CoV-2 Using National and Subnational Case Counts.” Wellcome Open Research 5 (June): 112. https://doi.org/10.12688/wellcomeopenres.16006.1.
Boyce, Ross M., Raquel Reyes, Michael Matte, Moses Ntaro, Edgar Mulogo, Feng-Chang Lin, and Mark J. Siedner. 2016. “Practical Implications of the Non-Linear Relationship Between the Test Positivity Rate and Malaria Incidence.” PLoS ONE 11 (3). https://doi.org/10.1371/journal.pone.0152410.
Carpenter, Bob, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1): 1–32. https://doi.org/10.18637/jss.v076.i01.
Cori, A, NM Ferguson, C Fraser, and S Cauchemez. 2013. “A New Framework and Software to Estimate Time-Varying Reproduction Numbers During Epidemics.” Am. J. Epidemiol. https://doi.org/10.1093/aje/kwt133.
Cori, A, ZN Kamvar, J Stockwin, T Jombart, E Dahlqwist, R FitzJohn, and R Thompson. 2021. “EpiEstim v2.2-3: A Tool to Estimate Time Varying Instantaneous Reproduction Number During Epidemics.” GitHub Repository. https://github.com/mrc-ide/EpiEstim.
DeWitt, Michael. 2020. Nccovid: Pull North Carolina Covid-19 Outbreak Information. https://github.com/conedatascience/nccovid.
Di Domenico, Laura, Giulia Pullano, Chiara E. Sabbatini, Pierre-Yves Boëlle, and Vittoria Colizza. 2020. “Impact of Lockdown on COVID-19 Epidemic in Île-de-France and Possible Exit Strategies.” BMC Medicine 18 (1): 240. https://doi.org/10.1186/s12916-020-01698-4.
Gostic, Katelyn M, Lauren McGough, Ed Baskerville, Sam Abbott, Keya Joshi, Christine Tedijanto, Rebecca Kahn, et al. n.d. “Practical Considerations for Measuring the Effective Reproductive Number, Rt,” 21.
Health, Cone. n.d. “The Network for Exceptional Care.” http://www.conehealth.com/.
Metcalf, C. Jessica E., Dylan H. Morris, and Sang Woo Park. 2020. “Mathematical Models to Guide Pandemic Response.” Science 369 (6502): 368–69. https://doi.org/10.1126/science.abd1668.
Meyerowitz-Katz, Gideon, and Lea Merone. 2020. “A Systematic Review and Meta-Analysis of Published Research Data on COVID-19 Infection Fatality Rates.” International Journal of Infectious Diseases 101 (December): 138–48. https://doi.org/10.1016/j.ijid.2020.09.1464.
Pollán, Marina, Beatriz Pérez-Gómez, Roberto Pastor-Barriuso, Jesús Oteo, Miguel A. Hernán, Mayte Pérez-Olmeda, Jose L. Sanmartín, et al. 2020. “Prevalence of SARS-CoV-2 in Spain (ENE-COVID): A Nationwide, Population-Based Seroepidemiological Study.” The Lancet 0 (0). https://doi.org/10.1016/S0140-6736(20)31483-5.
R Core Team. 2019. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Russell, Timothy W., Joel Hellewell, Christopher I. Jarvis, Kevin van Zandvoort, Sam Abbott, Ruwan Ratnayake, CMMID COVID-19 working Group, et al. 2020. “Estimating the Infection and Case Fatality Ratio for Coronavirus Disease (COVID-19) Using Age-Adjusted Data from the Outbreak on the Diamond Princess Cruise Ship, February 2020.” Eurosurveillance 25 (12): 2000256. https://doi.org/10.2807/1560-7917.ES.2020.25.12.2000256.
Stringhini, Silvia, Ania Wisniak, Giovanni Piumatti, Andrew S Azman, Stephen A Lauer, Helene Baysson, David De Ridder, et al. 2020. “Repeated Seroprevalence of Anti-SARS-CoV-2 IgG Antibodies in a Population-Based Sample from Geneva, Switzerland.” Preprint. Infectious Diseases (except HIV/AIDS). https://doi.org/10.1101/2020.05.02.20088898.
Thompson, Robin N., T. Déirdre Hollingsworth, Valerie Isham, Daniel Arribas-Bel, Ben Ashby, Tom Britton, Peter Challenor, et al. 2020. “Key Questions for Modelling COVID-19 Exit Strategies.” Proceedings of the Royal Society B: Biological Sciences 287 (1932): 20201405. https://doi.org/10.1098/rspb.2020.1405.
Verity, Robert, Lucy C Okell, Ilaria Dorigatti, Peter Winskill, Charles Whittaker, Natsuko Imai, Gina Cuomo-Dannenburg, et al. 2020. “Estimates of the Severity of Coronavirus Disease 2019: A Model-Based Analysis.” The Lancet Infectious Diseases 20 (6): 669–77. https://doi.org/10.1016/S1473-3099(20)30243-7.
Zelner, Jon, Julien Riou, Ruth Etzioni, and Andrew Gelman. 2020. “Accounting for Uncertainty During a Pandemic.” arXiv:2006.08745 [Physics, q-Bio, Stat], June. http://arxiv.org/abs/2006.08745.

  1. See Thompson et al. (2020) for a much larger discussion around these questions and models during a pandemic.

  2. There have been some large-scale seroprevalence surveys that have provided great insight into the heterogeneous spread of SARS-CoV-2, as detailed for Switzerland in Stringhini et al. (2020) and for Spain in Pollán et al. (2020). However, such studies are not available for North Carolina at this time, though there is an active study being run by Wake Forest University Baptist Health trying to answer this question.

  3. See Gostic et al. (n.d.) for a much longer discussion of some of these constraints and the differences in modeling methodologies.

  4. See Zelner et al. (2020) on why reinforcing uncertainty is so important.

  5. Peter Ellis has a fantastic blog post that illustrates this method and associated calibration requirements. See http://freerangestats.info/blog/2020/05/09/covid-population-incidence for details.

References

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".