Introduction
Zika virus (ZIKV) is a flavivirus transmitted primarily by Aedes species mosquitoes. Although typically characterised by a mild illness, ZIKV infection has been associated with severe birth defects, including microcephaly, and with Guillain–Barré syndrome in adults [Reference Rasmussen1, Reference Santos2]. ZIKV infections spread rapidly throughout Central and South America beginning in 2015. Local mosquito-borne transmission has been reported in a majority of these countries [3] and occurred in the continental USA in Florida and Texas in 2016 [Reference Likos4, 5]. Although Aedes aegypti mosquitoes are not found in New York City (NYC), Aedes albopictus mosquitoes are routinely detected and might be a competent vector for ZIKV [Reference Bajwa6–Reference Azar8]. A modelling study conducted in 2016 suggested that NYC might be the destination city at third highest risk worldwide for local ZIKV establishment if the competence of the A. albopictus vector is similar to that of A. aegypti [Reference Gardner, Chen and Sarkar7]. In 2016, given the large number of imported ZIKV infections [9] and a lack of clarity regarding the competence of A. albopictus as a vector for ZIKV transmission [10, Reference Jupille11], the NYC Department of Health and Mental Hygiene (DOHMH) began to prepare for the possibility of local mosquito-borne transmission [Reference Lee12].
During April–October 2016, DOHMH conducted enhanced surveillance for local mosquito-borne ZIKV transmission through sentinel surveillance, syndromic surveillance of chief complaints of patients presenting to NYC emergency departments and routine mapping and analyses of human cases and mosquito data [Reference Lee12–Reference Greene, Lim and Fine14]. In August 2016, DOHMH began contingency planning for a urosurvey to detect ZIKV RNA in urine from residents in any area where a mosquito-borne, locally acquired case of confirmed ZIKV infection was suspected in an NYC resident. A urosurvey, as opposed to a serosurvey, was considered appropriate for ZIKV detection because of the ease of specimen collection, cross-reactivity problems with serological assays and the high sensitivity of urine polymerase chain reaction (PCR) soon after infection [Reference Gourinat15, Reference Priyamvada16]. In 2016, when local transmission was first suspected in Florida, Texas and Utah, urosurveys were conducted to screen all residents within a 150–300 m radius of the residence of the confirmed case of suspected local transmission [Reference Likos4, Reference Brent17] (personal communication: Tom Sidwa, DVM, MPH, Texas State Public Health Veterinarian, 29 June 2017). These urosurveys were conducted in low population density areas. In contrast, NYC has a population of over 8.5 million persons and a population density of over 27 000 persons/square mile city-wide and over 69 000 persons/square mile in the borough of Manhattan [18].
In a high population density environment, collecting and testing urine specimens from all persons residing in a suspected risk area around the residence of a case might not be feasible because of resource limitations. To develop a contingency plan for conducting a urosurvey in such a setting, we estimated sample size requirements. Preparations included establishing the capability to rapidly estimate the population size in any suspected risk area, to calculate the number of persons who would need to be tested to substantiate freedom from locally acquired ZIKV infection in NYC and to estimate the number of households that would need to be approached to achieve the required sample size.
Materials and methods
A confirmed case of ZIKV infection in an NYC resident with no reported history of travel to an area with active ZIKV circulation, no sexual exposure and no suspected exposure through blood transfusion would be considered to represent suspected local mosquito-borne transmission. In the event of two such cases with transmission suspected to have occurred within one mile, a circular suspected risk area with a 150-m radius around each residence or suspected exposure location would be established to search for additional possible cases.
Geographic information system (GIS) application to estimate population size in the suspected risk area
For the purposes of a urosurvey, we determined we would first attempt to quickly and accurately estimate the population living within any given 150-m radius in NYC. This radius corresponds to the typical lifetime flight distance of Aedes mosquitoes [19, Reference Marini20]. We leveraged existing NYC infrastructure and emergency planning data developed and maintained by the DOHMH GIS Center, NYC Department of Information Technology and Telecommunications; the NYC Office of Emergency Management; and the NYC DOHMH Office of Emergency Preparedness and Response (OEPR). As part of OEPR's mission to prevent, prepare for, respond to and recover from health emergencies in NYC, a program known as the Post-Emergency Canvassing Operation [21] relies on a detailed spatial dataset to facilitate rapid population surveys to determine areas at greatest risk of adverse health events in a post-emergency setting. We took advantage of these existing city resources to develop a GIS-based application for population estimation. The application references open source data regarding the location of address points [22], joined with geocoded US Postal Service data to determine the number of residential units in buildings in the suspected risk area [23]. We estimated the population size in the suspected risk area by using a series of crosswalks to assign a household size to each residential unit.
First, each address point in the suspected risk area was joined to a tax lot in NYC. To assign the mean 2010 US Census block household size to each address point [24], we linked the tax lot of each address point to the census block containing that tax lot. Each residential unit (assuming 100% occupancy) was then assigned the mean census block household size, and the total population was derived by summing the estimated household sizes across all residential units.
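The final summation step can be illustrated with a minimal Python sketch. The record layout, field names and values below are hypothetical and stand in for the joined address-point, postal and census data; the sketch is not the DOHMH GIS application.

```python
from dataclasses import dataclass


@dataclass
class AddressPoint:
    """One address point inside the suspected risk area (hypothetical record)."""
    residential_units: int            # unit count, e.g. from postal delivery data
    block_mean_household_size: float  # mean household size of the linked census block


def estimate_population(points):
    """Sum units x mean block household size, assuming 100% occupancy."""
    return sum(p.residential_units * p.block_mean_household_size for p in points)


# Hypothetical example: a single-family home, a two-unit house and a 40-unit building.
example_points = [
    AddressPoint(residential_units=1, block_mean_household_size=3.0),
    AddressPoint(residential_units=2, block_mean_household_size=2.5),
    AddressPoint(residential_units=40, block_mean_household_size=2.0),
]
estimated_population = estimate_population(example_points)
print(estimated_population)
```

Because every unit is treated as occupied, an estimate of this kind is an upper bound on the residential population for a given set of unit counts.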
Method for substantiating freedom from infection
Methods to calculate the sample size for a one-sample proportion test often rely on the normal approximation to the binomial distribution, which is only appropriate if the prevalence is not very low or high (e.g. where 0.2 ⩽ prevalence ⩽0.8) or if the sample is very large [25]. In our use case, the expected prevalence of additional local transmission cases was near zero, so we looked to methods commonly used by veterinary epidemiologists to perform sampling of livestock herds to determine if they meet livestock trade requirements and to document freedom from infection after an outbreak [Reference Cameron and Baldock26, Reference Ziller27]. We used the function computeOptimalSampleSize within a package called Freedom from Disease (FFD) in the statistical software R (R Foundation for Statistical Computing, Vienna, Austria, 2017) to calculate sample size [Reference Kopacka28, Reference Kopacka29]. This method uses a hypergeometric function, which is appropriate for sampling without replacement from a population of known size, modified to account for the imperfection of the diagnostic assay [Reference Cameron and Baldock26]. After data are collected, the upper limit of the confidence interval for the ZIKV prevalence estimate would be calculated using the Clopper-Pearson method for a hypergeometric distribution [Reference Wang30, Reference Korn and Graubard31].
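The logic of a hypergeometric calculation adjusted for imperfect sensitivity can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions (perfect specificity, a known number of infected persons at the design prevalence, simple random sampling without replacement); it is not the code of the FFD package, and the function names are hypothetical.

```python
from math import comb


def prob_all_negative(n, population, infected, sensitivity):
    """Probability that a sample of n persons yields no positive result when
    `infected` persons in the population are infected (perfect specificity).

    The number of infected persons drawn (y) is hypergeometric; each one is
    missed by the assay with probability (1 - sensitivity).
    """
    total = comb(population, n)
    prob = 0.0
    for y in range(0, min(infected, n) + 1):
        ways = comb(infected, y) * comb(population - infected, n - y)
        prob += (ways / total) * (1.0 - sensitivity) ** y
    return prob


def min_sample_size(population, infected, sensitivity, alpha=0.05):
    """Smallest n with prob_all_negative <= alpha, or None if even testing
    everyone cannot reach alpha. Uses binary search, since the probability
    of an all-negative sample can only fall as n grows."""
    if prob_all_negative(population, population, infected, sensitivity) > alpha:
        return None
    lo, hi = 1, population
    while lo < hi:
        mid = (lo + hi) // 2
        if prob_all_negative(mid, population, infected, sensitivity) <= alpha:
            hi = mid
        else:
            lo = mid + 1
    return lo


# Inputs matching the Area B example reported in the Results
# (population 4453, 10 permitted missed cases, assay sensitivity 92.6%).
area_b_samples = min_sample_size(4453, 10, 0.926)
print(area_b_samples)
```

Under these assumptions the search should land at or very near the sample size reported for Area B; small differences from the FFD output could arise from how the design prevalence is rounded to a whole number of infected persons.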
Inputs required for sample size calculation
(1) Estimated population size. Underlying population size varies on the basis of location of any given suspected risk area.
(2) Design prevalence. For illustrative purposes, the maximum number of permitted cases that could be missed in any suspected risk area was determined to be 10 by DOHMH epidemiologists and specialists in logistics and emergency operations. Therefore, the design prevalence, i.e. the minimal prevalence expected if local mosquito-borne ZIKV transmission occurred, was inputted as: 10/estimated population size for any given suspected risk area. This input can be modified as needed, depending on the specific context.
(3) Alpha level. Significance level, or risk of false rejection of a true null hypothesis, was set to 0.05. The lower the alpha level, the greater the certainty that the number of missed cases was lower than specified.
(4) Specificity of urine PCR assay. As is common practice in surveys substantiating freedom from infection, the specificity of the assay was assumed to be 100%, which leads to perfect statistical power [Reference Kopacka28]. The power of the statistical test to assess whether a population is disease-free is related to the specificity of the diagnostic assay (power = 1 - type II error (β)). As the diagnostic test specificity increases (fewer false positives), the value of β decreases; thus, the power increases.
(5) Sensitivity of urine PCR diagnostic assay. CDC's Trioplex Real-time RT-PCR Assay was used for the qualitative detection of ZIKV RNA from serum and urine under the Food and Drug Administration's Emergency Use Authorisation [32]. We estimated the sensitivity of the urine PCR assay by evaluating data collected from persons reported with ZIKV to DOHMH in 2016. Among 377 patients with confirmed ZIKV who had both serum and urine samples collected <14 days from symptom onset during 1 January–28 October 2016, either on the same day or within a single day of each other, and tested by DOHMH, 349 (92.6%) were positive by urine PCR and 120 (31.8%) were positive by serum PCR. A total of 92 (24.4%) patients tested positive by both. Therefore, the sensitivity of the urine PCR assay was assumed to be 92.6%. We evaluated a range of the sensitivity of the diagnostic assay to assess the effect of our assumed sensitivity on the final sample size.
(6) Participation rate. In 1999, when West Nile virus was introduced into NYC, DOHMH conducted a household-based serosurvey among 1861 homes. In that survey, a total of 1069 (57%) homes had an adult present at the time of approach. Of those households, 470 (44%) participated [Reference Mostashari33]. Because a urosurvey is less invasive than a serosurvey, we assumed that urosurvey participation would be higher, at 60%. Therefore, we inflated the number of samples output by the sample size calculations to estimate the number of persons who would need to be approached, assuming 60% would agree to participate. To estimate the number of households to be approached, we first divided the total number of persons who would need to be approached by the mean census block household size [24]. We then assumed that only 57% of households would have an adult at home at the time of approach and inflated the number of households to be approached accordingly.
Sampling scenarios
If we could approach households in all identified buildings in the suspected risk area, then we would perform a simple random sample of households. For multi-unit apartment buildings, if we could obtain a list of residences within each building from building management, then households would be selected either randomly or systematically.
Exclusions
Persons who recently travelled or had sex with someone who recently travelled to a country with local ZIKV transmission would be excluded because of the focus on identifying local mosquito-borne ZIKV infection. All other household residents would be approached for sampling. Participation would be completely voluntary and with informed consent.
Representativeness of the collected data
As part of the urosurvey, data on age and sex would be collected for each participant. Distribution of these demographic characteristics would be compared using chi-square tests with the underlying demographic distribution, as determined by census data [24], to assess the representativeness of the sample. If the urosurvey sample was nonrepresentative by age or sex, post-stratification weighting adjustments would be performed to reduce potential nonresponse bias.
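A minimal Python sketch of this representativeness check is shown below for a single two-category variable (sex). The counts and census shares are hypothetical, the test shown is a chi-square goodness-of-fit test against fixed census proportions, and the weights are simple one-variable post-stratification weights; an actual analysis might stratify jointly by age and sex.

```python
def chi_square_statistic(observed_counts, population_shares):
    """Goodness-of-fit statistic comparing sample counts with census shares."""
    n = sum(observed_counts.values())
    stat = 0.0
    for group, observed in observed_counts.items():
        expected = n * population_shares[group]
        stat += (observed - expected) ** 2 / expected
    return stat


def post_stratification_weights(observed_counts, population_shares):
    """Weight per group = population share / sample share."""
    n = sum(observed_counts.values())
    return {
        group: population_shares[group] / (observed / n)
        for group, observed in observed_counts.items()
    }


# Hypothetical urosurvey sample and census distribution by sex.
observed_by_sex = {"female": 60, "male": 40}
census_shares = {"female": 0.5, "male": 0.5}

# 0.95 quantile of the chi-square distribution with 1 degree of freedom.
CRITICAL_VALUE_DF1 = 3.841

statistic = chi_square_statistic(observed_by_sex, census_shares)
nonrepresentative = statistic > CRITICAL_VALUE_DF1
weights = post_stratification_weights(observed_by_sex, census_shares)
print(statistic, nonrepresentative, weights)
```

In this hypothetical sample, women are over-represented, so female participants receive a weight below 1 and male participants a weight above 1.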
Suspected example risk areas
We selected three suspected example risk areas in NYC on the basis of varying combinations of the number of residential units (Fig. 1a) and the number of residential units per building (Fig. 1b). We chose example areas with the following profiles: (i) low numbers of residential units of predominately single family homes, (ii) high numbers of residential units with residences distributed across multiple low-rise residential buildings and (iii) high numbers of residential units with residences concentrated in a limited number of high-rise apartment complexes.
CDC reviewed this study for human subjects protection and deemed it to be nonresearch.
Results
An in-house GIS application facilitated rapid quantification and visualisation of the number of buildings and residential units within any given 150-m radius in NYC (Fig. 2). Estimated population sizes of the three suspected example risk areas ranged from 479 in Area A to 4453 in Area B (Table 1).
Table 1 notes:
a From New York City Department of Information Technology and Telecommunications [22].
b From US Postal Service data [23].
c Derived by assigning mean census block household size to all US Postal Service residential doorways included in the 150-m radius [23, 24].
d Using a modified hypergeometric function, simple random sampling, assuming a maximum of 10 missed cases in each location and a diagnostic assay sensitivity of 92.6%.
e Assuming 60% participation rate.
f Census 2010 data [24].
g Assuming 57% of households have an adult at home when approached for participation.
The minimum required number of urine samples to substantiate freedom from infection ranged from 133 in Area A to 1244 in Area B. This volume was considered manageable by NYC emergency operations and laboratory partners.
Using Area B as an example, if 1244 urine samples from a population of 4453 in the 150-m suspected risk area all tested negative, we could conclude at the 0.05 significance level that a maximum of 10 cases (0.2% of the population) had been missed. To account for nonparticipation, 2073 persons in Area B would need to be approached for participation. This corresponds to 1415 households, assuming a mean household size of 2.57 persons in Area B and that 57% of approached households have an adult at home at the time of the survey.
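The inflation arithmetic for Area B can be checked with a short Python sketch. The function names are hypothetical, and rounding to the nearest whole number is an assumption chosen because it reproduces the reported figures; rounding up would be slightly more conservative.

```python
def persons_to_approach(samples_needed, participation_rate):
    """Inflate the required number of samples for expected nonparticipation."""
    return round(samples_needed / participation_rate)


def households_to_approach(persons, mean_household_size, at_home_rate):
    """Convert persons to households, then inflate for households
    with no adult at home at the time of approach."""
    return round(persons / mean_household_size / at_home_rate)


# Area B inputs as reported: 1244 samples, 60% participation,
# mean household size 2.57 and 57% of households with an adult at home.
persons = persons_to_approach(1244, 0.60)
households = households_to_approach(persons, 2.57, 0.57)
print(persons, households)  # 2073 1415
```

If incoming data suggested a lower participation rate, only the `participation_rate` argument would need to change to obtain a revised number of households.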
Sensitivity analyses demonstrated that the minimum required number of samples increased as the assumed sensitivity of the urine PCR diagnostic assay decreased (Table 2). In Area B, for example, assuming 100% sensitivity of the diagnostic assay yielded a required sample size of 1152. Reducing assumed sensitivity from 100% to 80% led to a 25% increase in required sample size (n = 1440). Reducing the assumed sensitivity from 100% to 70% resulted in a 43% increase in required sample size (n = 1646).
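The pattern in this sensitivity analysis can be reproduced with a closed-form approximation to the hypergeometric calculation. The formula below is a commonly cited approximation for an imperfect test with perfect specificity; its use here is an assumption for illustration and is not the FFD routine itself, although it returns the same Area B sample sizes as those reported.

```python
from math import ceil


def approx_sample_size(population, max_missed, sensitivity, alpha=0.05):
    """Approximate sample size to substantiate freedom from infection.

    n ~= (1 - alpha^(1/D)) * (N - (D*Se - 1) / 2) / Se
    where N is the population size, D the number of infected persons at the
    design prevalence and Se the assay sensitivity (specificity assumed 100%).
    """
    detectable = max_missed * sensitivity
    n = (1 - alpha ** (1 / max_missed)) * (population - (detectable - 1) / 2) / sensitivity
    return ceil(n)


# Area B: population 4453, at most 10 missed cases, alpha = 0.05.
sample_sizes = {
    se: approx_sample_size(4453, 10, se) for se in (1.0, 0.926, 0.8, 0.7)
}
baseline = sample_sizes[1.0]
for se, n in sample_sizes.items():
    increase = 100 * (n - baseline) / baseline
    print(f"sensitivity {se:.3f}: n = {n} ({increase:.0f}% above 100% sensitivity)")
# sample_sizes == {1.0: 1152, 0.926: 1244, 0.8: 1440, 0.7: 1646}
```

Because the required sample size scales approximately with 1/sensitivity, each reduction in assumed sensitivity produces a proportionally larger sample.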
Table 2 note:
a In the primary analysis, the sensitivity of the urine PCR assay was assumed to be 92.6%.
Discussion
In 2016, local mosquito-borne transmission of ZIKV was considered possible in NYC. As resources would likely not allow for sampling all persons residing in a suspected risk area in this dense, urban setting, NYC DOHMH developed a sampling plan reliant on detailed population spatial data and sample size methods for rare events. To substantiate freedom from infection in three suspected example risk areas in NYC, the minimum number of required urosurvey samples ranged from 133 to 1244.
To accurately estimate the population size residing in any 150-m radius within NYC, we used an internal mapping platform leveraging existing, detailed population spatial data from local and national sources. This approach can be useful in other emergency responses requiring estimation of the number of persons at-risk, such as water contamination events and man-made or natural disasters. Investigators in Taiwan developed a GIS dengue surveillance platform to estimate a population in a suspected risk area that relies on open source address data and current and historical public health surveillance data to aid in real-time dengue prevention and control decisions [Reference Chan34].
Second, we needed to determine how many persons would have to be tested to rule out ongoing disease transmission; therefore, we used a method commonly used to substantiate freedom from infection in animal herds [Reference Cameron and Baldock26, Reference Ziller27]. The FFD package in the statistical software R is available at no cost and is readily implemented when reasonable assumptions can be made regarding population size and the sensitivity and specificity of the diagnostic assay [Reference Cameron and Baldock26]. Beyond ZIKV, these methods can be used for seroprevalence surveys for infections (e.g. novel influenza or other imported diseases) with extremely low anticipated prevalence.
Based on DOHMH data for patients tested <14 days from symptom onset, we assumed that the sensitivity of the urine PCR assay was 92.6%. However, this estimate might be too high, because data from Puerto Rico indicated that most patients clear ZIKV RNA from urine by 8 days (95% confidence interval 6.4–10.0 days) after symptom onset [Reference Paz-Bailey35]. Our sensitivity analysis revealed that reducing the assumed sensitivity of the diagnostic assay increased the final sample size estimate; therefore, a conservative approach would be to assume the lowest diagnostic assay sensitivity that is reasonable. We also assumed that the specificity of the urine PCR assay was 100%. According to CDC, despite the specificity of molecular testing, false positive PCR-based assay results have been reported in rare cases and might depend on the type of assay performed and the patient population (i.e. patients with limited or no prevalence of viral transmission) being tested [36].
Sample size calculations relied on several additional assumptions. First, we assumed that the estimated population sizes were sufficiently accurate. However, the 2010 US Census and 2014 US Postal Service data used to estimate the population size might have been outdated, resulting in underestimates of the population at-risk residing in areas of the city experiencing population increases. Additionally, we might have overestimated the population size by assuming every residential unit was occupied. Population estimates also included persons ineligible for urosurvey participation (recent travellers) and excluded persons who did not reside in but worked in or otherwise spent time in a suspected risk area.
Second, we assumed that it would be acceptable for the urosurvey to miss ⩽10 locally acquired ZIKV cases living within a suspected risk area. To assert that as few as 1–2 cases were missed would have required surveying all persons with a near-perfect participation rate, which is unrealistic. The choice of ⩽10 missed cases was a compromise, balancing available financial and staff resources (particularly if multiple urosurveys were operational simultaneously), statistical confidence in not having missed cases, laboratory capacity and local political considerations. The design prevalence input is easily modifiable, however, depending on context.
Third, we implicitly assumed that the risk of contracting ZIKV is homogeneous for all residents within a suspected risk area and that the risk for survey participants is the same as for nonparticipants. We did not account for within-household or within-building clustering, because we assumed exposure to an infected mosquito would most likely occur outside. Homes in the outer boroughs of NYC (Brooklyn, the Bronx, Queens and Staten Island) typically have screens, and large buildings in Manhattan typically have air conditioning. Furthermore, A. albopictus mosquitoes typically feed outdoors without entering homes or flying higher than 35 feet [Reference Obenauer37]. If a subset of residents is thought to be at increased risk, these residents could be more intensively sampled than the rest of the population for additional assurance that no cases are missed. Furthermore, if incoming sample data during an active urosurvey suggest that the participation rate is considerably lower than 60%, we can modify this input and increase the number of households to approach.
Our sample size calculation efforts were one component of a complete urosurvey contingency plan for suspected local ZIKV transmission. In addition, DOHMH community outreach staff developed operational plans for door-to-door outreach in the suspected risk area, logistics staff worked to define processes regarding specimen collection and transport, laboratory staff developed a laboratory testing protocol and communications specialists helped to define inter-agency and public communication strategies. A Zika testing community site plan was developed to facilitate the collection of urine samples from residents who are unable to provide a sample at the time of the initial home visit. Furthermore, neighbourhood awareness activities such as mosquito breeding site prevention and education to reduce mosquito bites would be conducted by environmental health colleagues within DOHMH. Much of this effort was supported by NYC's incident command system infrastructure, which relied upon agency and citywide resources for local transmission contingency planning and ZIKV surveillance and vector control [Reference Lee12].
Conclusions
The availability of detailed population spatial data and specialised methods to calculate the sample size necessary to substantiate freedom from infection supported a contingency plan to quickly respond to suspected mosquito-borne, local ZIKV transmission in NYC. Other jurisdictions that are unable to feasibly sample all residents in a suspected risk area can adapt these methods to rapidly estimate an appropriate sample size to substantiate freedom from infection in the event of suspected locally acquired ZIKV. Although ZIKV transmission declined extensively in Central and South America in 2017, reducing the risk of importation to NYC [38], this exercise improved our capability to rule out local transmission of emerging infections (such as other vectorborne or respiratory infections) in a dense, urban area where sampling the entire population is not feasible.
Acknowledgements
We sincerely thank Drs Marcelle Layton and Don Weiss (Bureau of Communicable Disease, NYC DOHMH) for oversight and critical input and Dr. Brad Biggerstaff (Division of Vectorborne Diseases, CDC, Fort Collins, Colorado, USA) for his advice regarding sampling methods and software. This investigation received no specific grant from any funding agency, commercial or not-for-profit sectors.
Conflict of interest
The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.