Yale University

Estimating the Size of Hidden and Hard-to-Reach Populations using Respondent-driven Sampling

Principal Investigator(s):

Funder: NIMH through CIRA's Pilot Project Program
Project period: 02/20/2015 - 02/19/2016
Grant Type: Pilot Project

Abstract Text:

A major unresolved problem in population epidemiology – especially epidemiology of HIV – is accurate estimation of the number of people who are members of key risk populations. When these populations are stigmatized or criminalized, there may be no “sampling frame” from which to draw a representative sample. Several methods for hidden population size estimation exist, including the multiplier method, capture-recapture, and the network scale-up estimator. Unfortunately, these techniques rely on unrealistic modeling assumptions or require a random sample to ensure validity. Use of existing tools for inferring hidden population size can produce dramatically biased estimates. The lack of robust alternatives has hindered epidemiology for hidden and hard-to-reach risk groups.

Respondent-driven sampling (RDS) is a widely used procedure for recruiting members of hidden populations into a research study. In RDS, subjects recruit acquaintances, and the recruitment process spreads through the social network of the target population. However, there is no reliable method for estimating hidden population sizes from data obtained by RDS. Researchers must conduct a separate study to estimate population sizes, which can be prohibitively expensive or logistically impossible.

I propose to develop rigorous statistical methodology and software to estimate population sizes directly from an RDS study. The key insight is that the pattern of recruitments and the network sizes of subjects together provide information about the number of individuals in the target population who are not in the RDS sample. I will validate the method using simulated data and a study in which the size of the target population is known. I will apply the technique to RDS studies of injection drug users in Connecticut; St. Petersburg, Russia; and Kohtla-Järve, Estonia. This pilot project will provide preliminary results and statistical software for a more ambitious R01 proposal on social network inference in public health.