Yale University

Network-Based Epidemiology for Hidden and Hard-To-Reach Populations

Principal Investigator(s):

Funder: Office of the Director, National Institutes of Health
Project period: 09/14/2016 - 08/31/2021
Grant Type: Research

Abstract Text:

Hidden or hard-to-reach populations such as sex workers, men who have sex with men, or people who inject drugs suffer from a disproportionately high burden of adverse health outcomes, but are the most difficult to study. Members of these groups are often socially stigmatized or legally criminalized, so potential subjects are not directly enumerable and random sampling is usually impossible. For this reason, researchers have developed survey techniques that rely on tracing social network links between individuals. The most popular technique is respondent-driven sampling (RDS), which is widely used in epidemiological and clinical research on HIV, HBV, HCV, and syphilis; on tobacco, alcohol, and illicit drug use; and on access to treatment. RDS is also used by the CDC for HIV surveillance in the US, and by UNAIDS/WHO internationally. Remarkably, most of the network information contained in these samples is either discarded or misapplied in standard approaches to estimating characteristics of the target population from RDS data. Methodological research on RDS is focused on two related inferential targets: social network characteristics (e.g. clustering, degree, centrality) and population-level quantities (e.g. HIV prevalence, total population size). Nevertheless, accurate estimation of network structures, disease rates, and risk factors in high-risk hidden and hard-to-reach populations remains a major unsolved problem in public health. In this proposal, I outline a plan to develop rigorous methodology for social epidemiology from social link-tracing designs in hidden and hard-to-reach populations. The key insight in this work is that RDS reveals structural information about the target population's social network that can be used to dramatically improve epidemiological inference.
I will begin by showing that existing statistical approaches to analyzing data from link-tracing studies rely on unrealistic assumptions, neglect important observable data, and produce estimates that suffer from serious bias. By rigorously characterizing the observed and missing network data for each sampling process, I will provide statistical and mathematical tools that allow researchers to leverage the network data revealed by RDS. The approach allows accurate estimation of population averages (e.g. HIV prevalence), assessment of risk factors associated with epidemiological outcomes using network regression, hidden population size estimation, and geospatial mapping of risk and health outcomes. This network-based perspective is a radical departure from established approaches to RDS and has the potential to revolutionize the way epidemiologists collect and analyze data from surveys of hidden and hard-to-reach risk groups. Finally, I will develop free, open-source, web-based software for the design and analysis of RDS studies that will be available to anyone anywhere in the world. Preliminary application of these ideas to empirical studies in real-world risk populations has already yielded promising results. The proposed work is innovative because it leverages previously neglected network information collected by every RDS study and uses it to dramatically improve the accuracy and precision of population-level estimates for key risk populations in public health.