Building a National HIV Cohort from Routine Laboratory Data: Probabilistic Record-Linkage with Graphs

ABSTRACT

Background. Chronic disease management requires the ability to link patient records across multiple interactions with the health sector. South Africa’s National Health Laboratory Service (NHLS) conducts all routine laboratory monitoring for the country’s national public-sector HIV program. However, the absence of a validated patient identifier has limited the potential of the NHLS database for epidemiological research, policy evaluation, and longitudinal patient care. We developed and validated a record-linkage algorithm that creates a unique patient identifier, enabling analysis of the NHLS database as a national HIV cohort. To our knowledge, this is the first national HIV cohort in any low- or middle-income country.

Methods. We linked data on all CD4 counts, HIV viral loads (VL), and ART workup laboratory tests from 2004 to 2016. Each NHLS laboratory test result is associated with a name, sex, date of birth (DOB), gender, and facility. However, because of typographical and other errors, and because patients move between facilities, specimens from the same patient may be associated with different sets of identifying information. We developed a graph-based probabilistic record-linkage algorithm and used it to construct a unique identifier for all patients with laboratory results in the national HIV program. We used standard probabilistic linkage methods with Jaro-Winkler string comparisons and weights informed by response frequency. We also used graph concepts to guide the linkage in determining whether a cluster of patient specimens could plausibly reflect a single patient. This approach allows matching thresholds to vary with the density of the network and limits over-matching.

To train and validate our approach, we constructed a quasi-gold standard based on manual review of 59,000 candidate matches associated with 1000 randomly sampled specimens. These data were divided into training and validation sets.
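The linkage steps described above (string comparison, frequency-informed weights, graph clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a Fellegi-Sunter style score in which a field that agrees contributes log2(m/u), with u the relative frequency of the value, so rarer values count as stronger evidence; names agree when their Jaro-Winkler similarity reaches a cut-off, DOB and sex when they are equal. Pairs above a threshold become graph edges, and connected components are treated as patients. The density rule (a large but sparse component is re-split at a stricter threshold, as a guard against over-matching), the field names, and every parameter value (m = 0.95, cut-off 0.9, threshold 5.0, `min_density`, `step`) are hypothetical choices made for illustration.

```python
import math
from collections import Counter

FIELDS = ("name", "surname", "dob", "sex")  # hypothetical record layout


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: 1.0 for identical strings, 0.0 if no characters match."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)  # how far apart matches may be
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the matched characters that appear in a different order.
    m1 = [c for c, u in zip(s1, used1) if u]
    m2 = [c for c, u in zip(s2, used2) if u]
    t = sum(a != b for a, b in zip(m1, m2)) / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Boost the Jaro score for a shared prefix of up to four characters."""
    j = jaro(s1, s2)
    prefix = 0
    while prefix < min(4, len(s1), len(s2)) and s1[prefix] == s2[prefix]:
        prefix += 1
    return j + prefix * p * (1 - j)


def field_freqs(records):
    """Relative frequency u of each observed value, per field."""
    n = len(records)
    return {f: {v: c / n for v, c in Counter(r[f] for r in records).items()} for f in FIELDS}


def pair_score(a, b, freqs, m=0.95, jw_cut=0.9):
    """Fellegi-Sunter style log2 weight; rarer agreeing values count for more."""
    score = 0.0
    for f in FIELDS:
        u = freqs[f][a[f]]
        if f in ("name", "surname"):
            agree = jaro_winkler(a[f], b[f]) >= jw_cut
        else:
            agree = a[f] == b[f]
        score += math.log2(m / u) if agree else math.log2((1 - m) / (1 - u))
    return score


def components(nodes, edges):
    """Connected components via union-find with path halving."""
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for v in nodes:
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())


def cluster(nodes, scores, t, min_density=0.5, step=2.0, rounds=3):
    """Link pairs scoring >= t; hypothetically re-split sparse clusters at t + step."""
    inside = set(nodes)
    edges = [(a, b) for (a, b), s in scores.items() if s >= t and a in inside and b in inside]
    out = []
    for comp in components(nodes, edges):
        k, members = len(comp), set(comp)
        e = sum(1 for a, b in edges if a in members and b in members)
        density = e / (k * (k - 1) / 2) if k > 1 else 1.0
        if k > 2 and density < min_density and rounds > 0:
            out.extend(cluster(comp, scores, t + step, min_density, step, rounds - 1))
        else:
            out.append(sorted(comp))
    return out


def link(records, threshold=5.0, **kwargs):
    """Score all pairs (no blocking, so O(n^2)) and return clusters of record indices."""
    freqs = field_freqs(records)
    n = len(records)
    scores = {(i, j): pair_score(records[i], records[j], freqs)
              for i in range(n) for j in range(i + 1, n)}
    return sorted(cluster(list(range(n)), scores, threshold, **kwargs))
```

On five toy records in which THANDI/THANDIE NKOSI share one DOB and sex, SIPHO/SIPO DLAMINI share another, and a fifth person differs on every field, `link` returns `[[0, 1], [2, 3], [4]]`. A chain of links 0-1, 1-2, 2-3 scoring 6, 9, 6 forms one component of density 0.5 at threshold 5; with `min_density=0.6` it is re-split at threshold 7 into `[[0], [1, 2], [3]]`. At national scale, all-pairs scoring is infeasible, so a real pipeline would first restrict comparisons to candidate pairs (blocking).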
Domain weights and graph parameters were optimized using the manually matched training data. To evaluate performance, we calculated, in the manually matched data, the probability that a true match was correctly identified by our algorithm (sensitivity, Sen) and the probability that a match identified by our algorithm was truly a match (positive predictive value, PPV). We also assessed validity in the full cohort using proxies for under- and over-matching, and assessed sensitivity against national identification numbers and patient folder numbers, which were available for a subset of records. We compared the performance of our algorithm with that of exact matching and of a prior identifier developed by the NHLS Corporate Data Warehouse.

Results. As of December 2016, the NHLS database contained 117 million patient specimens with a CD4, VL, or other laboratory test used in HIV care. These specimens had 63 million unique combinations of patient identifying information. From these data, our matching algorithm identified 11.6 million unique HIV patients who had at least one CD4 count or VL result. These patients had 70.9 million specimens in total, with a median of 3 specimens per patient (IQR 1 to 8). Sensitivity and PPV of the algorithm were estimated at 93.7% and 98.6%, respectively, in the manually matched data, compared with 64.1% and 100.0% for the existing NHLS identifier. We estimated that in 2016 there were 3.35 million patients on ART and virologically monitored, similar to the National Department of Health estimate of 3.50 million.

Conclusion. We constructed a South African National HIV Cohort by applying novel graph-based probabilistic record-linkage techniques to routinely collected laboratory data, achieving high sensitivity and positive predictive value. Information on graph structure can guide record linkage in large populations when identifying data are limited.
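The sensitivity and PPV defined in the Methods can be computed at the level of record pairs. The sketch below assumes that both the algorithm's output and the manual review are expressed as clusters of record IDs, and that scoring may optionally be restricted to the candidate pairs that were actually reviewed. The function names and this pair-level framing are assumptions for illustration, not the study's code.

```python
from itertools import combinations


def match_pairs(clusters):
    """Unordered record pairs declared a match (same cluster) by a clustering."""
    return {frozenset(p) for c in clusters for p in combinations(c, 2)}


def sen_ppv(predicted_clusters, gold_clusters, reviewed=None):
    """Pair-level sensitivity and PPV of predicted vs. manually matched clusters.

    reviewed: optional set of frozenset pairs that were manually reviewed; if
    given, only those pairs are scored (an assumed evaluation frame).
    """
    pred = match_pairs(predicted_clusters)
    gold = match_pairs(gold_clusters)
    if reviewed is not None:
        pred, gold = pred & reviewed, gold & reviewed
    tp = len(pred & gold)  # true matches that the algorithm also linked
    sen = tp / len(gold) if gold else float("nan")  # share of true matches found
    ppv = tp / len(pred) if pred else float("nan")  # share of links that are true
    return sen, ppv


if __name__ == "__main__":
    gold = [[0, 1, 2], [3, 4]]
    predicted = [[0, 1], [2], [3, 4, 5]]
    print(sen_ppv(predicted, gold))  # (0.5, 0.5)
```

In the example, the gold clusters imply four true pairs and the predicted clusters four links, two of which coincide, so both measures are 0.5. Splitting a true cluster lowers sensitivity, while merging distinct patients lowers PPV, which is why a linkage that guards against over-matching trades a little sensitivity for high PPV.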

Media type:

Preprint

Year of publication:

2023

Published:

2023

Contained in:

bioRxiv.org (2023), dated 6 Sept. 2023

Language:

English

Contributors:

Bor, Jacob [Author]
MacLeod, William [Author]
Oleinik, Katia [Author]
Potter, James [Author]
Brennan, Alana T. [Author]
Candy, Sue [Author]
Maskew, Mhairi [Author]
Fox, Matthew P. [Author]
Sanne, Ian [Author]
Stevens, Wendy S. [Author]
Carmona, Sergio [Author]

Links:

Full text [free access]

Subjects:

570
Biology

doi:

10.1101/450304

PPN (catalog ID):

XBI000376906