Implementing a hash-based privacy-preserving record linkage tool in the OneFlorida clinical research network
© The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association.
OBJECTIVE: To implement an open-source tool that performs deterministic privacy-preserving record linkage (RL) in a real-world setting within a large research network.
MATERIALS AND METHODS: We learned 2 efficient deterministic linkage rules using publicly available voter registration data. We then validated the 2 rules' performance with 2 manually curated gold-standard datasets linking electronic health records and claims data from 2 sources. We developed an open-source Python-based tool, OneFL Deduper, that (1) creates seeded hash codes of combinations of patients' quasi-identifiers using a cryptographic one-way hash function to achieve privacy protection and (2) links and deduplicates patient records using a central broker through matching of hash codes with high precision and reasonable recall.
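The two steps described above (seeded hashing at each site, then matching at a central broker) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the seed value, the two rules and their field names, the normalization, and the use of SHA-256 with the seed concatenated to the payload are hypothetical and are not taken from the OneFL Deduper source.

```python
import hashlib
import unicodedata

# Hypothetical shared seed; a real deployment would distribute it securely.
SEED = "example-seed-not-a-real-secret"

# Hypothetical linkage rules: each rule is a tuple of quasi-identifier fields.
RULES = {
    "rule_1": ("first_name", "last_name", "dob", "sex"),
    "rule_2": ("first_name", "last_name", "dob", "race"),
}


def normalize(value):
    """Lowercase, strip accents, and drop all whitespace."""
    text = unicodedata.normalize("NFKD", value or "")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return "".join(text.lower().split())


def hash_record(record, seed=SEED):
    """Return one seeded SHA-256 hex digest per rule whose fields are all present."""
    hashes = []
    for rule_name, fields in RULES.items():
        values = [normalize(record.get(f, "")) for f in fields]
        if not all(values):
            continue  # skip a rule when any of its fields is missing
        payload = "|".join([rule_name, *values, seed])
        hashes.append(hashlib.sha256(payload.encode("utf-8")).hexdigest())
    return hashes


def link_records(submissions):
    """Broker side: give records that share any hash the same linked ID.

    Simplification: a record takes the ID of its first matching hash, so
    groups bridged later by a different rule are not merged.
    """
    hash_to_uid = {}
    assignments = {}
    next_uid = 1
    for site, local_id, hashes in submissions:
        uid = next((hash_to_uid[h] for h in hashes if h in hash_to_uid), None)
        if uid is None:
            uid = next_uid
            next_uid += 1
        for h in hashes:
            hash_to_uid.setdefault(h, uid)
        assignments[(site, local_id)] = uid
    return assignments


# Made-up records: site_b lacks race, so only rule_1 can link it to site_a.
site_a = {"first_name": "Ana", "last_name": "Lopez ", "dob": "1980-01-02", "sex": "F", "race": "05"}
site_b = {"first_name": "ANA", "last_name": "lopez", "dob": "1980-01-02", "sex": "F", "race": ""}
site_c = {"first_name": "Ben", "last_name": "Lopez", "dob": "1975-06-30", "sex": "M", "race": "05"}

submissions = [
    ("site_a", "a-1", hash_record(site_a)),
    ("site_b", "b-9", hash_record(site_b)),
    ("site_c", "c-4", hash_record(site_c)),
]
linked = link_records(submissions)
```

In this sketch the broker only ever sees hex digests, never the quasi-identifiers themselves. Using more than one rule lets a record with a missing field still match on another rule, which is one plausible reason to combine rules, though the abstract does not state the authors' rationale.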
RESULTS: We deployed the OneFL Deduper (https://github.com/ufbmi/onefl-deduper) in OneFlorida, a state-based clinical research network that is part of the national Patient-Centered Clinical Research Network (PCORnet). Using the gold-standard datasets, we achieved a precision of 97.25% to 99.7% and a recall of 75.5%. With the tool, we deduplicated ∼3.5 million (out of ∼15 million) records down to 1.7 million unique patients across 6 health care partners and the Florida Medicaid program. We demonstrated the benefits of RL by examining different disease profiles of the linked cohorts.
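For readers unfamiliar with how precision and recall are computed for record linkage, the following sketch scores a set of predicted links against a gold-standard set of true links. The function name and the pair representation are assumptions for illustration, and the toy identifiers below are invented; they are not the study's data or its evaluation code.

```python
def precision_recall(predicted_pairs, gold_pairs):
    """Score predicted record links against gold-standard links.

    Each pair is two record identifiers; order within a pair is ignored.
    """
    predicted = {frozenset(pair) for pair in predicted_pairs}
    gold = {frozenset(pair) for pair in gold_pairs}
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy example with invented identifiers: 4 true links, 3 of them found.
gold = [("a-1", "b-9"), ("a-2", "b-3"), ("a-5", "b-7"), ("a-8", "b-2")]
predicted = [("b-9", "a-1"), ("a-2", "b-3"), ("a-5", "b-7")]

precision, recall = precision_recall(predicted, gold)
```

Here every predicted link is correct (precision 1.0) but one true link is missed (recall 0.75). A deterministic exact-match approach tends to show this pattern of high precision with lower recall, because any discrepancy in a hashed field prevents a match.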
CONCLUSIONS: Many factors, including privacy risk considerations, policies and regulations, data availability and quality, and computing resources, can affect how an RL solution is constructed in a real-world setting. Nevertheless, RL is a significant task in improving the data quality in a network so that we can make reliable scientific discoveries from these massive data resources.
Media type: |
E-article |
---|
Year of publication: |
2019 |
---|---|
Published: |
2019 |
Contained in: |
Parent record - volume:2 |
---|---|
Contained in: |
JAMIA open - 2(2019), no. 4, 31 Dec., pp. 562-569 |
Language: |
English |
---|
Contributors: |
Bian, Jiang [author] |
---|
Subjects: |
Clinical research network |
---|
Notes: |
Date revised: 12.04.2022; published: Electronic-eCollection; citation status: PubMed-not-MEDLINE |
---|
doi: |
10.1093/jamiaopen/ooz050 |
---|
PPN (catalog ID): |
NLM306203782 |
---|
LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | NLM306203782 | ||
003 | DE-627 | ||
005 | 20231225122840.0 | ||
007 | cr uuu---uuuuu | ||
008 | 231225s2019 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1093/jamiaopen/ooz050 |2 doi | |
028 | 5 | 2 | |a pubmed24n1020.xml |
035 | |a (DE-627)NLM306203782 | ||
035 | |a (NLM)32025654 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Bian, Jiang |e verfasserin |4 aut | |
245 | 1 | 0 | |a Implementing a hash-based privacy-preserving record linkage tool in the OneFlorida clinical research network |
264 | 1 | |c 2019 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a Date Revised 12.04.2022 | ||
500 | |a published: Electronic-eCollection | ||
500 | |a Citation Status PubMed-not-MEDLINE | ||
520 | |a © The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. | ||
520 | |a OBJECTIVE: To implement an open-source tool that performs deterministic privacy-preserving record linkage (RL) in a real-world setting within a large research network | ||
520 | |a MATERIALS AND METHODS: We learned 2 efficient deterministic linkage rules using publicly available voter registration data. We then validated the 2 rules' performance with 2 manually curated gold-standard datasets linking electronic health records and claims data from 2 sources. We developed an open-source Python-based tool-OneFL Deduper-that (1) creates seeded hash codes of combinations of patients' quasi-identifiers using a cryptographic one-way hash function to achieve privacy protection and (2) links and deduplicates patient records using a central broker through matching of hash codes with a high precision and reasonable recall | ||
520 | |a RESULTS: We deployed the OneFl Deduper (https://github.com/ufbmi/onefl-deduper) in the OneFlorida, a state-based clinical research network as part of the national Patient-Centered Clinical Research Network (PCORnet). Using the gold-standard datasets, we achieved a precision of 97.25∼99.7% and a recall of 75.5%. With the tool, we deduplicated ∼3.5 million (out of ∼15 million) records down to 1.7 million unique patients across 6 health care partners and the Florida Medicaid program. We demonstrated the benefits of RL through examining different disease profiles of the linked cohorts | ||
520 | |a CONCLUSIONS: Many factors including privacy risk considerations, policies and regulations, data availability and quality, and computing resources, can impact how a RL solution is constructed in a real-world setting. Nevertheless, RL is a significant task in improving the data quality in a network so that we can draw reliable scientific discoveries from these massive data resources | ||
650 | 4 | |a Journal Article | |
650 | 4 | |a PCORnet | |
650 | 4 | |a clinical research network | |
650 | 4 | |a privacy-preserving record linkage | |
700 | 1 | |a Loiacono, Alexander |e verfasserin |4 aut | |
700 | 1 | |a Sura, Andrei |e verfasserin |4 aut | |
700 | 1 | |a Mendoza Viramontes, Tonatiuh |e verfasserin |4 aut | |
700 | 1 | |a Lipori, Gloria |e verfasserin |4 aut | |
700 | 1 | |a Guo, Yi |e verfasserin |4 aut | |
700 | 1 | |a Shenkman, Elizabeth |e verfasserin |4 aut | |
700 | 1 | |a Hogan, William |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t JAMIA open |d 2018 |g 2(2019), 4 vom: 31. Dez., Seite 562-569 |w (DE-627)NLM290202396 |x 2574-2531 |7 nnns |
773 | 1 | 8 | |g volume:2 |g year:2019 |g number:4 |g day:31 |g month:12 |g pages:562-569 |
856 | 4 | 0 | |u http://dx.doi.org/10.1093/jamiaopen/ooz050 |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 2 |j 2019 |e 4 |b 31 |c 12 |h 562-569 |