Democratizing EHR analyses with FIDDLE : a flexible data-driven preprocessing pipeline for structured clinical data
© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association.
OBJECTIVE: In applying machine learning (ML) to electronic health record (EHR) data, many decisions must be made before any ML is applied; such preprocessing requires substantial effort and can be labor-intensive. As the role of ML in health care grows, there is an increasing need for systematic and reproducible preprocessing techniques for EHR data. Thus, we developed FIDDLE (Flexible Data-Driven Pipeline), an open-source framework that streamlines the preprocessing of data extracted from the EHR.
MATERIALS AND METHODS: Largely data-driven, FIDDLE systematically transforms structured EHR data into feature vectors, limiting the number of decisions a user must make while incorporating good practices from the literature. To demonstrate its utility and flexibility, we conducted a proof-of-concept experiment in which we applied FIDDLE to 2 publicly available EHR data sets collected from intensive care units: MIMIC-III and the eICU Collaborative Research Database. We trained different ML models to predict 3 clinically important outcomes: in-hospital mortality, acute respiratory failure, and shock. We evaluated the models using the area under the receiver operating characteristic curve (AUROC) and compared their performance with several baselines.
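The evaluation metric named above, AUROC, can be illustrated with a minimal sketch. This is not FIDDLE's code or the authors' evaluation script; the function name `auroc` and the toy labels and scores are assumptions chosen for illustration. It uses the standard rank-sum identity: AUROC is the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC for binary labels (1 = positive, anything else = negative).

    Hypothetical helper for illustration; uses the rank-sum
    (Mann-Whitney U) identity with average ranks for tied scores.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sort indices by score, then give tied scores their average 1-based rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1

    # U statistic of the positive class, normalized by the number of pairs.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy data (made up): two negatives, two positives.
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values should be read.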
RESULTS: Across tasks, FIDDLE extracted 2,528 to 7,403 features from MIMIC-III and eICU, respectively. On all tasks, FIDDLE-based models achieved good discriminative performance, with AUROCs of 0.757-0.886, comparable to the performance of MIMIC-Extract, a preprocessing pipeline designed specifically for MIMIC-III. Furthermore, our results showed that FIDDLE is generalizable across different prediction times, ML algorithms, and data sets, while being relatively robust to different settings of user-defined arguments.
CONCLUSIONS: FIDDLE, an open-source preprocessing pipeline, facilitates applying ML to structured EHR data. By accelerating and standardizing labor-intensive preprocessing, FIDDLE can help stimulate progress in building clinically useful ML tools for EHR data.
Media type: | E-article |
Year of publication: | 2020 |
Published: | 2020 |
Contained in: | To the parent record - volume:27 |
Contained in: | Journal of the American Medical Informatics Association : JAMIA - 27(2020), 12 of 09 Dec., pages 1921-1934 |
Language: | English |
Contributors: | Tang, Shengpu [author] |
Subjects: | Electronic health records |
Notes: | Date Completed 15.04.2021 Date Revised 10.11.2023 published: Print Citation Status MEDLINE |
DOI: | 10.1093/jamia/ocaa139 |
PPN (catalog ID): | NLM316100455 |
LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | NLM316100455 | ||
003 | DE-627 | ||
005 | 20231225160342.0 | ||
007 | cr uuu---uuuuu | ||
008 | 231225s2020 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1093/jamia/ocaa139 |2 doi | |
028 | 5 | 2 | |a pubmed24n1053.xml |
035 | |a (DE-627)NLM316100455 | ||
035 | |a (NLM)33040151 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Tang, Shengpu |e verfasserin |4 aut | |
245 | 1 | 0 | |a Democratizing EHR analyses with FIDDLE |b a flexible data-driven preprocessing pipeline for structured clinical data |
264 | 1 | |c 2020 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a Date Completed 15.04.2021 | ||
500 | |a Date Revised 10.11.2023 | ||
500 | |a published: Print | ||
500 | |a Citation Status MEDLINE | ||
650 | 4 | |a Journal Article | |
650 | 4 | |a Research Support, N.I.H., Extramural | |
650 | 4 | |a Research Support, Non-U.S. Gov't | |
650 | 4 | |a Research Support, U.S. Gov't, Non-P.H.S. | |
650 | 4 | |a electronic health records | |
650 | 4 | |a machine learning | |
650 | 4 | |a preprocessing pipeline | |
700 | 1 | |a Davarmanesh, Parmida |e verfasserin |4 aut | |
700 | 1 | |a Song, Yanmeng |e verfasserin |4 aut | |
700 | 1 | |a Koutra, Danai |e verfasserin |4 aut | |
700 | 1 | |a Sjoding, Michael W |e verfasserin |4 aut | |
700 | 1 | |a Wiens, Jenna |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t Journal of the American Medical Informatics Association : JAMIA |d 1997 |g 27(2020), 12 vom: 09. Dez., Seite 1921-1934 |w (DE-627)NLM074735535 |x 1527-974X |7 nnns |
773 | 1 | 8 | |g volume:27 |g year:2020 |g number:12 |g day:09 |g month:12 |g pages:1921-1934 |
856 | 4 | 0 | |u http://dx.doi.org/10.1093/jamia/ocaa139 |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 27 |j 2020 |e 12 |b 09 |c 12 |h 1921-1934 |