Publication details

A Deep Learning Approach for Transgender and Gender Diverse Patient Identification in Electronic Health Records

ABSTRACT

Background: Accurate identification of gender identity in the electronic health record (EHR) is crucial for providing equitable health care, particularly for transgender and gender diverse (TGD) populations. It remains challenging, however, because gender information in structured EHR fields is often incomplete.

Objective: To develop a deep learning classifier that accurately identifies patient gender identity from patient-level EHR data, including free-text notes.

Methods: This study included adult patients in a large healthcare system in Boston, MA, between 4/1/2017 and 4/1/2022. To find relevant information in large volumes of clinical notes and to reduce noise, we compiled a list of gender-related keywords through expert curation, literature review, and expansion via a fine-tuned BioWordVec model. We used this keyword list to pre-screen potential TGD individuals and to create two datasets for model training, testing, and validation. Dataset I was a balanced dataset containing clinician-confirmed TGD patients and cases without keywords. Dataset II contained cases with keywords. We compared the performance of the deep learning model with that of traditional machine learning and rule-based algorithms.

Results: The final keyword list consisted of 109 keywords, of which 58 (53.2%) came from BioWordVec expansion. Dataset I contained 3,150 patients (50% TGD), and Dataset II contained 200 patients (90% TGD). On Dataset I, the deep learning model achieved an F1 score of 0.917, a sensitivity of 0.854, and a precision of 0.980. On Dataset II, it achieved an F1 score of 0.969, a sensitivity of 0.967, and a precision of 0.972. The deep learning model significantly outperformed rule-based algorithms.

Conclusion: This is the first study to show that deep learning algorithms can accurately identify gender identity using EHR data. Future work should leverage and evaluate additional diverse data sources to generate more generalizable algorithms.

Graphical abstract: [figure 23290988v1_unfig1]
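The Methods describe pre-screening patients with a curated keyword list before any model is trained. What follows is a minimal Python sketch of that pre-screening step. It assumes each patient's notes are available as plain strings keyed by a hypothetical patient ID. The keyword list is a hypothetical placeholder, not the study's 109 keywords. The sketch substitutes simple case-insensitive, whole-word regular-expression matching for whatever matching procedure the authors actually used.

```python
import re

# HYPOTHETICAL placeholder terms for illustration only; this is NOT the
# study's curated 109-keyword list.
KEYWORDS = ["transgender", "gender diverse", "nonbinary", "gender affirming"]


def build_pattern(keywords):
    """Compile one case-insensitive, whole-word regex over all keywords."""
    # Longest phrases first, so that "gender affirming" is preferred over a
    # shorter overlapping term such as "gender" at the same position.
    ordered = sorted(keywords, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def prescreen(notes_by_patient, keywords=KEYWORDS):
    """Split patients by whether any of their notes mention a keyword.

    notes_by_patient: dict mapping a (hypothetical) patient ID to a list of
    note strings. Returns (with_keywords, without_keywords). with_keywords
    maps each matching patient ID to the sorted, lower-cased keywords found
    in that patient's notes. without_keywords lists the remaining IDs.
    """
    pattern = build_pattern(keywords)
    with_keywords, without_keywords = {}, []
    for patient_id, notes in notes_by_patient.items():
        hits = {m.group(0).lower() for note in notes for m in pattern.finditer(note)}
        if hits:
            with_keywords[patient_id] = sorted(hits)
        else:
            without_keywords.append(patient_id)
    return with_keywords, without_keywords
```

In the abstract's terms, the first result corresponds to "cases with keywords" and the second to "cases without keywords." A keyword hit by itself is not a label. Dataset II, which contained cases with keywords, was 90% TGD rather than 100%, so a classifier is still needed on top of the screen.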

Media type:	Preprint

Year of publication:	2024
Published:	2024

Contained in:	bioRxiv.org (2024), April 23

Language:	English

Contributors:	Hua, Yining [Author]; Wang, Liqin [Author]; Nguyen, Vi [Author]; Rieu-Werden, Meghan [Author]; McDowell, Alex [Author]; Bates, David W. [Author]; Foer, Dinah [Author]; Zhou, Li [Author]

Links:	Full text [license required]; Full text [free of charge]

Subjects:	570 Biology

doi:	10.1101/2023.06.07.23290988

Funding:
Funding institution / project title:

PPN (catalog ID):	XBI039856771

Internal format (MARC 21)


LEADER	01000caa a22002652 4500
001	XBI039856771
003	DE-627
005	20240425104530.0
007	cr uuu---uuuuu
008	230611s2024 xx |||||o 00| ||eng c
024	7		|a 10.1101/2023.06.07.23290988 |2 doi
035			|a (DE-627)XBI039856771
035			|a (biorXiv)10.1101/2023.06.07.23290988
040			|a DE-627 |b ger |c DE-627 |e rakwb
041			|a eng
100	1		|a Hua, Yining |e verfasserin |0 (orcid)0000-0001-7779-1208 |4 aut
245	1	0	|a A Deep Learning Approach for Transgender and Gender Diverse Patient Identification in Electronic Health Records
264		1	|c 2024
336			|a Text |b txt |2 rdacontent
337			|a Computermedien |b c |2 rdamedia
338			|a Online-Ressource |b cr |2 rdacarrier
520			|a ABSTRACT Background: Although accurate identification of gender identity in the electronic health record (EHR) is crucial for providing equitable health care, particularly for transgender and gender diverse (TGD) populations, it remains a challenging task due to incomplete gender information in structured EHR fields. Objective: To develop a deep learning classifier to accurately identify patient gender identity using patient-level EHR data, including free-text notes. Methods: This study included adult patients in a large healthcare system in Boston, MA, between 4/1/2017 to 4/1/2022. To identify relevant information from massive clinical notes and to denoise, we compiled a list of gender-related keywords through expert curation, literature review, and expansion via a fine-tuned BioWordVec model. This keyword list was used to pre-screen potential TGD individuals and create two datasets for model training, testing, and validation. Dataset I was a balanced dataset that contained clinician-confirmed TGD patients and cases without keywords. Dataset II contained cases with keywords. The performance of the deep learning model was compared to traditional machine learning and rule-based algorithms. Results: The final keyword list consists of 109 keywords, of which 58 (53.2%) were expanded by the BioWordVec model. Dataset I contained 3,150 patients (50% TGD) while Dataset II contained 200 patients (90% TGD). On Dataset I the deep learning model achieved a F1 score of 0.917, sensitivity of 0.854, and a precision of 0.980; and on Dataset II a F1 score of 0.969, sensitivity of 0.967, and precision of 0.972. The deep learning model significantly outperformed rule-based algorithms. Conclusion: This is the first study to show that deep learning algorithms can accurately identify gender identity using EHR data. Future work should leverage and evaluate additional diverse data sources to generate more generalizable algorithms. Graphical abstract: [figure 23290988v1_unfig1]
650		4	|a Biology |7 (dpeaa)DE-84
650		4	|a 570 |7 (dpeaa)DE-84
700	1		|a Wang, Liqin |e verfasserin |4 aut
700	1		|a Nguyen, Vi |e verfasserin |4 aut
700	1		|a Rieu-Werden, Meghan |e verfasserin |4 aut
700	1		|a McDowell, Alex |e verfasserin |4 aut
700	1		|a Bates, David W. |e verfasserin |4 aut
700	1		|a Foer, Dinah |e verfasserin |4 aut
700	1		|a Zhou, Li |e verfasserin |4 aut
773	0	8	|i Enthalten in |t bioRxiv.org |g (2024) vom: 23. Apr.
773	1	8	|g year:2024 |g day:23 |g month:04
856	4	0	|u https://doi.org/10.1016/j.jbi.2023.104507 |x 0 |z lizenzpflichtig |3 Volltext
856	4	0	|u http://dx.doi.org/10.1101/2023.06.07.23290988 |x 0 |z kostenfrei |3 Volltext
912			|a GBV_XBI
951			|a AR
952			|j 2024 |b 23 |c 04
