Publication details - Comparison of predicting cardiovascular disease hospitalization using individual, ZIP code-derived, and machine learning model-predicted educational attainment in New York City

Comparison of predicting cardiovascular disease hospitalization using individual, ZIP code-derived, and machine learning model-predicted educational attainment in New York City

Copyright: © 2024 Takkavatakarn et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

BACKGROUND: Area-level social determinants of health (SDOH) based on patients' ZIP codes or census tracts have been commonly used in research instead of individual SDOHs. To our knowledge, whether machine learning (ML) could be used to derive individual SDOH measures, specifically individual educational attainment, is unknown.

METHODS: This is a retrospective study using data from the Mount Sinai BioMe Biobank. We included participants who completed a validated questionnaire on educational attainment and had home addresses in New York City. ZIP code-level education was derived from the American Community Survey, matched for the participant's gender and race/ethnicity. We tested several algorithms to predict individual educational attainment from routinely collected clinical and demographic data. To evaluate how different measures of educational attainment affect model performance, we developed three distinct models for predicting cardiovascular disease (CVD) hospitalization. Educational attainment was entered into the models as either survey-derived, ZIP code-derived, or ML-predicted educational attainment.
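The comparison of education measures described in the methods can be illustrated with a minimal sketch. It assumes that concordance means simple percent agreement between two categorical labels per participant; the abstract does not define the statistic, so this is an assumption. The function name `concordance`, the category labels, and the toy data are hypothetical and are not taken from the study.

```python
from typing import Sequence


def concordance(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of participants whose two education labels agree.

    Assumption: concordance is plain percent agreement; the study
    may have used a different definition.
    """
    if len(a) != len(b):
        raise ValueError("label sequences must have the same length")
    if not a:
        raise ValueError("at least one participant is required")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


# Hypothetical toy labels for four participants (not study data).
survey = ["college", "high_school", "college", "less_than_hs"]
zip_derived = ["college", "college", "high_school", "less_than_hs"]

print(concordance(survey, zip_derived))  # 0.5
```

Under this definition, the reported values of 47% and 67% would correspond to the share of participants whose ZIP code-derived or ML-predicted category matched their survey answer.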

RESULTS: A total of 20,805 participants met inclusion criteria. Concordance between survey and ZIP code-derived education was 47%, while the concordance between survey and ML model-predicted education was 67%. A total of 13,715 patients from the cohort were included in our CVD hospitalization prediction models, of whom 1,538 (11.2%) had a history of CVD hospitalization. The AUROC of the model predicting CVD hospitalization using survey-derived education was significantly higher than that of the model using ZIP code-level education (0.77 versus 0.72; p < 0.001) and that of the model using ML model-predicted education (0.77 versus 0.75; p < 0.001). The AUROC for the model using ML model-predicted education was also significantly higher than that of the model using ZIP code-level education (p = 0.003).
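The AUROC figures above can be read as the probability that a randomly chosen hospitalized patient receives a higher predicted risk than a randomly chosen non-hospitalized patient. The sketch below computes AUROC directly from that pairwise definition on hypothetical toy scores; it is not the authors' pipeline, and it does not reproduce the significance test behind the reported p-values, which the abstract does not name. The function name `auroc` and all data values are assumptions for illustration. The last lines check the reported prevalence arithmetic.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the share of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = CVD hospitalization, 0 = none.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.35, 0.2, 0.8]
print(auroc(toy_labels, toy_scores))  # 4 of 6 pairs ranked correctly

# Prevalence reported in the abstract: 1,538 of 13,715 patients.
prevalence_pct = round(100 * 1538 / 13715, 1)
print(prevalence_pct)  # 11.2
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration; a rank-based formula would be the usual choice for a cohort of this size.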

CONCLUSION: The concordance of survey and ZIP code-level educational attainment in NYC was low. As expected, the model utilizing survey-derived education achieved the highest performance. The model incorporating our ML model-predicted education outperformed the model relying on ZIP code-derived education. Implementing ML techniques can improve the accuracy of SDOH data and consequently increase the predictive performance of outcome models.

Media type:	E-article

Year of publication:	2024
Published:	2024

Contained in:	To the complete record - volume:19
Contained in:	PloS one - 19(2024), no. 2, day 08, page e0297919

Language:	English

Contributors:	Takkavatakarn, Kullaya [author] Dai, Yang [author] Hsun Wen, Huei [author] Kauffman, Justin [author] Charney, Alexander [author] Coca, Steven G [author] Nadkarni, Girish N [author] Chan, Lili [author]

Links:	Full text

Subjects:	Journal Article

Notes:	Date Completed 14.02.2024 Date Revised 14.02.2024 published: Electronic-eCollection Citation Status MEDLINE

doi:	10.1371/journal.pone.0297919

Funding:
Funding institution / project title:

PPN (catalog ID):	NLM368194698

Internal format


LEADER	01000caa a22002652 4500
001	NLM368194698
003	DE-627
005	20240214233242.0
007	cr uuu---uuuuu
008	240209s2024 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1371/journal.pone.0297919 \|2 doi
028	5	2	\|a pubmed24n1293.xml
035			\|a (DE-627)NLM368194698
035			\|a (NLM)38329973
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Takkavatakarn, Kullaya \|e verfasserin \|4 aut
245	1	0	\|a Comparison of predicting cardiovascular disease hospitalization using individual, ZIP code-derived, and machine learning model-predicted educational attainment in New York City
264		1	\|c 2024
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Completed 14.02.2024
500			\|a Date Revised 14.02.2024
500			\|a published: Electronic-eCollection
500			\|a Citation Status MEDLINE
520			\|a Copyright: © 2024 Takkavatakarn et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
520			\|a BACKGROUND: Area-level social determinants of health (SDOH) based on patients' ZIP codes or census tracts have been commonly used in research instead of individual SDOHs. To our knowledge, whether machine learning (ML) could be used to derive individual SDOH measures, specifically individual educational attainment, is unknown
520			\|a METHODS: This is a retrospective study using data from the Mount Sinai BioMe Biobank. We included participants that completed a validated questionnaire on educational attainment and had home addresses in New York City. ZIP code-level education was derived from the American Community Survey matched for the participant's gender and race/ethnicity. We tested several algorithms to predict individual educational attainment from routinely collected clinical and demographic data. To evaluate how using different measures of educational attainment will impact model performance, we developed three distinct models for predicting cardiovascular (CVD) hospitalization. Educational attainment was imputed into models as either survey-derived, ZIP code-derived, or ML-predicted educational attainment
520			\|a RESULTS: A total of 20,805 participants met inclusion criteria. Concordance between survey and ZIP code-derived education was 47%, while the concordance between survey and ML model-predicted education was 67%. A total of 13,715 patients from the cohort were included into our CVD hospitalization prediction models, of which 1,538 (11.2%) had a history of CVD hospitalization. The AUROC of the model predicting CVD hospitalization using survey-derived education was significantly higher than the model using ZIP code-level education (0.77 versus 0.72; p < 0.001) and the model using ML model-predicted education (0.77 versus 0.75; p < 0.001). The AUROC for the model using ML model-predicted education was also significantly higher than that using ZIP code-level education (p = 0.003)
520			\|a CONCLUSION: The concordance of survey and ZIP code-level educational attainment in NYC was low. As expected, the model utilizing survey-derived education achieved the highest performance. The model incorporating our ML model-predicted education outperformed the model relying on ZIP code-derived education. Implementing ML techniques can improve the accuracy of SDOH data and consequently increase the predictive performance of outcome models
650		4	\|a Journal Article
700	1		\|a Dai, Yang \|e verfasserin \|4 aut
700	1		\|a Hsun Wen, Huei \|e verfasserin \|4 aut
700	1		\|a Kauffman, Justin \|e verfasserin \|4 aut
700	1		\|a Charney, Alexander \|e verfasserin \|4 aut
700	1		\|a Coca, Steven G \|e verfasserin \|4 aut
700	1		\|a Nadkarni, Girish N \|e verfasserin \|4 aut
700	1		\|a Chan, Lili \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t PloS one \|d 2006 \|g 19(2024), 2 vom: 08., Seite e0297919 \|w (DE-627)NLM167327399 \|x 1932-6203 \|7 nnns
773	1	8	\|g volume:19 \|g year:2024 \|g number:2 \|g day:08 \|g pages:e0297919
856	4	0	\|u http://dx.doi.org/10.1371/journal.pone.0297919 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a GBV_NLM
951			\|a AR
952			\|d 19 \|j 2024 \|e 2 \|b 08 \|h e0297919
