Predicting high health-cost users among people with cardiovascular disease using machine learning and nationwide linked social administrative datasets / Nhung Nghiem, June Atkinson, Binh P. Nguyen, An Tran-Duy and Nick Wilson
Objectives: To optimise planning of public health services, the impact of high-cost users needs to be considered. However, most of the existing statistical models for costs do not include many clinical and social variables from administrative data that are associated with elevated health care resource use, and are increasingly available. This study aimed to use machine learning approaches and big data to predict high-cost users among people with cardiovascular disease (CVD).

Methods: We used nationally representative linked datasets in New Zealand to predict CVD prevalent cases with the most expensive cost belonging to the top quintiles by cost. We compared the performance of four popular machine learning models (L1-regularised logistic regression, classification trees, k-nearest neighbourhood (KNN) and random forest) with the traditional regression models.

Results: The machine learning models had far better accuracy in predicting high health-cost users compared with the logistic models. The harmony score F1 (combining sensitivity and positive predictive value) of the machine learning models ranged from 30.6% to 41.2% (compared with 8.6-9.1% for the logistic models). Previous health costs, income, age, chronic health conditions, deprivation, and receiving a social security benefit were among the most important predictors of the CVD high-cost users.

Conclusions: This study provides additional evidence that machine learning can be used as a tool together with big data in health economics for identification of new risk factors and prediction of high-cost users with CVD. As such, machine learning may potentially assist with health services planning and preventive measures to improve population health while potentially saving healthcare costs.
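The abstract reports model quality as an F1 score, which combines sensitivity and positive predictive value, for a target defined by the top of the cost distribution. The following is a minimal sketch of those two ideas, not the authors' code: it assumes "high-cost" means the most expensive 20% of people, and the function names (`f1_score`, `flag_top_quintile`) and all numbers are hypothetical illustrations rather than study data.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of sensitivity (recall) and positive predictive value."""
    if true_pos == 0:
        return 0.0
    sensitivity = true_pos / (true_pos + false_neg)
    ppv = true_pos / (true_pos + false_pos)
    return 2 * sensitivity * ppv / (sensitivity + ppv)


def flag_top_quintile(costs: list[float]) -> list[bool]:
    """Flag the most expensive 20% of people (ties at the cutoff are included)."""
    if not costs:
        return []
    n_top = max(1, len(costs) // 5)
    cutoff = sorted(costs, reverse=True)[n_top - 1]
    return [c >= cutoff for c in costs]


# Made-up annual costs for ten people; NOT taken from the study.
costs = [120.0, 15000.0, 800.0, 95.0, 40.0, 2300.0, 60.0, 510.0, 30.0, 7200.0]
high_cost = flag_top_quintile(costs)

# Made-up predictions from some hypothetical classifier.
predicted = [False, True, False, False, False, True, False, False, False, False]

# Confusion counts for the positive (high-cost) class.
tp = sum(p and h for p, h in zip(predicted, high_cost))
fp = sum(p and not h for p, h in zip(predicted, high_cost))
fn = sum(h and not p for p, h in zip(predicted, high_cost))

print(f"F1 = {f1_score(tp, fp, fn):.3f}")
```

Because F1 ignores true negatives, it can stay low even when most people are classified correctly, which may be why it is informative for a minority class such as high-cost users.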
Field | Value |
---|---|
Media type | E-article |
Year of publication | 2023 |
Published | 2023 |
Contained in | Health economics review - 13(2023), 1, Dec., article ID 9, pages 1-13 |
Language | English |
Contributors | Nhung Nghiem [author] |
Links | healtheconomicsreview.biomedcentral.com [free of charge] |
Subjects | CVD cost prediction |
DOI | 10.1186/s13561-023-00422-1 |
PPN (catalog ID) | 1884321909 |
LEADER | 01000naa a2200265 4500 | ||
---|---|---|---|
001 | 1884321909 | ||
003 | DE-627 | ||
005 | 20240326075339.0 | ||
007 | cr uuu---uuuuu | ||
008 | 240326s2023 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1186/s13561-023-00422-1 |2 doi | |
035 | |a (DE-627)1884321909 | ||
035 | |a (DE-599)KXP1884321909 | ||
040 | |a DE-627 |b ger |c DE-627 |e rda | ||
041 | |a eng | ||
084 | |a C55 |a I15 |a N37 |2 jelc | ||
100 | 0 | |a Nhung Nghiem |e verfasserin |4 aut | |
245 | 1 | 0 | |a Predicting high health-cost users among people with cardiovascular disease using machine learning and nationwide linked social administrative datasets |c Nhung Nghiem, June Atkinson, Binh P. Nguyen, An Tran-Duy and Nick Wilson |
264 | 1 | |c 2023 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
506 | 0 | |q DE-206 |a Open Access |e Controlled Vocabulary for Access Rights |u http://purl.org/coar/access_right/c_abf2 | |
520 | |a Objectives To optimise planning of public health services, the impact of high-cost users needs to be considered. However, most of the existing statistical models for costs do not include many clinical and social variables from administrative data that are associated with elevated health care resource use, and are increasingly available. This study aimed to use machine learning approaches and big data to predict high-cost users among people with cardiovascular disease (CVD). Methods We used nationally representative linked datasets in New Zealand to predict CVD prevalent cases with the most expensive cost belonging to the top quintiles by cost. We compared the performance of four popular machine learning models (L1-regularised logistic regression, classification trees, k-nearest neighbourhood (KNN) and random forest) with the traditional regression models. Results The machine learning models had far better accuracy in predicting high health-cost users compared with the logistic models. The harmony score F1 (combining sensitivity and positive predictive value) of the machine learning models ranged from 30.6% to 41.2% (compared with 8.6-9.1% for the logistic models). Previous health costs, income, age, chronic health conditions, deprivation, and receiving a social security benefit were among the most important predictors of the CVD high-cost users. Conclusions This study provides additional evidence that machine learning can be used as a tool together with big data in health economics for identification of new risk factors and prediction of high-cost users with CVD. As such, machine learning may potentially assist with health services planning and preventive measures to improve population health while potentially saving healthcare costs. | ||
540 | |q DE-206 |a Namensnennung 4.0 International |f CC BY 4.0 |2 cc |u https://creativecommons.org/licenses/by/4.0/ | ||
650 | 4 | |a Machine learning |7 (dpeaa)DE-206 | |
650 | 4 | |a High-cost users |7 (dpeaa)DE-206 | |
650 | 4 | |a CVD cost prediction |7 (dpeaa)DE-206 | |
650 | 4 | |a Health and social administrative data |7 (dpeaa)DE-206 | |
650 | 4 | |a New Zealand |7 (dpeaa)DE-206 | |
700 | 1 | |a Atkinson, June |e verfasserin |4 aut | |
700 | 0 | |a Binh P. Nguyen |e verfasserin |4 aut | |
700 | 0 | |a An Tran-Duy |e verfasserin |4 aut | |
700 | 1 | |a Wilson, Nick |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t Health economics review |d Heidelberg : Springer, 2011 |g 13(2023), 1 vom: Dez., Artikel-ID 9, Seite 1-13 |h Online-Ressource |w (DE-627)670605115 |w (DE-600)2634483-X |w (DE-576)354003127 |x 2191-1991 |7 nnns |
773 | 1 | 8 | |g volume:13 |g year:2023 |g number:1 |g month:12 |g elocationid:9 |g pages:1-13 |
856 | 4 | 0 | |u https://healtheconomicsreview.biomedcentral.com/counter/pdf/10.1186/s13561-023-00422-1.pdf |x Verlag |z kostenfrei |
856 | 4 | 0 | |u https://doi.org/10.1186/s13561-023-00422-1 |x Resolving-System |z kostenfrei |
912 | |a GBV_USEFLAG_U | ||
912 | |a GBV_ILN_26 | ||
912 | |a ISIL_DE-206 | ||
912 | |a SYSFLAG_1 | ||
912 | |a GBV_KXP | ||
912 | |a GBV_ILN_20 | ||
912 | |a GBV_ILN_22 | ||
912 | |a GBV_ILN_23 | ||
912 | |a GBV_ILN_24 | ||
912 | |a GBV_ILN_39 | ||
912 | |a GBV_ILN_40 | ||
912 | |a GBV_ILN_60 | ||
912 | |a GBV_ILN_62 | ||
912 | |a GBV_ILN_63 | ||
912 | |a GBV_ILN_65 | ||
912 | |a GBV_ILN_69 | ||
912 | |a GBV_ILN_70 | ||
912 | |a GBV_ILN_73 | ||
912 | |a GBV_ILN_95 | ||
912 | |a GBV_ILN_105 | ||
912 | |a GBV_ILN_110 | ||
912 | |a GBV_ILN_151 | ||
912 | |a GBV_ILN_161 | ||
912 | |a GBV_ILN_206 | ||
912 | |a GBV_ILN_213 | ||
912 | |a GBV_ILN_230 | ||
912 | |a GBV_ILN_285 | ||
912 | |a GBV_ILN_293 | ||
912 | |a GBV_ILN_370 | ||
912 | |a GBV_ILN_602 | ||
912 | |a GBV_ILN_2009 | ||
912 | |a GBV_ILN_2014 | ||
912 | |a GBV_ILN_2111 | ||
912 | |a GBV_ILN_2129 | ||
912 | |a GBV_ILN_4012 | ||
912 | |a GBV_ILN_4037 | ||
912 | |a GBV_ILN_4046 | ||
912 | |a GBV_ILN_4112 | ||
912 | |a GBV_ILN_4125 | ||
912 | |a GBV_ILN_4126 | ||
912 | |a GBV_ILN_4249 | ||
912 | |a GBV_ILN_4305 | ||
912 | |a GBV_ILN_4306 | ||
912 | |a GBV_ILN_4307 | ||
912 | |a GBV_ILN_4313 | ||
912 | |a GBV_ILN_4322 | ||
912 | |a GBV_ILN_4323 | ||
912 | |a GBV_ILN_4324 | ||
912 | |a GBV_ILN_4325 | ||
912 | |a GBV_ILN_4326 | ||
912 | |a GBV_ILN_4335 | ||
912 | |a GBV_ILN_4338 | ||
912 | |a GBV_ILN_4367 | ||
912 | |a GBV_ILN_4700 | ||
912 | |a GBV_ILN_2403 | ||
912 | |a ISIL_DE-LFER | ||
951 | |a AR | ||
952 | |d 13 |j 2023 |e 1 |c 12 |i 9 |h 1-13 | ||
980 | |2 26 |1 01 |x 0206 |b 4503883224 |y x1z |z 26-03-24 | ||
980 | |2 2403 |1 01 |x DE-LFER |b 4511203687 |c 00 |f --%%-- |d --%%-- |e n |j --%%-- |y l01 |z 12-04-24 | ||
981 | |2 2403 |1 01 |x DE-LFER |r https://doi.org/10.1186/s13561-023-00422-1 | ||
981 | |2 2403 |1 01 |x DE-LFER |r https://healtheconomicsreview.biomedcentral.com/counter/pdf/10.1186/s13561-023-00422-1.pdf |