Alzheimer-type dementia prediction by sparse logistic regression using claim data
Copyright © 2020. Published by Elsevier B.V.
This study aimed to predict the risk of Alzheimer-type dementia among persons aged over 75 years who were not receiving long-term care services, using routinely collected claim data. A refined dataset of 48,123 persons was prepared from the health insurance and long-term care insurance claim data of a large city in the metropolitan area of Japan. The features comprise the age and sex of each subject, 502 diseases based on ICD-10 diagnosis codes, and 107 prescription drugs based on therapeutic classes (611 features in total). The main challenge in this work was selecting features from this large pool. We adopted sparse logistic regression with L0 regularization (SLR-L0) and with L1 regularization (SLR-L1) as machine-learning classification models. Both regularizations perform feature selection by estimating a sparse solution during model optimization: most coefficients are driven to exactly zero, and only the features with non-zero coefficients remain in the model. Predictions were made by integrating 100 predictors, each trained on a bootstrap sample. The areas under the ROC curve (AUCs) were 0.663 for SLR-L0 and 0.660 for SLR-L1. Although these performances were similar, the average number of selected features was 13 of 611 for SLR-L0 versus 253 for SLR-L1. These results indicate that SLR-L1 tended to retain many less useful features, whereas SLR-L0 narrowed the model down to influential features. SLR-L0 may therefore be more useful than SLR-L1 for practical use and for discussing risk factors with medical experts.
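The abstract describes the modelling pipeline only at a high level. The following pure-Python sketch illustrates its general shape under stated assumptions; it is not the authors' code. The L1 model is fitted by proximal gradient descent with soft-thresholding. The "L0" model uses iterative hard thresholding (keep the k largest coefficients), a swapped-in technique standing in for the paper's unspecified SLR-L0 solver. "Integrating" the bootstrap predictors is assumed to mean averaging their predicted probabilities, and the data are synthetic binary flags rather than claim data. All function names (`fit_sparse_logreg`, `bagged_fit`, `bagged_predict`, `mean_selected`, `auc`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_sparse_logreg(X, y, penalty="l1", lam=0.05, k=2, lr=1.0, n_iter=150):
    """Proximal gradient descent on the mean logistic loss (hypothetical helper).

    penalty="l1": soft-threshold after each step (lasso-type, SLR-L1 analogue).
    penalty="l0": keep only the k largest |w_j| after each step (iterative hard
    thresholding; an assumed stand-in, not the paper's SLR-L0 method).
    The intercept b is never penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi  # residual p - y
            gb += r
            for j in range(d):
                gw[j] += r * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
        if penalty == "l1":
            t = lr * lam  # soft-threshold level for this step
            w = [math.copysign(max(abs(wj) - t, 0.0), wj) for wj in w]
        else:
            keep = set(sorted(range(d), key=lambda j: abs(w[j]), reverse=True)[:k])
            w = [wj if j in keep else 0.0 for j, wj in enumerate(w)]
    return w, b


def bagged_fit(X, y, n_boot=100, seed=0, **kw):
    """Train one sparse model per bootstrap sample (the study used 100)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        models.append(fit_sparse_logreg([X[i] for i in idx], [y[i] for i in idx], **kw))
    return models


def bagged_predict(models, x):
    """Assumed integration rule: average the models' predicted probabilities."""
    return sum(sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) for w, b in models) / len(models)


def mean_selected(models):
    """Average number of non-zero coefficients (selected features) per model."""
    return sum(sum(1 for wj in w if wj != 0.0) for w, _ in models) / len(models)


def auc(scores, labels):
    """ROC AUC as the Mann-Whitney statistic; tied scores count one half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic stand-in for claim data: 6 binary flags, only flags 0 and 1 carry signal.
rng = random.Random(42)
X = [[float(rng.random() < 0.5) for _ in range(6)] for _ in range(200)]
y = [int(rng.random() < sigmoid(-1.5 + 3.0 * x[0] + 2.5 * x[1])) for x in X]

models_l0 = bagged_fit(X, y, n_boot=10, penalty="l0", k=2)
models_l1 = bagged_fit(X, y, n_boot=10, penalty="l1", lam=0.05)
auc_l0 = auc([bagged_predict(models_l0, x) for x in X], y)
auc_l1 = auc([bagged_predict(models_l1, x) for x in X], y)
print("L0: mean selected", mean_selected(models_l0), "train AUC", round(auc_l0, 3))
print("L1: mean selected", mean_selected(models_l1), "train AUC", round(auc_l1, 3))
```

The demo uses 10 bootstrap models rather than 100 to keep the pure-Python runtime short, and it scores the training data, so its AUC is optimistic; a fair comparison like the one reported above would evaluate on held-out subjects. The `mean_selected` figure corresponds to the "average number of selected features" quantity in the abstract, which is where the L0 and L1 models differ most.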
Media type: |
E-article |
---|
Year of publication: |
2020 |
---|---|
Published: |
2020 |
Contained in: |
Parent record - volume:196 |
---|---|
Contained in: |
Computer methods and programs in biomedicine - 196(2020) of: 01 Nov., page 105582 |
Language: |
English |
---|
Persons involved: |
Fukunishi, Hiroaki [Author] |
---|
Subjects: |
Alzheimer-type dementia |
---|
Notes: |
Date Completed 14.05.2021 Date Revised 14.05.2021 published: Print-Electronic Citation Status MEDLINE |
---|
doi: |
10.1016/j.cmpb.2020.105582 |
---|
PPN (catalog ID): |
NLM312785895 |
---|
LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | NLM312785895 | ||
003 | DE-627 | ||
005 | 20231225145148.0 | ||
007 | cr uuu---uuuuu | ||
008 | 231225s2020 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1016/j.cmpb.2020.105582 |2 doi | |
028 | 5 | 2 | |a pubmed24n1042.xml |
035 | |a (DE-627)NLM312785895 | ||
035 | |a (NLM)32702573 | ||
035 | |a (PII)S0169-2607(20)31415-2 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Fukunishi, Hiroaki |e author |4 aut | |
245 | 1 | 0 | |a Alzheimer-type dementia prediction by sparse logistic regression using claim data |
264 | 1 | |c 2020 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computer |b c |2 rdamedia | ||
338 | |a Online resource |b cr |2 rdacarrier | ||
500 | |a Date Completed 14.05.2021 | ||
500 | |a Date Revised 14.05.2021 | ||
500 | |a published: Print-Electronic | ||
500 | |a Citation Status MEDLINE | ||
650 | 4 | |a Journal Article | |
650 | 4 | |a Alzheimer-type dementia | |
650 | 4 | |a Health insurance claim data | |
650 | 4 | |a Long-term care insurance claim data | |
650 | 4 | |a Machine learning | |
650 | 4 | |a Prediction | |
650 | 4 | |a Sparse logistic regression | |
700 | 1 | |a Nishiyama, Mitsuki |e author |4 aut | |
700 | 1 | |a Luo, Yuan |e author |4 aut | |
700 | 1 | |a Kubo, Masahiro |e author |4 aut | |
700 | 1 | |a Kobayashi, Yasuki |e author |4 aut | |
773 | 0 | 8 | |i Contained in |t Computer methods and programs in biomedicine |d 1993 |g 196(2020) of: 01 Nov., page 105582 |w (DE-627)NLM012836133 |x 1872-7565 |7 nnns |
773 | 1 | 8 | |g volume:196 |g year:2020 |g day:01 |g month:11 |g pages:105582 |
856 | 4 | 0 | |u http://dx.doi.org/10.1016/j.cmpb.2020.105582 |3 Full text |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 196 |j 2020 |b 01 |c 11 |h 105582 |