Alzheimer-type dementia prediction by sparse logistic regression using claim data

Copyright © 2020. Published by Elsevier B.V.

This study aimed to predict the risk of Alzheimer-type dementia in persons aged over 75 years who were not receiving long-term care services, using regularly collected claim data. A refined dataset of 48,123 persons was prepared from health insurance and long-term care insurance claim data in a large city in the metropolitan area of Japan. The features used include the age and sex of the subjects, 502 diseases based on ICD-10 diagnosis codes, and 107 prescription drugs based on therapeutic classes. The most important challenge in this work was feature selection from a large number of features. We adopted sparse logistic regression models with L0 regularization (SLR-L0) and L1 regularization (SLR-L1) as machine-learning classification models. These regularizations enable feature selection by estimating a sparse solution, one with few non-zero coefficients, during model optimization. Predictions were made by integrating 100 predictors trained on bootstrap samples. As a result, the areas under the ROC curve (AUCs) were 0.663 for SLR-L0 and 0.660 for SLR-L1. These performances were similar; however, the average numbers of selected features, out of a total of 611, were 13 for SLR-L0 and 253 for SLR-L1. The results indicate that SLR-L1 tended to include less useful features, whereas SLR-L0 narrowed the set down to influential features. SLR-L0 might be more useful than SLR-L1 for practical use or for discussing risk factors with medical experts.
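The L1 half of the pipeline described in the abstract (sparse logistic regression, bootstrap ensembling, AUC evaluation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic binary flags, the solver is a simple proximal-gradient loop, the helper names (`fit_l1_logistic`, `bagged_fit`, `auc`) and all hyperparameters are hypothetical, and only 10 bootstrap predictors are trained instead of the 100 used in the study. SLR-L0 is not shown, since L0 regularization needs a specialized solver.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_l1_logistic(X, y, lam=0.05, lr=0.5, n_iter=100):
    """L1-penalized logistic regression via proximal gradient descent (ISTA)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n  # the intercept is not penalized
        for j in range(d):
            v = w[j] - lr * grad_w[j] / n
            # Soft-thresholding sets small coefficients exactly to zero.
            w[j] = math.copysign(max(abs(v) - lr * lam, 0.0), v)
    return w, b

def predict_proba(model, x):
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def bagged_fit(X, y, n_models=10, seed=0):
    """Train one sparse model per bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_l1_logistic([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagged_predict(models, x):
    # Integrate the predictors by averaging their predicted probabilities.
    return sum(predict_proba(m, x) for m in models) / len(models)

def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def make_data(n, d, rng):
    # Synthetic stand-in for claim data: d binary flags, only the first two informative.
    X, y = [], []
    for _ in range(n):
        x = [float(rng.random() < 0.5) for _ in range(d)]
        p = sigmoid(-2.0 + 2.0 * x[0] + 2.0 * x[1])
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y

rng = random.Random(42)
X_train, y_train = make_data(200, 8, rng)
X_test, y_test = make_data(200, 8, rng)

models = bagged_fit(X_train, y_train, n_models=10, seed=1)
scores = [bagged_predict(models, x) for x in X_test]
test_auc = auc(scores, y_test)
avg_selected = sum(sum(1 for wj in w if wj != 0.0) for w, _ in models) / len(models)
print(f"AUC = {test_auc:.3f}, mean non-zero coefficients = {avg_selected:.1f} of 8")
```

The mean count of non-zero coefficients printed at the end corresponds to the "average number of selected features" that the abstract compares between SLR-L0 and SLR-L1. Raising the assumed penalty `lam` would shrink that count at some cost in fit.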

Media type:

E-article

Year of publication:

2020

Published:

2020

Contained in:

To the main record - volume:196

Contained in:

Computer methods and programs in biomedicine - 196 (2020), 1 Nov., page 105582

Language:

English

Contributors:

Fukunishi, Hiroaki [Author]
Nishiyama, Mitsuki [Author]
Luo, Yuan [Author]
Kubo, Masahiro [Author]
Kobayashi, Yasuki [Author]

Links:

Full text

Subjects:

Alzheimer-type dementia
Health insurance claim data
Journal Article
Long-term care insurance claim data
Machine learning
Prediction
Sparse logistic regression

Notes:

Date Completed 14.05.2021

Date Revised 14.05.2021

published: Print-Electronic

Citation Status MEDLINE

doi:

10.1016/j.cmpb.2020.105582

funding:

Funding institution / Project title:

PPN (catalog ID):

NLM312785895