Machine Learning Prediction of Autism Spectrum Disorder Through Linking Mothers’ and Children’s Electronic Health Record Data

Abstract Autism spectrum disorder (ASD) is a neurodevelopmental disorder typically diagnosed in children. Early detection of ASD, particularly in girls, who are often diagnosed late, can aid children’s long-term development. We aimed to develop machine learning models for predicting ASD diagnosis in children, both boys and girls, using child-mother linked electronic health record (EHR) data from a large clinical research network. Model features were children’s and mothers’ risk factors in EHRs, including maternal health factors. We tested XGBoost and logistic regression with random oversampling (ROS) and random undersampling (RUS) to address imbalanced data. Logistic regression with RUS, using a three-year observation window for children’s risk factors, achieved the best performance for predicting ASD in the overall study population (AUROC = 0.798), boys (AUROC = 0.786), and girls (AUROC = 0.791). We calculated SHAP values to quantify the impacts of important clinical and sociodemographic risk factors.
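
The two generic techniques named in the abstract, random undersampling and AUROC, can be illustrated with a minimal standard-library sketch. This is not the authors' pipeline: the function names `random_undersample` and `auroc` are hypothetical, the data in the usage note are synthetic, and balancing the classes to an exact 1:1 ratio is an assumption, since the abstract does not state the sampling ratio.

```python
import random


def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes have equal size.

    Assumption: binary labels coded 0/1 and a 1:1 target ratio.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row plus an equally sized random subset of the majority.
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [X[i] for i in kept], [y[i] for i in kept]


def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair. Requires at least one
    positive and one negative label.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, undersampling a synthetic set of 10 positive and 90 negative rows yields 20 rows with 10 of each class. In practice, resampling is usually applied to the training split only, so that AUROC is measured on data that keep the original class imbalance.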

Media type:

Preprint

Year of publication:

2024

Published:

2024

Contained in:

bioRxiv.org - (2024), 28 March - year:2024

Language:

English

Contributors:

Li, Yongqiu [Author]
Huang, Yu [Author]
Yang, Shuang [Author]
Shychuk, Elahe M. [Author]
Shenkman, Elizabeth A. [Author]
Bian, Jiang [Author]
Angell, Amber M. [Author]
Guo, Yi [Author]

Links:

Full text [free of charge]

Subjects:

570
Biology

doi:

10.1101/2024.03.24.24304813

funding:

Funding institution / Project title:

PPN (catalog ID):

XBI043057780