Methodology for Biomarker Discovery with Reproducibility in Microbiome Data using Machine Learning

Abstract

Background: In recent years, human microbiome studies have received increasing attention, as the field is considered a potential source of clinical applications. With advances in omics technologies and AI, research focused on discovering potential biomarkers in the human microbiome using machine learning tools has produced positive outcomes. Despite these promising results, several issues remain in such studies, including datasets with small numbers of samples, inconsistent results, and a lack of uniform processing and methodologies; these and other factors lead to a lack of reproducibility in biomedical research. In this work, we propose a methodology that combines the DADA2 pipeline for 16S rRNA sequence processing with Recursive Ensemble Feature Selection (REFS) across multiple datasets, in order to increase reproducibility and obtain robust and reliable results in biomedical research.

Results: Three experiments were performed, analysing microbiome data from patients/cases with Inflammatory Bowel Disease (IBD), Autism Spectrum Disorder (ASD), and Type 2 Diabetes (T2D). In each experiment, we identified a biomarker signature in one dataset and applied it to two other datasets for further validation. The effectiveness of the proposed methodology was compared with other feature selection methods, such as K-Best with F-score, with random selection as a baseline. The Area Under the Curve (AUC) was used as a measure of diagnostic accuracy and as the metric for comparing the proposed methodology with the other feature selection methods.

Conclusions: We developed a methodology for reproducible biomarker discovery in 16S rRNA microbiome sequence analysis, addressing issues related to data dimensionality, inconsistent results, and validation across independent datasets. The findings from the three experiments, across 9 different datasets, show that the proposed methodology achieved higher accuracy than the other feature selection methods. This methodology is a first approach to increasing reproducibility and providing robust and reliable results.
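The abstract describes an evaluation protocol: a signature is selected in one dataset, validated in two others, and compared against K-Best with F-score and random selection using AUC. The following is a minimal sketch of that protocol for the two baselines only, using scikit-learn on synthetic data. It does not implement DADA2 or REFS; the synthetic cohorts, the random forest classifier, 5-fold cross-validation, the signature size of 10 and the helper names (`make_cohort`, `kbest_signature`, `random_signature`, `cv_auc`) are illustrative assumptions, not the authors' setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score

N_FEATURES = 200  # assumed stand-in for the number of ASVs after DADA2 processing
K = 10            # assumed signature size


def make_cohort(seed):
    """Hypothetical synthetic cohort. shuffle=False keeps the informative
    features in the first K columns, so a feature index refers to the same
    'taxon' in every cohort, as after uniform processing."""
    return make_classification(n_samples=120, n_features=N_FEATURES,
                               n_informative=K, n_redundant=0,
                               shuffle=False, random_state=seed)


def kbest_signature(X, y, k=K):
    """K-Best baseline: indices of the k features with the highest ANOVA F-score."""
    selector = SelectKBest(score_func=f_classif, k=k).fit(X, y)
    return np.flatnonzero(selector.get_support())


def random_signature(n_features, k=K, seed=0):
    """Random-selection baseline: k feature indices drawn without replacement."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_features, size=k, replace=False)


def cv_auc(X, y, features, seed=0):
    """Mean 5-fold cross-validated ROC AUC using only the signature features."""
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(clf, X[:, features], y, cv=cv, scoring="roc_auc")
    return float(scores.mean())


# Select each signature in the discovery cohort only ...
X_disc, y_disc = make_cohort(seed=1)
signatures = {
    "k_best_f_score": kbest_signature(X_disc, y_disc),
    "random": random_signature(N_FEATURES),
}

# ... then score it in two independent validation cohorts.
validation = [make_cohort(seed=2), make_cohort(seed=3)]
results = {}
for name, features in signatures.items():
    results[name] = [cv_auc(X, y, features) for X, y in validation]
    print(name, ", ".join(f"{auc:.3f}" for auc in results[name]))
```

A REFS-derived signature would slot into the `signatures` dictionary as another array of feature indices and be scored the same way. Re-fitting a classifier with cross-validation inside each validation cohort, rather than reusing a model trained on the discovery cohort, is one assumed reading of "applied it to two other datasets".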

Media type:

Preprint

Year of publication:

2024

Published:

2024

Contained in:

ResearchSquare.com (2024), 22 Jan. 2024

Language:

English

Contributors:

Rojas-Velazquez, David [Author]
Kidwai, Sarah [Author]
Kraneveld, Aletta D. [Author]
Tonda, Alberto [Author]
Oberski, Daniel [Author]
Garssen, Johan [Author]
Lopez-Rincon, Alejandro [Author]

Links:

Full text [licensed]
Full text [free]

Subjects:

570
Biology

doi:

10.21203/rs.3.rs-3699085/v1

funding:

Funding institution / project title:

PPN (catalog ID):

XRA041785932