Publication details - Methodology for biomarker discovery with reproducibility in microbiome data using machine learning

Methodology for biomarker discovery with reproducibility in microbiome data using machine learning

© 2024. The Author(s).

BACKGROUND: In recent years, human microbiome studies have received increasing attention, as the field is considered a potential source of clinical applications. With advances in omics technologies and AI, research on the discovery of potential biomarkers in the human microbiome using machine learning tools has produced positive outcomes. Despite these promising results, several issues remain in such studies, including datasets with a small number of samples, inconsistent results, a lack of uniform processing and methodologies, and other factors that lead to a lack of reproducibility in biomedical research. In this work, we propose a methodology that combines the DADA2 pipeline for 16S rRNA sequence processing with Recursive Ensemble Feature Selection (REFS) across multiple datasets to increase reproducibility and obtain robust and reliable results in biomedical research.

RESULTS: Three experiments were performed analyzing microbiome data from patients/cases in Inflammatory Bowel Disease (IBD), Autism Spectrum Disorder (ASD), and Type 2 Diabetes (T2D). In each experiment, we found a biomarker signature in one dataset and applied it to two other datasets as further validation. The effectiveness of the proposed methodology was compared with other feature selection methods, such as K-Best with F-score, and with random selection as a baseline. The Area Under the Curve (AUC) was employed as a measure of diagnostic accuracy and as a metric for comparing the results of the proposed methodology with those of other feature selection methods. Additionally, we used the Matthews Correlation Coefficient (MCC) to evaluate the performance of the methodology and to compare it with other feature selection methods.

CONCLUSIONS: We developed a methodology for reproducible biomarker discovery in 16S rRNA microbiome sequence analysis, addressing issues related to data dimensionality, inconsistent results, and validation across independent datasets. The findings from the three experiments, across 9 different datasets, show that the proposed methodology achieved higher accuracy than other feature selection methods. This methodology is a first step toward increasing reproducibility and providing robust and reliable results.

Media type:	E-article

Year of publication:	2024
Published:	2024

Contained in:	To the complete record - volume:25
Contained in:	BMC bioinformatics - 25(2024), 1, 15 Jan., page 26

Language:	English

Contributors:	Rojas-Velazquez, David [author]; Kidwai, Sarah [author]; Kraneveld, Aletta D [author]; Tonda, Alberto [author]; Oberski, Daniel [author]; Garssen, Johan [author]; Lopez-Rincon, Alejandro [author]

Links:	Full text

Subjects:	Biomarkers; Journal Article; Machine learning; Microbiome; RNA, Ribosomal, 16S; Reproducibility

Notes:	Date Completed 17.01.2024; Date Revised 18.01.2024; published: Electronic; Citation Status MEDLINE

doi:	10.1186/s12859-024-05639-3

PPN (catalog ID):	NLM367161168

Internal format


LEADER	01000caa a22002652 4500
001	NLM367161168
003	DE-627
005	20240118232117.0
007	cr uuu---uuuuu
008	240116s2024 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1186/s12859-024-05639-3 \|2 doi
028	5	2	\|a pubmed24n1263.xml
035			\|a (DE-627)NLM367161168
035			\|a (NLM)38225565
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Rojas-Velazquez, David \|e verfasserin \|4 aut
245	1	0	\|a Methodology for biomarker discovery with reproducibility in microbiome data using machine learning
264		1	\|c 2024
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Completed 17.01.2024
500			\|a Date Revised 18.01.2024
500			\|a published: Electronic
500			\|a Citation Status MEDLINE
520			\|a © 2024. The Author(s).
520			\|a BACKGROUND: In recent years, human microbiome studies have received increasing attention as this field is considered a potential source for clinical applications. With the advancements in omics technologies and AI, research focused on the discovery for potential biomarkers in the human microbiome using machine learning tools has produced positive outcomes. Despite the promising results, several issues can still be found in these studies such as datasets with small number of samples, inconsistent results, lack of uniform processing and methodologies, and other additional factors lead to lack of reproducibility in biomedical research. In this work, we propose a methodology that combines the DADA2 pipeline for 16s rRNA sequences processing and the Recursive Ensemble Feature Selection (REFS) in multiple datasets to increase reproducibility and obtain robust and reliable results in biomedical research
520			\|a RESULTS: Three experiments were performed analyzing microbiome data from patients/cases in Inflammatory Bowel Disease (IBD), Autism Spectrum Disorder (ASD), and Type 2 Diabetes (T2D). In each experiment, we found a biomarker signature in one dataset and applied to 2 other as further validation. The effectiveness of the proposed methodology was compared with other feature selection methods such as K-Best with F-score and random selection as a base line. The Area Under the Curve (AUC) was employed as a measure of diagnostic accuracy and used as a metric for comparing the results of the proposed methodology with other feature selection methods. Additionally, we use the Matthews Correlation Coefficient (MCC) as a metric to evaluate the performance of the methodology as well as for comparison with other feature selection methods
520			\|a CONCLUSIONS: We developed a methodology for reproducible biomarker discovery for 16s rRNA microbiome sequence analysis, addressing the issues related with data dimensionality, inconsistent results and validation across independent datasets. The findings from the three experiments, across 9 different datasets, show that the proposed methodology achieved higher accuracy compared to other feature selection methods. This methodology is a first approach to increase reproducibility, to provide robust and reliable results
650		4	\|a Journal Article
650		4	\|a Machine learning
650		4	\|a Microbiome
650		4	\|a Reproducibility
650		7	\|a RNA, Ribosomal, 16S \|2 NLM
650		7	\|a Biomarkers \|2 NLM
700	1		\|a Kidwai, Sarah \|e verfasserin \|4 aut
700	1		\|a Kraneveld, Aletta D \|e verfasserin \|4 aut
700	1		\|a Tonda, Alberto \|e verfasserin \|4 aut
700	1		\|a Oberski, Daniel \|e verfasserin \|4 aut
700	1		\|a Garssen, Johan \|e verfasserin \|4 aut
700	1		\|a Lopez-Rincon, Alejandro \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t BMC bioinformatics \|d 2000 \|g 25(2024), 1 vom: 15. Jan., Seite 26 \|w (DE-627)NLM109215982 \|x 1471-2105 \|7 nnns
773	1	8	\|g volume:25 \|g year:2024 \|g number:1 \|g day:15 \|g month:01 \|g pages:26
856	4	0	\|u http://dx.doi.org/10.1186/s12859-024-05639-3 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a GBV_NLM
951			\|a AR
952			\|d 25 \|j 2024 \|e 1 \|b 15 \|c 01 \|h 26
