Integrative analysis of microbial 16S gene and shotgun metagenomic sequencing data improves statistical efficiency

Abstract

Background: The most widely used technologies for profiling microbial communities are 16S marker-gene sequencing and shotgun metagenomic sequencing. Many microbiome studies have performed both sequencing experiments on the same cohort of samples. The two sequencing datasets often reveal consistent patterns of microbial signatures, which suggests that an integrative analysis could improve the power to test these signatures. However, differential experimental biases, partially overlapping samples, and differential library sizes pose substantial challenges when combining the two datasets. Currently, researchers either discard one dataset entirely or use different datasets for different objectives.

Methods: In this article, we introduce the first method of this kind, named Com-2seq, which combines the two sequencing datasets for testing differential abundance at the genus and community levels while overcoming these difficulties. The new method is based on our LOCOM model (Hu et al., 2022), which employs logistic regression for testing taxon differential abundance while remaining robust to experimental bias. To benchmark the performance of Com-2seq, we introduce two ad hoc approaches: applying LOCOM to pooled taxa count data, and combining LOCOM p-values from analyzing each dataset separately.

Results: Our simulation studies indicate that Com-2seq substantially improves statistical efficiency over analysis of either dataset alone and performs better than the two ad hoc approaches. An application of Com-2seq to two real microbiome studies uncovered scientifically plausible findings that would have been missed by analyzing the individual datasets.

Conclusions: Com-2seq performs integrative analysis of 16S and metagenomic sequencing data, which improves statistical efficiency and has the potential to accelerate the search for microbial communities and taxa that are involved in human health and disease.
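As an illustration of the second ad hoc benchmark named in the abstract (combining p-values from separate analyses of each dataset), the following is a minimal Python sketch. It is not the authors' implementation: the abstract does not state which combination rule is used, so the equal-weight Cauchy combination rule is assumed here, and the function names and p-values are hypothetical. A genus observed in only one dataset simply keeps its single p-value.

```python
import math


def cauchy_combine(p_values):
    """Combine p-values with the equal-weight Cauchy combination rule.

    Assumption: this rule is used for illustration only; the abstract
    does not say how the per-dataset p-values are combined.
    """
    # Clip so that tan() stays finite when a p-value is exactly 0 or 1.
    clipped = [min(max(p, 1e-15), 1 - 1e-15) for p in p_values]
    # Each p-value is mapped to a standard Cauchy variate, then averaged.
    stat = sum(math.tan((0.5 - p) * math.pi) for p in clipped) / len(clipped)
    # Convert the averaged statistic back to a p-value (Cauchy upper tail).
    return 0.5 - math.atan(stat) / math.pi


def combine_by_genus(p_16s, p_shotgun):
    """Combine per-genus p-values from two analyses (hypothetical helper).

    Genera present in only one dataset are carried over unchanged,
    because combining a single p-value returns that same p-value.
    """
    combined = {}
    for genus in sorted(set(p_16s) | set(p_shotgun)):
        available = [d[genus] for d in (p_16s, p_shotgun) if genus in d]
        combined[genus] = cauchy_combine(available)
    return combined


# Hypothetical per-genus p-values, invented purely for this example.
p_16s = {"Bacteroides": 0.03, "Prevotella": 0.40}
p_shotgun = {"Bacteroides": 0.02, "Akkermansia": 0.10}

combined = combine_by_genus(p_16s, p_shotgun)
for genus, p in combined.items():
    print(f"{genus}: {p:.4f}")
```

Combining p-values in this way uses only the summary results of each analysis, which is one plausible reason the abstract treats it as a benchmark rather than as the integrative method itself.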

Media type:

Preprint

Year of publication:

2024

Published:

2024

Contained in:

ResearchSquare.com (2024), 21 March

Language:

English

Contributors:

Yue, Ye [Author]
Read, Timothy [Author]
Fedirko, Veronika [Author]
Satten, Glen [Author]
Hu, Yi-Juan [Author]

Links:

Full text [free of charge]

Subjects:

570
Biology

doi:

10.21203/rs.3.rs-3376801/v1

funding:

Funding institution / Project title:

PPN (catalog ID):

XRA041073983