Analysis of the limited M. tuberculosis accessory genome reveals potential pitfalls of pan-genome analysis approaches

Abstract: Pan-genome analysis is a fundamental tool in the study of bacterial genome evolution. Benchmarking the accuracy of pan-genome analysis methods is challenging because that accuracy can be significantly influenced both by the methodology used to compare genomes and by differences in the accuracy and representativeness of the genomes analyzed. In this work, we curated a collection of 151 Mycobacterium tuberculosis (Mtb) isolates to evaluate sources of variability in pan-genome analysis. Mtb is characterized by its clonal evolution, absence of horizontal gene transfer, and limited accessory genome, making it an ideal test case for this study. Using a state-of-the-art graph-genome approach, we found that a majority of the structural variation observed in Mtb originates from rearrangement, deletion, and duplication of redundant nucleotide sequences. In contrast, we found that pan-genome analyses that focus on comparison of coding sequences (at the amino acid level) can yield surprisingly variable results, driven by differences in assembly quality and the software used. Upon closer inspection, we found that coding sequence annotation discrepancies were a major contributor to inflated Mtb accessory genome estimates. To address this, we developed panqc, a software tool that detects annotation discrepancies and collapses nucleotide redundancy in pan-genome estimates. We characterized the effect of the panqc adjustment on pan-genome analysis of both Mtb and E. coli genomes, and highlight how different levels of genomic diversity are prone to unique biases. Overall, this study illustrates the need for careful methodological selection and quality control to accurately map the evolutionary dynamics of a bacterial species.

Media type:

Preprint

Year of publication:

2024

Published:

2024

Contained in:

bioRxiv.org (2024), 28 March

Language:

English

Contributors:

Marin, Maximillian G. [Author]
Wippel, Christoph [Author]
Quinones-Olvera, Natalia [Author]
Behruznia, Mahboobeh [Author]
Jeffrey, Brendan M. [Author]
Harris, Michael [Author]
Mann, Brendon C. [Author]
Rosenthal, Alex [Author]
Jacobson, Karen R. [Author]
Warren, Robin M. [Author]
Li, Heng [Author]
Meehan, Conor J. [Author]
Farhat, Maha R. [Author]

Links:

Full text [free of charge]

Subjects:

570
Biology

doi:

10.1101/2024.03.21.586149

Funding:

Funding institution / Project title:

PPN (catalog ID):

XBI043056083