Development of the Pneumococcal Genome Library, a core genome multilocus sequence typing scheme, and a taxonomic life identification number barcoding system to investigate and define pneumococcal population structure

Abstract Investigating the genomic epidemiology of major bacterial pathogens is integral to understanding transmission, evolution, colonisation, disease, antimicrobial resistance, and vaccine impact. Furthermore, the recent accumulation of large numbers of whole genome sequences for many bacterial species enhances the development of robust genome-wide typing schemes to define the overall bacterial population structure and the lineages within it. Using previously published data, we developed the Pneumococcal Genome Library (PGL), a curated dataset of 30,976 genomes and contextual data for carriage and disease pneumococci recovered between 1916 and 2018 in 82 countries. We leveraged the size and diversity of the PGL to develop a core genome multilocus sequence typing (cgMLST) scheme comprising 1,222 loci. Finally, using multilevel single-linkage clustering, we stratified pneumococci into hierarchical clusters based on allelic similarity thresholds, and defined these with a taxonomic life identification number (LIN) barcoding system. The PGL, cgMLST scheme, and LIN barcodes represent a high-quality genomic resource and fine-scale clustering approaches for the analysis of pneumococcal populations, which support the genomic epidemiology and surveillance of this leading global pathogen.

Impact statement Many thousands of pneumococcal genomes are available in the public domain, and this creates opportunities for the scientific community to re-use existing data; however, these data are most useful when the contextual data (provenance and phenotype) are also linked to the genomes. Therefore, we created a curated, open-access database in PubMLST that contained nearly 31,000 published pneumococcal genomes and the corresponding contextual data for each genome. This large and diverse pneumococcal database was used to create a novel cgMLST scheme and multilevel clustering method to define genetic lineages with high resolution and a standardised nomenclature. These are open-access resources for all to use and provide a unified framework for the characterisation of global pneumococcal populations.
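The idea of multilevel single-linkage clustering on allelic profiles can be illustrated with a minimal sketch. It is not the authors' implementation or the PubMLST LIN code algorithm: the toy profiles, the six-locus scheme, the mismatch thresholds, and the function names are all hypothetical, and missing loci are ignored. At each threshold, isolates are linked if they differ at no more than that number of loci, and the per-level cluster labels are joined into a barcode-like string.

```python
from itertools import combinations


def allelic_mismatches(a, b):
    """Count loci at which two allelic profiles carry different alleles."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage(profiles, max_mismatches):
    """Single-linkage clustering: isolates share a cluster when a chain of
    pairs, each differing at <= max_mismatches loci, connects them."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path compression.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if allelic_mismatches(profiles[a], profiles[b]) <= max_mismatches:
            parent[find(a)] = find(b)

    # Number clusters in order of first appearance.
    labels, ids = {}, {}
    for n in names:
        root = find(n)
        ids.setdefault(root, len(ids))
        labels[n] = ids[root]
    return labels


def multilevel_codes(profiles, thresholds):
    """Cluster at each threshold (loosest first) and join the labels.
    Labels are numbered per level, not nested within the parent cluster."""
    levels = [single_linkage(profiles, t) for t in thresholds]
    return {n: "_".join(str(level[n]) for level in levels) for n in profiles}


# Hypothetical toy data: four isolates typed at six loci.
profiles = {
    "iso1": [1, 1, 1, 1, 1, 1],
    "iso2": [1, 1, 1, 1, 1, 2],
    "iso3": [1, 1, 1, 2, 2, 2],
    "iso4": [5, 6, 7, 8, 9, 9],
}
codes = multilevel_codes(profiles, thresholds=[3, 1, 0])
for name, code in codes.items():
    print(name, code)
```

In this toy run, iso1 and iso2 share the first two positions of their codes because they differ at a single locus, whereas iso4 differs from the others from the first position onward. Because single-linkage clusters at a stricter threshold always sit inside those at a looser one, a shared prefix indicates membership of the same cluster at every level up to that point.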

Media type:

Preprint

Year of publication:

2023

Published:

2023

Contained in:

bioRxiv.org - (2023), dated 22 Dec. - year:2023

Language:

English

Contributors:

Jansen van Rensburg, Melissa J. [Author]
Berger, Duncan J. [Author]
Fohrmann, Andy [Author]
Bray, James E. [Author]
Jolley, Keith A. [Author]
Maiden, Martin C.J. [Author]
Brueggemann, Angela B. [Author]

Links:

Full text [free of charge]

Subjects:

570
Biology

doi:

10.1101/2023.12.19.571883

funding:

Funding institution / Project title:

PPN (catalogue ID):

XBI04192987X