Compositional features analysis by machine learning in genome represents linear adaptation of monkeypox virus

Copyright © 2024 Zhang, Li, Cai, Kang, Feng, Li, Chen, Li, Bao and Jiang.

Introduction: Global headlines have been dominated by the sudden and widespread outbreak of monkeypox, a rare, endemic zoonotic disease caused by the monkeypox virus (MPXV). Machine learning (ML) methods based on genomic composition have recently shown promise in identifying the host adaptability and evolutionary patterns of viruses. Our study aimed to analyze the genomic characteristics and evolutionary patterns of MPXV using ML methods.

Methods: The open reading frame (ORF) regions of full-length MPXV genomes were filtered, and the 165 ORF clusters with the highest homology were selected. The unsupervised ML methods t-distributed stochastic neighbor embedding (t-SNE), principal component analysis (PCA), and hierarchical clustering were applied to examine the dinucleotide composition representation (DCR) characteristics of the selected ORF clusters.

Results: Post-2022 MPXV sequences showed clear linear adaptive evolution, indicating that the virus has become more adapted to the human host after accumulating mutations. For a more precise analysis, ORF regions with larger variation were screened by ranking homology differences, narrowing the analysis down to key ORF clusters; this analysis reached the same conclusion of linear adaptation. The structures of key differential proteins were then predicted with AlphaFold 2, suggesting that differences in their main domains may be one of the intrinsic reasons for linear adaptive evolution.

Discussion: Understanding the process of linear adaptation is critical in the constant evolutionary struggle between viruses and their hosts, and it plays a significant role in designing effective measures against viral diseases. The present study therefore provides valuable insights into the evolutionary patterns of MPXV in 2022 from the perspective of genomic composition characteristics analyzed with ML methods.
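The DCR feature extraction and dimensionality-reduction steps described in the Methods can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract does not define DCR precisely, so the hypothetical function `dinucleotide_ratios` computes the 16 standard dinucleotide relative-abundance ratios ρ_xy = f_xy / (f_x f_y) as a stand-in, and the hypothetical `pca` function is a plain SVD-based PCA. The toy sequences are invented and are not MPXV ORFs; t-SNE is not shown.

```python
import itertools

import numpy as np

NUCS = "ACGT"
# The 16 possible dinucleotides in a fixed order: AA, AC, ..., TT
DINUCS = ["".join(p) for p in itertools.product(NUCS, repeat=2)]


def dinucleotide_ratios(seq: str) -> np.ndarray:
    """Return the 16 dinucleotide relative-abundance ratios f_xy / (f_x * f_y).

    Assumption: a generic dinucleotide-composition feature used as a stand-in
    for the paper's DCR; the authors' exact definition may differ.
    Characters other than A/C/G/T (e.g. N) are skipped.
    """
    seq = seq.upper()
    mono = {n: 0 for n in NUCS}
    di = {d: 0 for d in DINUCS}
    for base in seq:
        if base in mono:
            mono[base] += 1
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in di:  # pairs containing N or other symbols are ignored
            di[pair] += 1
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    if n_mono == 0 or n_di == 0:
        raise ValueError("sequence has no valid A/C/G/T dinucleotides")
    ratios = []
    for d in DINUCS:
        f_xy = di[d] / n_di
        f_x = mono[d[0]] / n_mono
        f_y = mono[d[1]] / n_mono
        # Ratio > 1: dinucleotide over-represented relative to base composition
        ratios.append(f_xy / (f_x * f_y) if f_x > 0 and f_y > 0 else 0.0)
    return np.array(ratios)


def pca(features: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project the rows of `features` onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Hypothetical toy ORFs, not real MPXV sequences
orfs = {
    "orf_a": "ATGCGCGCATTAGC",
    "orf_b": "ATGCGCGCATTGGC",
    "orf_c": "ATATATATTTAAAT",
}
X = np.vstack([dinucleotide_ratios(s) for s in orfs.values()])
coords = pca(X, n_components=2)
print(X.shape, coords.shape)  # (3, 16) (3, 2)
```

On a real dataset, each row of `X` would correspond to one ORF sequence, and hierarchical clustering could be run on the same feature matrix, for example with `scipy.cluster.hierarchy.linkage(X, method="average")`. The sign of each principal component is arbitrary, so only the relative positions of points in `coords` are meaningful.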

Media type:

E-article

Year of publication:

2024

Published:

2024

Contained in:

Parent record - volume:15

Contained in:

Frontiers in genetics - 15(2024) of: 12., page 1361952

Language:

English

Contributors:

Zhang, Sen [Author]
Li, Ya-Dan [Author]
Cai, Yu-Rong [Author]
Kang, Xiao-Ping [Author]
Feng, Ye [Author]
Li, Yu-Chang [Author]
Chen, Yue-Hong [Author]
Li, Jing [Author]
Bao, Li-Li [Author]
Jiang, Tao [Author]


Subjects:

Dinucleotide composition representation (DCR)
Journal Article
Linear adaptation
Machine learning
Monkeypox viruses
Open reading frame clusters

Notes:

Date Revised 19.03.2024

published: Electronic-eCollection

Citation Status PubMed-not-MEDLINE

doi:

10.3389/fgene.2024.1361952

Funding:

Funding institution / project title:

PPN (catalog ID):

NLM369853563