Cluster Analysis of Coronavirus Sequences using Computational Sequence Descriptors : With Applications to SARS, MERS and SARS-CoV-2 (CoVID-19)

Copyright © Bentham Science Publishers; for any queries, please email epub@benthamscience.net.

INTRODUCTION: Coronaviruses comprise a group of enveloped, positive-sense single-stranded RNA viruses that infect humans as well as a wide range of animals. The study was performed on a set of 573 sequences belonging to the SARS, MERS and SARS-CoV-2 (CoVID-19) viruses. The sequences were represented with alignment-free sequence descriptors and analyzed with different chemometric methods: Euclidean/Mahalanobis distances, principal component analysis and self-organizing maps (Kohonen networks). We report the cluster structures of the data. The sequences cluster well by virus type; however, some of them tend to belong to more than one virus type.

BACKGROUND: This is a study of 573 genome sequences belonging to SARS, MERS and SARS-CoV-2 (CoVID-19) coronaviruses.

OBJECTIVES: The aim was to compare the virus sequences, which originate from different places around the world.

METHODS: The study used alignment-free sequence descriptors to represent the sequences and chemometric methods to analyze the clusters.
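The methods named above can be illustrated with a minimal sketch. This record does not specify which descriptors the authors used, so the dinucleotide (k-mer) frequency vector below is an assumption chosen only for illustration, as are the function names and the toy sequences, which are not real viral genomes. The sketch shows how an alignment-free descriptor turns each sequence into a fixed-length vector, and how Euclidean distance, Mahalanobis distance and principal component analysis can then be computed on those vectors.

```python
import itertools
import numpy as np

def kmer_descriptor(seq, k=2, alphabet="ACGT"):
    """Normalized k-mer frequency vector (illustrative descriptor, an assumption)."""
    kmers = ["".join(p) for p in itertools.product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing ambiguous bases such as N
            counts[j] += 1
    total = counts.sum()
    return counts / total if total else counts

def euclidean(x, y):
    """Euclidean distance between two descriptor vectors."""
    return float(np.linalg.norm(x - y))

def mahalanobis(x, y, cov_inv):
    """Mahalanobis distance given a (pseudo-)inverse covariance matrix."""
    d = x - y
    return float(np.sqrt(max(d @ cov_inv @ d, 0.0)))  # guard against rounding below zero

def pca_scores(X, n_components=2):
    """Project mean-centered rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Hypothetical toy sequences: two similar pairs, not real virus data.
seqs = [
    "ACGTACGTACGTAAGT",
    "ACGTACGAACGTACGT",
    "TTGGCCAATTGGCCAA",
    "TTGGCCTATTGGCCAA",
]
X = np.array([kmer_descriptor(s) for s in seqs])

# With few samples the covariance matrix is singular, so a pseudo-inverse is used.
cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))

print("Euclidean 0-1:", euclidean(X[0], X[1]))
print("Euclidean 0-2:", euclidean(X[0], X[2]))
print("Mahalanobis 0-2:", mahalanobis(X[0], X[2], cov_inv))
print("PCA scores:\n", pca_scores(X))
```

In this toy setting the first two sequences lie closer to each other than to the last two, which is the kind of grouping a clustering analysis would pick up. Self-organizing maps are omitted to keep the sketch short.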

RESULTS: The majority of genome sequences cluster according to virus type, but some are outliers.

CONCLUSION: We identify 71 sequences that tend to belong to more than one cluster.

Media type:

E-article

Year of publication:

2021

Published:

2021

Contained in:

To the complete record - volume:17

Contained in:

Current computer-aided drug design - 17(2021), 7 of: 12., pages 936-945

Language:

English

Contributors:

Vračko, Marjan [Author]
Basak, Subhash C [Author]
Dey, Tathagata [Author]
Nandy, Ashesh [Author]


Subjects:

Alignment-free sequence descriptors
Clustering
Euclidean distance
Journal Article
MERS
Mahalanobis distance
Mathematical representation of sequences
Principal component analysis
SARS
SARS-CoV-2 (CoVID-19)

Notes:

Date Completed 17.01.2022

Date Revised 17.01.2022

published: Print

Citation Status MEDLINE

doi:

10.2174/1573409917666210202092646

Funding:

Funding institution / Project title:

PPN (catalog ID):

NLM320918203