Cluster Analysis of Coronavirus Sequences using Computational Sequence Descriptors: With Applications to SARS, MERS and SARS-CoV-2 (CoVID-19)
Copyright© Bentham Science Publishers; for any queries, please email epub@benthamscience.net.
INTRODUCTION: Coronaviruses comprise a group of enveloped, positive-sense single-stranded RNA viruses that infect humans as well as a wide range of animals. The study was performed on a set of 573 sequences belonging to SARS, MERS and SARS-CoV-2 (CoVID-19) viruses. The sequences were represented with alignment-free sequence descriptors and analyzed with several chemometric methods: Euclidean and Mahalanobis distances, principal component analysis and self-organizing maps (Kohonen networks). We report the cluster structures of the data. The sequences cluster well by virus type; however, some of them show a tendency to belong to more than one virus type.
BACKGROUND: This is a study of 573 genome sequences belonging to SARS, MERS and SARS-CoV-2 (CoVID-19) coronaviruses.
OBJECTIVES: The aim was to compare virus sequences originating from different places around the world.
METHODS: The study used alignment-free sequence descriptors to represent the sequences and chemometric methods to analyze the clusters.
RESULTS: The majority of genome sequences cluster according to virus type, but some of them are outliers.
CONCLUSION: We identify 71 sequences that tend to belong to more than one cluster.
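The workflow named in the abstract (alignment-free descriptors, Euclidean and Mahalanobis distances, principal component analysis) can be illustrated with a minimal Python sketch. The record does not say which descriptors the authors used, so normalized k-mer frequencies stand in here as an assumed example of an alignment-free descriptor. The function names and the toy sequences are hypothetical, and self-organizing maps are left out for brevity.

```python
from itertools import product

import numpy as np


def kmer_descriptor(seq, k=2, alphabet="ACGT"):
    """Normalized k-mer frequencies: an assumed stand-in descriptor."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing ambiguous bases such as N
            counts[j] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def euclidean(x, y):
    """Euclidean distance between two descriptor vectors."""
    return float(np.linalg.norm(x - y))


def mahalanobis(x, y, cov_pinv):
    """Mahalanobis distance using a (pseudo-)inverse covariance matrix."""
    d = x - y
    return float(np.sqrt(max(d @ cov_pinv @ d, 0.0)))


def pca_scores(X, n_components=2):
    """Project mean-centered rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


# Toy sequences (hypothetical, not taken from the study's data set).
toy = ["ACGTACGTAC", "ACGTACGTTC", "GGGCCCGGGC", "GGGCCCGCGC"]
X = np.array([kmer_descriptor(s) for s in toy])

# With few samples the covariance is singular, so a pseudo-inverse is used.
cov_pinv = np.linalg.pinv(np.cov(X, rowvar=False))

print(euclidean(X[0], X[1]), euclidean(X[0], X[2]))
print(mahalanobis(X[0], X[2], cov_pinv))
print(pca_scores(X))
```

In this toy setting the first two sequences lie closer to each other in descriptor space than to the last two, which is the kind of grouping a cluster analysis would then inspect.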
Media type: | E-article |
---|---|
Year of publication: | 2021 |
Published: | 2021 |
Contained in: | Current computer-aided drug design - 17(2021), 7, dated 12., pages 936-945 |
Language: | English |
Contributors: | Vračko, Marjan [author] |
Notes: | Date Completed 17.01.2022; Date Revised 17.01.2022; published: Print; Citation Status MEDLINE |
doi: | 10.2174/1573409917666210202092646 |
PPN (catalog ID): | NLM320918203 |
LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | NLM320918203 | ||
003 | DE-627 | ||
005 | 20231225174713.0 | ||
007 | cr uuu---uuuuu | ||
008 | 231225s2021 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.2174/1573409917666210202092646 |2 doi | |
028 | 5 | 2 | |a pubmed24n1069.xml |
035 | |a (DE-627)NLM320918203 | ||
035 | |a (NLM)33530913 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Vračko, Marjan |e verfasserin |4 aut | |
245 | 1 | 0 | |a Cluster Analysis of Coronavirus Sequences using Computational Sequence Descriptors |b With Applications to SARS, MERS and SARS-CoV-2 (CoVID-19) |
264 | 1 | |c 2021 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a Date Completed 17.01.2022 | ||
500 | |a Date Revised 17.01.2022 | ||
500 | |a published: Print | ||
500 | |a Citation Status MEDLINE | ||
520 | |a Copyright© Bentham Science Publishers; for any queries, please email epub@benthamscience.net. | ||
520 | |a INTRODUCTION: Coronaviruses comprise a group of enveloped, positive-sense single-stranded RNA viruses that infect humans as well as a wide range of animals. The study was performed on a set of 573 sequences belonging to SARS, MERS and SARS-CoV-2 (CoVID-19) viruses. The sequences were represented with alignment-free sequence descriptors and analyzed with different chemometric methods: Euclidean/Mahalanobis distances, principal component analysis and self-organizing maps (Kohonen networks). We report the cluster structures of the data. The sequences are well-clustered regarding the type of virus; however, some of them show the tendency to belong to more than one virus type | ||
520 | |a BACKGROUND: This is a study of 573 genome sequences belonging to SARS, MERS and SARS-CoV-2 (CoVID-19) coronaviruses | ||
520 | |a OBJECTIVES: The aim was to compare virus sequences originating from different places around the world | ||
520 | |a METHODS: The study used alignment-free sequence descriptors to represent the sequences and chemometric methods to analyze the clusters | ||
520 | |a RESULTS: The majority of genome sequences cluster according to virus type, but some of them are outliers | ||
520 | |a CONCLUSION: We identify 71 sequences that tend to belong to more than one cluster | ||
650 | 4 | |a Journal Article | |
650 | 4 | |a Euclidean distance | |
650 | 4 | |a MERS | |
650 | 4 | |a Mahalanobis distance | |
650 | 4 | |a SARS | |
650 | 4 | |a SARS-CoV-2 (CoVID-19) | |
650 | 4 | |a alignment-free sequence descriptors | |
650 | 4 | |a clustering | |
650 | 4 | |a mathematical representation of sequences | |
650 | 4 | |a principal component analysis | |
700 | 1 | |a Basak, Subhash C |e verfasserin |4 aut | |
700 | 1 | |a Dey, Tathagata |e verfasserin |4 aut | |
700 | 1 | |a Nandy, Ashesh |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t Current computer-aided drug design |d 2008 |g 17(2021), 7 vom: 12., Seite 936-945 |w (DE-627)NLM191691046 |x 1875-6697 |7 nnns |
773 | 1 | 8 | |g volume:17 |g year:2021 |g number:7 |g day:12 |g pages:936-945 |
856 | 4 | 0 | |u http://dx.doi.org/10.2174/1573409917666210202092646 |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 17 |j 2021 |e 7 |b 12 |h 936-945 |