Using artificial intelligence to document the hidden RNA virosphere
Abstract: RNA viruses are diverse and abundant components of global ecosystems. The metagenomic identification of RNA viruses is currently limited to those that exhibit sequence similarity to known viruses. Consequently, the detection of highly divergent viruses with poor sequence similarity to known viruses remains a challenging task. We developed a deep learning algorithm, termed LucaProt, to identify highly divergent RNA-dependent RNA polymerase (RdRP) sequences in 10,487 metatranscriptomes from diverse global ecosystems. LucaProt integrates both sequence and structural information to accurately and efficiently detect RdRP sequences. With this approach we identified 161,979 putative RNA virus species and 180 RNA virus supergroups, among which only 21 contained members of phyla or classes currently defined by the International Committee on Taxonomy of Viruses. These include many groups that were either undescribed or poorly characterized in previous studies. The newly identified RNA viruses were present in diverse ecological settings, including the air, hot springs and hydrothermal vents, and both virus diversity and abundance varied substantially among ecosystems. We also identified the longest RNA virus genome (nido-like virus) documented to date, at 47,250 nucleotides. This study marks the beginning of a new era of virus discovery, providing computational tools that will help expand our understanding of the global RNA virosphere and of virus evolution.
Media type: | Preprint |
---|---|
Year of publication: | 2024 |
Published: | 2024 |
Contained in: | bioRxiv.org - (2024), 17 Feb. To the parent record - year:2024 |
Language: | English |
Contributors: | Hou, Xin [author] |
Links: | Full text [free of charge] |
Subjects: | |
doi: | 10.1101/2023.04.18.537342 |
Funding: | |
Funding institution / project title: | |
PPN (catalog ID): | XBI039310264 |
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | XBI039310264 | ||
003 | DE-627 | ||
005 | 20240218090425.0 | ||
007 | cr uuu---uuuuu | ||
008 | 230420s2024 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1101/2023.04.18.537342 |2 doi | |
035 | |a (DE-627)XBI039310264 | ||
035 | |a (biorXiv)10.1101/2023.04.18.537342 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Hou, Xin |e verfasserin |4 aut | |
245 | 1 | 0 | |a Using artificial intelligence to document the hidden RNA virosphere |
264 | 1 | |c 2024 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
520 | |a Abstract RNA viruses are diverse and abundant components of global ecosystems. The metagenomic identification of RNA viruses is currently limited to those that exhibit sequence similarity to known viruses. Consequently, the detection of highly divergent viruses with poor sequence similarity to known viruses remains a challenging task. We developed a deep learning algorithm, termed LucaProt, to identify highly divergent RNA-dependent RNA polymerase (RdRP) sequences in 10,487 metatranscriptomes from diverse global ecosystems. LucaProt integrates both sequence and structural information to accurately and efficiently detect RdRP sequences. With this approach we identified 161,979 putative RNA virus species and 180 RNA virus supergroups, among which only 21 contained members of phyla or classes currently defined by the International Committee on Taxonomy of Viruses, and includes many groups that were either undescribed or poorly characterized in previous studies. The newly identified RNA viruses were present in diverse ecological settings, including the air, hot springs and hydrothermal vents, and both virus diversity and abundance varied substantially among ecosystems. We also identified the longest RNA virus genome (nido-like virus) documented to date, at 47,250 nucleotides. This study marks the beginning of a new era of virus discovery, providing computational tools that will help expand our understanding of the global RNA virosphere and of virus evolution. | ||
650 | 4 | |a Biology |7 (dpeaa)DE-84 | |
650 | 4 | |a 570 |7 (dpeaa)DE-84 | |
700 | 1 | |a He, Yong |4 aut | |
700 | 1 | |a Fang, Pan |4 aut | |
700 | 1 | |a Mei, Shi-Qiang |4 aut | |
700 | 1 | |a Xu, Zan |4 aut | |
700 | 1 | |a Wu, Wei-Chen |4 aut | |
700 | 1 | |a Tian, Jun-Hua |4 aut | |
700 | 1 | |a Zhang, Shun |4 aut | |
700 | 1 | |a Zeng, Zhen-Yu |4 aut | |
700 | 1 | |a Gou, Qin-Yu |4 aut | |
700 | 1 | |a Xin, Gen-Yang |4 aut | |
700 | 1 | |a Le, Shi-Jia |4 aut | |
700 | 1 | |a Xia, Yin-Yue |4 aut | |
700 | 1 | |a Zhou, Yu-Lan |4 aut | |
700 | 1 | |a Hui, Feng-Ming |4 aut | |
700 | 1 | |a Pan, Yuan-Fei |4 aut | |
700 | 1 | |a Eden, John-Sebastian |4 aut | |
700 | 1 | |a Yang, Zhao-Hui |4 aut | |
700 | 1 | |a Han, Chong |4 aut | |
700 | 1 | |a Shu, Yue-Long |4 aut | |
700 | 1 | |a Guo, Deyin |4 aut | |
700 | 1 | |a Li, Jun |4 aut | |
700 | 1 | |a Holmes, Edward C. |4 aut | |
700 | 1 | |a Li, Zhao-Rong |4 aut | |
700 | 1 | |a Shi, Mang |4 aut | |
773 | 0 | 8 | |i Enthalten in |t bioRxiv.org |g (2024) vom: 17. Feb. |
773 | 1 | 8 | |g year:2024 |g day:17 |g month:02 |
856 | 4 | 0 | |u http://dx.doi.org/10.1101/2023.04.18.537342 |z kostenfrei |3 Volltext |
912 | |a GBV_XBI | ||
951 | |a AR | ||
952 | |j 2024 |b 17 |c 02 |