Rapid and accurate identification of ribosomal RNA sequences via deep learning
© The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.
Advances in transcriptomic and translatomic techniques enable in-depth studies of RNA activity profiles and RNA-based regulatory mechanisms. Ribosomal RNA (rRNA) sequences are highly abundant among cellular RNA, but if the target sequences do not include polyadenylation, these cannot be easily removed in library preparation, requiring their post-hoc removal with computational techniques to accelerate and improve downstream analyses. Here, we describe RiboDetector, a novel software based on a Bi-directional Long Short-Term Memory (BiLSTM) neural network, which rapidly and accurately identifies rRNA reads from transcriptomic, metagenomic, metatranscriptomic, noncoding RNA, and ribosome profiling sequence data. Compared with state-of-the-art approaches, RiboDetector produced at least six times fewer misclassifications on the benchmark datasets. Importantly, the few false positives of RiboDetector were not enriched in certain Gene Ontology (GO) terms, suggesting a low bias for downstream functional profiling. RiboDetector also demonstrated a remarkable generalizability for detecting novel rRNA sequences that are divergent from the training data with sequence identities of <90%. On a personal computer, RiboDetector processed 40M reads in less than 6 min, which was ∼50 times faster in GPU mode and ∼15 times in CPU mode than other methods. RiboDetector is available under a GPL v3.0 license at https://github.com/hzi-bifo/RiboDetector.
Field | Value
---|---
Media type | E-article
Year of publication | 2022
Published | 2022
Contained in | To the complete record - volume:50
Contained in | Nucleic acids research - 50(2022), no. 10 (10 June), page e60
Language | English
Contributors | Deng, Zhi-Luo [author]
Subjects | 63231-63-0
Notes | Date Completed 10.06.2022; Date Revised 16.07.2022; published: Print; Citation Status MEDLINE
doi | 10.1093/nar/gkac112
PPN (catalog ID) | NLM337203717

LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | NLM337203717 | ||
003 | DE-627 | ||
005 | 20231225233856.0 | ||
007 | cr uuu---uuuuu | ||
008 | 231225s2022 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1093/nar/gkac112 |2 doi | |
028 | 5 | 2 | |a pubmed24n1123.xml |
035 | |a (DE-627)NLM337203717 | ||
035 | |a (NLM)35188571 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Deng, Zhi-Luo |e verfasserin |4 aut | |
245 | 1 | 0 | |a Rapid and accurate identification of ribosomal RNA sequences via deep learning |
264 | 1 | |c 2022 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a Date Completed 10.06.2022 | ||
500 | |a Date Revised 16.07.2022 | ||
500 | |a published: Print | ||
500 | |a Citation Status MEDLINE | ||
520 | |a © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research. | ||
520 | |a Advances in transcriptomic and translatomic techniques enable in-depth studies of RNA activity profiles and RNA-based regulatory mechanisms. Ribosomal RNA (rRNA) sequences are highly abundant among cellular RNA, but if the target sequences do not include polyadenylation, these cannot be easily removed in library preparation, requiring their post-hoc removal with computational techniques to accelerate and improve downstream analyses. Here, we describe RiboDetector, a novel software based on a Bi-directional Long Short-Term Memory (BiLSTM) neural network, which rapidly and accurately identifies rRNA reads from transcriptomic, metagenomic, metatranscriptomic, noncoding RNA, and ribosome profiling sequence data. Compared with state-of-the-art approaches, RiboDetector produced at least six times fewer misclassifications on the benchmark datasets. Importantly, the few false positives of RiboDetector were not enriched in certain Gene Ontology (GO) terms, suggesting a low bias for downstream functional profiling. RiboDetector also demonstrated a remarkable generalizability for detecting novel rRNA sequences that are divergent from the training data with sequence identities of <90%. On a personal computer, RiboDetector processed 40M reads in less than 6 min, which was ∼50 times faster in GPU mode and ∼15 times in CPU mode than other methods. RiboDetector is available under a GPL v3.0 license at https://github.com/hzi-bifo/RiboDetector | ||
650 | 4 | |a Journal Article | |
650 | 4 | |a Research Support, Non-U.S. Gov't | |
650 | 7 | |a RNA, Ribosomal |2 NLM | |
650 | 7 | |a RNA |2 NLM | |
650 | 7 | |a 63231-63-0 |2 NLM | |
700 | 1 | |a Münch, Philipp C |e verfasserin |4 aut | |
700 | 1 | |a Mreches, René |e verfasserin |4 aut | |
700 | 1 | |a McHardy, Alice C |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t Nucleic acids research |d 1974 |g 50(2022), 10 vom: 10. Juni, Seite e60 |w (DE-627)NLM000063398 |x 1362-4962 |7 nnns |
773 | 1 | 8 | |g volume:50 |g year:2022 |g number:10 |g day:10 |g month:06 |g pages:e60 |
856 | 4 | 0 | |u http://dx.doi.org/10.1093/nar/gkac112 |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 50 |j 2022 |e 10 |b 10 |c 06 |h e60 |