LazySampling and LinearSampling : fast stochastic sampling of RNA secondary structure with applications to SARS-CoV-2
© The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.
Many RNAs fold into multiple structures at equilibrium, and there is a need to sample these structures according to their probabilities in the ensemble. The conventional sampling algorithm suffers from two limitations: (i) the sampling phase is slow due to many repeated calculations; and (ii) the end-to-end runtime scales cubically with the sequence length. These issues make the algorithm difficult to apply to long RNAs, such as the full genome of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). To address these problems, we devise a new sampling algorithm, LazySampling, which eliminates redundant work via on-demand caching. Based on LazySampling, we further derive LinearSampling, an end-to-end linear-time sampling algorithm. In benchmarks on nine diverse RNA families, the structures sampled by LinearSampling correlate better with the well-established secondary structures than those from Vienna RNAsubopt and RNAplfold. More importantly, LinearSampling is orders of magnitude faster than standard tools, being 428× faster (72 s versus 8.6 h) than RNAsubopt on the full genome of SARS-CoV-2 (29 903 nt). The resulting sample landscape correlates well with the experimentally guided secondary structure models and is closer to the alternative conformations revealed by experimentally driven analysis. Finally, LinearSampling finds 23 regions of 15 nt with high accessibility in the SARS-CoV-2 genome, which are potential targets for COVID-19 diagnostics and therapeutics.
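The abstract contrasts conventional stochastic traceback, which recomputes the weights of every way to decompose a span each time a sample visits it, with LazySampling's on-demand caching of those alternatives. The Python block below is a minimal sketch of that caching idea, not the authors' implementation. It assumes a toy Nussinov-style model with hypothetical per-pair weights (`PAIR_WEIGHT`) in place of the Turner nearest-neighbor energy model, a minimum hairpin of 3, and no beam pruning, so it stays cubic. The names `partition_function`, `LazySampler`, `to_dot_bracket` and `window_accessibility` are illustrative only. Window accessibility is estimated here, as one straightforward choice, as the fraction of sampled structures in which every nucleotide of the window is unpaired.

```python
import bisect
import random

# Hypothetical toy Boltzmann weights per base pair; a stand-in for the Turner
# nearest-neighbor energy model that real folding tools use.
PAIR_WEIGHT = {("G", "C"): 3.0, ("C", "G"): 3.0, ("A", "U"): 2.0,
               ("U", "A"): 2.0, ("G", "U"): 1.0, ("U", "G"): 1.0}
MIN_HAIRPIN = 3  # a pair (k, j) must enclose at least 3 unpaired bases


def partition_function(seq):
    """Return q(i, j): total weight of all structures on seq[i..j] (inclusive)."""
    n, Q = len(seq), {}

    def q(i, j):
        return 1.0 if j < i else Q[(i, j)]  # the empty span has weight 1

    for length in range(1, n + 1):  # O(n^3): fill shorter spans first
        for i in range(n - length + 1):
            j = i + length - 1
            total = q(i, j - 1)  # case 1: j is unpaired
            for k in range(i, j - MIN_HAIRPIN):  # case 2: j pairs with k
                w = PAIR_WEIGHT.get((seq[k], seq[j]))
                if w:
                    total += q(i, k - 1) * w * q(k + 1, j - 1)
            Q[(i, j)] = total
    return q


class LazySampler:
    """Stochastic traceback that caches each span's alternatives on first visit."""

    def __init__(self, seq):
        self.seq = seq
        self.q = partition_function(seq)
        self.cache = {}  # (i, j) -> (cumulative weights, choices); filled lazily

    def _alternatives(self, i, j):
        if (i, j) not in self.cache:  # computed once, reused by later samples
            acc = self.q(i, j - 1)
            cum, choices = [acc], [None]  # None means "j unpaired"
            for k in range(i, j - MIN_HAIRPIN):
                w = PAIR_WEIGHT.get((self.seq[k], self.seq[j]))
                if w:
                    acc += self.q(i, k - 1) * w * self.q(k + 1, j - 1)
                    cum.append(acc)
                    choices.append(k)
            self.cache[(i, j)] = (cum, choices)
        return self.cache[(i, j)]

    def sample(self, rng):
        """Draw one structure (list of pairs) with probability (its weight) / Q."""
        pairs, stack = [], [(0, len(self.seq) - 1)]
        while stack:
            i, j = stack.pop()
            if j < i:
                continue
            cum, choices = self._alternatives(i, j)
            r = rng.random() * cum[-1]
            idx = min(bisect.bisect_right(cum, r), len(cum) - 1)  # guard rounding
            k = choices[idx]
            if k is None:
                stack.append((i, j - 1))
            else:
                pairs.append((k, j))
                stack.extend([(i, k - 1), (k + 1, j - 1)])
        return pairs


def to_dot_bracket(n, pairs):
    s = ["."] * n
    for i, j in pairs:
        s[i], s[j] = "(", ")"
    return "".join(s)


def window_accessibility(structures, start, width):
    """Fraction of sampled structures in which seq[start:start+width] is all unpaired."""
    hits = sum(1 for s in structures if set(s[start:start + width]) == {"."})
    return hits / len(structures)


if __name__ == "__main__":
    seq = "GGGAAAUCCCGCGAAAGCG"
    sampler = LazySampler(seq)
    rng = random.Random(0)
    samples = [to_dot_bracket(len(seq), sampler.sample(rng)) for _ in range(1000)]
    print(samples[0])
    print("spans cached:", len(sampler.cache))
    print("accessibility of 0-based positions 13-15:", window_accessibility(samples, 13, 3))
```

Because a span's alternatives are built at most once, later samples that revisit the span pay only for a binary search over cached cumulative weights, which is the kind of repeated calculation the abstract says LazySampling removes. LinearSampling's linear runtime comes from combining this lazy traceback with a beam-pruned, left-to-right partition function (as in LinearPartition); the sketch deliberately omits that pruning.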
Field | Value
---|---
Media type | E-article
Year of publication | 2023
Published | 2023
Contained in | Nucleic acids research, 51(2023), no. 2, 25 Jan., p. e7
Language | English
Contributors | Zhang, He [Author]
Subjects | Journal Article
Notes | Date Completed 31.01.2023; Date Revised 12.07.2023; published: Print; UpdateOf: bioRxiv. 2021 Nov 24;:. - PMID 33398265; Citation Status MEDLINE
DOI | 10.1093/nar/gkac1029
PPN (catalog ID) | NLM349150893
LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | NLM349150893 | ||
003 | DE-627 | ||
005 | 20231226041932.0 | ||
007 | cr uuu---uuuuu | ||
008 | 231226s2023 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1093/nar/gkac1029 |2 doi | |
028 | 5 | 2 | |a pubmed24n1163.xml |
035 | |a (DE-627)NLM349150893 | ||
035 | |a (NLM)36401871 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Zhang, He |e author |4 aut | |
245 | 1 | 0 | |a LazySampling and LinearSampling |b fast stochastic sampling of RNA secondary structure with applications to SARS-CoV-2 |
264 | 1 | |c 2023 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a computer |b c |2 rdamedia | ||
338 | |a online resource |b cr |2 rdacarrier | ||
500 | |a Date Completed 31.01.2023 | ||
500 | |a Date Revised 12.07.2023 | ||
500 | |a published: Print | ||
500 | |a UpdateOf: bioRxiv. 2021 Nov 24;:. - PMID 33398265 | ||
500 | |a Citation Status MEDLINE | ||
650 | 4 | |a Journal Article | |
650 | 4 | |a Research Support, N.I.H., Extramural | |
650 | 4 | |a Research Support, U.S. Gov't, Non-P.H.S. | |
650 | 7 | |a RNA, Viral |2 NLM | |
700 | 1 | |a Li, Sizhen |e author |4 aut | |
700 | 1 | |a Zhang, Liang |e author |4 aut | |
700 | 1 | |a Mathews, David H |e author |4 aut | |
700 | 1 | |a Huang, Liang |e author |4 aut | |
773 | 0 | 8 | |i Contained in |t Nucleic acids research |d 1974 |g 51(2023), 2 of: 25 Jan., page e7 |w (DE-627)NLM000063398 |x 1362-4962 |7 nnns |
773 | 1 | 8 | |g volume:51 |g year:2023 |g number:2 |g day:25 |g month:01 |g pages:e7 |
856 | 4 | 0 | |u http://dx.doi.org/10.1093/nar/gkac1029 |3 Full text |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 51 |j 2023 |e 2 |b 25 |c 01 |h e7 |