SurVirus: a repeat-aware virus integration caller
© The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.
A significant portion of human cancers are due to viruses integrating into human genomes. Therefore, accurately predicting virus integrations can help uncover the mechanisms that lead to many devastating diseases. Virus integrations can be called by analysing second generation high-throughput sequencing datasets. Unfortunately, existing methods fail to report a significant portion of integrations, while predicting a large number of false positives. We observe that the inaccuracy is caused by incorrect alignment of reads in repetitive regions. False alignments create false positives, while missing alignments create false negatives. This paper proposes SurVirus, an improved virus integration caller that corrects the alignment of reads which are crucial for the discovery of integrations. We use publicly available datasets to show that existing methods predict hundreds of thousands of false positives; SurVirus, on the other hand, is significantly more precise while it also detects many novel integrations previously missed by other tools, most of which are in repetitive regions. We validate a subset of these novel integrations, and find that the majority are correct. Using SurVirus, we find that HPV and HBV integrations are enriched in LINE and Satellite regions which had been overlooked, as well as discover recurrent HBV and HPV breakpoints in human genome-virus fusion transcripts.
Media type: | E-article |
---|---|
Year of publication: | 2021 |
Published: | 2021 |
Contained in: | To the parent record - volume:49 |
Contained in: | Nucleic acids research - 49(2021), 6, 06 Apr., page e33 |
Language: | English |
Contributors: | Rajaby, Ramesh [author] |
Topics: | Evaluation Study |
Notes: | Date Completed 12.05.2021; Date Revised 12.05.2021; published: Print; Citation Status MEDLINE |
doi: | 10.1093/nar/gkaa1237 |
PPN (catalog ID): | NLM320077071 |

LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | NLM320077071 | ||
003 | DE-627 | ||
005 | 20231225172814.0 | ||
007 | cr uuu---uuuuu | ||
008 | 231225s2021 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1093/nar/gkaa1237 |2 doi | |
028 | 5 | 2 | |a pubmed24n1066.xml |
035 | |a (DE-627)NLM320077071 | ||
035 | |a (NLM)33444454 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Rajaby, Ramesh |e verfasserin |4 aut | |
245 | 1 | 0 | |a SurVirus |b a repeat-aware virus integration caller |
264 | 1 | |c 2021 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a Date Completed 12.05.2021 | ||
500 | |a Date Revised 12.05.2021 | ||
500 | |a published: Print | ||
500 | |a Citation Status MEDLINE | ||
520 | |a © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. | ||
520 | |a A significant portion of human cancers are due to viruses integrating into human genomes. Therefore, accurately predicting virus integrations can help uncover the mechanisms that lead to many devastating diseases. Virus integrations can be called by analysing second generation high-throughput sequencing datasets. Unfortunately, existing methods fail to report a significant portion of integrations, while predicting a large number of false positives. We observe that the inaccuracy is caused by incorrect alignment of reads in repetitive regions. False alignments create false positives, while missing alignments create false negatives. This paper proposes SurVirus, an improved virus integration caller that corrects the alignment of reads which are crucial for the discovery of integrations. We use publicly available datasets to show that existing methods predict hundreds of thousands of false positives; SurVirus, on the other hand, is significantly more precise while it also detects many novel integrations previously missed by other tools, most of which are in repetitive regions. We validate a subset of these novel integrations, and find that the majority are correct. Using SurVirus, we find that HPV and HBV integrations are enriched in LINE and Satellite regions which had been overlooked, as well as discover recurrent HBV and HPV breakpoints in human genome-virus fusion transcripts | ||
650 | 4 | |a Evaluation Study | |
650 | 4 | |a Journal Article | |
650 | 4 | |a Research Support, Non-U.S. Gov't | |
700 | 1 | |a Zhou, Yi |e verfasserin |4 aut | |
700 | 1 | |a Meng, Yifan |e verfasserin |4 aut | |
700 | 1 | |a Zeng, Xi |e verfasserin |4 aut | |
700 | 1 | |a Li, Guoliang |e verfasserin |4 aut | |
700 | 1 | |a Wu, Peng |e verfasserin |4 aut | |
700 | 1 | |a Sung, Wing-Kin |e verfasserin |4 aut | |
773 | 0 | 8 | |i Contained in |t Nucleic acids research |d 1974 |g 49(2021), 6, 06 Apr., page e33 |w (DE-627)NLM000063398 |x 1362-4962 |7 nnns |
773 | 1 | 8 | |g volume:49 |g year:2021 |g number:6 |g day:06 |g month:04 |g pages:e33 |
856 | 4 | 0 | |u http://dx.doi.org/10.1093/nar/gkaa1237 |3 Full text |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 49 |j 2021 |e 6 |b 06 |c 04 |h e33 |