SurVirus: a repeat-aware virus integration caller

© The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.

A significant portion of human cancers are due to viruses integrating into human genomes. Therefore, accurately predicting virus integrations can help uncover the mechanisms that lead to many devastating diseases. Virus integrations can be called by analysing second-generation high-throughput sequencing datasets. Unfortunately, existing methods fail to report a significant portion of integrations while also predicting a large number of false positives. We observe that this inaccuracy is caused by the incorrect alignment of reads in repetitive regions: false alignments create false positives, while missing alignments create false negatives. This paper proposes SurVirus, an improved virus integration caller that corrects the alignment of the reads that are crucial for discovering integrations. We use publicly available datasets to show that existing methods predict hundreds of thousands of false positives; SurVirus, on the other hand, is significantly more precise and also detects many novel integrations previously missed by other tools, most of which are in repetitive regions. We validate a subset of these novel integrations and find that the majority are correct. Using SurVirus, we find that HPV and HBV integrations are enriched in LINE and Satellite regions, which had previously been overlooked, and we discover recurrent HBV and HPV breakpoints in human genome-virus fusion transcripts.

Media type:

E-article

Year of publication:

2021

Published:

2021

Contained in:

Volume 49

Contained in:

Nucleic Acids Research, 49 (2021), no. 6, 6 Apr., p. e33

Language:

English

Contributors:

Rajaby, Ramesh [Author]
Zhou, Yi [Author]
Meng, Yifan [Author]
Zeng, Xi [Author]
Li, Guoliang [Author]
Wu, Peng [Author]
Sung, Wing-Kin [Author]

Topics:

Evaluation Study
Journal Article
Research Support, Non-U.S. Gov't

Notes:

Date Completed 12.05.2021

Date Revised 12.05.2021

published: Print

Citation Status MEDLINE

doi:

10.1093/nar/gkaa1237

Funding:

Funding institution / project title:

PPN (catalog ID):

NLM320077071