EASTR: Correcting systematic alignment errors in multi-exon genes

Abstract Accurate alignment of transcribed RNA to reference genomes is a critical step in the analysis of gene expression, which in turn has broad applications in biomedical research and in the basic sciences. We have discovered that widely used splice-aware aligners, such as STAR and HISAT2, can introduce erroneous spliced alignments between repeated sequences, leading to the inclusion of falsely spliced transcripts in RNA-seq experiments. In some cases, the “phantom” introns resulting from these errors have made their way into widely used genome annotation databases. To address this issue, we have developed EASTR (Emending Alignments of Spliced Transcript Reads), a novel software tool that can detect and remove falsely spliced alignments or transcripts from alignment and annotation files. EASTR improves the accuracy of spliced alignments across diverse species, including human, maize, and Arabidopsis thaliana, by detecting sequence similarity between intron-flanking regions. We demonstrate that applying EASTR before transcript assembly substantially reduces false positive introns, exons, and transcripts, improving the overall accuracy of assembled transcripts. Additionally, we show that applying EASTR to reference annotation databases can detect and correct likely cases of mis-annotated transcripts.

Media type:

Preprint

Year of publication:

2023

Published:

2023

Contained in:

bioRxiv.org - (2023), 15 May. To the parent record - year:2023

Language:

English

Contributors:

Shinder, Ida [Author]
Hu, Richard [Author]
Ji, Hyun Joo [Author]
Chao, Kuan-Hao [Author]
Pertea, Mihaela [Author]

Links:

Full text [free of charge]

Subjects:

570
Biology

doi:

10.1101/2023.05.10.540179

Funding:

Funding institution / Project title:

PPN (catalog ID):

XBI03956360X