EASTR: Identifying and eliminating systematic alignment errors in multi-exon genes
© 2023. The Author(s).
Accurate alignment of transcribed RNA to reference genomes is a critical step in the analysis of gene expression, which in turn has broad applications in biomedical research and in the basic sciences. We reveal that widely used splice-aware aligners, such as STAR and HISAT2, can introduce erroneous spliced alignments between repeated sequences, leading to the inclusion of falsely spliced transcripts in RNA-seq experiments. In some cases, the 'phantom' introns resulting from these errors make their way into widely used genome annotation databases. To address this issue, we present EASTR (Emending Alignments of Spliced Transcript Reads), a software tool that detects and removes falsely spliced alignments or transcripts from alignment and annotation files. EASTR improves the accuracy of spliced alignments across diverse species, including human, maize, and Arabidopsis thaliana, by detecting sequence similarity between intron-flanking regions. We demonstrate that applying EASTR before transcript assembly substantially reduces false positive introns, exons, and transcripts, improving the overall accuracy of assembled transcripts. Additionally, we show that EASTR's application to reference annotation databases can detect and correct likely cases of mis-annotated transcripts.
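The abstract states that spurious spliced alignments are detected through sequence similarity between intron-flanking regions. The record does not describe how EASTR computes that similarity, so the following is only a toy Python sketch of the general idea, not EASTR's implementation. It assumes a plain string reference sequence and 0-based intron coordinates, and it uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure. The function names (`flank`, `flank_similarity`, `looks_spurious`), the window size, and the threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def flank(seq: str, pos: int, half_window: int) -> str:
    """Return the bases within half_window of pos, clipped at the sequence start."""
    return seq[max(0, pos - half_window): pos + half_window]


def flank_similarity(seq: str, intron_start: int, intron_end: int,
                     half_window: int = 10) -> float:
    """Similarity (0.0 to 1.0) between windows centred on the two splice sites.

    intron_start and intron_end are 0-based positions of the first intronic
    base and the first base after the intron (an assumed convention).
    """
    left = flank(seq, intron_start, half_window)
    right = flank(seq, intron_end, half_window)
    # autojunk=False disables difflib's heuristic that ignores frequent elements.
    return SequenceMatcher(None, left, right, autojunk=False).ratio()


def looks_spurious(seq: str, intron_start: int, intron_end: int,
                   half_window: int = 10, threshold: float = 0.9) -> bool:
    """Flag an intron whose two flanking windows are nearly identical.

    The threshold is an arbitrary illustrative value, not one taken from EASTR.
    """
    return flank_similarity(seq, intron_start, intron_end, half_window) >= threshold


# Toy reference: the same 20-base repeat occurs twice, separated by 30 bases.
REPEAT = "ACGTTGCAAGGCTTACGGAT"
repeat_seq = "TTTT" + REPEAT + "C" * 30 + REPEAT + "GGGG"

# Toy reference without any repeated flanks.
plain_seq = "A" * 30 + "C" * 30 + "G" * 30 + "T" * 30

# An "intron" running from the middle of the first repeat copy to the middle
# of the second copy has identical flanking windows and is flagged.
print(looks_spurious(repeat_seq, 14, 64))
print(looks_spurious(plain_seq, 30, 90))
```

A production tool would work on alignment files and a genome index rather than Python strings, and would need a similarity measure suited to genomic repeats; the sketch only shows why an intron whose two ends sit in copies of the same repeat is suspicious.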
Media type: | E-article |
Year of publication: | 2023 |
Published: | 2023 |
Contained in: | Nature communications - 14(2023), 1, 9 Nov., page 7223 |
Language: | English |
Contributors: | Shinder, Ida [author] |
Topics: | 63231-63-0 |
Notes: | Date Completed 10.11.2023; Date Revised 22.11.2023; published: Electronic; Citation Status MEDLINE |
DOI: | 10.1038/s41467-023-43017-4 |
PPN (catalog ID): | NLM364323930 |
LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | NLM364323930 | ||
003 | DE-627 | ||
005 | 20231226095209.0 | ||
007 | cr uuu---uuuuu | ||
008 | 231226s2023 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1038/s41467-023-43017-4 |2 doi | |
028 | 5 | 2 | |a pubmed24n1214.xml |
035 | |a (DE-627)NLM364323930 | ||
035 | |a (NLM)37940654 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Shinder, Ida |e verfasserin |4 aut | |
245 | 1 | 0 | |a EASTR |b Identifying and eliminating systematic alignment errors in multi-exon genes |
264 | 1 | |c 2023 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a Date Completed 10.11.2023 | ||
500 | |a Date Revised 22.11.2023 | ||
500 | |a published: Electronic | ||
500 | |a Citation Status MEDLINE | ||
520 | |a © 2023. The Author(s). | ||
520 | |a Accurate alignment of transcribed RNA to reference genomes is a critical step in the analysis of gene expression, which in turn has broad applications in biomedical research and in the basic sciences. We reveal that widely used splice-aware aligners, such as STAR and HISAT2, can introduce erroneous spliced alignments between repeated sequences, leading to the inclusion of falsely spliced transcripts in RNA-seq experiments. In some cases, the 'phantom' introns resulting from these errors make their way into widely-used genome annotation databases. To address this issue, we present EASTR (Emending Alignments of Spliced Transcript Reads), a software tool that detects and removes falsely spliced alignments or transcripts from alignment and annotation files. EASTR improves the accuracy of spliced alignments across diverse species, including human, maize, and Arabidopsis thaliana, by detecting sequence similarity between intron-flanking regions. We demonstrate that applying EASTR before transcript assembly substantially reduces false positive introns, exons, and transcripts, improving the overall accuracy of assembled transcripts. Additionally, we show that EASTR's application to reference annotation databases can detect and correct likely cases of mis-annotated transcripts | ||
650 | 4 | |a Journal Article | |
650 | 4 | |a Research Support, N.I.H., Extramural | |
650 | 4 | |a Research Support, U.S. Gov't, Non-P.H.S. | |
650 | 7 | |a RNA |2 NLM | |
650 | 7 | |a 63231-63-0 |2 NLM | |
700 | 1 | |a Hu, Richard |e verfasserin |4 aut | |
700 | 1 | |a Ji, Hyun Joo |e verfasserin |4 aut | |
700 | 1 | |a Chao, Kuan-Hao |e verfasserin |4 aut | |
700 | 1 | |a Pertea, Mihaela |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t Nature communications |d 2010 |g 14(2023), 1 vom: 09. Nov., Seite 7223 |w (DE-627)NLM199274525 |x 2041-1723 |7 nnns |
773 | 1 | 8 | |g volume:14 |g year:2023 |g number:1 |g day:09 |g month:11 |g pages:7223 |
856 | 4 | 0 | |u http://dx.doi.org/10.1038/s41467-023-43017-4 |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 14 |j 2023 |e 1 |b 09 |c 11 |h 7223 |