SPLASH: a statistical, reference-free genomic algorithm unifies biological discovery
Summary: Today’s genomics workflows typically require alignment to a reference sequence, which limits discovery. We introduce a new unifying paradigm, SPLASH (Statistically Primary aLignment Agnostic Sequence Homing), an approach that directly analyzes raw sequencing data to detect a signature of regulation: sample-specific sequence variation. The approach, which includes a new statistical test, is computationally efficient and can be run at scale. SPLASH unifies detection of myriad forms of sequence variation. We demonstrate that SPLASH identifies complex mutation patterns in SARS-CoV-2 strains, discovers regulated RNA isoforms at the single cell level, documents the vast sequence diversity of adaptive immune receptors, and uncovers biology in non-model organisms undocumented in their reference genomes: geographic and seasonal variation and diatom association in eelgrass, an oceanic plant impacted by climate change, and tissue-specific transcripts in octopus. SPLASH is a new unifying approach to genomic analysis that enables an expansive scope of discovery without metadata or references.

One-sentence summary: SPLASH is a unifying, statistically driven approach to biological discovery from raw sequencing data, bypassing alignment.
| Media type: | Preprint |
|---|---|
| Year of publication: | 2024 |
| Published: | 2024 |
| Contained in: | bioRxiv.org (2024), 23 Apr. - year:2024 |
| Language: | English |
| Contributors: | Chaung, Kaitlin [author] |
| doi: | 10.1101/2022.06.24.497555 |
| PPN (catalog ID): | XBI036385174 |

LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | XBI036385174 | ||
003 | DE-627 | ||
005 | 20240424105223.0 | ||
007 | cr uuu---uuuuu | ||
008 | 220629s2024 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1101/2022.06.24.497555 |2 doi | |
035 | |a (DE-627)XBI036385174 | ||
035 | |a (biorXiv)10.1101/2022.06.24.497555 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Chaung, Kaitlin |e verfasserin |0 (orcid)0000-0002-0397-1430 |4 aut | |
245 | 1 | 0 | |a SPLASH: a statistical, reference-free genomic algorithm unifies biological discovery |
264 | 1 | |c 2024 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
520 | |a Summary Today’s genomics workflows typically require alignment to a reference sequence, which limits discovery. We introduce a new unifying paradigm, SPLASH (Statistically Primary aLignment Agnostic Sequence Homing), an approach that directly analyzes raw sequencing data to detect a signature of regulation: sample-specific sequence variation. The approach, which includes a new statistical test, is computationally efficient and can be run at scale. SPLASH unifies detection of myriad forms of sequence variation. We demonstrate that SPLASH identifies complex mutation patterns in SARS-CoV-2 strains, discovers regulated RNA isoforms at the single cell level, documents the vast sequence diversity of adaptive immune receptors, and uncovers biology in non-model organisms undocumented in their reference genomes: geographic and seasonal variation and diatom association in eelgrass, an oceanic plant impacted by climate change, and tissue-specific transcripts in octopus. SPLASH is a new unifying approach to genomic analysis that enables an expansive scope of discovery without metadata or references. One-sentence summary SPLASH is a unifying, statistically driven approach to biological discovery from raw sequencing data, bypassing alignment. | ||
650 | 4 | |a Biology |7 (dpeaa)DE-84 | |
650 | 4 | |a 570 |7 (dpeaa)DE-84 | |
700 | 1 | |a Baharav, Tavor Z. |e verfasserin |0 (orcid)0000-0001-8924-0243 |4 aut | |
700 | 1 | |a Henderson, George |e verfasserin |4 aut | |
700 | 1 | |a Zheludev, Ivan N. |e verfasserin |0 (orcid)0000-0002-9572-0574 |4 aut | |
700 | 1 | |a Wang, Peter L. |e verfasserin |4 aut | |
700 | 1 | |a Salzman, Julia |e verfasserin |0 (orcid)0000-0001-7630-3436 |4 aut | |
773 | 0 | 8 | |i Enthalten in |t bioRxiv.org |g (2024) vom: 23. Apr. |
773 | 1 | 8 | |g year:2024 |g day:23 |g month:04 |
856 | 4 | 0 | |u https://doi.org/10.1016/j.cell.2023.10.028 |x 0 |z lizenzpflichtig |3 Volltext |
856 | 4 | 0 | |u http://dx.doi.org/10.1101/2022.06.24.497555 |x 0 |z kostenfrei |3 Volltext |
912 | |a GBV_XBI | ||
951 | |a AR | ||
952 | |j 2024 |b 23 |c 04 |