CHOP: haplotype-aware path indexing in population graphs
The practical use of graph-based reference genomes depends on the ability to align reads to them. Performing substring queries to paths through these graphs lies at the core of this task. The combination of increasing pattern length and encoded variations inevitably leads to a combinatorial explosion of the search space. Instead of heuristic filtering or pruning steps to reduce the complexity, we propose CHOP, a method that constrains the search space by exploiting haplotype information, bounding the search space to the number of haplotypes so that a combinatorial explosion is prevented. We show that CHOP can be applied to large and complex datasets, by applying it on a graph-based representation of the human genome encoding all 80 million variants reported by the 1000 Genomes Project.
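The bounding argument in the abstract can be illustrated with a toy example. The sketch below is not CHOP's implementation or data structures. It is a minimal illustration under stated assumptions: a hypothetical graph with three biallelic sites (`SITES`) separated by shared segments (`ANCHORS`), and three observed haplotypes (`HAPLOTYPES`). All names and sequences are invented for illustration. It compares the k-mers obtained by following every path through the graph with those obtained by following only the observed haplotypes.

```python
from itertools import product

# Hypothetical toy graph (assumption, not from the paper): shared segments
# separated by three biallelic variant sites.
ANCHORS = ["AC", "GT", "CA", "TG"]
SITES = [("A", "G"), ("C", "T"), ("A", "T")]

# Hypothetical observed haplotypes: the allele index chosen at each site.
HAPLOTYPES = [(0, 0, 0), (1, 1, 0), (1, 1, 1)]


def spell(choice):
    """Spell the sequence of one path, given an allele choice per site."""
    parts = [ANCHORS[0]]
    for site, allele, anchor in zip(SITES, choice, ANCHORS[1:]):
        parts.append(site[allele])
        parts.append(anchor)
    return "".join(parts)


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def index(choices, k):
    """Collect the k-mers spelled by the given set of paths."""
    out = set()
    for choice in choices:
        out |= kmers(spell(choice), k)
    return out


# Every combination of alleles: grows as 2**(number of sites).
all_paths = list(product(*(range(len(site)) for site in SITES)))

k = 7
unconstrained = index(all_paths, k)
constrained = index(HAPLOTYPES, k)

print(len(all_paths), "paths without constraint;", len(HAPLOTYPES), "haplotype paths")
print(len(unconstrained), "k-mers without constraint;", len(constrained), "with constraint")
```

With this toy input, the unconstrained enumeration visits one path per allele combination, so it doubles with every added biallelic site, whereas the haplotype-constrained enumeration visits one path per haplotype regardless of how many sites the graph encodes. The constrained set also excludes k-mers such as `AGTTCAA`, which combine alleles that no observed haplotype carries together.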
Media type: E-article
Year of publication: 2020
Published: 2020
Contained in: Genome biology - 21(2020), no. 1, 11 March, page 65 (volume 21)
Language: English
Contributors: Mokveld, Tom [author]
Subjects: Graph-based reference genomes
Notes: Date Completed 19.02.2021; Date Revised 13.11.2023; published: Electronic; Citation Status MEDLINE
doi: 10.1186/s13059-020-01963-y
PPN (catalog ID): NLM307491188
LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | NLM307491188 | ||
003 | DE-627 | ||
005 | 20231225125657.0 | ||
007 | cr uuu---uuuuu | ||
008 | 231225s2020 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1186/s13059-020-01963-y |2 doi | |
028 | 5 | 2 | |a pubmed24n1024.xml |
035 | |a (DE-627)NLM307491188 | ||
035 | |a (NLM)32160922 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Mokveld, Tom |e verfasserin |4 aut | |
245 | 1 | 0 | |a CHOP |b haplotype-aware path indexing in population graphs |
264 | 1 | |c 2020 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a Date Completed 19.02.2021 | ||
500 | |a Date Revised 13.11.2023 | ||
500 | |a published: Electronic | ||
500 | |a Citation Status MEDLINE | ||
520 | |a The practical use of graph-based reference genomes depends on the ability to align reads to them. Performing substring queries to paths through these graphs lies at the core of this task. The combination of increasing pattern length and encoded variations inevitably leads to a combinatorial explosion of the search space. Instead of heuristic filtering or pruning steps to reduce the complexity, we propose CHOP, a method that constrains the search space by exploiting haplotype information, bounding the search space to the number of haplotypes so that a combinatorial explosion is prevented. We show that CHOP can be applied to large and complex datasets, by applying it on a graph-based representation of the human genome encoding all 80 million variants reported by the 1000 Genomes Project | ||
650 | 4 | |a Journal Article | |
650 | 4 | |a Research Support, Non-U.S. Gov't | |
650 | 4 | |a Graph-based reference genomes | |
650 | 4 | |a Haplotype-aware graph indexes | |
650 | 4 | |a Read alignment | |
700 | 1 | |a Linthorst, Jasper |e verfasserin |4 aut | |
700 | 1 | |a Al-Ars, Zaid |e verfasserin |4 aut | |
700 | 1 | |a Holstege, Henne |e verfasserin |4 aut | |
700 | 1 | |a Reinders, Marcel |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t Genome biology |d 2000 |g 21(2020), 1 vom: 11. März, Seite 65 |w (DE-627)NLM110197372 |x 1474-760X |7 nnns |
773 | 1 | 8 | |g volume:21 |g year:2020 |g number:1 |g day:11 |g month:03 |g pages:65 |
856 | 4 | 0 | |u http://dx.doi.org/10.1186/s13059-020-01963-y |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 21 |j 2020 |e 1 |b 11 |c 03 |h 65 |