Robust retrieval of data stored in DNA by de Bruijn graph-based <i>de novo</i> strand assembly
Abstract: DNA data storage is a rapidly developing technology with great potential due to its high density, long-term durability, and low maintenance cost. The major technical challenges include various errors, such as strand breaks, rearrangements, and indels, that frequently arise during DNA synthesis, amplification, sequencing, and preservation. Through a de novo assembly strategy, we developed an algorithm based on the de Bruijn graph and greedy path search (DBGPS) to address these issues. DBGPS shows distinct advantages in handling DNA breaks, rearrangements, and indels. The robustness of DBGPS is demonstrated by accelerated aging, multiple independent data retrievals, deep error-prone PCR, and large data scale simulations. Remarkably, 6.8 MB of data can be retrieved accurately from a seriously corrupted sample that has been treated at 70 °C for 70 days. With DBGPS, we were able to achieve a logical density of 1.30 bits/cycle and a physical density of 295 PB/g.
One-Sentence Summary: A de Bruijn graph-based de novo assembly algorithm for DNA data storage enables fast and robust data readouts even with DNA samples that have been severely corrupted.
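The abstract names two building blocks, a de Bruijn graph and a greedy path search, without giving details. The following is a minimal illustrative sketch of that general idea and not the authors' DBGPS implementation: the function names, the `min_count` noise filter, the choice of k = 5, and the assumption that each strand starts with a known seed are all hypothetical and introduced only for this example.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer in the reads; each k-mer is an edge of the de Bruijn graph."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def greedy_assemble(counts, seed, length, k, min_count=2):
    """Extend `seed` one base at a time, following the most frequent k-mer.

    Assumes len(seed) >= k - 1. K-mers seen fewer than `min_count` times are
    treated as noise (hypothetical threshold). Ties are broken by base letter.
    Returns None if no supported extension exists before `length` is reached.
    """
    strand = seed
    while len(strand) < length:
        prefix = strand[-(k - 1):]
        candidates = [(counts[prefix + base], base)
                      for base in "ACGT"
                      if counts[prefix + base] >= min_count]
        if not candidates:
            return None
        strand += max(candidates)[1]
    return strand

# Toy data: a strand that survives only as overlapping fragments
# (simulated breaks) plus one fragment carrying a substitution error.
strand = "ACGTTGCAAGCTTAGGCATCGA"
reads = [
    strand[0:12], strand[0:12],
    strand[4:16], strand[4:16],
    strand[8:22], strand[8:22],
    "ACGTTGTAAGCT",  # strand[0:12] with C -> T at position 6
]

counts = count_kmers(reads, k=5)
recovered = greedy_assemble(counts, seed=strand[:4], length=len(strand), k=5)
print(recovered == strand)
```

In this toy case no single read covers the whole strand, yet the path through the k-mer graph spans it, which illustrates why an assembly-based readout can tolerate strand breaks. The k-mers introduced by the substitution occur only once and are dropped by the count filter.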
Media type: |
Preprint |
---|
Year of publication: |
2022 |
---|---|
Published: |
2022 |
Contained in: |
bioRxiv.org - (2022), 2 Apr. To the parent record - year:2022 |
---|
Language: |
English |
---|
Contributors: |
Song, Lifu [author] |
---|
Links: |
Full text [free of charge] |
---|
doi: |
10.1101/2020.12.20.423642 |
---|
funding: |
|
---|---|
Funding institution / Project title: |
|
PPN (catalog ID): |
XBI019587724 |
---|
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | XBI019587724 | ||
003 | DE-627 | ||
005 | 20230429085723.0 | ||
007 | cr uuu---uuuuu | ||
008 | 201229s2022 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1101/2020.12.20.423642 |2 doi | |
035 | |a (DE-627)XBI019587724 | ||
035 | |a (biorXiv)10.1101/2020.12.20.423642 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
082 | 0 | |a 570 |q DE-84 | |
100 | 1 | |a Song, Lifu |e verfasserin |4 aut | |
245 | 1 | 0 | |a Robust retrieval of data stored in DNA by de Bruijn graph-based <i>de novo</i> strand assembly |
264 | 1 | |c 2022 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
520 | |a Abstract DNA data storage is a rapidly developing technology with great potential due to its high density, long-term durability, and low maintenance cost. The major technical challenges include various errors, such as the strand breaks, rearrangements, and indels that frequently arise during DNA synthesis, amplification, sequencing, and preservation. Through a de novo assembly strategy, we developed an algorithm based on the de Bruijn graph and greedy path search (DBGPS) to address these issues. DBGPS shows distinct advantages in handling DNA breaks, rearrangements, and indels. The robustness of DBGPS is demonstrated by accelerated aging, multiple independent data retrievals, deep error-prone PCR, and large data scale simulations. Remarkably, 6.8 MB of data can be retrieved accurately from a seriously corrupted sample that has been treated at 70 °C for 70 days. With DBGPS, we were able to achieve a logical density of 1.30 bits/cycle and a physical density of 295 PB/g. One-Sentence Summary A de Bruijn graph-based de novo assembly algorithm for DNA data storage enables fast and robust data readouts even with DNA samples that have been severely corrupted. | ||
700 | 1 | |a Geng, Feng |e verfasserin |4 aut | |
700 | 1 | |a Gong, Ziyi |e verfasserin |4 aut | |
700 | 1 | |a Chen, Xin |e verfasserin |4 aut | |
700 | 1 | |a Tang, Jijun |e verfasserin |4 aut | |
700 | 1 | |a Gong, Chunye |e verfasserin |4 aut | |
700 | 1 | |a Zhou, Libang |e verfasserin |4 aut | |
700 | 1 | |a Xia, Rui |e verfasserin |4 aut | |
700 | 1 | |a Han, Mingzhe |e verfasserin |4 aut | |
700 | 1 | |a Xu, Jingyi |e verfasserin |4 aut | |
700 | 1 | |a Li, Bingzhi |e verfasserin |4 aut | |
700 | 1 | |a Yuan, Yingjin |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t bioRxiv.org |g (2022) vom: 02. Apr. |
773 | 1 | 8 | |g year:2022 |g day:02 |g month:04 |
856 | 4 | 0 | |u http://dx.doi.org/10.1101/2020.12.20.423642 |z kostenfrei |3 Volltext |
912 | |a GBV_XBI | ||
912 | |a SSG-OLC-PHA | ||
951 | |a AR | ||
952 | |j 2022 |b 02 |c 04 | ||
953 | |2 045F |a 570 |