Robust data storage in DNA by de Bruijn graph-based de novo strand assembly
© 2022. The Author(s).
DNA data storage is a rapidly developing technology with great potential due to its high density, long-term durability, and low maintenance cost. The major technical challenges include various errors, such as strand breaks, rearrangements, and indels that frequently arise during DNA synthesis, amplification, sequencing, and preservation. In this study, a de novo strand assembly algorithm (DBGPS) is developed using de Bruijn graph and greedy path search to meet these challenges. DBGPS shows substantial advantages in handling DNA breaks, rearrangements, and indels. The robustness of DBGPS is demonstrated by accelerated aging, multiple independent data retrievals, deep error-prone PCR, and large-scale simulations. Remarkably, 6.8 MB of data is accurately recovered from a severely corrupted sample that has been treated at 70 °C for 70 days. With DBGPS, we are able to achieve a logical density of 1.30 bits/cycle and a physical density of 295 PB/g.
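The abstract names the two ingredients of DBGPS, a de Bruijn graph and a greedy path search, but this record does not describe how they are combined. The following Python sketch is only an illustration of the general idea under stated assumptions, not the authors' implementation: k-mers are counted across all reads (so fragments of broken strands still contribute), low-count k-mers are treated as likely errors, and a strand is rebuilt by repeatedly following the best-supported next k-mer. The function names, the `seed` start k-mer, the known target `length`, and the `min_count` threshold are all hypothetical choices made for this example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in the reads; the counts implicitly define a de Bruijn graph."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def greedy_assemble(counts, seed, length, min_count=2):
    """Extend `seed` one base at a time along the best-supported k-mer.

    Assumptions (not taken from the paper): the strand start `seed` and the
    strand `length` are known, and k-mers seen fewer than `min_count` times
    are treated as errors. Returns None if the path dead-ends.
    """
    k = len(seed)
    strand = seed
    while len(strand) < length:
        suffix = strand[-(k - 1):]
        # Candidate next bases, scored by the count of the resulting k-mer.
        candidates = [
            (counts[suffix + base], base)
            for base in "ACGT"
            if counts[suffix + base] >= min_count
        ]
        if not candidates:
            return None
        strand += max(candidates)[1]
    return strand


if __name__ == "__main__":
    original = "ACGTTGCAAGGCTA"
    # Two overlapping fragments (simulating strand breaks), each seen twice,
    # plus one read carrying a substitution error.
    reads = [original[:9], original[:9], original[4:], original[4:], "ACGTTGCTAG"]
    counts = count_kmers(reads, k=5)
    print(greedy_assemble(counts, seed="ACGTT", length=len(original)))
```

In this toy input no single read covers the whole strand, yet the overlapping fragments are enough to walk the graph from the seed to the full length, and the erroneous read is ignored because its unique k-mers fall below the count threshold.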
Media type: | E-article |
---|---|
Year of publication: | 2022 |
Published: | 2022 |
Contained in: | Nature communications, 13 (2022), no. 1, 12 Sept., page 5361 |
Language: | English |
Persons involved: | Song, Lifu [author] |
Notes: | Date Completed 14.09.2022; Date Revised 02.11.2022; published: Electronic; figshare: 10.6084/m9.figshare.17193170.v2, 10.6084/m9.figshare.17192639.v1, 10.6084/m9.figshare.18515078.v1, 10.6084/m9.figshare.16727122.v2, 10.6084/m9.figshare.17193128.v1, 10.6084/m9.figshare.18515045.v1, 10.6084/m9.figshare.17183081.v1; Citation Status MEDLINE |
doi: | 10.1038/s41467-022-33046-w |
PPN (catalog ID): | NLM346139376 |
LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | NLM346139376 | ||
003 | DE-627 | ||
005 | 20231226030716.0 | ||
007 | cr uuu---uuuuu | ||
008 | 231226s2022 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1038/s41467-022-33046-w |2 doi | |
028 | 5 | 2 | |a pubmed24n1153.xml |
035 | |a (DE-627)NLM346139376 | ||
035 | |a (NLM)36097016 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Song, Lifu |e verfasserin |4 aut | |
245 | 1 | 0 | |a Robust data storage in DNA by de Bruijn graph-based de novo strand assembly |
264 | 1 | |c 2022 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a Date Completed 14.09.2022 | ||
500 | |a Date Revised 02.11.2022 | ||
500 | |a published: Electronic | ||
500 | |a figshare: 10.6084/m9.figshare.17193170.v2, 10.6084/m9.figshare.17192639.v1, 10.6084/m9.figshare.18515078.v1, 10.6084/m9.figshare.16727122.v2, 10.6084/m9.figshare.17193128.v1, 10.6084/m9.figshare.18515045.v1, 10.6084/m9.figshare.17183081.v1 | ||
500 | |a Citation Status MEDLINE | ||
520 | |a © 2022. The Author(s). | ||
520 | |a DNA data storage is a rapidly developing technology with great potential due to its high density, long-term durability, and low maintenance cost. The major technical challenges include various errors, such as strand breaks, rearrangements, and indels that frequently arise during DNA synthesis, amplification, sequencing, and preservation. In this study, a de novo strand assembly algorithm (DBGPS) is developed using de Bruijn graph and greedy path search to meet these challenges. DBGPS shows substantial advantages in handling DNA breaks, rearrangements, and indels. The robustness of DBGPS is demonstrated by accelerated aging, multiple independent data retrievals, deep error-prone PCR, and large-scale simulations. Remarkably, 6.8 MB of data is accurately recovered from a severely corrupted sample that has been treated at 70 °C for 70 days. With DBGPS, we are able to achieve a logical density of 1.30 bits/cycle and a physical density of 295 PB/g | ||
650 | 4 | |a Journal Article | |
650 | 4 | |a Research Support, Non-U.S. Gov't | |
650 | 7 | |a DNA |2 NLM | |
650 | 7 | |a 9007-49-2 |2 NLM | |
700 | 1 | |a Geng, Feng |e verfasserin |4 aut | |
700 | 1 | |a Gong, Zi-Yi |e verfasserin |4 aut | |
700 | 1 | |a Chen, Xin |e verfasserin |4 aut | |
700 | 1 | |a Tang, Jijun |e verfasserin |4 aut | |
700 | 1 | |a Gong, Chunye |e verfasserin |4 aut | |
700 | 1 | |a Zhou, Libang |e verfasserin |4 aut | |
700 | 1 | |a Xia, Rui |e verfasserin |4 aut | |
700 | 1 | |a Han, Ming-Zhe |e verfasserin |4 aut | |
700 | 1 | |a Xu, Jing-Yi |e verfasserin |4 aut | |
700 | 1 | |a Li, Bing-Zhi |e verfasserin |4 aut | |
700 | 1 | |a Yuan, Ying-Jin |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t Nature communications |d 2010 |g 13(2022), 1 vom: 12. Sept., Seite 5361 |w (DE-627)NLM199274525 |x 2041-1723 |7 nnns |
773 | 1 | 8 | |g volume:13 |g year:2022 |g number:1 |g day:12 |g month:09 |g pages:5361 |
856 | 4 | 0 | |u http://dx.doi.org/10.1038/s41467-022-33046-w |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 13 |j 2022 |e 1 |b 12 |c 09 |h 5361 |