de novo diploid genome assembly using long noisy reads
The high sequencing error rate has impeded the application of long noisy reads for diploid genome assembly. Most existing assemblers failed to generate high-quality phased assemblies using long noisy reads. Here, we present PECAT, a Phased Error Correction and Assembly Tool, for reconstructing diploid genomes from long noisy reads. We design a haplotype-aware error correction method that can retain heterozygote alleles while correcting sequencing errors. We combine a corrected read SNP caller and a raw read SNP caller to further improve the identification of inconsistent overlaps in the string graph. We use a grouping method to assign reads to different haplotype groups. PECAT efficiently assembles diploid genomes using Nanopore R9, PacBio CLR or Nanopore R10 reads only. PECAT generates more contiguous haplotype-specific contigs compared to other assemblers. Especially, PECAT achieves nearly haplotype-resolved assembly on B. taurus (Bison x Simmental) using Nanopore R9 reads and phase block NG50 with 59.4/58.0 Mb for HG002 using Nanopore R10 reads..
Medienart: |
Preprint |
---|
Erscheinungsjahr: |
2024 |
---|---|
Erschienen: |
2024 |
Enthalten in: |
bioRxiv.org - (2024) vom: 11. März Zur Gesamtaufnahme - year:2024 |
---|
Sprache: |
Englisch |
---|
Beteiligte Personen: |
Nie, Fan [VerfasserIn] |
---|
Links: |
Volltext [kostenfrei] |
---|
Themen: |
---|
doi: |
10.1101/2022.09.25.509436 |
---|
funding: |
|
---|---|
Förderinstitution / Projekttitel: |
|
PPN (Katalog-ID): |
XBI037416286 |
---|
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | XBI037416286 | ||
003 | DE-627 | ||
005 | 20240311090707.0 | ||
007 | cr uuu---uuuuu | ||
008 | 220929s2024 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1101/2022.09.25.509436 |2 doi | |
035 | |a (DE-627)XBI037416286 | ||
035 | |a (biorXiv)10.1101/2022.09.25.509436 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Nie, Fan |e verfasserin |4 aut | |
245 | 1 | 0 | |a de novo diploid genome assembly using long noisy reads |
264 | 1 | |c 2024 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
520 | |a The high sequencing error rate has impeded the application of long noisy reads for diploid genome assembly. Most existing assemblers failed to generate high-quality phased assemblies using long noisy reads. Here, we present PECAT, a Phased Error Correction and Assembly Tool, for reconstructing diploid genomes from long noisy reads. We design a haplotype-aware error correction method that can retain heterozygote alleles while correcting sequencing errors. We combine a corrected read SNP caller and a raw read SNP caller to further improve the identification of inconsistent overlaps in the string graph. We use a grouping method to assign reads to different haplotype groups. PECAT efficiently assembles diploid genomes using Nanopore R9, PacBio CLR or Nanopore R10 reads only. PECAT generates more contiguous haplotype-specific contigs compared to other assemblers. Especially, PECAT achieves nearly haplotype-resolved assembly on B. taurus (Bison x Simmental) using Nanopore R9 reads and phase block NG50 with 59.4/58.0 Mb for HG002 using Nanopore R10 reads. | ||
650 | 4 | |a Biology |7 (dpeaa)DE-84 | |
650 | 4 | |a 570 |7 (dpeaa)DE-84 | |
700 | 1 | |a Ni, Peng |4 aut | |
700 | 1 | |a Huang, Neng |4 aut | |
700 | 1 | |a Zhang, Jun |4 aut | |
700 | 1 | |a Wang, Zhenyu |4 aut | |
700 | 1 | |a Xiao, Chuan-Le |4 aut | |
700 | 1 | |a Luo, Feng |4 aut | |
700 | 1 | |a Wang, Jianxin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t bioRxiv.org |g (2024) vom: 11. März |
773 | 1 | 8 | |g year:2024 |g day:11 |g month:03 |
856 | 4 | 0 | |u http://dx.doi.org/10.1101/2022.09.25.509436 |z kostenfrei |3 Volltext |
912 | |a GBV_XBI | ||
951 | |a AR | ||
952 | |j 2024 |b 11 |c 03 |