To Trim or Not to Trim: Effects of Read Trimming on the De Novo Genome Assembly of a Widespread East Asian Passerine, the Rufous-Capped Babbler (Cyanoderma ruficeps Blyth)

Trimming low-quality bases from sequencing reads is considered a routine procedure for genome assembly; however, we know little about its pros and cons. Here, we used empirical data to examine how read trimming affects assembled genome quality and computational time for a widespread East Asian passerine, the rufous-capped babbler (Cyanoderma ruficeps Blyth). We found that scaffolds assembled from raw reads were always longer than those from trimmed ones, whereas computational times for the former were sometimes much longer than for the latter. Nevertheless, assembly completeness showed little difference among the trimming strategies. One should determine the optimal trimming strategy based on what the assembled genome will be used for. For example, to identify single nucleotide polymorphisms (SNPs) associated with phenotypic evolution, applying PLATANUS to gently trim reads would yield a reference genome with a slightly shorter scaffold length than assembling from raw reads (N50 = 15.64 vs. 16.89 Mb), but would save 75% of the computational time. We also found that chromosomes Z, W, and 4A of the rufous-capped babbler were poorly assembled, likely due to a recently fused neo-sex chromosome. The rufous-capped babbler genome, with long scaffolds and quality gene annotation, can provide a good system to study avian ecological adaptation in East Asia.

Media type:

Electronic article

Year of publication:

2019

Published:

2019

Contained in:

To the parent record - volume: 10

Contained in:

Genes - 10(2019), 10, dated 23 Sept.

Language:

English

Contributors:

Yang, Shang-Fang [author]
Lu, Chia-Wei [author]
Yao, Cheng-Te [author]
Hung, Chih-Ming [author]

Topics:

Computational time
De novo genome assemble
Genome quality
Journal Article
Reading trimming
Research Support, Non-U.S. Gov't
Rufous-capped babbler

Notes:

Date Completed 16.03.2020

Date Revised 16.03.2020

published: Electronic

Citation Status MEDLINE

doi:

10.3390/genes10100737

PPN (catalog ID):

NLM30156440X