Seriola dorsalis v1

Genome Assembly

Program, Pipeline Name or Method Name

MaSuRCA

Program, Pipeline or Method version

2.3.2

Source Name

Seriola dorsalis assembly version 1 in FASTA format

Time Executed

Sept. 9, 2016, midnight

Category

Other

Description

Paired-end and mate-pair reads in FASTQ format were assembled into scaffolds with the MaSuRCA assembler (version 2.3.2) (Zimin et al. 2013). The scaffolds were filtered in two steps to give a more manageable assembly for visualization in JBrowse. First, scaffolds shorter than 800 bases, or with 90% of their length contained in a larger scaffold, were removed. Second, the remaining scaffolds (Sedor_35K.fasta) were kept only if they contained a gene or were longer than 10,000 bases (BioProject ID PRJNA319656). This left 4,717 scaffolds, which were then screened for contamination. The PhiX genome (NCBI Reference Sequence NC_001422.1) was queried with BLAST against the assembly to identify PhiX contamination, and one scaffold (scaffold_26907) was identified and removed. BlobTools (Kumar et al. 2013) identified another 277 scaffolds (contamination277.txt) that appear to be contamination from the phytoplankton Emiliania huxleyi, and these were also removed. The final assembly contains 4,439 scaffolds. Assembly quality was assessed with BUSCO (Simão et al. 2015). The genome contains 2,848 of the 3,023 BUSCO groups searched (94.2%).
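The filtering and decontamination steps described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline that was actually run. It assumes that each scaffold's length, the fraction of it contained in a larger scaffold, and whether it carries a gene have already been computed; the description does not say which tools produced these values. The `Scaffold` fields and all function names are hypothetical. The sketch treats "90% of its length contained" as "at least 90%", which is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Scaffold:
    name: str
    length: int                # scaffold length in bases
    contained_fraction: float  # hypothetical: fraction of this scaffold covered by a larger scaffold
    has_gene: bool             # hypothetical: whether a gene model lies on this scaffold


def length_and_containment_filter(scaffolds: Iterable[Scaffold],
                                  min_length: int = 800,
                                  max_contained: float = 0.9) -> List[Scaffold]:
    """Drop scaffolds shorter than min_length or at least max_contained inside a larger one."""
    return [s for s in scaffolds
            if s.length >= min_length and s.contained_fraction < max_contained]


def gene_or_size_filter(scaffolds: Iterable[Scaffold],
                        min_length: int = 10_000) -> List[Scaffold]:
    """Keep scaffolds that carry a gene or are longer than min_length bases."""
    return [s for s in scaffolds if s.has_gene or s.length > min_length]


def remove_named(scaffolds: Iterable[Scaffold], names: Set[str]) -> List[Scaffold]:
    """Remove scaffolds flagged as contamination (e.g. a PhiX BLAST hit or a BlobTools list)."""
    return [s for s in scaffolds if s.name not in names]


def busco_percent(found: int, total: int) -> float:
    """Share of BUSCO groups found, as a percentage rounded to one decimal place."""
    return round(100 * found / total, 1)


if __name__ == "__main__":
    # Toy scaffolds for illustration only; these are not real Seriola dorsalis data.
    toy = [
        Scaffold("scaffold_1", 50_000, 0.0, False),     # long: kept
        Scaffold("scaffold_2", 700, 0.0, True),         # shorter than 800 bases: removed
        Scaffold("scaffold_3", 5_000, 0.95, True),      # mostly contained: removed
        Scaffold("scaffold_4", 3_000, 0.1, True),       # short but has a gene: kept
        Scaffold("scaffold_5", 3_000, 0.1, False),      # short, no gene: removed
        Scaffold("scaffold_26907", 20_000, 0.0, True),  # passes filters, flagged as contamination
    ]
    kept = gene_or_size_filter(length_and_containment_filter(toy))
    final = remove_named(kept, {"scaffold_26907"})
    print([s.name for s in final])       # ['scaffold_1', 'scaffold_4']
    # Counts reported in the description: 4,717 - 1 (PhiX) - 277 (E. huxleyi) = 4,439
    print(4_717 - 1 - 277)               # 4439
    print(busco_percent(2_848, 3_023))   # 94.2
```

In practice, the contamination names would be read from a list such as contamination277.txt. This sketch assumes that file holds one scaffold name per line, but its actual format is not documented here.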

File

sedor_v1.fasta.gz