Restarting this blog after a pause due to other duties, with extra motivation from the acceptance of our first paper…
This autumn, our students worked hard to make their millions of reads into assembled genomes.
The students explored combinations of different strains, quality score and read length thresholds for quality control, assembly software, and k-mer lengths for the assembly:
See the big dip on the right? That is the base quality dropping toward the end of the reads. We then trimmed the reads with fastq-mcf, using a quality threshold of 20 or 30 and a minimum read length after trimming of 150, 200 or 250 nucleotides. After trimming, we obtain the following:
Assembling with these different parameters gives a wide range of assemblies, whose N50 ranges from 19’271 bp to 148’738 bp and whose total length ranges from 1.03 Mb to 6.17 Mb for one strain. We chose the best assemblies based on N50, total length and number of contigs > 1 kb.
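For readers new to these metrics, here is a minimal sketch of how N50, total length and the number of contigs above a size cutoff can be computed from a list of contig lengths. The function name `assembly_stats` and the toy lengths are made up for illustration. The sketch computes N50 over every length it is given, whereas assembly-evaluation tools may first drop short contigs, so its numbers need not match theirs exactly.

```python
def assembly_stats(contig_lengths, min_contig=1000):
    """Summarise an assembly from its contig lengths (in bp).

    N50 is the length of the contig at which the running sum of
    lengths, taken from longest to shortest, first reaches half of
    the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)

    n50 = 0
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:  # reached at least half of the total
            n50 = length
            break

    return {
        "n50": n50,
        "total_length": total,
        # contigs strictly longer than the cutoff (1 kb by default)
        "n_contigs_over_min": sum(1 for length in lengths if length > min_contig),
    }


# Toy example with six hypothetical contigs
stats = assembly_stats([5000, 3000, 2000, 1500, 800, 700])
print(stats)
```

In the toy example the total is 13,000 bp, and the two longest contigs together (8,000 bp) are the first to cover half of it, so N50 is 3,000 bp.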
We kept the following assemblies:
|Strain||Contigs > 1 kb||N50 (bp)||Total length (bp)||Assembly parameters (min. read length, assembler, k-mer)|
|705||92||145562||6142164||250nt Spades 79|
|705||93||148738||6149199||150nt Spades 91|
|705||101||108208||6160442||150nt Edena 79|
|705||116||108277||6144546||150nt Velvet 79|
|743||50||313482||7399154||200nt Spades 81|
|743||102||128710||7249125||150nt Edena 75|
|743||100||121247||7233116||150nt Velvet 75|
|757||82||185576||6218735||250nt Spades 87|
|757||90||159912||6144964||200nt Spades 73|
|757||98||118871||6162918||150nt Edena 83|
|757||110||113990||6146891||150nt Velvet 91|
The first row for each bacterial strain is the assembly we consider the best.
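The post does not say how N50, total length and contig count were weighed against each other. As an illustration only, the sketch below applies one simple rule that is an assumption, not the stated selection procedure: per strain, take the assembly with the fewest contigs > 1 kb, breaking ties by higher N50. The rows are copied from the table above, and `pick_fewest_contigs` is a hypothetical helper name.

```python
# (strain, contigs > 1 kb, N50 in bp, total length in bp, parameters)
ASSEMBLIES = [
    ("705", 92, 145562, 6142164, "250nt Spades 79"),
    ("705", 93, 148738, 6149199, "150nt Spades 91"),
    ("705", 101, 108208, 6160442, "150nt Edena 79"),
    ("705", 116, 108277, 6144546, "150nt Velvet 79"),
    ("743", 50, 313482, 7399154, "200nt Spades 81"),
    ("743", 102, 128710, 7249125, "150nt Edena 75"),
    ("743", 100, 121247, 7233116, "150nt Velvet 75"),
    ("757", 82, 185576, 6218735, "250nt Spades 87"),
    ("757", 90, 159912, 6144964, "200nt Spades 73"),
    ("757", 98, 118871, 6162918, "150nt Edena 83"),
    ("757", 110, 113990, 6146891, "150nt Velvet 91"),
]


def pick_fewest_contigs(rows):
    """Per strain, keep the row with the fewest contigs > 1 kb.

    Ties are broken by the higher N50. This rule is an assumption
    made for illustration, not the procedure used in the post.
    """
    best = {}
    for row in rows:
        strain, n_contigs, n50 = row[0], row[1], row[2]
        key = (n_contigs, -n50)  # smaller tuple means a better row
        if strain not in best or key < (best[strain][1], -best[strain][2]):
            best[strain] = row
    return best


for strain, row in sorted(pick_fewest_contigs(ASSEMBLIES).items()):
    print(strain, row[4])
```

On these eleven rows, the rule happens to select the first row listed for each strain. For strain 705 that row does not have the highest N50, so N50 alone would not reproduce the choice.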