The students of the Sequence a genome class have been advancing in their analysis of bacterial RNA-seq (see design in previous post). Here we present the basic analysis, which is common to all the biological questions that can then be studied using these data. For this, the students first needed to venture into Unix (using the Vital-IT cluster), and then into R.
First, quality control using FastQC. Take-home message: super good quality, so we keep everything.
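To give an idea of what "quality" means here, this is a minimal sketch of how per-base quality scores can be decoded from a FASTQ quality line. It assumes the common Phred+33 encoding; the function name `mean_phred` and the toy quality strings are made up for illustration and are not part of FastQC, which reports much more than this.

```python
def mean_phred(quality_line, offset=33):
    """Return the mean Phred score of one FASTQ quality line.

    Assumes Phred+33 encoding: each character encodes the score
    ord(character) - 33.
    """
    scores = [ord(char) - offset for char in quality_line]
    return sum(scores) / len(scores)


# Toy quality strings (illustrative, not taken from the class data).
high_quality = "IIIIIIII"   # 'I' encodes Phred 40
low_quality = "########"    # '#' encodes Phred 2

print(mean_phred(high_quality))
print(mean_phred(low_quality))
```

A Phred score Q corresponds to an error probability of 10^(-Q/10), so a score of 40 means roughly one expected error in 10,000 bases.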
Second, mapping with Bowtie, which is largely sufficient since bacteria don’t have introns (so no complex reconstruction of transcripts). A lot of annoying time and explanations were spent on horrid formats and format conversions. But then we get this nice mapping, which we can visualize in IGV.
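As a taste of those formats, here is a small sketch of reading one alignment line in SAM format, the tab-separated text format that mappers commonly output. The helper `parse_sam_line` and the toy read are hypothetical; in practice one would rather rely on an established tool or library than on hand-written parsing.

```python
def parse_sam_line(line):
    """Extract a few of the 11 mandatory fields from one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    return {
        "read": fields[0],
        "reference": fields[2],
        "position": int(fields[3]),          # 1-based leftmost mapping position
        "mapq": int(fields[4]),
        "cigar": fields[5],
        "mapped": not (flag & 0x4),          # flag bit 0x4: read is unmapped
        "reverse_strand": bool(flag & 0x10),  # flag bit 0x10: reverse strand
    }


# Toy alignment line (made up): a 50-base read mapped to the reverse strand.
toy_line = "\t".join(
    ["read1", "16", "chromosome", "1042", "42", "50M", "*", "0", "0",
     "A" * 50, "I" * 50]
)
parsed = parse_sam_line(toy_line)
print(parsed)
```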
Third, counting reads with HTSeq; again, since we have no issues with splicing, simple counting works. This is what the counts look like in a quick PCA: they group by condition, which is a good sanity check. There is signal!
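To show why counting is simple without splicing, here is a rough sketch of the idea: a read is assigned to a gene if it overlaps that gene and no other. This is an illustration of the principle, not HTSeq's actual implementation; the gene coordinates and reads are invented, and the `__no_feature` and `__ambiguous` labels simply mirror the special counters that htseq-count reports.

```python
from collections import Counter


def count_reads(reads, genes):
    """Count reads per gene by simple interval overlap.

    reads: list of (start, end) tuples, inclusive coordinates.
    genes: dict mapping gene name to (start, end), inclusive coordinates.
    """
    counts = Counter({name: 0 for name in genes})
    for read_start, read_end in reads:
        hits = [
            name
            for name, (gene_start, gene_end) in genes.items()
            if read_start <= gene_end and gene_start <= read_end
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["__no_feature"] += 1   # read overlaps no gene
        else:
            counts["__ambiguous"] += 1    # read overlaps several genes
    return counts


# Toy annotation and reads (made up for illustration).
toy_genes = {"geneA": (100, 500), "geneB": (450, 900)}
toy_reads = [(120, 170), (300, 350), (470, 520), (600, 650), (1000, 1050)]
toy_counts = count_reads(toy_reads, toy_genes)
print(dict(toy_counts))
```

With spliced reads, a single read can span several exons separated by long gaps, which is what makes counting harder in eukaryotes.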
Finally, the students investigated differential expression between conditions using edgeR on the counts. And lo and behold, there are differences.
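For intuition about what "differences" means at the level of counts, here is a deliberately naive sketch: normalize each library to counts per million (CPM), then compute a log2 fold change per gene. This is not what edgeR does (edgeR fits a negative binomial model across replicates and tests for significance); the counts, gene names, and the pseudocount of 1 are all assumptions made for the example.

```python
import math


def cpm(counts):
    """Scale raw counts to counts per million mapped reads."""
    total = sum(counts.values())
    return {gene: count / total * 1e6 for gene, count in counts.items()}


def log2_fold_change(counts_a, counts_b, pseudocount=1.0):
    """Per-gene log2 ratio of CPM in condition B over condition A.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    cpm_a, cpm_b = cpm(counts_a), cpm(counts_b)
    return {
        gene: math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in cpm_a
    }


# Toy counts for one sample per condition (made up for illustration).
control = {"geneA": 100, "geneB": 100, "geneC": 800}
treated = {"geneA": 400, "geneB": 100, "geneC": 500}
lfc = log2_fold_change(control, treated)
print(lfc)
```

A fold change alone says nothing about significance, which is why replicates and a proper statistical model are needed in the real analysis.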
Many thanks to the students who provided the figures used here, from their work.