#Bioinformatics treatment of bacterial #RNAseq data

The students of the Sequence a genome class have been making progress in their analysis of bacterial RNA-seq data (see the design in the previous post). Here we present the basic analysis, which is common to all the biological questions that can then be studied using these data. For this, the students first had to venture into Unix (using the Vital-IT cluster), and then into R.

First, quality control using FastQC. Take-home message: super good quality, we keep everything:


FastQC plot of one of our RNA-seq samples. They all look like this.
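
To give an idea of what sits behind a per-base quality plot, here is a minimal Python sketch that averages Phred scores at each read position. It is not what FastQC does internally, just the same basic idea; the function name `mean_quality_per_position`, the Phred+33 encoding and the toy reads are assumptions made for illustration.

```python
import io


def mean_quality_per_position(fastq_handle, offset=33):
    """Return the mean Phred score at each read position.

    Assumes a plain 4-line-per-record FASTQ file with Phred+33 qualities.
    """
    totals = []  # sum of quality scores seen at each position
    counts = []  # number of reads that cover each position
    for line_number, line in enumerate(fastq_handle):
        if line_number % 4 != 3:  # the quality string is the 4th line of a record
            continue
        for pos, char in enumerate(line.rstrip("\n")):
            if pos == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]


# Toy input (made up): two reads of length 4; "I" encodes Q40 and "5" encodes Q20.
toy_fastq = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGT\n+\nII55\n"
)
per_position = mean_quality_per_position(toy_fastq)
print(per_position)
```

On real data one would pass an open file handle instead of the toy `io.StringIO` object, and a flat, high profile would correspond to the "we keep everything" conclusion.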

Second, mapping with Bowtie, which is largely sufficient since bacteria don’t have introns (so no complex reconstruction of transcripts is needed). A lot of annoying time and explanations were spent on horrid formats and format conversions. But then we get this nice mapping, which we can visualize in IGV:


Reads mapping to a small portion of the chromosome which we sequenced and annotated in the previous semester, visualized with IGV.
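
As an illustration of one of those formats, here is a small Python sketch that reads SAM-formatted alignment lines and reports how many reads mapped. It relies only on the FLAG column as defined in the SAM specification (bit 0x4 means unmapped); the function name `mapping_summary` and the toy alignment lines are made up for the example, and this is not part of the class pipeline.

```python
def mapping_summary(sam_lines):
    """Count mapped and total reads from SAM text lines.

    Header lines (starting with "@") are skipped, and secondary or
    supplementary alignments are ignored so each read is counted once.
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is the bitwise FLAG
        if flag & 0x100 or flag & 0x800:  # secondary or supplementary alignment
            continue
        total += 1
        if not flag & 0x4:  # bit 0x4 set means the read is unmapped
            mapped += 1
    return mapped, total


# Toy SAM content (made up): one header line, two mapped reads, one unmapped read.
toy_sam = [
    "@SQ\tSN:chr\tLN:1000",
    "r1\t0\tchr\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr\t200\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
mapped, total = mapping_summary(toy_sam)
print(f"{mapped} of {total} reads mapped")
```

For real files, which are usually stored as compressed BAM, a dedicated tool or library would be the sensible choice; the sketch only shows how little is hidden in the text format once the columns are known.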

Third, counting reads with HTSeq; again, since we have no issues with splicing, simple counting works. This is what the counts look like in a rapid PCA: the samples group by condition, which is a good sanity check. There is signal!


PCA of read counts
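
To make "simple counting" concrete, here is a rough Python sketch of the idea: a read is assigned to a gene if it overlaps exactly one gene, and is otherwise set aside. This is only loosely modelled on the kind of counting HTSeq performs, not a reimplementation of it. The gene names and coordinates are invented, strand is ignored, and a single chromosome with 1-based inclusive coordinates is assumed.

```python
from collections import Counter


def count_reads(genes, reads):
    """Count reads per gene by simple interval overlap.

    genes: dict mapping gene name -> (start, end), inclusive coordinates
    reads: iterable of (start, end) alignment intervals on the same chromosome
    """
    counts = Counter({name: 0 for name in genes})
    for read_start, read_end in reads:
        # Two inclusive intervals overlap when each starts before the other ends.
        hits = [
            name
            for name, (gene_start, gene_end) in genes.items()
            if read_start <= gene_end and gene_start <= read_end
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["__no_feature"] += 1  # read falls outside every gene
        else:
            counts["__ambiguous"] += 1  # read overlaps several genes
    return counts


# Toy annotation and alignments (made up); geneA and geneB overlap slightly,
# as neighbouring bacterial genes sometimes do.
genes = {"geneA": (100, 500), "geneB": (450, 900)}
reads = [(120, 170), (300, 350), (460, 510), (600, 650), (950, 1000)]
counts = count_reads(genes, reads)
for name, value in counts.items():
    print(name, value)
```

With no splicing to worry about, each read is one contiguous interval, which is why this kind of overlap test is enough for bacteria. The linear scan over all genes is fine for a toy example but would be replaced by an interval index on a real genome.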

Finally, the students have investigated differential expression between conditions using edgeR on the counts. And lo and behold, there are differences:

Differential expression between pairs of conditions, with genes significant at FDR<0.05 highlighted in red.
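
For readers wondering what "significant at FDR<0.05" involves, here is a short Python sketch of the Benjamini-Hochberg adjustment, which is a standard way of turning per-gene p-values into FDR values. It does not reproduce the statistical test itself, only the multiple-testing correction applied to its p-values, and the p-values below are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values (made up) for four hypothetical genes.
p_values = [0.001, 0.04, 0.03, 0.5]
fdr = benjamini_hochberg(p_values)
significant = [value < 0.05 for value in fdr]
print(fdr)
print(significant)
```

In this toy case, two genes with raw p-values below 0.05 no longer pass after adjustment, which shows why the red points in the plots are defined by FDR rather than by raw p-value.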


Many thanks to the students who provided the figures used here, from their work.

