References and online notes will be posted here or on the Notes page.
New postings will be indicated on the Notices page.
A starting point is the book "Computational Biology: Unix/Linux Data Processing and Programming", available electronically through the Northwestern Library.
This course will give an introduction to the theory and practice of analyzing high-throughput RNA sequencing data. Through lectures and hands-on exercises, it will cover:
Module 1: Files and software tools
This module will discuss the formats of the various files arising in high-throughput sequencing, and the Unix commands and other software tools most useful for downloading data and doing the analysis.
Module 2: Aligning and counting reads
This module will cover some of the theory behind aligning reads to reference genomes and behind pseudo-alignment. It will also cover viewing reads, assessing read quality, and the expected probability distribution for read counts.
Module 3: Normalizing and transforming read counts
This module will cover how to normalize counts for differences in sequencing depth, the difference between counts and TPM ("transcripts per million"), log2 transformation, variance reduction, and principal components analysis.
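As a rough illustration of the difference between counts and TPM, the sketch below converts raw counts for a single sample to TPM and then applies a log2 transformation. The function names, the toy counts and transcript lengths, and the pseudocount of 1 are all assumptions made for this example; they are not taken from the course materials.

```python
import math

def counts_to_tpm(counts, lengths_kb):
    """Convert raw read counts to TPM for one sample (hypothetical helper)."""
    # Reads per kilobase: longer transcripts collect more reads,
    # so divide each count by the transcript length in kilobases.
    rpk = [c / length for c, length in zip(counts, lengths_kb)]
    total = sum(rpk)
    # Rescale so the values sum to one million, which makes samples
    # with different sequencing depths comparable.
    return [r / total * 1e6 for r in rpk]

def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount (assumed 1) avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]

# Toy data: three transcripts whose counts are proportional to their lengths.
counts = [100, 200, 300]
lengths_kb = [1.0, 2.0, 3.0]

tpm = counts_to_tpm(counts, lengths_kb)
log_tpm = log2_transform(tpm)
```

In this toy sample the raw counts differ threefold, but all three transcripts end up with the same TPM, because the extra reads are explained entirely by transcript length.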
Module 4: Differential expression analysis
This module will cover how to use DESeq2 to identify transcripts that are differentially expressed between two conditions, the theory behind the analysis, and how to visualize the results.