ESAM 472: An Introduction to the Analysis of RNA Sequencing Data

Fall 2023
 
Instructor: William L. Kath, Tech Room M460, x1-8784, kath@northwestern.edu

Primary class times: Tuesdays and Thursdays 2:00-3:20 p.m. in person in Tech M416

The main course material will be presented during the first part of each class. The remainder of each class will be set aside for hands-on sessions in which participants apply what was covered in the lecture. In addition, alternate times for hands-on assistance each week will be scheduled at the beginning of the quarter.

Office hours: TBA.

Textbook: There is no required textbook. Lecture notes and other online references will be posted on Canvas.

Assignments: The grade in the course will be assessed through computational projects. There will be two categories: a primary, required set of projects and a more advanced, optional set of projects.

Prerequisites: While there are no specific computational, mathematical or statistical prerequisites, the assignments will require working with the command line on Quest and adapting or writing computational scripts to download and analyze sequencing data. The early parts of the course will include material aimed at making sure everyone is up to speed with the computational tools.

Course Description and Goals:

This course will give an introduction to the theory and practice of analyzing high-throughput RNA sequencing data. "Practice" here means that by the end of the course you will be able to analyze your own RNA sequencing data: download the data from a repository or sequencing core, perform quality checks, align the reads to a reference genome (or, alternatively, generate pseudocounts), do a differential expression analysis (and other types of analysis), and investigate possible causes when one or more of the steps do not go as expected. The course will also cover some of the "theory" of these steps, i.e., we will discuss the mathematical and statistical assumptions made in order to perform them. Understanding how the various algorithms work is important both for debugging problems that occur during the analysis and for developing new types of analysis that may be necessary when novel data sets are encountered.

Here is a brief list of topics to be covered:

  1. Raw sequencing data formats and where to find them
  2. Working effectively with gigabytes of sequencing data
  3. Aligning reads to a reference genome vs. pseudocounting
  4. The format of aligned SAM/BAM files and how to work with them
  5. Ways to perform read-based gene quantification
  6. Debugging your workflow when things go wrong
  7. Differential expression analysis: the theory and how to do it
  8. How to visually explore reads and read counts; variance shrinkage, principal components & beyond
  9. The basics of single-cell sequencing

A more detailed list of topics is available in PDF format: ESAM472syllabus.pdf

Additional topics of interest to the class may be added if time is available at the end of the quarter.

 
