DESeq 2 NGS Analysis
DESeq 2
The Dataset
The goal of this experiment is to determine which Arabidopsis thaliana genes respond to nitrate. The dataset comes from a simple experiment in which RNA was extracted from the roots of independent plants and then sequenced. Two plants were treated with the control (KCl) and two with nitrate (KNO3).
The Data Files
We will use the alignments produced in the previous section, Aligning RNA-seq data. It's a good idea to start R from within the directory where these files are located, so that they can be referenced by relative paths.
Running R in Interactive Mode and Modules to Load
To start R, type R after loading the necessary modules.
The Alignment Files
The alignment files are in BAM format. We will use Rsamtools to point to the BAM files.
The Annotation File
The GTF format is closely related to GFF and stores the locations of genome features. We will load the GTF file using GenomicFeatures and group the exons by gene.
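In this tutorial, GenomicFeatures does the exon-by-gene grouping. As a language-neutral illustration only, here is a minimal Python sketch of the same idea. It assumes a well-formed, tab-separated GTF whose attribute column (column 9) contains gene_id "..." entries. The toy records and gene IDs (geneA, geneB) are hypothetical.

```python
import io
import re
from collections import defaultdict

# Matches gene_id "VALUE" inside the GTF attribute column (column 9).
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def exons_by_gene(gtf_handle):
    """Group exon intervals by gene_id from a GTF stream.

    Returns {gene_id: [(seqname, start, end, strand), ...]} using the
    1-based, inclusive coordinates written in the file.
    """
    genes = defaultdict(list)
    for line in gtf_handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue  # keep only exon records
        match = GENE_ID_RE.search(fields[8])
        if match is None:
            continue  # no gene_id attribute on this record
        seqname, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        genes[match.group(1)].append((seqname, start, end, strand))
    return dict(genes)


# Hypothetical toy annotation: two exons for geneA, one for geneB, plus a CDS line that is ignored.
toy_gtf = "\n".join([
    '#!hypothetical toy annotation',
    'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.1";',
    'chr1\ttoy\texon\t300\t400\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.1";',
    'chr1\ttoy\tCDS\t120\t200\t.\t+\t0\tgene_id "geneA"; transcript_id "geneA.1";',
    'chr1\ttoy\texon\t900\t950\t.\t-\t.\tgene_id "geneB"; transcript_id "geneB.1";',
])
print(exons_by_gene(io.StringIO(toy_gtf)))
```

Unlike a TxDb built by GenomicFeatures, this sketch does not track transcripts or merge overlapping exons. It only shows how exon records are keyed by their gene_id.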
The Experimental Design
We need metadata describing our samples to record which samples are KCl and which are KNO3. A simple comma-separated file created in a text editor is enough.
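To show what such a sample sheet might contain and how it maps samples to groups, here is a small Python sketch. The column names (sample, condition) and the BAM file names are hypothetical placeholders, not the names used in this tutorial.

```python
import csv
import io

# Hypothetical sample sheet: column names and file names are placeholders.
design_csv = """sample,condition
sample1.bam,KCl
sample2.bam,KCl
sample3.bam,KNO3
sample4.bam,KNO3
"""


def read_design(handle):
    """Return {sample_name: condition} from a comma-separated sample sheet."""
    return {row["sample"]: row["condition"] for row in csv.DictReader(handle)}


design = read_design(io.StringIO(design_csv))

# Invert the mapping so each condition lists its samples.
groups = {}
for sample, condition in design.items():
    groups.setdefault(condition, []).append(sample)

print(groups)  # {'KCl': ['sample1.bam', 'sample2.bam'], 'KNO3': ['sample3.bam', 'sample4.bam']}
```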
Counting the Reads
We will use the summarizeOverlaps function to count the number of reads that overlap each gene.
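summarizeOverlaps supports several counting modes. Its default, "Union", counts a read for a gene only when the read overlaps that gene and no other; reads hitting several genes are treated as ambiguous and dropped. The Python sketch below illustrates that rule under simplifying assumptions. Each read is a single contiguous interval (no spliced alignments), and strand is ignored, whereas summarizeOverlaps is strand-aware by default. The gene IDs and coordinates are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two 1-based, inclusive intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def count_union(reads, exons_by_gene):
    """Count reads per gene in the spirit of "Union" mode.

    reads: list of (seqname, start, end); strand is ignored in this sketch.
    exons_by_gene: {gene_id: [(seqname, start, end), ...]}
    A read is counted only if it overlaps the exons of exactly one gene.
    """
    counts = {gene: 0 for gene in exons_by_gene}
    for seqname, start, end in reads:
        hits = {
            gene
            for gene, exons in exons_by_gene.items()
            for ex_seq, ex_start, ex_end in exons
            if ex_seq == seqname and overlaps(start, end, ex_start, ex_end)
        }
        if len(hits) == 1:  # zero hits = no feature; several hits = ambiguous
            counts[hits.pop()] += 1
    return counts


exons = {
    "geneA": [("chr1", 100, 200), ("chr1", 300, 400)],
    "geneB": [("chr1", 350, 500)],
    "geneC": [("chr1", 900, 950)],
}
reads = [
    ("chr1", 150, 180),  # inside a geneA exon -> geneA
    ("chr1", 380, 420),  # overlaps geneA and geneB exons -> ambiguous, dropped
    ("chr1", 450, 480),  # geneB only
    ("chr1", 600, 650),  # no gene
    ("chr1", 190, 310),  # touches two geneA exons -> still one gene, counted
]
counts = count_union(reads, exons)
print(counts)  # {'geneA': 2, 'geneB': 1, 'geneC': 0}
```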
Filtering the Counts
We will remove every gene for which neither group (KCl or KNO3) reaches a median count of 10.
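The filtering rule can be sketched in Python as follows. The sketch assumes that "reaches a median count of 10" means a median of at least 10 (>= 10). A gene is kept as long as at least one group meets that threshold. Sample and gene names are hypothetical.

```python
from statistics import median


def filter_low_counts(counts, groups, min_median=10):
    """Keep genes where at least one group's median count reaches min_median.

    counts: {gene: {sample: raw_count}}
    groups: {group_name: [sample, ...]}
    Assumption: the threshold is inclusive (median >= min_median).
    """
    kept = {}
    for gene, per_sample in counts.items():
        group_medians = [median(per_sample[s] for s in samples) for samples in groups.values()]
        if any(m >= min_median for m in group_medians):
            kept[gene] = per_sample
    return kept


groups = {"KCl": ["s1", "s2"], "KNO3": ["s3", "s4"]}
counts = {
    "geneA": {"s1": 12, "s2": 15, "s3": 3, "s4": 2},   # KCl median 13.5 -> kept
    "geneB": {"s1": 2, "s2": 4, "s3": 8, "s4": 9},     # medians 3 and 8.5 -> removed
    "geneC": {"s1": 0, "s2": 1, "s3": 20, "s4": 30},   # KNO3 median 25 -> kept
    "geneD": {"s1": 9, "s2": 11, "s3": 0, "s4": 0},    # KCl median exactly 10 -> kept
}
kept = filter_low_counts(counts, groups)
print(sorted(kept))  # ['geneA', 'geneC', 'geneD']
```

If the tutorial's cutoff is meant to be strict (> 10), only the comparison operator changes, and geneD would then be removed.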
Differentially Expressed Genes
We will use DESeq to test whether any of the genes are significantly differentially expressed between the KCl and KNO3 samples.
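Before testing, DESeq corrects for differences in sequencing depth between samples using "median-of-ratios" size factors. This Python sketch covers only that normalization step, not the negative binomial test. For each sample, it takes the median of the log ratios between the sample's counts and each gene's geometric mean across samples. Genes with a zero count in any sample are skipped. The counts are a hypothetical toy matrix.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample (column).

    counts: list of rows, one per gene, each a list of raw counts per sample.
    Genes with a zero in any sample are skipped (their geometric mean is 0).
    """
    n_samples = len(counts[0])
    usable, log_geo_means = [], []
    for row in counts:
        if all(c > 0 for c in row):
            usable.append(row)
            log_geo_means.append(sum(math.log(c) for c in row) / n_samples)
    # Median of log(count / geometric mean) per sample, mapped back with exp.
    return [
        math.exp(median(math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)))
        for j in range(n_samples)
    ]


# Toy matrix: sample 2 was sequenced about twice as deeply as sample 1.
counts = [[10, 20], [30, 60], [5, 10], [0, 7]]
sf = size_factors(counts)
normalized = [[c / sf[j] for j, c in enumerate(row)] for row in counts]
print([round(s, 4) for s in sf])  # [0.7071, 1.4142]
```

Dividing each column by its size factor makes counts comparable across samples. That is why the later steps work with normalized counts.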
MA Plot
We will create an MA plot to show the relationship between each gene's mean normalized count and its log2 fold change.
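An MA plot places each gene's mean normalized count on the (log-scaled) x-axis and its log2 fold change on the y-axis. The Python sketch below computes those coordinates with a simple ratio of group means. This is only an approximation: the fold changes DESeq reports come from its fitted model. The sketch assumes KNO3 is the numerator and KCl the denominator. Sample and gene names are hypothetical.

```python
import math


def ma_points(norm_counts, group_a, group_b):
    """Compute MA-plot coordinates from normalized counts.

    norm_counts: {gene: {sample: normalized_count}}
    Returns {gene: (A, M)} with A = mean normalized count over all samples
    and M = log2(mean of group_b / mean of group_a).
    Genes with a zero group mean are skipped (fold change undefined).
    """
    points = {}
    for gene, per_sample in norm_counts.items():
        mean_a = sum(per_sample[s] for s in group_a) / len(group_a)
        mean_b = sum(per_sample[s] for s in group_b) / len(group_b)
        if mean_a == 0 or mean_b == 0:
            continue
        base_mean = sum(per_sample.values()) / len(per_sample)
        points[gene] = (base_mean, math.log2(mean_b / mean_a))
    return points


norm_counts = {
    "geneA": {"s1": 10, "s2": 10, "s3": 40, "s4": 40},  # 4-fold up in KNO3
    "geneB": {"s1": 50, "s2": 50, "s3": 25, "s4": 25},  # 2-fold down in KNO3
    "geneC": {"s1": 0, "s2": 0, "s3": 5, "s4": 5},      # zero in KCl: skipped
}
points = ma_points(norm_counts, group_a=["s1", "s2"], group_b=["s3", "s4"])
print(points)  # {'geneA': (25.0, 2.0), 'geneB': (37.5, -1.0)}
```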
Significant Genes
We will use the same cutoff values to decide which genes to consider significantly differentially expressed.
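DESeq's adjusted p-values (padj) are Benjamini-Hochberg corrected. The sketch below shows that correction in Python, followed by a two-part cutoff on adjusted p-value and absolute log2 fold change. The thresholds (0.05 and 1.0) are placeholders, not necessarily the tutorial's values. DESeq2's results() also applies independent filtering, so its padj can differ from a plain BH correction. Gene names and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def significant(genes, pvalues, log2fc, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Genes passing both cutoffs (placeholder thresholds)."""
    padj = benjamini_hochberg(pvalues)
    return [g for g, q, lfc in zip(genes, padj, log2fc)
            if q < padj_cutoff and abs(lfc) >= lfc_cutoff]


genes = ["g1", "g2", "g3", "g4"]
pvals = [0.01, 0.04, 0.03, 0.005]
lfcs = [2.5, -1.2, 0.4, -3.0]
print(significant(genes, pvals, lfcs))  # ['g1', 'g2', 'g4']
```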
Gene Annotations
We will load the gene descriptions into the workspace and look up the names of the differentially expressed genes.
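Looking up descriptions amounts to joining gene IDs against a table keyed by ID. This Python sketch assumes a hypothetical tab-separated file with a header row, a gene ID column, and a description column. The IDs and descriptions are placeholders. Genes missing from the table get None rather than an error.

```python
import csv
import io

# Hypothetical tab-separated description file (header row, then ID and description).
descriptions_tsv = (
    "gene_id\tdescription\n"
    "geneA\tplaceholder description A\n"
    "geneB\tplaceholder description B\n"
)


def load_descriptions(handle):
    """Return {gene_id: description} from a tab-separated table with a header."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def annotate(gene_ids, descriptions):
    """Pair each gene ID with its description, or None if it is not listed."""
    return [(g, descriptions.get(g)) for g in gene_ids]


desc = load_descriptions(io.StringIO(descriptions_tsv))
print(annotate(["geneB", "geneZ"], desc))
# [('geneB', 'placeholder description B'), ('geneZ', None)]
```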
GO-term Enrichment Analysis
We will perform a GO-term enrichment analysis of all differentially expressed genes using a hypergeometric test.
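For one GO term, the hypergeometric test asks how surprising it is to see k or more term-annotated genes among the differentially expressed genes, given how common the term is in the background. The Python sketch below computes that upper-tail probability exactly with binomial coefficients. Which genes form the background (for example, all genes that passed filtering) is an assumption here. When many terms are tested, the resulting p-values are usually adjusted for multiple testing.

```python
from math import comb


def hypergeom_upper_tail(k, n_de, n_term, n_universe):
    """P(X >= k) for X ~ Hypergeometric.

    n_universe: background genes tested
    n_term:     background genes annotated to the GO term
    n_de:       differentially expressed genes (the "drawn" sample)
    k:          DE genes annotated to the GO term
    """
    total = comb(n_universe, n_de)
    upper = min(n_de, n_term)
    return sum(comb(n_term, x) * comb(n_universe - n_term, n_de - x)
               for x in range(k, upper + 1)) / total


# Toy numbers: 10 background genes, 4 carry the term, 3 are DE, all 3 DE genes carry it.
print(hypergeom_upper_tail(3, 3, 4, 10))  # 4 / 120, about 0.0333
```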
Creating Heatmaps
We will use an enhanced version of the heatmap function to create a heatmap of the differentially expressed genes.
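Heatmaps of differentially expressed genes are commonly drawn on row-scaled values, so that each gene's pattern across samples is visible regardless of its expression level. Enhanced heatmap functions such as gplots' heatmap.2 offer this through scale = "row". Whether this tutorial scales rows is an assumption. The Python sketch computes row z-scores using the sample standard deviation. Rows with no variance become zeros instead of dividing by zero.

```python
from statistics import mean, stdev


def scale_rows(matrix):
    """Row-wise z-scores: subtract each row's mean, divide by its sample SD.

    Rows with zero variance are returned as all zeros.
    """
    scaled = []
    for row in matrix:
        m = mean(row)
        sd = stdev(row)
        scaled.append([(x - m) / sd if sd > 0 else 0.0 for x in row])
    return scaled


print(scale_rows([[1, 2, 3], [5, 5, 5]]))  # [[-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
```

In practice the input would usually be normalized (often log-transformed) counts rather than raw counts.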
Clustering
We will perform hierarchical clustering on the normalized matrix and store the gene names of the two resulting clusters in separate vectors.
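The clustering step can be sketched as agglomerative clustering that stops once two clusters remain, which gives the same partition as cutting the dendrogram into two groups. This Python sketch assumes Euclidean distance and complete linkage (R's hclust default linkage). The tutorial's distance measure and linkage may differ. Gene names and values are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage_clusters(rows, k=2):
    """Agglomerative clustering with complete linkage, stopped at k clusters.

    rows: {gene: vector of normalized expression values}
    Stopping after len(rows) - k merges yields the same partition as
    cutting the complete-linkage dendrogram into k groups.
    """
    clusters = [[gene] for gene in rows]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the farthest pair of members.
                d = max(euclidean(rows[a], rows[b]) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


expr = {
    "geneA": [0, 0, 5, 5],
    "geneB": [0, 1, 5, 6],
    "geneC": [5, 5, 0, 0],
    "geneD": [6, 5, 0, 1],
}
cluster1, cluster2 = complete_linkage_clusters(expr, k=2)
print(cluster1, cluster2)  # ['geneA', 'geneB'] ['geneC', 'geneD']
```

Unpacking the result into cluster1 and cluster2 mirrors keeping the two clusters' gene names in separate vectors.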
