Program Details
Degree: Courses
Major: Biotechnology
Area of study: Information and Communication Technologies | Natural Science
Course Language: English
About Program

Program Overview


Introduction to NGS Analysis

The NGS Analysis program is designed to provide students with a comprehensive understanding of next-generation sequencing technologies and their applications in various fields of biology.


Topics Covered

  • Next-Generation Sequencing Analysis Resources
  • Pre-Requisites
    • Intro to R
    • Introduction to Linux
      • Linux Exercise
      • Nano Tutorials
  • NGS Sequencing Technology and File Formats
    • How Sequencing Works
    • FastA Format
    • FastQ Format
    • Quality Scores
    • SAM/BAM/CRAM Format
    • BED Format
    • VCF Format
    • GFF3 Format
  • Alignment
    • Trimming with Trimmomatic
  • Visualization
  • Variant Calling
    • Pre-Processing
    • Variant Discovery
  • RNA-seq Analysis
    • Aligning RNA-seq data
    • Introduction to R
    • DESeq
    • DESeq2
    • Gene Set Enrichment Analysis with ClusterProfiler
    • Over-Representation Analysis with ClusterProfiler
    • Salmon & kallisto: Rapid Transcript Quantification for RNA-Seq Data
    • Instructions to install R Modules on Dalma
  • HPC
    • Resources for editing files on the HPC
      • Atom
      • SSH Mounts
      • Neovim
    • SLURM
    • Modules
    • Gencore Infrastructure
      • Gencore Variant Detection Example
      • Software
        • HPCRunner
        • BioX Workflow
  • ChIP-seq analysis
    • ChIP-seq considerations
      • Prerequisites, data summary and availability
    • deepTools2 bamCoverage
    • deepTools2 computeMatrix and plotHeatmap using BioSAILs
    • Exercise part 4 – Alternative approach in R to plot and visualize the data
  • De novo genome assembly
    • Pre-processing and QC
    • Exercise in de novo assembly
    • Individual Commands
  • Single cell RNA sequencing
    • Prerequisites
    • Seurat part 1 – Loading the data
    • Seurat part 2 – Cell QC
    • Seurat part 3 – Data normalization and PCA
    • Seurat part 4 – Cell clustering
    • Loading your own data in Seurat & Reanalyze a different dataset
  • Metagenomics
    • Quality Control
    • Shotgun Metagenomics
      • Taxonomic Classification
      • Functional Analysis
  • Deep Learning using Keras

DESeq2

The Dataset

The goal of this experiment is to determine which Arabidopsis thaliana genes respond to nitrate. The design is simple: RNA is extracted from the roots of independent plants and then sequenced. Two plants were treated with the control (KCl) and two with nitrate (KNO3).


The Data Files

We will use the results from the previous section, Aligning RNA-seq data. It is a good idea to start R from within the directory where the files are located.


Running R in Interactive Mode and Modules to Load

To start R, type R after loading the necessary modules.


The Alignment Files

The alignment files are in BAM format. We will use Rsamtools to point to the BAM files.


The Annotation File

The GTF file is very similar to a GFF file and is used to store the location of genome features. We will load the GTF file using GenomicFeatures and group the exons based on the gene annotation.
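
The course performs this step in R with GenomicFeatures. As a language-neutral illustration of what "grouping the exons by gene" means, here is a minimal Python sketch that reads GTF-style lines and collects exon intervals under their gene_id. The gene names, coordinates, and the function name exons_by_gene are made up for the example and are not part of the course material.

```python
from collections import defaultdict


def exons_by_gene(gtf_lines):
    """Group exon intervals by gene_id from tab-separated GTF lines."""
    genes = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only exon features
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        # The ninth column holds attributes such as: gene_id "X"; transcript_id "Y";
        attrs = {}
        for item in fields[8].split(";"):
            item = item.strip()
            if item:
                key, _, value = item.partition(" ")
                attrs[key] = value.strip('"')
        gene_id = attrs.get("gene_id")
        if gene_id is not None:
            genes[gene_id].append((chrom, start, end, strand))
    return dict(genes)


# Hypothetical records, for illustration only.
example_gtf = [
    'Chr1\tdemo\tgene\t100\t400\t.\t+\t.\tgene_id "geneA";',
    'Chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.1";',
    'Chr1\tdemo\texon\t300\t400\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.1";',
    'Chr1\tdemo\texon\t380\t500\t.\t-\t.\tgene_id "geneB"; transcript_id "geneB.1";',
]
grouped = exons_by_gene(example_gtf)
```

The result maps each gene to the list of its exon intervals, which is the structure the read-counting step needs.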


The Experimental Design

We will need metadata about our samples to define which samples are the KCl controls and which are the KNO3-treated ones. A simple comma-separated file can be created with a text editor.
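
A design file of this kind might look like the text embedded in the sketch below. The column names (sample, condition) and the sample names are assumptions chosen for illustration; the actual file used in the course may be laid out differently. The Python snippet only shows how such a file maps each sample to its treatment group.

```python
import csv
import io

# Hypothetical design file: one row per sample, with its treatment group.
design_csv = """sample,condition
KCl_1,KCl
KCl_2,KCl
KNO3_1,KNO3
KNO3_2,KNO3
"""


def read_design(text):
    """Return a dict mapping sample name to condition."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["sample"]: row["condition"] for row in reader}


design = read_design(design_csv)
```

Keeping the design in a separate file means the grouping can be changed without touching the analysis code.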


Counting the Reads

We will use the summarizeOverlaps function to count the number of reads that overlap each gene.
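
To make the idea of overlap counting concrete, here is a small Python sketch. It is not the summarizeOverlaps implementation. It assumes one simplified rule: a read is counted for a gene when it overlaps at least one exon of that gene and no exon of any other gene, and ambiguous reads are skipped. All coordinates and gene names are invented.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True when two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def count_reads(reads, gene_exons):
    """Count reads per gene, skipping reads that hit zero or several genes."""
    counts = {gene: 0 for gene in gene_exons}
    for chrom, start, end in reads:
        hits = {
            gene
            for gene, exons in gene_exons.items()
            for e_chrom, e_start, e_end in exons
            if e_chrom == chrom and overlaps(start, end, e_start, e_end)
        }
        if len(hits) == 1:  # unambiguous assignment only
            counts[hits.pop()] += 1
    return counts


# Hypothetical annotation and aligned reads.
gene_exons = {
    "geneA": [("Chr1", 100, 200), ("Chr1", 300, 400)],
    "geneB": [("Chr1", 380, 500)],
}
reads = [
    ("Chr1", 150, 180),  # inside geneA only
    ("Chr1", 390, 410),  # overlaps geneA and geneB, so it is skipped
    ("Chr1", 450, 480),  # inside geneB only
    ("Chr1", 600, 650),  # no gene
    ("Chr2", 150, 180),  # different chromosome
]
read_counts = count_reads(reads, gene_exons)
```

Running this for every BAM file yields one column of counts per sample, which together form the count matrix used in the following steps.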


Filtering the Counts

We will remove every gene for which neither group (KCl or KNO3) has a median count of at least 10.
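
The filter can be sketched in a few lines of Python. The course applies it in R; the function name filter_counts, the sample names, and the counts below are hypothetical, and the sketch assumes the threshold is inclusive (a median of exactly 10 keeps the gene).

```python
from statistics import median


def filter_counts(counts, conditions, min_median=10):
    """Keep genes whose median count reaches min_median in at least one group."""
    groups = sorted(set(conditions.values()))
    kept = {}
    for gene, per_sample in counts.items():
        group_medians = [
            median(per_sample[s] for s, c in conditions.items() if c == group)
            for group in groups
        ]
        if max(group_medians) >= min_median:
            kept[gene] = per_sample
    return kept


# Hypothetical samples and raw counts.
conditions = {"KCl_1": "KCl", "KCl_2": "KCl", "KNO3_1": "KNO3", "KNO3_2": "KNO3"}
counts = {
    "geneA": {"KCl_1": 2, "KCl_2": 4, "KNO3_1": 30, "KNO3_2": 50},  # KNO3 median 40
    "geneB": {"KCl_1": 1, "KCl_2": 0, "KNO3_1": 3, "KNO3_2": 5},    # both medians low
    "geneC": {"KCl_1": 12, "KCl_2": 8, "KNO3_1": 9, "KNO3_2": 7},   # KCl median 10
}
filtered = filter_counts(counts, conditions)
```

Genes with almost no reads in either group carry little statistical information, so dropping them reduces the number of tests performed later.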


Differentially Expressed Genes

We will use DESeq2 to test whether any of the genes are significantly differentially expressed.


MA Plot

We will create an MA plot to show the relationship between the average count of a gene and the fold change.
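
The two axes of an MA plot can be computed by hand for a single gene. The Python sketch below uses a naive estimate (ratio of group means with a pseudocount of 1, which is an assumption of this example). DESeq2 derives its fold changes from a fitted model, so its values will generally differ from this calculation.

```python
import math


def ma_values(control, treated, pseudocount=1.0):
    """Return (A, M): the average count and the log2 fold change for one gene."""
    mean_control = sum(control) / len(control)
    mean_treated = sum(treated) / len(treated)
    a = (mean_control + mean_treated) / 2  # x-axis: average expression
    m = math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))  # y-axis
    return a, m


# Hypothetical normalized counts for one gene in two KCl and two KNO3 samples.
a, m = ma_values(control=[7, 7], treated=[31, 31])
```

Here the treated mean plus pseudocount is 32 and the control mean plus pseudocount is 8, so M is log2(4) = 2, while A is (7 + 31) / 2 = 19. Plotting M against A for every gene gives the MA plot.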


Significant Genes

We will use the same values for our cutoff to determine which genes we want to consider as significantly differentially expressed.
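
Selecting significant genes amounts to applying thresholds to each gene's adjusted p-value and fold change. The cutoffs in this Python sketch (adjusted p-value below 0.05, absolute log2 fold change of at least 1) are placeholders chosen for illustration and are not the values used in the course. The result table is also invented.

```python
def significant_genes(results, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Split genes into induced and repressed lists using hypothetical cutoffs."""
    induced, repressed = [], []
    for gene, (log2fc, padj) in results.items():
        if padj is None or padj >= padj_cutoff:
            continue  # not significant, or no adjusted p-value available
        if log2fc >= lfc_cutoff:
            induced.append(gene)
        elif log2fc <= -lfc_cutoff:
            repressed.append(gene)
    return sorted(induced), sorted(repressed)


# Hypothetical results: gene -> (log2 fold change, adjusted p-value).
results = {
    "geneA": (2.5, 0.001),   # induced
    "geneB": (-1.8, 0.01),   # repressed
    "geneC": (3.0, 0.2),     # large change, but not significant
    "geneD": (0.3, 0.001),   # significant, but small change
    "geneE": (1.2, None),    # adjusted p-value missing
}
induced, repressed = significant_genes(results)
```

Keeping induced and repressed genes in separate lists makes it easy to analyze each direction of response on its own.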


Gene Annotations

We will load the gene descriptions into the workspace and find out what the names of the genes are.


GO-term Enrichment Analysis

We will perform a GO-term enrichment analysis of all differentially expressed genes using a hypergeometric test.
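
The hypergeometric test asks how likely it is to see at least k annotated genes among n selected genes, when K of the N genes in the background carry the annotation. A minimal Python sketch of that upper-tail probability follows. The numbers in the example are arbitrary, and a real analysis would also correct for testing many GO terms.

```python
from math import comb


def hypergeom_pvalue(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N background, K annotated, n selected)."""
    upper = min(K, n)
    total = comb(N, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


# Toy example: 10 genes in the background, 4 annotated with the GO term,
# 3 differentially expressed genes, 2 of which carry the term.
p = hypergeom_pvalue(N=10, K=4, n=3, k=2)
```

For the toy numbers the favorable outcomes are C(4,2)·C(6,1) + C(4,3)·C(6,0) = 36 + 4 = 40 out of C(10,3) = 120, so p is 1/3.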


Creating Heatmaps

We will use an enhanced version of the heatmap function to create a heatmap of the differentially expressed genes.


Clustering

We will perform hierarchical clustering on the normalized matrix and save the gene names for the two clusters in different vectors.
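
The clustering step can be illustrated without any libraries. The Python sketch below performs agglomerative clustering with Euclidean distance and average linkage, and stops when two clusters remain. Both choices are assumptions of this example and may not match the settings used in the course. The expression profiles are invented.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the genes of two clusters."""
    dists = [euclidean(profiles[g1], profiles[g2]) for g1 in c1 for g2 in c2]
    return sum(dists) / len(dists)


def cluster_genes(profiles, n_clusters=2):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [[gene] for gene in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], profiles)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical normalized values: two KCl samples followed by two KNO3 samples.
profiles = {
    "geneA": [1.0, 1.2, 5.0, 5.1],
    "geneB": [0.9, 1.1, 4.8, 5.3],
    "geneC": [5.0, 5.2, 1.0, 0.8],
    "geneD": [4.9, 5.1, 1.2, 1.1],
}
cluster_1, cluster_2 = cluster_genes(profiles)
```

In this toy data the two clusters separate the genes that go up under KNO3 (geneA, geneB) from those that go down (geneC, geneD), and each list of names is stored in its own variable.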

