🧲RNA-Seq Part 3: Differential Expression Analysis & Best Practices🧲
In RNA-Seq, thousands of hypotheses are tested, one per gene, to understand how expression differs between two biological conditions. Challenges such as few replicates, non-normally distributed counts, and high variability in lowly expressed genes make accurate detection difficult. Tools like edgeR and DESeq2 address these limitations with statistical models based on the negative binomial distribution, which accounts for the overdispersion of count data and improves the reliability of results.
🐾Steps in Differential Expression Analysis:
🥇Normalization:
Adjusting raw counts to account for differences in library size and composition between samples.
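To make the normalization step concrete, here is a minimal Python sketch of median-of-ratios size factors, the idea behind DESeq2's normalization. It is an illustration only, not DESeq2's own code: the function name size_factors and the toy count table are made up for this example, and genes containing a zero count are simply skipped.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) <= 0:
            continue  # the geometric mean is zero for these genes, so skip them
        # log of the gene's geometric mean across samples (the pseudo-reference)
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # the size factor is the median ratio of a sample to the pseudo-reference
    return [math.exp(median(r)) for r in log_ratios]

# Toy table (hypothetical): sample 2 was sequenced exactly twice as deeply.
counts = [
    [10, 20],
    [100, 200],
    [50, 100],
]
factors = size_factors(counts)
normalized = [[c / f for c, f in zip(gene, factors)] for gene in counts]
```

After dividing by the size factors, both samples in the toy table end up with the same normalized count for every gene, which is what we expect when depth is the only difference.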
🥈Dispersion Estimation:
Estimating the dispersion (variability beyond what the mean explains) for each gene and borrowing information from other genes with similar mean counts to improve precision.
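A rough way to see what "dispersion" means: under the negative binomial model the variance is mean + dispersion × mean², so a simple moment-based estimate follows directly. The Python sketch below shows only that raw per-gene estimate. DESeq2 itself uses likelihood-based estimates and then shrinks them toward a trend fitted across genes, which is not reproduced here. The function name moment_dispersion and the example counts are hypothetical.

```python
from statistics import mean, variance

def moment_dispersion(normalized_counts):
    """Method-of-moments dispersion for one gene.

    Solves var = mu + alpha * mu**2 for alpha and clips at zero.
    Needs at least two replicates.
    """
    mu = mean(normalized_counts)
    if mu == 0:
        return 0.0
    var = variance(normalized_counts)  # sample variance across replicates
    return max((var - mu) / mu**2, 0.0)

# Overdispersed gene: the variance is far above the mean.
alpha = moment_dispersion([50, 100, 150, 100])

# Gene whose variance is below its mean: the estimate is clipped to zero.
alpha_low = moment_dispersion([90, 100, 110, 100])
```

With only a handful of replicates this raw estimate is very noisy, which is exactly why sharing information across genes helps.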
🥉Model Fitting:
DESeq2 fits a Generalized Linear Model (GLM) to the count data. The model includes the variable of interest (e.g., treatment) and can also adjust for potential confounders such as sequencing batch and patient age.
🎖Hypothesis Testing:
For each gene, we test whether the log fold change between conditions is significantly different from zero.
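The hypothesis test can be sketched as a Wald test, which is DESeq2's default: divide the estimated log2 fold change by its standard error and compare the result to a standard normal distribution. The Python snippet below assumes you already have both numbers for a gene. The function name wald_test and the input values are made up for illustration.

```python
import math

def wald_test(log2_fold_change, standard_error):
    """Two-sided Wald test of the null hypothesis log2 fold change = 0."""
    z = log2_fold_change / standard_error
    # Two-sided tail probability of a standard normal: P(|Z| >= |z|)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical gene: log2 fold change of 1.5 with a standard error of 0.5
z, p = wald_test(1.5, 0.5)
```

A large fold change alone is not enough: the same 1.5 with a standard error of 1.5 would give z = 1 and a p-value far from significant.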
🏃DESeq2 Workflow:
☁️Input data:
Raw read count table, colData table (sample metadata describing the experimental design), and a design formula (e.g., ~ condition for treatment vs. control).
⛈DESeq2 Functions:
Remove lowly expressed genes, run DESeq() to estimate size factors and dispersions and fit the GLM, then extract the differential expression results with results(), which adjusts p-values for multiple testing (Benjamini-Hochberg by default).
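Because one test is run per gene, raw p-values need correction for multiple testing. The Python sketch below implements the Benjamini-Hochberg procedure on a plain list of p-values. It only illustrates the adjustment itself, and the function name benjamini_hochberg and the four example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # indices of the p-values, from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

Genes are then usually called significant when the adjusted p-value falls below a chosen false discovery rate threshold.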
🌈Results:
The DESeq2 output provides log fold changes, p-values, and adjusted p-values, helping identify significant genes. Diagnostic plots like MA plots, PCA, and RLE plots help assess data quality.
MA Plot: Shows the log fold change vs. average normalized counts. Most genes are expected not to be differentially expressed, so most points should cluster around a log fold change of zero.
PCA Plot: Helps visualize biological reproducibility and sample grouping.
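RLE Plot: Checks whether normalization worked. In a Relative Log Expression plot, each sample's values should be centered around zero after successful normalization.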
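To show what an RLE plot actually displays, here is a small Python sketch that computes relative log expression: each gene's log count minus that gene's median log count across samples. The per-sample median of these values is the center of each box in the plot. The function name rle, the pseudocount of 1, and the toy table are assumptions made for this example.

```python
import math
from statistics import median

def rle(normalized_counts, pseudocount=1.0):
    """Relative log expression.

    normalized_counts: list of genes, each a list of counts (one per sample).
    Returns (per-gene deviations, per-sample median deviation).
    """
    log_counts = [[math.log2(c + pseudocount) for c in gene]
                  for gene in normalized_counts]
    # subtract each gene's median log count across samples
    deviations = [[v - median(gene) for v in gene] for gene in log_counts]
    n_samples = len(normalized_counts[0])
    sample_medians = [median([row[j] for row in deviations])
                      for j in range(n_samples)]
    return deviations, sample_medians

# Toy table (hypothetical): sample 3 is systematically inflated.
table = [
    [7, 7, 15],
    [15, 15, 31],
    [3, 3, 7],
]
deviations, sample_medians = rle(table)
```

In this toy table the third sample's median sits one log2 unit above zero, the kind of shift that signals normalization has not fully removed a sample-level effect.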
🌪Functional Enrichment Analysis: