The Bioinformatics Core supports and maintains pipelines for standard differential expression analysis of RNA-Seq experiments, and can also assist with other types of RNA-Seq analysis.
Biological replicates are essential for estimating the biological variability in gene expression. At least three replicates are needed for yeast clones or identical cell line cultures, which are expected to vary little, while inbred mouse strains vary more and may need four or five biological replicates. Individual patient samples can be highly variable and may require six or more replicates. If the replicates fail to capture the true variation because they are too similar, the variance is underestimated and many genes will falsely appear differentially expressed. Conversely, too much variation among replicates often results in no significant genes.
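As a rough illustration of the replicate guidance above, the sketch below encodes the suggested minimum replicate numbers and computes the coefficient of variation (standard deviation divided by mean) of one gene's normalized expression across the replicates of a group. The category labels are hypothetical names chosen for this example, and no cutoff for "too similar" or "too variable" is implied; it is a quick sanity check, not part of the Core's pipeline.

```python
from statistics import mean, stdev

# Minimum biological replicates suggested in the text.
# The category keys are hypothetical labels used only in this sketch.
MIN_REPLICATES = {
    "yeast_or_cell_line": 3,
    "inbred_mouse": 4,  # the text suggests four or five
    "patient": 6,       # six or more
}


def check_group(source, values):
    """Return (enough_replicates, coefficient_of_variation) for one gene's
    normalized expression values across the replicates of one group."""
    if len(values) < 2:
        raise ValueError("need at least two replicates to estimate variation")
    m = mean(values)
    # Sample standard deviation relative to the mean; undefined for a zero mean.
    cv = stdev(values) / m if m > 0 else float("nan")
    return len(values) >= MIN_REPLICATES[source], cv


if __name__ == "__main__":
    print(check_group("inbred_mouse", [100, 120, 90]))
```

Looking at the coefficient of variation across many genes, rather than one, gives a better sense of whether a group of replicates is unusually tight or noisy.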
For high-quality RNA, libraries may be prepared with either poly(A) selection or Ribo-Zero rRNA depletion. For poorer-quality RNA where degradation is a possibility, only Ribo-Zero depletion should be used, because poly(A) selection of fragmented transcripts biases coverage toward the 3' end.
We maintain a local repository of the latest genome reference sequences, annotation files, and alignment indexes for human and common model organisms on our analysis servers. We preferentially use genomes and annotations from even-numbered Ensembl releases.
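A minimal sketch of the "even-numbered release" preference: pick the newest even release from a list of available release numbers, then build the URL of its GTF annotation. The URL layout is assumed from the public Ensembl FTP site and should be checked there before use; the function names are hypothetical, and nothing is downloaded.

```python
def latest_even_release(releases):
    """Return the newest even-numbered Ensembl release from an iterable of ints."""
    even = [r for r in releases if r % 2 == 0]
    if not even:
        raise ValueError("no even-numbered release available")
    return max(even)


def ensembl_gtf_url(release, species, assembly):
    """Build a GTF URL, ASSUMING the ftp.ensembl.org directory layout
    pub/release-N/gtf/<species>/<Species>.<assembly>.N.gtf.gz."""
    species_dir = species.lower()                          # e.g. homo_sapiens
    file_prefix = species[0].upper() + species[1:].lower()  # e.g. Homo_sapiens
    return (
        f"https://ftp.ensembl.org/pub/release-{release}/gtf/{species_dir}/"
        f"{file_prefix}.{assembly}.{release}.gtf.gz"
    )


if __name__ == "__main__":
    rel = latest_even_release([109, 110, 111])
    print(ensembl_gtf_url(rel, "homo_sapiens", "GRCh38"))
```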
The Bioinformatics Core has developed custom scripts and software libraries that run differential expression analysis through a standard pipeline with the following steps:
- Trim adapters with cutadapt
- Align single-end, paired-end, or small RNA reads with STAR
- Collect alignment statistics with Picard and other tools
- Generate count matrices with featureCounts and RSEM
- Summarize alignment statistics and counts with MultiQC
- Perform differential expression analysis with DESeq2
- Generate reproducible, interactive R Markdown reports
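The core of these steps can be sketched as a list of command lines for one single-end sample. This is an illustration under stated assumptions, not the Core's actual scripts: file names, the minimum-length cutoff, and the adapter (the common Illumina TruSeq prefix) are example values, strandedness options are left at the tools' defaults, and the Picard and DESeq2 steps are omitted. The commands are only built and printed, not executed.

```python
import shlex


def pipeline_commands(sample, fastq, adapter, star_index, rsem_ref, gtf, threads=8):
    """Build (but do not run) command lines for one single-end sample.
    All paths and parameter values are hypothetical examples."""
    trimmed = f"{sample}.trim.fastq.gz"
    prefix = f"{sample}."
    genome_bam = f"{prefix}Aligned.sortedByCoord.out.bam"
    tx_bam = f"{prefix}Aligned.toTranscriptome.out.bam"
    return [
        # Adapter trimming; -m drops reads shorter than 20 nt after trimming.
        ["cutadapt", "-a", adapter, "-m", "20", "-o", trimmed, fastq],
        # Alignment; TranscriptomeSAM also writes a transcriptome BAM for RSEM.
        ["STAR", "--runThreadN", str(threads), "--genomeDir", star_index,
         "--readFilesIn", trimmed, "--readFilesCommand", "zcat",
         "--outSAMtype", "BAM", "SortedByCoordinate",
         "--quantMode", "TranscriptomeSAM",
         "--outFileNamePrefix", prefix],
        # Gene-level counts from the genome alignments (unstranded by default).
        ["featureCounts", "-T", str(threads), "-a", gtf,
         "-o", f"{sample}.counts.txt", genome_bam],
        # Transcript-level estimates from the transcriptome alignments.
        ["rsem-calculate-expression", "--alignments", "--no-bam-output",
         "-p", str(threads), tx_bam, rsem_ref, sample],
        # Aggregate QC and count summaries found in the current directory.
        ["multiqc", "."],
    ]


if __name__ == "__main__":
    for cmd in pipeline_commands("s1", "s1.fastq.gz", "AGATCGGAAGAGC",
                                 "star_index", "rsem_ref/genome", "genes.gtf"):
        print(shlex.join(cmd))
```

Keeping each command as a list of arguments (rather than one string) makes it straightforward to pass them to `subprocess.run` later without shell-quoting problems.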
HCI users who wish to run the pipeline themselves on our analysis servers should consult our user guide.
When working with multiple variables (for example, treatment and gender, or treatment in combination with genotype or time), DESeq2 can fit more advanced statistical models that account for these variables or identify interacting conditions. Contact the Core for assistance with these designs.
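In DESeq2, such a model is specified with an R design formula, for example `~ genotype + treatment + genotype:treatment`. To show what the interaction term adds, the plain-Python sketch below builds the corresponding treatment-coded design matrix, mirroring the columns R's `model.matrix` would produce (up to column order). The sample table, factor names, and levels are hypothetical; DESeq2 builds this matrix itself, so this is only for intuition.

```python
def design_matrix(samples, factor_a, factor_b, ref_a, ref_b):
    """Treatment-coded design matrix for ~ A + B + A:B.
    samples: list of dicts mapping factor name -> level (hypothetical table)."""
    levels_a = sorted({s[factor_a] for s in samples} - {ref_a})
    levels_b = sorted({s[factor_b] for s in samples} - {ref_b})
    columns = ["(Intercept)"]
    columns += [f"{factor_a}{lvl}" for lvl in levels_a]
    columns += [f"{factor_b}{lvl}" for lvl in levels_b]
    # Interaction columns: one per pair of non-reference levels.
    columns += [f"{factor_a}{a}:{factor_b}{b}" for b in levels_b for a in levels_a]
    rows = []
    for s in samples:
        ind_a = [1 if s[factor_a] == lvl else 0 for lvl in levels_a]
        ind_b = [1 if s[factor_b] == lvl else 0 for lvl in levels_b]
        # An interaction column is the product of the two indicator columns,
        # so it is 1 only when both non-reference levels are present.
        inter = [x * y for y in ind_b for x in ind_a]
        rows.append([1] + ind_a + ind_b + inter)
    return columns, rows


if __name__ == "__main__":
    table = [
        {"genotype": "WT", "treatment": "ctrl"},
        {"genotype": "WT", "treatment": "drug"},
        {"genotype": "KO", "treatment": "ctrl"},
        {"genotype": "KO", "treatment": "drug"},
    ]
    cols, rows = design_matrix(table, "genotype", "treatment", "WT", "ctrl")
    print(cols)
    for row in rows:
        print(row)
```

A significant coefficient on the interaction column means the treatment effect differs between genotypes, which the two main-effect columns alone cannot express.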
Transcript and Exon Analysis
For most investigators, gene-level differential expression is sufficient. However, if you suspect differences in alternative transcripts or differential exon usage, other analysis programs can be used. RSEM estimates expression levels for each transcript, and these estimates can be analyzed further in DESeq2. The DEXSeq package is an alternative for identifying differential exon usage.
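One simple way to look at transcript-level RSEM output is the fraction of each gene's expression contributed by each of its transcripts; a shift in these fractions between conditions can hint at alternative transcript usage. The sketch below assumes RSEM's tab-separated `isoforms.results` layout with `transcript_id`, `gene_id`, and `TPM` columns. It is descriptive only: it performs no statistical test and is not how DEXSeq or DESeq2 model the data.

```python
import csv
import io
from collections import defaultdict


def isoform_fractions(isoforms_results_text):
    """Map transcript_id -> fraction of its gene's total TPM.
    ASSUMES RSEM isoforms.results columns named transcript_id, gene_id, TPM."""
    reader = csv.DictReader(io.StringIO(isoforms_results_text), delimiter="\t")
    by_gene = defaultdict(dict)
    for row in reader:
        by_gene[row["gene_id"]][row["transcript_id"]] = float(row["TPM"])
    fractions = {}
    for transcripts in by_gene.values():
        total = sum(transcripts.values())
        for tid, tpm in transcripts.items():
            # Genes with no expression get 0.0 rather than a division error.
            fractions[tid] = tpm / total if total > 0 else 0.0
    return fractions


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as fh:  # e.g. a sample's isoforms.results file
        for tid, frac in sorted(isoform_fractions(fh.read()).items()):
            print(tid, round(frac, 3))
```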
An essential component of RNA-Seq analysis is identifying pathways and ontologies of differentially expressed genes. The Bioinformatics Core can provide a reproducible pathway report with gene set enrichment analysis and links to additional software. In addition, users can join the new site-wide license to Ingenuity Pathway Analysis to identify and interactively explore significant pathways, upstream regulators, networks, and targets.
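For intuition about how a pathway can be called "significant", the sketch below implements a one-sided hypergeometric over-representation test: given all measured genes, a gene set, and a list of differentially expressed genes, it computes the probability of seeing at least the observed overlap by chance. Note that this is a simpler technique than gene set enrichment analysis (GSEA), which works on a ranked list of all genes rather than a cutoff-based list; it is shown only as an illustration, with hypothetical argument names.

```python
from math import comb


def overrepresentation_p(total_genes, set_size, selected, overlap):
    """One-sided hypergeometric P(X >= overlap).
    total_genes: genes measured; set_size: genes in the pathway;
    selected: differentially expressed genes; overlap: DE genes in the pathway."""
    upper = min(set_size, selected)
    # Sum the exact probabilities of every overlap at least as large as observed.
    numerator = sum(
        comb(set_size, i) * comb(total_genes - set_size, selected - i)
        for i in range(overlap, upper + 1)
    )
    return numerator / comb(total_genes, selected)


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 genes, a 100-gene pathway,
    # 500 DE genes, 12 of which fall in the pathway.
    print(overrepresentation_p(20000, 100, 500, 12))
```

In practice the p-values from many gene sets are then corrected for multiple testing before any pathway is reported.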
We don't generally recommend calling variants from RNA-Seq data, but it can be done; this is discussed further in the germline analysis section.