Somatic variant detection is best performed when both tumor and matched normal (germline) samples are available. Sequence data may come from whole genomes, exome capture, or small gene panels in which only the exons of clinically relevant genes are sequenced. Adequate sequencing depth is essential for detecting somatic variants, especially when tumor heterogeneity or clonality is a concern. A target of ≥80x over 95% of targeted base pairs is needed to confidently detect somatic variants at a 10% allele frequency. Likewise, the matched normal needs ≥60x coverage to remove both germline variants and sequencing artifacts. The 80x and 60x thresholds refer to unique observations, counted after removing duplicate reads and overlapping paired-end bases, not to mean coverage.
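The coverage target, and the reasoning behind it, can be illustrated with a short calculation. The Python sketch below assumes per-base depths that already count unique observations (duplicates removed, overlapping mate bases counted once). It computes the fraction of target bases at or above a depth threshold and, under a simple binomial sampling model with a hypothetical minimum of 3 supporting alternate reads, the probability of sampling enough evidence for a variant at a given allele frequency. The function names and the 3-read minimum are illustrative assumptions, not parameters of the CBI workflows.

```python
import math
from typing import Sequence


def fraction_at_depth(unique_depths: Sequence[int], min_depth: int) -> float:
    """Fraction of target bases whose unique-observation depth is >= min_depth."""
    if not unique_depths:
        return 0.0
    return sum(d >= min_depth for d in unique_depths) / len(unique_depths)


def meets_target(unique_depths: Sequence[int], min_depth: int,
                 min_fraction: float = 0.95) -> bool:
    """True if at least min_fraction of target bases reach min_depth
    (e.g. 80 for the tumor, 60 for the matched normal)."""
    return fraction_at_depth(unique_depths, min_depth) >= min_fraction


def prob_detect(depth: int, allele_freq: float, min_alt_reads: int = 3) -> float:
    """P(X >= min_alt_reads) for X ~ Binomial(depth, allele_freq).

    min_alt_reads=3 is a hypothetical calling threshold; sequencing error is ignored.
    """
    below = sum(
        math.comb(depth, i) * allele_freq**i * (1 - allele_freq) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - below
```

Under this simplified model, `prob_detect(80, 0.10)` is about 0.99 while `prob_detect(30, 0.10)` is only about 0.59, which shows why low-frequency variants need high unique depth. Real callers also model sequencing error and tumor purity, so these numbers are only indicative.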
To analyze these datasets, the Cancer Bioinformatics (CBI) Shared Resource has developed more than a dozen workflows for the detection, annotation, and prioritization of germline and somatic variants from tumor-only and tumor-normal DNASeq datasets.
- DnaAlignQC – BWA-MEM alignment to Hg38, GATK base recalibration/indel realignment, unique-observation read coverage estimation, and comprehensive data quality control
- SomaticCaller – Optimized Manta/Strelka2 somatic variant calling, background error estimation, and recall frequency annotation
- JointGenotyping – GATK’s joint genotyping, vt normalization, and filtering for germline variants
- Annotator – SnpEff, ClinVar, dbNSFP, and splice annotators, plus identification of ACMG incidental germline findings
- CopyAnalysis – Copy number variation analysis with optimized copy ratio GATK 4.0 tools
- RnaAlignQC – RNASeq transcriptome analysis with STAR and Picard QC tools
- BamConcordance – Sample concordance using DNASeq and RNASeq BAM alignment files
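Sample concordance checks are commonly based on comparing genotypes at shared polymorphic sites. The sketch below is not BamConcordance's algorithm; it assumes alternate allele fractions have already been extracted from each BAM at a common set of SNP sites, maps each fraction to a coarse genotype using hypothetical thresholds (0.2 and 0.8), and reports the fraction of shared sites where the two samples agree.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position) of a common SNP


def coarse_genotype(alt_fraction: float, het_min: float = 0.2,
                    hom_min: float = 0.8) -> str:
    """Map an alternate allele fraction to a genotype class (hypothetical thresholds)."""
    if alt_fraction >= hom_min:
        return "hom_alt"
    if alt_fraction >= het_min:
        return "het"
    return "hom_ref"


def concordance(a: Dict[Site, float], b: Dict[Site, float]) -> float:
    """Fraction of sites present in both samples with the same coarse genotype."""
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("no shared sites to compare")
    matches = sum(coarse_genotype(a[s]) == coarse_genotype(b[s]) for s in shared)
    return matches / len(shared)
```

Samples from the same patient should score near 1.0, while unrelated samples agree noticeably less often. Copy number changes or loss of heterozygosity in a tumor can shift allele fractions and lower agreement, so thresholds like these would need tuning on real data.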
Tumor-Only (FoundationOne) Specific Workflows:
- VCFIntegration – Parsing of Foundation XML variant files, CrossMap conversion of the calls to Hg38, and integration with somatic variants called by reprocessing the input Foundation BAM files
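The integration step amounts to reconciling two call sets on one coordinate system. A minimal sketch, assuming the Foundation calls have already been converted to Hg38 (the CrossMap step) and that both sets were normalized the same way, keys each variant on chromosome, position, reference, and alternate allele, then labels where it was found. The `Variant` type, the label strings, and the `chr`-prefix normalization are hypothetical choices for illustration.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, assumed already on Hg38
    ref: str
    alt: str


def canonical(v: Variant) -> Variant:
    """Hypothetical normalization: add a 'chr' prefix and upper-case the alleles."""
    chrom = v.chrom if v.chrom.startswith("chr") else "chr" + v.chrom
    return Variant(chrom, v.pos, v.ref.upper(), v.alt.upper())


def integrate(foundation: Iterable[Variant],
              reprocessed: Iterable[Variant]) -> Dict[Variant, str]:
    """Label each distinct variant by which call set(s) contain it."""
    f: Set[Variant] = {canonical(v) for v in foundation}
    r: Set[Variant] = {canonical(v) for v in reprocessed}
    labels: Dict[Variant, str] = {}
    for v in sorted(f | r):  # sorted only for deterministic output order
        if v in f and v in r:
            labels[v] = "both"
        elif v in f:
            labels[v] = "foundation_only"
        else:
            labels[v] = "reprocessed_only"
    return labels
```

Exact matching like this will miss indels that are represented differently in the two call sets, so left-aligning and trimming alleles before comparison matters in practice.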
These workflows run the Snakemake workflow manager inside Docker containers for portability, reproducibility, and ease of use. Although the workflows are designed to be run individually, two USeq tools, TNRunner and TRunner, launch them in CHPC’s protected environment so that hundreds of patient sample datasets can be processed in parallel.
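The launch pattern of one containerized Snakemake run per sample, many running at once, can be sketched in a few lines. This is not TNRunner or TRunner; it is a hypothetical Python illustration that builds a `docker run` command per sample directory (mounted at `/work`) and runs the commands with a bounded thread pool. The image name, Snakefile path, and mount layout are assumptions, and the `runner` parameter lets the command execution be swapped out, for example for a dry run.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable, List

# Any callable that executes a command and returns its exit code.
Runner = Callable[[List[str]], int]


def build_command(sample_dir: Path, image: str, snakefile: str,
                  cores: int) -> List[str]:
    """Docker command that runs Snakemake on one sample directory mounted at /work."""
    return [
        "docker", "run", "--rm",
        "-v", f"{sample_dir.resolve()}:/work",  # mount the sample directory
        "-w", "/work",                          # run Snakemake from the mount
        image,
        "snakemake", "--cores", str(cores), "--snakefile", snakefile,
    ]


def run_subprocess(cmd: List[str]) -> int:
    """Execute the command and return its exit code without raising."""
    return subprocess.run(cmd, check=False).returncode


def launch_all(sample_dirs: Iterable[Path], image: str, snakefile: str,
               cores: int = 8, max_parallel: int = 4,
               runner: Runner = run_subprocess) -> Dict[str, int]:
    """Run one containerized workflow per sample, at most max_parallel at a time.

    Returns each sample directory's name mapped to its exit code.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {
            d.name: pool.submit(runner, build_command(d, image, snakefile, cores))
            for d in sample_dirs
        }
        return {name: fut.result() for name, fut in futures.items()}
```

A hypothetical call might pass `[p for p in Path("/scratch/samples").iterdir() if p.is_dir()]` with an image such as `example/somatic-workflow:1.0` and a Snakefile at `/work/somatic.smk`. Threads are sufficient here because each worker mostly waits on an external process; on a shared cluster, a scheduler would normally take over this role.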