Once the data has been mapped and filtered, we follow the Best Practices workflow developed by the Broad Institute for its GATK software. The workflow covers data pre-processing, variant discovery, and preliminary analyses for cohorts of samples. To have enough statistical power for exome sequencing analyses, we require at least 30 samples for joint variant calling. If an experiment has fewer than 30 samples, we add a random selection of samples from the 1000 Genomes Project to increase power during variant calling and variant recalibration.
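The padding rule for small cohorts can be sketched in a few lines of Python. This is only an illustration of the selection logic, not the pipeline's actual code: the function name `pad_cohort`, the `seed` parameter, and the sample IDs are hypothetical, and the sketch assumes the reference pool is simply a list of sample identifiers.

```python
import random

MIN_COHORT_SIZE = 30  # minimum number of samples for joint variant calling


def pad_cohort(experiment_samples, reference_pool, min_size=MIN_COHORT_SIZE, seed=None):
    """Return the experiment samples, padded with randomly drawn
    reference samples until the cohort has at least `min_size` members.

    `reference_pool` stands in for a list of 1000 Genomes sample IDs
    (assumption: plain strings). `seed` makes the draw reproducible.
    """
    cohort = list(experiment_samples)
    shortfall = min_size - len(cohort)
    if shortfall <= 0:
        return cohort  # already large enough, nothing to add

    # Never draw a sample that is already part of the experiment.
    already_used = set(cohort)
    candidates = [s for s in reference_pool if s not in already_used]
    if len(candidates) < shortfall:
        raise ValueError("reference pool too small to reach the minimum cohort size")

    rng = random.Random(seed)
    return cohort + rng.sample(candidates, shortfall)


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    experiment = ["EXP_01", "EXP_02", "EXP_03"]
    pool = [f"REF_{i:03d}" for i in range(200)]
    cohort = pad_cohort(experiment, pool, seed=42)
    print(len(cohort), "samples in the joint-calling cohort")
```

Fixing the seed is a design choice made here so that the same reference samples are drawn if the analysis is rerun; the text above does not say whether the real selection is seeded.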
GATK (the Genome Analysis Toolkit) is widely regarded as the industry standard for SNP and indel analysis. It can be applied to a wide range of datasets and genomes. It comes with a host of tools for alignment refinement, coverage analysis, diagnostics, and, above all, variant calling. GATK variant calling can be run in two modes: UnifiedGenotyper or HaplotypeCaller. By default we run both modes and provide the union of the two result sets. The variants are then refined using the GATK variant recalibration tools. Both the raw SNP/indel calls and the recalibrated files are provided. Please visit the GATK page for more in-depth information about the software.
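To make the idea of a union of the two call sets concrete, here is a minimal Python sketch. It assumes a variant is identified by the tuple (chromosome, position, reference allele, alternate allele), which is an assumption of this example rather than something stated above, and the function name `union_calls` is hypothetical. Merging real VCF files also involves normalizing variant representations and reconciling genotypes, which this sketch does not attempt.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed variant key: (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def union_calls(
    unified_genotyper_calls: Iterable[Variant],
    haplotype_caller_calls: Iterable[Variant],
) -> Dict[Variant, Set[str]]:
    """Return every variant seen by either caller, mapped to the set of
    callers that reported it."""
    merged: Dict[Variant, Set[str]] = {}
    sources = (
        ("UnifiedGenotyper", unified_genotyper_calls),
        ("HaplotypeCaller", haplotype_caller_calls),
    )
    for caller, calls in sources:
        for variant in calls:
            # Create the entry on first sight, then record the caller.
            merged.setdefault(variant, set()).add(caller)
    return merged


if __name__ == "__main__":
    # Made-up variants, for illustration only.
    ug = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")]
    hc = [("chr1", 200, "C", "T"), ("chr2", 50, "G", "GA")]
    for variant, callers in sorted(union_calls(ug, hc).items()):
        print(variant, sorted(callers))
```

Recording which caller reported each variant keeps the union traceable, so calls supported by both modes can be told apart from calls found by only one.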