Analysis

Variant Annotation: Exon Capture/Whole Genome

Variants detected by GATK are then annotated using snpEff. Annotations are incorporated into the VCF file and also presented as a user-friendly tab-delimited (TAB) file: one for each sample, plus a joint table covering the whole experiment. Annotations include:

COLUMN IDENTIFIER DESCRIPTION
CHROM The chromosome of the variant
POS The genomic location of the variant
REF Reference base
ALT Alternate base
VariantType Type of variant (SNP or INDEL)
ID rsID of dbSNP, if present
dbSNPBuildID The version of dbSNP the variant was first found in
SNPEFF_GENE_NAME Gene name for the highest-impact effect resulting from the current variant
Set Which caller(s) reported the variant (UNI: called by UnifiedGenotyper; HC: called by HaplotypeCaller; Intersection: called by both callers)
SampleName.GT Genotype
SampleName.AD Allelic Depth
SampleName.GQ Genotype Quality
SNPEFF_EXON_ID Exon ID for the highest-impact effect resulting from the current variant
SNPEFF_FUNCTIONAL_CLASS Functional class of the highest-impact effect resulting from the current variant (NONE, SILENT, MISSENSE, or NONSENSE)
SNPEFF_AMINO_ACID_CHANGE Old/New amino acid for the highest-impact effect resulting from the current variant
dbNSFP_Uniprot_acc UniProt accession number. Multiple entries separated by ";"
dbNSFP_SIFT_score SIFT score (SIFTori). The smaller the score, the more damaging the variant. Multiple scores separated by ";"
dbNSFP_Polyphen2_HDIV_score Polyphen2 score based on HumDiv, i.e. hdiv_prob. The score ranges from 0 to 1, and the corresponding prediction is "probably damaging" if it is in [0.957,1]; "possibly damaging" if it is in [0.453,0.956]; "benign" if it is in [0,0.452]. Score cutoff for binary classification is 0.5, i.e. the prediction is "neutral" if the score is smaller than 0.5 and "deleterious" if the score is larger than 0.5. Multiple entries separated by ";".
dbNSFP_GERP_RS GERP++ RS score. The larger the score, the more conserved the site.
SNPEFF_IMPACT Impact of the highest-impact effect resulting from the current variant (HIGH, MODERATE, LOW, or MODIFIER)
GAF Global Allele Frequency based on AC/AN (1000 Genomes)
AFR African Allele Frequency (1000 Genomes)
AMR American Allele Frequency (1000 Genomes)
ASN Asian Allele Frequency (1000 Genomes)
EUR European Allele Frequency (1000 Genomes)
AA_AC African American Allele Count in the order of AltAlleles,RefAllele. For INDELs, A1, A2, or An refers to the N-th alternate allele while R refers to the reference allele. (EVS)
EA_AC European American Allele Count in the order of AltAlleles,RefAllele. For INDELs, A1, A2, or An refers to the N-th alternate allele while R refers to the reference allele. (EVS)
GWASCAT Trait related to this chromosomal position, according to GWAS catalog
CLOSEST Distance to the closest splice site in bp (0 if the variant is exonic)
CA Clinical Association (EVS)
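
The columns above lend themselves to simple filtering. The following Python sketch shows one way to read a per-sample TAB file, keep variants whose SNPEFF_IMPACT is HIGH or MODERATE, and translate dbNSFP_Polyphen2_HDIV_score into the three-way prediction described in the table. It rests on several assumptions: that the file has a header row using the column identifiers listed above, that missing values appear as empty fields or ".", and that taking the largest of several ";"-separated scores is an acceptable summary. The function names and the example rows are hypothetical and are not part of the pipeline output.

```python
import csv
import io


def polyphen2_hdiv_prediction(score):
    """Map one HDIV score (0 to 1) to the three-way prediction from the table.

    The table gives ranges to three decimals; values falling between the
    listed bounds (e.g. 0.9565) are assigned to the lower category here.
    """
    if score >= 0.957:
        return "probably damaging"
    if score >= 0.453:
        return "possibly damaging"
    return "benign"


def max_score(field):
    """Return the largest number in a ';'-separated field, or None if none parse."""
    values = []
    for part in field.split(";"):
        try:
            values.append(float(part))
        except ValueError:
            continue  # skip empty entries or placeholders such as "."
    return max(values) if values else None


def high_interest_variants(handle, impacts=("HIGH", "MODERATE")):
    """Yield (CHROM, POS, gene, prediction) for rows with a selected impact."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if row.get("SNPEFF_IMPACT") not in impacts:
            continue
        score = max_score(row.get("dbNSFP_Polyphen2_HDIV_score") or "")
        prediction = None if score is None else polyphen2_hdiv_prediction(score)
        yield row["CHROM"], row["POS"], row["SNPEFF_GENE_NAME"], prediction


# Hypothetical rows for illustration only; real files contain more columns.
EXAMPLE_TABLE = (
    "CHROM\tPOS\tSNPEFF_GENE_NAME\tSNPEFF_IMPACT\tdbNSFP_Polyphen2_HDIV_score\n"
    "1\t1000\tGENE_A\tMODERATE\t0.12;0.98\n"
    "2\t2000\tGENE_B\tLOW\t0.99\n"
    "3\t3000\tGENE_C\tHIGH\t.\n"
)

for variant in high_interest_variants(io.StringIO(EXAMPLE_TABLE)):
    print(variant)
```

For a real file, the io.StringIO object would be replaced by an open file handle, for example open(path, newline=""). Whether the maximum score is the right summary when several transcripts are reported is a judgment call that depends on the analysis.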

snpEff

snpEff is a genetic variant annotation and effect prediction toolbox. It annotates variants and predicts their effects on genes, and it also supports adding custom annotations. The Broad Institute uses the snpEff toolbox and, through collaboration, has made GATK compatible with this highly efficient annotator. Please see their web page for additional details.