Search for Missing Variants in Large Exome Sequencing Projects by Optimization of Analytic Pipelines, in application to Alzheimer’s disease
Yan Asmann, Mayo Clinic
The purpose of this Blue Waters allocation is to analyze a very large genomic sequencing dataset of 10,000 whole human exomes in order to determine the optimal parameters for genomic variant detection. Next-generation sequencing has been fruitful in identifying disease variants in recent years. The gold standard for the associated computational analysis is the Broad GATK best practice guidelines, which were established for small- and medium-sized projects. Recently, we discovered that these recommendations are not optimal for large exome sequencing projects with thousands to hundreds of thousands of samples, which are required for studying complex diseases driven by rare variants, such as Alzheimer's disease and autism. The default practice misses a substantial number of good-quality variants. This has a serious negative impact on downstream analyses that search for the genomic underpinnings of disease and predict treatment options and drug response. A new set of standards is badly needed in order to perform analyses correctly at the modern scale; however, such an effort requires a petascale resource like Blue Waters.