Searching for missing heritability in Alzheimer’s disease by identification of rare genomic variants
The purpose of this Blue Waters allocation is to analyze a very large genomic sequencing dataset of 10,000 whole human exomes in order to determine the optimal parameters for genomic variant detection. Next-generation sequencing has been fruitful in identifying disease variants in recent years. The gold standard for the associated computational analysis is the Broad GATK best practice guidelines, which were established for small- and medium-sized projects. Recently, we discovered that these recommendations are not optimal for large exome sequencing projects with thousands to hundreds of thousands of samples, which are required to study complex diseases driven by rare variants, such as Alzheimer's disease and autism. The default practice misses a substantial number of good-quality variants. This has a serious negative impact on downstream analyses that search for the genomic underpinnings of disease and predict treatment options and drug response. A new set of standards is badly needed in order to do analyses correctly at the modern scale; however, such an effort requires a petascale resource like Blue Waters.