Efficient, Scalable, and Fault Tolerant Genomics Pipelines
Ravishankar Iyer, University of Illinois at Urbana-Champaign
Ravishankar Iyer, Subho Banerjee, Saurabh Jha, Hao Jin, Valerio Formicola
Whole genome sequencing and analysis is increasingly becoming part of the standard of care in many hospitals, and its role will continue to grow in the years to come. In this setting, human genetic variant calling and genotyping must be performed reliably and at scale, serving potentially hundreds of patients in a timely manner. Based on profiling and a study of the algorithms used in over 40 genomic analyses, we built a prototype runtime environment that addresses several performance pathologies and optimizes them away for a wide range of genomic applications. For example, we were able to speed up variant calling by 9x on one machine and to scale out to multiple machines, reaching a speedup of up to 81x on 10 Blue Waters nodes. Based on empirically validated performance models, we surmise that we have reached the theoretical performance limit on CPUs. In the current proposal, we will explore further avenues of acceleration using the GPUs available on the Blue Waters system. Additionally, we will empirically test the accuracy of our tools' results as well as the tools' resiliency.
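The two speedup figures can be related with a short calculation. This is a sketch under an assumption the abstract does not state: that both the 9x and the 81x figures are measured against the same unoptimized single-machine baseline. The symbols S_1, S_10, and E_10 are introduced here only for illustration. Under that reading, the 10-node run adds a further 9x over the optimized single-machine version, which corresponds to roughly 90% parallel efficiency.

```latex
% Assumption: both speedups are relative to the same unoptimized single-machine baseline
S_1 = 9, \qquad S_{10} = 81
% Additional speedup from scaling out the optimized version to 10 nodes
S_{\text{scale-out}} = \frac{S_{10}}{S_1} = \frac{81}{9} = 9
% Parallel efficiency of the 10-node scale-out
E_{10} = \frac{S_{\text{scale-out}}}{10} = \frac{9}{10} = 0.9
```

If the 81x figure were instead measured against the optimized single-machine run, the efficiency calculation would change accordingly.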