Efficient, Scalable, and Fault Tolerant Genomics Pipelines

Ravishankar Iyer, University of Illinois at Urbana-Champaign

Usage Details

Ravishankar Iyer, Subho Banerjee, Saurabh Jha, Hao Jin, Valerio Formicola

Whole genome sequencing and analysis is becoming an important part of the standard of care in many hospitals, and its role will continue to grow in the years to come. In this setting, human genetic variant calling and genotyping need to be performed reliably and at scale, serving potentially hundreds of patients in a timely manner. Based on profiling and a study of the algorithms used in over 40 genomic analyses, we built a prototype runtime environment that addresses several performance pathologies and optimizes them away for a wide range of genomic applications. For example, we were able to speed up variant calling by 9x on one machine, as well as scale out to multiple machines (by as much as 81x on 10 Blue Waters nodes). Based on empirically validated performance models, we surmise that we have reached a theoretical limit for performance on CPUs. In our current proposal, we will explore further avenues of acceleration using the GPUs available on the Blue Waters system. Additionally, we will empirically test the accuracy of the results of our tools and their resiliency.