High Performance Methods for Big Data Phylogenomics, Proteomics, and Metagenomics
Tandy Warnow, University of Illinois at Urbana-Champaign
Usage Details
Erin Molloy, Tandy Warnow, Nam-phuong Nguyen, Pranjal Vachaspati, Michael Nute, Ehsan Saleh, Kodi Collins
Phylogenomics (genome-scale phylogeny estimation), proteomics (protein structure and function prediction), and metagenomics (analysis of environmental samples from shotgun sequence datasets) are three computational problems in biology and biomedicine, where large datasets are increasingly common and standard methods either do not run or do not provide sufficient accuracy. Critical improvements in large-scale multiple sequence alignment methods developed by the PI’s group create opportunities for transformative improvements in accuracy and scalability for these problems. This project will develop new methods with the ability to analyze ultra-large datasets with high accuracy, and will also develop parallel implementations of these methods that can take advantage of the special architecture of Blue Waters. The result will be open-source software that can be used by biologists and clinicians, greatly advancing the state of the art in methods, and enabling breakthroughs in biological understanding.