High Performance Methods for Big Data Phylogenomics, Proteomics, and Metagenomics

Tandy Warnow, University of Illinois at Urbana-Champaign

Usage Details

Erin Molloy, Tandy Warnow, Nam-phuong Nguyen, Pranjal Vachaspati, Michael Nute, Ehsan Saleh, Kodi Collins

Phylogenomics (genome-scale phylogeny estimation), proteomics (protein structure and function prediction), and metagenomics (analysis of environmental samples from shotgun sequence datasets) are three computational problems in biology and biomedicine where large datasets are increasingly common and standard methods either do not run or do not provide sufficient accuracy. Critical advances in large-scale multiple sequence alignment methods developed by the PI’s group create opportunities for transformative improvements in accuracy and scalability for these problems. This project will develop new methods that can analyze ultra-large datasets with high accuracy, and will also develop parallel implementations of these methods that take advantage of the special architecture of Blue Waters. The result will be open-source software that biologists and clinicians can use, greatly advancing the state of the art in methods and enabling breakthroughs in biological understanding.