Advancing genome-scale phylogenomic analysis
Tandy Warnow, University of Illinois at Urbana-Champaign
Tandy Warnow, Nam-phuong Nguyen, Pranjal Vachaspati, Siavash Mir arabbaygi, Michael Nute, Jed Chou, Kajori Banerjee, Ashu Gupta
This project will develop software for two related computational problems in biology: multiple sequence alignment and phylogenetic (i.e., evolutionary) tree estimation. Many methods have been developed for these problems, but most either have poor accuracy on large datasets with thousands of sequences or cannot run on very large datasets at all. Even some moderate-sized datasets can be enormously expensive to analyze; for example, an estimation of the avian tree of life, based on whole genomes for 50 species, used more than 200 CPU years and a terabyte of shared memory. As a result, a highly accurate construction of the Tree of Life is unlikely with current analytical methods, which fail to produce highly accurate results on large datasets.
This project will improve the accuracy and robustness of leading methods for multiple sequence alignment and phylogenetic tree estimation, and will also develop parallel implementations of these methods that can take advantage of the special architecture of Blue Waters. The result will be open-source software that biologists can use to produce highly accurate multiple sequence alignments and genome-scale phylogenies, greatly advancing the state of the art in methods and enabling breakthroughs in biological understanding.