Designing scalable algorithms for constructing large phylogenetic trees (almost without alignments) on supercomputers

Bill Gropp, University of Illinois at Urbana-Champaign

Bill Gropp, Erin Molloy

Organizing molecular sequences into evolutionary trees, or phylogenies, enables scientists to classify environmental sequence data and identify previously unrecognized microbes. The leading approaches to phylogenetic inference first estimate a multiple sequence alignment and then infer a tree from it. This two-phase approach does not scale to very large datasets. Our approach bypasses the creation of a multiple sequence alignment on the full set of sequences altogether, enabling the construction of large phylogenetic trees (almost without alignments) on supercomputers. The exploratory allocation will be used for the following purposes:

Test the accuracy of our method on simulated datasets
Study and optimize communication patterns
Improve parallel efficiency (e.g., load balancing)
Analyze very large biological datasets (e.g., ribosomal RNA datasets)