Designing scalable algorithms for constructing large phylogenetic trees (almost without alignments) on supercomputers
Bill Gropp, University of Illinois at Urbana-Champaign
Usage Details
Bill Gropp, Erin Molloy
The organization of molecular sequences into evolutionary trees, or phylogenies, enables scientists to classify environmental sequence data and identify previously unrecognized microbes. Current leading approaches to phylogenetic inference first estimate a multiple sequence alignment and then infer a tree from it. This two-phase approach is not scalable. Our approach bypasses the creation of a multiple sequence alignment on the full set of sequences altogether, enabling the construction of large phylogenetic trees (almost without alignments) on supercomputers. The exploratory allocation will be used for the following purposes:
- Test the accuracy of our method on simulated datasets
- Study and optimize communication patterns
- Improve parallel efficiency (e.g., load balancing)
- Analyze very large biological datasets (e.g., ribosomal RNA datasets)