Blue Waters User Portal | Science Teams

Developing TIPP2: High accuracy and scalable metagenomic sequence analysis

Tandy Warnow, University of Illinois at Urbana-Champaign

Erin Molloy, Tandy Warnow, Nidhi Shah, Elizabeth Koning

TIPP is a statistical method for taxonomic identification of sequencing reads obtained from environmental samples, generated in shotgun sequencing experiments. TIPP uses alignments of marker genes, and combines novel machine learning models (specifically, ensembles of profile Hidden Markov Models) with phylogenetic inference and multiple sequence alignment tools to taxonomically characterize sequencing reads. Although TIPP has been shown in independent studies by different labs to be among the most accurate methods for classification and exploration of metagenomic datasets, it is slow, due to its algorithmic design. Furthermore, TIPP is based on a pre-calculated collection of multiple sequence alignments and refined taxonomies (one for each marker gene), and these have not been updated since TIPP's publication in 2014.

We expect accuracy to improve when these alignments and refined taxonomies are rebuilt using the much larger number of sequences available for the TIPP marker genes, and we predict further gains from expanding the set of marker genes. These improvements will create additional computational challenges, however, so TIPP must be redesigned to reduce its running time. The outcome of the project will be TIPP2, a new version of TIPP that is more scalable, more accurate, and easier for biologists to use.