Improving the scalability of Mothur for large metagenomic studies

Charles Peck, Earlham College

This project seeks to document and improve the strong and weak scaling of Mothur, a popular open-source bioinformatics software package used for analyzing 16S rRNA gene sequences. Through our experiments, we have discovered that the software has many technical issues, especially in its distributed-memory and shared-memory implementations. These issues lead to wall times on the order of weeks for relatively modest data sets, when those runs complete correctly at all. In many cases a run has failed a week and a half in with an error, often a memory error such as running out of RAM. We believe that Blue Waters, with its powerful compute nodes, fast memory access, and ability to work well with large data sets, will be useful for this work as part of the Blue Waters Petascale Student Internship Program.