Big Data on Small Organisms: Petascale Simulations of Data-driven, Whole-cell Microbial Models
This project aims to develop the next generation of genome-scale, data-driven models for microbial organisms. The project will first focus on the most-studied microbe, the Gram-negative bacterium Escherichia coli, because of the availability of high-throughput data, its cellular organization and its significance to industry and human health. The project will take advantage of three developments: the multi-omics datasets that have resulted from advances in parallel high-throughput molecular profiling over the past fifteen years; the emergence of data-driven, integrative, multi-scale models with substantially improved predictive power; and new techniques in machine learning, especially those related to deep learning. Accurate prediction of microbial fitness and cellular state can have profound implications for the way we test hypotheses that are directly related to health, social or economic benefits. This award will support the training of multiple undergraduate and graduate students in computational modeling and high-performance simulations of biological systems through undergraduate courses, iGEM teams and other initiatives.
This project will support the generation of knowledge from the largest normalized omics compendia for the most widely used microbe, which will be a boon for the development and training of the next generation of data-driven predictive methods in molecular and cellular biology. It will provide the computational resources to evaluate a state-of-the-art multi-scale model with the capacity to predict phenotypic characteristics and environmental conditions from collective omics data. This will be the first systems-level simulator that targets a specific microbe (E. coli) and will be able to simulate populations of cells with a resolution ranging from individual gene concentrations to population dynamics. To achieve this, process migration, load balancing and strong scaling techniques have to be adopted and applied in this context, all of which are novel for the area of whole-cell modeling. The proposed HPC simulations will be closely tied to hypothesis generation and testing. The simulations will address questions about what the phenotypes and expression profiles of microbial cultures are in complex environments. In the context of systems biology, integration of these techniques has the potential to be transformative, but only if the necessary computational infrastructure is available to handle these tasks. The Blue Waters supercomputer, with its unique architecture, large-scale simulation capabilities and professional support staff, provides the ideal platform to achieve this ambitious goal.