Sequence Similarity Networks for the Protein "Universe"
John Gerlt, University of Illinois at Urbana-Champaign
Usage Details
Matthew Jacobson, C. Victor Jongeneel, John Gerlt, David Slater, Daniel Davidson, Boris Sadkhin, Ken Yokoyama
The Enzyme Function Initiative (EFI), a Large-Scale Collaborative Project supported by the National Institute of General Medical Sciences (U54GM093342), is devising sequence- and structure-based strategies to predict the functions of unknown (uncharacterized) enzymes discovered in genome sequencing projects. To accomplish this goal, the EFI is developing bioinformatic tools for dissemination to the scientific community. These include sequence similarity networks that provide large-scale descriptions of sequence-function “space” in protein families and superfamilies (“galaxies” and “clusters of galaxies”) in the complete set of protein sequences (“universe”). Because the number of sequences is increasing rapidly (now >43,000,000), the all-by-all BLAST comparisons of the sequence database needed to construct the networks require significant computational resources. This project is expected to allow the frequent (at least quarterly) calculation of sequence similarity networks for dissemination to the scientific community.
http://www.life.illinois.edu/gerltlab/research.html#efi