Impacts of Silent Data Corruptions on HPC Application Runtimes

Jon Calhoun, Ohio Supercomputer Center

Usage Details

Steven Gordon, Jon Calhoun

Calhoun researches fault tolerance in high-performance computing (HPC) systems, focusing on silent data corruptions and their impact on HPC applications and runtimes. To study them, he has built an LLVM-based fault injection framework that simulates silent data corruptions. He is currently investigating how silent data corruption affects the algebraic multigrid (AMG) linear solver and developing low-cost detection and recovery schemes.
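To make the idea of a simulated silent data corruption concrete, here is a minimal sketch that flips a single bit in a double-precision value. It is not Calhoun's LLVM framework, which the summary above does not describe in detail. The function name `flip_bit` and the single-bit-flip fault model are assumptions chosen for illustration only.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE 754 double encoding flipped.

    Hypothetical helper for illustration: bit 0 is the least significant
    mantissa bit, bits 52-62 are the exponent, and bit 63 is the sign.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in the range 0..63")
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    # XOR with a one-bit mask flips exactly that bit.
    (corrupted,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return corrupted


if __name__ == "__main__":
    x = 1.0
    for bit in (0, 52, 62, 63):
        print(f"bit {bit:2d}: {x!r} -> {flip_bit(x, bit)!r}")
```

The effect depends heavily on which bit is hit: a flip in a low mantissa bit changes the value by a tiny amount that may go unnoticed, while a flip in a high exponent bit can turn an ordinary value into infinity. That spread is one reason such corruptions can be silent.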



http://web.engr.illinois.edu/~jccalho2