2020
Saurabh Jha, Shengkun Cui, Subho Sankar Banerjee, Tianyin Xu, Jeremy Enos, Michael Showerman, Zbigniew T. Kalbarczyk, and Ravishankar K. Iyer (2020):
Live Forensics for HPC Systems: A Case Study on Distributed Storage Systems, Institute of Electrical & Electronics Engineers Press, SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp65, Atlanta, Georgia, U.S.A.
2018
Saurabh Jha, Valerio Formicola, Catello Di Martino, Mark Dalton, William T. Kramer, Zbigniew Kalbarczyk, and Ravishankar K. Iyer (2018):
Resiliency of HPC Interconnects: A Case Study of Interconnect Failures and Recovery in Blue Waters, IEEE Transactions on Dependable and Secure Computing, Institute of Electrical and Electronics Engineers, Vol 15, Num 6, pp915-930
2017
Valerio Formicola, Saurabh Jha, Daniel Chen, Fei Deng, Amanda Bonnie, Mike Mason, Jim Brandt, Ann Gentile, Larry Kaplan, Jason Repik, Jeremy Enos, Mike Showerman, Annette Greiner, Zbigniew Kalbarczyk, Ravishankar K. Iyer, and William Kramer (2017):
Understanding Fault Scenarios and Impacts through Fault Injection Experiments in Cielo, presented at CUG 2017, Redmond, Washington, U.S.A.
Saurabh Jha, Jim Brandt, Ann Gentile, Zbigniew Kalbarczyk, Greg Bauer, Jeremy Enos, Michael Showerman, Larry Kaplan, Brett Bode, Annette Greiner, Amanda Bonnie, Mike Mason, William Kramer, and Ravishankar Iyer (2017):
Holistic Measurement Driven System Assessment, Institute of Electrical & Electronics Engineers, 2017 IEEE International Conference on Cluster Computing (CLUSTER), pp797-800, Honolulu, Hawai'i, U.S.A.
2016
Phuong Cao, Eric C. Badger, Zbigniew T. Kalbarczyk, and Ravishankar K. Iyer (2016):
A Framework for Generation, Replay, and Analysis of Real-World Attack Variants, Association for Computing Machinery, Proceedings of the Symposium and Bootcamp on the Science of Security (HotSoS '16), pp28-37, Pittsburgh, Pennsylvania, U.S.A.
Subho S. Banerjee, Arjun P. Athreya, Liudmila S. Mainzer, C. Victor Jongeneel, Wen-Mei Hwu, Zbigniew T. Kalbarczyk, and Ravishankar K. Iyer (2016):
Efficient and Scalable Workflows for Genomic Analyses, Association for Computing Machinery, Proceedings of the ACM International Workshop on Data-Intensive Distributed Computing (DIDC '16), pp27-36, Kyoto, Japan
2015
Catello Di Martino, William Kramer, Zbigniew Kalbarczyk, and Ravishankar Iyer (2015):
Measuring and Understanding Extreme-Scale Application Resilience: A Field Study of 5,000,000 HPC Application Runs, Institute of Electrical & Electronics Engineers, 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pp25-36, Rio de Janeiro, Brazil
Catello Di Martino, Saurabh Jha, William Kramer, Zbigniew Kalbarczyk, and Ravishankar K. Iyer (2015):
LogDiver: A Tool for Measuring Resilience of Extreme-Scale Systems and Applications, Association for Computing Machinery, Proceedings of the 5th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS '15), pp11-18, Portland, Oregon, U.S.A.
2014
Catello Di Martino, Zbigniew Kalbarczyk, Ravishankar K. Iyer, Fabio Baccanico, Joseph Fullop, and William Kramer (2014):
Lessons Learned from the Analysis of System Failures at Petascale: The Case of Blue Waters, IEEE, 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pp610-621, Atlanta, Georgia, U.S.A.
2019
Ravishankar Iyer, Saurabh Jha, Shengkun Cui, Tianyin Xu, Jeremy Enos, Mike Showerman, Greg Bauer, Mark Dalton, Zbigniew Kalbarczyk, and Bill Kramer (2019):
Kaleidoscope: Live Forensics for Large-Scale Data Center Storage Systems, 2019 Blue Waters Annual Report, pp220-221
26th IEEE Annual Symposium on High-Performance Interconnects (HOTI), Santa Clara, California, U.S.A., Aug 16, 2019
Workshop on Monitoring and Analysis for High Performance Computing Systems Plus Applications (HPCMASPA), held during the IEEE Cluster Conference; Belfast, Northern Ireland, United Kingdom, Sep 10, 2018
Mar 7, 2017
Twenty-six research teams at the University of Illinois at Urbana-Champaign have been allocated computation time on the National Center for Supercomputing Applications' (NCSA) sustained-petascale Blue Waters supercomputer after applying in Fall 2016. The allocations range from 25,000 to 600,000 node-hours of compute time over either six months or one year. The teams' research is diverse, ranging from physics to political science.
Sources: