2018 Blue Waters Symposium: Tutorials
Blue Waters experts will present tips and tools to help researchers improve their results using the Blue Waters supercomputer.
Machine Learning with Python: Distributed Training and Data Resources on Blue Waters
Aaron Saxton, NCSA
Blue Waters currently supports TensorFlow 1.3 and PyTorch 0.3.0, and we hope to support CNTK and Horovod in the near future. This tutorial will go over the minimum ingredients needed to do distributed training on Blue Waters with these packages. We also maintain an ImageNet data set to help researchers get started training CNN models. I will review the process by which a user can get access to this data set.
Watch the presentation | View the presentation slides (PDF)
File management with the Globus Online Python interface
Kyle Chard, University of Chicago
Globus provides various research data management capabilities to the research community. These capabilities include data transfer and sharing, identity and authorization, data publication, and search. All Globus capabilities are accessible via web interfaces as well as REST APIs. In this tutorial, we will show participants how to use the Globus Python SDK to manage, transfer, and share data.
Watch the presentation | View the presentation slides (PDF)
Profiling your code: roofline analysis with CrayPat
JaeHyuk Kwack, NCSA
The roofline analysis model is a visually intuitive performance model used to understand hardware performance limitations as well as the potential benefits of optimizations for science and engineering applications. Intel Advisor has provided a useful roofline analysis feature since version 2017 Update 2, but it is not widely compatible with other compilers and chip architectures. As an alternative, the Blue Waters Science and Engineering Applications Support (SEAS) group has developed a set of CrayPat-based roofline analysis scripts, the Generalized Roofline Analysis Gadget (GREG). Using CrayPat and GREG, we will present several examples of how to perform roofline performance analysis of modern HPC applications on Blue Waters.
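For readers new to the roofline model, the bound it draws can be sketched in a few lines of Python. This is only an illustration of the general model, not the GREG scripts or any CrayPat output; the function names and the machine numbers (100 GFLOP/s peak, 50 GB/s bandwidth) are hypothetical and chosen for easy arithmetic.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Roofline bound on performance for a kernel.

    arithmetic_intensity: floating-point operations per byte moved (FLOP/byte).
    peak_gflops: peak compute rate of the machine (GFLOP/s).
    peak_bandwidth_gbs: peak memory bandwidth of the machine (GB/s).
    """
    if arithmetic_intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    # A kernel is limited either by compute or by memory traffic,
    # whichever ceiling is lower at its arithmetic intensity.
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


def ridge_point(peak_gflops, peak_bandwidth_gbs):
    """Arithmetic intensity at which the two ceilings meet (FLOP/byte)."""
    return peak_gflops / peak_bandwidth_gbs


# Hypothetical machine parameters, NOT Blue Waters measurements.
PEAK_GFLOPS = 100.0
PEAK_BANDWIDTH_GBS = 50.0

if __name__ == "__main__":
    ridge = ridge_point(PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
    for ai in (0.25, 1.0, 2.0, 8.0):
        bound = attainable_gflops(ai, PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
        regime = "memory-bound" if ai < ridge else "compute-bound"
        print(f"AI = {ai:5.2f} FLOP/byte -> at most {bound:6.1f} GFLOP/s ({regime})")
```

Kernels to the left of the ridge point are limited by memory bandwidth, so reducing data movement helps them more than adding vectorization; kernels to the right are limited by the compute peak.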
Watch the presentation | View the presentation slides (PDF)
Parallelizing your work with Python PARSL
Kyle Chard, University of Chicago
Parsl (Parallel Scripting Library) is a Python library for programming and executing data-oriented workflows in parallel. It is designed to be easy to use, allowing developers to construct scalable workflows composed of Python functions and external applications. Parsl scripts are location independent, so the same script can be executed on different clusters, clouds, grids, and other resources. In this tutorial we will introduce participants to Parsl, illustrate how to develop workflows, and demonstrate how these workflows can be executed locally and on Blue Waters from a Jupyter notebook.
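The idea of a workflow built from plain Python functions that run in parallel can be sketched with the standard library alone. The example below deliberately uses `concurrent.futures` as a stand-in and is not Parsl code; the functions `square`, `total`, and `run_workflow` are hypothetical names made up for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """An independent task: many of these can run at the same time."""
    return x * x


def total(values):
    """A dependent task: it needs the results of all earlier tasks."""
    return sum(values)


def run_workflow(inputs, max_workers=4):
    """Fan out one task per input, then combine the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # submit() returns a future immediately, so the tasks overlap.
        squared = [pool.submit(square, x) for x in inputs]
        # result() blocks until each upstream task has finished.
        combined = pool.submit(total, [f.result() for f in squared])
        return combined.result()


if __name__ == "__main__":
    print(run_workflow([1, 2, 3, 4]))
```

A workflow library adds what this sketch lacks, such as tracking dependencies between tasks automatically and dispatching work to remote resources rather than local threads.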
Watch the presentation | View the presentation slides (PDF)
How to use Jupyter Notebooks
Roland Haas, NCSA
Jupyter notebooks provide a web-based interface to Python, R, Julia and other languages. They allow code, code output, and documentation to be mixed in a single document, making it possible to create self-documenting workflows. Focusing on Python, I will show how to use Jupyter notebooks on Blue Waters to explore data, produce plots and analyze simulation output using numpy, matplotlib and, time permitting, yt. I will show how to use notebooks on login nodes and on compute nodes as well as, time permitting, how to use parallelism inside of Jupyter notebooks.
Watch the presentation | View the presentation slides (PDF)
Python Best Practices on Blue Waters
Colin Maclean, NCSA
Watch the presentation | View the presentation slides (PDF)
Containers: Shifter and Singularity
Maxim Belkin, NCSA
Container solutions are a great way to seamlessly execute code on a variety of platforms. Not only do they abstract away the software stack of the underlying operating system, they also enable reproducible computational research. In this mini-tutorial, I will review the process of working with Shifter and Singularity on Blue Waters.