Petascale Application Improvement Discovery
Blue Waters is about more than just hardware. The Blue Waters project strives to create an ecosystem of resources, services, and activities that propel computational science and engineering toward the next decade of computing technology. One aspect of this is the Petascale Application Improvement Discovery (PAID) program.
PAID targets the introduction of new, fundamental application approaches in addition to the optimization of current applications for Blue Waters, in order to increase knowledge and use of "best practices" for highly scalable computing and data analysis. PAID is based on input from and observations of dozens of Blue Waters science and engineering teams, the adjustments and improvements teams have been making to their applications, and the projected use and challenges over the next decade as derived from detailed interactions with many of the team leaders.
PAID provides funds for Improvement Method Enablers (IMEs) to work with science and engineering teams to help create and implement application improvement technologies.
PLEASE NOTE: If you do not receive a response from a Blue Waters PAID IME within one workday, please email: email@example.com.
Submit a Proposal to work with an Improvement Method Enabler
Improvement Method Enablers
Best Practice Identification, Dissemination, and Implementation
Team Leader: William Tang, Princeton University
Based on the significant experience and lessons learned in developing the modern GTC-Princeton (GTC-P) code, we plan to provide useful information to the Blue Waters science applications community by accumulating, creating, and applying "best practices" for developing portable and efficient science codes across diverse architectures. GTC-P is a highly scalable, discovery-science-capable 3D particle-in-cell code used for studying micro-turbulence transport in tokamaks. It has been successfully ported and optimized at full or near-full capability on a wide range of multi-petaflops platforms worldwide, including NSF's "Blue Waters" and "Stampede." An associated portability benefit comes from the fact that GTC-P does not critically depend on any third-party libraries. Strategies employed to optimize performance, maximize parallelism, and utilize accelerator technology include multiple levels of decomposition for increased scalability, data layouts chosen to maximize data reuse, use of GPU and Xeon Phi accelerators, and hybrid programming models (MPI + OpenMP + optionally CUDA). In particular, we will examine the trade-off between portability and speedup when targeting GPUs with CUDA versus compiler directives such as OpenACC and OpenMP 4.0.
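As an illustration of one building block behind the multiple levels of decomposition mentioned above, a balanced one-dimensional block decomposition can be sketched in Python. This is a generic, hypothetical example, not code taken from GTC-P itself:

```python
def block_decompose(n_cells, n_ranks, rank):
    """Return the [start, end) range of grid cells owned by `rank`
    when n_cells are split as evenly as possible across n_ranks.
    The first (n_cells % n_ranks) ranks each receive one extra cell,
    so no rank holds more than one cell beyond any other."""
    base, extra = divmod(n_cells, n_ranks)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end

if __name__ == "__main__":
    # 10 cells over 3 ranks: loads of 4, 3, and 3 cells
    for r in range(3):
        print(r, block_decompose(10, 3, r))
```

In a real particle-in-cell code such decompositions are applied at several levels at once (e.g., a domain decomposition across MPI ranks combined with a particle decomposition across threads), which is what "multiple levels of decomposition" refers to.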
Effective Use of Accelerators/Highly Parallel Heterogeneous Units
Team Leader: Wen-mei Hwu, University of Illinois at Urbana-Champaign
An increasing portion of the top supercomputers in the world, including Blue Waters, have heterogeneous CPU-GPU computational units. While a handful of the science teams can already use GPUs in their production applications, there is still significant room for growth. This program for enabling science teams to make effective use of GPUs consists of two major components. The first is to make full use of vendor and community compiler technology, now defined by the OpenACC, OpenCL, and C++ AMP standards; introduce accelerator-based library capabilities for the science teams' applications; and provide support and enhancement for GPU-enabled performance and analysis tools. This will significantly reduce the programming effort and enhance the code maintainability associated with using GPUs.
The second aspect is to provide expert support to the science teams through hands-on workshops and individualized collaboration programs. The goal of these efforts is not to develop new compiler technology but rather to help science teams take advantage of the most promising, mature, or experimental compiler-associated capabilities from Cray, NVIDIA, MulticoreWare, the University of Illinois, and other institutions, such as the Barcelona Supercomputing Center.
Furthermore, the activity will provide detailed feedback to OpenACC compiler providers to help them make the compilers produce more efficient code. In addition to the OpenACC-compliant, OpenMP-like compiler from Cray, the PGI Fortran compiler, and the Thrust C++ template library from NVIDIA, this project will provide and improve the C++ AMP compiler and the MxPA OpenCL compiler to lower the barrier to using the GPUs in the Cray system.
Research Data Management with Globus
Team Lead: Ian Foster, Argonne National Laboratory
Globus is software-as-a-service for research data management, used at dozens of institutions and national facilities for moving and sharing big data. Globus provides easy-to-use services and tools for research data management, enabling researchers to access advanced capabilities using just a Web browser. Globus transfer and sharing have been deployed on Blue Waters and provide access to the filesystem, including HPSS. Globus has already delivered improvements to file recall from tape, scaled transfer concurrency, and endpoint load balancing and availability for Blue Waters, and further enhancements to meet the unique needs of such a large-scale system are planned over the next two years. Recent additions to Globus are services for data publication and discovery that enable:
- publication of large research data sets with appropriate policies for all types of institutions and researchers;
- the ability to publish data using your own storage or cloud storage that you manage, without third-party publishers;
- extensible metadata that describe the specific attributes of your field of research;
- publication and curation workflows that can be easily tailored to meet institutional requirements;
- public and restricted collections that give you complete control over who may access your published data;
- a rich discovery model that allows others to search and use your published data.
Model-Based Code Refactoring and Autotuning
Team Leader: Mary Hall, University of Utah
The Department of Energy's SciDAC-3 Institute for Sustained Performance, Energy, and Resilience (SUPER) aims to ensure that computational scientists can successfully exploit the current and emerging generations of high-performance computing systems by providing application scientists with strategies and tools to productively maximize performance, and by working with application teams to use and implement those tools. Dr. Mary Hall leads the SUPER research effort that focuses on compiler-based approaches to obtaining high performance on state-of-the-art architectures, including multicore processors, GPUs, and petascale platforms. This group is developing autotuning compiler technology to systematically map application code to these diverse architectures and make efficient use of heterogeneous resources in both today's and future extreme-scale systems. SUPER will assist Blue Waters teams in tuning and refactoring their applications.
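To illustrate the general idea behind empirical autotuning, the following Python sketch searches a set of candidate block sizes for a blocked summation, verifies each variant against a reference result, and keeps the fastest. This is a simplified, hypothetical example of the technique, not SUPER's actual compiler technology:

```python
import time

def blocked_sum(data, block):
    """Sum `data` in chunks of `block` elements (a stand-in for a
    tunable code variant whose performance depends on a parameter)."""
    total = 0.0
    for i in range(0, len(data), block):
        total += sum(data[i:i + block])
    return total

def autotune(data, candidates):
    """Empirically pick the fastest block size among `candidates`,
    checking each variant's result against a trusted reference."""
    reference = sum(data)
    best_block, best_time = None, float("inf")
    for block in candidates:
        t0 = time.perf_counter()
        result = blocked_sum(data, block)
        elapsed = time.perf_counter() - t0
        # correctness check: a faster but wrong variant is useless
        assert abs(result - reference) <= 1e-9 * max(1.0, abs(reference))
        if elapsed < best_time:
            best_block, best_time = block, elapsed
    return best_block
```

Real autotuning frameworks apply the same pattern, generate variants, run them on representative inputs, check correctness, and select by measured performance, but over far richer search spaces (tiling, unrolling, data layout).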
Parallel I/O Performance
Team Lead: William Gropp, University of Illinois at Urbana-Champaign
Is I/O limiting your ability to do your science? A recent examination of I/O performance at several supercomputing centers showed that many applications were I/O limited; some teams were unaware that they could do much better. This project will extend existing tools to help identify sources of I/O performance problems and provide new tools to help improve I/O performance with minimal impact on applications. The project is looking for teams interested in (a) evaluating I/O performance and choice of I/O methods, (b) providing feedback on their preferred I/O workflow (how you wish it would work, not just how you've chosen to handle I/O in the pursuit of adequate performance), and (c) exploring alternative I/O approaches, including automatic performance tuning methods.
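One common source of poor I/O performance is issuing many tiny writes instead of a few large ones. The following hypothetical Python sketch (not one of the project's tools) shows the aggregation pattern: small records are staged in memory and flushed as large writes, producing a byte-identical file with far fewer system calls:

```python
def write_unbuffered(path, records):
    """Anti-pattern: one OS-level write per small record."""
    with open(path, "wb", buffering=0) as f:
        for rec in records:
            f.write(rec)

def write_aggregated(path, records, chunk=1 << 20):
    """Stage records in a memory buffer and flush it in large
    (~1 MiB by default) writes, reducing the number of I/O calls."""
    buf = bytearray()
    with open(path, "wb", buffering=0) as f:
        for rec in records:
            buf += rec
            if len(buf) >= chunk:
                f.write(buf)
                buf.clear()
        if buf:  # flush the final partial chunk
            f.write(buf)
```

On parallel filesystems the same idea appears as collective buffering in MPI-IO, where small per-process writes are aggregated into large, well-aligned requests.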
Scalability and Load Balancing
Team Lead: Sanjay Kale, University of Illinois at Urbana-Champaign
Team Member(s): Nikhil Jain
Load balancing can improve the performance of many parallel applications. Irregularity in a problem causes different processors to finish their workloads at different times, leading to idle time waiting for laggards to complete. Load balancing partitions problems in an intelligent way, ideally assigning an equivalent amount of work to every processor. For complicated problems, the load can vary dynamically as a program progresses, for example if cells migrate or a wave propagates, changing location in the problem domain. For these cases, balancing load requires introspectively monitoring the program as it runs to determine how to optimally move and balance work. Additionally, work must be balanced across processing elements of varying performance and characteristics, such as between CPUs and GPUs.
We plan to create a generic load balancing library based on the load balancers of Charm++. This library will provide load balancing decisions to applications, given information on object layout, current load, and communication pattern. We will analyze how to balance load across heterogeneous systems and develop new strategies for heterogeneous load balancing. We are also open to consultation with application teams regarding load balancing and will provide advice, suggestions, and guidance on what to balance and how.
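As a minimal illustration of static load balancing (a much simpler scheme than Charm++'s adaptive, measurement-based balancers), the classic longest-processing-time-first greedy heuristic can be sketched in Python:

```python
import heapq

def lpt_assign(task_costs, n_procs):
    """Longest-processing-time-first greedy assignment: sort tasks by
    decreasing cost and always give the next task to the currently
    least-loaded processor. Returns {task_index: processor_index}."""
    # min-heap of (current load, processor id)
    heap = [(0.0, p) for p in range(n_procs)]
    heapq.heapify(heap)
    assignment = {}
    for tid, cost in sorted(enumerate(task_costs),
                            key=lambda item: -item[1]):
        load, proc = heapq.heappop(heap)
        assignment[tid] = proc
        heapq.heappush(heap, (load + cost, proc))
    return assignment
```

Runtime systems such as Charm++ go further by measuring loads while the program runs and migrating work, which handles the dynamically changing loads described above; the greedy step itself, however, is a core ingredient of many of their strategies.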
Science Team Support for HDF5 on Blue Waters
Team Lead: Gerd Heber, The HDF Group
The HDF Group has a long and mutually beneficial history of working with NCSA and the University of Illinois to improve the I/O performance of large-scale applications. To optimize the use of HDF software on the Blue Waters system, the Blue Waters project selected The HDF Group to:
- Make assessments and improvements for serial and parallel versions of HDF5.
- Provide expedited resolution of high priority HDF5 defects and performance requirements.
- Provide active engagement by dedicated HDF5 experts with science teams as they enhance their applications to improve I/O performance.
- Perform an assessment of an application's needs, consisting of an audit of an application's current I/O use, planning and assessments of potential improvements, and recommendations for further improvements.
- Support development efforts, driven by the science teams' specific application needs, to improve the performance of HDF software, ranging from simple but extremely useful advice (such as better organization of HDF5 files to improve I/O) to extensive projects involving teams of developers.
- Research and implement trace-based I/O autotuning support with science teams, as appropriate for their needs.
Further information about the project, including information about using HDF5 on Blue Waters, science team support, and science team highlights using HDF5, can be found at: https://ncsa-bw.atlassian.net/wiki/display/HDFBW
Please sign in to the portal in order to participate in the HDF survey; the survey URL is visible only to signed-in users. All Blue Waters science and engineering teams are eligible to participate.