Tuesday, June 27, 2017



Overview and Basic Features - Barbara Chapman, Stony Brook University
Synchronization and Tasking - Deepak Eachempati, Cray Inc.
Locality and Affinity - Helen He, NERSC
Target and SIMD features - Oscar Hernandez, Oak Ridge National Laboratory
Hybrid MPI and OpenMP - Alice Koniges, NERSC
Wrap-Up - Barbara Chapman, Stony Brook University


Introduction to OpenMP:
In this talk, Barbara Chapman from Stony Brook University gives an introduction to parallel programming with OpenMP. She explains how OpenMP programs are developed and executed, and how application developers can use its features to specify that multiple threads will execute a portion of a program written in Fortran, C, or C++. She also describes how threads collaborate to execute loop nests, and introduces some user-level runtime library routines and environment variables. The talk also discusses how data in an OpenMP program may be specified as shared or private, and the implications of that choice.
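As a minimal illustration of the loop-level features this talk introduces, the C sketch below splits a loop across a team of threads and scopes its data explicitly. The function names sum_of_squares and max_threads are hypothetical, chosen for illustration. The code assumes a compiler invoked with OpenMP enabled (for example gcc -fopenmp); without it, the pragmas are ignored and the code runs serially.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical example: sum of i*i for i = 1..n, computed by a team of threads.
   default(none) forces every variable's scope to be stated explicitly:
   - n is shared (all threads read the same value),
   - the loop index i is private automatically,
   - sq is declared inside the loop body, so each thread has its own copy,
   - sum gets a private copy per thread, combined at the end by reduction. */
long long sum_of_squares(int n)
{
    long long sum = 0;
    #pragma omp parallel for default(none) shared(n) reduction(+:sum)
    for (int i = 1; i <= n; i++) {
        long long sq = (long long)i * i;
        sum += sq;
    }
    return sum;
}

/* Hypothetical helper: upper bound on the team size for the next parallel
   region, which honours the OMP_NUM_THREADS environment variable. */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1; /* compiled without OpenMP: serial execution */
#endif
}
```

Running the program with, for example, OMP_NUM_THREADS=4 changes the team size without recompiling. The result stays the same because the reduction clause combines each thread's private partial sum. Declaring sum as shared without a reduction would instead create a data race.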
Synchronization and Tasks:
In this talk, Deepak Eachempati (Cray Inc.) presents an overview of OpenMP's support for synchronization, tasks, and cancellation. The first section, on synchronization, covers the flush, atomic, critical, ordered, and barrier constructs, and briefly covers OpenMP's lock API. The second section, on tasks, covers the task and taskloop constructs, as well as the constructs that support task synchronization: barrier, taskwait, and taskgroup. The talk ends with a brief discussion of OpenMP's cancellation support.
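To make the task and taskwait constructs concrete, here is a small sketch of the classic recursive Fibonacci example. This is an illustrative choice, not necessarily one used in the talk, and the names fib and parallel_fib are hypothetical. One thread creates the initial task tree inside a single construct. Each call spawns two child tasks and uses taskwait to wait for both before combining their results.

```c
/* Hypothetical example: recursive Fibonacci with OpenMP tasks.
   x and y are locals of this call; shared(x)/shared(y) let the child tasks
   write into them (task data defaults to firstprivate otherwise). */
static long fib(int n)
{
    long x, y;
    if (n < 2)
        return n;
    #pragma omp task shared(x) firstprivate(n)
    x = fib(n - 1);
    #pragma omp task shared(y) firstprivate(n)
    y = fib(n - 2);
    /* Wait for the two child tasks created above before using x and y. */
    #pragma omp taskwait
    return x + y;
}

long parallel_fib(int n)
{
    long result = 0;
    #pragma omp parallel
    {
        /* One thread builds the task tree; the rest of the team executes
           tasks. The implicit barrier at the end of single waits for it. */
        #pragma omp single
        result = fib(n);
    }
    return result;
}
```

Real codes usually add a cutoff so that small subproblems run serially, because creating a task for every call costs more than the work each one contains.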
Locality and Affinity:
In this talk, Helen He from NERSC/LBNL presents data locality and thread affinity in OpenMP, both of which are essential for achieving optimal performance. The first section covers memory locality concepts (first-touch memory placement, cache coherence, and false sharing) along with tips for improving cache locality. The next section covers process and thread affinity. It presents example compute node architectures and tools for obtaining node information, and introduces mechanisms for specifying OpenMP thread affinity through runtime environment variables, clauses, and runtime API routines. Process and thread affinity for nested OpenMP and for hybrid MPI/OpenMP programs are discussed at the end.
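The first-touch idea can be sketched as follows. The sketch assumes a first-touch page placement policy on a NUMA node, as used by Linux by default. Page placement is decided by the operating system, not by OpenMP itself. The array is initialized in parallel with the same static schedule as the loop that later uses it, so each thread tends to touch, and therefore place, the pages it will work on. The function names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical example of first-touch initialization.
   Assumes a first-touch OS policy: a page is placed in the memory of the
   NUMA domain whose thread first writes it. Pin threads (for example
   OMP_PROC_BIND=close OMP_PLACES=cores) so they do not migrate away from
   the memory they touched. */
double *alloc_first_touch(size_t n)
{
    double *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;
    /* Parallel initialization: each thread first touches its own chunk. */
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++)
        a[i] = 0.0;
    return a;
}

/* Same static schedule and iteration count as the initialization loop, so in
   practice each thread usually works on the chunk it placed. */
double scale_and_sum(double *a, size_t n, double s)
{
    double sum = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (size_t i = 0; i < n; i++) {
        a[i] = (a[i] + 1.0) * s;
        sum += a[i];
    }
    return sum;
}
```

Initializing the array serially, for example with a plain loop or calloc, would typically place all of its pages in one NUMA domain, and threads on other sockets would then read remote memory in the compute loop.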
Programming for Accelerators and SIMD Execution:
In this talk, Oscar Hernandez from ORNL gives an overview of the new OpenMP 4.5 target features for programming accelerators. He shows examples of how to manage data transfers between the host and the accelerator. He then explains the levels of parallelism available in OpenMP target regions and how to write performance-portable code. In the last section, he gives a brief introduction to the OpenMP SIMD directive and the clauses that matter most for achieving good performance.
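As a rough sketch of the OpenMP 4.5 target and SIMD directives, the SAXPY kernel below is written twice. SAXPY is a standard illustrative kernel, not necessarily one from the talk, and the function names are hypothetical. The first version is offloaded, with map clauses handling host/device data transfer. The second is a host loop annotated with simd. If no accelerator is available, or the compiler was built without offload support, the target region is expected to run on the host.

```c
/* Hypothetical example: y = a*x + y offloaded with OpenMP 4.5 target constructs.
   map(to: ...)     copies x to the device before the region,
   map(tofrom: ...) copies y to the device and back afterwards.
   teams distribute spreads iterations across teams; parallel for spreads
   each team's share across that team's threads. */
void saxpy_target(int n, float a, const float *x, float *y)
{
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Host version: simd asks the compiler to vectorize the loop, and restrict
   promises that x and y do not overlap. */
void saxpy_simd(int n, float a, const float *restrict x, float *restrict y)
{
    #pragma omp simd
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The combined target teams distribute parallel for construct is one common way to expose both levels of device parallelism in a single directive, which keeps the same source usable on different accelerators.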
Hybrid Parallel Programming with MPI and OpenMP:
In this talk, Alice Koniges from NERSC/LBNL discusses how MPI (the Message Passing Interface) and OpenMP may be used together to build applications for compute clusters. Combining MPI with OpenMP directives is critical to using OpenMP effectively in high-performance computing. She describes several ways to use this hybrid programming model, as well as the “contract” an application must honor with both programming models to obtain correct and efficient implementations.

Scaling and Profiling

Presenter: Stephen Leak, NERSC User Engagement

Abstract: Measuring and understanding parallel scalability

In this session we'll explore factors that limit parallel scalability, and profiling tools that can help identify and characterize scaling bottlenecks in MPI and OpenMP codes. We'll use our findings to develop theoretical models of strong and weak scaling (Amdahl's and Gustafson's laws, respectively) and to estimate the parallel efficiency of an example. The session will have a large hands-on component, running prepared jobs on one of NERSC's Cray supercomputers, and will end with an Open Lab in which attendees can apply these techniques to their own applications.
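For reference, the two scaling models named above are usually written as follows, with N the number of processors or threads and S the speedup. In Amdahl's law, p is the parallelizable fraction of the serial run time, and the total problem size is fixed. In Gustafson's law, p is the parallel fraction of the run time measured on N processors, and the problem grows with N. These are the standard textbook forms, and the session may use different notation. The values p = 0.95 and N = 32 in the last two lines are assumed for illustration and are not measurements from the session.

```latex
\begin{align*}
S_{\mathrm{Amdahl}}(N) &= \frac{1}{(1-p) + p/N}\\
S_{\mathrm{Gustafson}}(N) &= (1-p) + pN\\
E(N) &= \frac{S(N)}{N}\\[4pt]
% illustrative values only (assumed): p = 0.95, N = 32
S_{\mathrm{Amdahl}}(32) &= \frac{1}{0.05 + 0.95/32} \approx 12.5, \qquad E \approx 0.39\\
S_{\mathrm{Gustafson}}(32) &= 0.05 + 0.95 \times 32 = 30.45, \qquad E \approx 0.95
\end{align*}
```

With the same p, strong-scaling speedup can never exceed 1/(1-p) = 20, however many processors are added. Weak-scaling efficiency can stay high because the parallel part of the work grows with N.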