Slides and Materials
Abstract: Modern multi-core CPUs provide a large number of compute cores per compute node in a shared-memory setup. Efficient use of shared-memory parallelism can both improve resource use on a compute node and relieve the communication pressure caused by many independent MPI ranks sending or receiving data. This section will cover hybrid OpenMP+MPI programming to improve the scalability of codes and will provide pointers to the dos and don'ts of hybrid programming. Level of material: Introductory.
Abstract: The OpenACC presentation will focus on practical aspects of the OpenACC specification and how to implement a GPU code with directives. Topics such as data movement, GPU routines with the parallel loop and kernels directives, asynchronicity, and more advanced features will be covered. However, the focus will be on the practical process of moving from a typical CPU-based code toward a refactored code that runs efficiently on GPUs. This will include managing CPU threading, exposing threading, efficient test-driven development, portability concerns, and debugging. A small OpenACC code will be used for hands-on training, and a larger code will be made available for those who want to see OpenACC in a slightly more realistic application.