Wednesday, August 21, 2019
The OpenACC presentation will focus on practical aspects of the OpenACC specification and how to implement a GPU code with directives. Aspects such as data movement, GPU routines with parallel loop and kernels directives, asynchronicity, and more advanced topics will be covered. However, the focus will be on the practical process of moving from a typical CPU-based code toward a refactored code that runs efficiently on GPUs. This will include aspects such as managing CPU threading, exposing threading, efficient test-driven development, portability concerns, and debugging. A small OpenACC code will be used for hands-on training, and a larger code will be made available for those who wish to see OpenACC in a slightly more realistic application.
The CUDA session will introduce the principal abstractions in the CUDA programming model that allow programmers to harness the throughput-oriented computing hardware of GPUs to process hundreds of thousands of data-parallel work items concurrently with high performance. The session will describe how to use CUDA to manage GPU memory and computing resources, execute work on the GPU, and transfer input data and results between the host and the GPU. The session will emphasize the use of profiling and software analysis tools to inform software refactoring and GPU algorithm design decisions.
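The basic workflow the session covers — allocate device memory, copy input data to the GPU, launch a kernel, and copy results back — can be sketched as follows (a minimal illustration with a hypothetical `scale` kernel, not the session's own code):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Kernel: each thread handles one data-parallel work item.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *h_x = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h_x[i] = 1.0f;

    // Manage GPU memory: allocate, then copy input host -> device.
    float *d_x;
    cudaMalloc(&d_x, bytes);
    cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);

    // Execute work on the GPU: enough 256-thread blocks to cover n items.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d_x, 2.0f, n);

    // Copy results device -> host and release GPU resources.
    cudaMemcpy(h_x, d_x, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x);

    printf("x[0] = %f\n", h_x[0]);  /* expect 2.0 */
    free(h_x);
    return 0;
}
```

Building requires the CUDA toolkit (`nvcc scale.cu`) and an NVIDIA GPU; the same allocate/copy/launch/copy-back pattern underlies essentially every CUDA program.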
Presenter: Dmitry Liakh, OLCF
In this course we will develop a reduced and simplified version of the CUDA BLAS (cuBLAS) library by implementing CUDA kernels for a few frequently used BLAS functions. We will start from a base, unoptimized kernel implementation, gradually introduce optimizations to improve efficiency, and compare our implementation against the state-of-the-art cuBLAS reference library.
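To give a flavor of the base-then-optimize progression described above (these kernels are a sketch, not the course's actual code), single-precision matrix multiply is a typical first target: a naive one-thread-per-element kernel, then a shared-memory tiled version that reduces global-memory traffic. For brevity the tiled kernel assumes square matrices with `n` a multiple of the tile size:

```cuda
#include <cuda_runtime.h>

#define TILE 16

// Base version: one thread per C element, reading A and B straight
// from global memory. Correct but heavily bandwidth-bound.
__global__ void sgemm_naive(const float *A, const float *B, float *C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float acc = 0.0f;
        for (int k = 0; k < n; ++k)
            acc += A[row * n + k] * B[k * n + col];
        C[row * n + col] = acc;
    }
}

// Optimized version: each block cooperatively stages TILE x TILE
// sub-blocks of A and B in shared memory, so every global load is
// reused TILE times. Assumes n is a multiple of TILE.
__global__ void sgemm_tiled(const float *A, const float *B, float *C, int n) {
    __shared__ float As[TILE][TILE], Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < n / TILE; ++t) {
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();  // tile fully loaded before use
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // done with tile before overwriting it
    }
    C[row * n + col] = acc;
}
```

Both kernels launch with `dim3 block(TILE, TILE)` and `dim3 grid(n/TILE, n/TILE)`; further steps in the same spirit (register blocking, vectorized loads, double buffering) are what progressively close the gap to cuBLAS.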