Blue Waters User Portal

Presenter: John E. Stone, UIUC
CUDA Slides

Introduction to CUDA programming model, key abstractions and terminology
CUDA thread model, differences w/ other programming systems
CUDA resource management intro (malloc/free/memcpy etc)
Mapping parallelism to grids/blocks/warps/threads, indexing work by thread IDs
Anatomy of basic CUDA kernels: comparison with serial code and loop nests, working through simple examples
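
As an illustration of the topics above (resource management with malloc/free/memcpy, indexing work by thread IDs, and comparing a kernel with its serial loop), here is a minimal sketch. It is not taken from the slides; the SAXPY operation and the names `saxpy_serial`, `saxpy_kernel`, and `saxpy_mismatches` are assumptions chosen for this example.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Serial reference: one loop iteration per element.
void saxpy_serial(int n, float a, const float* x, float* y) {
    for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
}

// CUDA version: the loop body becomes the kernel, and the loop index
// becomes a global thread index built from the block and thread IDs.
__global__ void saxpy_kernel(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];  // guard: the last block may be partial
}

// Hypothetical helper: runs both versions on n > 0 elements and returns
// the number of mismatching elements, or -1 if a CUDA call failed.
int saxpy_mismatches(int n) {
    std::vector<float> x(n), y(n), ref(n);
    for (int i = 0; i < n; ++i) {
        x[i] = static_cast<float>(i % 100);  // small integers keep results exact
        y[i] = 1.0f;
        ref[i] = 1.0f;
    }
    const float a = 2.0f;
    saxpy_serial(n, a, x.data(), ref.data());

    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *dx = nullptr, *dy = nullptr;
    cudaError_t err = cudaMalloc(&dx, bytes);            // device allocation
    if (err == cudaSuccess) err = cudaMalloc(&dy, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dx, x.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dy, y.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threads = 256;                         // threads per block
        const int blocks = (n + threads - 1) / threads;  // round up to cover n
        saxpy_kernel<<<blocks, threads>>>(n, a, dx, dy);
        err = cudaGetLastError();                        // catches launch errors
    }
    // This blocking copy also waits for the kernel to finish.
    if (err == cudaSuccess) err = cudaMemcpy(y.data(), dy, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dx);  // cudaFree(nullptr) is a no-op
    cudaFree(dy);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (y[i] != ref[i]) ++bad;
    return bad;
}
```

Calling `saxpy_mismatches(1000)` from `main` and checking for a return value of 0 exercises the whole path: allocation, host-to-device copies, launch, copy back, and cleanup. The block size of 256 is an arbitrary but common starting point, not a tuned value.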

Presenter: John E. Stone, UIUC

Execution of grids/blocks/warps/threads, divergence, etc.
Memory-bandwidth-bound kernels vs. arithmetic-bound kernels, concepts and strategies
Memory systems, performance traits and requirements, optimizations
- Global memory, coalescing, SOA vs. AOS, broadcasts of reads to multiple threads, use of vector intrinsic types for higher bandwidth
- Shared memory, bank conflicts, use for AOS to SOA conversion
- Collective operations and synchronization basics, use of shared memory
- Other memory systems: constant cache, 1D/2D/3D textures, host-mapped memory over PCIe/NVLink, Peer-to-Peer memory accesses and the like
- Atomic operations
Quick overview of GPU occupancy, register usage, launch configurations, and other kernel tuning concepts
Exciting new features in CUDA 10.x and beyond
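
Several of the memory-system topics listed above (coalesced global loads, shared memory, block-level synchronization, and atomic operations) can be seen together in a small sum reduction. The following is a minimal sketch rather than material from the session; the names `kThreads`, `block_sum_kernel`, and `gpu_sum` are hypothetical, and integer data is an assumption made so that the result is exact regardless of summation order.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kThreads = 256;  // block size; must be a power of two for the tree below

// Each block sums up to kThreads elements in shared memory, then one thread
// per block adds the block's partial sum to the global total atomically.
// Must be launched with blockDim.x == kThreads.
__global__ void block_sum_kernel(const int* in, int n, int* total) {
    __shared__ int partial[kThreads];
    const int tid = threadIdx.x;
    const int i = blockIdx.x * blockDim.x + tid;

    // Consecutive threads read consecutive addresses, so the load coalesces.
    partial[tid] = (i < n) ? in[i] : 0;
    __syncthreads();  // all loads must land before any thread reads a neighbor

    // Tree reduction with sequential addressing: active threads touch
    // consecutive shared-memory words, which avoids bank conflicts.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) partial[tid] += partial[tid + stride];
        __syncthreads();  // outside the branch so every thread reaches it
    }

    if (tid == 0) atomicAdd(total, partial[0]);  // one atomic per block
}

// Hypothetical helper: sums `host` on the GPU, stores the sum in *result,
// and returns false if any CUDA call failed.
bool gpu_sum(const std::vector<int>& host, int* result) {
    const int n = static_cast<int>(host.size());
    if (n == 0) { *result = 0; return true; }

    int *d_in = nullptr, *d_total = nullptr;
    cudaError_t err = cudaMalloc(&d_in, n * sizeof(int));
    if (err == cudaSuccess) err = cudaMalloc(&d_total, sizeof(int));
    if (err == cudaSuccess) err = cudaMemcpy(d_in, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemset(d_total, 0, sizeof(int));
    if (err == cudaSuccess) {
        const int blocks = (n + kThreads - 1) / kThreads;
        block_sum_kernel<<<blocks, kThreads>>>(d_in, n, d_total);
        err = cudaGetLastError();
    }
    // Blocking copy; it also waits for the kernel to complete.
    if (err == cudaSuccess) err = cudaMemcpy(result, d_total, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);  // cudaFree(nullptr) is a no-op
    cudaFree(d_total);
    return err == cudaSuccess;
}
```

Issuing a single `atomicAdd` per block instead of one per element keeps contention on `total` low. That design choice is the usual reason to combine a shared-memory reduction with atomics rather than relying on atomics alone.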

Presenter: Woo-Sun Yang, NERSC
Slides

Presenter: Dmitry Liakh, OLCF

In this course we will develop a reduced and simplified version of the CUDA BLAS library by implementing CUDA kernels for a few frequently used BLAS functions. We will start from a basic, unoptimized kernel implementation, gradually introduce optimizations to improve its efficiency, and compare our implementation with the state-of-the-art cuBLAS reference library.
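
To give a sense of what a basic, unoptimized starting point for such a kernel might look like, here is a minimal sketch of a naive single-precision matrix multiply. The abstract does not say which BLAS functions the session covers, so the choice of GEMM is an assumption, as are the names `sgemm_naive`, `sgemm_ref`, and `sgemm_mismatches` and the column-major layout with leading dimensions equal to the row counts. This is not the course's code.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Naive SGEMM: C = alpha * A * B + beta * C, with A (m x k), B (k x n) and
// C (m x n) stored column-major. One thread computes one element of C.
__global__ void sgemm_naive(int m, int n, int k, float alpha,
                            const float* A, const float* B,
                            float beta, float* C) {
    // threadIdx.x varies fastest, so mapping it to rows makes neighboring
    // threads touch neighboring addresses in column-major A and C.
    const int row = blockIdx.x * blockDim.x + threadIdx.x;
    const int col = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= m || col >= n) return;

    float acc = 0.0f;
    for (int p = 0; p < k; ++p)
        acc += A[row + p * m] * B[p + col * k];
    C[row + col * m] = alpha * acc + beta * C[row + col * m];
}

// CPU reference with the same layout, used only for checking.
void sgemm_ref(int m, int n, int k, float alpha, const float* A,
               const float* B, float beta, float* C) {
    for (int col = 0; col < n; ++col)
        for (int row = 0; row < m; ++row) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[row + p * m] * B[p + col * k];
            C[row + col * m] = alpha * acc + beta * C[row + col * m];
        }
}

// Hypothetical helper: returns the number of elements where the GPU and CPU
// results differ, or -1 if a CUDA call failed. Expects m, n, k > 0.
int sgemm_mismatches(int m, int n, int k) {
    std::vector<float> A(m * k), B(k * n), C(m * n, 1.0f), ref(m * n, 1.0f);
    // Small integer values keep every intermediate exactly representable.
    for (int i = 0; i < m * k; ++i) A[i] = static_cast<float>(i % 5);
    for (int i = 0; i < k * n; ++i) B[i] = static_cast<float>(i % 3);
    const float alpha = 2.0f, beta = 1.0f;
    sgemm_ref(m, n, k, alpha, A.data(), B.data(), beta, ref.data());

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaError_t err = cudaMalloc(&dA, A.size() * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc(&dB, B.size() * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc(&dC, C.size() * sizeof(float));
    if (err == cudaSuccess) err = cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dC, C.data(), C.size() * sizeof(float), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const dim3 block(16, 16);
        const dim3 grid((m + block.x - 1) / block.x, (n + block.y - 1) / block.y);
        sgemm_naive<<<grid, block>>>(m, n, k, alpha, dA, dB, beta, dC);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(C.data(), dC, C.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);  // cudaFree(nullptr) is a no-op
    cudaFree(dB);
    cudaFree(dC);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < m * n; ++i)
        if (C[i] != ref[i]) ++bad;
    return bad;
}
```

A call such as `sgemm_mismatches(37, 23, 19)` checks sizes that are not multiples of the 16 x 16 block, which exercises the bounds guard. Each thread here re-reads its row of A and column of B from global memory, so tiling through shared memory would be a natural first optimization, and timing against `cublasSgemm` a natural way to measure progress.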

Petascale Computing Institute

Wednesday

CUDA. Part 1/3

CUDA. Part 2/3 (CUDA and OpenACC)

Resources at NERSC

CUDA. Part 3/3 (Hands-on session)

Abstract
