CUDA is a parallel computing platform and programming model from Nvidia. The Nvidia Kepler K20X accelerators in the XK nodes support CUDA compute capability 3.5.
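A program can confirm that it is running on a K20X with compute capability 3.5 by asking the CUDA runtime for the device properties. The following is a minimal sketch that uses cudaGetDevice and cudaGetDeviceProperties. The helper name queryComputeCapability is hypothetical, and the code assumes the cudatoolkit module (described below) is loaded so that nvcc and the CUDA runtime are available.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: reports the compute capability (major.minor) of the
// currently selected CUDA device. On a Blue Waters XK node (K20X) this is 3.5.
// Returns 0 on success, -1 if no device is visible or the query fails.
int queryComputeCapability(int *major, int *minor)
{
    int device = 0;
    cudaDeviceProp prop;

    if (cudaGetDevice(&device) != cudaSuccess) return -1;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return -1;

    *major = prop.major;   // expected to be 3 on the K20X
    *minor = prop.minor;   // expected to be 5 on the K20X
    return 0;
}
```

A job could call this once at startup and print a warning if the reported value is not 3.5, which would suggest the job was not placed on an XK node.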

How to use CUDA

To use CUDA tools on Blue Waters, load the cudatoolkit module:

module load cudatoolkit

CUDA C code may be compiled with nvcc. Loading the cudatoolkit module will add nvcc to your PATH.
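As an illustration, here is a minimal CUDA C sketch that adds two vectors on the GPU. The names vecAddKernel and vecAddOnGpu are hypothetical; a host program would call vecAddOnGpu with three host arrays of length n. With the cudatoolkit module loaded, a file containing this code could be compiled for the K20X with nvcc -arch=sm_35 -c vecadd.cu (the file name is only an example).

```cuda
#include <cuda_runtime.h>

// Kernel: each thread adds one pair of elements, c[i] = a[i] + b[i].
__global__ void vecAddKernel(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {            // guard: the last block may be only partly used
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host wrapper: copies a and b to the GPU, launches the kernel,
// and copies the result into c. Returns 0 on success, -1 on any CUDA error.
int vecAddOnGpu(const float *a, const float *b, float *c, int n)
{
    if (n <= 0) return 0;   // nothing to do; also avoids a zero-block launch

    float *d_a = NULL, *d_b = NULL, *d_c = NULL;
    size_t bytes = (size_t)n * sizeof(float);

    cudaError_t err = cudaMalloc((void **)&d_a, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_b, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_c, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threadsPerBlock = 256;
        const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
        vecAddKernel<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c, n);
        err = cudaGetLastError();   // reports launch-configuration errors
    }
    // cudaMemcpy back to the host waits for the kernel to finish first.
    if (err == cudaSuccess) err = cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);   // cudaFree(NULL) is a no-op, so partial failures are safe
    cudaFree(d_b);
    cudaFree(d_c);
    return err == cudaSuccess ? 0 : -1;
}
```

The launch rounds the block count up, so the bounds check in the kernel is what keeps threads in the last, partially filled block from writing past the end of the arrays.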

Cray provides environment variables that help Makefiles and configure scripts build CUDA code with MPI under the Cray or PGI programming environment. Use them if you need to set include or library paths manually for your build.

To list the environment variables that the cudatoolkit module defines, run:

module show cudatoolkit


The PGI programming environment supports CUDA Fortran. To build CUDA Fortran code, swap in the PGI programming environment and load the cudatoolkit module:

module swap PrgEnv-cray PrgEnv-pgi
module load cudatoolkit

To build the sample code:

ftn -o matmul.x matmul.CUF

The test can then be run with the following PBS script:

#!/bin/bash
#PBS -l nodes=1:ppn=16:xk
#PBS -l walltime=0:05:00
cd $PBS_O_WORKDIR
aprun -n 1 ./matmul.x > job.out

CUDA with CMake

CMake may not configure CUDA correctly on Blue Waters because it does not select the right host compiler for nvcc. To fix this, add -DCUDA_HOST_COMPILER=$(which CC) to your CMake options.


Cray's libsci_acc library provides BLAS, LAPACK, and ScaLAPACK routines that can improve performance by generating and running automatically tuned accelerator kernels on the XK nodes when appropriate. To use it with PrgEnv-cray or PrgEnv-gnu, load the following module:

module load craype-accel-nvidia35 # <-- automatically includes libsci_acc

aprun -cc none -n <numranks> ...   # Cray recommends allowing threads to migrate within a node when using libsci_acc

See also: man intro_libsci_acc


Nvidia's Eclipse-based Nsight integrated development environment (IDE) is installed and available as:


Caveat: because Nsight is Eclipse-based, its GUI has many widgets and performs best when you are close to our LAN. If your ping time to the system exceeds 50 ms, you may find the tool difficult to use. In that case, consider installing CUDA locally and running Nsight on your own machine for kernel development.

Example code

The following source code is from Nvidia's CUDA Samples tar bundle and demonstrates techniques for compiling a basic MPI program that contains CUDA code. The first example works with the cudatoolkit module under PrgEnv-cray or PrgEnv-pgi.

simpleMPI.h , simpleMPI.cpp ,

nvcc -c  -gencode=arch=compute_35,code=compute_35 -o \


CC -o simpleMPI simpleMPI.cpp simpleMPIcuda.o
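The two-step build above works because the CUDA file compiled by nvcc exposes plain host functions that the MPI source then calls. A minimal sketch of such a file follows; the function names scaleKernel and scaleOnGpu are hypothetical and are not the contents of the Nvidia sample. Declaring the wrapper extern "C" is an assumption made here so that linking does not depend on nvcc's host compiler and the Cray CC wrapper agreeing on C++ name mangling.

```cuda
#include <cuda_runtime.h>

// Kernel: multiply every element of data by factor, one element per thread.
__global__ void scaleKernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Hypothetical wrapper with C linkage (unmangled symbol name), callable from
// the MPI source that CC compiles and links against the nvcc object file.
// Scales hostData[0..n-1] in place; returns 0 on success, -1 on a CUDA error.
extern "C" int scaleOnGpu(float *hostData, int n, float factor)
{
    if (n <= 0) return 0;

    float *d_data = NULL;
    size_t bytes = (size_t)n * sizeof(float);

    cudaError_t err = cudaMalloc((void **)&d_data, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_data, hostData, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threads = 256;
        scaleKernel<<<(n + threads - 1) / threads, threads>>>(d_data, n, factor);
        err = cudaGetLastError();   // reports launch-configuration errors
    }
    // Copying back waits for the kernel and surfaces any runtime error.
    if (err == cudaSuccess) err = cudaMemcpy(hostData, d_data, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_data);   // safe even if cudaMalloc failed (NULL is a no-op)
    return err == cudaSuccess ? 0 : -1;
}
```

On the MPI side, the C++ source compiled by CC would declare extern "C" int scaleOnGpu(float *hostData, int n, float factor); and call it on each rank's local data. The object file would be produced with nvcc -c and linked with CC, in the same way as simpleMPIcuda.o above.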

The next example compiles with PrgEnv-gnu and an older gcc version (gcc/4.6.3) that is compatible with nvcc. Here nvcc uses the Cray CC wrapper as its host compiler (--compiler-bindir `which CC`), so the wrapper supplies the MPI headers and libraries on the nvcc command line. The file is the combined source from and simpleMPI.cpp.

nvcc -gencode=arch=compute_35,code=compute_35 -o gnusimpleMPI \

--compiler-bindir `which CC`
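In the one-step build above, a single file contains both MPI calls and CUDA kernels, and nvcc hands the host code to the CC wrapper, which finds mpi.h. As an illustration of what such a combined file can look like, here is a minimal sketch; sumKernel and gpuGlobalSum are hypothetical names, not taken from the Nvidia sample. Each rank sums its local array on the GPU, and the per-rank partial sums are then combined with MPI_Allreduce.

```cuda
#include <mpi.h>
#include <cuda_runtime.h>

// Each thread adds one element into a single device-side accumulator.
// atomicAdd on float requires compute capability 2.0 or later (K20X is 3.5).
__global__ void sumKernel(const float *x, int n, float *result)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicAdd(result, x[i]);
    }
}

// Hypothetical helper: sums this rank's n values on the GPU, then adds the
// partial sums of all ranks in comm with MPI_Allreduce.
// Returns 0 on success, -1 if the local GPU step or the reduction failed.
int gpuGlobalSum(const float *x, int n, MPI_Comm comm, double *globalSum)
{
    int status = 0;
    float localSum = 0.0f;

    if (n > 0) {
        float *d_x = NULL, *d_sum = NULL;
        size_t bytes = (size_t)n * sizeof(float);

        cudaError_t err = cudaMalloc((void **)&d_x, bytes);
        if (err == cudaSuccess) err = cudaMalloc((void **)&d_sum, sizeof(float));
        if (err == cudaSuccess) err = cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);
        if (err == cudaSuccess) err = cudaMemset(d_sum, 0, sizeof(float));  // 0.0f
        if (err == cudaSuccess) {
            const int threads = 256;
            sumKernel<<<(n + threads - 1) / threads, threads>>>(d_x, n, d_sum);
            err = cudaGetLastError();
        }
        if (err == cudaSuccess)
            err = cudaMemcpy(&localSum, d_sum, sizeof(float), cudaMemcpyDeviceToHost);

        cudaFree(d_x);
        cudaFree(d_sum);
        if (err != cudaSuccess) {
            status = -1;
            localSum = 0.0f;
        }
    }

    // Every rank calls the collective, even after a local failure, so one
    // rank's GPU error cannot leave the other ranks blocked in MPI_Allreduce.
    double local = (double)localSum;
    if (MPI_Allreduce(&local, globalSum, 1, MPI_DOUBLE, MPI_SUM, comm) != MPI_SUCCESS)
        status = -1;
    return status;
}
```

The per-element atomicAdd keeps the sketch short but is not a fast reduction. For large arrays, a block-level shared-memory reduction or a library routine would usually be preferable.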



The CUDA drivers can also run OpenCL code. To use OpenCL, load the PrgEnv-gnu programming environment and the cudatoolkit module.
The following example demonstrates the use of OpenCL to add two vectors:
After loading the cudatoolkit module, compile the code with the standard compiler wrapper:
cc -o main main.c -lOpenCL
Although the Nvidia drivers currently run OpenCL code, OpenCL is not officially supported on Blue Waters.
