CUDA is a parallel computing platform and programming model from Nvidia. The Nvidia Kepler K20X accelerators in the XK nodes support CUDA compute capability 3.5.
How to use CUDA
To use CUDA tools on Blue Waters, load the cudatoolkit module:
module load cudatoolkit
CUDA C code may be compiled with nvcc. Loading the cudatoolkit module will add nvcc to your PATH.
Cray provides environment variables to help build CUDA code when the Makefile or configure script of an MPI application is used with the Cray or PGI programming environment. Use them if you need to set include or library paths manually for your build.
To see which environment variables the cudatoolkit module defines, run:
module show cudatoolkit
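As an illustration of setting include and library paths by hand, here is a minimal sketch. It assumes the module exports a variable named CUDATOOLKIT_HOME pointing at the toolkit root and that the runtime library lives under lib64; neither is stated on this page, so confirm both against the output of module show cudatoolkit. The function name cuda_flags is hypothetical.

```shell
# Sketch: derive compile and link flags from the toolkit root.
# ASSUMPTION: CUDATOOLKIT_HOME is exported by the cudatoolkit module and
# libcudart sits in its lib64 directory; verify with `module show cudatoolkit`.
# cuda_flags is a hypothetical helper name, not something the module provides.
cuda_flags() {
    # Fail with a clear message when the module has not been loaded.
    if [ -z "${CUDATOOLKIT_HOME:-}" ]; then
        echo "CUDATOOLKIT_HOME is not set; run 'module load cudatoolkit' first" >&2
        return 1
    fi
    # Print include path, library path, and the CUDA runtime library.
    echo "-I${CUDATOOLKIT_HOME}/include -L${CUDATOOLKIT_HOME}/lib64 -lcudart"
}
```

A Makefile or configure script could then pick the flags up with something like cc -o app app.c $(cuda_flags), where app.c stands in for your own source file.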
The PGI programming environment supports CUDA Fortran. To build CUDA Fortran code, load the PGI programming environment:
module swap PrgEnv-cray PrgEnv-pgi
module load cudatoolkit
Building sample code:
ftn -o matmul.x matmul.CUF
The test can be run with the following PBS script:
#PBS -l nodes=1:ppn=16:xk
#PBS -l walltime=0:05:00
aprun -n1 ./matmul.x > job.out
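If you run several small tests, a helper that prints the job script can save retyping. The sketch below reuses the three lines above unchanged and adds a shebang and a cd to the submission directory; it assumes the batch system sets PBS_O_WORKDIR, as PBS and Torque normally do. The function name make_xk_job and its arguments are hypothetical.

```shell
# Hypothetical helper: print a single-node XK job script for one executable.
# Usage: make_xk_job <executable> [walltime]
# ASSUMPTION: the batch system sets PBS_O_WORKDIR to the submission directory.
make_xk_job() {
    exe=$1
    walltime=${2:-0:05:00}   # default matches the five-minute limit used above
    cat <<EOF
#!/bin/bash
#PBS -l nodes=1:ppn=16:xk
#PBS -l walltime=${walltime}
# Batch jobs usually start in the home directory; return to the submission directory.
cd "\$PBS_O_WORKDIR"
aprun -n1 ./${exe} > job.out
EOF
}
```

For example, make_xk_job matmul.x > job.pbs writes the script to a file that can then be submitted with qsub job.pbs.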
CUDA with CMake
CMake does not configure CUDA correctly on Blue Waters. To fix this problem, add
-DCUDA_HOST_COMPILER=$(which CC) to your CMake options.
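A configure step using this option might look like the sketch below. It resolves the CC wrapper with command -v (the POSIX counterpart of which) and stops early if no programming environment is loaded. The function name configure_with_cuda and the source-directory argument are hypothetical; only CUDA_HOST_COMPILER comes from the text above.

```shell
# Sketch: pass the Cray C++ wrapper to CMake as the CUDA host compiler.
# configure_with_cuda and its argument are hypothetical names for illustration.
configure_with_cuda() {
    src_dir=${1:-..}   # source tree; defaults to the parent of the build directory
    # Resolve the full path of the CC wrapper, or report why it is missing.
    host_cxx=$(command -v CC) || {
        echo "CC wrapper not found; load a PrgEnv module first" >&2
        return 1
    }
    cmake -DCUDA_HOST_COMPILER="$host_cxx" "$src_dir"
}
```

Run it from an empty build directory, for example configure_with_cuda .. after loading cudatoolkit.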
Cray libsci_acc provides BLAS, LAPACK, and ScaLAPACK routines that improve performance by generating and running automatically tuned accelerator kernels on the XK nodes when appropriate. To use it with PrgEnv-cray or PrgEnv-gnu, load the module:
module load craype-accel-nvidia35 # <-- automatically includes libsci_acc
aprun -cc none -n <numranks> ... # Cray recommends allowing threads to migrate within a node when using libsci_acc
See also: "man intro_libsci_acc".
Nvidia's Nsight, an Eclipse-based integrated development environment (IDE), is installed and available as:
Caveat: because Nsight is Eclipse-based, its GUI has many widgets and performs best when you are close to our LAN. If your ping time to the system is over 50 ms, you may find the tool difficult to use. In that case, consider installing your own local version of CUDA if you want to use Nsight for kernel development.
The following source code is from Nvidia's cudasamples tar bundle and demonstrates techniques for compiling a basic MPI program with CUDA code. The first example works with cudatoolkit and PrgEnv-cray or PrgEnv-pgi.
simpleMPI.h, simpleMPI.cpp, simpleMPI.cu
nvcc -c -gencode=arch=compute_35,code=compute_35 -o \
CC -o simpleMPI simpleMPI.cpp simpleMPIcuda.o
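The two-step pattern (nvcc for the device code, then the Cray CC wrapper for the MPI host code and the link) can be wrapped in a small function, sketched below. It assumes the object file is named simpleMPIcuda.o, as the link line suggests, and that it is built from simpleMPI.cu. The function name build_simple_mpi and its arch parameter are hypothetical.

```shell
# Sketch of the two-step build: nvcc compiles the CUDA source to an object
# file, then the Cray CC wrapper compiles the MPI host code and links.
# ASSUMPTIONS: the object name simpleMPIcuda.o is inferred from the link
# line; build_simple_mpi and its arch parameter are hypothetical names.
build_simple_mpi() {
    arch=${1:-35}   # 35 selects compute capability 3.5 (Kepler K20X)
    nvcc -c -gencode=arch=compute_${arch},code=compute_${arch} \
        -o simpleMPIcuda.o simpleMPI.cu &&
    CC -o simpleMPI simpleMPI.cpp simpleMPIcuda.o
}
```

Because the two commands are joined with &&, the link step is skipped if the nvcc step fails.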
The next example compiles with PrgEnv-gnu and an earlier gcc version (gcc/4.6.3) that is compatible with nvcc. With this approach, the Cray CC wrapper is passed on the nvcc command line and supplies the MPI headers and libraries. The t.cu file is the combined source from simpleMPI.cu and simpleMPI.cpp.
-o gnusimpleMPI \
--compiler-bindir `which CC` t.cu
The CUDA drivers will also run OpenCL code. To use OpenCL, you must load the PrgEnv-gnu programming environment and the cudatoolkit modules.
The following example demonstrates the use of OpenCL to add two vectors:
main.c, vector_add_kernel.cl
After loading the cudatoolkit module, the code can be compiled using the standard compiler commands.
cc -o main main.c -lOpenCL
Although the Nvidia drivers currently work with OpenCL code, OpenCL is not officially supported on Blue Waters.
Nvidia OpenCL Programming Guide
Nvidia OpenCL SDK examples
Mixed gpu and cpu code at ORNL/TITAN
Mixing MPI and CUDA (Brown Univ.)