CUDA is a parallel computing platform and programming model from Nvidia. The Nvidia Kepler K20X accelerators in the XK nodes support CUDA compute capability 3.5.
How to use CUDA
To use CUDA tools on Blue Waters, load the cudatoolkit module:
module load cudatoolkit
CUDA C code may be compiled with nvcc. Loading the cudatoolkit module will add nvcc to your PATH.
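As a rough illustration of the nvcc step, the sketch below writes a tiny CUDA C program to disk and compiles it when nvcc is on the PATH. The file name hello.cu, the kernel name add_one, and the output name hello.x are hypothetical. The -arch=sm_35 flag is an assumption chosen to match the compute capability 3.5 of the K20X.

```shell
# Write a minimal CUDA C source file (hypothetical example, not part of the
# Blue Waters documentation).
cat > hello.cu <<'EOF'
#include <cstdio>

// Each thread increments one element of the array.
__global__ void add_one(int *x) { x[threadIdx.x] += 1; }

int main() {
    int h[4] = {0, 1, 2, 3};
    int *d = NULL;
    cudaMalloc(&d, sizeof(h));
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    add_one<<<1, 4>>>(d);
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    printf("%d %d %d %d\n", h[0], h[1], h[2], h[3]);
    return 0;
}
EOF

# Compile only if nvcc is available, i.e. after "module load cudatoolkit".
if command -v nvcc >/dev/null 2>&1; then
    nvcc -arch=sm_35 -o hello.x hello.cu
else
    echo "nvcc not found; run 'module load cudatoolkit' first" >&2
fi
```

The resulting hello.x needs a GPU, so it would be launched on an XK node from a batch job rather than on a login node.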
Cray provides environment variables that help build CUDA code when the Makefile or configure script of an MPI application is used with the Cray or PGI programming environment. Consult them if you need to set include or library paths for your build manually.
To see which environment variables the cudatoolkit module defines, run:
module show cudatoolkit
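Once the module is loaded, one way to see what actually landed in your environment is to filter it for CUDA-related names. This is only a sketch: the helper name list_cuda_vars is hypothetical, and it assumes the variables of interest contain "CUDA" somewhere in their names.

```shell
# Print environment variables whose NAME contains "cuda" (case-insensitive),
# sorted by name. Prints nothing if there are no matches.
list_cuda_vars() {
    env | { grep -i '^[^=]*cuda[^=]*=' || true; } | sort
}

list_cuda_vars
```

Any variables that appear can then be referenced in a Makefile or passed to a configure script.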
The PGI programming environment supports CUDA Fortran. To build CUDA Fortran code, load the PGI programming environment and the cudatoolkit module:
module swap PrgEnv-cray PrgEnv-pgi
module load cudatoolkit
Building sample code:
ftn -o matmul.x matmul.CUF
To run the test, use the following PBS script:
#PBS -l nodes=1:ppn=16:xk
#PBS -l walltime=0:05:00
aprun -n1 ./matmul.x > job.out
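For reference, a complete job script built around these directives might look like the sketch below, which writes the script to a file and submits it if qsub is present. The file name matmul.pbs, the bash shebang, and the cd into $PBS_O_WORKDIR (the directory from which the job was submitted) are assumptions not stated above.

```shell
# Write the job script to a file (matmul.pbs is a hypothetical name).
cat > matmul.pbs <<'EOF'
#!/bin/bash
#PBS -l nodes=1:ppn=16:xk
#PBS -l walltime=0:05:00

# Assumption: run from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# Launch one process on the XK node and capture its output.
aprun -n 1 ./matmul.x > job.out
EOF

# Submit only where the PBS client is installed.
if command -v qsub >/dev/null 2>&1; then
    qsub matmul.pbs
else
    echo "qsub not found; submit matmul.pbs from a login node" >&2
fi
```

After the job finishes, the program output should be in job.out in the submission directory.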
CUDA with CMake
CMake does not configure CUDA correctly on Blue Waters. To fix this problem, add -DCUDA_HOST_COMPILER=$(which CC) to your CMake options.
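A sketch of how the option could be assembled in a build script, assuming the project uses CMake's FindCUDA module (which reads CUDA_HOST_COMPILER). The variable names host_compiler and cmake_opts are hypothetical, and the path to the source tree is left as a placeholder.

```shell
# Resolve the Cray C++ compiler wrapper; the result is empty if CC is not on the PATH.
host_compiler="$(command -v CC || true)"

if [ -n "$host_compiler" ]; then
    cmake_opts="-DCUDA_HOST_COMPILER=${host_compiler}"
    echo "Configure with: cmake ${cmake_opts} <path-to-source>"
else
    cmake_opts=""
    echo "CC wrapper not found; load a PrgEnv module first" >&2
fi
```

Resolving the wrapper once and checking the result avoids passing an empty host compiler to CMake when no programming environment is loaded.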
Cray libsci_acc provides BLAS, LAPACK, and ScaLAPACK routines that improve performance by generating and running automatically tuned accelerator kernels on the XK nodes when appropriate. To use libsci_acc with PrgEnv-cray or PrgEnv-gnu, load the module:
module load craype-accel-nvidia35 # <-- automatically includes libsci_acc
aprun -cc none -n <numranks> ... # Cray recommends allowing threads to migrate within a node when using libsci_acc
See also: "man intro_libsci_acc".
Nvidia's Nsight Eclipse-based integrated development environment (IDE) is installed and available as:
Caveat: because Nsight is Eclipse-based, its GUI has many widgets, and responsiveness is best when you are close to our LAN. If your ping time to the system is over 50 ms, you may find the tool difficult to use. In that case, consider installing CUDA locally if you want to use Nsight for kernel development.
The following source code is from Nvidia's cudasamples tar bundle and demonstrates techniques for compiling a basic MPI program with CUDA code. The first example works with cudatoolkit and either PrgEnv-cray or PrgEnv-pgi.
nvcc -c -gencode=arch=compute_35,code=compute_35 -o \
CC -o simpleMPI simpleMPI.cpp simpleMPIcuda.o
The next example demonstrates compiling with PrgEnv-gnu and an earlier gcc version (gcc/4.6.3) that is compatible with nvcc. Here the Cray CC wrapper is passed to nvcc as the host compiler, so the wrapper supplies the MPI headers and libraries. The t.cu file is the combined source from simpleMPI.cu and simpleMPI.cpp.
-o gnusimpleMPI \
--compiler-bindir `which CC` t.cu
cc -o main main.c -lOpenCL