Usage tips for the Nvidia Kepler K20X GPUs
Sharing the GPU in an XK node
CRAY_CUDA_MPS (Nvidia's Multi-Process Service, which exploits the Kepler Hyper-Q feature; named CRAY_CUDA_PROXY in earlier versions of aprun/ALPS)
After selecting the XK GPU compute nodes via the xk resource specifier (#PBS -l), you can further set the runtime mode with the CRAY_CUDA_MPS environment variable.
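For reference, a resource request along these lines selects XK nodes (the node count and wall time are placeholders; check your site's documentation for the exact form):

    #PBS -l nodes=2:ppn=16:xk
    #PBS -l walltime=01:00:00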
The Nvidia GPUs default to dedicated mode, in which each GPU is mapped to one and only one Linux process per compute node (typically one MPI rank). This default behavior can be overridden with the CRAY_CUDA_MPS environment variable. When set to 1, the Nvidia driver multiplexes CUDA kernels from different processes (multiple cooperating MPI ranks) onto the Kepler GPU. The driver presents a virtual GPU to each process that requests one; it reports itself as Device 0, so there is no need to modify your GPU code. In some cases this allows more efficient loading and utilization of the GPU. Keep in mind that the basic limitations of the hardware are still in effect (6 GB of global memory) and that the processes share GPU resources. When running in MPS (proxy) mode, you are more likely to see CUDA_ERROR_OUT_OF_MEMORY errors unless care is taken to size and schedule the kernels so that they fit on the GPU together. For debugging, set CRAY_CUDA_MPS=0. An illustrative launch sequence is sketched below.
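A batch-script fragment along these lines (a sketch; the executable name and rank counts are placeholders) enables MPS mode and lets eight ranks on each XK node share its GPU:

    # Share each node's K20X among several ranks (MPS mode)
    export CRAY_CUDA_MPS=1
    # 16 ranks total, 8 per node; the 8 ranks on a node are multiplexed onto its single GPU
    aprun -n 16 -N 8 ./my_gpu_app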
CRAY_CUDA_MPS should be used with APRUN_XFER_LIMITS disabled (unset). If APRUN_XFER_LIMITS is set, you may see false-positive CUDA_ERROR_OUT_OF_MEMORY errors.
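If your environment exports APRUN_XFER_LIMITS, clearing it in the batch script before the aprun launch avoids this:

    unset APRUN_XFER_LIMITS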
A known issue: CRAY_CUDA_MPS=1 is incompatible with OpenCL; do not combine the two.
Performance-related runtime variables
Load cray-mpich2/5.6.4 or later.
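As a sketch of the intended setup (MPICH_RDMA_ENABLED_CUDA is the Cray MPT switch for CUDA-aware MPI; it is given here as an example, so confirm against your system's documentation):

    module load cray-mpich2/5.6.4
    # Allow GPU-resident buffers to be passed directly to MPI calls
    export MPICH_RDMA_ENABLED_CUDA=1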