CCM: Cluster Compatibility Mode

Introduction

Cluster Compatibility Mode (CCM) is a component of the Cray environment that provides full Linux compatibility. With CCM, an XE/XK compute node, which normally runs a stripped-down operating system, can be turned into a typical node of a standard Linux cluster.

This mode is useful for several scenarios:

  • debugging using the Eclipse interface with GDB
  • batch jobs that need to use a GUI for long periods of time
  • ISV applications that use commercial MPI implementations such as HP-MPI
  • interactive access to a GPU-equipped node

CCM can be used in both interactive and standard PBS batch jobs.

During a CCM job, ssh uses port 203 by default. If you need to connect to an external ssh server, for example cli.globusonline.org, use ssh -p 22 to explicitly select the port.
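
If external connections are frequent, the explicit port can be wrapped in a small shell function. The following is a minimal sketch only: the function name ssh_external and the SSH_CMD override (handy for a dry run) are hypothetical and not part of CCM, and username is a placeholder.

```shell
# ssh_external: hypothetical convenience wrapper, not part of CCM.
# It always passes -p 22 so the CCM default port (203) is not used.
# SSH_CMD is an assumed override; set SSH_CMD=echo for a dry run.
ssh_external() {
    ${SSH_CMD:-ssh} -p 22 "$@"
}

# Example (placeholder user name):
#   ssh_external username@cli.globusonline.org
```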

CCM in an interactive job

First start an interactive batch job:

> qsub -I -l gres=ccm -l nodes=4:ppn=16:xk -l walltime=01:00:00

You can add -X for X11 tunneling. You will see output similar to the following:

qsub: waiting for job 1134727.nid11293 to start
qsub: job 1134727.nid11293 ready
In CCM JOB:  1134727.nid11293  JID  1134727  USER  kot  GROUP  bw_staff
Initializing CCM environment, Please Wait
waiting for jid....
CCM Start success, 4 of 4 responses
...
>

The interactive session places you on a PBS MOM node (not a compute node). Do not run any computations on the MOM node; doing so is against the usage policy. Resource usage is monitored, and violations will not be tolerated.

While in CCM mode, you can list the nodes assigned to the job:

> cat $HOME/.crayccm/ccm_nodelist.$PBS_JOBID | sort -u
nid06822
nid06823
nid06904
nid06905
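
The same file can be read from a script. A minimal sketch, assuming the node list lives at the path shown above; the function name ccm_unique_nodes is hypothetical, and the optional argument exists only so the function can be tried against another file.

```shell
# ccm_unique_nodes: hypothetical helper that prints each assigned node once.
# The default path follows the example above; another file may be passed.
ccm_unique_nodes() {
    local nodefile="${1:-$HOME/.crayccm/ccm_nodelist.$PBS_JOBID}"
    if [ ! -r "$nodefile" ]; then
        echo "node list not found: $nodefile" >&2
        return 1
    fi
    sort -u "$nodefile"
}

# Count the nodes assigned to the job:
#   ccm_unique_nodes | wc -l
```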

Use ccmrun to start an application on the compute nodes. If the purpose of the session is interactive work, you can move from the MOM node to the first compute node. Add the ccm module and run ccmlogin to move the session to a compute node:

> module add ccm
> ccmlogin
nid06822>

You are now on compute node nid06822, which behaves like a regular Linux node. This is the right place to run compute-intensive applications. You can add modules as usual, configure, etc. The ccmlogin command supports X11 tunneling, so if you used -X with qsub you should be able to launch a GUI from the compute node.

To access other nodes in the node list, use the ssh command. For example:

nid06822> ssh nid06823
nid06823> module swap PrgEnv-cray PrgEnv-pgi
nid06823> pgaccelinfo
nid06823> ssh nid06904
nid06904>
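
To run the same command on every node of the job rather than hopping between nodes by hand, a loop over the node list is one option. This is a sketch only: run_on_nodes and the SSH_CMD override are hypothetical names, and it assumes the node list file described earlier.

```shell
# run_on_nodes: hypothetical helper, not part of CCM.
# Usage: run_on_nodes NODEFILE COMMAND [ARGS...]
# Runs COMMAND over ssh on each unique node listed in NODEFILE.
run_on_nodes() {
    local nodefile="$1"
    shift
    local node
    for node in $(sort -u "$nodefile"); do
        # SSH_CMD is an assumed override; SSH_CMD=echo gives a dry run.
        ${SSH_CMD:-ssh} "$node" "$@"
    done
}

# Example, from inside a CCM session:
#   run_on_nodes "$HOME/.crayccm/ccm_nodelist.$PBS_JOBID" hostname
```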

When you are done, exit the compute nodes and then the batch job:

nid06904> exit
Connection to nid06904 closed.
nid06823> exit
Connection to nid06823 closed.
nid06822> exit
Connection to nid06822 closed.
> exit
qsub: job 1134727.nid11293 completed

OpenMPI support in Cluster Compatibility Mode

CCM does not include support for Cray MPICH. However, it supports the OpenMPI parallelization interface. The OpenMPI software stack is not included in the programming environment, so users should build the OpenMPI libraries in their home directories. Step-by-step instructions follow ($ denotes the command-line prompt):
$ module swap PrgEnv-cray PrgEnv-gnu
$ cd $HOME
$ mkdir openmpi
$ cd openmpi
$ wget http://www.open-mpi.org/software/ompi/v1.8/downloads/openmpi-1.8.4.tar.gz
$ tar zxvf openmpi-1.8.4.tar.gz
$ cd openmpi-1.8.4
$ ./configure --prefix=$HOME/openmpi --enable-orterun-prefix-by-default --enable-mca-no-build=plm-tm,ras-tm
$ make install

After compilation is completed, add

export PATH=$PATH:$HOME/openmpi/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/openmpi/lib

to your ~/.bashrc

Add

module swap PrgEnv-cray PrgEnv-gnu
module add ccm

to your ~/.modules

Use mpicc, mpiCC, and mpif90 to compile your OpenMPI applications.
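
Before compiling, it can help to confirm that the wrappers were actually installed. A minimal sketch, assuming the $HOME/openmpi prefix used above; check_openmpi is a hypothetical helper, its default tool list is an assumption, and which wrappers exist depends on how the build went.

```shell
# check_openmpi: hypothetical helper that verifies wrapper executables exist.
# Usage: check_openmpi [PREFIX] [TOOL...]
# Returns 0 if every tool is an executable under PREFIX/bin, 1 otherwise.
check_openmpi() {
    local prefix="${1:-$HOME/openmpi}"
    if [ "$#" -gt 0 ]; then
        shift
    fi
    # Default tool list is an assumption; pass names explicitly to override.
    if [ "$#" -eq 0 ]; then
        set -- mpicc mpirun
    fi
    local tool
    local missing=0
    for tool in "$@"; do
        if [ ! -x "$prefix/bin/$tool" ]; then
            echo "missing: $prefix/bin/$tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_openmpi "$HOME/openmpi" mpicc mpiCC mpif90 mpirun
```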

Sample PBS script to launch an OpenMPI application:

#!/bin/bash
#PBS -j oe
#PBS -l nodes=3:ppn=1:xe
#PBS -l walltime=00:02:00
#PBS -l gres=ccm
#PBS -l flags=commlocal:commtolerant

source /opt/modules/default/init/bash
module list

TPN=16
NNODES=3
HOSTLIST=znodelist
LAUNCH=zstart.sh
cd $PBS_O_WORKDIR
cat $HOME/.crayccm/ccm_nodelist.$PBS_JOBID | sort -u | awk -v n=$TPN '{for(i=0;i<n;i++) print $0}' > $HOSTLIST

let NTASKS=$NNODES*$TPN

echo "#!/bin/bash
cd $PBS_O_WORKDIR
$HOME/openmpi/bin/mpirun -v -np $NTASKS --mca btl tcp,self --mca btl_tcp_if_include ipogif0 --hostfile $HOSTLIST -npernode $TPN ./a.out > job.out" > $LAUNCH
chmod 755 $LAUNCH

ccmrun $PBS_O_WORKDIR/$LAUNCH
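
The host file step in the script is its least obvious part: each unique node name is repeated once per task slot, and the task count is the product of nodes and tasks per node. The sketch below isolates that logic so it can be tried outside a job; make_hostfile is a hypothetical name, and the input is whatever node list file you pass in.

```shell
# make_hostfile: hypothetical helper mirroring the awk step in the script.
# Usage: make_hostfile NODEFILE TASKS_PER_NODE
# Prints each unique node name TASKS_PER_NODE times, one per line.
make_hostfile() {
    sort -u "$1" | awk -v n="$2" '{ for (i = 0; i < n; i++) print $0 }'
}

# Total MPI tasks = number of nodes * tasks per node (values from the script).
NNODES=3
TPN=16
NTASKS=$((NNODES * TPN))

# Example, from inside a CCM job:
#   make_hostfile "$HOME/.crayccm/ccm_nodelist.$PBS_JOBID" "$TPN" > znodelist
```

With two nodes and three tasks per node, the helper prints six lines: the first node name three times, then the second three times.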

For more information