Shifter is a software solution that enables execution of *nix applications on HPC systems within isolated Linux environments. Shifter containers, called User-Defined Images, are analogous to Linux containers and provide a convenient way of bundling applications with all of their prerequisites and necessary components of the underlying operating system. Shifter UDIs are designed for quick distribution across compute nodes of supercomputers and can be generated from various types of Linux containers, including Docker images. This guide demonstrates how to work with Shifter UDIs based on Docker images from Docker Hub on Blue Waters.
General Shifter Workflow
The general workflow for using Docker images with Shifter on Blue Waters is as follows:

1. Build a Docker image with your application.
2. Push the image to a repository on Docker Hub.
3. Pull the image on Blue Waters to create a UDI.
4. Run the application in a Shifter job using that UDI.

In what follows, we will guide you through steps 2 – 4. The process of building Docker images (step 1) is application specific and can be skipped if you plan on using existing images from public repositories on Docker Hub.
Shifter on Blue Waters
On Blue Waters, Shifter is available in the form of a module that can be loaded with module load shifter.
The entire Blue Waters system has only 64 MOM nodes. These nodes are shared by all Blue Waters users whose interactive or batch jobs have started execution. Therefore, it is important not to use them for compute- or data-intensive tasks.
On a personal computer, one can work with Docker images via the full-featured set of commands provided by Docker. On Blue Waters, where Docker is not available, one has to use the tools provided by the Shifter module instead.
In this guide, lines in code blocks that start with a dollar sign ($) represent commands to be executed; the dollar sign itself is not part of the command and should not be typed.
Once the interactive job starts, we can load the shifter module and try out the shifterimg command it provides. Let's now go over the three sub-commands provided by shifterimg: lookup, pull, and images.
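As an illustration, a typical session on a MOM node might look like the following sketch; the image name is illustrative, and the exact output of each sub-command varies by installation:

```shell
$ module load shifter
$ shifterimg lookup centos:latest   # check whether the image is already available
$ shifterimg pull centos:latest     # download and repackage the Docker image as a UDI
$ shifterimg images                 # list the images available on the system
```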
There are two ways to make a generic resource request. We can do so either on the command line:
$ qsub -l gres=shifter16 ...
or in a job script as an additional PBS directive:
#PBS -l gres=shifter16
This request ensures that Torque, the Blue Waters resource manager, executes the proper prologue script.
There are also two ways to specify which UDI you wish to use:

a. with a PBS directive (-v UDI=imagename:tag)
b. by using the shifter command directly

Let's have a closer look at both methods.
Specifying UDI with a PBS directive
We can specify an image to use in a job with a PBS directive by setting an environment variable called UDI. We have to set it to the full name of the image, imagename:tag, and we can do so either on the command line:
$ qsub -l gres=shifter16 -v UDI=centos:latest -l nodes=2:ppn=32:xe -l walltime=00:30:00
or as a directive in a job script:
#PBS -v UDI=centos:latest
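For reference, a complete job script using this directive might look like the following sketch (the node counts, walltime, and image name are illustrative placeholders):

```shell
#!/bin/bash
#PBS -l gres=shifter16
#PBS -l nodes=2:ppn=32:xe
#PBS -l walltime=00:30:00
#PBS -v UDI=centos:latest

# Commands placed here run on a MOM node after the prologue
# has mounted the UDI on the compute nodes.
```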
When a job that specifies a User-Defined Image using either of the above approaches starts up, Blue Waters' special prologue script mounts the named image on the compute nodes and performs some other maintenance tasks. If successful, you should see a message similar to this one:
In Torque Shifter prologue
batchID: 224301
Starting munge service on compute nodes
Successfully started munge service on compute nodes
Initializing udiRoot, please wait.
Retrieving Docker Image
udiRoot Start successful
This message indicates that the Torque prologue has set up the requested UDI for use in the job. If the requested image has not been previously downloaded and packaged, the prologue will initiate both of these steps and, as a result, may time out (currently, the time limit for pulling an image is 1 hour). Therefore, if you would like to specify a UDI on the command line, it is strongly recommended that you download that image in advance.
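One way to download an image in advance is with the shifterimg utility provided by the shifter module (the image name below is illustrative):

```shell
$ module load shifter
$ shifterimg pull centos:latest
```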
Specifying UDI in a shifter command call
Another way to specify which UDI to use in a job is by calling the shifter command provided by the module and using its --image option. This approach enables multi-step workflows in which every step is performed in its own environment:
$ qsub -l gres=shifter16 ...
$ # On a MOM node
$ module load shifter
$ aprun -b ... -- shifter --image=docker:centos:latest -- <app> <args>
A few notes on the above command:

- The CRAY_ROOTFS environment variable has to be unset when we call the shifter command on the compute nodes using aprun.
- Specifying the image type (as in --image=docker:centos:latest) is optional.
- Specifying a UDI as a PBS directive (as in #PBS -v UDI=image:tag) and using the shifter command at the same time will produce an error. When we specify UDI as a PBS directive, the Torque prologue sets up the container environment on the compute nodes. When we then call aprun and specify an image with the --image flag, shifter attempts to set up its environment on the compute nodes again and fails because it cannot overwrite the one that was set up by Torque.
Running applications in a Shifter environment
When Blue Waters starts a compute job, it places it on a MOM node. From there, we can send our applications for execution either in the standard Cray environment or in a Shifter environment. The set of commands to do that depends on whether we specified the UDI to use as a PBS directive or are planning to specify it as an argument to the shifter command.

In order to execute an application in a Shifter environment when the UDI was specified as a PBS directive, we have to set the CRAY_ROOTFS environment variable to SHIFTER. In Bash, the default shell on Blue Waters, you can do so by executing:
$ export CRAY_ROOTFS=SHIFTER
If all of the code that we plan to run is part of the UDI, we can set the CRAY_ROOTFS environment variable the same way we set UDI, that is:
$ qsub ... -v CRAY_ROOTFS=SHIFTER ...
Keep in mind that if you choose to set CRAY_ROOTFS on the command line and you need to run some code on compute nodes that is not contained in the UDI, you have to unset it first:
$ export -n CRAY_ROOTFS
Now we are ready to execute our application packaged in the UDI. For example, all we have to do in order to print the contents of the /etc/centos-release file that is part of the UDI is to run:
$ aprun -n 1 -- cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core)
Application 63788813 resources: utime ~0s, stime ~1s, Rss ~18096, inblocks ~4, outblocks ~0
Note that this file exists in the UDI only; if we unset the CRAY_ROOTFS variable, we will not be able to access it:
$ export -n CRAY_ROOTFS
$ aprun -n 1 -- cat /etc/centos-release
cat: /etc/centos-release: No such file or directory
Application 63788941 exit codes: 1
Application 63788941 resources: utime ~0s, stime ~1s, Rss ~18096, inblocks ~3, outblocks ~0
If we don't specify the UDI as a PBS directive, we have to use the shifter command provided by the Shifter module. Therefore, the above example translates to:
$ module load shifter
$ aprun -n 1 -b -- shifter --image=centos:latest -- cat /etc/centos-release
A clear advantage of the shifter command is that once the above command completes, we can use a different UDI to execute another application in a new environment.
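For instance, a hypothetical two-step workflow could run each step in a different image (the image names and program names below are purely illustrative):

```shell
$ # Step 1: preprocess the data in a CentOS-based environment
$ aprun -b ... -- shifter --image=centos:latest -- ./preprocess <args>
$ # Step 2: analyze the results in an Ubuntu-based environment
$ aprun -b ... -- shifter --image=ubuntu:16.04 -- ./analyze <args>
```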
Please note the -b option that we added to the aprun call above. This is an important flag to remember when working with Shifter. It instructs aprun not to transfer the executable file (shifter) from the MOM node to the compute nodes. If we forget this flag, aprun will copy the shifter executable to the compute nodes and unset its setuid bit there. This, in turn, would cause the entire command to fail.
Be extra careful with special symbols (such as *) on the command line when submitting Shifter jobs. Bash performs pathname expansion before passing arguments to the shifter command, so if you are not careful you might see a No such file or directory error. Therefore, we recommend that you use scripts for all your Shifter-related work on Blue Waters.
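To see the effect of pathname expansion, here is a small local Bash demonstration (the directory and file names are made up for illustration):

```shell
# Create a scratch directory with a couple of files (illustrative names).
mkdir -p /tmp/shifter_glob_demo
cd /tmp/shifter_glob_demo
touch result1.dat result2.dat

# Unquoted: Bash expands the pattern before the command runs, so a
# command launched from the MOM node would receive MOM-node paths.
echo *.dat        # prints: result1.dat result2.dat

# Quoted: the pattern is passed through literally, so it can be
# expanded later, inside the container's own filesystem.
echo '*.dat'      # prints: *.dat
```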
Mapping directories in Shifter
A distinct feature of Shifter images is that they are read-only: the only way to update them is by pulling (downloading) newer versions of the corresponding Docker images. This, in turn, means that although input files can be part of UDIs, results of simulations and analyses produced in Shifter jobs have to be stored on the Blue Waters filesystems. For that purpose, Shifter adds special hooks into UDIs to make sure that the Blue Waters filesystems, such as home (/mnt/a/u/sciteam/<username>), are available when Shifter jobs run.
In addition to these automatic hooks, Shifter allows us to manually map existing directories of the Blue Waters filesystems to existing directories within UDIs. For example, we can map our home directory on Blue Waters to the /home directory within the image. There are two ways to specify such mappings:
a. when the Shifter job is submitted to the queue
b. as an argument to the shifter command

Let's have a look at both of these methods.
Mapping directories when submitting a Shifter job
When we submit Shifter jobs to the queue, we have the option to specify a mapping between existing directories of the Blue Waters filesystems and those in the user-defined images by amending the UDI assignment in the following way:
$ qsub -l gres=shifter16 -v UDI="centos:latest -v /mnt/a/u/sciteam/<username>:/home" ...
Once the above job starts, the specified Blue Waters directory will be mapped onto the /home directory within the centos:latest UDI. This mapping makes all files and folders within the directory on Blue Waters accessible from the /home directory in the UDI. It also ensures that any changes made to the /home directory in the job are reflected in the actual directory on Blue Waters.
When mapping directories, the contents of the directory on Blue Waters replace the contents of the directory in the UDI for that Shifter job only; no changes to the actual UDI are made. Make sure not to use directories within the UDI that contain any information required for the job to run.
Mapping directories from within a Shifter job
The other way to specify a volume mapping between the Blue Waters filesystems and a UDI is by using the shifter command and its --volume (or -V) flag directly. For example, to achieve the same mapping as above, we would use the following sequence of commands:
$ qsub -l gres=shifter16 ...
$ module load shifter
$ aprun -b ... -- shifter --image=centos:latest --volume=/mnt/a/u/sciteam/<username>:/home ...
$ # or
$ aprun -b ... -- shifter --image=centos:latest -V /mnt/a/u/sciteam/<username>:/home ...
Note that one cannot:

- Overwrite volume mappings specified by Shifter itself
- Map a directory to certain system directories and their subdirectories within the UDI (for example, /etc and /dev)
- Use symbolic links when specifying the directory to be mapped; that is, -V /u/sciteam/user:/path/in/image will fail, and the correct syntax is -V /mnt/a/u/sciteam/user:/path/in/image
If we try to map one of the restricted folders, we will receive one of the following error messages:
$ aprun -b -- shifter --image=centos:latest --volume=/mnt/a/u/sciteam/<username>:/etc -- ...
Invalid Volume Map: /mnt/a/u/sciteam/<username>:/etc, aborting! 1
Failed to parse volume map options
...
$ aprun -b -- shifter --image=centos:latest --volume=/mnt/a/u/sciteam/<username>:/dev -- ...
mount: warning: ufs seems to be mounted read-only.
Mount request path /var/udiMount/dev not on an approved device for volume mounts.
FAILED to setup user-requested mounts.
FAILED to setup image.
Accessing compute nodes running Shifter jobs via SSH
Just like with any other application, you might need to interact with an application running in a Shifter environment for debugging, monitoring, or other purposes. To enable such interactions, Shifter allows users to log in to compute nodes that are part of its pool via the standard ssh command-line tool. There are several requirements, however, in order to make use of this feature:
1. Specify the UDI as a PBS directive.

To allow users to log in to its compute nodes, Shifter can start up SSH daemons. The daemons on the compute nodes can be launched only by the prologue script, which is executed when the job starts. Therefore, in order to be able to log in to compute nodes with a Shifter job running on them, it is necessary to specify the UDI as a PBS directive.
2. Prepare a special SSH key pair.

On startup, the SSH daemons enabled by Shifter look for a private SSH key in $HOME/.shifter and wait for a connection on port 1204 authenticated with this key. To prepare such a key pair, execute:
$ mkdir -p ~/.shifter
$ ssh-keygen -t rsa -f ~/.shifter/id_rsa -N ''
Once the above two steps are completed, we can log into the compute nodes using:
$ ssh -p 1204 -i ~/.shifter/id_rsa -o StrictHostKeyChecking=no \ -o UserKnownHostsFile=/dev/null -o LogLevel=error nodename
It is advisable to save all the above options into a configuration file. To do that, execute:
$ cat <<EOF > ~/.shifter/config
Host *
    Port 1204
    IdentityFile ~/.shifter/id_rsa
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
    LogLevel error
EOF
Now, we can log in to the compute nodes with a simple:
$ ssh -F ~/.shifter/config nodename
To log in to a remote machine using the ssh command, we have to specify the remote machine's network name. To find the names of the compute nodes assigned to the Shifter job, execute the following command on a MOM node before setting the CRAY_ROOTFS variable:
You should see a list of names of the form nidXXXXX, where XXXXX is a five-digit number. Use these names to connect to the compute nodes:
$ ssh -F ~/.shifter/config nidXXXXX
Make sure, however, that you do not accidentally copy the network name of the MOM node where you execute all of these commands. Note also that ssh will fail with a Permission denied error if your login shell does not exist in the container or is not listed in the container's /etc/shells file.
GPUs in Shifter jobs
If your application benefits from or relies upon CUDA-capable accelerators, make sure to use the NVIDIA Kepler K20X GPUs that are installed on Blue Waters' XK nodes. Currently, this is only supported when using the shifter command provided by the module. To control which GPU devices should be accessible from within the container, Shifter uses an environment variable called CUDA_VISIBLE_DEVICES. The value of this variable is a 0-based, comma-separated list of CUDA-capable device IDs on the host system (Blue Waters). Because XK nodes have only one NVIDIA GPU each, the only value we can set this variable to is 0. Note that on systems with more than one NVIDIA GPU, device IDs within the container would start with 0 regardless of their IDs on the host system. This enables transparent use of containers on systems with different numbers of GPUs per node.
As an example, here is how we can start a 2-node Shifter job that uses GPUs:
$ qsub -l gres=shifter16,nodes=2:ppn=16:xk ...
$ # On a MOM node
$ module load shifter
$ export CUDA_VISIBLE_DEVICES=0
$ aprun -b -- shifter --image=centos:latest -- nvidia-smi
mount: warning: ufs seems to be mounted read-only.
mount: warning: /var/udiMount/opt/shifter/site-resources/gpu/lib64/libcuda.so.1 seems to be mounted read-only.
mount: warning: /var/udiMount/opt/shifter/site-resources/gpu/lib64/libcuda.so seems to be mounted read-only.
mount: warning: /var/udiMount/opt/shifter/site-resources/gpu/lib64/libnvidia-compiler.so.352.68 seems to be mounted read-only.
mount: warning: /var/udiMount/opt/shifter/site-resources/gpu/lib64/libnvidia-compiler.so seems to be mounted read-only.
[ GPU SUPPORT ] =WARNING= Could not find library: nvidia-ptxjitcompiler
mount: warning: /var/udiMount/opt/shifter/site-resources/gpu/lib64/libnvidia-encode.so.1 seems to be mounted read-only.
mount: warning: /var/udiMount/opt/shifter/site-resources/gpu/lib64/libnvidia-encode.so seems to be mounted read-only.
mount: warning: /var/udiMount/opt/shifter/site-resources/gpu/lib64/libnvidia-ml.so.1 seems to be mounted read-only.
mount: warning: /var/udiMount/opt/shifter/site-resources/gpu/lib64/libnvidia-ml.so seems to be mounted read-only.
[ GPU SUPPORT ] =WARNING= Could not find library: nvidia-fatbinaryloader
mount: warning: /var/udiMount/opt/shifter/site-resources/gpu/lib64/libnvidia-opencl.so.1 seems to be mounted read-only.
mount: warning: /var/udiMount/opt/shifter/site-resources/gpu/lib64/libnvidia-opencl.so seems to be mounted read-only.
mount: warning: /var/udiMount/opt/shifter/site-resources/gpu/bin/nvidia-cuda-mps-control seems to be mounted read-only.
mount: warning: /var/udiMount/opt/shifter/site-resources/gpu/bin/nvidia-cuda-mps-server seems to be mounted read-only.
mount: warning: /var/udiMount/opt/shifter/site-resources/gpu/bin/nvidia-debugdump seems to be mounted read-only.
which: no nvidia-persistenced in (/opt/cray/nvidia/default/bin:/usr/local/bin:/usr/bin:/bin:/sbin)
[ GPU SUPPORT ] =WARNING= Could not find binary: nvidia-persistenced
mount: warning: /var/udiMount/opt/shifter/site-resources/gpu/bin/nvidia-smi seems to be mounted read-only.
Thu Jan  4 19:25:45 2018
+------------------------------------------------------+
| NVIDIA-SMI 352.68     Driver Version: 352.68         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20X          On   | 0000:02:00.0     Off |                    0 |
| N/A   27C    P8    17W / 225W |     31MiB /  5759MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Application 64245641 resources: utime ~0s, stime ~0s, Rss ~7660, inblocks ~57564, outblocks ~43459
Behind the scenes, Shifter extends the PATH and LD_LIBRARY_PATH environment variables with paths from the host operating system that contain CUDA executables and shared libraries, respectively. Therefore, it is important to keep these changes to the environment variables if you plan to run GPU-enabled applications in the container.
MPI in Shifter
Shifter allows applications to send messages between nodes using the underlying high-speed interconnect. There are a few requirements, however, in order for an application in a Shifter UDI to use this feature.
1. Use a compatible MPI implementation.
Applications in User-Defined Images have to be compiled against an MPI implementation that is part of the MPICH ABI Compatibility Initiative, an effort to maintain ABI (Application Binary Interface) compatibility between MPICH-derived MPI implementations. Currently, the list of compatible MPI implementations includes:
- MPICH v3.1
- Intel® MPI Library v5.0
- Cray MPT v7.0.0
- MVAPICH2 2.0
- Parastation MPI 5.1.7-1
- IBM MPI v2.1
2. Don't use a package manager to install MPI libraries.

Currently, Shifter requires that the MPI implementation that you link your application against resides in "user space". If you link your application against MPI libraries provided by the package manager, it will not be able to use the interconnect of the underlying system, and every MPI rank will think that it is the only rank in the job. Solution: build the MPI implementation from source. Luckily, it is not at all difficult.
3. Use a modern glibc.

Shifter requires the GNU C library (glibc) version 2.17 or above. This means that you can use containers based on CentOS / Scientific Linux / RedHat 7, Ubuntu 14.04, or newer. However, if you absolutely must use a container based on an older operating system, you can try updating its glibc. If you experience any difficulties, feel free to contact Blue Waters support.
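To illustrate requirement 2 above, here is a hypothetical sketch of building MPICH from source when preparing a Docker image; the MPICH version, download URL, and installation prefix are all assumptions that you should adapt to your needs:

```shell
# Hypothetical sketch (e.g., commands in a Dockerfile RUN step);
# the version, URL, and prefix below are illustrative assumptions.
wget http://www.mpich.org/static/downloads/3.1/mpich-3.1.tar.gz
tar xzf mpich-3.1.tar.gz
cd mpich-3.1
./configure --prefix=/usr/local
make -j 4 && make install

# Then compile the application against this MPICH, e.g.:
# mpicc -o myapp myapp.c
```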