Shifter is a software solution that provides a way for HPC users to run applications inside Docker and other container images on Blue Waters.
Besides working with public and private images from Docker Hub,
Shifter 18 also supports public and private local images in SquashFS format.
For the impatient: here is an example of how to pull an image from Docker Hub and run a simple command in it. First, submit an interactive Shifter job, which starts an interactive session on a MOM node.
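A complete session of this kind might look like the sketch below. The generic-resource name (`gres=shifter`) and the job-size flags are assumptions, so adjust them to match your allocation; `centos:centos6.9` is just an example image.

```shell
# Request an interactive Shifter job (lands on a MOM node);
# the resource name "shifter" is an assumption, check your site's docs
qsub -I -l gres=shifter -l nodes=1:ppn=32 -l walltime=00:30:00

# On the MOM node: load the module and pull an image from Docker Hub
module load shifter
shifterimg pull centos:centos6.9

# Run a simple command from the image on a compute node
aprun -b -N 1 -cc none -- shifter --image=centos:centos6.9 -- cat /etc/redhat-release
```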
Shifter Jobs & Generic Resource Request
On Blue Waters, Shifter is provided as a module called shifter.
Unlike other modules, it can only be loaded on a MOM node in a Shifter job, which is a job that requests the shifter generic resource, either on the qsub command line or in a job batch script in the form of a #PBS directive.
Once the job submitted with such a request starts, you can load the shifter module and start using the tools it provides, namely the shifter and shifterimg commands.
Use of Shifter-mounted images is intended only on the compute nodes, not the MOM nodes of a job. Please see the Executing Applications in Container Environments section for more information on executing applications with Shifter.
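For reference, the two forms of the generic-resource request might look like the following sketch; the exact resource name is site-specific and assumed here to be shifter.

```shell
# On the qsub command line:
qsub -I -l gres=shifter -l nodes=1:ppn=32 -l walltime=01:00:00

# Or in a job batch script, as a PBS directive:
#PBS -l gres=shifter
```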
Let's now review the steps associated with using Docker and local (SquashFS) images with Shifter.
Working with Images from Docker Hub
To use a Docker image on Blue Waters, we need to pull it from Docker Hub into the Shifter image database and then execute our application inside it on the compute nodes.
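The two steps can be sketched as follows; image and application names are placeholders:

```shell
# 1. Pull the image from Docker Hub into the Shifter image database
shifterimg pull image_name:tag

# 2. Execute an application from the image on the compute nodes
aprun -b -N 1 -cc none -- shifter --image=image_name:tag -- app.exe
```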
Working with Local Images
Now let's review the steps associated with using local images in SquashFS format.
Private local images
Shifter provides a mechanism to import a SquashFS file as a private image.
All you have to do is add the -p (--private) flag, together with -u (--user) and/or -g (--group), when importing the image.
The user who imports the image is automatically added to the list of people authorized to access the image.
Similarly, we can limit access to the image to specific groups using the -g (--group) option.
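Using the options documented in the table below, a private import might look like this sketch; import_image is a placeholder for the site-provided import tool, and the user and group names are examples:

```shell
# -p makes the image private; -u and/or -g list who may access it
# ("import_image" is a hypothetical name for the site's import tool)
import_image -p -u alice,bob -g our_group \
    -n my_private_image:tag my_image.squashfs
```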
Primary image owners for private images
Shifter installed on Blue Waters has an experimental feature called "Primary image owners". Primary image owners are users who are allowed to make changes to the private image; see the Manipulating local image metadata section below.
This feature uses the -o (--owner) option of the image-import tool, described in the options table below.
At the moment, this feature is implemented in the Shifter gateway only, because that is where image manipulation takes place. Therefore, when you use a private image with primary owners, you'll see a warning. You may safely ignore this warning. We will submit a patch to Shifter to suppress it if this feature gains traction.
Manipulating local image metadata
When Shifter pulls images from Docker Hub, it automatically transfers their metadata into the created Shifter images. With local images, we have to transfer such metadata manually. Shifter recognizes the following metadata entries: environment variables, working directory, and image entrypoint. You can set them when you import the image into Shifter or after the fact. Currently, to update image metadata you must have access to the SquashFS file.
In summary, a call to update image metadata may look something like this:
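For illustration, such a call might look like the sketch below; import_image is a placeholder for the site-provided import tool, and the paths and variables are examples. The flags are the ones documented in the table below.

```shell
# Re-import the SquashFS file with updated metadata:
# working directory, entry point, and environment variables
# ("import_image" is a hypothetical name for the site's import tool)
import_image -n my_image:tag \
    -w /opt/app \
    --entrypoint /opt/app/run.sh \
    --env "APP_HOME=/opt/app,OMP_NUM_THREADS=16" \
    my_image.squashfs
```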
| Short option | Long option | Description |
|---|---|---|
| -h | --help | Print a help message. |
| -v | --verbose | Enable verbose output. |
| -n | --name | Name to assign to the Shifter image in the Shifter image database. |
| -p | --private | Make the uploaded image private. Requires -u and/or -g. |
| -u | --user | Comma-separated list of user names or UIDs allowed to access the image. Requires -p. |
| -g | --group | Comma-separated list of user groups or GIDs allowed to access the image. Requires -p. |
| -o | --owner | Comma-separated list of user names or UIDs allowed to modify the image (Primary Image Owners). Requires -p. |
| -w | --workdir | Working directory within the image. Same as WORKDIR in Docker images. |
| | --entrypoint | Image entry point (default application). |
| | --env | Comma-separated list of environment variables and their values to be set within the image. |
| -d | --dry-run | Show commands that would otherwise be executed. |
| -t | --timeout | Time (in seconds) to wait for the uploaded image to appear in the Shifter database. Defaults to 60 seconds. |
Executing Applications in Container Environments
Shifter provides two ways to set up a container environment on the Blue Waters compute nodes:
- Using the shifter command, or
- By setting the UDI environment variable to the name of the image to be used when the Shifter job is submitted, and then setting the CRAY_ROOTFS environment variable to SHIFTER inside the job.
Both of these methods have to be executed from a Shifter job, and we have to provide additional flags to the aprun command, namely:

| Flag | Note | Description |
|---|---|---|
| -b | (mandatory) | Bypass transfer of the executable to the compute nodes. |
| -cc none | (recommended) | Allow PE migration within assigned NUMA nodes. |
So, a typical aprun call in a Shifter job looks like this:
$ aprun -b -N 1 -cc none [other aprun options]
Executing Applications Using the shifter Command
To execute an application in a container environment using the shifter command, we have to:
- Start a Shifter job
- Load Shifter module:
module load shifter
- Select which image to use and execute the application.
Image selection can be done via the --image= flag of the shifter command, the SHIFTER environment variable, or the SHIFTER_IMAGE and SHIFTER_IMAGETYPE environment variables:
$ aprun -b ... -- shifter --image=image_name:tag -- app.exe
$ export SHIFTER=image_name:tag
$ aprun -b ... -- shifter -- app.exe

$ export SHIFTER_IMAGETYPE=docker
$ export SHIFTER_IMAGE=image_name:tag
$ aprun -b ... -- shifter -- app.exe
Note that you can use the image ID instead of its name. This approach bypasses the Shifter gateway and is therefore the recommended way of choosing Shifter images when running at scale. For more information, see the Remarks on running applications in container environments section.
Executing Applications Using the UDI Environment Variable
We can specify the image we'd like to use in our Shifter job by setting the UDI environment variable when we submit the job. We can do that either on the command line:
$ qsub ... -v UDI=image_name:tag
or as a PBS directive in a batch job script:
#PBS -v UDI=image_name:tag
When such a Shifter job starts, we can change the execution environment on the compute nodes to the one we prepared in the container by setting the CRAY_ROOTFS environment variable to SHIFTER:
$ export CRAY_ROOTFS=SHIFTER
Now we can execute our application without using the shifter command:
$ aprun -b ... -- app.exe
Remember to prefix the names of images imported from SquashFS files with custom:.
Remarks on running applications in container environments
The shifter command cannot be used when the CRAY_ROOTFS environment variable is set.
Generally speaking, you don't have to specify the image type docker when working with images pulled from Docker Hub, because docker is the default image type. This means that you can select such an image with or without the docker: prefix; for example, #PBS -v UDI=centos:centos6.9 and #PBS -v UDI=docker:centos:centos6.9 are equivalent. However, if you name your image docker, then you must specify the image type docker: for Shifter to identify your image.
You must specify the image type custom for images imported from SquashFS files, e.g., #PBS -v UDI=custom:my_private_image:tag.
If you select a Shifter container by setting the UDI environment variable and then try to use the shifter command, it will produce an error. When you set the UDI environment variable, Shifter sets up the container environment on the compute nodes at the time the job starts. When you then call the shifter command, it attempts to set up a new environment and fails because it cannot overwrite the one that was previously set. The UDI approach ensures that the compute nodes have the container environment set up when the job starts, which results in better scalability across nodes for large jobs.
With the shifter command, you can use several images within a single Shifter job. However, when you specify an image using its name (such as image_name:tag), every compute node running the job contacts the Shifter Gateway to get the image ID from the image database. If you run at scale (more than 10,000 nodes), this may result in too many requests for the Shifter Gateway to handle. Therefore, we recommend using the image ID instead of the image name for all of your production jobs. The image ID is what the shifterimg lookup command returns; see the shifterimg command section below.
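For example, assuming shifterimg lookup prints the bare image ID, a production job could resolve the name once on the MOM node and then select the image by ID (the id: prefix denotes an ID-based image specification):

```shell
# Resolve the image name to its ID once, on the MOM node
image_id=$(shifterimg lookup centos:centos6.9)

# Compute nodes now skip the per-node gateway lookup
aprun -b -N 1 -cc none -- shifter --image=id:$image_id -- app.exe
```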
Because Shifter images are read-only and produced results must be stored on the Blue Waters filesystem, Shifter automatically mounts the following directories into every container environment:
This means that you can use the above locations in your Shifter job script as you would in a regular job script. In addition to these locations, you can map any existing directory you have access to onto any directory in the container environment. Unlike previous versions of Shifter, Shifter 18 allows mapping onto directories that don't exist in the image; they are automatically created in the container runtime environment (though not in the image).
To mount a directory /bw/dir into the container environment as /container/dir when the Shifter image is selected using the UDI environment variable, append -v /bw/dir:/container/dir to the image specification and wrap the whole specification in double quotes:
-v UDI="image_name:tag -v /bw/dir:/container/dir"
Note that you can use this syntax both on the command line (as an argument to the qsub command) and in a job submission script (as a #PBS directive).
To apply the same directory mapping using the shifter command, use its --volume (or -V) option:
$ aprun -b ... -- shifter --volume=/bw/dir:/container/dir ...
$ aprun -b ... -- shifter -V /bw/dir:/container/dir ...
Mapping multiple directories
To map several directories into the container runtime environment when the Shifter image is selected using the UDI variable, specify the directory mappings one after another:
$ qsub ... -v UDI="image_name:tag -v /bw/dir:/container/dir -v /bw/dir2:/container/dir2" ...
or, as a PBS directive in a job submission script:
#PBS -v UDI="image_name:tag -v /bw/dir:/container/dir -v /bw/dir2:/container/dir2" ...
You can also use a single -v option and separate the mappings with semicolons:
#PBS -v UDI="image_name:tag -v /bw/dir:/container/dir;/bw/dir2:/container/dir2" ...
And again, to apply the same directory mappings using the shifter command:
... shifter --volume=/bw/dir1:/container/dir1 --volume=/bw/dir2:/container/dir2
... shifter -V /bw/dir1:/container/dir1 -V /bw/dir2:/container/dir2
You can also use a single -V option and combine several directory mappings into one by wrapping them in quotation marks and separating them with semicolons:
... shifter -V "/bw/dir1:/container/dir1;/bw/dir2:/container/dir2"
Remarks on mapping directories
Shifter imposes some restrictions on user-defined directory mappings. In particular, one cannot:
- Overwrite volume mappings specified by Shifter.
- Mount a directory to a folder whose name begins with opt/udiImage, or to other reserved system folders such as /etc. If you try to mount a directory to one of these folders, Shifter will emit a "Failed to parse volume map options" error:
$ aprun -b ... -- shifter ... --volume=/mnt/a/u/sciteam/<username>:/etc -- ...
Invalid Volume Map: /mnt/a/u/sciteam/<username>:/etc, aborting!
1 Failed to parse volume map options
Note that this restriction does not extend to all of /opt: while we can't mount a directory to the /opt/udiImage folders, we can mount directories to /opt/<dir>, provided that <dir> is not udiImage.
- Use symbolic links when specifying the directory to be mapped. For example, the /u/sciteam/user:/path/in/image mapping will fail, and the correct syntax is /mnt/a/u/sciteam/user:/path/in/image. Because of that, it might be a good idea to wrap the mapped directory in $(readlink -f ...):
$(readlink -f /u/sciteam/user):/path/in/image
Note that Shifter allows mounting a directory at a location that does not exist in the image; such directories are created in the container runtime environment.
Directory mapping flags
When we map directories into the container environment, we can give them certain properties by adding extra flags after the mapping specification, in the form of :flag or :flag=value. The current version of Shifter on Blue Waters supports two flags, described below.
ro: this flag marks the directory mounted in the container environment (/container/dir) as read-only; any attempt to write to it will result in a "Read-only file system" error.
perNodeCache: this flag instructs Shifter to create an XFS file of the specified size on every compute node and mount it at the location specified by the mapping (/container/dir). The source directory (/bw/dir) is required but ignored. The cache size can be specified as a number, in which case it is interpreted as a number of bytes. Alternatively, you can use a one-letter suffix after the number to specify the units: b for bytes, k for kibibytes (2^10 bytes), m for mebibytes (2^20 bytes), g for gibibytes (2^30 bytes), t for tebibytes (2^40 bytes), p for pebibytes (2^50 bytes), and e for exbibytes (2^60 bytes). You can use uppercase letters instead (B, K, M...). Characters after the first suffix character are ignored, so all of the mappings below create a 10-gibibyte cache:
:perNodeCache=size=10G
:perNodeCache=size=10GB
:perNodeCache=size=10Gibabyte
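Putting the two flags together, a mapping might look like the sketch below; all paths and image names are placeholders:

```shell
# Mount a read-only input directory and a 10 GiB per-node XFS cache;
# the source directory of the perNodeCache mapping is required but ignored
aprun -b ... -- shifter --image=image_name:tag \
    -V /bw/input:/container/input:ro \
    -V /bw/ignored:/container/cache:perNodeCache=size=10G \
    -- app.exe
```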
Accessing compute nodes running Shifter jobs via SSH
Just like with any other application, you might need to interact with an application running in a Shifter environment for debugging, monitoring, or other purposes. To enable such interactions, Shifter allows users to log in to compute nodes that are part of its pool via the standard ssh command-line tool. There are several requirements to make use of this feature:
- 1. Specify UDI as a PBS directive.
- To allow users to log in to its compute nodes, Shifter can start up SSH daemons. The daemons on the compute nodes can be launched only by the prologue script, which is executed when the job starts. Therefore, in order to be able to log in to the compute nodes of a running Shifter job, it is necessary to specify the UDI as a PBS directive.
- 2. Prepare a special SSH key pair.
- On startup, the SSH daemons enabled by Shifter look for a private SSH key in $HOME/.shifter and wait for a connection on port 1204 authenticated with this key. To prepare such a key pair, execute:
$ mkdir -p ~/.shifter
$ ssh-keygen -t rsa -f ~/.shifter/id_rsa -N ''
Once the above two steps are completed, you can log into the compute nodes using:
$ ssh -p 1204 -i ~/.shifter/id_rsa -o StrictHostKeyChecking=no \
      -o UserKnownHostsFile=/dev/null -o LogLevel=error nodename
It is advisable to save all the above options into a configuration file. To do that, execute:
$ cat <<EOF > ~/.shifter/config
Host *
    Port 1204
    IdentityFile ~/.shifter/id_rsa
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
    LogLevel error
EOF
Now, we can log in to the compute nodes with a simple:
$ ssh -F ~/.shifter/config nodename
To log in to a remote machine using the ssh command, we have to specify the remote machine's network name. To find the names of the compute nodes assigned to the Shifter job, execute the following command on a MOM node before setting the CRAY_ROOTFS environment variable:
$ aprun -b -N1 -cc none -n$PBS_NUM_NODES -- hostname
You should see a list of names of the form nidXXXXX, where XXXXX is a five-digit number. Use these to connect to the compute nodes:
$ ssh -F ~/.shifter/config nidXXXXX
Make sure, however, that you do not accidentally copy the network name of the MOM node where you execute all of these commands. Also note that ssh will fail with a "Permission denied" error if your login shell does not exist in the container or is not listed in the container's /etc/shells file.
Using GPUs in Shifter jobs
GPU support is automatically enabled when you use the shifter command for executing your application:
$ module load shifter
$ module load cudatoolkit
$ aprun -q -b -n 1 -- shifter --image=centos:6.9 -- nvidia-smi
Fri Jun 19 16:49:11 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.46                 Driver Version: 390.46                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20X          On   | 00000000:02:00.0 Off |                    0 |
| N/A   27C    P8    16W / 225W |      0MiB /  5700MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
You can also use GPUs in Shifter jobs when you specify the image using the UDI environment variable. However, in such a case you have to append the following locations to the PATH and LD_LIBRARY_PATH environment variables and set CUDA_VISIBLE_DEVICES to 0 before you call aprun:
$ qsub ... -v UDI=centos:6.9 ...
# On a MOM node
$ PATH=$PATH:/opt/cray/nvidia/390.46-1_1.0502.2481.1.1.gem/bin
$ PATH=$PATH:/opt/nvidia/cudatoolkit9.1/9.1.85_3.10-1.0502.df1cc54.3.1/bin
$ export PATH
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/nvidia/390.46-1_1.0502.2481.1.1.gem/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/nvidia/cudatoolkit9.1/9.1.85_3.10-1.0502.df1cc54.3.1/lib64
$ export LD_LIBRARY_PATH
$ export CUDA_VISIBLE_DEVICES=0
$ export CRAY_ROOTFS=SHIFTER
$ aprun -b -q -n 1 -- nvidia-smi
Fri Jun 19 19:27:10 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.46                 Driver Version: 390.46                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20X          On   | 00000000:02:00.0 Off |                    0 |
| N/A   25C    P8    17W / 225W |      0MiB /  5700MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
MPI in Shifter
Shifter allows applications in containers to use Blue Waters' high-speed interconnect for MPI-based communications. It does that by swapping the MPI libraries inside the container with Cray MPI libraries on Blue Waters at run time.
When you use the shifter command for executing applications, MPI support is enabled automatically. When you use the UDI-based approach, MPI support isn't enabled (because it requires changing the user's environment), so one has to do all the required steps manually. We describe these steps below.
Now let's review the requirements for enabling MPI in Shifter.
- 1. Compatible MPI Implementation
An application in the image can be compiled against any MPI implementation that is part of the MPICH ABI Compatibility Initiative.
Currently, the list of compatible MPI implementations includes:
- MPICH v3.1
- Intel® MPI Library v5.0
- Cray MPT v7.0.0
- MVAPICH2 2.0
- Parastation MPI 5.1.7-1
- IBM MPI v2.1
- 2. Compatible Docker images
- Swapping MPI libraries at run time requires GNU C library (glibc) version 2.17 or above. This means that you can use containers based on CentOS 7, Ubuntu 14.04, or newer. If your Docker image has an older glibc, you can compile a newer one from source and then build your application against it. Make sure that your application uses the correct glibc at run time.
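One quick way to check which glibc a container provides is to run ldd inside it; the image name below is a placeholder:

```shell
# Prints the container's glibc version; 2.17 or newer is required
aprun -b -n 1 -- shifter --image=image_name:tag -- ldd --version
```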
To enable MPI support when you specify the image using the UDI environment variable, you have to add the following locations to the LD_LIBRARY_PATH environment variable and set MPICH_GNI_MALLOC_FALLBACK to 1 before you call aprun:
$ qsub ... -v UDI=centos:6.9 ...
# On a MOM node
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/pmi/5.0.11/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/libsci/18.12.1/GNU/5.1/x86_64/lib
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/mpt/7.7.4/gni/mpich-gnu-abi/5.1/lib
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/xpmem/0.1-2.0502.64982.7.27.gem/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/dmapp/7.0.1-1.0502.11080.8.74.gem/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/ugni/6.0-1.0502.10863.8.28.gem/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/udreg/2.3.2-1.0502.10518.2.17.gem/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/wlm_detect/1.0-1.0502.64649.2.2.gem/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/alps/5.2.4-2.0502.9774.31.12.gem/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/udiImage/modules/mpich/lib64
$ export LD_LIBRARY_PATH
$ export MPICH_GNI_MALLOC_FALLBACK=1
$ export CRAY_ROOTFS=SHIFTER
$ aprun -b -q -n 1 -- ./mpi-application
MPI + GPU in Shifter jobs
GPU and MPI support can be enabled in the same Shifter job by following the guidelines detailed in the GPU and MPI sections above. Note that CUDA-aware MPI in Shifter jobs currently does not work. We're investigating the issue and will notify our users and update this page if and when we find a resolution.
The shifter command executes the application provided on the command line (or the image entrypoint if no application is specified) in a container environment. Below is a list of its options and their descriptions.
| Short option | Long option | Description |
|---|---|---|
| -v | --verbose | Show verbose messages (when applicable). |
| -i | --image | Shifter image to use. |
| | --entrypoint | Command to be executed when you don't specify an application to be executed in the container environment. |
| -w | --workdir | Set working directory. |
| -E | --clearenv | Don't pass the environment of the MOM nodes to the compute nodes. Recommended. |
| -e | --env | Set an environment variable in the container environment. |
| | --env-file | Read environment variables from the specified file. Empty lines and lines starting with # are ignored. |
| | --module | Load only the specified Shifter modules (not to be confused with Blue Waters modules). |
The shifterimg command provides means for creating, manipulating, and querying Shifter images. Below is a list of its options and their descriptions.
| Short option | Long option | Description |
|---|---|---|
| -v | --verbose | Show verbose messages (when applicable). Frequently used with the pull subcommand. |
| -u | --user | Comma-separated list of users allowed to access a private image pulled from a private Docker Hub repository. |
| -g | --group | Comma-separated list of groups allowed to access a private image pulled from a private Docker Hub repository. |
Below is a list of shifterimg subcommands and their descriptions.

| Subcommand | Description |
|---|---|
| images | Show a list of Shifter images already available on the system that you have access to. |
| lookup | Get the ID of the specified image in the Shifter image database from the Shifter gateway. The returned ID prepended with id: can be used as an image specification. |
| pull | Pull the specified image from Docker Hub. |
| login | Log in to a private image repository on Docker Hub. |
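A typical shifterimg session combining the options and subcommands above might look like this sketch; the repository, image, and user names are placeholders, and the exact placement of -u relative to the subcommand may vary:

```shell
# Log in to Docker Hub, pull a private image restricted to two users,
# and list the images you now have access to
shifterimg login
shifterimg -v -u alice,bob pull myorg/private_image:tag
shifterimg images
```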
Generally speaking, in order to enable a feature in a Shifter container (e.g., MPI or GPU access), one has to:
- set environment variables,
- inject directories, and
- write scripts.
Shifter modules provide a way to apply all of the above modifications to all containers at the system and user levels.
For example, MPI support in Shifter 18 is implemented as a Shifter module called mpich, and GPU support as a module called gpu.
Currently, Blue Waters has the following Shifter modules set up:
| Module | Description |
|---|---|
| gpu | Enables GPU support. Default module. |
| mpich | Enables MPI support. Default module. |
| x11 | Provides access to Blue Waters' NVIDIA drivers. |
| gcc | Provides access to Blue Waters' gcc. |
| none | Disables all modules (use with care). |
If you perform such manipulations and they are the same across all of your jobs, you may contact us at firstname.lastname@example.org to request that they be ported into a new Shifter module.
Official documentation for Shifter modules can be found on the Shifter Read the Docs website.