Shifter 18


Shifter is a software solution that provides a way for HPC users to:

  • execute applications within software containers,
  • build applications on HPC systems using containers as a developer's toolbox.

Besides working with public and private images from Docker Hub, Shifter 18 also supports public and private local images in SquashFS format.
This guide describes how to use Shifter 18 on Blue Waters.

Shifter documentation
Official Shifter documentation can be found at:

Shifter Jobs & Generic Resource Request

On Blue Waters, Shifter is provided as a module called shifter. Unlike other modules, it can only be loaded on a MOM node in a Shifter job, which is a job that requests the shifter generic resource. This request can be provided either on the command line:

$ qsub -l gres=shifter ...

or in a job batch script in the form of a PBS directive:

#PBS -l gres=shifter

Once a job submitted with the gres=shifter request starts, you can load the shifter module:

$ module load shifter

and start using the tools it provides, namely:

  • shifter: executes an application in a container environment
  • shifterimg: an auxiliary command for working with images
Warning
Use of Shifter-mounted images is intended only on the compute nodes, not on the MOM nodes of a job. Please see the Executing Applications in Container Environments section for more information on executing applications with Shifter.
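For reference, a minimal interactive Shifter job might be started like this (the node and walltime requests below are only placeholders; adjust them for your allocation):

$ qsub -I -l gres=shifter -l nodes=1:ppn=32 -l walltime=00:30:00
# Once the interactive session starts on a MOM node:
$ module load shifter
$ shifterimg images   # list images already available on the system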

Let's now review the steps associated with using Docker and local (SquashFS) images with Shifter.

Working with Images from Docker Hub

To use a Docker image on Blue Waters, we need to:

1. Prepare a Docker image on a personal computer,
2. Push that image to Docker Hub,
3. Launch a Shifter job on Blue Waters,
4. Pull the image from Docker Hub to Blue Waters,
5. Execute the application from the image on the Blue Waters compute nodes.
1. Prepare Docker image.
Providing instructions on how to build Docker images is beyond the scope of this guide. There is, however, an important requirement related to this step that must be satisfied. Namely, when you prepare a Docker image for your application, make sure to use a base image whose glibc supports the version of the Linux kernel installed on Blue Waters, which is 3.0.101. This requirement is tricky to check automatically because it depends on the version of glibc and the --enable-kernel flag used when glibc was compiled. If the glibc provided by the operating system in the image does not support the version of the Linux kernel installed on Blue Waters, consider using a different base image. If this is not an option, contact us at help+bw@ncsa.illinois.edu.
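One rough way to check this requirement is to inspect the image's libc before pushing the image. A sketch only, assuming the file utility is present in the image and that libc lives in one of the usual locations:
$ sudo docker run --rm docker-username/image-name:tag /bin/sh -c \
      'file -L /lib64/libc.so.6 /lib/x86_64-linux-gnu/libc.so.6 2>/dev/null'
# Look for a string such as "for GNU/Linux 2.6.32" in the output; that minimum
# kernel version must not be newer than the Blue Waters kernel, 3.0.101.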
2. Push Docker image to Docker Hub.
Once the image is ready, you can push it to a public or private Docker Hub repository. Please verify repository privacy settings on Docker Hub before pushing an image to it.
$ sudo docker login  # to push a private image
$ sudo docker push docker-username/image-name:tag
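If the image was built under a purely local name, it may need to be tagged with your Docker Hub repository name first; something along these lines (all names here are placeholders):
$ sudo docker tag local-image-name:tag docker-username/image-name:tag
$ sudo docker push docker-username/image-name:tag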
3. Launch Shifter job.
Launch Shifter job as described in Shifter Jobs & Generic Resource Request section.
4. Pull image from Docker Hub.
We can pull an image from Docker Hub using the shifterimg pull command provided by the Shifter module:
$ module load shifter
$ shifterimg pull docker-username/image-name:tag
To pull a private image, we first need to authenticate to Docker Hub using the shifterimg login command:
$ shifterimg login
default username:<Docker Hub username>
default password:<Docker Hub password>
Now, we can pull images from our private repository on Docker Hub. To keep images private on Blue Waters, we have to specify the users who can access them when we pull the images. To do that, we can use either the --user or the --group option of the shifterimg command. These options restrict visibility of and access to the pulled private image to selected users or groups. For example:
$ shifterimg --user bw_user1,bw_user2,bw_user3 pull myrepo/myprivateimage1:latest
will pull the myprivateimage1:latest image from the myrepo repository on Docker Hub and restrict access to it to users bw_user1, bw_user2, and bw_user3. No other user on Blue Waters will be able to see or work with this image. Likewise, the --group option limits access to an image based on group membership:

$ shifterimg --group bw_group1,bw_group2 pull myrepo/myprivateimage2:latest
Here, only members of groups bw_group1 and bw_group2 will be able to access the myprivateimage2:latest image. Note that even if you use the --user or --group options when pulling public images from Docker Hub, those images will still be visible to and accessible by all Blue Waters users.
:latest... isn't the greatest!
Consider avoiding the :latest tag when you pull images from Docker Hub, because the image behind that tag changes over time. To ensure that your Shifter job runs the same way every time, use the most specific tag you can.
5. Execute your application in a container environment.
Now we're ready to execute our application on the compute nodes. There are a couple of ways to do that, and we discuss this step in detail in the Executing Applications in Container Environments section below.

Working with Local Images

Now let's review the steps associated with using local images in SquashFS format.

1. Prepare a SquashFS image on a personal computer or Blue Waters,
2. Transfer the image from a personal computer to Blue Waters,
3. Launch a Shifter job on Blue Waters,
4. Import the image into the Shifter database,
5. Execute the application in the image on the Blue Waters compute nodes.
1. Prepare SquashFS image.
Shifter 18 supports local images in SquashFS format. If your application and all of its dependencies live in a single directory, you can convert this directory to a SquashFS file with:
$ mksquashfs directory/ my_image.squashfs -no-xattrs -all-root
Make sure to use both the -no-xattrs and -all-root flags. The first instructs mksquashfs not to store extended attributes, and the second makes all files owned by root. If you don't use these flags, you will run into problems when trying to use the image on Blue Waters.
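To double-check the result before transferring it, you can list the archive contents without unpacking it, for example (assuming squashfs-tools is installed on the machine where you built the image):
$ unsquashfs -ll my_image.squashfs | head
# With -all-root applied, the listed entries should be owned by root/root.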
If you have a Docker image that you'd like to use on Blue Waters but don't want to use Docker Hub, you can convert it to a SquashFS image using the following commands:
$ docker container create --name my_container my_image:tag
$ docker export -o my_image.tar my_container
$ docker container rm my_container
$ mkdir my_image && tar -xf my_image.tar -C my_image
$ mksquashfs my_image/ my_image.squashfs -no-xattrs -all-root
$ rm -rf my_image/ my_image.tar  # optional: remove temporary files
2. Transfer SquashFS image to Blue Waters.
You can transfer your SquashFS image to Blue Waters using the same tools you use to transfer any other data. Refer to the corresponding Data Transfer page on the Blue Waters portal.
3. Launch Shifter job.
Launch Shifter job as described in Shifter Jobs & Generic Resource Request section.
4. Import SquashFS image to Shifter.
Importing SquashFS files does not require the shifter module, but it has to be performed from within a Shifter job because it relies on several services that such jobs enable on the MOM nodes.
First, make sure that the image is readable by the shifter user. You can achieve this with a couple of setfacl commands:
$ setfacl -m u:shifter:r-- dir1/dir2/dir3/image_file.squashfs
$ setfacl -m u:shifter:r-x dir1 dir1/dir2 dir1/dir2/dir3
where dir1 is the top-most folder you own and dir3 is the directory containing the image.
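You can confirm the resulting permissions with getfacl, for example:
$ getfacl dir1/dir2/dir3/image_file.squashfs
# The output should contain an entry similar to: user:shifter:r--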
Once that is done, you can import the image using the following curl call:
$ curl -H "authentication: $(munge -n)" \
       -d '{ "format" : "squashfs",
             "filepath" : "dir1/dir2/dir3/image_file.squashfs"
	   }' http://shifter:5000/api/doimport/bluewaters/custom/IMAGE_NAME:TAG/
The above call tells Shifter to import the specified SquashFS file as a public image under the name IMAGE_NAME:TAG. Once the import completes, you'll be able to find the image in the output of the shifterimg images command. You will also be able to look it up using the shifterimg lookup command. Note, however, that names of images imported this way have to be prefixed with custom:, so to look up the image IMAGE_NAME:TAG, you have to execute:
$ module load shifter
$ shifterimg lookup custom:IMAGE_NAME:TAG
To import a SquashFS file as a private image, all we have to do is add an "allowed_uids" field set to the list of UIDs of the users allowed to access the image. For example, to limit access to users with UIDs 12345 and 67890, we execute:
$ curl -H "authentication: $(munge -n)" \
       -d '{ "format" : "squashfs",
             "filepath" : "dir1/dir2/dir3/image_file.squashfs"
	     "allowed_uids" : "12345,67890"
	   }' http://shifter:5000/api/doimport/bluewaters/custom/IMAGE_NAME:TAG/
Similarly, we can limit access to the image to specific groups using the "allowed_gids" field.
In summary, the command to import your SquashFS file as a private image may look like this:
$ curl -H "authentication: $(munge -n)" \
       -d '{ "format" : "squashfs",
             "filepath" : "dir1/dir2/dir3/image_file.squashfs"
	     "allowed_uids" : "12345,67890"
	     "allowed_gids" : "54321,09876"
	  }' http://shifter:5000/api/doimport/bluewaters/custom/IMAGE_NAME:TAG/
5. Execute your application in a container environment.
Now we're ready to execute our application on the compute nodes. There are a couple of ways to do that, and we discuss this step in detail in the Executing Applications in Container Environments section below.

Executing Applications in Container Environments

Shifter provides two ways to set up a container environment on the Blue Waters compute nodes:

  1. Using the shifter command
  2. Setting the UDI environment variable to the name of the image when the Shifter job is submitted and then setting the CRAY_ROOTFS environment variable to SHIFTER inside the job.

Both of these methods have to be used from within a Shifter job, and we have to provide additional flags to the aprun command, namely:

-b (mandatory) to bypass the transfer of the shifter executable to the compute nodes
-N 1 (mandatory) to limit the number of shifter processes per compute node to one
-cc none (optional) to allow PE migration within assigned NUMA nodes

So, a typical aprun call in a Shifter job looks like this:

$ aprun -b -N 1 -cc none [other aprun options]
Executing Applications Using shifter command

To execute an application in a container environment using the shifter command, we have to (a complete batch-script sketch follows this list):

  1. Start a Shifter job
  2. Load the Shifter module: module load shifter
  3. Select which image to use and execute the application. Image selection can be done via:
    • --image= flag for the shifter command:
      $ aprun -b ... -- shifter --image=image_name:tag -- app.exe
    • SHIFTER environment variable:
      $ export SHIFTER=image_name:tag
      $ aprun -b ... -- shifter -- app.exe
    • SHIFTER_IMAGETYPE and SHIFTER_IMAGE environment variables:
      $ export SHIFTER_IMAGETYPE=docker
      $ export SHIFTER_IMAGE=image_name:tag
      $ aprun -b ... -- shifter -- app.exe
      Remember to always prefix names of images imported from SquashFS files with custom:.
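Putting these pieces together, a complete batch script for this approach might look like the following sketch (the resource requests, image name, and application are placeholders):

#!/bin/bash
#PBS -l nodes=1:ppn=32:xe
#PBS -l walltime=00:30:00
#PBS -l gres=shifter

cd $PBS_O_WORKDIR
module load shifter
# Run app.exe from the named image on the compute node
aprun -b -N 1 -cc none -- shifter --image=image_name:tag -- app.exe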
Executing Applications Using the UDI Variable

We can specify the image we'd like to use in our Shifter job by setting the UDI environment variable when we submit the job. We can do that either on the command line:

$ qsub ... -v UDI=image_name:tag
or as a PBS directive in a batch job script:
#PBS -v UDI=image_name:tag
When such a Shifter job starts, we can change the execution environment on the compute nodes to the one prepared in the container by setting the CRAY_ROOTFS environment variable to SHIFTER:
$ export CRAY_ROOTFS=SHIFTER
Now, we can execute our application with:
$ aprun -b ... -- app.exe
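For comparison, an equivalent batch script using the UDI approach might look like this sketch (again, the resource requests, image name, and application are placeholders):

#!/bin/bash
#PBS -l nodes=1:ppn=32:xe
#PBS -l walltime=00:30:00
#PBS -l gres=shifter
#PBS -v UDI=image_name:tag

cd $PBS_O_WORKDIR
export CRAY_ROOTFS=SHIFTER
# app.exe here comes from the container image that was set up when the job started
aprun -b -N 1 -cc none -- app.exe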

Remarks on running applications in container environments
  1. The CRAY_ROOTFS environment variable has to be unset when the shifter command is called.
  2. You can specify image type docker for images pulled from Docker Hub, e.g. --image=docker:centos:centos6.9. This is optional.
  3. You must specify image type custom for images imported from SquashFS files, e.g. --image=custom:my_private_image:tag.
  4. If you set the UDI environment variable, trying to use the shifter command will produce an error. When we set UDI, the container environment is set up on the compute nodes at the time the job starts. When we then call the shifter command, it attempts to set up the environment again and fails because it cannot overwrite the one that was set up previously.
  5. The shifter command allows using different images in the same Shifter job.
  6. The UDI approach ensures that the compute nodes have the container environment already set up when the job starts, which results in slightly better scalability across nodes for large jobs.

Mapping directories

Shifter images are read-only, so while input files can be included in the image, produced results have to be stored on the Blue Waters filesystem. For that purpose, Shifter makes the following directories accessible from the container environment:

  • /scratch
  • /projects
  • $HOME (/mnt/a/u/sciteam/<username>)

In addition to these automatic hooks, Shifter allows us to manually map directories from the Blue Waters filesystems to existing directories in images. There are two ways to specify directory mappings:

  1. when Shifter job is submitted to the queue:
    $ qsub ... -v UDI="image_name:tag -v /bw/dir:/container/dir"...
  2. as an argument to the shifter command:
    $ aprun -b ... -- shifter --volume=/bw/dir:/container/dir ...
Remarks on mapping directories
Note that one cannot:
  1. Overwrite volume mappings specified by Shifter
  2. Map to a directory whose name begins with /etc, /var, or /opt/udiImage. If we try to map to one of these restricted folders, we will receive an error message:
    $ aprun -b ... -- shifter ... --volume=/mnt/a/u/sciteam/<username>:/etc -- ...
    Invalid Volume Map: /mnt/a/u/sciteam/<username>:/etc, aborting! 1
    Failed to parse volume map options
    
  3. Map to /opt
  4. Use symbolic links when specifying the directory to be mapped. That is, the following mapping
    /u/sciteam/user:/path/in/image
    will fail; the correct syntax is
    /mnt/a/u/sciteam/user:/path/in/image
    Because of that, it might be a good idea to wrap the mapped directory in $(readlink -f ...):
    $(readlink -f /u/sciteam/user):/path/in/image

To map several directories, provide several -V or --volume requests:

--volume=/bw/dir1:/container/dir1 --volume=/bw/dir2:/container/dir2
or combine them into one request using quotation marks and separating the mappings with semicolons:
--volume="/bw/dir1:/container/dir1;/bw/dir2:/container/dir2"

Shifter supports several flags when mapping directories. For example, to mark a mapped directory in a container as read-only, all we have to do is append the :ro flag to the target directory name:

--volume=/bw/dir1:/container/dir1:ro

Shifter also supports the perNodeCache flag, which instructs Shifter to create an XFS file of the specified capacity on every node in the job and mount it at the location specified by the mapping in the image:

--volume=/bw/dir1:/container/dir1:perNodeCache=size=10GB
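For example, a full command that gives every node in the job its own per-node cache mounted at /container/dir1 (with the image name and application as placeholders) might be:

$ aprun -b -N 1 -cc none -- shifter --image=image_name:tag \
       --volume=/bw/dir1:/container/dir1:perNodeCache=size=10GB \
       -- app.exe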

Using GPUs in Shifter jobs

GPU support is automatically enabled when you use the shifter command to execute your application:

$ module load shifter
$ aprun -q -b -n 1 -- shifter --image=centos:6.9 -- nvidia-smi
Fri Jun 19 16:49:11 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.46                 Driver Version: 390.46                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20X          On   | 00000000:02:00.0 Off |                    0 |
| N/A   27C    P8    16W / 225W |      0MiB /  5700MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

You can also use GPUs in Shifter jobs when you specify the image using the UDI-based approach. However, in that case you have to append the following locations to PATH and LD_LIBRARY_PATH:

PATH:
  /opt/cray/nvidia/390.46-1_1.0502.2481.1.1.gem/bin
  /opt/nvidia/cudatoolkit9.1/9.1.85_3.10-1.0502.df1cc54.3.1/bin
LD_LIBRARY_PATH:
  /opt/cray/nvidia/390.46-1_1.0502.2481.1.1.gem/lib64
  /opt/nvidia/cudatoolkit9.1/9.1.85_3.10-1.0502.df1cc54.3.1/lib64

and set CUDA_VISIBLE_DEVICES to 0 before you call aprun:

$ qsub ... -v UDI=centos:6.9 ...
# On a MOM node
$ PATH=$PATH:/opt/cray/nvidia/390.46-1_1.0502.2481.1.1.gem/bin
$ PATH=$PATH:/opt/nvidia/cudatoolkit9.1/9.1.85_3.10-1.0502.df1cc54.3.1/bin
$ export PATH
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/nvidia/390.46-1_1.0502.2481.1.1.gem/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/nvidia/cudatoolkit9.1/9.1.85_3.10-1.0502.df1cc54.3.1/lib64
$ export LD_LIBRARY_PATH
$ export CUDA_VISIBLE_DEVICES=0
$ export CRAY_ROOTFS=SHIFTER
$ aprun -b -q -n 1 -- nvidia-smi
Fri Jun 19 19:27:10 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.46                 Driver Version: 390.46                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20X          On   | 00000000:02:00.0 Off |                    0 |
| N/A   25C    P8    17W / 225W |      0MiB /  5700MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

MPI in Shifter

Shifter allows applications in containers to use Blue Waters' high-speed interconnect for MPI-based communication. It does that by swapping the MPI libraries inside the container with the Cray MPI libraries on Blue Waters at run time.

When you use the shifter command to execute applications, this MPI support is enabled automatically. When you use the UDI-based approach, MPI support isn't enabled (because it requires changing the user's environment variables), so you have to perform all the required steps manually. We describe these steps below.

Now let's review the requirements for enabling MPI in Shifter.

1. Compatible MPI Implementation
The application in the image can be compiled against any MPI implementation that is part of the MPICH ABI Compatibility Initiative. Currently, the list of compatible MPI implementations includes:

  • MPICH v3.1
  • Intel® MPI Library v5.0
  • Cray MPT v7.0.0
  • MVAPICH2 2.0
  • Parastation MPI 5.1.7-1
  • IBM MPI v2.1
Note that Shifter on Blue Waters works with MPI implementations installed using package managers such as yum and apt-get.
2. Compatible Docker images
Swapping MPI libraries at run time requires GNU C library (glibc) version 2.17 or above. This means that you can use containers based on CentOS 7, Ubuntu 14.04, or newer. If your Docker image has an older glibc, you can compile a newer one from source and then build your application against it. Make sure that your application uses the correct glibc.
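A quick, hedged way to check both requirements for an existing image is to inspect glibc and the application's MPI linkage inside the container on your personal computer (the application path here is a placeholder):

$ sudo docker run --rm docker-username/image-name:tag /bin/sh -c \
      'ldd --version; ldd /path/to/mpi-application'
# ldd --version reports the image's glibc version (it should be 2.17 or newer);
# an MPICH ABI-compatible build of the application links against libmpi.so.12.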

To enable MPI support when you specify the image using the UDI-based approach, you have to add the following locations to the LD_LIBRARY_PATH environment variable:

LD_LIBRARY_PATH:
  /opt/cray/pmi/5.0.11/lib64
  /opt/cray/libsci/18.12.1/GNU/5.1/x86_64/lib
  /opt/cray/mpt/7.7.4/gni/mpich-gnu-abi/5.1/lib
  /opt/cray/xpmem/0.1-2.0502.64982.7.27.gem/lib64
  /opt/cray/dmapp/7.0.1-1.0502.11080.8.74.gem/lib64
  /opt/cray/ugni/6.0-1.0502.10863.8.28.gem/lib64
  /opt/cray/udreg/2.3.2-1.0502.10518.2.17.gem/lib64
  /opt/cray/wlm_detect/1.0-1.0502.64649.2.2.gem/lib64
  /opt/cray/alps/5.2.4-2.0502.9774.31.12.gem/lib64
  /opt/udiImage/modules/mpich/lib64

and set MPICH_GNI_MALLOC_FALLBACK to 1 before you call aprun:

$ qsub ... -v UDI=centos:6.9 ...
# On a MOM node
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/pmi/5.0.11/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/libsci/18.12.1/GNU/5.1/x86_64/lib
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/mpt/7.7.4/gni/mpich-gnu-abi/5.1/lib
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/xpmem/0.1-2.0502.64982.7.27.gem/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/dmapp/7.0.1-1.0502.11080.8.74.gem/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/ugni/6.0-1.0502.10863.8.28.gem/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/udreg/2.3.2-1.0502.10518.2.17.gem/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/wlm_detect/1.0-1.0502.64649.2.2.gem/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/alps/5.2.4-2.0502.9774.31.12.gem/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/udiImage/modules/mpich/lib64
$ export LD_LIBRARY_PATH
$ export MPICH_GNI_MALLOC_FALLBACK=1
$ export CRAY_ROOTFS=SHIFTER
$ aprun -b -q -n 1 -- ./mpi-application

MPI + GPU in Shifter jobs

GPU and MPI support can be enabled in the same Shifter job by following the guidelines detailed in the GPU and MPI sections above. Note that CUDA-aware MPI currently does not work in Shifter jobs. We're investigating the issue and will notify our users and update this page accordingly if and when we find a resolution.