
Shifter 18


Shifter is a software solution that provides a way for HPC users to:

  • execute applications within software containers,
  • build applications on HPC systems using containers as a developer's toolbox.

Besides working with public and private images from Docker Hub, Shifter 18 also supports public and private local images in SquashFS format.
This guide describes how to use Shifter 18 on Blue Waters.

Shifter documentation
Official Shifter documentation can be found on the Shifter Read the Docs website.

For the impatient

Here is an example of how to pull an image from Docker Hub and run a simple command in it:
$ qsub -I -q debug -l walltime=0:30:0 -l nodes=1:xk:ppn=16 -l gres=shifter 
$ module load shifter
$ shifterimg pull ubuntu:xenial
$ aprun -b -n 1 -N 1 -d 16 -cc none -- \
  shifter --image=ubuntu:xenial -V /dev/shm:/tmp -- \
  cat /etc/os-release
This starts an interactive session (-I) in the debug queue with the shifter generic resource requested via the gres setting. The aprun command then runs a simple command, using the -b option to load the executable from the image and mounting /dev/shm onto the /tmp directory in the container. The output looks like this:
$ aprun -b -n 1 -N 1 -d 16 -cc none -- \
  shifter --image=ubuntu:xenial -V /dev/shm:/tmp/ -- \
  cat /etc/os-release
mount: warning: ufs seems to be mounted read-only.
mount: warning: dsl/opt seems to be mounted read-only.
NAME="Ubuntu"
VERSION="16.04.7 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.7 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
Application 97009575 resources: utime ~0s, stime ~0s, Rss ~7128, inblocks ~53301, outblocks ~37737

Shifter Jobs & Generic Resource Request

On Blue Waters, Shifter is provided as a module called shifter. Unlike other modules, it can only be loaded on a MOM node within a Shifter job, that is, a job that requests the shifter generic resource. This request can be provided either on the command line:

$ qsub -l gres=shifter ...

or in a job batch script in the form of a PBS comment:

#PBS -l gres=shifter

Once a job submitted with the gres=shifter request starts, you can load the shifter module:

$ module load shifter

and start using the tools it provides, namely:

  • shifter: executes an application in a container environment,
  • shifterimg: an auxiliary command for working with images.
Warning
Use of Shifter images is intended only on compute nodes, not the MOM nodes of a job. Please see the Executing Applications in Container Environments section for more information on executing applications with Shifter.
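
For reference, a minimal Shifter batch script that ties these pieces together might look like the following sketch (the image name and resource request are illustrative):

#!/bin/bash
#PBS -l nodes=1:xk:ppn=16
#PBS -l walltime=00:30:00
#PBS -l gres=shifter

# Load the Shifter module on the MOM node
module load shifter

# Run one instance of the containerized command on the compute node
aprun -b -n 1 -N 1 -cc none -- \
  shifter --image=ubuntu:xenial -- cat /etc/os-release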

Let's now review the steps associated with using Docker and local (SquashFS) images with Shifter.

Working with Images from Docker Hub

To use a Docker image on Blue Waters, we need to:

1. Prepare Docker image on a personal computer,
2. Push that image to Docker Hub,
3. Launch a Shifter job on Blue Waters,
4. Pull the image from Docker Hub to Blue Waters,
5. Execute application from the image on the Blue Waters compute nodes.
1. Prepare Docker image.
Providing instructions on how to build Docker images is out of scope of this guide. We'll note, however, that the application in the image that you plan to execute on Blue Waters must be built against a glibc that is compatible with the Linux kernel installed on Blue Waters — Linux kernel version 3.0.101. Usually this means that the base image itself must provide such a glibc. While it is possible to limit glibc's support for older kernels when it is built, generally speaking, glibc 2.23 and older should be compatible with the Linux kernel installed on Blue Waters.
You can check the glibc version in your image using the ldd --version command:
$ sudo docker run --rm image:tag bash -c 'ldd --version'
If your application is linked against a glibc that does not support the version of the Linux kernel installed on Blue Waters, you will see a FATAL: kernel too old error message when trying to execute that application on Blue Waters. If you see this error message, consider using a different base image with glibc 2.23 or older. If this is not an option, contact us at help+bw@ncsa.illinois.edu.
2. Push Docker image to Docker Hub.
Once the image is ready, you can push it to a public or private Docker Hub repository. Please verify repository privacy settings on Docker Hub before pushing an image to it.
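If your local image is not yet named after your Docker Hub repository, you may first need to tag it accordingly (the image and repository names below are placeholders):
$ sudo docker tag image-name:tag docker-username/image-name:tag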
$ sudo docker login  # to push a private image
$ sudo docker push docker-username/image-name:tag
3. Launch Shifter job.
Launch Shifter job as described in the Shifter Jobs & Generic Resource Request section.
4. Pull image from Docker Hub.
You can now pull an image from Docker Hub using the shifterimg pull command provided by the Shifter module:
$ module load shifter
$ shifterimg pull docker-username/image-name:tag
To pull a private image, authenticate to Docker Hub using the shifterimg login command:
$ shifterimg login
default username:<Docker Hub username>
default password:<Docker Hub password>
Now you can pull an image from your private repository on Docker Hub. To keep a pulled image private on Blue Waters, you have to specify, at pull time, the users or groups who may access it. To do that, use the --user or --group option of the shifterimg command followed by a comma-separated list of Blue Waters users or groups; these options restrict visibility of and access to the pulled private image to the listed users or groups. For example:
$ shifterimg --user bw_user1,bw_user2,bw_user3 pull myrepo/myprivateimage1:latest
will pull the myprivateimage1:latest image from the myrepo repository on Docker Hub and restrict access to it to users bw_user1, bw_user2, and bw_user3; no other user will be able to see or work with this image. Likewise, the --group option limits access to an image based on group membership:

$ shifterimg --group bw_group1,bw_group2 pull myrepo/myprivateimage2:latest
In this example, users of both groups (bw_group1 and bw_group2) will have access to the myprivateimage2:latest image. Note that even if you use the --user or --group option when pulling a public image from Docker Hub, the image will still be visible to and accessible by all Blue Waters users.
:latest... isn't the greatest!
Docker images with the :latest tag change over time. We, therefore, recommend against using the :latest tag when pulling images from Docker Hub. To ensure that your Shifter job runs the same way every time, use the most specific tag you can.
5. Execute your application in a container environment.
Now we're ready to execute our application on the compute nodes. There are a couple of ways to do that, and we discuss this step in detail in the Executing Applications in Container Environments section below.

Working with Local Images

Now let's review the steps associated with using local images in SquashFS format.

1. Prepare a SquashFS image on a personal computer or Blue Waters,
2. Transfer the image from a personal computer to Blue Waters,
3. Launch a Shifter job on Blue Waters,
4. Import the image to Shifter database,
5. Execute application in the image on the Blue Waters compute nodes.
1. Prepare SquashFS image.
Shifter 18 supports local images in SquashFS format. If your application and all of its dependencies live in a single directory, you can convert this directory to a SquashFS file with:
$ mksquashfs directory/ my_image.squashfs -no-xattrs -all-root
Make sure to use both the -no-xattrs and -all-root flags. The first instructs mksquashfs not to store extended attributes, and the second makes all files owned by root. If you omit these flags, you will run into problems when trying to use the image on Blue Waters.
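To sanity-check the result, you can list the contents of the new image with the unsquashfs tool from squashfs-tools (assuming it is installed on the machine where you built the image):
$ unsquashfs -ls my_image.squashfs | head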
If you have a Docker image that you'd like to use on Blue Waters but don't want to use Docker Hub, you can convert it to a SquashFS image using the following commands:
$ docker container create --name my_container my_image:tag
$ docker export -o my_image.tar my_container
$ docker container rm my_container
$ mkdir my_image
$ tar -xf my_image.tar -C my_image
$ mksquashfs my_image/ my_image.squashfs -no-xattrs -all-root
$ rm -rf my_image/ my_image.tar  # optional: remove temporary files
2. Transfer SquashFS image to Blue Waters.
You can transfer your SquashFS image to Blue Waters using the same tools you use to transfer any other data. Refer to the corresponding Data Transfer page on the Blue Waters portal.
3. Launch Shifter job.
Launch Shifter job as described in the Shifter Jobs & Generic Resource Request section.
4. Import SquashFS image to Shifter.
Importing SquashFS files does not require the shifter module, but it has to be performed from within a Shifter job because it relies on several services that such jobs enable on MOM nodes. You can do it manually following the instructions below or with the help of the import_image.sh script provided by the Shifter module; see the Using import_image.sh script section.

First, copy your image to a temporary location for local images — /projects/system/shifter/local-images — and make sure it is readable by the shifter user:

$ cp image_file.squashfs /projects/system/shifter/local-images
$ setfacl -m u:shifter:r-- /projects/system/shifter/local-images/image_file.squashfs
Once that is done, you can import the image using the following curl call:
$ curl -H "authentication: $(munge -n)" \
       -d '{ "format" : "squashfs",
             "filepath" : "/projects/system/shifter/local-images/image_file.squashfs"
           }' http://shifter:5000/api/doimport/bluewaters/custom/IMAGE_NAME:TAG/
The above call tells Shifter to import the specified SquashFS file as a public image under the name IMAGE_NAME:TAG.
Once this step completes, you'll be able to find the image in the output of the shifterimg images command and look it up using the shifterimg lookup command. Note, however, that names of images imported this way have to be prefixed with custom:, so to look up an image called IMAGE_NAME:TAG, you have to execute:
$ shifterimg lookup custom:IMAGE_NAME:TAG
Once you verify the image has been imported, delete the SquashFS file from the temporary location:
$ /bin/rm /projects/system/shifter/local-images/image_file.squashfs
To import a SquashFS file as a private image, see the Private local images section below.
5. Execute your application in a container environment.
Now we're ready to execute our application on the compute nodes. There are a couple of ways to do that, and we discuss this step in detail in the Executing Applications in Container Environments section below.
Private local images

Shifter provides a mechanism to import a SquashFS file as a private image. All you have to do is add an "allowed_uids" data field populated with a comma-separated list of the UIDs of users who are allowed to access the image. For example, to limit access to the image to users with UIDs 12345 and 67890, execute:

$ curl -H "authentication: $(munge -n)" \
     -d '{ "format" : "squashfs",
           "filepath" : "/absolute/path/to/dir1/dir2/dir3/image_file.squashfs",
           "allowed_uids" : "12345,67890"
         }' http://shifter:5000/api/doimport/bluewaters/custom/IMAGE_NAME:TAG/

The user who imports the image is automatically added to the list of people authorized to access the image.

Similarly, we can limit access to the image to specific groups using the "allowed_gids" data field.
In summary, the command to import your SquashFS file as a private image may look like this:

$ curl -H "authentication: $(munge -n)" \
     -d '{ "format" : "squashfs",
           "filepath" : "/absolute/path/to/dir1/dir2/dir3/image_file.squashfs",
           "allowed_uids" : "12345,67890",
           "allowed_gids" : "54321,09876"
         }' http://shifter:5000/api/doimport/bluewaters/custom/IMAGE_NAME:TAG/
Primary image owners for private images

Shifter installed on Blue Waters has an experimental feature called "Primary image owners". Primary image owners are users who are allowed to make changes to the private image; see the Manipulating local image metadata section below. This feature uses a "primary_owners" data field which, like "allowed_uids" and "allowed_gids", can be specified when the image is imported:

$ curl -H "authentication: $(munge -n)" \
  -d '{ "format" : "squashfs",
        "filepath" : "/absolute/path/to/dir1/dir2/dir3/image_file.squashfs",
        "allowed_uids" : "12345,67890",
        "primary_owners" : "12345",
      }' http://shifter:5000/api/doimport/bluewaters/custom/IMAGE_NAME:TAG/

At the moment, this feature is implemented only in the Shifter gateway, because that is where image manipulation takes place. Therefore, when you use a private image with primary owners, you'll see a warning:

WARNING: Couldn't understand key: primary_owners
You may safely ignore this warning. We will submit a patch to Shifter to suppress this warning if this feature gains traction.
Manipulating local image metadata

When Shifter pulls images from Docker Hub, it automatically transfers their metadata into the created Shifter images. With local images, we have to transfer such metadata manually. Shifter recognizes the following metadata entries: environment variables, working directory, and image entrypoint. You can set them when you import the image into Shifter or after the fact. Currently, to update image metadata you must have access to the SquashFS file.

Working directory
Image working directory can be set or changed by adding the "workdir" data field to the image import call:
"workdir" : "/some/path/in/image"
Its value must be an absolute path to a directory in the image. Note that you have to use the --workdir (-w) option without an argument to make Shifter switch to the working directory set by the image:
$ aprun ... shifter --workdir ...
Environment variables
To set or change environment variables in the image, add the "env" data field to your image import call. Its value must be a comma-separated list of environment variable names and their values, separated by equals signs (=):
"env" : "VAR1=VALUE1a:VALUE1b,VAR2=VALUE2"
Entrypoint
To set or change the image entrypoint, add an "entrypoint" data field to your image import call. Note that unlike entrypoints in Docker images, which can specify an application along with its arguments, a Shifter entrypoint can point to an application only.

In summary, a call to update image metadata may look something like this:

$ curl -H "authentication: $(munge -n)" \
  -d '{ "format" : "squashfs",
        "filepath" : "/absolute/path/to/dir1/dir2/dir3/image_file.squashfs",
        "allowed_uids" : "12345,67890",
        "primary_owners" : "12345",
        "workdir" : "/some/path/in/image",
        "env" : "VAR1=VALUE1a:VALUE1b,VAR2=VALUE2",
        "entrypoint" : "/some/executable"
      }' http://shifter:5000/api/doimport/bluewaters/custom/IMAGE_NAME:TAG/
Using import_image.sh script

Shifter installed on Blue Waters provides a script called import_image.sh that you can use for importing images in SquashFS format. The general synopsis is:

$ import_image.sh [options] local-image.sqsh 
Short option Long option Description
-h --help Print a help message.
-v --verbose Enable verbose output
-n --name Name to assign to the Shifter image in Shifter image database.
-p --private Make uploaded image private. Requires -u and/or -g.
-u --user Comma-separated list of user names or UIDs allowed to access the image. Requires -p.
-g --group Comma-separated list of user groups or GIDs allowed to access the image. Requires -p.
-o --owner Comma-separated list of user names or UIDs allowed to modify the image (Primary Image Owners). Requires -p.
-w --workdir Working directory within the image. Same as WORKDIR in Docker images.
--entrypoint Image entry point (default application).
--env Comma-separated list of environment variables and their values to be set within the image.
Example: --env PATH=/usr/bin:/bin,VARIABLE1=VALUE1,...
-d --dry-run Show commands that would otherwise be executed.
-t --timeout Time (in seconds) to wait for the uploaded image to appear in Shifter database. Defaults to 60 seconds.
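
For example, importing a SquashFS file as a private image with this script might look like the following sketch (the user and image names are illustrative; run import_image.sh --help for the authoritative syntax):

$ module load shifter
$ import_image.sh --name my_image:v1 --private --user bw_user1,bw_user2 my_image.squashfs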

Executing Applications in Container Environments

Shifter provides two ways to set up container environment on the Blue Waters compute nodes:

  1. Using the shifter command
  2. By setting the UDI environment variable to the name of the image when the Shifter job is submitted, and then setting the CRAY_ROOTFS environment variable to SHIFTER inside the job.

Both of these methods have to be used from within a Shifter job, and we have to provide additional flags to the aprun command, namely:

-b (mandatory) to bypass transfer of the shifter executable to the compute nodes,
-N 1 (mandatory) to limit the number of shifter processes per compute node to 1,
-cc none (recommended) to allow PE migration within assigned NUMA nodes.

So, a typical aprun call in a Shifter job looks like this:

$ aprun -b -N 1 -cc none [other aprun options]
Executing Applications Using shifter command

To execute an application in a container environment using the shifter command, we have to:

  1. Start a Shifter job
  2. Load Shifter module: module load shifter
  3. Select which image to use and execute the application. Image selection can be done via:
    • --image= flag for the shifter command:
      $ aprun -b ... -- shifter --image=image_name:tag -- app.exe
    • SHIFTER environment variable:
      $ export SHIFTER=image_name:tag
      $ aprun -b ... -- shifter -- app.exe
    • SHIFTER_IMAGETYPE and SHIFTER_IMAGE environment variables:
      $ export SHIFTER_IMAGETYPE=docker
      $ export SHIFTER_IMAGE=image_name:tag
      $ aprun -b ... -- shifter -- app.exe
Remember to prefix names of images imported from SquashFS files with custom:.

Note that you can use an image ID instead of its name. This approach bypasses the Shifter gateway and is, therefore, the recommended way of specifying a Shifter image when running at scale. For more information, see the Scaling up Shifter applications section.
Executing application using UDI variable
We can specify the image we'd like to use in our Shifter job by setting the UDI environment variable when we submit the job. We can do that either on the command line:
$ qsub ... -v UDI=image_name:tag
or as a PBS directive in a batch job script:
#PBS -v UDI=image_name:tag
When such a Shifter job starts, we can change the execution environment on the compute nodes to the one we prepared in the container by setting the CRAY_ROOTFS environment variable to SHIFTER:
$ export CRAY_ROOTFS=SHIFTER
Now, we can execute our application without using the shifter module:
$ aprun -b ... -- app.exe
Remember to prefix names of images imported from SquashFS files with custom:.
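
Putting the UDI-based pieces together, a batch script might look like the following sketch (the image name and resource request are illustrative):

#!/bin/bash
#PBS -l nodes=1:xk:ppn=16
#PBS -l walltime=00:30:00
#PBS -l gres=shifter
#PBS -v UDI=ubuntu:xenial

# Switch the compute-node environment to the container prepared by Shifter
export CRAY_ROOTFS=SHIFTER

# No shifter command needed; the application runs directly in the container
aprun -b -n 1 -N 1 -cc none -- cat /etc/os-release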
qsub command line arguments vs PBS directives in the job script
Execution parameters specified on the command line as arguments to the qsub command take precedence over the PBS directives in the job script. Therefore, if you pass environment variables other than UDI on the command line using the -v flag, you must specify UDI on the command line too. Otherwise, the UDI variable specified in the job script will be ignored and not passed to the job. As a result, the Shifter environment on the compute nodes will not be set up and, once you set CRAY_ROOTFS to SHIFTER, all attempts to execute anything with aprun will fail with:
Exec /bin/pwd failed: chdir /starting/directory No such file or directory.
Remarks on running applications in container environments
  1. The shifter command cannot override an execution environment set with:
    • the CRAY_ROOTFS environment variable, or
    • the UDI environment variable (when you submit a Shifter job).
    When you call the shifter command in such a job, it attempts to set up a new execution environment on the compute nodes and fails because it cannot override the one set by CRAY_ROOTFS or UDI.
  2. docker is the default image type in Shifter. This means that you don't have to specify it when working with images pulled from Docker Hub. For example, any of the following four syntaxes to select the centos:centos6.9 image would work just fine:
    • --image=centos:centos6.9,
    • --image=docker:centos:centos6.9,
    • #PBS -v UDI=centos:centos6.9, and
    • #PBS -v UDI=docker:centos:centos6.9.
    The only time you do need to specify the docker image type is when your image itself is called docker, as in docker:docker:some-tag. Otherwise, Shifter would assume that docker: specifies the image type and use the rest of the string as the image name when, in fact, it is your image tag.
  3. When working with local images (images imported from SquashFS files), you must specify image type custom: --image=custom:my_private_image:tag or #PBS -v UDI=custom:my_private_image:tag.
  4. Shifter configures its execution environment by means of Shifter modules. Each Shifter module can create, unset, prepend, or append environment variables, bind mount directories, execute root- and user-level scripts, and more. Shifter on Blue Waters is configured to automatically load two Shifter modules: gpu and mpich. These modules, among other things, modify the LD_LIBRARY_PATH environment variable.
    If an application in your Shifter environment fails with a symbol lookup error, chances are it tries to use a library that it finds in one of the locations Shifter added to LD_LIBRARY_PATH, but which does not provide the symbols your application needs. You can work around this issue in the following ways:
    1. Do not load the offending Shifter module.
    You can do so by specifying the Shifter modules you would like to load. For example, if the gpu Shifter module causes your application to halt, you can tell Shifter to load only the mpich module (shifter --module=mpich) or not to load any modules at all (shifter --module=none).
    2. Remove the offending directory from LD_LIBRARY_PATH.
    You can do this by executing the following code inside of your Shifter environment:
    $ export LD_LIBRARY_PATH=$(echo -e ${LD_LIBRARY_PATH//\:/\\n} | grep -v /offending/path | tr '\n' ':')
    3. Use LD_PRELOAD.
    You can specify the right library your application should load using the LD_PRELOAD environment variable:
    $ LD_PRELOAD=/my/lib.so application.exe
    4. Use ELF runpaths (RPATHs).
    You may also choose to rebuild your application and embed necessary ELF runpaths (RPATHs) into your application executables and libraries. This will make your application independent of the LD_LIBRARY_PATH set by Shifter.
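    As an illustration of the last workaround (the compiler, library, and paths are hypothetical), a runpath can be embedded at link time via the linker's -rpath option:
    $ gcc app.c -o app.exe -L/path/to/mylibs -lmylib -Wl,-rpath,/path/to/mylibs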
Scaling up Shifter applications
If you plan to run an application in a Shifter environment at scale (thousands of nodes), consider the following recommendations:
Set Shifter environment using UDI environment variable.
This way, the Shifter execution environment on the compute nodes is set up when the job starts, and you won't waste compute time waiting for the shifter command to do that.
Use Shifter image ID instead of image name.
If you specify a Shifter image by its name (as in image_name:tag), every compute node running the job will try to contact the Shifter gateway to get the image ID from the Shifter image database. A large number of such requests may easily overwhelm the gateway and lead to significantly degraded performance or a complete shutdown. Therefore, we recommend using the image ID instead of the image name in all of your production jobs. Note that the image ID is what the shifterimg lookup command returns; see the shifterimg command section below.
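For example, you can look up the image ID once on the MOM node and then pass it to shifter with the id image type (the returned hash below is a placeholder):
$ shifterimg lookup centos:centos6.9
<image-id>
$ aprun -b -n 1 -N 1 -cc none -- shifter --image=id:<image-id> -- app.exe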

Mapping directories

Because Shifter images are read-only and produced results must be stored on the Blue Waters filesystem, Shifter automatically mounts the following directories into every container environment:

  • /scratch
  • /projects
  • $HOME (/mnt/a/u/sciteam/<username>)

This means that you can use the above locations in your Shifter job script as you would in a regular job script. In addition to these three locations, you can map any existing directory you have access to into any directory in the container environment. Unlike previous versions of Shifter, Shifter 18 allows mapping to directories that don't exist in the image: they are automatically created in the container runtime environment (though not in the image).

To mount a directory /bw/dir into the container environment as /container/dir when the Shifter image is selected using the UDI variable, add -v /bw/dir:/container/dir to the image specification and wrap the whole value in double quotes:

-v UDI="image_name:tag -v /bw/dir:/container/dir"
Note that you can use this syntax on the command line (as an argument to the qsub command) and in a job submission script (as a #PBS comment).

To apply the same directory mapping using the shifter module/command, use the --volume= (or -V) option:

$ aprun -b ... -- shifter --volume=/bw/dir:/container/dir ...
or
$ aprun -b ... -- shifter -V /bw/dir:/container/dir ...
Mapping multiple directories

To map several directories into the container runtime environment when the Shifter image is selected using the UDI variable, specify the directory mappings one after another:

$ qsub ... -v UDI="image_name:tag -v /bw/dir:/container/dir -v /bw/dir2:/container/dir2"...
or, as a PBS comment in a job submission script:
#PBS -v UDI="image_name:tag -v /bw/dir:/container/dir -v /bw/dir2:/container/dir2"...
You can also use a single -v option and separate mappings with semicolons:
#PBS -v UDI="image_name:tag -v /bw/dir:/container/dir;/bw/dir2:/container/dir2"...

And again, to apply the same directory mapping using the shifter module/command, use multiple --volume= (or -V) options:

... shifter --volume=/bw/dir1:/container/dir1 --volume=/bw/dir2:/container/dir2
or
... shifter -V /bw/dir1:/container/dir1 -V /bw/dir2:/container/dir2

You can use a single --volume or -V option and combine several directory mappings into one by wrapping them in quotation marks and separating them with semicolons:

--volume="/bw/dir1:/container/dir1;/bw/dir2:/container/dir2"
Remarks on mapping directories
Shifter imposes some restrictions on user-defined directory mappings. In particular, one can not:
  1. Overwrite volume mappings specified by Shifter.
  2. Mount a directory to a folder whose name begins with: /etc, /var, etc, var, /opt/udiImage, or opt/udiImage. If you try to mount a directory to one of the above folders, Shifter will emit a Failed to parse volume map options error:
    $ aprun -b ... -- shifter ... --volume=/mnt/a/u/sciteam/<username>:/etc -- ...
    Invalid Volume Map: /mnt/a/u/sciteam/<username>:/etc, aborting! 1
    Failed to parse volume map options
    
  3. Mount a directory to /opt or opt. This means that while we can't mount a directory to /opt and /opt/udiImage folders, we can mount directories to /opt/<dir>, provided that <dir> is not udiImage.
  4. Use symbolic links when specifying the directory to be mapped. For example, the /u/sciteam/user:/path/in/image mapping will fail, and the correct syntax is /mnt/a/u/sciteam/user:/path/in/image. Because of that, it might be a good idea to wrap the mapped directory in $(readlink -f ...):
    $(readlink -f /u/sciteam/user):/path/in/image

Note that Shifter does allow mounting a directory at /var/tmp.

Directory mapping flags

When we map directories into the container environment, we can give them certain properties by adding extra flags after the mapping specification in the form of :flag. The current version of Shifter on Blue Waters supports two flags:

The :ro flag
--volume=/bw/dir:/container/dir:ro
This flag marks the directory mounted in the container environment (/container/dir) as read-only; any attempt to write to it will result in a Read-only file system error.
The :perNodeCache flag
--volume=/bw/dir:/container/dir:perNodeCache=size=10G
This flag instructs Shifter to create an XFS file of the specified size on every compute node and mount it at the location specified by the mapping (/container/dir). The source directory (/bw/dir) is required but ignored. Cache size can be specified as a number, in which case it is interpreted as the number of bytes. Alternatively, you can use a one-letter suffix after the number to specify the units: b for bytes, k for kibibytes (2^10 bytes), m for mebibytes (2^20 bytes), g for gibibytes (2^30 bytes), t for tebibytes (2^40 bytes), p for pebibytes (2^50 bytes), and e for exbibytes (2^60 bytes). You can use uppercase letters instead (B, K, M...). Characters after the first suffix character are ignored, so all of the below mappings create a 10-gibibyte cache:
:perNodeCache=size=10G
:perNodeCache=size=10GB
:perNodeCache=size=10Gibabyte
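For example, the following call creates a 10-gibibyte scratch area at /cache on every compute node (the source directory and image name are illustrative; recall that the source directory is required but ignored):
$ aprun -b -n 1 -N 1 -cc none -- \
  shifter --image=ubuntu:xenial \
  -V /scratch/sciteam/<username>:/cache:perNodeCache=size=10G -- \
  df -h /cache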
Accessing compute nodes running Shifter jobs via SSH

Just like with any other application, you might need to interact with the application running in a Shifter environment for debugging, monitoring, or other purposes. To enable such interactions, Shifter allows users to log in to compute nodes that are part of its pool via the standard ssh command line tool. There are several requirements in order to make use of this feature:

1. Specify UDI as a PBS directive.
To allow users to log in to its compute nodes, Shifter can start up SSH daemons. The daemons on the compute nodes can be launched only by the prologue script, which is executed when the job starts. Therefore, in order to be able to log in to the compute nodes of a running Shifter job, it is necessary to specify UDI as a PBS directive.
2. Prepare special SSH key pair.
On startup, the SSH daemons enabled by Shifter look for a private SSH key in $HOME/.shifter and wait for a connection on port 1204 authenticated with this key. To prepare such a key pair, execute:
$ mkdir -p ~/.shifter
$ ssh-keygen -t rsa -f ~/.shifter/id_rsa -N ''

Once the above two steps are completed, you can log into the compute nodes using:

$ ssh -p 1204 -i ~/.shifter/id_rsa -o StrictHostKeyChecking=no \
-o UserKnownHostsFile=/dev/null -o LogLevel=error nodename

It is advisable to save all the above options into a configuration file. To do that, execute:

$ cat <<EOF > ~/.shifter/config
Host *
    Port 1204
    IdentityFile ~/.shifter/id_rsa
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
    LogLevel error
EOF

Now, we can log in to the compute nodes with a simple:

$ ssh -F ~/.shifter/config nodename

To log in to a remote machine using the ssh command, we have to specify the remote machine's network name. To find the names of the compute nodes assigned to the Shifter job, execute the following command on the MOM node before setting the CRAY_ROOTFS environment variable:

$ aprun -b -N1 -cc none -n$PBS_NUM_NODES -- hostname

You should see a list of names of the form: nidXXXXX, where XXXXX is a five-digit number. Use these to connect to the compute nodes:

$ ssh -F ~/.shifter/config nidXXXXX

Make sure, however, that you do not accidentally copy the network name of the MOM node where you execute all aprun commands.

Important
ssh will fail with a Permission denied error if your login shell does not exist in the container or is not listed in the container's /etc/shells file.
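To check which shells a container provides, you can inspect its /etc/shells (the image name is illustrative):
$ aprun -b -n 1 -N 1 -cc none -- shifter --image=ubuntu:xenial -- cat /etc/shells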

Using GPUs in Shifter jobs

GPU support is automatically enabled when you use the shifter command to execute your application:

$ module load shifter
$ module load cudatoolkit
$ aprun -q -b -n 1 -- shifter --image=centos:6.9 -- nvidia-smi
Fri Jun 19 16:49:11 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.46                 Driver Version: 390.46                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20X          On   | 00000000:02:00.0 Off |                    0 |
| N/A   27C    P8    16W / 225W |      0MiB /  5700MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

You can also use GPUs in Shifter jobs when you specify the image using the UDI-based approach. However, in that case you have to append the following locations to PATH and LD_LIBRARY_PATH:

PATH:
  /opt/cray/nvidia/390.46-1_1.0502.2481.1.1.gem/bin
  /opt/nvidia/cudatoolkit9.1/9.1.85_3.10-1.0502.df1cc54.3.1/bin

LD_LIBRARY_PATH:
  /opt/cray/nvidia/390.46-1_1.0502.2481.1.1.gem/lib64
  /opt/nvidia/cudatoolkit9.1/9.1.85_3.10-1.0502.df1cc54.3.1/lib64

and set CUDA_VISIBLE_DEVICES to 0 before you call aprun:

$ qsub ... -v UDI=centos:6.9 ...
# On a MOM node
$ PATH=$PATH:/opt/cray/nvidia/390.46-1_1.0502.2481.1.1.gem/bin
$ PATH=$PATH:/opt/nvidia/cudatoolkit9.1/9.1.85_3.10-1.0502.df1cc54.3.1/bin
$ export PATH
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/nvidia/390.46-1_1.0502.2481.1.1.gem/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/nvidia/cudatoolkit9.1/9.1.85_3.10-1.0502.df1cc54.3.1/lib64
$ export LD_LIBRARY_PATH
$ export CUDA_VISIBLE_DEVICES=0
$ export CRAY_ROOTFS=SHIFTER
$ aprun -b -q -n 1 -- nvidia-smi
Fri Jun 19 19:27:10 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.46                 Driver Version: 390.46                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20X          On   | 00000000:02:00.0 Off |                    0 |
| N/A   25C    P8    17W / 225W |      0MiB /  5700MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

MPI in Shifter

Shifter allows applications in containers to use Blue Waters' high-speed interconnect for MPI-based communication. It does that by swapping the MPI libraries inside the container with the Cray MPI libraries on Blue Waters at run time.

When you use the shifter command to execute applications, MPI support is enabled automatically. When you use the UDI-based approach, MPI support isn't enabled (because it requires changing the user's environment variables), so you have to do all the required steps manually. We describe these steps below.

Now let's review the requirements for enabling MPI in Shifter.

1. Compatible MPI Implementation
The application in the image can be compiled against any MPI implementation that is part of the MPICH ABI Compatibility Initiative. Currently, the list of compatible MPI implementations includes:

  • MPICH v3.1
  • Intel® MPI Library v5.0
  • Cray MPT v7.0.0
  • MVAPICH2 2.0
  • Parastation MPI 5.1.7-1
  • IBM MPI v2.1
Note that Shifter on Blue Waters works with MPI implementations installed using package managers such as yum and apt-get.
2. Compatible Docker images
Swapping MPI libraries at run time requires GNU C library (glibc) version 2.17 or above. This means that you can use containers based on CentOS 7, Ubuntu 14.04, or newer. If your Docker image has an older glibc, you can compile a newer one from source and then build your application against it. Make sure that your application uses the correct glibc.

To enable MPI support when you specify the image using the UDI-based approach, you have to add the following locations to the LD_LIBRARY_PATH environment variable:

LD_LIBRARY_PATH:
  /opt/cray/pmi/5.0.11/lib64
  /opt/cray/libsci/18.12.1/GNU/5.1/x86_64/lib
  /opt/cray/mpt/7.7.4/gni/mpich-gnu-abi/5.1/lib
  /opt/cray/xpmem/0.1-2.0502.64982.7.27.gem/lib64
  /opt/cray/dmapp/7.0.1-1.0502.11080.8.74.gem/lib64
  /opt/cray/ugni/6.0-1.0502.10863.8.28.gem/lib64
  /opt/cray/udreg/2.3.2-1.0502.10518.2.17.gem/lib64
  /opt/cray/wlm_detect/1.0-1.0502.64649.2.2.gem/lib64
  /opt/cray/alps/5.2.4-2.0502.9774.31.12.gem/lib64
  /opt/udiImage/modules/mpich/lib64

and set MPICH_GNI_MALLOC_FALLBACK to 1 before you call aprun:

$ qsub ... -v UDI=centos:6.9 ...
# On a MOM node
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/pmi/5.0.11/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/libsci/18.12.1/GNU/5.1/x86_64/lib
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/mpt/7.7.4/gni/mpich-gnu-abi/5.1/lib
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/xpmem/0.1-2.0502.64982.7.27.gem/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/dmapp/7.0.1-1.0502.11080.8.74.gem/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/ugni/6.0-1.0502.10863.8.28.gem/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/udreg/2.3.2-1.0502.10518.2.17.gem/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/wlm_detect/1.0-1.0502.64649.2.2.gem/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cray/alps/5.2.4-2.0502.9774.31.12.gem/lib64
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/udiImage/modules/mpich/lib64
$ export LD_LIBRARY_PATH
$ export MPICH_GNI_MALLOC_FALLBACK=1
$ export CRAY_ROOTFS=SHIFTER
$ aprun -b -q -n 1 -- ./mpi-application

MPI + GPU in Shifter jobs

GPU and MPI support can be enabled in the same Shifter job by following the guidelines detailed in the GPU and MPI sections above. Note that CUDA-aware MPI currently does not work in Shifter jobs. We're investigating the issue and will notify our users and update this page accordingly if and when we find a resolution.

shifter command

The shifter command executes the application provided on the command line (or the image entrypoint if no application is specified) in a container environment. Below is the list of its options and their descriptions.
Short option Long option Description
-h --help Show shifter command help page (usage information).
-v --verbose Show verbose messages (when applicable).
-i --image Shifter image to use. Examples:
  • -i <imageType>:<imageTag>
  • --image=<imageType>:<imageTag>
Equals sign (=) is optional. Allowed image types: docker, custom, id.
--entry --entrypoint Command to be executed when you don't specify an application to be executed in the container environment. Examples:
  • --entry
  • --entrypoint=/executable
When redefining image entry point, equals sign (=) is mandatory.
-w --workdir Set working directory. Examples:
  • -w
  • --workdir=/some/path
Without an argument, Shifter changes working directory to the working directory set by the image. With an argument, Shifter changes working directory to the specified directory. When redefining image working directory, equals sign (=) is mandatory.
-E --clearenv Don't pass environment of the MOM nodes to the compute nodes. Recommended.
-e --env Set an environment variable in the container environment. Examples:
  • -e PATH=/usr/bin:/bin
  • --env=VAR=/usr/bin:/bin
Can be specified multiple times. Equals sign (=) is optional.
--env-file Read environment variables from the specified file. Example: --env-file=/path/to/env/file
Empty lines and lines starting with # are ignored. Can be specified multiple times. Equals sign (=) is optional.
-V --volume Mounts /path/to/bind as /mnt/in/image in a container. Examples:
  • -V /path/to/bind:/mnt/in/image
  • --volume=/path:/mnt;/path2:/mnt2
Equals sign (=) is optional. See Mapping directories for more info.
-m --module Load only specified Shifter modules (not to be confused with Blue Waters modules). Examples:
  • -m module
  • --module=module1,module2
Available modules: gpu*, mpich*, x11, gcc, and none. Modules marked with an asterisk are loaded by default. List multiple modules by separating their names with commas. Equals sign (=) is optional. See Shifter modules section for more info.
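
As an illustration of how these options combine, a shifter invocation might look like the following sketch (the image name, mapped path, and application are placeholders):

$ aprun -b -n 1 -N 1 -cc none -- \
  shifter --image=custom:my_image:v1 -E --module=mpich \
  -V /projects/my_project/data:/data -- \
  /opt/app/run.sh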

shifterimg command

The shifterimg command provides means for creating, manipulating, and querying Shifter images. Below is the list of its options and their descriptions.
Short option Long option Description
-h --help Show shifterimg command help page (usage information).
-v --verbose Show verbose messages (when applicable). Frequently used with lookup subcommand.
-u --user Comma-separated list of users allowed to access private image pulled from a private Docker Hub repository.
-g --group Comma-separated list of groups allowed to access private image pulled from a private Docker Hub repository.

Below is the list of shifterimg subcommands and their descriptions.

Subcommand Description
images Show a list of Shifter images already available on the system that you have access to.
Example: $ shifterimg images
lookup Get the ID of the specified image in the Shifter image database from the Shifter gateway. The returned ID, prepended with id:, can be used instead of image_type:image_name in a Shifter job that does not use the UDI environment variable. This approach bypasses the Shifter gateway, which is important when running at scale because otherwise every compute node running the shifter command contacts the gateway to get this ID, which, in turn, may overwhelm it.
Example: $ shifterimg -v lookup centos:centos6.9
pull Pull specified image from Docker Hub.
Example: $ shifterimg pull centos:centos6.9
login Login to a private image repository on Docker Hub.
Example: $ shifterimg login

Shifter modules

Generally speaking, in order to enable a feature in a Shifter container (e.g., MPI or GPU access), one has to:

  1. set environment variables,
  2. inject directories, and
  3. write scripts.

Shifter modules provide a way to apply all of the above modifications to all containers at the system and user levels. For example, MPI support in Shifter 18 is implemented as a Shifter module called mpich, and GPU support — as a module called gpu.

Currently, Blue Waters has the following Shifter modules set up: gpu, mpich, x11, gcc, and none.

gpu Enables GPU support. Default module.
mpich Enables MPI support. Default module.
x11 Provides access to Blue Waters' NVIDIA drivers.
gcc Provides access to Blue Waters' gcc/6.3.0.
none Disables all modules (use with care).

If you find yourself performing the same set of such modifications in all of your jobs, you may contact us at help+bw@ncsa.illinois.edu to request that they be ported into a new Shifter module.

Official documentation for Shifter modules can be found on the Shifter Read the Docs website.