
Using Hydro for Research


1. Purpose

The Hydro cluster combines a current OS and software stack, 256 GB of memory per node, 40 Gb/s WAN bandwidth, and direct high-performance access to the Blue Waters home, project, and scratch filesystems, providing several new capabilities for Blue Waters users:

  • Incorporate software that cannot run on the Cray nodes into scientific workflows without the need to move data off of Blue Waters.
  • Incorporate calculations that require more than 128 GB of memory per node into scientific workflows without the need to move data off of Blue Waters.
  • Efficiently import/export data to/from external storage services that are not supported by Globus Online.

The Hydro cluster is intended only to support workflow components that require relatively few node hours but cannot run on the Blue Waters Cray nodes. Due to the small size of the cluster, a reasonable effort should be made to enable the entire workflow in the Cray environment. Access to the Hydro cluster may be restricted to NGA-related projects with a clear need for the resource.

2. Quick Start Guide to Hydro

This information is for users who are adept at using BW and are only interested in the basic workflow.

1.   Getting Access - Limited to BW Users who need access to Hydro

2.   Log in to Hydro - example:  ssh hydro

3.   Compile Code - example:  mpicc -o foo.exe foo.c

4.   Run Code - example:  srun -n 1 ./foo.exe
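
Taken together, a minimal first session might look like the following sketch (foo.c is a placeholder MPI source file, and the default GCC/OpenMPI environment is assumed):

    ssh hydro                  # log in to Hydro
    mpicc -o foo.exe foo.c     # compile an MPI program with the compiler wrapper
    srun -n 1 ./foo.exe        # run a quick single-task test through Slurm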

3. How Hydro is different from BW

  • Hydro does not use the Blue Waters Cray interconnect
  • Hydro is a standard Linux cluster that uses commodity hardware and InfiniBand
  • Hydro runs a separate and different scheduler (Slurm) than Blue Waters
  • Software Environment Modules are not consistent between Hydro and Blue Waters
  • Hydro does not use the Blue Waters allocation system, and usage is not charged
  • There are no GPUs on Hydro

4. System Description

Hardware

  • 1 login node and 41 compute nodes
  • Dell PowerEdge R720
  • Dual-socket Intel Xeon E5-2690 (8 cores per socket, 2.90 GHz, 20 MB cache); 16 cores per node, hyper-threading disabled
  • 256 GB of memory per node
  • FDR 56 Gb/s InfiniBand (fat-tree topology, ~2:1 oversubscription)
  • There are no GPUs on Hydro

Software

  • CentOS 8.3
  • Kernel 4.18.0
  • Software
    • Software is currently installed using EasyBuild. Spack will also be made available for software installation.
    • A complete list of installed software can be generated by running the command module avail on the Hydro login node.
    • A sample of select packages:
      • OpenMPI 
      • FFTW
      • Python
        • Tensorflow
      • GCC 10.2
      • R
      • GDAL

Storage

5. Level of Expertise Expected for Blue Waters Hydro Users

Most users of systems like Blue Waters have experience with other large high-performance computer systems.  The instructions on this portal generally assume that the reader knows how to use a Unix-style command line, edit files, run (and modify) Makefiles to build code, write scripts, and submit jobs to a batch queue system.  There are some things that work slightly differently on the Cray XE system than other systems; the portal documentation covers those in detail, but we assume that you know the basics already.

If you're not at that level yet (if you're unfamiliar with things like ssh, emacs, vi, jpico, qsub, make, top) then you'll need to gain some knowledge before you can use Blue Waters effectively.  Here are a few links to resources that will teach you some of the basics about Unix command line tools and working on a high-performance computing system:

Access and Policy

Access to the Hydro cluster is limited to users of allocated Blue Waters projects and is not a separately allocated resource. 

If you are part of an allocated project on Blue Waters and would like access to the Hydro cluster please send email to help+bw@ncsa.illinois.edu with a justification for your need to use the cluster.

For allocations on Blue Waters please see the Allocations page.

Logging In

Connect to Blue Waters via the external login hosts at bw.ncsa.illinois.edu using ssh with your NCSA Duo passcode or a push response from your smartphone (see instructions below).

  • For help activating your NCSA Duo account, reference this page.
  • To check if your NCSA Duo is working properly, visit here. Depending on the choice you make there, you should receive a pass code or a push from Duo.

Open a command prompt (Run command on Windows):

Setting Up the Environment 

1.  Shells and Modules

The default shell is /bin/bash.  You can change it by sending a request via email to help+bw@ncsa.illinois.edu.

The user environment is controlled using the modules environment management system. Modules may be loaded, unloaded, or swapped either on a command line or in your $HOME/.bashrc (.cshrc for csh ) shell startup file.

The command "module avail | more" will display the avail modules on the system one page at a time.

The module command is a user interface to the Lmod package. The Lmod package provides for the dynamic modification of the user’s environment via modulefiles (a modulefile contains the information needed to configure the shell for an application). Modules are independent of the user’s shell, so both tcsh and bash users can use the same commands to change the environment.

Lmod User Guide

Useful module commands:

  Command                                   Description
  module avail                              List all available modules
  module list                               List currently loaded modules
  module help modulefile                    Help on module modulefile
  module display modulefile                 Display information about modulefile
  module load modulefile                    Load modulefile into the current shell environment
  module unload modulefile                  Remove modulefile from the current shell environment
  module swap modulefile1 modulefile2       Unload modulefile1 and load modulefile2
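
For example, a typical session might look like the following (GCC/9.3.0 and GCC/10.2.0 are shown because both appear in the Programming Environment section; run module avail to see what is actually installed):

    module avail                         # see what is installed
    module list                          # see what is currently loaded
    module load GCC/9.3.0                # load a specific compiler version
    module swap GCC/9.3.0 GCC/10.2.0     # switch back to the default GCC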

To include a particular software stack in your default environment on the Hydro login and compute nodes:

Log in to the Hydro login node and adjust your modulefile stack until you are satisfied, then run module save. This creates a ~/.lmod.d/default file, which is loaded on the Hydro login and compute nodes at your next login or job execution.

Useful user-defined module collections:

  Command                                   Description
  module save                               Save the current modulefile stack to ~/.lmod.d/default
  module save collection_name               Save the current modulefile stack to ~/.lmod.d/collection_name
  module restore                            Load ~/.lmod.d/default if it exists, or the system default
  module restore collection_name            Load your ~/.lmod.d/collection_name
  module reset                              Reset your modulefiles to the system default
  module disable collection_name            Disable collection_name by renaming it to collection_name~
  module savelist                           List all your ~/.lmod.d collections
  module describe collection_name           List the modulefiles in collection_name
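
For example, to capture a working environment as a named collection and restore it later (a sketch; the module names are illustrative):

    module load GCC/9.3.0 OpenMPI      # build up the environment you want
    module save mympi                  # save it as ~/.lmod.d/mympi
    module reset                       # return to the system default
    module restore mympi               # reload the saved collection
    module savelist                    # confirm which collections exist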

2. Home Directory Permissions

By default, user home directories and /scratch directories are closed (permissions 700) with a parent directory setting that prevents users from opening up the permissions. See the File and Directory Access Control List  page (https://bluewaters.ncsa.illinois.edu/facl) for Blue Waters file system policies.  The /projects file system is designed as common space for your group; if you want a space that all your group members can access, that's a good place for it.  As always, your space on the /scratch file system is the best place for job inputs and outputs.

3. Programming Environment

The GNU compilers (GCC) version 10.2.0 are in the default user environment. Version 9.3.0 is also available; load it with the command:

module load GCC/9.3.0

Compiling

To compile MPI code, use the mpicc, mpiCC, or mpif90 compiler wrappers to automatically include the OpenMPI libraries.

For example:
mpicc -o mpi_hello mpi_hello.c

If the code also uses OpenMP, include the -fopenmp flag:
mpicc -o omp_mpi_hello omp_mpi_hello.c -fopenmp
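
Putting these pieces together, a hedged compile-and-test sketch (the exact module names may differ; check module avail):

    module load GCC/10.2.0 OpenMPI                       # compiler and MPI stack (names illustrative)
    export OMP_NUM_THREADS=2                             # threads per MPI task for the OpenMP code
    mpicc -o omp_mpi_hello omp_mpi_hello.c -fopenmp      # build the hybrid MPI/OpenMP code
    srun --ntasks=2 --cpus-per-task=2 ./omp_mpi_hello    # quick smoke test via Slurm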

Job Submission

1. Running Batch Jobs

User access to the compute nodes for running jobs is available via a batch job. Hydro uses the Slurm Workload Manager for running batch jobs. See the sbatch section for details on batch job submission.

Please be aware that the interactive nodes are a shared resource for all users of the system; their use should be limited to editing, compiling and building your programs, and to short, non-intensive runs.

Note: User processes running on the interactive nodes are killed automatically if they accrue more than 30 minutes of CPU time or if more than 4 identical processes owned by the same user are running concurrently.

An interactive batch job provides a way to get interactive access to a compute node via a batch job. See the srun or salloc section for information on how to run an interactive job on the compute nodes. Also, a very short time test queue provides quick turnaround time for debugging purposes.

To ensure the health of the batch system and scheduler, users should refrain from having more than 1,000 batch jobs in the queues at any one time.

There is currently 1 partition/queue named normal. The normal partition's default wallclock time is 4 hours with a limit of 7 days. Compute nodes are not shared between users.

sbatch

Batch jobs are submitted through a job script using the sbatch command. Job scripts generally start with a series of SLURM directives that describe the job's requirements (number of nodes, wall time, etc.) to the batch system/scheduler. SLURM directives can also be specified as options on the sbatch command line; command-line options take precedence over those in the script. The rest of the batch script consists of user commands.

The syntax for sbatch is:

sbatch [list of sbatch options] script_name

The main sbatch options are listed below. Refer to the sbatch man page for the complete list of options.

  • The common resource options are:
    --time=time

    time = maximum wall clock time (d-hh:mm:ss) [default: the maximum limit of the partition submitted to]

    --nodes=n

    --ntasks=p    Total number of cores for the batch job

    --ntasks-per-node=p    Number of cores per node (same as ppn under PBS)

    n = number of 16-core nodes [default: 1 node]
    p = how many cores (ntasks) per job or per node (ntasks-per-node) to use (1 through 16) [default: 1 core]

    Examples:
    --time=00:30:00
    --nodes=2
    --ntasks=32

    or

    --time=00:30:00
    --nodes=2
    --ntasks-per-node=16
     

    Memory needs: each compute node has 256 GB of memory. The --mem and --mem-per-cpu values below are in MB.

    Example:
    --time=00:30:00
    --nodes=2
    --ntasks=32
    --mem=118000

    or

    --time=00:30:00
    --nodes=2
    --ntasks-per-node=16
    --mem-per-cpu=7375
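
    As noted above, these options may also be supplied directly on the sbatch command line, where they override the corresponding #SBATCH directives in the script. For example (myjob.sbatch is a placeholder script name):

    sbatch --time=00:30:00 --nodes=2 --ntasks-per-node=16 myjob.sbatch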

Useful Batch Job Environment Variables

  • JobID: $SLURM_JOB_ID (PBS equivalent, no longer valid: $PBS_JOBID)
    Job identifier assigned to the job.
  • Job submission directory: $SLURM_SUBMIT_DIR (PBS: $PBS_O_WORKDIR)
    By default, jobs start in the directory they were submitted from, so the cd $SLURM_SUBMIT_DIR command is not needed.
  • Machine (node) list: $SLURM_NODELIST (PBS: $PBS_NODEFILE)
    Contains the list of nodes assigned to the batch job.
  • Array JobID: $SLURM_ARRAY_JOB_ID and $SLURM_ARRAY_TASK_ID (PBS: $PBS_ARRAYID)
    Each member of a job array is assigned a unique identifier (see the Job Arrays section).
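
Inside a job script these variables can be used directly; a small illustrative fragment:

    echo "Job ${SLURM_JOB_ID} running on: ${SLURM_NODELIST}"
    echo "Submitted from: ${SLURM_SUBMIT_DIR}"
    cd "${SLURM_SUBMIT_DIR}"    # not required; jobs start here by default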

Here is a sample Batch script:

#!/bin/bash

### set the wallclock time
#SBATCH --time=00:30:00

### set the number of nodes, tasks per node, and cpus per task for the job
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16

### set the job name
#SBATCH --job-name="hello"

### set a file name for the stdout and stderr from the job
### the %j parameter will be replaced with the job ID.
### By default, stderr and stdout both go to the --output
### file, but you can optionally specify a --error file to
### keep them separate
#SBATCH --output=hello.o%j
##SBATCH --error=hello.e%j

### set email notification
##SBATCH --mail-type=BEGIN,END,FAIL
##SBATCH --mail-user=username@host

### In case of multiple allocations, select which one to charge
##SBATCH --account=xyz

### For OpenMP jobs, set OMP_NUM_THREADS to the number of
### cpus per task for the job step
export OMP_NUM_THREADS=4

## Use srun to run the job on the requested resources. You can change --ntasks-per-node and
## --cpus-per-task, as long as --cpus-per-task does not exceed the number requested in the
## sbatch parameters
srun --ntasks=12 --ntasks-per-node=4 --cpus-per-task=4 ./hellope

See the sbatch man page for additional environment variables available.
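
If the script above is saved as hello.sbatch (an assumed file name), submitting and monitoring it might look like:

    sbatch hello.sbatch        # prints "Submitted batch job <JobID>"
    squeue -u $USER            # check the job's state in the queue
    cat hello.o<JobID>         # inspect the output file after the job finishes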

srun

The srun command initiates an interactive job on the compute nodes.

For example, the following command:

srun --time=00:30:00 --nodes=1 --ntasks-per-node=16 --pty /bin/bash

will run an interactive job in the normal partition with a wall clock limit of 30 minutes, using one node and 16 cores per node. You can also use other sbatch options such as those documented above.

After you enter the command, you will have to wait for SLURM to start the job. As with any job, your interactive job will wait in the queue until the specified number of nodes is available. If you specify a small number of nodes for smaller amounts of time, the wait should be shorter because your job will backfill among larger jobs. You will see something like this:

srun: job 123456 queued and waiting for resources

Once the job starts, you will see:

srun: job 123456 has been allocated resources

and will be presented with an interactive shell prompt on the launch node. At this point, you can use the appropriate command to start your program.

When you are done with your runs, you can use the exit command to end the job.

scancel/qdel

The scancel command deletes a queued job or kills a running job.

  • scancel JobID deletes/kills a job.

2. Job Dependencies

Job dependencies allow users to set the order in which their queued jobs run. Job dependencies are set by using the --dependency option with the syntax --dependency=<dependency type>:<JobID>. SLURM places the job in a Hold state until it is eligible to run.

The following are examples of how to specify job dependencies using the afterany dependency type, which indicates to SLURM that the dependent job should become eligible to start only after the specified job has completed.

On the command line:

sbatch --dependency=afterany:<JobID> jobscript.pbs

In a job script:

#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --job-name="myjob"
#SBATCH --output=myjob.o%j
#SBATCH --dependency=afterany:<JobID>

In a shell script that submits batch jobs:

#!/bin/bash
JOB_01=`sbatch jobscript1.sbatch |cut -f 4 -d " "`
JOB_02=`sbatch --dependency=afterany:$JOB_01 jobscript2.sbatch |cut -f 4 -d " "`
JOB_03=`sbatch --dependency=afterany:$JOB_02 jobscript3.sbatch |cut -f 4 -d " "`
...

Note: Generally the recommended dependency types to use are after, afterany, afternotok and afterok. While there are additional dependency types, those types that work based on batch job error codes may not behave as expected because of the difference between a batch job error and application errors. See the dependency section of the sbatch manual page for additional information (man sbatch).

3. Job Arrays

If a need arises to submit the same job to the batch system multiple times, instead of issuing one sbatch command for each individual job, users can submit a job array. Job arrays allow users to submit multiple jobs with a single job script using the --array option to sbatch. An optional slot limit can be specified to limit the number of jobs that can run concurrently in the job array. See the sbatch manual page for details (man sbatch). The file names for the input, output, etc. can be varied for each job using the job array index value defined by the SLURM environment variable SLURM_ARRAY_TASK_ID.

A sample batch script that makes use of job arrays is available in /projects/consult/slurm/jobarray.sbatch.
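
For illustration only (the sample script above may differ), a minimal job-array sketch that varies its input by array index could look like this; my_app and the input_*.dat files are placeholders:

    #!/bin/bash
    #SBATCH --time=00:30:00
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=16
    #SBATCH --job-name="array_demo"
    #SBATCH --output=array_demo.o%A_%a   # %A = array job ID, %a = array task ID
    #SBATCH --array=1-10%5               # 10 tasks, at most 5 running at once

    srun ./my_app input_${SLURM_ARRAY_TASK_ID}.dat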

Notes:

  • Valid specifications for job arrays are:
    --array 1-10
    --array 1,2,6-10
    --array 8
    --array 1-100%5 (a limit of 5 jobs can run concurrently)
  • You should limit the number of batch jobs in the queues at any one time to 1,000 or less. (Each job within a job array is counted as one batch job.)
  • Interactive batch jobs are not supported with job array submissions.
  • For job arrays, use of any environment variables relating to the JobID (e.g., $SLURM_JOB_ID) must be enclosed in double quotes.
  • To delete job arrays, see the scancel/qdel section.

4. Translating PBS Scripts to Slurm Scripts

The following table contains a list of common commands and terms used with the TORQUE/PBS scheduler, and the corresponding commands and terms used under the Slurm scheduler. This sheet can be used to assist in translating your existing PBS scripts into Slurm scripts to be read by the new scheduler, or as a reference when creating new Slurm job scripts. 

User Commands (PBS/Torque -> Slurm)

  Job submission:        qsub [script_file]  ->  sbatch [script_file]
  Job deletion:          qdel [job_id]  ->  scancel [job_id]
  Job status (by job):   qstat [job_id]  ->  squeue [job_id]
  Job status (by user):  qstat -u [user_name]  ->  squeue -u [user_name]
  Job hold:              qhold [job_id]  ->  scontrol hold [job_id]
  Job release:           qrls [job_id]  ->  scontrol release [job_id]
  Queue list:            qstat -Q  ->  squeue
  Node list:             pbsnodes -l  ->  sinfo -N OR scontrol show nodes
  Cluster status:        qstat -a  ->  sinfo

Environment (PBS/Torque -> Slurm)

  Job ID:            $PBS_JOBID  ->  $SLURM_JOBID
  Submit Directory:  $PBS_O_WORKDIR  ->  $SLURM_SUBMIT_DIR
  Submit Host:       $PBS_O_HOST  ->  $SLURM_SUBMIT_HOST
  Node List:         $PBS_NODEFILE  ->  $SLURM_JOB_NODELIST
  Job Array ID:      $PBS_ARRAYID  ->  $SLURM_ARRAY_TASK_ID

Job Specifications (PBS/Torque -> Slurm)

  Script directive:      #PBS  ->  #SBATCH
  Queue/Partition:       -q [name]  ->  -p [name] (best to let Slurm pick the optimal partition)
  Node Count:            -l nodes=[count]  ->  -N [min[-max]] (autocalculated if only the task count is given)
  Total Task Count:      -l ppn=[count] OR -l mppwidth=[PE_count]  ->  -n OR --ntasks=[count]
  Wall Clock Limit:      -l walltime=[hh:mm:ss]  ->  -t [min] OR -t [days-hh:mm:ss]
  Standard Output File:  -o [file_name]  ->  -o [file_name]
  Standard Error File:   -e [file_name]  ->  -e [file_name]
  Combine stdout/err:    -j oe (both to stdout) OR -j eo (both to stderr)  ->  (use -o without -e)
  Copy Environment:      -V  ->  --export=[ALL | NONE | variables]
  Event Notification:    -m abe  ->  --mail-type=[events]
  Email Address:         -M [address]  ->  --mail-user=[address]
  Job Name:              -N [name]  ->  --job-name=[name]
  Job Restart:           -r [y | n]  ->  --requeue OR --no-requeue
  Resource Sharing:      -l naccesspolicy=singlejob  ->  --exclusive OR --shared
  Memory Size:           -l mem=[MB]  ->  --mem=[mem][M | G | T] OR --mem-per-cpu=[mem][M | G | T]
  Accounts to charge:    -A OR -W group_list=[account]  ->  --account=[account] OR -A
  Tasks Per Node:        -l mppnppn [PEs_per_node]  ->  --tasks-per-node=[count]
  CPUs Per Task:         (no PBS equivalent)  ->  --cpus-per-task=[count]
  Job Dependency:        -d [job_id]  ->  --depend=[state:job_id]
  Quality of Service:    -l qos=[name]  ->  --qos=[normal | high]
  Job Arrays:            -t [array_spec]  ->  --array=[array_spec]
  Generic Resources:     -l other=[resource_spec]  ->  --gres=[resource_spec]
  Job Enqueue Time:      -a "YYYY-MM-DD HH:MM:SS"  ->  --begin=YYYY-MM-DD[THH:MM[:SS]]
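
As a concrete illustration of the table above, a simple PBS header and a hedged Slurm equivalent (resource values are arbitrary):

    # PBS/Torque version
    #PBS -N myjob
    #PBS -l nodes=2:ppn=16
    #PBS -l walltime=01:00:00
    #PBS -o myjob.out
    #PBS -m abe
    #PBS -M user@example.edu

    # Slurm translation
    #SBATCH --job-name=myjob
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=16
    #SBATCH --time=01:00:00
    #SBATCH --output=myjob.out
    #SBATCH --mail-type=BEGIN,END,FAIL
    #SBATCH --mail-user=user@example.edu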

 

Frequently Asked Questions

  • Is my Blue Waters allocation charged for Hydro use?
    • No. There is currently no plan to charge for use of Hydro. (See the "How Hydro is different from BW" section.)
  • I see the following when I log in: Lmod has detected the following error: The following module(s) are unknown:...
    • The module environments are different between Blue Waters and Hydro. See the "Setting Up the Environment" section.
  • If I have an issue, who do I contact?
    • help+bw@ncsa.illinois.edu