
Storage

Data storage on Blue Waters is divided into Online and Nearline storage; the two differ in how users access the data.

Online storage consists of three Lustre file systems, each on a separate mount point. The Online volumes look like disk partitions from command-line shells (although their architecture is more complicated), and users can work within them with shell commands such as "cd", "ls", and "mkdir". The Online volumes are accessible from the Blue Waters login nodes, from the MOM nodes where job scripts run, and from compute nodes. They are also accessible through Globus Online via the ncsa#BlueWaters endpoint. All Online storage is Lustre, and the terms Lustre and Online are used interchangeably.

Blue Waters also has a Nearline tape storage system. The Nearline system does have a disk-like file system structure, but it is not mounted on Blue Waters and it cannot be seen from the Blue Waters command line.  It can be accessed through Globus Online using the ncsa#Nearline endpoint.  The file system structure of Nearline in some ways mirrors the Lustre file system structure on Blue Waters (your group's area is in /projects, for instance) but it is NOT the same; they are separate file systems.

The key difference between Online (Lustre) storage and Nearline storage is accessibility: Online is mounted on compute nodes, MOM nodes (where job scripts run), and login nodes; Nearline is not. Both are accessible through Globus Online.

All Blue Waters storage is limited by quotas.

Online (Lustre) storage:

storage system (type & name) | block quota, default file quota (inodes) | quota type | purged | sample pathname | Globus endpoint
Lustre Home     | 1 TB, 5 M files   | user  | no | /u/sciteam/<myusername> (command line or Globus)       | ncsa#BlueWaters
Lustre Projects | 5 TB, 25 M files  | group | no | /projects/sciteam/<groupcode> (command line or Globus) | ncsa#BlueWaters
Lustre Scratch  | 50 TB, 50 M files | group | no | /scratch/sciteam/<myusername> (command line or Globus) | ncsa#BlueWaters

Access to Nearline will end  3/31/2020

Nearline storage:

storage system (type & name) | block quota, default file quota (inodes) | quota type | purged | sample pathname | Globus endpoint
Nearline Home (tape mass store)     | 5 TB (READ ONLY)  | user  | no | /~/ (Globus only)                           | ncsa#Nearline
Nearline Projects (tape mass store) | 50 TB (READ ONLY) | group | no | /projects/sciteam/<groupcode> (Globus only) | ncsa#Nearline


 

Lustre (Online) File Systems

These are the three Lustre file systems mounted on Blue Waters.

Lustre: HOME

/u - 2.2 PB with 36 OSTs

Visible as /u on Blue Waters command line.  

Visible as ncsa#BlueWaters endpoint, /u path within Globus.

Your home directory will typically be /u/sciteam/<yourusername>.  

You should use HOME for keeping source code, etc. For performance reasons, you should run your batch jobs from SCRATCH, but HOME is accessible from batch jobs as well.

There is a quota on Lustre home for each user, for files stored in /u/sciteam/<myusername>.  The default quota is 1 TB.  

Lustre: PROJECTS

/projects - 2.2 PB with 36 OSTs

Visible as /projects path on Blue Waters command line.

Visible as ncsa#BlueWaters endpoint, /projects path within Globus.

Space in projects should be used for sharing frequently used large files within a science team. Each team has space in /projects/sciteam/<PSN>, where <PSN> is the 3-letter "group code" that is part of your project allocation name. Use the 'id <myusername>' command to obtain the correct 3-letter code. Here's a made-up example of user bob finding his group code:

> id bob
uid=11111(bob) gid=12222(PRAC_jaa) groups=12222(PRAC_jaa)

(Bob's 3-character group code is the "jaa" suffix of the group name PRAC_jaa. Other groups might have different prefix letters instead of PRAC; University of Illinois users will have "ILL" and Great Lakes Consortium users will have "GLCPC" here, for instance.) In this toy example, bob's group directory would be /projects/sciteam/jaa/.

Lustre /projects has a group quota. By default it is 5 TB per group. All files within a group's project directory count against the group's quota, and any one user can use it all.
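To see how much of the group quota is currently in use, one generic option is Lustre's own quota report. This is a sketch only, using the made-up group PRAC_jaa from the example above; the site-provided "quota" command described under "Quotas in General" below is the supported interface.

$ lfs quota -g PRAC_jaa /projects    # usage and limits for the group on the /projects file system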

Lustre: SCRATCH

/scratch - 22 PB with 360 OSTs

Visible as /scratch on Blue Waters command line.

Visible as ncsa#BlueWaters endpoint, /scratch path within Globus.

Your pathname in scratch will be /scratch/sciteam/<yourusername>. You should use this directory for large job outputs and any other files that require high-performance reading or writing. We suggest setting appropriate stripe counts and sizes on directories that hold large files, or on individual files.

For example:

$ cd /scratch/sciteam/$USER
$ mkdir largefiles
$ lfs setstripe -c 24 largefiles

There are two motivations for setting the stripe count higher than the default of 1: a very large file, or a parallel access pattern (via MPI-IO, parallel HDF5, NetCDF, or a similar parallel I/O library). If your I/O pattern is serial, file-per-rank MPI, or constrained to a single compute node, stick with the default stripe count of 1, or no more than 4; striping wide for non-parallel I/O will degrade your I/O performance. If the only motivation for striping is that the file is very large (> 100 GB) and the I/O pattern is not parallel, we suggest setting the stripe count to 2 or 4.
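Striping can also be set on a single file instead of a directory, as long as it is done before any data is written. The sketch below uses a hypothetical file name, bigfile.dat, and the modest stripe count suggested above for large files with non-parallel I/O.

$ lfs setstripe -c 4 bigfile.dat     # creates an empty file striped across 4 OSTs
$ lfs getstripe bigfile.dat          # verify the layout before writing data into the file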

There are quotas on scratch. Although files are stored in per-user directories (/scratch/sciteam/<myusername>), scratch quotas apply to all files owned by the group, no matter where they sit under /scratch. The quota is 50 TB per group by default. The quota is shared; any one user can, for instance, store 40 TB in scratch, but that leaves only 10 TB for the rest of the group.

Nearline (tape) Storage System: (Now READ-ONLY as of 2019-10-01)

Nearline: HOME (Now READ-ONLY as of 2019-10-01)

Not visible from Blue Waters command line.

Visible as ncsa#Nearline endpoint, /~/ path within Globus. (NOTE: When you log into the ncsa#Nearline endpoint, you start out in your "home" directory; it is NOT under /u.)

The Nearline mass storage (tape) system has its own file system. That file system is visible when looking at the ncsa#Nearline endpoint through Globus, but it is not mounted on Blue Waters.

Nearline home has individual quotas.  Each user can store 5 TB in their home directory by default.  

Nearline: PROJECTS (Now READ-ONLY as of 2019-10-01)

Not visible from Blue Waters command line.  

Visible as ncsa#Nearline endpoint, /projects path within Globus. To get there, open the ncsa#Nearline endpoint in Globus, click "up one folder", then click "/projects", then "sciteam", then your group code to reach your group area. The full pathname for your project directory will be something like /projects/sciteam/jaa/.

There is also a link to each of your project spaces in your home directory. For example, double-clicking the directory "project.jaa" in your "~" directory puts you in the same place as /projects/sciteam/jaa/.

Nearline projects have group quotas.  Each group can store 50 TB in their project directory by default. 

Batch Job Usage with Nearline

To indicate to the batch system that a job will use Nearline (HPSS) for staging and storing data in its workflow, the PBS directive:

#PBS -l gres=hpss 

should be included in the batch job script. This lets the system administrators know which jobs rely on Nearline being available. Jobs that specify gres=hpss will not be started if the Nearline system is not available.
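For context, here is a minimal sketch of a job script header with the HPSS generic resource request alongside ordinary node and walltime requests; the node count, node type, and walltime are placeholders, not recommendations.

#!/bin/bash
#PBS -l nodes=1:ppn=32:xe      # placeholder: one XE node
#PBS -l walltime=01:00:00      # placeholder walltime
#PBS -l gres=hpss              # tells the scheduler this job depends on Nearline (HPSS)

cd $PBS_O_WORKDIR
# ... stage data to or from Nearline via Globus, then launch the application ...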
 

Quotas in General

The quotas listed here are the defaults for allocation groups on Blue Waters. If your project has larger data needs, then your quotas on the Blue Waters system should reflect that (see the "quota" command). For quota-related issues, such as increases in limits or grace periods, please contact help+bw@ncsa.illinois.edu.

Backups

Blue Waters filesystems are not currently backed up.

Checkpointing

All applications should implement some form of checkpointing that limits the loss from hardware or software failures on the system. As the node count or the wallclock time of a job increases, the likelihood of an interruption to the job increases proportionally.

To assist with determining a proper checkpoint interval (the time between checkpoints that balances the work lost to a job interruption against the time spent performing checkpoint I/O), we provide a utility that reports a recommended checkpoint interval. It uses recent data on node failures and system interrupts, the desired number of XE nodes, XK nodes, or both, and the time the application takes to perform a checkpoint. The formula used in the utility is equation 37 from the 2004 paper by J. T. Daly, "A higher order estimate of the optimum checkpoint interval for restart dumps". A mean time to interruption (MTTI) is computed and used to calculate a checkpoint interval (time between checkpoints).
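For a rough sense of the arithmetic, the widely quoted closed form of Daly's higher-order estimate can be evaluated by hand; whether this matches the utility's implementation of equation 37 exactly is an assumption, and the MTTI and checkpoint-write time below are placeholders.

$ awk -v M=12 -v d=0.25 'BEGIN {
    # M = mean time to interruption (hours); d = time to write one checkpoint (hours) -- placeholders
    x = sqrt(d / (2*M))
    t = sqrt(2*d*M) * (1 + x/3 + x*x/9) - d    # Daly higher-order estimate, valid for d < 2M
    if (d >= 2*M) t = M                        # if checkpoints are extremely expensive, fall back to MTTI
    printf "recommended checkpoint interval: %.2f hours\n", t
  }'

With these placeholder numbers the estimate works out to roughly 2.3 hours between checkpoints; the utility's answer will differ because it uses current failure data for the node counts you request.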

Please remove commas when entering the requested node counts. Note that the time to write a checkpoint file is in hours.