Storage

Data storage on Blue Waters is divided up into Online and Nearline; they differ by how the user gets at the data.

Online storage consists of three Lustre storage systems, each on a separate mount point.  The Online volumes look like disk partitions from command line shells (although their architecture is more complicated).  Users can use shell commands like "cd", "ls", and "mkdir" within those volumes.  The Online volumes are accessible from the Blue Waters login nodes, from the MOM nodes where job scripts are run, and from compute nodes.  The Online storage volumes are also accessible through Globus Online via the ncsa#BlueWaters endpoint.  All Online storage is on Lustre file systems, and the terms Lustre and Online are used interchangeably.

Blue Waters also has a Nearline tape storage system.  The Nearline system does have a disk-like file system structure, but it is not mounted on Blue Waters and it cannot be seen from the Blue Waters command line.  It can be accessed through Globus Online using the ncsa#Nearline endpoint.  The file system structure of Nearline in some ways mirrors the Lustre file system structure on Blue Waters (your group's area is in /projects, for instance) but it is NOT the same; they are separate file systems.

The key difference between Online (Lustre) storage and Nearline storage is accessibility.  Online is mounted on compute nodes, MOM nodes (where job scripts run) and login nodes; Nearline is not.  Both Online storage and Nearline storage are accessible through Globus Online.

All Blue Waters storage is limited by quotas. Additionally, the Online Scratch file system is regularly purged of files that have not been accessed in more than 30 days. Special attention should be given to the types and sizes of quotas and the purge policies on Scratch. These details are laid out in the table below and in the sections farther down this page.

Storage system     Block quota, default  Quota  Purged        Sample pathname                Globus
type & name        file quota (inodes)   type                                                endpoint
-----------------  --------------------  -----  ------------  -----------------------------  ---------------
Lustre Home        1 TB, 5 M files       user   no            /u/sciteam/<myusername>        ncsa#BlueWaters
                                                              (from command line OR Globus)
Lustre Projects    5 TB, 25 M files      group  no            /projects/sciteam/<groupcode>  ncsa#BlueWaters
                                                              (from command line OR Globus)
Lustre Scratch     500 TB, 50 M files    group  yes, 30 days  /scratch/sciteam/<myusername>  ncsa#BlueWaters
                                                              (from command line OR Globus)
-----------------  --------------------  -----  ------------  -----------------------------  ---------------
Nearline Home      5 TB                  user   no            /~/                            ncsa#Nearline
(tape mass store)                                             (only from Globus)
Nearline Projects  50 TB                 group  no            /projects/sciteam/<groupcode>  ncsa#Nearline
(tape mass store)                                             (only from Globus)


 

Lustre (Online) File Systems

These are the three Lustre file systems mounted on Blue Waters.

Lustre: HOME

/u - 2.2 PB with 144 OSTs

Visible as /u on Blue Waters command line.  

Visible as ncsa#BlueWaters endpoint, /u path within Globus.

Your home directory will typically be /u/sciteam/<yourusername>.  

You should use HOME for keeping source code, etc. For performance reasons, you should run from SCRATCH in your batch jobs, but HOME is accessible from batch jobs as well.

There is a quota on Lustre home for each user, for files stored in /u/sciteam/<myusername>.  The default quota is 1 TB.  

Lustre home directories under /u are not subject to purging.  Data is expected to remain for the duration of your allocation.

Lustre: PROJECTS

/projects - 2.2 PB with 144 OSTs

Visible as /projects path on Blue Waters command line.

Visible as ncsa#BlueWaters endpoint, /projects path within Globus.

Space in projects should be used for sharing frequently used large files within a science team. Each team has space in /projects/sciteam/<PSN>, where <PSN> is the 3-letter "group code" that is part of your project allocation name. Use the 'id <myusername>' command to obtain the correct 3-letter code.  Here's a made-up example of user bob finding his group code:

> id bob
uid=11111(bob) gid=12222(PRAC_jaa) groups=12222(PRAC_jaa)

(Bob's 3-character group code is the part of the group name after the underscore, "jaa" in this example. Other groups might have different prefix letters instead of PRAC; University of Illinois users will have "ILL" and Great Lakes Consortium users will have "GLCPC" here, for instance.)  In this toy example, bob's group directory would be /projects/sciteam/jaa/.
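The group code can also be pulled out of the id output with standard shell tools. The following is only a minimal sketch: the function name group_codes is made up for illustration, and it assumes that group names follow the PREFIX_code pattern shown in the example (groups without an underscore are skipped).

```shell
# Hypothetical helper: print the group code(s) found in one line of `id` output.
# Assumes group names look like PREFIX_code, e.g. PRAC_jaa.
group_codes() {
    printf '%s\n' "$1" |
        sed -n 's/.*groups=//p' |     # keep only the comma-separated group list
        tr ',' '\n' |                 # one group per line
        sed -n 's/^[0-9]*([^_]*_\([^)]*\))$/\1/p'   # keep the text after the underscore
}

# Usage on a live system: group_codes "$(id "$USER")"
```

For the made-up user bob above, the helper would be called with the line printed by "id bob".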

Lustre /projects has a group quota.  By default it is 5 TB per group.  All files within a group's project directory count against the group's quota.  Any one user can use it all.

Lustre projects directories are not subject to purging.  Data is expected to remain for the duration of your allocation.

Lustre: SCRATCH

/scratch - 22 PB with 1440 OSTs

Visible as /scratch on Blue Waters command line.

Visible as ncsa#BlueWaters endpoint, /scratch path within Globus.

Your pathname in scratch will be /scratch/sciteam/<yourusername>.  You should use this directory for large job outputs and any other files that require high performance reading or writing. We suggest setting appropriate stripe counts and sizes, either on directories that hold large files or on a per-file basis. A known issue limits the stripe count to a maximum of 160 (even though there are 1440 OSTs).

For example:

$ cd /scratch/sciteam/$USER
$ mkdir largefiles
$ lfs setstripe -c 24 largefiles

There are two motivations for setting the stripe count higher than the default of 1: a large file or a parallel access pattern (via MPI-IO, parallel HDF5, NetCDF or a similar parallel I/O library).  If your I/O pattern is not parallel, is file-per-rank MPI, or is constrained to a single compute node, then stick with the default stripe count of 1, or use no more than 4.  Striping wide for non-parallel I/O will degrade your I/O performance.  If the only motivation for striping is that the file is very large (> 100 GB) and the I/O pattern is not parallel, we suggest setting the stripe count to 2 or 4.
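The striping guidance above can be condensed into a small helper. This is a sketch of one possible reading, not an official tool: the function name suggest_stripe_count and its arguments are hypothetical, the fallback of 24 for parallel I/O is simply the value from the example above (not a recommendation), and for large non-parallel files it picks 4 from the suggested "2 or 4".

```shell
# Hypothetical helper: suggest a stripe count from the guidance above.
#   $1 = "parallel" or "serial" (serial also covers file-per-rank and single-node I/O)
#   $2 = expected file size in GB (integer)
#   $3 = desired stripe count for parallel I/O (optional; 24 is just the example value)
suggest_stripe_count() {
    pattern=$1
    size_gb=$2
    wanted=${3:-24}
    if [ "$pattern" = "parallel" ]; then
        # Known issue: the stripe count cannot exceed 160.
        if [ "$wanted" -gt 160 ]; then wanted=160; fi
        echo "$wanted"
    elif [ "$size_gb" -gt 100 ]; then
        echo 4      # very large file, non-parallel I/O: 2 or 4 is suggested
    else
        echo 1      # default striping
    fi
}

# Usage: lfs setstripe -c "$(suggest_stripe_count serial 250)" largefiles
```

The result is only a starting point; the best stripe count still depends on the application's actual I/O pattern.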

There are quotas on scratch.  Although files are stored in per-user directories (/scratch/sciteam/<myusername>), quotas on scratch apply to all files with the group code, no matter where under /scratch they are.  The quota is fairly large, though: 500 TB per group by default.  The quota is shared; any one user can, for instance, store 400 TB in scratch, but that will leave only 100 TB for the rest of the group.

NOTE: Lustre scratch is not backed up and is subject to purging!  Any files with last-read or last-access dates more than 30 days old will be DELETED with no warning and no notification to users. Files should be moved to another file system or to Nearline as a backup. Please note that explicitly accessing a file (e.g. touch) for the purpose of avoiding purging of the file is a violation of policy. Contact us if there are issues with data movement from scratch to Nearline. 
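To see which files are approaching the purge window so they can be moved in time, the access times can be inspected with find. This is a minimal sketch under stated assumptions: the function name list_purge_candidates and the 25-day default threshold are hypothetical, and it assumes the purge is based on the access time that find reports. It only lists files; it does not read or touch them.

```shell
# Hypothetical helper: list regular files whose last access is older than a threshold.
#   $1 = directory to scan
#   $2 = age threshold in days (optional; 25 is an illustrative value below the 30-day limit)
list_purge_candidates() {
    find "$1" -type f -atime +"${2:-25}" -print
}

# Usage: list_purge_candidates "/scratch/sciteam/$USER" 25
```

Files reported by such a listing are candidates for transfer to another file system or to Nearline before they are purged.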

Nearline (tape) Storage System:

Nearline: HOME

Not visible from Blue Waters command line.

Visible as ncsa#Nearline endpoint, /~/ path within Globus.  (NOTE: When you log into the ncsa#Nearline endpoint, you're already in your "home" directory. It's NOT under /u; you're already there.)

The Nearline mass storage (tape) system has its own file system.  That file system is visible when looking at the ncsa#Nearline endpoint through Globus, but it is not mounted on Blue Waters.

Nearline home has individual quotas.  Each user can store 5 TB in their home directory by default.  

Nearline home directories are not subject to purging.  Data is expected to remain for the duration of your allocation.

Nearline: PROJECTS

Not visible from Blue Waters command line.  

Visible as ncsa#Nearline endpoint, /projects path within Globus.  To get there, open the ncsa#Nearline endpoint in Globus, hit "up one folder", then click on "/projects", then "sciteam", then on your group code to get to your group area.  The full pathname for your project directory will be something like /projects/sciteam/jaa/.

There is also a link to each of your project spaces in your home directory.  If you double-click on the directory "project.jaa" in your "~" directory, you'll be in the same place as /projects/sciteam/jaa/.

Nearline projects have group quotas.  Each group can store 50 TB in their project directory by default. 

Nearline project directories are not subject to purging.  Data is expected to remain for the duration of your allocation.

By default, your group's nearline projects directory is the largest storage volume your group has that isn't subject to purge.  So if you have large storage needs, typically you'll use Globus to move files you want to keep from Lustre /scratch to Nearline /projects so they don't get purged, then use Globus to move the files elsewhere before the end of your allocation period.

Batch Job Usage with Nearline

To indicate to the batch system that a job will use Nearline (HPSS) for staging and storing data in the workflow, specify the following PBS directive in the batch job:

#PBS -l gres=hpss

This lets the system administrators know which jobs rely on Nearline being available. Jobs that specify gres=hpss will not be started if the Nearline system is not available.
 

Quotas in General

The quotas listed here are the defaults for allocation groups on Blue Waters.  If your project has larger data needs, then your quotas on the Blue Waters system should reflect that (see the "quota" command).  For quota-related issues such as increases in limits or grace periods, please contact help+bw@ncsa.illinois.edu.

Backups

The HOME and PROJECTS filesystems are backed up per user with a full backup every 30 days and a daily incremental. Data over quota is not backed up. Data in Lustre scratch is NEVER backed up.

Checkpointing

All applications should implement some form of checkpointing that limits loss from hardware or software failures on the system. As the node count of a job increases or the wallclock increases, the likelihood of an interruption to the job increases proportionally.

To help determine a proper checkpoint interval (the time between checkpoints that balances the loss of data due to a job interruption against the time spent performing checkpoint I/O), we provide a utility that reports a recommended checkpoint interval. The utility uses recent data on node failures and system interrupts, the desired number of XE nodes, XK nodes or both, and the time the application takes to perform a checkpoint. The formula used in the utility is equation 37 from the 2004 paper by J.T. Daly, "A higher order estimate of the optimum checkpoint interval for restart dumps". A mean time to interruption (MTTI) is computed and used to calculate the checkpoint interval (time between checkpoints).

Please remove commas when entering the requested node counts. Note that the time to write a checkpoint file is in hours.
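For a rough idea of what such a calculation looks like, the sketch below evaluates the commonly quoted form of Daly's higher-order estimate. It is not the Blue Waters utility: the function name daly_interval is made up, the MTTI is taken as an input rather than computed from failure data and node counts, and it is an assumption that this form corresponds to the equation the utility uses. With checkpoint write time d and MTTI m (both in hours), it computes sqrt(2*d*m) * (1 + (1/3)*sqrt(d/(2*m)) + (1/9)*(d/(2*m))) - d when d < 2*m, and m otherwise.

```shell
# Hypothetical helper: estimate a checkpoint interval in hours.
#   $1 = time to write one checkpoint, in hours
#   $2 = mean time to interruption (MTTI), in hours
daly_interval() {
    awk -v d="$1" -v m="$2" 'BEGIN {
        if (d >= 2 * m) {            # checkpoint cost dominates: fall back to the MTTI
            printf "%.2f\n", m
            exit
        }
        r = d / (2 * m)
        printf "%.2f\n", sqrt(2 * d * m) * (1 + sqrt(r) / 3 + r / 9) - d
    }'
}

# Usage (the 24-hour MTTI is an illustrative value, not a Blue Waters measurement):
# daly_interval 0.1 24
```

The leading term sqrt(2*d*m) shows the general behavior: the suggested interval grows with the square root of both the checkpoint cost and the MTTI, so a shorter MTTI (for example, at larger node counts) calls for more frequent checkpoints.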