Illinois Allocation Users
Where to move your data
- Illinois ResearchIT has a data storage finder tool to help identify which storage option may be best for your data. ResearchIT also notes that U of I Box is the only approved campus cloud storage option for university-owned information; special "Team folders" in Box for a unit or research group are available by request.
- NCSA Taiga is available to projects that have funding to participate in that storage option.
- Illinois Campus Google Drive can be used as a destination for data moved from Blue Waters using Globus Online. See below.
- Storage on your own system (personal, research group, departmental). Please use Globus Connect Personal to transfer large data sets to these options.
How to move your data
Globus Online is our preferred method for transferring files. NCSA currently supports several endpoints as well as Globus connectors to Google Drive for campus accounts and to AWS.
- Globus can be used to transfer files from the Blue Waters endpoint to other Globus endpoints such as the Illinois Campus Cluster, NCSA Taiga, etc.
- Globus supports data transfer to your Illinois campus Google Drive. Updated instructions can be found here. Google limits transfers to 750 GB per 24 hours; individual files of up to 4 TB are allowed, but with throttled transfer rates. Limitations are documented here. The effective transfer rate for transfers that last 24 hours or more will be <= 8 MB/s.
- Your laptop, desktop, or lab computer with Globus Connect Personal. If at all possible, connect your local system to the departmental LAN with a wired Ethernet cable; WiFi performance will typically be lower than wired performance. A Gigabit Ethernet connection sustaining 100 MB/s can transfer (under good conditions) 6 GB/minute, about 0.36 TB/hr, or roughly 8.6 TB/day.
- Links to more information about using the AWS S3 Globus connector for transfers are available from the Blue Waters data transfer page.
- For source code and small transfers, using utilities such as scp from a login node or in a batch job is allowed.
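The rates quoted above can be turned into rough planning numbers. The sketch below is illustrative only: the helper name estimate_hours is made up for this example, it assumes 1 GB = 1000 MB and a perfectly sustained rate, and real transfers will usually be slower.

```shell
# estimate_hours SIZE_GB RATE_MBS
# Prints the hours needed to move SIZE_GB gigabytes at a sustained rate of
# RATE_MBS megabytes per second (assumption: 1 GB = 1000 MB).
estimate_hours() {
    awk -v size_gb="$1" -v rate_mbs="$2" \
        'BEGIN { printf "%.1f\n", size_gb * 1000 / rate_mbs / 3600 }'
}

estimate_hours 750 8      # 750 GB at 8 MB/s (Google Drive case): prints 26.0
estimate_hours 1000 100   # 1 TB at 100 MB/s (wired Gigabit):     prints 2.8

# Average rate implied by moving 750 GB in 24 hours, in MB/s: prints 8.68
awk 'BEGIN { printf "%.2f\n", 750 * 1000 / 86400 }'
```

Estimates like these may help when choosing a destination: a data set of several TB fits in a day over wired Gigabit Ethernet but would take many days to reach Google Drive.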
Preparing to Transfer
Useful command examples (use the lfs version of find to get better performance from the Lustre filesystem). It’s good practice to tar bundles of files from one directory level above ( ../ ) so that the directory name is included when the files are later extracted.
Find all the files smaller than 256k in a directory and tar them into a bundle:
$ lfs find dir_name/ -size -256k -print0 | xargs --null tar -rvf dir_smallfiles.tar
Same, but for files larger than or equal to (i.e. not smaller than) 256k:
$ lfs find dir_name/ ! -size -256k -print0 | xargs --null tar -rvf dir_largerfiles.tar
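To illustrate the advice about tarring from one level above, here is a minimal sketch using plain tar on a throwaway directory. The helper name bundle_from_parent and the run42 layout are hypothetical, and the sketch archives a whole directory rather than using the lfs find selection shown above. Listing the archive afterwards is one simple way to confirm that every member path begins with the directory name.

```shell
# bundle_from_parent DIR ARCHIVE
# Creates ARCHIVE from DIR while working in DIR's parent directory, so every
# member path inside the archive begins with DIR's own name.
# ARCHIVE should be given as an absolute path.
bundle_from_parent() {
    dir=${1%/}                                # strip a trailing slash, if any
    archive=$2
    parent=$(dirname "$dir")
    name=$(basename "$dir")
    tar -C "$parent" -cf "$archive" "$name"   # -C: change to the parent first
}

# Illustrative run on a temporary directory (all names are made up).
work=$(mktemp -d)
mkdir -p "$work/run42/out"
echo "a" > "$work/run42/out/a.dat"
echo "b" > "$work/run42/b.dat"

bundle_from_parent "$work/run42" "$work/run42.tar"
tar -tf "$work/run42.tar"                     # each listed path starts with run42/
```

Extracting such an archive elsewhere recreates run42/ as a single top-level directory instead of scattering files into the current directory.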
If you have a very large number (millions) of files to tar up, consider using a specialized tool such as parfu, which uses MPI to parallelize archive creation and produces tar archives.
How to check current disk usage (quota)
- The quota command reports disk usage on all 3 file systems.
- The portal displays some file system usage information at https://bluewaters.ncsa.illinois.edu/summary (log in with your Blue Waters account).
Please avoid using the following commands on the system in the final weeks of production, as they can further slow down the login nodes or the file system metadata server:
du -sh * # creates a large load on the filesystem servers
find … # use lfs find instead
rm -r <path_to_large_number_of_files>
If you are certain your data falls into the WORN category (write-once, read-never), leave it on the system and we’ll dispose of it after the system is shut down. There’s no need to delete the data yourself as that process can impact filesystem performance for others.