PTAR

Description
It is often useful to combine a large number of small files into a small number of large files, especially when saving multiple directories to archival storage.
Depending on the types of files, compression techniques may also be applied in order to reduce file size and data transmission time.
The Unix tar command combined with gzip or bzip2 compression is a popular choice, but it may not scale well to large numbers of files.

How to use ptar
The Blue Waters ptar module offers an alternative: the Parallel Implementation of GZip (pigz) or Parallel BZIP2 (pbzip2), which use multiple threads to compress blocks of data in parallel.
To use this functionality on Blue Waters, simply load the ptar module:
module load ptar
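Once the module is loaded, tar is invoked in the usual way. The following is a minimal sketch of that workflow, not an official Blue Waters recipe: the directory and archive names (mydata, mydata.tar.gz) are hypothetical, and the check for the `module` command is an assumption added so the script also runs on systems without the ptar module, where tar simply falls back to its standard compressor.

```shell
#!/bin/sh
# Hypothetical sample data for illustration; replace with a real directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/mydata"
for i in 1 2 3; do
    echo "sample $i" > "$workdir/mydata/file$i.txt"
done

# Load ptar where the `module` command exists (e.g. on Blue Waters).
# Elsewhere this step is skipped and tar uses its standard compressor.
if command -v module >/dev/null 2>&1; then
    module load ptar || echo "ptar module not available" >&2
fi

# -c create, -z gzip-style compression, -f archive name.
# -C changes into $workdir first so the archive stores relative paths.
tar -czf "$workdir/mydata.tar.gz" -C "$workdir" mydata

# List the archive contents to confirm that it was written correctly.
tar -tzf "$workdir/mydata.tar.gz"
```

For bzip2-style compression, replace -z with -j and use a .tar.bz2 file name.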
Subsequent invocations of the tar command with compression enabled (i.e., using the -z or -j flag) will use a multi-threaded version of the compression library, thereby achieving significant speedup.

Examples
As an example, various strategies were used to create an archive file of a directory containing 128,641 files, 15 GB total.
Using pigz compression (i.e., -z) was the fastest:
real 1m23.787s user 37m49.482s sys 1m40.486s
and produced a 6.1 GB archive file.
Using pbzip2 compression (i.e., -j) produced a smaller (5.4 GB) archive file, but was much slower:
real 23m30.233s user 148m49.194s sys 2m24.577s
Using the standard, non-threaded gzip compression required 35 minutes of real time, whereas pigz required only 1.5 minutes:
real 34m51.178s user 34m12.872s sys 1m2.456s

Additional Information / References
For more information about the Blue Waters ptar module, please send email to "help+bw@ncsa.illinois.edu".
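The timings reported in the Examples section can be compared directly with a little arithmetic. The sketch below converts the real and user times quoted there into seconds and prints the resulting ratios. The helper name to_seconds is hypothetical, and dividing user CPU time by real time is only a rough indication of how many threads were kept busy, not a precise measurement.

```shell
#!/bin/sh
# Use the C locale so awk always prints "." as the decimal separator.
LC_ALL=C
export LC_ALL

# Convert a duration printed by `time` (e.g. 1m23.787s) into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f", $1 * 60 + $2 }'
}

# Timings copied from the Examples section.
pigz_real=$(to_seconds 1m23.787s)
pigz_user=$(to_seconds 37m49.482s)
pbzip2_real=$(to_seconds 23m30.233s)
gzip_real=$(to_seconds 34m51.178s)

awk -v p="$pigz_real" -v u="$pigz_user" -v b="$pbzip2_real" -v g="$gzip_real" 'BEGIN {
    printf "gzip / pigz wall-clock ratio:   %.1f\n", g / p
    printf "pbzip2 / pigz wall-clock ratio: %.1f\n", b / p
    printf "pigz user CPU time / real time: %.1f\n", u / p
}'
```

With these inputs, pigz comes out roughly 25 times faster than standard gzip in wall-clock time, which matches the 35 minutes versus 1.5 minutes comparison in the Examples section.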