PTAR

Description

It is often useful to combine a large number of small files into a small number of large files, especially when saving multiple directories to archival storage.

 

Depending on the types of files, compression techniques may also be applied in order to reduce file size and data transmission time.

 

The Unix tar command combined with either gzip or bzip2 compression is a popular choice, but it may not scale well to large numbers of files because the compression step runs on a single thread.

How to use ptar

The Blue Waters ptar module offers an alternative: it uses either the Parallel Implementation of GZip (pigz) or Parallel BZIP2 (pbzip2), both of which use multiple threads to compress blocks of the archive data concurrently.
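Outside of the ptar module, the same idea can be sketched as an explicit pipe from tar into a parallel compressor. This is only an illustration and not necessarily how the ptar module is implemented; the directory and archive names are hypothetical, and the sketch falls back to plain gzip when pigz is not installed.

```shell
# Hypothetical sample data, created here only so the sketch is runnable.
mkdir -p sample_dir
printf 'hello\n' > sample_dir/a.txt

# Prefer pigz (multi-threaded); fall back to gzip (single-threaded) if absent.
if command -v pigz >/dev/null 2>&1; then
    compressor=pigz
else
    compressor=gzip
fi

# tar writes the uncompressed archive to stdout; the compressor reads stdin.
tar -cf - sample_dir | "$compressor" > sample_dir.tar.gz

# pigz produces standard gzip output, so ordinary tools can read the archive.
gzip -dc sample_dir.tar.gz | tar -tf -
```

Because the output is an ordinary gzip file, an archive written this way can be unpacked on systems that have neither pigz nor the ptar module.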

 

To use this functionality on Blue Waters, simply load the ptar module:

 

module load ptar

 

Subsequent invocations of the tar command with compression enabled (i.e., using the -z or -j flag) will use a multi-threaded version of the compression library, thereby achieving significant speedup.
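A minimal sketch of a session might look like the following. The directory name mydata and the archive names are hypothetical; the guards around module and bzip2 are an assumption added here so the sketch also runs on systems without environment modules or bzip2.

```shell
# Hypothetical sample data, for illustration only.
mkdir -p mydata
echo "sample" > mydata/file1.txt

# On Blue Waters this would simply be: module load ptar
if command -v module >/dev/null 2>&1; then
    module load ptar || echo "ptar module not available; using the default tar"
fi

# -z selects gzip-format compression (pigz once ptar is loaded).
tar -czf mydata.tar.gz mydata

# -j selects bzip2-format compression (pbzip2 once ptar is loaded).
if command -v bzip2 >/dev/null 2>&1; then
    tar -cjf mydata.tar.bz2 mydata
fi

# List the contents to confirm the archive is readable.
tar -tzf mydata.tar.gz
```

The tar command lines are the same with or without the module; only the compression backend changes.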

Examples

As an example, several compression strategies were used to create an archive file of a directory containing 128,641 files (15 GB total).

 

Using pigz compression (i.e., -z) was the fastest:

 

real 1m23.787s
user 37m49.482s
sys 1m40.486s

 

and produced a 6.1 GB archive file.

 

Using pbzip2 compression (i.e., -j) produced a smaller (5.4 GB) archive file, but was much slower:

 

real 23m30.233s
user 148m49.194s
sys 2m24.577s

 

Using the standard, non-threaded gzip compression required about 35 minutes of real (wall-clock) time, whereas pigz required only about 1.4 minutes:

 

real 34m51.178s
user 34m12.872s
sys 1m2.456s
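The timings above can be compared with a little arithmetic. The sketch below converts the reported values to seconds and computes the pigz speedup over gzip, plus the ratio of pigz user CPU time to real time as a rough indication of how many threads were busy on average. The helper name to_seconds is hypothetical.

```shell
# Convert a "XmY.Zs" timing (as printed by time) into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f", $1 * 60 + $2 }'
}

gzip_real=$(to_seconds 34m51.178s)   # non-threaded gzip, wall-clock time
pigz_real=$(to_seconds 1m23.787s)    # pigz, wall-clock time
pigz_user=$(to_seconds 37m49.482s)   # pigz, user CPU time summed over threads

awk -v g="$gzip_real" -v p="$pigz_real" -v u="$pigz_user" 'BEGIN {
    printf "pigz speedup over gzip: %.1fx\n", g / p
    printf "pigz user/real ratio:   %.1f\n", u / p
}'
# prints:
#   pigz speedup over gzip: 25.0x
#   pigz user/real ratio:   27.1
```

In other words, pigz finished roughly 25 times sooner than gzip while using only slightly more total CPU time, because the work was spread across many threads.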

Additional Information / References

For more information about the Blue Waters ptar module, please send email to "help+bw@ncsa.illinois.edu".