PGI Compiler

Programming Environment

To use the PGI compiler, swap in the PGI programming environment (PrgEnv-pgi) and use the wrappers for C (cc), C++ (CC), and Fortran (ftn).

module swap PrgEnv-cray PrgEnv-pgi

To see all available compiler versions, use the command:

module avail pgi

To switch to a different version of the compiler, use the command:

module swap pgi pgi/<version>

When to Use the PGI Compiler

The PGI compiler generates highly optimized binary code for the newest processor architectures and supports the latest industry standards, including OpenACC (much of OpenACC 2.5 is now supported; see http://www.pgroup.com/doc/pgirn.pdf), OpenCL v1, and CUDA Fortran. The PGI compiler is supported at virtually every supercomputing center, which makes application source code developed with it highly portable.

Flags

There are numerous compile-time flags that can be used. For complete listings, see the man pages:

man pgf90
man pgf77
man pgcc
man pgc++

For options provided in the format -Mflag=option,option,..., the short form -Mflag invokes a default list of options. See the compiler manual for the full set of options.

OpenMP

-mp=nonuma  # compile with OpenMP support
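As a minimal sketch, a Fortran source file could be built with OpenMP enabled through the ftn wrapper as follows. The file and executable names (omp_app.f90, omp_app) and the thread count are hypothetical; OMP_NUM_THREADS is the standard OpenMP environment variable for the run-time thread count. The compile command is printed rather than executed so the snippet can be tried outside a PGI environment; remove the echo to actually run it.

```shell
# Hypothetical source/executable names; -mp=nonuma is the flag listed above.
OMP_FLAGS="-mp=nonuma"
OMP_BUILD_CMD="ftn $OMP_FLAGS -o omp_app omp_app.f90"

# Print the compile command (dry run).
echo "$OMP_BUILD_CMD"

# Standard OpenMP variable: number of threads used at run time
# (4 is an arbitrary choice for illustration).
export OMP_NUM_THREADS=4
```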

Fortran file naming

Fortran file name conventions (can override with flags)
Suffixes of source file names indicate the type of processing to be done:

.f    fixed-format Fortran source; compile
.F    fixed-format Fortran source; preprocess, compile
.f90  free-format Fortran source; compile
.F90  free-format Fortran source; preprocess, compile
.f95  free-format Fortran source; compile
.F95  free-format Fortran source; preprocess, compile
.f03  free-format Fortran source; compile
.F03  free-format Fortran source; preprocess, compile
.for  fixed-format Fortran source; compile
.FOR  fixed-format Fortran source; preprocess, compile
.ftn  fixed-format Fortran source; compile
.FTN  fixed-format Fortran source; preprocess, compile
.fpp  fixed-format Fortran source; preprocess, compile
.FPP  fixed-format Fortran source; preprocess, compile
.cuf  free-format CUDA Fortran source; compile
.CUF  free-format CUDA Fortran source; preprocess, compile
.s    assembler source; assemble
.S    assembler source; preprocess, assemble
.o    object file; passed to linker
.a    library archive file; passed to linker
.so   shared library file; passed to linker
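The suffix rules for Fortran sources can be summarized in a small shell helper. This is only an illustrative sketch: fortran_handling is a hypothetical function, not part of the PGI tools, and the file names are made up. Note that the match is case-sensitive, which is what distinguishes the "preprocess" variants.

```shell
# Report how a file would be treated, based on the suffix conventions above.
# fortran_handling is a hypothetical helper, not a PGI tool.
fortran_handling() {
  case "$1" in
    *.f|*.for|*.ftn)              echo "fixed-format; compile" ;;
    *.F|*.FOR|*.FTN|*.fpp|*.FPP)  echo "fixed-format; preprocess, compile" ;;
    *.f90|*.f95|*.f03)            echo "free-format; compile" ;;
    *.F90|*.F95|*.F03)            echo "free-format; preprocess, compile" ;;
    *.cuf)                        echo "free-format CUDA Fortran; compile" ;;
    *.CUF)                        echo "free-format CUDA Fortran; preprocess, compile" ;;
    *)                            echo "not a Fortran source suffix" ;;
  esac
}

# Example calls with hypothetical file names.
fortran_handling solver.F90
fortran_handling legacy.f
```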

Options

On Cray platforms, the target processor type should not be specified in the compiler options. The target platform is set by default in the programming environment. The compiler wrappers (cc, CC, ftn) already include the necessary options instructing the compiler about the native processor architecture.

-On
- -O0 No optimization will be performed
- -O1 Minimum optimization level
- -O2 Default optimization level
- -O3 Aggressive optimization (sometimes produces non-working code)
- -O4 Highest level of optimization (least reliable)
-Mipa=options
- align Enable recognition when pointer targets are all cache-line aligned for better SSE code generation
- fast Choose optimal flags for the target platform
- globals Analyze which globals are modified by procedure calls
- inline:n Allow up to n levels of inlining
- libc Optimize calls to certain C library routines
- libinline Allow inlining from routines in libraries
- safe:<name> Declares that the named function is safe; a safe function does not call back into the known procedure and does not change any known global variable
-fast Chooses generally optimal flags for the target platform; see "pgcc -fast -help"
-fastsse Chooses generally optimal flags for the target platform, including SSE optimizations; see "pgcc -fastsse -help"
-Munroll=options
- c:k Completely unroll loops with a constant loop count less than or equal to k (default: k=4)
- n:k Unroll k times for a single-block loop
- m:k Unroll k times for a multi-block loop
-Mcache_align Align data objects of size greater than or equal to 16 bytes on cache-line boundaries
-Mflushz Flush SSE denormal numbers to zero
-Mnoframe Do not set up a stack frame pointer for functions
-Mlre Enable loop-carried redundancy elimination
-Mautoinline=options Enable inlining of functions with the inline attribute
- levels:n Inline up to n levels of function calls
- maxsize:n Only inline functions with a size of n or less, where the size roughly corresponds to the number of statements in the function
-Mvect=options Set vectorization details
- fuse Enable loop fusion to combine adjacent loops into a single loop
- simd:256 Use vector SIMD instructions (SSE, AVX)
- prefetch Use prefetch instructions in loops where profitable
- level:n Set maximum nest level of loops to optimize
-Mpre Enable the partial redundancy elimination optimization
-Msafeptr Override data dependence between C pointers and between pointers and variables or arrays. Can greatly improve performance, especially in floating-point loops, but may produce incorrect results if the pointers actually alias.

Frequently used options

-i8 -m64 -mcmodel=medium -Mdalign -Mllalign -Munroll -Kieee -O2 -fastsse -Mipa=fast

-i8 Set the default size of the integer type to 8 bytes (Fortran only)
-m64 Compile for 64-bit target
-mcmodel=medium Allow data sections to be larger than 2GB
-Mdalign Align double precision variables in structures on 8-byte boundaries
-Mllalign Align long longs or integer*8 in structures or common blocks on 8-byte boundaries
-Munroll Unroll the loops
-Kieee Perform floating-point operations in strict conformance with the IEEE 754 standard
-O2 Set typical (safe) optimization level
-fastsse Instruct the compiler to apply SSE instructions
-Mipa=fast Enable interprocedural analysis to optimize the code across multiple procedures (subroutines), possibly in different source files

Note that too high an optimization level may inadvertently change the results of the computation. We recommend that you try a few short, but realistic, benchmarking runs with different combinations of these flags to see what gives you the best performance.
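One way to organize such trial builds is to generate one executable per flag combination and then time each of them on the same short input. The sketch below only prints the build commands (a dry run); the source file app.f90, the executable names, and the particular combinations are hypothetical choices drawn from the flags listed above.

```shell
# Print one build command per flag combination (dry run).
# app.f90 and the app_trialN executable names are hypothetical.
print_trial_builds() {
  i=0
  for flags in "-O2" "-O2 -Munroll" "-O2 -fastsse -Mipa=fast" "-O3"; do
    i=$((i + 1))
    echo "ftn $flags -o app_trial$i app.f90"
  done
}

print_trial_builds
```

Replacing the echo with the command itself would perform the builds; comparing the numerical results of each executable against the -O2 build is a reasonable check that a higher optimization level has not changed the answers.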

Debugging options

For the verbose mode, use:

-Minfo=option Print useful information to stderr
- all Print information for all categories
- inline Print information about inlined functions
- intensity Print compute intensity information about loops
- ipa Print information about interprocedural analysis (IPA)
- loop | opt Print information about loop optimizations
- lre Print information about loop-carried redundancy elimination
- mp Print information about OpenMP parallel regions
- par Print information about loop parallelization
- vect Print information about loop vectorization
-Mneginfo=option Prints information on why certain optimizations are not performed. See -Minfo for the list of options
-v Gives verbose output
-g Generates symbolic debug information for use by a debugger
-gopt Generates debug information in the presence of optimization
-pg Enable gprof-style profiling
-Mbounds Perform array bounds checking
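The options above are typically combined into a few flag sets, one per purpose. The following is a sketch only: pgi_flags is a hypothetical helper, the grouping into "debug" and "report" sets is an assumption rather than a site recommendation, and the build command is printed instead of executed.

```shell
# pgi_flags is a hypothetical helper that picks a flag set by build type.
pgi_flags() {
  case "$1" in
    # Debug symbols, no optimization, array bounds checking.
    debug)  echo "-g -O0 -Mbounds" ;;
    # Report which loops were vectorized and why others were not.
    report) echo "-O2 -Minfo=vect -Mneginfo=vect" ;;
    # Anything else: the default optimization level.
    *)      echo "-O2" ;;
  esac
}

# Dry run with a hypothetical source file.
echo "ftn $(pgi_flags debug) -o app_debug app.f90"
```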

Miscellaneous options

For applications using 64-bit array indices in MPI calls, use the following options:

-i8 -m64 -mcmodel=medium -default64

-default64 instructs the compiler wrappers to link the 64-bit MPI library; this option exists only on Cray platforms

Underflow, Flush-to-Zero, and -fast (or -fastsse)

When floating-point calculations produce results so small that they underflow (so-called "subnormal" or "denormal" numbers), the compiled code may take a large performance hit while attempting to preserve them. Underflow can be detected with the option -Ktrap=unf. To flush such results to zero, use the option -Mflushz, which is also set by -fast (or -fastsse).
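To make the notion of a subnormal result concrete, the sketch below uses awk (not the PGI compiler) to divide a small double-precision number until it falls below the smallest normal value, roughly 2.2e-308. It assumes an awk that uses IEEE 754 doubles with gradual underflow, which is the usual case; tiny_quotient is a hypothetical helper name. Code compiled with -Mflushz would replace such a result with zero instead of preserving it.

```shell
# Divide 1e-300 by 1e10: the exact result, 1e-310, lies below the smallest
# normal double, so it can only be represented as a subnormal number.
tiny_quotient() {
  awk 'BEGIN {
    x = 1e-300          # an ordinary (normal) double
    y = x / 1e10        # underflows into the subnormal range
    if (y > 0) print "nonzero"; else print "zero"
  }'
}

tiny_quotient
```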