PGI Compiler

Programming Environment

To use the PGI compiler, swap in the PGI programming environment (PrgEnv-pgi) and use the wrappers for C (cc), C++ (CC), and Fortran (ftn).

 module swap PrgEnv-cray PrgEnv-pgi

To see all available compiler versions, use the command:

 module avail pgi

To switch to a different version of the compiler, use the command:

 module swap pgi pgi/<version>
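As a minimal sketch, the steps above can be combined into one script. It assumes a shell in which the module command is defined; the source file hello.f90 and the executable name hello are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch only: hello.f90 and hello are hypothetical names.
src=hello.f90
exe=hello

# The module command is normally a shell function set up at login.
if type module >/dev/null 2>&1; then
    # Switch from the default Cray environment to PGI.
    module swap PrgEnv-cray PrgEnv-pgi
    # List the PGI compiler versions that are installed.
    module avail pgi
fi

# Compile through the Fortran wrapper only if it and the source exist.
if command -v ftn >/dev/null 2>&1 && [ -f "$src" ]; then
    ftn -o "$exe" "$src"
else
    echo "skipping build: ftn or $src not found"
fi
```

The same pattern applies to C and C++ sources with the cc and CC wrappers.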

When to Use the PGI Compiler

The PGI compiler generates highly optimized binary code for the newest processor architectures and supports current industry standards, including OpenACC (it now supports much of OpenACC 2.5; see http://www.pgroup.com/doc/pgirn.pdf), OpenCL v1, and CUDA Fortran. The PGI compiler is supported at virtually every supercomputing center, which makes application source code developed with it highly portable.

Flags

There are numerous compile-time flags that can be used.  For complete listings, see the man pages:

 man pgf90
 man pgf77
 man pgcc
 man pgc++

For options provided in the format -Mflag=option,option,..., the short form -Mflag invokes a default list of options. See the compiler manual for the full set of options.
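To illustrate the long form, the following sketch assembles a -Mflag=option,option,... string from its parts. The helper name mflag is hypothetical and is not part of the PGI tools; the sub-options shown (simd:256, prefetch) belong to -Mvect.

```shell
#!/bin/sh
# mflag is a hypothetical helper, not part of the PGI tools.
# Usage: mflag <flag> [option ...]
# With no options it prints the short form, which selects the default list.
mflag() {
    name=$1
    shift
    if [ "$#" -eq 0 ]; then
        printf '%s\n' "-M$name"
    else
        # Join the remaining arguments with commas.
        opts=$(IFS=,; printf '%s' "$*")
        printf '%s\n' "-M$name=$opts"
    fi
}

mflag vect                      # prints -Mvect
mflag vect simd:256 prefetch    # prints -Mvect=simd:256,prefetch
```

The resulting string is passed to the compiler wrapper like any other flag.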

OpenMP

 -mp=nonuma  # compile with OpenMP support
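A minimal sketch of an OpenMP build with this flag follows. The source file omp_hello.c and the thread count of 4 are hypothetical. The check on PE_ENV assumes that the PrgEnv-pgi module sets this variable to PGI.

```shell
#!/bin/sh
# omp_hello.c and the thread count are hypothetical placeholders.
src=omp_hello.c
exe=omp_hello

# Number of OpenMP threads used at run time (standard OpenMP variable).
OMP_NUM_THREADS=4
export OMP_NUM_THREADS

# Assumption: PrgEnv-pgi sets PE_ENV to PGI.
if [ "${PE_ENV:-}" = "PGI" ] && [ -f "$src" ]; then
    cc -mp=nonuma -o "$exe" "$src"
else
    echo "skipping build: PrgEnv-pgi not loaded or $src missing"
fi
```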

Fortran file naming

The suffix of a source file name indicates the type of processing to be done (this can be overridden with flags):
 
       .f     fixed-format Fortran source; compile
       .F     fixed-format Fortran source; preprocess, compile
       .f90   free-format Fortran source; compile
       .F90   free-format Fortran source; preprocess, compile
       .f95   free-format Fortran source; compile
       .F95   free-format Fortran source; preprocess, compile
       .f03   free-format Fortran source; compile
       .F03   free-format Fortran source; preprocess, compile
       .for   fixed-format Fortran source; compile
       .FOR   fixed-format Fortran source; preprocess, compile
       .ftn   fixed-format Fortran source; compile
       .FTN   fixed-format Fortran source; preprocess, compile
       .fpp   fixed-format Fortran source; preprocess, compile
       .FPP   fixed-format Fortran source; preprocess, compile
       .cuf   free-format CUDA Fortran source; compile
       .CUF   free-format CUDA Fortran source; preprocess, compile
       .s     assembler source; assemble
       .S     assembler source; preprocess, assemble
       .o     object file; passed to linker
       .a     library archive file; passed to linker
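The suffix rules above can be summarized as a small lookup. The function name describe_suffix is hypothetical; it only mirrors the table and does not query the compiler.

```shell
#!/bin/sh
# describe_suffix is a hypothetical helper that mirrors the suffix table.
# Matching is case-sensitive: an upper-case suffix means "preprocess first".
describe_suffix() {
    case "$1" in
        *.f|*.for|*.ftn)             echo "fixed-format; compile" ;;
        *.F|*.FOR|*.FTN|*.fpp|*.FPP) echo "fixed-format; preprocess, compile" ;;
        *.f90|*.f95|*.f03|*.cuf)     echo "free-format; compile" ;;
        *.F90|*.F95|*.F03|*.CUF)     echo "free-format; preprocess, compile" ;;
        *.s)                         echo "assembler; assemble" ;;
        *.S)                         echo "assembler; preprocess, assemble" ;;
        *.o|*.a)                     echo "passed to linker" ;;
        *)                           echo "unknown suffix" ;;
    esac
}

describe_suffix solver.F90   # prints: free-format; preprocess, compile
describe_suffix legacy.f     # prints: fixed-format; compile
```

Note that .fpp is the one lower-case suffix that is preprocessed.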

Optimization options

On Cray platforms, the target processor type should not be specified in the compiler options. The target platform is set by default in the programming environment. The compiler wrappers (cc, CC, ftn) already include the options that tell the compiler about the native processor architecture.

  • -On
    • -O0     No optimization will be performed
    • -O1     Minimum optimization level
    • -O2     Default optimization level
    • -O3     Aggressive optimization (sometimes produces non-working code)
    • -O4     Highest level of optimization (least reliable)
  • -Mipa=options
    • align     Enable recognition when pointer targets are all cache-line aligned for better SSE code generation
    • fast     Choose optimal flags for the target platform
    • globals     Analyze which globals are modified by procedure calls
    • inline:n     Allow up to n levels of inlining
    • libc     Optimize calls to certain C library routines
    • libinline     Allow inlining from routines in libraries
    • safe:<name>     Declares that the named function is safe; a safe function does not call back into the known procedure and does not change any known global variable
  • -fast     Chooses generally optimal flags for the target platform; see "pgcc -fast -help"
  • -fastsse     Chooses generally optimal flags for the target platform, including SSE optimization; see "pgcc -fastsse -help"
  • -Munroll=options
    • c:k     Completely unroll loops with a constant loop count less than or equal to k (default: k=4)
    • n:k     Unroll k times for a single-block loop
    • m:k     Unroll k times for a multi-block loop
  • -Mcache_align     Align data objects of size greater than or equal to 16 bytes on cache-line boundaries
  • -Mflushz     Flush SSE denormal numbers to zero
  • -Mnoframe     Do not set up a stack frame pointer for functions
  • -Mlre     Enable loop-carried redundancy elimination
  • -Mautoinline=options     Enable inlining of functions with the inline attribute
    • levels:n     Inline up to n levels of function calls
    • maxsize:n     Only inline functions with a size of n or less, where the size roughly corresponds to the number of statements in the function
  • -Mvect=options      Set vectorization details
    • fuse     Enable loop fusion to combine adjacent loops into a single loop
    • simd:256     Use vector SIMD instructions (SSE, AVX)
    • prefetch     Use prefetch instructions in loops where profitable
    • level:n     Set maximum nest level of loops to optimize
  • -Mpre     Enable the partial redundancy elimination optimization
  • -Msafeptr     Override data dependence between C pointers and between pointers and variables or arrays. Can greatly improve performance, especially in floating-point loops, but may produce incorrect results.

Frequently used options

 -i8 -m64 -mcmodel=medium -Mdalign -Mllalign -Munroll -Kieee -O2 -fastsse -Mipa=fast 
  • -i8     Set default size for integer type to 8 bytes (Fortran only)
  • -m64     Compile for 64-bit target
  • -mcmodel=medium      Allow data sections to be larger than 2GB
  • -Mdalign     Align double precision variables in structures on 8-byte boundaries
  • -Mllalign     Align long longs or integer*8 in structures or common blocks on 8-byte boundaries
  • -Munroll     Unroll loops
  • -Kieee     Perform floating-point operations in strict conformance with the IEEE 754 standard
  • -O2     Set typical (safe) optimization level
  • -fastsse     Instruct the compiler to apply SSE instructions
  • -Mipa=fast      Enable interprocedural analysis to optimize the code across multiple procedures (subroutines), possibly in different source files

Note that too high an optimization level may inadvertently produce incorrect results.  We recommend that you try a few short but realistic benchmarking runs with different combinations of these flags to see which gives you the best performance.
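One way to organize such benchmarking runs is sketched below. The script is a dry run that only prints the build commands. The source file app.f90, the executable names, and the particular flag sets are hypothetical choices, not recommendations.

```shell
#!/bin/sh
# Dry run: print one build command per candidate flag set.
# app.f90 and the flag sets below are hypothetical examples.
src=app.f90
n=0
while IFS= read -r flags; do
    n=$((n + 1))
    # Each variant gets its own executable so runs can be timed side by side.
    echo "ftn $flags -o app_v$n $src"
done <<'EOF'
-O2
-O2 -Munroll -Mipa=fast
-O3 -Mvect=simd:256
-fast
EOF
echo "generated $n build commands"
```

Comparing the program output of each variant against the -O2 build is a simple way to spot a flag set that changes the results.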

Debugging options

For the verbose mode, use:

  • -Minfo=option     Print useful information to stderr
    • all     Print information for all categories
    • inline     Print information about inlined functions
    • intensity     Print compute intensity information about loops
    • ipa     Print information about interprocedural analysis (IPA)
    • loop | opt     Print information about loop optimizations
    • lre     Print information about loop-carried redundancy elimination
    • mp     Print information about OpenMP parallel regions
    • par     Print information about loop parallelization
    • vect     Print information about loop vectorization
  • -Mneginfo=option     Print information on why certain optimizations are not performed. See -Minfo for the list of options
  • -v     Give verbose output
  • -g     Generate symbolic debug information for use by a debugger
  • -gopt  Generates debug information in the presence of optimization
  • -pg     Enable gprof-style profiling
  • -Mbounds     Perform array bounds checking     
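As a sketch of how these options might be combined, the script below defines one flag set for debugging and one for optimization reports. The variable names and the source file app.f90 are hypothetical, and the check on PE_ENV assumes that PrgEnv-pgi sets it to PGI. When the PGI environment is not loaded, the script only prints the commands.

```shell
#!/bin/sh
# debug_flags, report_flags, and app.f90 are hypothetical names.
# Debug build: symbols, no optimization, array bounds checking.
debug_flags="-g -O0 -Mbounds"
# Optimized build that reports applied and missed vectorization.
report_flags="-O2 -Minfo=vect -Mneginfo=vect"
src=app.f90

# Assumption: PrgEnv-pgi sets PE_ENV to PGI.
if [ "${PE_ENV:-}" = "PGI" ] && [ -f "$src" ]; then
    # The flag variables are left unquoted on purpose so they split into words.
    ftn $debug_flags -o app_debug "$src"
    # -Minfo messages go to stderr, so capture them in a file.
    ftn $report_flags -o app_report "$src" 2> opt_report.txt
else
    echo "ftn $debug_flags -o app_debug $src"
    echo "ftn $report_flags -o app_report $src"
fi
```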

Miscellaneous options

For applications using 64-bit array indices in MPI calls, use the following options:

 -i8 -m64 -mcmodel=medium -default64

  • -default64     Instructs the compiler wrappers to include the 64-bit MPI library; this option exists only on Cray platforms

Underflow, Flush-to-Zero, and -fast (or -fastsse)

When floating-point calculations produce results so small that they underflow (so-called "subnormal" or "denormal" numbers), the compiled code may take a large performance hit while attempting to preserve them. Underflow can be detected with the option -Ktrap=unf. To flush those results to zero, use the option -Mflushz. The latter option is also set by -fast (or -fastsse).