PerfSuite is an easy-to-use collection of tools and libraries that supports application software performance analysis on Linux-based systems (x86, x86-64, ia64, ppc64, and ppc32). It includes components for common performance-measurement tasks such as hardware performance counting, hardware-counter-based profiling, and interval-timer (itimer) profiling.

How to use PerfSuite

  1. PerfSuite's "psrun" tool requires the measured program to be dynamically linked. You can use "file <program_path>" to check this. The default compiler options on Blue Waters will build a statically linked executable. You can add the "-dynamic" option to the linker ("ld") or the compiler linker driver program ("cc") to create a dynamically linked executable.
  2. Load the PerfSuite module:
    module load perfsuite
    or, to select a specific version:
    module load perfsuite/<specific_version>
  3. Use PerfSuite's "psrun" to count or profile an executable without recompiling or relinking. To count, do:
    aprun -n <num> psrun -f -p <my_program> <my_pgm_args>

    To profile, do:

    aprun -n <num> psrun -C -c <profiling_conf_xml> \
        -f -p <my_program> <my_pgm_args>
  4. Use PerfSuite's "psprocess" to post-process the generated XML files:
    psprocess <my_pgm.*.xml>
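Step 1's dynamic-linking check can be sketched as a small shell test. Here /bin/sh is just a stand-in for your program's path; on Blue Waters you would substitute your own executable and, if needed, relink with "-dynamic":

```shell
# Verify that an executable is dynamically linked, as psrun requires.
# /bin/sh is a stand-in; replace it with your program's path.
prog=/bin/sh
if file -L "$prog" | grep -q 'dynamically linked'; then
    echo "$prog is dynamically linked: usable with psrun"
else
    echo "$prog is statically linked: relink with -dynamic"
fi
```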


An example:

module load perfsuite
aprun -n 8 psrun -C -c papi_profile_cycles.xml \
    /home/user123/namd-12.3/bin/namd -c a.conf
psprocess namd.0.98765.nid01234.xml
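The file passed to psrun's "-c" option is a PerfSuite hardware-counter configuration; papi_profile_cycles.xml above ships with PerfSuite. As a rough sketch only -- the "ps_hwpc_profile" root element and "threshold" attribute here are assumptions modeled on the event-list format shown later on this page, so check the XML files installed with the perfsuite module for the authoritative schema -- a cycle-profiling configuration might look like:

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!-- Assumed shape: profile on PAPI_TOT_CYC, taking one sample
     per threshold overflows; the threshold value is illustrative. -->
<ps_hwpc_profile class="PAPI">
  <ps_hwpc_event type="preset" name="PAPI_TOT_CYC" threshold="10000000" />
</ps_hwpc_profile>
```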

How to use PerfSuite to access Cray Gemini network counters

  1. Make sure the "craype-network-gemini", "craype-interlagos", and "papi" (at the shipped version or later) modules are loaded. Currently the first two are in the default module list, and the PAPI module is loaded automatically when you load the "perfsuite" or "perfsuite/<version>" module.
  2. Set the environment variable CRAY_NPU_ACCESS to 1, 2, or 4 depending on your needs. An example:
    export CRAY_NPU_ACCESS=1
    Please see Cray documentation "Using the PAPI Cray NPU Component" for details.
  3. Use "aprun" to start a job to make sure that it runs on a compute node, so that the job indeed uses the Gemini network.
  4. Use a PerfSuite configuration file that contains only the Gemini events -- that is, no mixing of PAPI preset/native CPU events such as "PAPI_TOT_CYC" with the Gemini NPU events. This is a PAPI restriction.

    An example:

    nid25331 $ cat gm_events-2.xml
    <?xml version="1.0" encoding="UTF-8" ?>
    <ps_hwpc_eventlist class="PAPI">
      <ps_hwpc_event name="GM_RMT_PERF_PUT_BYTES_RX" type="native" />
      <ps_hwpc_event name="GM_RMT_PERF_SEND_BYTES_RX" type="native" />
    </ps_hwpc_eventlist>
    nid25331 $ aprun -n 1 psrun -c gm_events-2.xml top -b -n5 > /dev/null

    The full list of available Gemini NPU events can be obtained by running "aprun -n 1 papi_native_avail" with the PAPI module loaded. They are the events named "craynpu:::GM_...", near the end of the output. Event names work both with and without the leading "craynpu:::" prefix.
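Steps 2-4 can be tied together in one sketch: write a Gemini-only event list (restating the two events from the example above), enable NPU counter access, and launch under psrun. The aprun line is shown commented out because it only works from a job on a Blue Waters compute node, and the program name is a placeholder:

```shell
# Write a Gemini-only PerfSuite event list (no CPU events mixed in,
# per the PAPI restriction noted above).
cat > gm_events.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8" ?>
<ps_hwpc_eventlist class="PAPI">
  <ps_hwpc_event name="GM_RMT_PERF_PUT_BYTES_RX" type="native" />
  <ps_hwpc_event name="GM_RMT_PERF_SEND_BYTES_RX" type="native" />
</ps_hwpc_eventlist>
EOF

# Enable NPU counter access, then launch on a compute node:
export CRAY_NPU_ACCESS=1
# aprun -n 1 psrun -c gm_events.xml <my_program>
```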

Known Issues

There are two minor issues with the perfsuite/1.1.3 module. They occur only when GNU compilers are used, and only when doing profiling.

  • On both login and compute nodes, running psprocess to do source-code mapping -- finding the source line numbers of hot spots from the profiled samples -- does not work for programs built with the GNU compilers. The cause is likely an issue in libbfd, as the "addr2line" utility from the binutils package does not work either.
  • On login nodes, when running psprocess on psrun-generated profile XML files, PerfSuite's "psprocess" prints error messages at the beginning complaining about the DWARF version reported by BFD, such as:
    ERROR> BFD: Dwarf Error: found dwarf version '4',
    this reader only handles version 2 and 3 information.

    This is because the libbfd version on the login nodes differs from the one on the compute nodes (2.21.1). You can safely ignore these error messages.

Additional Information / References

  • For MPI programs, please remember to use the "-f" option (meaning "fork") for "psrun"; for OpenMP programs, use the "-p" option (meaning "pthread"); for hybrid programs (MPI+OpenMP), use both "-f -p" options.
  • PerfSuite project web site:
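The option choices in the first bullet can be summarized in a small sketch. Since aprun only runs on compute nodes, the helper below merely prints the command line that would be used; the application names are placeholders:

```shell
# Print the psrun invocation for each programming model; nothing is
# actually launched, and the app names are placeholders.
show_cmd() { echo "aprun -n $1 psrun $2 $3"; }

show_cmd 16 "-f"    ./mpi_app      # MPI: -f follows forked ranks
show_cmd 1  "-p"    ./openmp_app   # OpenMP: -p follows pthreads
show_cmd 16 "-f -p" ./hybrid_app   # hybrid MPI+OpenMP: use both
```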