PerfSuite is an easy-to-use collection of tools and libraries to support application software performance analysis on Linux-based systems (x86, x86-64, ia64, ppc64 and ppc32). It includes components that assist with performance measurement tasks such as hardware performance counting and profiling, and interval-timer (itimer) profiling.
How to use PerfSuite
- PerfSuite's "psrun" tool requires the measured program to be dynamically linked. You can use "file <program_path>" to check this. The default compiler options on Blue Waters build a statically linked executable. You can add the "-dynamic" option to the linker ("ld") or to the compiler/linker driver program ("cc") to create a dynamically linked executable.
- Load the PerfSuite module:
module load perfsuite
or
module load perfsuite/<specific_version>
- Use PerfSuite's "psrun" to count or profile an executable without recompiling or relinking. To count, do:
aprun -n <num> psrun -f -p <my_program> <my_pgm_args>
To profile, do:
aprun -n <num> psrun -C -c <profiling_conf_xml> \
    -f -p <my_program> <my_pgm_args>
- Use PerfSuite's "psprocess" to post-process the generated XML files:
module load perfsuite
aprun -n 8 psrun -C -c papi_profile_cycles.xml \
    /home/user123/namd-12.3/bin/namd -c a.conf
psprocess namd.0.98765.nid01234.xml
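As a hedged sketch of the first and last steps above, the shell helpers below classify the text printed by "file <program_path>" and run the post-processor over every XML file in a directory. The names linkage_of, process_all and the PSPROCESS override variable are illustrative only and are not part of PerfSuite; the loop also assumes that psprocess writes its report to standard output.

```shell
#!/bin/sh
# Classify the text printed by `file <program_path>`.
# Prints "dynamic", "static", or "unknown".
linkage_of() {
    case "$1" in
        *"dynamically linked"*) echo dynamic ;;
        *"statically linked"*)  echo static ;;
        *)                      echo unknown ;;
    esac
}

# Run the post-processor on every XML file in a directory (default: "."),
# writing one text report next to each XML file.
# PSPROCESS is a hypothetical override, used here only so the loop can be
# exercised without PerfSuite installed.
process_all() {
    dir=${1:-.}
    for xml in "$dir"/*.xml; do
        [ -e "$xml" ] || continue   # no XML files: skip the unexpanded pattern
        "${PSPROCESS:-psprocess}" "$xml" > "${xml%.xml}.txt"
    done
}

# Usage (not executed here):
#   linkage_of "$(file ./my_program)"
#   process_all .
```

If linkage_of reports "static", rebuild with the "-dynamic" option before using psrun.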
How to use PerfSuite to access Cray Gemini network counters
- Make sure the "craype-network-gemini", "craype-interlagos" and "papi-126.96.36.199 <or later>" modules are loaded. The first two are currently in the default module list. The PAPI module is loaded automatically when a user loads the "perfsuite" or "perfsuite/<version>" module.
- Set the environment variable CRAY_NPU_ACCESS to 1, 2, or 4 depending on your needs. An example:
export CRAY_NPU_ACCESS=1
Please see the Cray documentation "Using the PAPI Cray NPU Component" for details.
- Start the job with "aprun" so that it runs on a compute node and therefore actually uses the Gemini network.
- Use a PerfSuite configuration file that contains only the Gemini events -- that is, no mixing of PAPI preset/native CPU events such as "PAPI_TOT_CYC" with the Gemini NPU events. This is a PAPI restriction.
nid25331 $ cat gm_events-2.xml
<?xml version="1.0" encoding="UTF-8" ?>
<ps_hwpc_eventlist class="PAPI">
<ps_hwpc_event name="GM_RMT_PERF_PUT_BYTES_RX" type="native" />
<ps_hwpc_event name="GM_RMT_PERF_SEND_BYTES_RX" type="native" />
</ps_hwpc_eventlist>
nid25331 $ aprun -n 1 psrun -c gm_events-2.xml top -b -n5 > /dev/null
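A configuration file like the one above can also be generated from a list of event names. The following is a minimal sketch, not a PerfSuite tool: the function name make_gemini_eventlist is hypothetical, and the event elements are written as self-closing tags so that the output is well-formed XML. It refuses any name that does not look like a Gemini NPU event, which guards against accidentally mixing in CPU events such as "PAPI_TOT_CYC".

```shell
#!/bin/sh
# Print a PerfSuite event list containing only Gemini NPU events.
# Accepts names with or without the leading "craynpu:::" string.
make_gemini_eventlist() {
    # Reject anything that is not a Gemini event before printing any output.
    for ev in "$@"; do
        case "$ev" in
            GM_*|craynpu:::GM_*) ;;
            *) echo "not a Gemini NPU event: $ev" >&2; return 1 ;;
        esac
    done
    echo '<?xml version="1.0" encoding="UTF-8" ?>'
    echo '<ps_hwpc_eventlist class="PAPI">'
    for ev in "$@"; do
        printf '<ps_hwpc_event name="%s" type="native" />\n' "$ev"
    done
    echo '</ps_hwpc_eventlist>'
}

# Usage (not executed here):
#   make_gemini_eventlist GM_RMT_PERF_PUT_BYTES_RX GM_RMT_PERF_SEND_BYTES_RX > gm_events-2.xml
```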
The full list of available Gemini NPU events can be obtained by running "aprun -n 1 papi_native_avail" with the PAPI module loaded. They are the events named "craynpu:::GM_...", close to the end of the output. Event names work both with and without the leading "craynpu:::" string.
There are two minor issues with the perfsuite/1.1.3 module. They occur only when GNU compilers are used, and only when doing profiling.
- On both login and compute nodes, source code mapping with psprocess (finding the line numbers of hot spots from profiled samples) does not work for programs built with the GNU compilers. The likely cause is a problem in libbfd, as the "addr2line" utility in the bfd-utils package does not work either.
- On login nodes, when psprocess is run on psrun-generated profile XML files, it prints error messages at the beginning that complain about the BFD DWARF version, such as:
ERROR> BFD: Dwarf Error: found dwarf version '4', this reader only handles version 2 and 3 information.
This is because the libbfd version on the login nodes (188.8.131.5200122-0.7.9) differs from that on the compute nodes (2.21.1). You can safely ignore these error messages.
Additional Information / References
- For MPI programs, please remember to use the "-f" option (meaning "fork") for "psrun"; for OpenMP programs, use the "-p" option (meaning "pthread"); for hybrid programs (MPI+OpenMP), use both "-f -p" options.
- PerfSuite project web site: http://perfsuite.ncsa.illinois.edu.
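The psrun option rule for MPI, OpenMP and hybrid programs given above can be captured in a small helper for job scripts. This is only an illustrative sketch; the function name psrun_flags and the type keywords "mpi", "openmp" and "hybrid" are made up here and are not part of PerfSuite.

```shell
#!/bin/sh
# Map a program type to the psrun options recommended above.
psrun_flags() {
    case "$1" in
        mpi)    echo "-f" ;;      # follow forked processes
        openmp) echo "-p" ;;      # follow POSIX threads
        hybrid) echo "-f -p" ;;   # MPI+OpenMP needs both
        *) echo "unknown program type: $1" >&2; return 1 ;;
    esac
}

# Usage (not executed here; the unquoted substitution is intentional so that
# "-f -p" is split into two separate options):
#   aprun -n <num> psrun $(psrun_flags hybrid) <my_program> <my_pgm_args>
```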