Blue Waters User Portal | Reveal and OpenMP

Using Cray Reveal and scoping loops for OpenMP

Description

The latest Cray perftools module and the new Reveal analysis tool can be used to automatically mark up loops for OpenMP parallelization. Reveal does the variable scoping and creates directives with the appropriate private and shared clauses for the loops you choose to target. While the tool is semi-automatic and still requires programmer input, it helps detect which variables can safely be private and which must be shared to ensure algorithm correctness.

How to use Reveal to scope loops

module unload darshan # <- needed to avoid conflict with perftools
module load perftools
module load PrgEnv-cray

Add the usual perftools flags to the compile and link commands in your Makefile or build process (if present, drop -g, as it interferes with Cray profiling):

CFLAGS= -h profile_generate
FFLAGS= -h profile_generate

Instrument the code, run the instrumented version, and then process the resulting .xf file for Cray Apprentice 2:

pat_build -w a.out
...
aprun -n <PEs> ./a.out+pat
...
pat_report a.out*.xf > loops_report
app2 a.out*.ap2

Cray Apprentice 2 view

At this point, Reveal can be used with the application and the perftools data to do some OpenMP analysis; the profile information collected in the previous step can be used here as well. Rebuild the application with flags similar to those below, which create a program library (use a full path to your program library with multi-directory builds), and then launch Reveal:

CFLAGS= -h pl=a.out.program_lib -h wp
FFLAGS= -h pl=a.out.program_lib -h wp
reveal a.out.program_lib a.out*.ap2
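For multi-directory builds, one possible way to keep every compile pointing at the same program library is to construct the absolute path once from the top-level build directory. This is only a sketch: the variable name PL_PATH is made up for illustration, and whether exported CFLAGS and FFLAGS are picked up depends on how your Makefile is written.

```shell
# Run from the top-level build directory.
# PL_PATH is a hypothetical helper variable, not something Reveal requires.
PL_PATH="$(pwd)/a.out.program_lib"

# Same flags as in the block above, but with an absolute path so that
# compiles started in subdirectories all refer to one program library.
CFLAGS="-h pl=${PL_PATH} -h wp"
FFLAGS="-h pl=${PL_PATH} -h wp"
export CFLAGS FFLAGS

# Show the resulting flags so they can be checked or pasted into a Makefile.
echo "CFLAGS=${CFLAGS}"
echo "FFLAGS=${FFLAGS}"
```

If your Makefile sets CFLAGS and FFLAGS itself, the printed values can be copied into it instead of relying on the environment.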

Right-click on the function or loop of interest and you'll be presented with the option to "Scope Loop". If a function is selected, all of its loops are selected automatically. The Reveal OpenMP scoping tool is not fully automatic: some programmer steering is needed to get sensible results. If you let it scope all loops, you'll end up with an impossible set of directives, and the compiler will throw errors later when it discovers you're trying to thread inner and outer loops simultaneously.

After scoping a loop, a dialog appears with the variable scopes and options to Insert or Display the OpenMP directives. Display shows the suggested directives without modifying the code; Insert changes the source code, which you may then save.

Results of OpenMP code additions from Cray Reveal

With the stock unmodified kernel for this code (already marked for OpenMP on the most compute intensive loops), here are the timings:

aprun -n 1 -d16 ./a.out
...
deposit time = 1.66461015
...
push time = 2.34570265
...
Total Particle Time (nsec) = 8.77031326

After adding a couple of the directives suggested by Cray Reveal for loops that were still marked as hot in the loop view:

aprun -n 1 -d16 ./a.out
...
deposit time = 0.917835116
...
push time = 2.12162423
...
Total Particle Time (nsec) = 7.74624014
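
To put the two sets of timings side by side, the speedup of each phase can be computed as the time before divided by the time after. The snippet below is a small sketch that does this with awk, using only the numbers printed by the two runs above.

```shell
# Timings are copied from the two runs above:
# numerator = stock kernel, denominator = kernel with the Reveal directives added.
report=$(awk 'BEGIN {
    printf "deposit speedup: %.2fx\n", 1.66461015 / 0.917835116
    printf "push speedup:    %.2fx\n", 2.34570265 / 2.12162423
    printf "total speedup:   %.2fx\n", 8.77031326 / 7.74624014
}')
printf '%s\n' "$report"
```

This shows the deposit phase running roughly 1.8 times faster, while the push phase and the total particle time improve by roughly 10 to 13 percent.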

Here are the code changes that yielded the performance improvement above:

arnoldg@jyc1 12:36 ~/mpic2 P__--Cray- $ diff mpush2.f_stock mpush2.f_reveal
46a47,50
> ! Directive inserted by Cray Reveal.  May be incomplete.
> !$OMP  parallel do default(none)                                         
> !$OMP&   private (at3,j,k,k1)                                            
> !$OMP&   shared  (part,npx,npy,at1,at2,edgelx,edgely)
62a67,71
> ! Directive inserted by Cray Reveal.  May be incomplete.
> !$OMP  parallel do default(none)                                         
> !$OMP&   private (j)                                                     
> !$OMP&   shared  (part,npxy)                                             
> !$OMP&   reduction (+:dsum1)reduction (+:dsum2)
71a81,84
> ! Directive inserted by Cray Reveal.  May be incomplete.
> !$OMP  parallel do default(none)                                         
> !$OMP&   private (j)                                                     
> !$OMP&   shared  (part,npxy,sum1,sum2)
170a184,188
> ! Directive inserted by Cray Reveal.  May be incomplete.
> !$OMP  parallel do default(none)                                         
> !$OMP&   shared (ppart,kpic)                                         
> !$OMP&   private (i,ip,j,m,n)                                            
> !$OMP&   shared  (idimp,nop,part,nppmx,mx,my,mx1,ierr)
arnoldg@jyc1 12:37 ~/mpic2 P__--Cray- $
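
Because each inserted directive carries the comment "May be incomplete", it can be worth listing them all for a manual review before building. The helper below is a minimal sketch of one way to do that; the function name list_reveal_directives is hypothetical, and the -A option is an extension found in GNU and BSD grep rather than part of POSIX.

```shell
# list_reveal_directives is a hypothetical helper name, not part of Reveal.
list_reveal_directives() {
    # Print every Reveal marker comment together with the four lines that
    # follow it, prefixed with line numbers, so each directive can be
    # checked by hand in the source file given as the first argument.
    grep -n -A 4 'Directive inserted by Cray Reveal' "$1"
}

# Example call, using the modified source file from the diff above:
#   list_reveal_directives mpush2.f_reveal
```

Four lines of trailing context cover the directives shown in the diff; a longer clause list would need a larger value.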

Additional Information / References

Cray Reveal