Cray Performance Analysis Tool
CrayPat is a performance analysis tool used to evaluate program behaviour on HPE Cray supercomputer systems.
Perftools-lite
The perftools-lite is a simplified and easy-to-use version of CrayPat that 
provides basic performance analysis information automatically, with minimum user 
interaction. In order to use perftools-lite you must first load the 
perftools-base module followed by perftools-lite.
module load perftools-base
module load perftools-lite
After these modules have been loaded, subsequent compiler invocations (cc, 
CC, ftn) will automatically insert necessary hooks for profiling. 
$ cc -o app.x source.c 
WARNING: PerfTools is saving object files from a temporary directory into
directory '/home/olouant/.craypat/app.x/846040'
INFO: creating the PerfTools-instrumented executable 'app.x' 
(lite-samples) ...OK
You can then run your application as normal, the profiling information will be written to the standard output.
Other perftools-lite modules are available for users seeking information other
than that provided by the default perftools-lite module.
perftools-lite-events: event profile (tracing)perftools-lite-gpu: GPU kernel and data movement events profilingperftools-lite-loops: loop work estimatesperftools-lite-hbm: memory profiling
Once you have them loaded, these modules can be used in the same way as 
perftools-lite.
CrayPat
CrayPat is the full-featured program analysis tool set. The typical workflow is
- use 
pat_buildto instrument a program - run the instrumented executable
 - use either 
pat_reportor Cray Apprentice2 to view the resulting report. 
Sampling
Sampling is a statistical profiling. By taking regular snapshots of the applications call stack, we can create a statistical profile of where the application spends most of its time.
One of the main advantages of a sampling experiment is the low overhead that is fixed by the choice of sampling rate. On the other hand, sampling is non-deterministic and can only provide a statistical picture of the application behaviour.
The pat_build tool is used to instruments your application. The first step
to use this tool is to load the perftools-base and perftools modules and 
build your application as normal.
module load perftools-base
module load perftools
cc -o app.x source.c
The second step is to use pat_build.
pat_build app.x
This command will create a new executable with name <exec>+pat. In our 
example, we will produce app.x+pat. The name can be chosen by the user using
the -o <output_exe> option. The default experiment is a sampling experiment.
The next step is to run the application. A directory with a name beginning with 
the name of your application will be created as a result. This directory 
contains the profiling information gathered during the run. You can change the 
name of this output directory with the PAT_RT_EXPDIR_NAME environment 
variable. For example
export PAT_RT_EXPDIR_NAME=apa_sample_exp.${SLURM_JOBID}
srun ./app.x+pat
You can use this directory to generate more detailed report with the 
pat_report command.
pat_report <perftool-output-dir>
Tracing
Tracing revolves around specific program events like entering or exiting a function. This allows the collection of accurate information about specific areas of the code every time the event occurs. This allows for a more accurate and more detailed information as data are collected from every traced function call not a statistical average. Tracing may require the program to be instrumented.
The main downside it that the instrumentation code inserted will be run every time an instrumented function is called in order to record the information. This may introduce significant profiling overhead.
Automatic program analysis (APA)
You can do a focused tracing experiment based on the results from the 
sampling experiment. This is achieved by providing pat_build with
a build-options.apa file generated with pat_report from a previous sampling 
run.
pat_build -O <pertools-out-dir>/build-options.apa
This will build a new executable whose name ends with +apa. You can then run 
this executable in order to get tracing data and generate a report with
pat_report.
Manual analysis
If the automatic program analysis is not sufficient, you to manually choose your
profiling setup. The tracing of the entire program is made possible by using the
-w option when building your application with pat_build
pat_build -w app.x
Another possibility is to select the function belonging to a particular trace function group. For example, for the MPI group functions
pat_build -g mpi app.x
where the -g option is used to select a trace group. There supported is a wide 
variety of predefined function groups. A full list can be obtained from the 
pat_build manpage.
User-defined function can be traced with the -T option and provide a list of
function names, or use the -t option and provide a file listing the functions
to trace. 
pat_build -w -T func1,func2 app.x
pat_build -w -t tracefile app.x
Be careful when you specify the name of the function as the compiler may have 
altered the name. For example, an underscore character may have been added to the
Fortran routine. You can use nm <app> or readelf -s <app> to read the symbol
table of your application. In addition, you can choose to trace all the 
user-defined function, with the -u option.
pat_build -u app.x
Of course, you can combine the option presented above to match your needs. For example, you can choose to trace the MPI and OpenMP group and all the user-defined functions.
pat_build -g mpi,omp -u app.x