Cray Performance Analysis Tool¶
Cray Performance Analysis Tool (CrayPat) is a performance analysis tool used to evaluate program behaviour on HPE Cray supercomputer systems like LUMI.
Perftools-lite¶
perftools-lite is a simplified, easy-to-use version of CrayPat that provides
basic performance analysis information automatically, with minimal user
interaction. To use perftools-lite, you must first load the perftools-base
module, followed by the perftools-lite module.
After these modules have been loaded, subsequent compiler invocations (cc, CC,
ftn) will automatically insert all necessary hooks for profiling.
$ cc -o app.x source.c
WARNING: PerfTools is saving object files from a temporary directory into
directory '/home/olouant/.craypat/app.x/846040'
INFO: creating the PerfTools-instrumented executable 'app.x'
(lite-samples) ...OK
You can then run your application as you would normally. The profiling information will be written to the standard output.
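The steps above can be collected into a small shell helper. This is only a minimal sketch: the function name build_lite is hypothetical, and app.x and source.c are the file names used in the example above.

```shell
# Hypothetical helper: build source.c with perftools-lite instrumentation.
build_lite() {
    module load perftools-base    # must be loaded first
    module load perftools-lite    # compiler wrappers now insert profiling hooks
    cc -o app.x source.c          # same compile line as in the example above
}
```

On a system that provides these modules, calling build_lite should produce the instrumented app.x, which you then launch as you normally would.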
Other perftools-lite modules are available for users seeking information
other than that provided by the default perftools-lite module:
- perftools-lite-events: event profile (tracing)
- perftools-lite-gpu: GPU kernel and data movement events profiling
- perftools-lite-loops: loop work estimates
- perftools-lite-hbm: memory profiling
Once loaded, these modules can be used in the same way as perftools-lite.
CrayPat¶
CrayPat is the full-featured program analysis tool set. The typical workflow is:
- use pat_build to instrument a program
- run the instrumented executable
- use either pat_report or Cray Apprentice2 to view the resulting report.
Sampling¶
Sampling is a form of statistical profiling. By taking regular snapshots of the application's call stack, we can build a statistical profile of where the application spends most of its time.
One of the main advantages of a sampling experiment is the low overhead that is fixed by the choice of sampling rate. On the other hand, sampling is non-deterministic and can only provide a statistical picture of the application behaviour.
The pat_build tool is used to instrument your application. The first step is
to load the perftools-base and perftools modules and build your application as
normal.
The second step is to run pat_build on the resulting executable (app.x in our
example). This creates a new executable named <exec>+pat, so in our example we
will produce app.x+pat. You can choose a different name with the
-o <output_exe> option. The default experiment is a sampling experiment.
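The instrumentation step described above might look like the following sketch. The function name instrument_sampling is hypothetical, and app.x is the executable name used in the example.

```shell
# Hypothetical helper: instrument app.x for the default (sampling) experiment.
# With no argument, pat_build writes app.x+pat; an optional first argument
# is passed to -o to choose a different output name.
instrument_sampling() {
    if [ -n "${1:-}" ]; then
        pat_build -o "$1" app.x
    else
        pat_build app.x
    fi
}
```

For instance, instrument_sampling would create app.x+pat, while instrument_sampling app_sampled.x (an illustrative name) would write the instrumented binary to app_sampled.x instead.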
The next step is to run the instrumented application. This creates a directory
whose name begins with the name of your application and which contains the
profiling information gathered during the run. You can change the name of this
output directory with the PAT_RT_EXPDIR_NAME environment variable.
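As a sketch of how the output directory could be renamed, the variable is exported before launching the instrumented executable. The directory name app_sampling_run is a hypothetical choice, and the srun line in the comment assumes a Slurm-based system.

```shell
# Hypothetical directory name; pick any name that suits your experiment.
export PAT_RT_EXPDIR_NAME=app_sampling_run

# Then launch the instrumented executable as usual for your system,
# for example on a Slurm cluster (assumption):
#   srun ./app.x+pat
```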
You can generate a more detailed report from this directory by passing it to
the pat_report command.
Tracing¶
Tracing revolves around specific program events, such as entering or exiting a function. Information is collected every time the event occurs, so the result is more accurate and more detailed than with sampling: the data come from every traced function call rather than from a statistical average. Tracing may require the program to be instrumented.
The main downside is that the inserted instrumentation code runs every time an instrumented function is called in order to record the information. This may introduce significant profiling overhead.
Automatic program analysis (APA)¶
You can run a focused tracing experiment based on the results of the sampling
experiment. This is achieved by providing pat_build with a build-options.apa
file generated with pat_report from a previous sampling run.
This will build a new executable whose name ends with +apa. You can then run
this executable to collect tracing data and generate a report with pat_report.
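The APA build step could be sketched as follows. The function name build_apa is hypothetical; the .apa file is passed to pat_build with the -O option, and its exact location depends on where pat_report wrote it for your sampling run.

```shell
# Hypothetical helper: build the +apa executable from an .apa file.
# $1 is the path to the build-options.apa file written by pat_report.
build_apa() {
    pat_build -O "$1"
}
```

You would call it with the path to your own file, for example build_apa <dir>/build-options.apa, where <dir> stands for the directory holding the file.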
Manual analysis¶
If the automatic program analysis is not sufficient, you have to choose your
profiling setup manually. Tracing of the entire program is made possible by
the -w option when instrumenting your application with pat_build.
Another possibility is to trace only the functions belonging to a particular
trace function group, for example the MPI functions. The -g option is used to
select a trace group. There is support for a wide variety of predefined
function groups. A full list can be obtained from the pat_build manpage.
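The two tracing variants described above might be wrapped as in the sketch below. The function name trace_build and its "all" keyword are hypothetical, app.x is the example executable, and mpi is assumed to be the group name pat_build uses for MPI functions.

```shell
# Hypothetical helper: instrument app.x for tracing.
#   trace_build all     -> pat_build -w app.x
#   trace_build <group> -> pat_build -w -g <group> app.x
trace_build() {
    case "$1" in
        all) pat_build -w app.x ;;
        *)   pat_build -w -g "$1" app.x ;;
    esac
}
```

For example, trace_build mpi would request tracing of the MPI function group only.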
User-defined functions can be traced with the -T option followed by a list of
function names, or with the -t option followed by a file listing the functions
to trace.
Be careful when you specify the name of a function, as the compiler may have
altered it. For example, an underscore character may have been appended to the
name of a Fortran routine. You can use nm <app> or readelf -s <app> to read the
symbol table of your application. In addition, you can choose to trace all the
user-defined functions with the -u option.
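To check how the compiler named a routine, you could filter the symbol table as in this sketch. The function name find_symbol is hypothetical; the search is case-insensitive because Fortran compilers commonly change the case of routine names.

```shell
# Hypothetical helper: list symbols of an executable matching a routine name.
# $1 is the executable, $2 the (partial) routine name to look for.
find_symbol() {
    nm "$1" | grep -i "$2"
}
```

For example, find_symbol app.x compute (compute being an illustrative routine name) would list every symbol containing that string, showing whether an underscore was added.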
Of course, you can combine the options presented above to match your needs. For example, you can choose to trace the MPI and OpenMP groups and all the user-defined functions.
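Such a combination might look like the following sketch. The function name trace_mpi_omp_user is hypothetical, app.x is the example executable, and mpi and omp are assumed to be the pat_build group names for MPI and OpenMP.

```shell
# Hypothetical helper: trace the MPI and OpenMP groups plus all
# user-defined functions of app.x.
trace_mpi_omp_user() {
    pat_build -g mpi,omp -u app.x
}
```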