Cray Compiling Environment
The Cray Compiling Environment (CCE) provides the Cray Fortran and Cray 
C/C++ compilers. The Cray Fortran compiler supports the Fortran 2018 standard 
while the C/C++ compiler is C17 and C++17 compliant. These compilers are
invoked through the ftn, cc and CC compiler wrappers.
CCE supports the full OpenMP 4.5 specification and has partial support for OpenMP 5.0. PGAS languages (UPC and Fortran coarrays) are also integrated.
Overview
| Feature | Fortran | C/C++ | 
|---|---|---|
| Listing | -hlist=m | -fsave-loopmark | 
| Free format | -ffree | N/A | 
| Vectorization | -O1 and above | -O2 and above | 
| Link Time Optimization | -hwp | -flto | 
| Floating-point optimizations | -hfpN, N=0...4 | -ffp=N, N=0...4 | 
| Suggested Optimization | default | -O3 | 
| Aggressive Optimization | -O3 -hfp3 | -Ofast -ffp=3 | 
| OpenMP recognition | -fopenmp | -fopenmp | 
| Variable sizes | -s real64 -s integer64 | N/A | 
| Debug | -g | -g | 
Choose the CCE version
The Cray Compiling Environment is available from the PrgEnv-cray module, which 
is loaded by default. This module loads the default version of the compilers. If 
you wish to use an older or newer version, you can list the available versions 
with
module avail cce
and then switch to the desired version using
module swap cce cce/<version>
OpenMP Support
OpenMP is turned off by default, unlike in earlier versions of the CCE
compilers. It is turned on using the -homp or -fopenmp flag.
The CCE Fortran compiler lets you control the level of optimization of OpenMP 
directives with the -hthreadN option (N = 0...3). A value of N = 0 turns this 
optimization off and N = 3 requests the most aggressive optimization. The 
default value is N = 2.
Debugging
To ease debugging, it is useful to generate an executable containing 
debugging information. For this purpose, you can use the -g option. 
Debug information is usually most reliable at low levels of code 
optimization, so consider using the -O0 level. The -g option can be 
specified on a per-file basis so that only a small part of your application 
incurs the debugging penalty.
Compiler feedback
The compilers can generate loopmarks which indicate the type of optimization
performed. This feature is enabled by the -hlist=m option for the Fortran
compiler, and by the -fsave-loopmark option for the C/C++ compilers. For 
example,
ftn -fopenmp -hlist=m -o saxpy saxpy.f08
cc -fopenmp -fsave-loopmark -Ofast -o saxpy saxpy.c
CC -fopenmp -fsave-loopmark -Ofast -o saxpy saxpy.cpp
will produce a file called saxpy.lst where you can find a listing of your code
with annotations indicating which optimizations were performed by the compiler.
    1.                   subroutine saxpy(n, a, x, y) 
    2.                     real :: x(n), y(n), a
    3.                     integer :: n, i
    4.                   
    5.    M----------<     !$omp parallel do
    6.    M mVr2-----<     do i=1,n
    7.    M mVr2             y(i) = a*x(i)+y(i)
    8.    M mVr2----->     enddo
    9.    M---------->     !$omp end parallel do
  10.                   end subroutine saxpy
The meaning of the annotations can be found at the beginning of the 
listing file. In this example, the compiler vectorized (V) and unrolled (r) 
the loop. The listing for the C version is similar:
3.            void saxpy(int n, float a, 
4.                float * restrict x, 
5.                float * restrict y) {
6. + I Vu--<>   #pragma omp parallel for
7. +   M----<   for(int i = 0; i < n; i++) {
8. +   M          y[i] = a*x[i] + y[i];
9.     M---->   }
10.            }
As in the Fortran listing, the meaning of the annotations can be found at the 
beginning of the listing file. Here, the compiler vectorized (V) and 
unrolled (u) the loop.
Compiler Messages
For details, see man explain.
Use the explain command to display an explanation of any message issued by the 
compiler. Each message is identified by a code of the form 
ftn-<number>. You can pass this identifier as an argument to the explain 
command to find out more about the error.
$ ftn -fopenmp -o saxpy saxpy.f08
    call saxpy(2**20, 2.0, x, y)
    ^                            
ftn-954 crayftn: ERROR MAIN, File = saxpy.f08, Line = 18, Column = 5 
  Procedure "SAXPY", defined at line 1 (saxpy.f08) must have an explicit
  interface because one or more arguments have the assumed-shape 
  DIMENSION attribute.
$ explain ftn-954
<explain output>
CCE Fortran Compiler
For details, see man crayftn.
Once the PrgEnv-cray module is loaded (which it is by default), you can invoke 
the Cray Fortran compiler with the ftn command.
Optimization options
The default optimization level of the CCE Fortran compiler is -O2. Aggressive
optimization can be enabled with the -O3 option.
Vectorization
The level of automatic vectorization is controlled with the -hvectorN option 
(N = 0...3).
- the default value is N = 2, enabling moderate vectorization and loop nest restructuring
- setting N = 0 or N = 1 enables minimal and moderate automatic vectorization respectively
- aggressive optimization is enabled by setting N = 3
Loop unrolling
Loop unrolling can be controlled with the -hunrollN flag with N = 0...2. 
- the default value is N = 2, for which the compiler will attempt to unroll all loops, except those marked with the NOUNROLL directive.
- setting N = 0 requests that no loop unrolling is performed (UNROLL directives are also ignored).
- if you only want to unroll loops that are marked by the UNROLL directive, use N = 1.
Floating point optimizations
The Cray compiler applies aggressive floating-point optimizations by default.
If your application is sensitive to floating-point optimization, use the 
-hfpN flag with N = 0...4 to set the level of optimization. 
- the default value is N = 2, which performs various generally safe, non-conforming IEEE optimizations
- most applications can benefit from more aggressive optimization with N = 3
- use N = 0 or N = 1 if the application you are compiling requires strong IEEE standard conformance
CCE C and C++ compilers
For details, see man craycc, man crayCC and clang --help.
Once the PrgEnv-cray module is loaded (which it is by default), you can invoke 
the Cray C compiler with the cc command. The C++ compiler may be invoked with 
the CC command. These compilers are based on Clang/LLVM with Cray improvements. 
The Cray improvements can be turned off with the -fno-cray flag.
Clang does not apply optimizations unless they are requested. Most optimizations
are enabled at the -O2 level. Recommended flags are
- -Ofast to enable all the optimizations, including aggressive optimizations that may violate strict compliance with language standards
- -flto to enable aggressive link time optimizations
For applications that are sensitive to floating-point optimizations, it may be 
preferable to use -O3 instead of -Ofast. Floating-point optimizations 
can also be controlled with the -ffp=N flag with N = 0...4.
- using -ffp=0 generates code with the highest precision and grants the compiler minimal freedom to optimize floating-point operations. Using -ffp=0 also prevents the use of the Cray math libraries.
- requesting the highest level (-ffp=4) grants the compiler maximal freedom to optimize aggressively, but will likely result in lower precision.