In this hands-on we analyze a conjugate-gradient (CG) linear solver derived from the popular HPCG benchmark. It is "matrix free" because it does not store the coefficient matrix; the matrix is hard-coded, so the sparse matrix-vector multiplication (MVM) kernel is effectively a Jacobi-like stencil update sweep. The source can be found in the folder MFCG.
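
To illustrate what "matrix free" means, the following is a minimal sketch of a matrix-vector product written as a stencil sweep. It assumes a two-dimensional five-point stencil with diagonal 4, off-diagonals -1 and zero values outside the grid; the function name stencil_apply and these coefficients are hypothetical and need not match what mfcg.c actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Computes y = A*x without storing A.
 * ASSUMPTION: A is a 2D five-point stencil (diagonal 4, neighbors -1)
 * with zero values outside the grid; mfcg.c may use other coefficients.
 * x and y are row-major arrays of n_outer * n_inner doubles. */
void stencil_apply(size_t n_outer, size_t n_inner, const double *x, double *y)
{
    assert(n_outer > 0 && n_inner > 0);
    for (size_t i = 0; i < n_outer; ++i) {        /* outer dimension */
        for (size_t j = 0; j < n_inner; ++j) {    /* inner dimension */
            size_t k = i * n_inner + j;
            double sum = 4.0 * x[k];
            if (i > 0)           sum -= x[k - n_inner]; /* neighbor in previous row */
            if (i + 1 < n_outer) sum -= x[k + n_inner]; /* neighbor in next row */
            if (j > 0)           sum -= x[k - 1];       /* left neighbor */
            if (j + 1 < n_inner) sum -= x[k + 1];       /* right neighbor */
            y[k] = sum;
        }
    }
}
```

The matrix entries appear only as constants in the loop body, so the memory traffic consists of the two vectors alone, with no matrix values or column indices to load.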

Build and run

Build the executable from the C source with:

$ icx -Ofast -xHost -std=c99 -o ./mfcg ./mfcg.c

For Fortran:

$ ifx -Ofast -xHost -o ./mfcg ./mfcg.f90

Test if it is working:

$ ./mfcg 8000 8000

The problem size is specified as two numbers: the outer dimension of the grid (first argument) and the inner dimension (second argument). The code implements a standard CG algorithm without a stored matrix, i.e., the matrix-vector multiplication is a stencil update sweep. Performance is printed in millions of lattice site updates per second (MLUP/s).
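
The structure of such a solver can be sketched as follows. This is not the code from mfcg.c; it is a minimal textbook CG iteration under stated assumptions: the operator is a hypothetical five-point stencil (diagonal 4, neighbors -1, zero values outside the grid), and all names (apply_stencil, dot, cg_solve, mlups) are made up for illustration. The mlups helper assumes that one "lattice site update" is one grid point per sweep; how mfcg counts updates may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* y = A*x for a hypothetical five-point stencil (ASSUMPTION, see text). */
void apply_stencil(size_t n_outer, size_t n_inner, const double *x, double *y)
{
    for (size_t i = 0; i < n_outer; ++i)
        for (size_t j = 0; j < n_inner; ++j) {
            size_t k = i * n_inner + j;
            double sum = 4.0 * x[k];
            if (i > 0)           sum -= x[k - n_inner];
            if (i + 1 < n_outer) sum -= x[k + n_inner];
            if (j > 0)           sum -= x[k - 1];
            if (j + 1 < n_inner) sum -= x[k + 1];
            y[k] = sum;
        }
}

/* Dot product of two vectors of length n. */
double dot(size_t n, const double *a, const double *b)
{
    double s = 0.0;
    for (size_t k = 0; k < n; ++k) s += a[k] * b[k];
    return s;
}

/* Solves A*x = b with plain CG, starting from x = 0.
 * Stops when the residual 2-norm drops to tol or after max_iter iterations.
 * Returns the number of iterations performed. */
int cg_solve(size_t n_outer, size_t n_inner, const double *b, double *x,
             int max_iter, double tol)
{
    size_t n = n_outer * n_inner;
    double *r = malloc(n * sizeof *r);   /* residual */
    double *p = malloc(n * sizeof *p);   /* search direction */
    double *q = malloc(n * sizeof *q);   /* q = A*p */
    assert(r && p && q);

    for (size_t k = 0; k < n; ++k) { x[k] = 0.0; r[k] = b[k]; p[k] = b[k]; }
    double rr = dot(n, r, r);
    int iter = 0;
    while (iter < max_iter && rr > tol * tol) {
        apply_stencil(n_outer, n_inner, p, q);   /* the "MVM" is a stencil sweep */
        double alpha = rr / dot(n, p, q);
        for (size_t k = 0; k < n; ++k) { x[k] += alpha * p[k]; r[k] -= alpha * q[k]; }
        double rr_new = dot(n, r, r);
        double beta = rr_new / rr;
        for (size_t k = 0; k < n; ++k) p[k] = r[k] + beta * p[k];
        rr = rr_new;
        ++iter;
    }
    free(r); free(p); free(q);
    return iter;
}

/* Millions of lattice site updates per second, ASSUMING one update
 * per grid point per sweep. */
double mlups(size_t n_outer, size_t n_inner, int sweeps, double seconds)
{
    return (double)n_outer * (double)n_inner * (double)sweeps / seconds / 1.0e6;
}
```

Each iteration consists of one stencil sweep, two dot products and three vector updates. Under the counting assumption above, an 8000 x 8000 grid with 10 sweeps completed in one second would correspond to 640 MLUP/s.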

Time profile

Compile the code with the -pg switch to instrument it for gprof:

For C:

$ icx -Ofast -xHost -std=c99 -pg -o ./mfcg ./mfcg.c

For Fortran:

$ ifx -Ofast -xHost -pg -o ./mfcg ./mfcg.f90

After running the code you end up with a gmon.out file, which can be converted to readable form with the gprof tool:

$ gprof ./mfcg

The result is a "flat profile" and a "butterfly graph" (the call graph) of the application. Look at the flat profile: which functions take most of the runtime (the "hot spots")? Is this useful information? Think about what went wrong here (the problem appears only in the C version). Fix it and run the profile again.

Last modified: Wednesday, 25 February 2026, 4:19 PM