Disparity while using various compilers for Schönauer triad

by Aman Sayyad

I compiled the Schönauer triad code with `icpx` and with `g++`. The practical performance reported for the `icpx` build is "inf" GFLOPS, while the `g++` build reaches only about 0.64 GFLOPS. The theoretical peak performance stays the same, 67.2 GFLOPS, in both cases. I would appreciate any insights or recommendations on this disparity. Attached is a picture for reference.
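For reference, a triad GFLOP/s figure is usually derived from the measured time as sketched below. This assumes the standard triad a[i] = b[i] + c[i] * d[i], which performs 2 flops (one multiply, one add) per iteration; the function name `triad_gflops` is hypothetical and not taken from the original code. It also shows where an "inf" comes from: dividing a positive flop count by a time of 0 s yields +inf under IEEE-754 arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: GFLOP/s of the Schönauer triad a[i] = b[i] + c[i] * d[i].
// n = vector length, reps = number of sweeps, seconds = time for all sweeps.
// Assumes 2 flops (one multiply, one add) per iteration.
double triad_gflops(std::size_t n, std::size_t reps, double seconds) {
    assert(seconds >= 0.0);
    const double flops = 2.0 * static_cast<double>(n) * static_cast<double>(reps);
    // If the timed region took 0 s (e.g. the loop was optimized away),
    // this division yields +inf, which is printed as "inf".
    return flops / seconds / 1.0e9;
}
```

For example, 100 sweeps over 1,000,000 elements in 1 s correspond to 0.2 GFLOP/s.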

Thanks!


In reply to Aman Sayyad

Re: Disparity while using various compilers for Schönauer triad

by Jan Laukemann

It's impossible to tell exactly what is happening without seeing the code.

However, notice that your first run takes 0 ms. Of course the GFLOP/s show inf if you divide your number of FLOPs by (nearly) 0. Most likely the compiler can see that you are not using any of the values computed in your benchmark loop and optimizes that part away.
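A minimal sketch of how the timed loop can be kept from being optimized away, assuming the triad a[i] = b[i] + c[i] * d[i] on `std::vector<double>`. The name `run_triad` and its `checksum` parameter are hypothetical, not from the original code. Two things keep the work observable: an empty inline-asm statement with a "memory" clobber after each sweep (a GCC/Clang extension, accepted by g++ and by the Clang-based icpx), and a checksum over `a` that is handed back to the caller so the results are actually used.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: times `reps` sweeps of the Schönauer triad
// a[i] = b[i] + c[i] * d[i] and returns the elapsed time in seconds.
// `checksum` receives the sum of a[], so the results are observably used.
double run_triad(std::vector<double>& a, const std::vector<double>& b,
                 const std::vector<double>& c, const std::vector<double>& d,
                 std::size_t reps, double& checksum) {
    const std::size_t n = a.size();
    assert(b.size() == n && c.size() == n && d.size() == n);

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t r = 0; r < reps; ++r) {
        for (std::size_t i = 0; i < n; ++i) {
            a[i] = b[i] + c[i] * d[i];
        }
        // Empty asm with a "memory" clobber (GCC/Clang extension): the compiler
        // must assume a[] may be read here, so it can neither delete the stores
        // nor collapse the repetitions into a single sweep.
        asm volatile("" : : "r"(a.data()) : "memory");
    }
    const auto stop = std::chrono::steady_clock::now();

    // Consume the results after the timed region.
    checksum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        checksum += a[i];
    }
    return std::chrono::duration<double>(stop - start).count();
}
```

Printing the checksum in the caller and checking that the measured time grows with `reps` is a quick sanity check that the loop really runs.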

By the way, gcc/g++ also has a flag for compiling for the host micro-architecture: -march=native. You only use the -mavx512f flag, but -march=native with gcc 12 on our Fritz nodes enables many more flags:

-march=icelake-server -mmmx -mpopcnt -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mavx -mavx2 -mno-sse4a -mno-fma4 -mno-xop -mfma -mavx512f -mbmi -mbmi2 -maes -mpclmul -mavx512vl -mavx512bw -mavx512dq -mavx512cd -mno-avx512er -mno-avx512pf -mavx512vbmi -mavx512ifma -mno-avx5124vnniw -mno-avx5124fmaps -mavx512vpopcntdq -mavx512vbmi2 -mgfni -mvpclmulqdq -mavx512vnni -mavx512bitalg -mno-avx512bf16 -mno-avx512vp2intersect -mno-3dnow -madx -mabm -mno-cldemote -mclflushopt -mclwb -mno-clzero -mcx16 -mno-enqcmd -mf16c -mfsgsbase -mfxsr -mno-hle -msahf -mno-lwp -mlzcnt -mmovbe -mno-movdir64b -mno-movdiri -mno-mwaitx -mpconfig -mpku -mno-prefetchwt1 -mprfchw -mno-ptwrite -mrdpid -mrdrnd -mrdseed -mno-rtm -mno-serialize -msgx -msha -mno-shstk -mno-tbm -mno-tsxldtrk -mvaes -mno-waitpkg -mwbnoinvd -mxsave -mxsavec -mxsaveopt -mxsaves -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -mno-uintr -mno-hreset -mno-kl -mno-widekl -mno-avxvnni -mno-avx512fp16 --param l1-cache-size=48 --param l1-cache-line-size=64 --param l2-cache-size=55296 -mtune=icelake-server -dumpbase
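One way to see what a given flag set actually enables is to inspect the compiler's predefined ISA macros. The sketch below assumes GCC or a Clang-based compiler such as icpx, which define macros like `__AVX2__`, `__FMA__` and `__AVX512F__` according to the -m/-march options; the struct `IsaFeatures` and the function `compiled_isa_features` are hypothetical names. Building the same file once with -mavx512f and once with -march=native and printing the fields shows the difference. For g++, `g++ -march=native -Q --help=target` also lists the enabled target options.

```cpp
#include <cassert>

// ISA extensions the compiler was allowed to target for this translation unit.
struct IsaFeatures {
    bool avx2 = false;
    bool fma = false;
    bool avx512f = false;
    bool avx512vl = false;
    bool avx512dq = false;
};

// Reads the predefined macros that GCC and Clang-based compilers set
// from the -m... / -march=... command-line flags.
IsaFeatures compiled_isa_features() {
    IsaFeatures f;
#ifdef __AVX2__
    f.avx2 = true;
#endif
#ifdef __FMA__
    f.fma = true;
#endif
#ifdef __AVX512F__
    f.avx512f = true;
#endif
#ifdef __AVX512VL__
    f.avx512vl = true;
#endif
#ifdef __AVX512DQ__
    f.avx512dq = true;
#endif
    // AVX-512VL and AVX-512DQ extend AVX-512F, so they should never appear without it.
    assert((!f.avx512vl && !f.avx512dq) || f.avx512f);
    return f;
}
```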