# Ray Tracer

The `ray-tracer` folder contains a serial ray tracer (in a Fortran90 and
a C version), which computes a pretty picture and writes it to a file
called `result.pnm`.

You can view the image under Linux with the `display` program (from the
ImageMagick package) or under Windows with IrfanView.

The central function is `calc_tile()`, which computes one tile of the picture.
The size of one tile and of the whole picture is hardcoded at the start of
the main program.

NOTE: the code assumes that the picture size is a multiple of the tile size.

For this exercise, set the picture size to 6000x6000 and the tile size to
1000x1000 to begin with.  The program outputs its runtime and its
performance in million pixels per second (MPixel/s).
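
To make these numbers concrete, here is a minimal sketch of the tile count and the MPixel/s figure.  The helper names `tiles_per_axis` and `mpixel_per_s` are made up for this illustration and are not taken from the ray tracer source.

```c
#include <stddef.h>

/* Hypothetical helper: number of tiles along one axis.
 * Returns 0 if the picture size is not a multiple of the tile size,
 * which is the case the code does not handle. */
size_t tiles_per_axis(size_t picture, size_t tile)
{
    if (tile == 0 || picture % tile != 0)
        return 0;
    return picture / tile;
}

/* Hypothetical helper: throughput in million pixels per second. */
double mpixel_per_s(size_t width, size_t height, double seconds)
{
    return (double)width * (double)height / seconds / 1.0e6;
}
```

With a 6000x6000 picture and 1000x1000 tiles this gives 6 tiles per axis, that is 36 tiles in total.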

## Compile

Compile the C or Fortran90 version of the ray tracer.

In order to compile an OpenMP program, you have to use

* `-fopenmp` for gcc, clang, gfortran, flang-new
* `-qopenmp` for icc, ifort, icx, ifx

NOTE: the C version must be linked with `-lm`, the math library that
provides the `sqrt` and `pow` functions.  For example:

```bash
gcc -fopenmp ray-tracer.c -o ray-tracer -lm
```
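
If you are unsure whether the OpenMP flag took effect, you can check the standard `_OPENMP` macro, which is only defined when OpenMP is enabled.  A minimal sketch (the function name `openmp_version` is made up for this illustration):

```c
/* Returns the OpenMP version date (yyyymm) the compiler implements,
 * or 0 if the code was compiled without OpenMP support. */
int openmp_version(void)
{
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}
```

Printing the return value from `main` shows a non-zero number when the code was compiled with `-fopenmp` or `-qopenmp`, and 0 otherwise.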


## Task 1

Parallelize the code with OpenMP-parallel loops.  You can deactivate the
output for testing, but make sure that your parallel code computes the
correct result (this is easy since you can always display the picture).

What speedup does your program get from 1 to N cores?
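
As a starting point, a loop-parallel version could look roughly like the sketch below.  It does not use the real `calc_tile()`, whose signature is not shown here; `fill_tile` and `render` are hypothetical stand-ins, and the pixel pattern is only a placeholder for the actual ray tracing.

```c
#include <stddef.h>

/* Hypothetical stand-in for calc_tile(): fills one square tile whose
 * upper-left corner is (x0, y0) with a simple test pattern. */
void fill_tile(unsigned char *picture, int width, int x0, int y0, int tile)
{
    for (int y = y0; y < y0 + tile; ++y)
        for (int x = x0; x < x0 + tile; ++x)
            picture[(size_t)y * width + x] = (unsigned char)((x ^ y) & 0xff);
}

/* Loops over all tiles.  Each tile writes a disjoint part of the picture,
 * so the iterations are independent and can run in parallel.
 * Assumes width and height are multiples of tile. */
void render(unsigned char *picture, int width, int height, int tile)
{
    #pragma omp parallel for collapse(2)
    for (int ty = 0; ty < height; ty += tile)
        for (int tx = 0; tx < width; tx += tile)
            fill_tile(picture, width, tx, ty, tile);
}
```

The number of threads can be set with the `OMP_NUM_THREADS` environment variable.  The speedup on N cores is the runtime with 1 thread divided by the runtime with N threads.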

## Task 2

Do you see any optimization potential in the way the code is parallelized?
Think about how the work is distributed among the threads, and how you can
influence this distribution.
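
One way to experiment with the work distribution is the `schedule` clause.  The sketch below is an assumption about how such an experiment could look, not the reference solution: `tile_cost` is a hypothetical stand-in for tiles that take different amounts of time, and `schedule(runtime)` lets you pick the schedule through the `OMP_SCHEDULE` environment variable without recompiling.

```c
/* Hypothetical per-tile work: tiles with a larger index take longer,
 * mimicking a picture in which some regions are more expensive. */
long tile_cost(int tile_index)
{
    long sum = 0;
    for (long i = 0; i < (long)tile_index * 1000; ++i)
        sum += i % 7;
    return sum;
}

/* Processes all tiles; the schedule is chosen at run time. */
long render_all(int ntiles)
{
    long total = 0;
    #pragma omp parallel for schedule(runtime) reduction(+:total)
    for (int t = 0; t < ntiles; ++t)
        total += tile_cost(t);
    return total;
}
```

You can then compare, for example, `OMP_SCHEDULE="static"` with `OMP_SCHEDULE="dynamic,1"` and look at how the runtime changes.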

## Task 3 Tasking

Parallelize the code with OpenMP tasks.  Is there a performance difference
to the loop-parallel version?

You can use `ray-tracer.task.c` or `ray-tracer.task.F90`.  They include a check
against a reference solution and print out the difference.  This eliminates
the need to view the image.
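
A task-based version might be structured as in the following sketch.  As before, `fill_tile` is a hypothetical stand-in for the real `calc_tile()`, and `render_tasks` is a made-up name; one thread creates one task per tile and the other threads of the team execute them.

```c
#include <stddef.h>

/* Hypothetical stand-in for calc_tile(): fills one square tile whose
 * upper-left corner is (x0, y0) with a simple test pattern. */
void fill_tile(unsigned char *picture, int width, int x0, int y0, int tile)
{
    for (int y = y0; y < y0 + tile; ++y)
        for (int x = x0; x < x0 + tile; ++x)
            picture[(size_t)y * width + x] = (unsigned char)((x ^ y) & 0xff);
}

/* Creates one task per tile.  Assumes width and height are multiples
 * of tile. */
void render_tasks(unsigned char *picture, int width, int height, int tile)
{
    #pragma omp parallel
    #pragma omp single
    {
        for (int ty = 0; ty < height; ty += tile) {
            for (int tx = 0; tx < width; tx += tile) {
                /* Each task gets its own copy of the tile coordinates. */
                #pragma omp task firstprivate(tx, ty)
                fill_tile(picture, width, tx, ty, tile);
            }
        }
    } /* All tasks are finished after the barrier that ends the region. */
}
```

Since idle threads pick up tasks as they become available, the distribution of tiles to threads is decided at run time.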

## Task 4 Offloading

Offload the ray tracing to the GPU.  We use the NVIDIA HPC SDK (nvhpc)
compilers.

Load the compiler and CUDA modules (on the Alex cluster):

```bash
module purge
module load nvhpc cuda/11.8.0
```

Compile your code with:

```bash
# C
nvc -O3 -march=native -Wall -Wextra -gopt -mp=gpu -gpu=cc80 ray-tracer.offload.c -o ray-tracer.offload.c.exe
# Fortran
nvfortran -O3 -march=native -Wall -Wextra -gopt -mp=gpu -gpu=cc80 ray-tracer.offload.F90 -o ray-tracer.offload.F90.exe
```

You can use `ray-tracer.offload.c` or `ray-tracer.offload.F90`.  They include
a check against a reference solution and print out the difference.  This
eliminates the need to view the image.  Furthermore, they do not include
recursive calls inside the shade function.
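
The general shape of an offloaded pixel loop is sketched below.  This is only an illustration under assumptions: `shade_pixel` and `render_offload` are hypothetical names, and the pixel pattern stands in for the real shading code.  Functions called inside the target region have to be compiled for the device, which is what `declare target` does.

```c
#include <stddef.h>

/* Hypothetical stand-in for the shading code.  It is marked so that a
 * device version is generated, and it is not recursive. */
#pragma omp declare target
unsigned char shade_pixel(int x, int y)
{
    return (unsigned char)((x ^ y) & 0xff);
}
#pragma omp end declare target

/* Computes all pixels on the device and copies the picture back.
 * Every pixel is written, so mapping the array with "from" suffices. */
void render_offload(unsigned char *picture, int width, int height)
{
    #pragma omp target teams distribute parallel for collapse(2) \
            map(from: picture[0:width * height])
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            picture[(size_t)y * width + x] = shade_pixel(x, y);
}
```

If no device is available, or the code is compiled without offloading support, the loop simply runs on the host.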
