Applications

In a second step, users need to implement the Cluster-Booster division as described in the figure below. This is necessary for applications of all categories except, obviously, the one with discrete use of the Booster.

 

Figure: How to integrate the Cluster-Booster division

 

There are two techniques for the Cluster-Booster division:

  • Offload tasks with OmpSs: Pragmas are used to mark the code sections that should be offloaded. Communication between the host and the offloaded parts is done via the input and output variables within the pragma.
  • Offload code parts with MPI_Comm_spawn (MPI): MPI_Comm_spawn creates a new MPI communicator. A host list passed to the call determines where the processes within this communicator run (Cluster or Booster). Another parameter is the executable of the code part that should be offloaded. The processes in the new and in the original MPI communicator can communicate with each other at any time via the usual MPI send and receive calls.

If your code belongs to the orange category, you can start either on the Booster Nodes or on the Cluster Nodes and then offload the other part. You should implement the offload once at the beginning of the application, after which both parts run in parallel. In most cases the parts have to communicate throughout the run, so the MPI_Comm_spawn offload is recommended here. For that, include the following call to perform the offload:

MPI_Comm_spawn("./offload", args, procs, inf, rank, com, &intercom, err);
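
For orientation, a minimal, self-contained sketch of such a spawn-based offload is given below. The executable name ./offload, the number of spawned processes and the exchanged data are illustrative only and not taken from a specific application:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm intercomm;
    int errcodes[4];
    int rank, data = 42;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Spawn 4 instances of the (hypothetical) offload executable; the
       placement on Cluster or Booster is controlled via the host list /
       MPI_Info object handed to the call or via the batch system. */
    MPI_Comm_spawn("./offload", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &intercomm, errcodes);

    /* Host and offload part can exchange data at any time over the
       resulting intercommunicator with the usual send/receive calls;
       the ranks given here refer to the remote group. */
    if (rank == 0) {
        MPI_Send(&data, 1, MPI_INT, 0, 0, intercomm);
        MPI_Recv(&data, 1, MPI_INT, 0, 0, intercomm, MPI_STATUS_IGNORE);
    }

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}

On the offload side, the spawned processes obtain the intercommunicator back to the host part with MPI_Comm_get_parent and then use the same send and receive calls.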


xPic from KU Leuven uses this type of offload. A detailed description of the offload can be found in Section 3.2.3 of this document.


If the code falls into the red category, the application starts on the Booster (host part) and the I/O operations and serial parts are offloaded to the Cluster (offload part). This approach is used, e.g., for the GERShWIN application.


For applications belonging to the green category, you can start either on the Booster Nodes or on the Cluster Nodes, depending on the structure of your program. It is important to minimise the data transfer for the offload. Then offload as many processes as you need to the other part. You can also employ a nested offload, as BSC does for their FWI application.
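
As an illustration of the nesting only (the concrete FWI setup is not reproduced here), a spawned offload executable can itself call MPI_Comm_spawn again; the executable name offload_inner and the process counts below are hypothetical:

/* offload.c - schematic second level of a nested offload (illustrative only).
   This executable was itself started via MPI_Comm_spawn; it connects back
   to its parent and then spawns a further set of processes. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm parent, child;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);        /* link back to the first level */

    MPI_Comm_spawn("./offload_inner", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &child, MPI_ERRCODES_IGNORE);

    /* ... work and communication over 'parent' and 'child' ... */

    MPI_Comm_disconnect(&child);
    MPI_Comm_disconnect(&parent);
    MPI_Finalize();
    return 0;
}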


In case of the fourth category (purple), you start the application on the Cluster (host part) and offload the massively parallel parts to the Booster (offload parts). This is, e.g., the case for applications with a master-slave structure like TurboRVB from CINECA.


Since in those cases communication between both parts (host and offload) is only needed at the beginning and the end of the offload section, we recommend using the OmpSs offload here. In this case, each part to be offloaded is surrounded by a pragma like this:


#pragma omp task device(mpi) onto(com, rank) in(var1, var2) out(var3)
  {
      /* code segments to be offloaded ... */
  }

 

The communicator named in the onto clause has to be created with a deep_booster_alloc call.
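
To show how the pieces fit together, the following is a minimal sketch combining the allocation call with the offload pragma. The deep_booster_alloc prototype used here (spawning communicator, number of hosts, processes per host, resulting intercommunicator) as well as the host and process counts are assumptions; please check the OmpSs offload documentation for the exact interface:

#include <mpi.h>

/* Assumed prototype of the allocation call provided by the OmpSs offload
   runtime; the exact interface may differ. */
void deep_booster_alloc(MPI_Comm spawners, int num_hosts,
                        int procs_per_host, MPI_Comm *intercomm);

#define N 1024

int main(int argc, char **argv)
{
    MPI_Comm boostercomm;
    double var1[N], var2[N], var3[N];
    int rank, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < N; i++) { var1[i] = i; var2[i] = 2.0 * i; }

    /* Allocate Booster processes for the offload: one host with one
       process per host in this sketch. */
    deep_booster_alloc(MPI_COMM_WORLD, 1, 1, &boostercomm);

    /* Offload the compute part; the data listed in in()/out() is
       transferred only at the beginning and the end of the task. */
    #pragma omp task device(mpi) onto(boostercomm, rank) in(var1, var2) out(var3)
    {
        int j;
        for (j = 0; j < N; j++)          /* runs on the Booster */
            var3[j] = var1[j] + var2[j];
    }
    #pragma omp taskwait

    MPI_Finalize();
    return 0;
}

Such a code is compiled with the OmpSs (Mercurium) compiler rather than a plain MPI compiler wrapper.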


In summary: To perform the offload we recommend using OmpSs, except when the host and the offload part need to communicate not only at the beginning and the end (orange category). The advantage of using OmpSs is that application developers can also benefit from the resiliency features the programming model provides (see Step 5).