Applications


The LHC (Large Hadron Collider) experiments at CERN collect enormous amounts of data, which need to be pre-processed, treated and then analysed to extract the scientific information that physicists look for. This makes the codes developed for the LHC prime examples of HPDA applications.


In DEEP-EST, CERN will investigate a new model for deploying improvements to the analysis of data created by the CMS instrument. The focus is on the instrument code (which specifies how the instrument “sees” events) and its calibration. Currently, new code must be available before the actual data processing starts. CERN will explore the dynamic reprocessing of objects whenever the instrument code or calibration changes. Testing this new concept requires a large high-performance processing centre with excellent integrated storage; the Modular Supercomputing Architecture (MSA) could be an ideal platform.
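The idea of dynamic reprocessing can be sketched as a lazy cache of derived objects that is invalidated when the calibration version changes. This is a minimal illustration only; all names (`Calibration`, `reconstruct`, `EventStore`) are hypothetical and do not correspond to actual CMS software:

```python
# Hypothetical sketch: derived physics objects are recomputed lazily
# whenever the calibration they were built with changes.

class Calibration:
    def __init__(self, version, offset):
        self.version = version
        self.offset = offset

def reconstruct(raw_event, calib):
    # Stand-in for the real instrument code: apply a calibration offset.
    return [x + calib.offset for x in raw_event]

class EventStore:
    def __init__(self, raw_events):
        self.raw = raw_events
        self.cache = {}          # event id -> (calibration version, result)

    def get(self, evt_id, calib):
        cached = self.cache.get(evt_id)
        if cached and cached[0] == calib.version:
            return cached[1]     # still valid, no reprocessing needed
        reco = reconstruct(self.raw[evt_id], calib)
        self.cache[evt_id] = (calib.version, reco)
        return reco

store = EventStore({0: [1.0, 2.0]})
a = store.get(0, Calibration("v1", 0.5))  # reconstructed with v1
b = store.get(0, Calibration("v2", 0.7))  # calibration changed -> reprocessed
```

The key point is that reprocessing happens on demand rather than requiring all code and calibrations to be fixed before a processing campaign begins.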

At least three of the DEEP-EST modules are planned to be used: the storage module, the Data Analytics Module (DAM) and the Extreme Scale Booster (ESB). In DEEP-EST, CERN will use two different applications:


CMS Event Reconstruction


CMS Event Reconstruction is a data-parallel workload that does not involve any inter-process or remote-process communication. The same set of algorithms (a single executable) is replicated across all available nodes and cores. The event reconstruction can run on the ESB or DAM nodes.
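The data-parallel pattern described above can be sketched with a worker pool: the same function is replicated across processes, each handling its own events independently, with no communication between workers. `reconstruct_event` is a hypothetical stand-in for the real reconstruction executable:

```python
# Minimal sketch of the data-parallel pattern: the same reconstruction
# function is replicated across worker processes, each processing its
# own events with no inter-process communication.
from multiprocessing import Pool

def reconstruct_event(raw):
    # Placeholder for the actual reconstruction algorithms.
    return sum(raw)

if __name__ == "__main__":
    events = [[1, 2], [3, 4], [5, 6]]
    with Pool(processes=2) as pool:
        results = pool.map(reconstruct_event, events)
    print(results)  # [3, 7, 11]
```

Because no worker depends on another, the workload scales by simply adding more nodes or cores.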




The main focus was on porting the compute-intensive parts of the workload to GPUs (NVIDIA GPUs in particular). This was achieved during the project and resulted in a substantial performance improvement.
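The GPU-offload pattern can be illustrated with a hedged sketch: run a compute-intensive kernel on an NVIDIA GPU via CuPy when one is usable, otherwise fall back to NumPy on the CPU. This mirrors the idea only; the actual CMS code is written against CUDA directly, and `energy_sum` is an invented example kernel:

```python
# Sketch of GPU offload with a CPU fallback. CuPy mirrors the NumPy API,
# so the same kernel code runs on either backend.
try:
    import cupy as xp
    xp.asarray([0.0])            # probe that a GPU is actually usable
except Exception:
    import numpy as xp           # CPU fallback

def energy_sum(hits):
    # Example compute-intensive kernel: sum of squared hit amplitudes.
    a = xp.asarray(hits)
    return float((a * a).sum())

print(energy_sum([1.0, 2.0, 3.0]))  # 14.0
```

Keeping the kernel backend-agnostic lets the same code exploit ESB or DAM GPUs where available without a separate CPU-only code path.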



CMS Event Classification


CMS Event Classification is an analytics workflow that mainly relies on third-party tools (e.g. TensorFlow, Apache Spark) for data processing and Deep Learning. The main idea is to train a set of models. The Feature Engineering stage (data preparation) will run on the DAM, as this allows use of the large memory capacity provided by the DCPMM (Intel Optane DC Persistent Memory Modules). Training and inference will run on the GPUs of the DAM and the ESB, respectively.
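The staged structure of the workflow (feature engineering, then training, then inference, each mappable to a different module) can be sketched end to end. This toy version uses NumPy and a tiny logistic-regression model purely for illustration; the real pipeline uses Apache Spark for data preparation and TensorFlow for Deep Learning:

```python
# Illustrative three-stage sketch of the classification workflow:
# feature engineering (DAM), training and inference (DAM/ESB GPUs
# in the real setup).
import numpy as np

def prepare_features(raw):
    # Feature engineering: normalise raw event quantities.
    x = np.asarray(raw, dtype=float)
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-9)

def train(features, labels, epochs=200, lr=0.5):
    # Training: tiny logistic-regression model fit by gradient descent.
    w = np.zeros(features.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-features @ w))
        w -= lr * features.T @ (p - labels) / len(labels)
    return w

def infer(features, w):
    # Inference: classify events with the trained weights.
    return (1.0 / (1.0 + np.exp(-features @ w)) > 0.5).astype(int)

raw = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]]
labels = np.array([0, 0, 1, 1])
X = prepare_features(raw)       # stage 1: data preparation
w = train(X, labels)            # stage 2: model training
preds = infer(X, w)             # stage 3: inference
```

Separating the stages cleanly is what allows each one to be scheduled on the module whose hardware (large DCPMM memory, GPUs) suits it best.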