In the predecessor DEEP project, an innovative architecture for heterogeneous HPC systems was developed, based on the combination of a standard HPC Cluster and a tightly connected HPC Booster built of many-core processors.
DEEP-ER now evolves this architecture to address two significant Exascale computing challenges: highly scalable and efficient parallel I/O, and system resiliency. Co-design is key to tackling these challenges: new hardware and software components are developed in a thoroughly integrated way and fine-tuned with actual HPC applications in mind.
In terms of hardware, a prototype is being constructed that leverages advances in hardware components, employs a unified network, and integrates new storage technologies. The ultimate goal of this part of the project is a highly flexible hardware architecture that can easily be upgraded or extended with entirely new technologies.
The enhancements to the Cluster-Booster architecture form the basis for software improvements geared towards highly scalable I/O and resiliency. The software experts on the team are developing an efficient and user-friendly parallel I/O system tailored to the specific needs of large-scale HPC applications. On top of this I/O system, a unified user-level checkpointing system with low overhead is developed that exploits multiple levels of storage. The DEEP programming model is extended with easy-to-use annotations to control checkpointing, and to combine automatic re-execution of failed tasks with recovery of long-running tasks from multi-level checkpoints.
The I/O and resiliency requirements of HPC codes guide the design of the DEEP-ER hardware and software components. Seven applications are optimised for the prototype to demonstrate and validate the benefits of the DEEP-ER extensions to the Cluster-Booster architecture.
DEEP-ER collaborates closely with other projects funded under the EU's FP7 Exascale Programme: