Building on the concepts and results of the predecessor project DEEP, we focus on the following objectives:
- DEEP-ER extends the Cluster-Booster architecture of the DEEP project with a highly scalable I/O system and implements an efficient mechanism to recover application tasks that fail due to hardware errors.
- The project leverages new memory technologies to increase performance and power efficiency. As a result, I/O-intensive HPC codes will run faster and scale further. HPC applications will be able to benefit from sophisticated checkpointing and task-restart techniques that reduce the overhead seen today, even on large-scale systems.
- DEEP-ER builds a prototype based on the second-generation Intel® Xeon Phi processor, a uniform high-speed interconnect across Cluster and Booster, non-volatile memory on the compute nodes, and network-attached memory providing high-speed shared storage.
- A highly scalable and efficient I/O system based on Fraunhofer’s BeeGFS file system supports I/O-intensive applications, using the optimised I/O middleware SIONlib and Exascale10. A multi-level checkpoint scheme exploits the powerful I/O subsystem and the fast, network-attached storage to reduce the overhead of saving the state of long-running tasks.
- The OmpSs-based DEEP programming model governs the creation of checkpoints and seamlessly restarts failed tasks.
- Seven important HPC applications are optimised to demonstrate the usability, performance and resiliency of the DEEP-ER Prototype.
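The general idea behind a multi-level checkpoint scheme can be illustrated with a small, generic sketch. It is not DEEP-ER's implementation and does not use the SIONlib, BeeGFS or OmpSs APIs. The class `MultiLevelCheckpointer`, the tier names and the intervals are hypothetical. The sketch assumes that cheap tiers close to the compute node are written often and that slower, more resilient shared tiers are written less often, so a restart resumes from the most recent checkpoint held by any tier that survived the failure.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class MultiLevelCheckpointer:
    """Hypothetical multi-level checkpointer: each tier is written every N steps."""

    intervals: dict[str, int]  # tier name -> checkpoint interval in steps (assumed values)
    latest: dict[str, tuple[int, object]] = field(default_factory=dict)

    def maybe_checkpoint(self, step: int, state: object) -> list[str]:
        """Write `state` to every tier whose interval divides `step`; return those tiers."""
        written = []
        for tier, every in self.intervals.items():
            if step % every == 0:
                # Stand-in for real I/O: keep a deep copy so later mutation of
                # `state` cannot alter an already saved checkpoint.
                self.latest[tier] = (step, copy.deepcopy(state))
                written.append(tier)
        return written

    def restart(self, lost_tiers: set[str]) -> tuple[int, object] | None:
        """Return the newest (step, state) from tiers not lost in the failure, if any."""
        survivors = [ckpt for tier, ckpt in self.latest.items() if tier not in lost_tiers]
        if not survivors:
            return None
        return max(survivors, key=lambda ckpt: ckpt[0])


# Usage: hypothetical tiers ordered from fastest/least resilient to slowest/most resilient.
ckpt = MultiLevelCheckpointer(
    {"node_local_nvm": 10, "network_attached_memory": 50, "parallel_fs": 200}
)
state = {"iteration": 0}
for step in range(1, 231):
    state["iteration"] = step  # the "application" advances its state
    ckpt.maybe_checkpoint(step, state)

print(ckpt.restart(set()))               # (230, {'iteration': 230})
print(ckpt.restart({"node_local_nvm"}))  # (200, {'iteration': 200})
```

In this sketch, a failure that only loses node-local data rolls the application back to step 200 instead of step 0, while frequent node-local checkpoints keep the rollback short when no tier is lost. Choosing the per-tier intervals is the main trade-off between checkpoint overhead and lost work.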