Find out about our hardware team's experiments with innovative memory technologies.
The non-volatile memory (NVM) device is physically attached to the Booster Nodes (BN) and provides local, high-performance data storage.
The data written to it survives system restarts and power cycles. To protect against node failures, data is stored redundantly across multiple NVM devices.
The traditional device for persistent, block-addressable data storage is the spinning magnetic disk, and its performance characteristics have shaped the architecture of today's I/O systems. SSDs based on NAND Flash non-volatile memory offer significantly better performance, but at a higher cost per unit of storage and with limited capacity and endurance. Recent advances in memory technology promise high-capacity non-volatile memory whose performance and endurance far exceed those of Flash memory, and whose performance and addressability come closer to those of DRAM. Candidate technologies for NVM include Phase Change Memory, Memristors, Spin Transfer Torque RAM, and Magnetoresistive RAM.
NVM in an HPC environment
In the HPC context, NVM will certainly not replace DRAM, due to the strict latency and bandwidth requirements of main memory. However, it can play an important role in parallel I/O in general and in the checkpoint/restart functionality needed for resilient operation of Exascale systems. Both application areas benefit from large memory pools that reliably hold data for later processing or for asynchronous migration to permanent storage devices with limited bandwidth. Compared to NAND Flash, NVM will provide much better read and write performance, will not require data to always be written in large blocks, and will offer significantly higher endurance. This makes it possible to access NVM much like today's main memory, for instance by directly reading or writing cache lines or even individual bytes, instead of being constrained by the block device interface required today.
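To make the cost of the block interface more concrete, the following Python sketch counts how many bytes must be physically written for a set of small updates when a device can only be written in whole units of a given size. The function name `bytes_written` is hypothetical, and the granularities (1 byte, a 64-byte cache line, a 4 KiB block) are illustrative assumptions rather than properties of a specific device; the model deliberately ignores caching, Flash erase-block management, and other device internals.

```python
def bytes_written(updates, granularity):
    """Return the bytes physically written for a list of (offset, length) updates
    on a device that can only be written in whole units of `granularity` bytes."""
    total = 0
    for offset, length in updates:
        if length <= 0:
            continue  # nothing to write for an empty update
        first_unit = offset // granularity
        last_unit = (offset + length - 1) // granularity
        # Every unit touched by the update must be rewritten in full.
        total += (last_unit - first_unit + 1) * granularity
    return total


# 1000 small 8-byte updates (e.g. metadata fields), each in a different 4 KiB region.
updates = [(i * 4096, 8) for i in range(1000)]

for name, granularity in [("byte-addressable", 1), ("cache line (64 B)", 64), ("block (4 KiB)", 4096)]:
    # Prints 8000, 64000 and 4096000 bytes, respectively.
    print(f"{name:>18}: {bytes_written(updates, granularity):>9} bytes written")
```

Under these assumptions, the block interface writes 512 times more data than the payload for this access pattern, while cache-line granularity reduces the overhead to a factor of 8.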
NVM in the DEEP-ER architecture
In DEEP-ER, we explore the benefits of NVM technology by using devices that implement the NVM Express interface. Each Booster Node will have a substantial amount of storage (up to 1 terabyte) attached via PCI Express, with performance far exceeding that of conventional SSDs. The NVM devices will be used as:
- Distributed I/O buffers for the parallel file system that enable reliable staging of input and output data, facilitate data prefetching, and also improve file system resiliency by storing critical configuration and mapping information.
- Buffers for checkpoint data that enable fast writing of checkpoints and offer sufficient capacity to use redundancy techniques for guarding against failures.
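One common redundancy technique for node-local checkpoint buffers is XOR parity computed across a group of nodes; the text above does not say which technique DEEP-ER uses, so the following Python sketch is only an illustration of the general idea. The function names and the checkpoint contents are hypothetical, and the sketch assumes that at most one member of a group fails and that the original length of each chunk is recorded separately as metadata.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equally long byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(chunks: list[bytes]) -> bytes:
    """XOR all checkpoint chunks of a group, zero-padding them to a common length."""
    size = max(len(c) for c in chunks)
    return reduce(xor_bytes, (c.ljust(size, b"\0") for c in chunks))


def recover(survivors: list[bytes], parity: bytes, lost_len: int) -> bytes:
    """Rebuild the single missing chunk from the surviving chunks and the parity."""
    size = len(parity)
    padded = (c.ljust(size, b"\0") for c in survivors)
    # XOR-ing the parity with every surviving chunk cancels them out,
    # leaving only the zero-padded lost chunk; the slice removes the padding.
    return reduce(xor_bytes, padded, parity)[:lost_len]


# Hypothetical checkpoint data held in the NVM buffers of a group of three nodes.
checkpoints = [b"state of rank 0", b"state of rank 1, slightly longer", b"rank 2"]
parity = make_parity(checkpoints)

# Suppose the node holding chunk 1 fails; its data is rebuilt from the others.
lost = 1
survivors = [c for i, c in enumerate(checkpoints) if i != lost]
restored = recover(survivors, parity, len(checkpoints[lost]))
print(restored == checkpoints[lost])  # True
```

The appeal of this approach is its capacity cost: for a group of N nodes, the parity needs roughly 1/N of the group's checkpoint volume, whereas keeping a full copy of every checkpoint on a partner node doubles the required space. The trade-off is that only one failure per group can be tolerated, and recovery requires reading the data of all surviving group members.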