It is accessible through network-carried remote memory operations from the Cluster and Booster Nodes. It adds another level to the storage hierarchy and can also provide compute capabilities close to memory.
The NAM prototype
The DEEP-ER NAM prototype system consists of a Hybrid Memory Cube (HMC) based on DRAM technology and a NAM controller. The combination provides high-speed access to DRAM that is accessible from both the Booster and the Cluster node address spaces. The NAM nodes are attached to the DEEP-ER interconnect and form part of the node address spaces to support fine-grain access.
The Computer Architecture Group (CAG) at Heidelberg University has implemented a 16-lane link to the HMC running at 10 Gbit/s per lane, which provides 40 GB/s of bi-directional bandwidth. Two additional links provide connectivity to the EXTOLL network, for an aggregate bandwidth of 48 GB/s. A substantial part of the NAM development is closely linked to the openHMC project at Heidelberg University. For more information on the open source project please see here.
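The 40 GB/s figure for the HMC link follows from the lane count and the per-lane rate. The short C sketch below reproduces that arithmetic. It assumes raw line rate with no encoding or protocol overhead, and the function names are illustrative only, not part of any DEEP-ER software.

```c
#include <assert.h>

/* Raw bandwidth of ONE direction of a serial link in GB/s.
 * Assumption: raw line rate, ignoring encoding and protocol overhead. */
double raw_gbytes_per_direction(int lanes, double gbit_per_lane)
{
    assert(lanes > 0 && gbit_per_lane > 0.0);
    return lanes * gbit_per_lane / 8.0;   /* 8 bits per byte */
}

/* Combined bandwidth of both directions of a full-duplex link in GB/s. */
double raw_gbytes_bidirectional(int lanes, double gbit_per_lane)
{
    return 2.0 * raw_gbytes_per_direction(lanes, gbit_per_lane);
}
```

With 16 lanes at 10 Gbit/s per lane, this gives 160 Gbit/s, or 20 GB/s, in each direction and 40 GB/s for both directions combined.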
To enable access to the NAM via the EXTOLL network, the libNAM library was developed within the DEEP-ER project. It is written in C and provides a NAM Manager that governs all NAMs attached to the network. libNAM offers an API to reserve allocations on the NAMs and to transfer data between the NAM and user codes and/or other system modules such as file systems. It also enables user codes to trigger operations that are executed in the NAM on the data stored there.
Because its call signatures are derived from the standard POSIX allocation and de-allocation routines, libNAM is easy to adopt: user codes can be modified by replacing standard allocation calls with the ones provided by libNAM. A dedicated daemon, the NAM Manager, runs at a central location within the EXTOLL network that hosts the NAMs and takes control of memory allocation. To transfer data to and from the NAM, the API also provides put and get functionality operating on NAM allocations. Since the NAM houses an EXTOLL NIC that connects it to the network, the EXTOLL RMA (Remote Memory Access) protocol is used for these data transfers. Using this low-level protocol ensures high bandwidth and low latency between the NAM and its connected compute nodes.
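To illustrate the programming model of malloc-like allocation plus put and get on an allocation, here is a minimal, self-contained C sketch. It is not the libNAM API: every name (`sketch_nam_alloc_t`, `sketch_nam_malloc`, `sketch_nam_free`, `sketch_nam_put`, `sketch_nam_get`) is hypothetical. The "remote" region is simulated with local heap memory, so no NAM Manager or EXTOLL RMA transfer is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handle for a NAM allocation. In this sketch it wraps
 * local heap memory that merely stands in for the remote NAM region. */
typedef struct {
    unsigned char *base;   /* stand-in for the remote memory region */
    size_t size;           /* size of the allocation in bytes */
} sketch_nam_alloc_t;

/* malloc-like call: reserve `size` bytes. Returns NULL on failure. */
sketch_nam_alloc_t *sketch_nam_malloc(size_t size)
{
    sketch_nam_alloc_t *a = malloc(sizeof *a);
    if (a == NULL)
        return NULL;
    a->base = calloc(size, 1);
    if (a->base == NULL) {
        free(a);
        return NULL;
    }
    a->size = size;
    return a;
}

/* free-like call: release the allocation. NULL is accepted, as with free(). */
void sketch_nam_free(sketch_nam_alloc_t *a)
{
    if (a == NULL)
        return;
    free(a->base);
    free(a);
}

/* put: copy `len` bytes from local `src` into the allocation at `offset`.
 * Returns 0 on success, -1 if the range does not fit in the allocation. */
int sketch_nam_put(sketch_nam_alloc_t *a, size_t offset,
                   const void *src, size_t len)
{
    assert(src != NULL);
    if (a == NULL || offset > a->size || len > a->size - offset)
        return -1;
    memcpy(a->base + offset, src, len);
    return 0;
}

/* get: copy `len` bytes from the allocation at `offset` into local `dst`.
 * Returns 0 on success, -1 if the range does not fit in the allocation. */
int sketch_nam_get(const sketch_nam_alloc_t *a, size_t offset,
                   void *dst, size_t len)
{
    assert(dst != NULL);
    if (a == NULL || offset > a->size || len > a->size - offset)
        return -1;
    memcpy(dst, a->base + offset, len);
    return 0;
}
```

The pattern the text describes is that the application keeps its allocate, use, release structure, and only the data movement becomes explicit through put and get. In a real deployment, the copies in this sketch would be replaced by network transfers to and from the NAM.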
The extended API is designed to provide access to the full functionality of the NAM. As the NAM houses an FPGA, it can be configured to carry out operations on the data stored. The NAM is configured for a given operation using the EXTOLL RRA (Remote Register Access) protocol, which can be used to trigger operations and to register the nodes that should participate in a remote operation such as XOR checkpointing.
DEEP-ER use cases
Within the DEEP-ER architecture, the NAM nodes can serve as targets that hold parity information, extending the project's approach to scalable checkpoint/restart. This makes it possible to survive system crashes while providing fast, truly random access to the data. As a result, the NAMs can significantly reduce the time required to calculate parity information as well as the time taken to reconstruct checkpoint data. In addition, exploiting locality when selecting NAMs as targets reduces the load on the network.
Read here for more information on the use case.
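The text above does not spell out how parity is computed on the NAM, so the following C sketch shows only the generic XOR parity scheme that this kind of checkpointing relies on. It runs on ordinary host memory, and all function names are hypothetical. Parity is the byte-wise XOR of equally sized checkpoint blocks, and a single lost block is recovered by XOR-ing the parity with the surviving blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* XOR `len` bytes of `src` into `dst` (dst[i] ^= src[i]). */
void xor_into(unsigned char *dst, const unsigned char *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] ^= src[i];
}

/* Compute the parity of `n` checkpoint blocks of `len` bytes each. */
void xor_parity(unsigned char *parity,
                const unsigned char *const *blocks, size_t n, size_t len)
{
    memset(parity, 0, len);
    for (size_t i = 0; i < n; i++)
        xor_into(parity, blocks[i], len);
}

/* Rebuild block `lost` from the parity and the surviving blocks.
 * blocks[lost] is never read, so it may point to invalid data. */
void xor_reconstruct(unsigned char *out, const unsigned char *parity,
                     const unsigned char *const *blocks,
                     size_t n, size_t lost, size_t len)
{
    assert(lost < n);
    memcpy(out, parity, len);
    for (size_t i = 0; i < n; i++) {
        if (i != lost)
            xor_into(out, blocks[i], len);
    }
}
```

Because XOR is associative and commutative, partial parities can be accumulated in any order. This makes the operation a good fit for a memory-side device that receives blocks from several nodes.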