In this behind-the-scenes interview, project manager Dr. Estela Suarez talks about the reasons for advancing the DEEP project, what the consortium focuses on now, and why Europe is the right place to carry out such research.
We accomplished a great deal in our first Exascale project, DEEP: we came up with an entirely new and highly innovative hardware architecture, and we addressed extremely pressing challenges on the way to Exascale, such as scalability, programmability and energy efficiency.
Still, we found that the concept could be enhanced further and that we definitely needed to address two more critical aspects in order to have a system adequately armed for the enormous challenges at the Exascale: highly scalable, efficient and user-friendly parallel I/O, and resiliency. The scope, timeframe and budget of the DEEP project simply could not accommodate these additional topics. Hence, applying for a follow-up project was the logical consequence. We are very glad we could convince the European Commission to provide additional funding and further support on our way to an Exascale machine.
Focusing on I/O and resiliency means you are more of a software-oriented project now?
That is not quite true. We still follow the co-design approach we adopted in the DEEP project. After all, we strongly believe that this is the only way to achieve Exascale performance at the application level.
Actually, highly scalable I/O and resiliency are good examples of why we have to do co-design across the entire HPC stack. These are not solely software challenges. If we want to address them comprehensively, we also have to take into account the underlying hardware and the applications that run on top.
In terms of applications, we already consider the concrete requirements Exascale codes will have with respect to I/O and resiliency. We have chosen seven exemplary applications from various scientific fields to gain a good understanding of these requirements.
You were also talking about the hardware. What kind of improvements to the original hardware concept in the predecessor project DEEP are you thinking about?
There are a couple of aspects in which the new architecture differs from the old one: for instance, we use new memory technologies and a uniform interconnect.
Regarding the memory, we are investigating very new and innovative technologies: we are planning to deploy NVM cards whose performance and endurance will far exceed today's NAND Flash memory. In addition, we are looking into network attached memory (NAM) that will provide high-speed shared storage capabilities. Both of these technologies will play an important role for I/O and checkpoint/restart. A welcome and at the same time highly important side effect is that such a design will also help reduce power consumption further.
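To illustrate the checkpoint/restart pattern mentioned above, here is a minimal sketch in Python of an application periodically saving its state to a fast node-local store and recovering the most recent complete checkpoint after a failure. The directory path, file naming and serialization are illustrative assumptions for this sketch, not the DEEP-ER software interface; a real NVM device would be exposed through the system's I/O stack.

```python
import os
import pickle
import tempfile

# Stand-in for a node-local NVM mount point (assumption for this sketch).
NVM_DIR = tempfile.mkdtemp(prefix="nvm_")

def checkpoint(state, step, nvm_dir=NVM_DIR):
    """Write the application state for `step` atomically to fast local storage."""
    tmp_path = os.path.join(nvm_dir, f"ckpt_{step:06d}.tmp")
    final_path = os.path.join(nvm_dir, f"ckpt_{step:06d}.pkl")
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, final_path)  # atomic rename: no torn checkpoints
    return final_path

def restart(nvm_dir=NVM_DIR):
    """Recover the most recent complete checkpoint, or (None, -1) if none exists."""
    ckpts = sorted(
        f for f in os.listdir(nvm_dir)
        if f.startswith("ckpt_") and f.endswith(".pkl")
    )
    if not ckpts:
        return None, -1
    latest = ckpts[-1]
    step = int(latest[len("ckpt_"):-len(".pkl")])
    with open(os.path.join(nvm_dir, latest), "rb") as f:
        return pickle.load(f), step
```

The atomic-rename step matters for resiliency: if a node fails mid-write, only the temporary file is corrupted and the last complete checkpoint remains valid for restart.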
Last but not least, we will reduce system complexity by introducing a unified network for the Cluster and Booster parts, which will also allow for a tighter coupling of the two and reduce the penalty of communication between them.
Taken together, these hardware improvements will lead to a more flexible architecture that enables easy upgrades and can accommodate entirely new technologies.
Next to DEEP and DEEP-ER there are four more Exascale initiatives supported by the European Union, with even more to come in the course of Horizon 2020. Why is Europe the hub for such R&D initiatives?
The EU might not exactly be the cradle of the IT industry. But first of all, we can rely on excellent HPC experts in our project as well as on innovative European IT providers. Secondly, it is the users who really make the difference in our research endeavour. We are not developing an Exascale-ready supercomputer because we want to be the first ones to have such a machine. We do it because we want to enable scientists to excel in their disciplines with the help of HPC and, in doing so, contribute to tackling societal challenges. In the project we have, for example, applications that simulate brain activity or predict earthquakes. This requires that IT experts and scientists work together extremely closely, and Europe provides excellent infrastructures to do so. Europe is also renowned for its fundamental research in areas such as medical engineering, environmental studies and manufacturing techniques, something this initiative badly needs as well. Last but not least, energy efficiency is an important challenge at Exascale, and European researchers have a lot of experience and know-how in this field. This is also something backed by politics and by society, so support for this part of our research is much higher in Europe than it might be in Asia or the USA.
Thank you for the interview, Estela!
Dr. Estela Suarez is the Project Manager of both DEEP & DEEP-ER and is based at Jülich Supercomputing Centre, Germany. She holds a PhD in Physics and has worked on high-energy astrophysics projects at the University of Geneva, Switzerland. Early in her studies, Estela gained her first programming experience, something that accompanied her throughout her academic career. From there, it was only a small step to pursuing a career in high performance computing.