Adapting CoreNeuron – an advanced brain simulation application – to run efficiently on the DEEP architecture.
Optimising CoreNeuron for manycore architectures
Brain simulation is making giant leaps towards a better understanding of the brain's inner workings. In DEEP, EPFL has been adapting CoreNeuron to run efficiently on the platform. On manycore architectures, efficient threading and vectorisation are no longer optional. Taking advantage of them required two changes:
- An elaborate load-balancing strategy at the thread level that takes into account the varying cost of simulating different kinds of neurons.
- Data layout changes and refactoring of the compute loops to enable vectorisation.
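The thread-level load balancing above can be sketched as a classic longest-processing-time-first assignment: sort the cells by an estimated simulation cost and always hand the next most expensive one to the least loaded thread. This is a minimal illustration under assumed per-neuron cost estimates, not CoreNeuron's actual implementation:

```cpp
#include <algorithm>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Sketch of thread-level load balancing via longest-processing-time-first
// (LPT) assignment. 'cost' holds hypothetical per-neuron cost estimates.
std::vector<std::vector<int>> balance(const std::vector<double>& cost,
                                      int nthreads) {
    // Sort neuron indices by descending cost.
    std::vector<int> order(cost.size());
    for (int i = 0; i < static_cast<int>(order.size()); ++i) order[i] = i;
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return cost[a] > cost[b]; });

    // Min-heap of (accumulated cost, thread id): always give the next
    // most expensive neuron to the currently least loaded thread.
    using Load = std::pair<double, int>;
    std::priority_queue<Load, std::vector<Load>, std::greater<Load>> heap;
    for (int t = 0; t < nthreads; ++t) heap.push({0.0, t});

    std::vector<std::vector<int>> assignment(nthreads);
    for (int n : order) {
        auto [load, t] = heap.top();
        heap.pop();
        assignment[t].push_back(n);
        heap.push({load + cost[n], t});
    }
    return assignment;
}
```

With 240+ threads per node, even small per-neuron cost differences add up, which is why a greedy heuristic like this beats a naive round-robin split.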
The result: The effects of the refactoring are clearly noticeable: the simulation now achieves a high level of parallel efficiency and can exploit more than 240 threads. The memory layout transformation and vectorisation optimisations improve performance even in memory-bound kernels.
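The memory layout transformation can be illustrated with a minimal sketch: moving from an array of structures (AoS) to a structure of arrays (SoA) gives the compute loops unit-stride memory access that the compiler can vectorise. The type and field names here are illustrative, not CoreNeuron's actual data structures:

```cpp
#include <cstddef>
#include <vector>

// Array-of-structures (AoS): the fields of one compartment sit together,
// so a loop over a single field strides through memory and vectorises poorly.
struct CompartmentAoS {
    double voltage;
    double capacitance;
    double current;
};

// Structure-of-arrays (SoA): each field is one contiguous array, so the
// update loop below has unit-stride loads/stores. Names are illustrative.
struct CompartmentsSoA {
    std::vector<double> voltage;
    std::vector<double> capacitance;
    std::vector<double> current;
};

// Simple explicit voltage update written so the compiler can auto-vectorise
// the inner loop (distinct vectors, no aliasing between fields).
void update(CompartmentsSoA& c, double dt) {
    const std::size_t n = c.voltage.size();
    for (std::size_t i = 0; i < n; ++i) {
        c.voltage[i] += dt * c.current[i] / c.capacitance[i];
    }
}
```

Because kernels like this are often memory bound, the contiguous SoA access pattern helps even when the arithmetic itself is not the bottleneck.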
Going even further with DEEP
Being a highly scalable application, CoreNeuron already runs on some of the most powerful supercomputers on the planet. On the DEEP system, it can exploit very high levels of parallelism efficiently and explore future heterogeneous system designs. The key to success:
- Decoupling the I/O from the computation
The result: A speedup of more than one order of magnitude with respect to performing I/O directly from the Xeon Phis in preliminary tests on standard platforms. The OmpSs offload design supports direct (Cluster to Booster) and reverse (Booster to Cluster) offloading in the same way, which makes the reverse-offload scheme used in CoreNeuron particularly easy to implement.
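The idea of decoupling I/O from computation can be sketched with a simple producer-consumer pattern: compute threads hand off report buffers to a queue and continue immediately, while a dedicated writer thread drains the queue to storage. This sketch uses plain C++ threads to show the principle only; CoreNeuron realises the decoupling through OmpSs offload, not this class:

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal I/O-decoupling sketch: submit() never blocks on the actual write.
// The writer thread drains remaining buffers before the destructor returns.
class AsyncWriter {
public:
    explicit AsyncWriter(std::size_t& values_written)
        : values_written_(values_written), done_(false),
          writer_([this] { drain(); }) {}

    ~AsyncWriter() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_one();
        writer_.join();
    }

    // Called from the compute side: hand off a buffer without doing I/O.
    void submit(std::vector<double> buffer) {
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.push(std::move(buffer));
        }
        cv_.notify_one();
    }

private:
    void drain() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
            while (!queue_.empty()) {
                std::vector<double> buf = std::move(queue_.front());
                queue_.pop();
                lock.unlock();
                write_to_disk(buf);  // stand-in for the real I/O call
                lock.lock();
            }
            if (done_) return;
        }
    }

    // Placeholder for real file I/O: just count the values "written".
    void write_to_disk(const std::vector<double>& buf) {
        values_written_ += buf.size();
    }

    std::size_t& values_written_;
    std::queue<std::vector<double>> queue_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_;
    std::thread writer_;
};
```

The same shape applies on the DEEP system: the Booster-side compute never stalls on storage, because the slow I/O path runs concurrently on the Cluster side.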