FaST: Find a Suitable Topology for Exascale Applications
"Find a Suitable Topology for Exascale Applications" (FaST) is a research project funded by the German Ministry of Education and Research. It deals with the temporal and spatial placement of processes on high performance computers of the future. It is widely assumed that the current trend in hardware development will continue and that the CPU performance will therefore grow considerably faster than the I/O performance. In order to prevent that these resources become bottlenecks in the system, FaST develops a new scheduling concept which monitors the system resources and locally adapts the distribution of the jobs. For monitoring the system a new agent-based system will be developed. The adaptations to the schedule will be realized by process migration. The effectiveness of the concept will be demonstrated in a prototype implementation using applications like LAMA and mpiBLAST.
In this project, the Institute for Automation of Complex Power Systems (ACS) examines existing and new techniques to migrate high-performance applications within a compute cluster. Migration is an important feature in modern compute centers as it allows for more efficient use and maintenance of the hardware. In principle, there are three types of migration:
- Process-level migration,
- Virtual machine migration, and
- Container-based migration.
In the context of this project, we analyze their qualitative and quantitative properties and identify virtualization as the solution most suitable for high-performance computing. Afterwards, we optimize these techniques to reach the desired objectives of the FaST project (e.g., efficiency and transparency to existing approaches).
Latest publications
- Simon Pickartz, Carsten Clauss, Stefan Lankes, Stephan Krempel, Thomas Moschny, and Antonello Monti: Non-Intrusive Migration of MPI Processes in OS-bypass Networks, 1st Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware (IPDRM 2016) held in conjunction with IEEE International Parallel and Distributed Processing Symposium (IPDPS 2016), Chicago, USA, 2016
Abstract: Load balancing, maintenance, and energy efficiency are key challenges for upcoming supercomputers. An indispensable tool for the accomplishment of these tasks is the ability to migrate applications during runtime. Especially in HPC, where any performance hit is frowned upon, such migration mechanisms have to come with minimal overhead. This constraint is usually not met by the current practice of adding further abstraction layers to the software stack.
In this paper, we propose a concept for the migration of MPI processes communicating over OS-bypass networks such as InfiniBand. While being transparent to the application, our solution minimizes the runtime overhead by introducing a protocol for the shutdown of individual connections prior to the migration. It is implemented on the basis of an MPI library and evaluated using virtual machines based on KVM.
Our evaluation reveals that the runtime overhead is negligibly small. The migration time itself is mainly determined by the particular migration mechanism, whereas the additional execution time of the presented protocol converges to 2 ms per connection if more than a few dozen connections are shut down at a time.
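The core idea of the shutdown protocol described in the abstract — quiescing each connection with an explicit handshake before the process is checkpointed and migrated — can be illustrated with a minimal sketch. The message tags (`FIN_REQ`, `FIN_ACK`) and function names below are hypothetical and only stand in for the actual MPI-library-internal protocol over InfiniBand; here an ordinary socket pair plays the role of a connection.

```python
import socket
import threading

# Hypothetical message tags for the shutdown handshake (illustrative only;
# the real protocol lives inside the MPI library and runs over InfiniBand).
FIN_REQ = b"FIN_REQ"
FIN_ACK = b"FIN_ACK"

def migrating_peer(sock):
    """Peer about to migrate: request an orderly shutdown of the connection."""
    sock.sendall(FIN_REQ)
    reply = sock.recv(len(FIN_ACK))
    assert reply == FIN_ACK, "peer did not acknowledge shutdown"
    sock.close()  # connection quiesced; safe to checkpoint and migrate

def remote_peer(sock):
    """Remote peer: acknowledge the request and stop using the connection."""
    request = sock.recv(len(FIN_REQ))
    assert request == FIN_REQ, "unexpected message before shutdown"
    sock.sendall(FIN_ACK)
    sock.close()

def run_handshake():
    """Run both sides of the handshake over a local socket pair."""
    a, b = socket.socketpair()
    t = threading.Thread(target=remote_peer, args=(b,))
    t.start()
    migrating_peer(a)
    t.join()
    return True

if __name__ == "__main__":
    run_handshake()
    print("connection shut down cleanly")
```

The point of the handshake is that both endpoints agree the connection carries no in-flight traffic before it is torn down, so the migrated process can re-establish it at the destination without the application noticing.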
- Simon Pickartz, Carsten Clauss, Jens Breitbart, Stefan Lankes, and Antonello Monti: Application Migration in HPC—A Driver of the Exascale Era?, 14th International Conference on High Performance Computing & Simulation (HPCS 2016), Innsbruck, Austria, 2016
Abstract: Application migration is valuable for modern computing centers. Apart from facilitating the maintenance process, it enables dynamic load balancing to improve the system’s efficiency. Although the concept is already widespread in cloud computing environments, it has not found wide adoption in HPC yet.
As major challenges of future exascale systems are resiliency, concurrency, and locality, we expect migration of applications to be one means to cope with these challenges. In this paper, we investigate its viability for HPC by deriving the respective requirements for this specific field of application. In doing so, we sketch example scenarios demonstrating its potential benefits. Furthermore, we discuss challenges that result from the migration of OS-bypass networks and present a prototype migration mechanism enabling the seamless migration of MPI processes in HPC systems.