ZeptoOS, BG/P, and Petascale Machines
In the beginning, supercomputers ran a simple operating system. In some cases, customers wrote their own OS for the raw hardware. As MPPs and clusters emerged, they began to run thousands of instances of an operating system. Today, supercomputers run several types of operating systems on different functional units: storage nodes, compute nodes, login nodes, service nodes, and so on. Most of these are based on Linux. As machines scale up, however, simple design choices that worked well for servers or for systems of several thousand nodes suddenly become bottlenecks. Can Linux be used on petascale BlueGene/P systems? Are preemptive multitasking operating systems doomed to fail on petascale machines? Which influences performance more: the level of noise or its distribution? We investigate noise by measuring existing operating systems and by injecting artificially generated noise into a massively parallel system to measure its influence on the performance of collective operations. Is paged memory a blessing or a curse? We compare the performance of the memory subsystem on a lightweight kernel and on Linux. We also present the design and implementation of ZOID, an I/O forwarding system for petascale architectures, and evaluate its performance against the stock infrastructure. What are the fundamental constraints, and what does the real data suggest? This talk will address these questions and present data gathered from experiments on BG/L and BG/P systems.
Pete Beckman, Argonne National Laboratory