Parallel Processing '08: Can Simulations Ever Scale to Millions of Processors?
June 11, 2008
In a plenary talk titled "At the Limits of Scalability?" Thomas Lippert, head of the Jülich Supercomputing Centre, addressed questions of increasing scale, not only of the computational problems solved on supercomputers but also of the high-performance computing community itself.
The Jülich Supercomputing Centre is part of the Research Centre Jülich (Forschungszentrum Jülich), the largest unclassified research centre in Europe and a member of the Helmholtz Association. The Jülich supercomputers fall into two classes: general-purpose machines that serve the needs of the broader scientific community, and special-purpose machines that are highly scalable and provide cutting-edge floating-point performance. Lippert compared machines of the first type to a Maybach limousine and those of the second type to a Ferrari Formula One racing car. Each is a "supercar" in its own right; neither is unequivocally superior to the other, but each has its own capabilities and its own demanding user base.
On the national scale, Jülich is one of three German centres for supercomputing, which together make up the Gauss Centre for Supercomputing (http://www.gauss-centre.de/). The Gauss Centre is one of the principal partners of PRACE (Partnership for Advanced Computing in Europe).
Beginning in 2009, PRACE will facilitate Europe-wide collaboration, providing tools for consistent management, procurement, and resource allocation across the European high-performance computing and scientific communities. Jülich is involved in collaborations with domain scientists in nanoscience, plasma physics, biology, and earth science, fostering an environment that connects the domain scientists with algorithm developers and system specialists who can assist with performance tuning.
The general-purpose machine now in place at the Jülich Supercomputing Centre is an IBM cluster called JUMP, which is used by about 150 scientific projects. The centre currently has two special-purpose machines: a Blue Gene/P, called Jugene, which debuted on the Top500 in the number 2 spot in November 2007, and a Blue Gene/L, called JUBL, which was number 8 on the Top500 list when it first appeared in June 2006. These machines are used for about 25 projects each, with some overlap.
The Jülich Supercomputing Centre hosted a week-long "Big Blue Gene Scaling Week" workshop in May 2006 for the express purpose of helping researchers scale up existing codes to take advantage of a system with 65,536 processors. Results from this workshop include molecular dynamics simulations of 15 million particles using the code DL-POLY3. A simulation of laser-accelerated protons (of which Lippert showed a video) was scaled to run on 8000 processors on Jugene; prior to the workshop it had been scaled only to 2000 processors. Another achievement was a nanoscale ab initio atomistic dynamical mean-field theory code scaled to run on 60,000 processors.
In a collaboration with medical scientists at Aachen, computational researchers affiliated with Jülich are modelling a blood pump using the incompressible Navier–Stokes equations. Thanks to the Big Blue Gene Scaling Week workshop, the performance of this code improved by a factor of 4–5.
As a final example of the scaling achievements on Jugene, Lippert discussed the performance of a lattice QCD code. The researchers have drastically reduced the memory footprint of the code, achieving 35% of peak performance in single precision and 25% of peak performance in double precision.
As mentioned many times during the conference, energy demands are a concern as systems continue to scale up. Lippert presented a graph displaying estimated power consumption for notional petaflop/s machines based on current microprocessors. The most optimistic estimate was for a system built from Cell processors, which would run at 1 megawatt; others would consume 9 megawatts.
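To put the two power figures on a common footing, they can be converted to energy efficiency in Gflop/s per watt. The short Python sketch below is illustrative only: the function and variable names are hypothetical, and it assumes each notional machine delivers exactly 1 petaflop/s at the quoted power draw, which the talk did not state in these terms.

```python
# Illustrative arithmetic only; names are hypothetical, and each machine
# is assumed to deliver exactly 1 petaflop/s at the quoted power draw.
PETAFLOP = 1e15  # floating-point operations per second


def gflops_per_watt(flops: float, megawatts: float) -> float:
    """Return energy efficiency in Gflop/s per watt."""
    watts = megawatts * 1e6
    return flops / watts / 1e9


cell_estimate = gflops_per_watt(PETAFLOP, 1.0)   # Cell-based system at 1 MW
other_estimate = gflops_per_watt(PETAFLOP, 9.0)  # other designs at 9 MW

print(f"Cell-based: {cell_estimate:.2f} Gflop/s per watt")
print(f"Others:     {other_estimate:.2f} Gflop/s per watt")
```

Under these assumptions the Cell-based estimate works out to about 1 Gflop/s per watt and the 9-megawatt designs to roughly 0.11 Gflop/s per watt, a ninefold gap in efficiency.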
This led Lippert to his concluding question: "Can simulation ever scale to millions of processors?"
Philip Sternberg is a postdoctoral researcher at Lawrence Berkeley National Laboratory.