Dongarra Is Elected to NAE
May 9, 2001
Describing an ongoing study of the top 500 computers in the world---a study that amounts to taking a "snapshot" every six months, in November and June, of computing everywhere in the world---Jack Dongarra pointed out that a computation that would have taken one year to complete in 1980 took nine hours in 1992 and can be completed today in 90 seconds (on the IBM ASCI White at Lawrence Livermore National Laboratory). Just a few years ago, he said, the laptop he uses for e-mail would have been on the top 500 list!
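The size of those jumps is easy to lose in the units. The short sketch below converts the three running times quoted above to seconds and prints the implied speedup factors. It assumes a 365-day year and takes "today" to be 2001, the year of this article; the function and variable names are illustrative only.

```python
# Running times quoted in the article, converted to seconds.
SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 365 * 24 * SECONDS_PER_HOUR  # assumption: a non-leap year

runtime_seconds = {
    1980: SECONDS_PER_YEAR,      # "one year"
    1992: 9 * SECONDS_PER_HOUR,  # "nine hours"
    2001: 90,                    # "90 seconds"; assumption: "today" means 2001
}

def speedup(from_year, to_year):
    """Return how many times faster the computation ran in to_year."""
    return runtime_seconds[from_year] / runtime_seconds[to_year]

for a, b in [(1980, 1992), (1992, 2001), (1980, 2001)]:
    print(f"{a} -> {b}: about {speedup(a, b):,.0f}x faster")
```

Under these assumptions the factors come to roughly 970, 360, and 350,000, respectively.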
A frequent speaker at SIAM meetings, Dongarra recently gave a talk at a conference held in Berkeley in memory of Fred Howes (see "Remembering a Peerless Program Manager, Celebrating a Vital Applied Math Sciences Program"). In the Berkeley talk, "The Impact of Computer Architecture on Linear Algebra Algorithms," he gave the audience a good idea of what goes on in an innovative computing lab, beginning with a brief look at current and future high-performance computing environments.
With today's teraflops machines, he said, computing is highly parallel, with distributed processors, often in network-based systems; users are faced with computation/communication tradeoffs. Looking ahead ten years, to an era of petaflops machines, he sees many more levels in memory hierarchies; there will be shared memory nodes, and the use of latency tolerance in algorithms will be critical. At that point, he said, algorithms will have to be more adaptive to the computer architecture.
As to the current mismatch between processor performance and memory speed, with performance doubling every 18 months in line with Moore's law, he pointed to the memory-speed deficiencies that limit many numerical operations. Architects are well aware that things are way out of line, he said, which is why they're providing memory hierarchies, with up to three levels of cache memory.
In this setting, he explained, users need to structure their computations so that data are reused. SANS---self-adapting numerical software---can provide good performance automatically, "but it's a complex balancing act." Among the challenges are understanding and taking advantage of algorithm block sizing, compiler optimization, and hardware-specific parameters. The concepts are well understood, he said, but getting the details right continues to challenge users.
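A small sketch may help make "structuring a computation so that data are reused" concrete. The blocked (tiled) matrix multiply below works on one small tile at a time, the standard way of keeping operands in cache in a compiled language. It is an illustration, not code from the talk: the function names and the default block size are assumptions, and in pure Python the loop structure is the point, not the speed.

```python
def matmul_naive(A, B):
    """Textbook triple loop: C = A * B for lists of lists."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_blocked(A, B, block=2):
    """Same product, computed tile by tile.

    Each (ii, kk, jj) triple selects block-by-block tiles of A, B, and C.
    In a compiled language, tiles small enough to fit in cache are reused
    many times before being evicted. `block` is a hypothetical tuning knob.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, m)):
                        a = A[i][k]  # loaded once, reused across the j loop
                        for j in range(jj, min(jj + block, p)):
                            C[i][j] += a * B[k][j]
    return C

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
print(matmul_blocked(A, B, block=2) == matmul_naive(A, B))
```

The best block size depends on the cache sizes and the compiler, which is the kind of hardware-specific parameter the paragraph above refers to.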
ATLAS (Automatically Tuned Linear Algebra Software), a system developed by Dongarra and colleagues at the University of Tennessee (see SIAM News, November 1999, page 1), is one example of a system that generates a program based on the specifics of the hardware platform and compiler. For, say, a matrix multiply, ATLAS probes the system, performing thousands of tests, and then generates the optimal software. Today, ATLAS is in use in Matlab, Maple, Octave, and other numerical packages. Related tuning projects include FFTW (the Fastest Fourier Transform in the West, also described in the SIAM News article) and several efforts on sparse matrix operations.
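The idea of probing the machine instead of predicting its behavior can be shown with a toy empirical search. The sketch below times one kernel (a blocked matrix transpose, chosen only for brevity) at several candidate block sizes and keeps the fastest. It is not how ATLAS is implemented: ATLAS generates code and searches a far larger parameter space, and every name here (`transpose_blocked`, `autotune`, the candidate list) is hypothetical.

```python
import time

def transpose_blocked(M, block):
    """Transpose a square list-of-lists matrix one block-by-block tile at a time."""
    n = len(M)
    T = [[0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            for i in range(ii, min(ii + block, n)):
                row = M[i]
                for j in range(jj, min(jj + block, n)):
                    T[j][i] = row[j]
    return T

def autotune(kernel, candidates, *args, repeats=3):
    """Time kernel(*args, c) for each candidate c; return (fastest c, all timings).

    The best of `repeats` runs is kept for each candidate to reduce timing noise.
    """
    timings = {}
    for c in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(*args, c)
            best = min(best, time.perf_counter() - start)
        timings[c] = best
    return min(timings, key=timings.get), timings

n = 64
M = [[i * n + j for j in range(n)] for i in range(n)]
candidates = [4, 8, 16, 32]
best_block, timings = autotune(transpose_blocked, candidates, M)
print("fastest block size on this machine:", best_block)
```

The winning block size can change from one machine, or one run, to the next, which is why the search is carried out on the target system itself.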
By using the SANS concepts to re-arrange a calculation, e.g., a singular value decomposition, a user can enhance performance by up to 30%; with conjugate gradients, researchers have developed several algorithms that coalesce inner products to collect them at one synchronization point in the algorithm, thereby reducing computation times by 10-15%. SuperLU, a high-performance sparse solver being developed by James Demmel and colleagues at Lawrence Berkeley National Laboratory, has provided impressive results on a hydrogen atom scattering problem, as reported in Science magazine in December 1999 (and featured on the issue's cover). The strength of the algorithm, Dongarra pointed out, lies in its ability to find small dense blocks.
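The article does not say which conjugate gradient variants were meant. One well-known rearrangement of this kind is the Chronopoulos-Gear form, sketched below as an assumption. Standard CG needs two separate inner products per iteration, each a global reduction on a parallel machine. Here both inner products, (r, r) and (Ar, r), are computed in the same pass, and the step length is recovered from a scalar recurrence. The code is serial and unpreconditioned, and the helper names are illustrative.

```python
def matvec(A, v):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def fused_dots(w, r):
    """Return (r, r) and (w, r) from a single pass over the vectors.

    On a parallel machine the two partial sums would travel in one
    reduction, i.e. one synchronization point per iteration.
    """
    gamma = delta = 0.0
    for wi, ri in zip(w, r):
        gamma += ri * ri
        delta += wi * ri
    return gamma, delta

def cg_single_sync(A, b, tol=1e-12, max_iter=100):
    """Chronopoulos-Gear style CG for a symmetric positive definite A."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # residual b - A x for the start x = 0
    w = matvec(A, r)                 # w = A r
    gamma, delta = fused_dots(w, r)
    if gamma == 0.0:
        return x
    alpha, beta = gamma / delta, 0.0
    p = [0.0] * n                    # search direction
    s = [0.0] * n                    # s = A p, maintained by recurrence
    for _ in range(max_iter):
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        s = [wi + beta * si for wi, si in zip(w, s)]
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * si for ri, si in zip(r, s)]
        w = matvec(A, r)
        gamma_new, delta = fused_dots(w, r)   # the single sync point
        if gamma_new <= tol * tol:            # squared residual norm is tiny
            break
        beta = gamma_new / gamma
        alpha = gamma_new / (delta - beta * gamma_new / alpha)
        gamma = gamma_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = cg_single_sync(A, b)
print(x)
```

The update for alpha uses the identity (p, Ap) = (Ar, r) - beta * gamma / alpha_previous, where gamma = (r, r) is the new residual norm squared; that identity is what lets the variant drop the separate inner product with the search direction.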
Dongarra touched on several large-scale distributed-computing projects, beginning with SETI@home. Using idle cycles, the approximately 1.6 million participants in 224 countries search the huge database provided by the SETI group at UC Berkeley from data collected at the Arecibo Radio Telescope, in Puerto Rico; it is the largest such project in existence. Others include the Grid, which treats CPU cycles and software as commodities; NetSolve at the University of Tennessee, which provides users with software they don't have on their machines; and NEOS, an Argonne project that provides optimization software for remote execution of applications.
Among Dongarra's conclusions: Determinism in computation is gone---given the setting in which high-performance computing is done today, results of the same computation, done on successive days, will differ.
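The article does not spell out the mechanism behind this loss of determinism. One commonly cited contributor, assumed here and not attributed to the talk, is that floating-point addition is not associative: when a parallel reduction combines partial sums in a different order from one run to the next, the rounded result can change. A minimal sketch, with an illustrative helper name:

```python
def sum_in_order(values, order):
    """Add values[i] for i in `order`, left to right, in double precision."""
    total = 0.0
    for index in order:
        total += values[index]
    return total

values = [0.1, 0.2, 0.3]

# Two "runs" that differ only in the order in which terms are combined,
# as might happen when processors deliver partial sums in a different order.
run_a = sum_in_order(values, [0, 1, 2])  # (0.1 + 0.2) + 0.3
run_b = sum_in_order(values, [2, 1, 0])  # (0.3 + 0.2) + 0.1

print(run_a, run_b, run_a == run_b)
```

Both results are correctly rounded sums of the same three numbers, yet they differ in the last bit.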
Dongarra was a member of the SIAM Council from 1986 to 1991. In addition to his many SIAM talks, he is an author of many of SIAM's most successful books, including the LINPACK, LAPACK, and ScaLAPACK users' guides and Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. He was the founding chair of the SIAM Activity Group on Supercomputing and continues to be an active member.