## Parallel Processing '08: An Increasing Role for Combinatorial Methods in Large-Scale Parallel Simulations

**June 11, 2008**

**Michael Wolf and Erik Boman**

Combinatorial scientific computing (CSC) plays a vital enabling role in computational science and engineering. Many scientific computing problems require tasks that are inherently combinatorial; others have more subtle underlying discrete structures that complement the analytic structures of the problems. As problems in computational science continue to grow larger, and their solutions more complex, exploitation of these combinatorial structures will be increasingly important.

In recent years researchers in CSC have come together in a cohesive community. The first SIAM Workshop on Combinatorial Scientific Computing (2004) was followed by two international workshops, and minisymposia dedicated to CSC are not uncommon at SIAM and other international conferences. One such minisymposium was part of the program of the 2008 SIAM Conference on Parallel Processing for Scientific Computing. Rob Bisseling (Utrecht University) and Fredrik Manne (University of Bergen) were the organizers of the well-attended session, at which few seats were left unfilled. The eight speakers discussed various topics in CSC and parallel computing, including parallel combinatorial algorithms as well as combinatorial problems arising in support of parallelism.

For several decades, combinatorial algorithms have played an important role in algorithms for parallel scientific computing. They have been integral to the development of parallel direct and iterative methods for solving sparse linear systems, for example, as well as to parallel *N*-body simulations. As applications of computational science grow in size and complexity, and computers have larger numbers of processors/cores, these combinatorial algorithms for efficient parallelism will be increasingly important. Creation of the SciDAC Combinatorial Scientific Computing and Petascale Simulations (CSCAPES) institute can be seen as an indication of the enabling role that CSC will play in petascale simulations. The focus of CSCAPES research is on four areas of CSC of great importance for large-scale parallel scientific simulations: partitioning and load-balancing, automatic differentiation, graph algorithms (in particular, graph coloring and matching), and performance improvement. The hope is that with improvements in these underlying combinatorial methods, the parallel efficiency of petascale simulations will be greatly increased.

In this article we describe two active areas of CSC research affecting parallel computing that were discussed in the minisymposium, and give a broad overview of several additional areas of interest.

**Partitioning/Load-balancing**Large scientific simulations currently run on thousands of processors, and petascale simulations will most likely run on hundreds of thousands or millions of cores. An essential combinatorial task is to partition the data and computation over these processors/cores in order to minimize the total run-time. If this is to be done successfully, the amount of work should be balanced across the processes and the amount of communication required between the processes should be minimized---two objectives that are often at odds with each other.

Most partitioning methods are either geometry-based or connectivity-based. Geometry-based methods tend to be faster, but for many problems result in high communication volumes. Connectivity-based methods are relatively slow, but yield relatively low communication volumes. Connectivity-based methods use graphs or hypergraphs to model the communication volume so that it can be minimized. In his minisymposium talk at the conference, "Efficient and Scalable Parallel Graph Partitioning Methods," Jun-Ho Her (INRIA Futurs–LaBRI) presented work on the Scotch code. In particular, he described scalable diffusive methods for the local optimization used in the refinement step of multilevel partitioning. While Scotch has long supported serial graph partitioning and, more recently, PT-Scotch has offered parallel matrix ordering capability, efforts are now under way to generalize this to parallel graph partitioning as well. The code will become available this summer.

Sparse matrix operations are ubiquitous in large-scale scientific simulations. In sparse matrix partitioning, the nonzeros (and often corresponding vector elements) are partitioned and assigned to processes; high-quality partitioning techniques are increasingly important with the increased communication volumes resulting from sparse matrix operations. Partitioning techniques more complicated than conventionally necessary are needed to mitigate the growing communication volumes.

Traditionally, sparse matrices have been partitioned in one dimension; for more irregular or non-mesh-based matrices, however, this may be insufficient and two-dimensional partitioning may be necessary. Rob Bisseling opened the minisymposium with a discussion of the use of two-dimensional matrix distributions to reduce communication volumes, including examples for his two-dimensional Mondriaan method.

Rob Hoekstra (Sandia National Laboratories) presented results of recent electrical circuit simulations, in which much of the progress can be attributed to the use of CSC methods and software. Hoekstra showed that a new preconditioner based on combinatorial algorithms, including hypergraph partitioning for reduction to block triangular form (see Figure 1), can reduce the solution time by an order of magnitude or more.

*Figure 1. Circuit simulation matrix. Left, original; right, matrix modified by reordering and various other combinatorial techniques. Use of the modified matrix reduces the execution time by a factor of 1000. Courtesy of Rob Hoekstra.*

**Parallel Graph Coloring**

Parallel graph coloring is another area of CSC that is important to large-scale scientific simulations; in particular, graphs can be used to model the dependency of computational subtasks. In graph coloring, the vertices in a graph are assigned positive integers (colors) in such a way that adjacent vertices have different colors. The graph coloring is used to determine the computational subtasks that can be executed in parallel. Subtasks represented by different colors must be executed in different computational steps. Thus, when the number of colors is minimized, the number of sequential computational steps is minimized as well, and run-time is reduced. Graph coloring can be used in this manner in several applications, including the iterative solution of sparse linear systems, preconditioning, and automatic differentiation. Several variations on the vertex-coloring problem arise in the computation of sparse Jacobian and Hessian matrices.

Standard graph coloring is already a challenging problem, however. Not only finding an optimal solution, but even finding a good approximate solution, is NP-hard. Fortunately, serial greedy linear-time heuristics have been developed that, in practice, yield solutions of reasonable quality. Parallel coloring algorithms are needed for large-scale simulations because the computational de-pendency graph can be large and is often built in parallel and, thus, already distributed across different processes. In general, the sequential linear-time coloring heuristics are not easily extended to parallel methods; finding ways to extend them is an active area of research.

In his minisymposium talk, Assefaw Gebremedhin (Old Dominion University) presented a framework for parallel graph coloring. The main idea is to increase parallelism by having each process speculatively color its local vertices, after which conflicts between processes are detected and then resolved. This strategy is potentially useful for other problems as well.

**Diversity of Parallel CSC Research**

The short descriptions included here represent only a small selection of active areas of CSC research in parallel computing. Together, the speakers in the PP08 minisymposium covered a wide range of combinatorial problems in parallel scientific computing. Patrick Amestoy (ENSEEIHT) discussed improvement of the solution phase of a parallel sparse multifrontal solver in a limited memory environment. Specifically, he described streamlined scheduling for out-of-core methods with a new NNS (next-node-in-the sequence) strategy for processing the assembly tree that ensures more regular access to disk.

Michael Luelfesmann (RWTH Aachen University) described a CSC approach for improving the iterative solution of large sparse systems in which the coefficient matrices are Jacobian. With a bipartite graph model for the parallel preconditioning of such a system, he avoided full assembly of the Jacobian, using only a narrow band around the diagonal. Masha Sosonkina (Ames Laboratory) discussed parallel Schur-complement preconditioning of difficult unstructured linear systems. She used combinatorial techniques to identify nonsymmetric permutations for use in algebraic, recursive, multilevel preconditioners suitable for parallel computing. Bora Uçar (CERFACS) presented an iterative method for finding a particular class of bipartite matching in parallel. His goal is to use this result to derive permutations and scalings that will stabilize sparse, nonsymmetric direct solvers.

It seems fair to conclude that with an abundance of interesting problems, good attendance, and lively discussion in the PP08 minisymposium, parallel combinatorial scientific computing is an active field of research poised to play an important role in parallel scientific computing and the petascale simulations to come.

*Michael Wolf is a graduate student in computer science at the University of Illinois at Urbana–Champaign, where he works under the supervision of Michael Heath. Erik Boman is a scientist in the Scalable Algorithms Department at Sandia National Laboratories in Albuquerque, New Mexico.*