CSE 2009: Python for Scientific Computing at CSE 2009
June 15, 2009
Figure 1. AIG’s recent stock performance, summarized by matplotlib.
Fernando Pérez, Hans Petter Langtangen, and Randy LeVeque
Python for Scientific Computing, a three-part minisymposium we organized for CSE 2009, featured a mix of speakers from universities, government research laboratories, and industry who explained why they chose the open-source Python programming language and showed how they use it in their research and teaching. The sessions, all held on March 5, were well attended, as was a similar three-part minisymposium at last summer's SIAM Annual Meeting.
A Flexible Tool
Over the last few years, Python has experienced tremendous growth, becoming the tool of choice for many who do high-level scientific computing. It offers an effective mix of interactive and exploratory development approaches, direct access to libraries for many different tasks, and interfaces with high-performance numerical libraries in Fortran, C, or C++. Along with a few free add-on packages, Python provides basic capabilities similar to those offered by computational environments like MATLAB or IDL: numerical arrays with syntactic support for arithmetic and mathematical operations, a comprehensive library of common algorithms (in linear algebra, FFT, numerical integration, optimization, and special functions, among other areas), interactive control of data visualization, publication-quality plotting, and modules for interfacing with codes written in numerous other programming languages. What attracts many scientists to the language is its combination of flexibility, expressive power, and development possibilities that, in our experience, is unmatched by commercial tools.
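As a rough illustration of the array syntax and library coverage described above, the following minimal sketch uses NumPy, one of the free add-on packages. The particular grid, matrix, and signal are arbitrary choices made for this example and are not drawn from any of the talks.

```python
import numpy as np

# Array arithmetic: integrate sin(x) on [0, pi] with a hand-written
# trapezoid rule, using whole-array operations instead of loops.
x = np.linspace(0.0, np.pi, 1001)
y = np.sin(x)
dx = x[1] - x[0]
integral = dx * (y[:-1] + y[1:]).sum() / 2.0  # exact value is 2

# Linear algebra: solve a small system A u = b and check the residual.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
u = np.linalg.solve(A, b)
residual = np.abs(A @ u - b).max()

# FFT: find the dominant frequency bin of a pure sine wave
# with 5 cycles over 64 samples.
n = np.arange(64)
signal = np.sin(2.0 * np.pi * 5.0 * n / 64.0)
peak_bin = int(np.argmax(np.abs(np.fft.rfft(signal))))

print(integral, residual, peak_bin)
```

Each step is a single expression on whole arrays, which is the style of work that users of MATLAB or IDL will recognize.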
Python was designed as a general-purpose language with an emphasis on a clear and readable syntax, high-level constructs that would not impede access to low-level resources, robust error handling, and portability. It ships with a comprehensive standard library that covers many common tasks, from text processing to network protocols or database access. Other free and open-source projects support most common computational needs, and provide bindings for use of a wide collection of Fortran, C, and C++ libraries. Today, Python is used extensively in industry, by companies like Google (the employer of Python's creator, Guido van Rossum), and in most U.S. federal research agencies, as made clear by the inclusion of speakers from several national laboratories and the National Institute of Standards and Technology in the minisymposia. Python requires no licensing fee and hence there are no license-manager hassles (important considerations for those running parallel codes on hundreds or thousands of processors or using cloud computing services). Highly portable, it can run on a cell phone or on the largest of supercomputers.
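To give a sense of what the standard library covers without any add-ons, here is a small sketch that combines text processing (the `re` module) with database access (the `sqlite3` module). The log format, table name, and column names are hypothetical, invented purely for this example.

```python
import re
import sqlite3

# Hypothetical output from three simulation runs (made up for illustration).
log = """run=1 steps=100 error=1.5e-3
run=2 steps=200 error=3.8e-4
run=3 steps=400 error=9.6e-5"""

# Text processing: pull the three fields out of each line.
pattern = re.compile(r"run=(\d+) steps=(\d+) error=(\S+)")
rows = [(int(r), int(s), float(e)) for r, s, e in pattern.findall(log)]

# Database access: store the rows in an in-memory SQLite database
# and ask for the run with the smallest error.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (run INTEGER, steps INTEGER, error REAL)")
conn.executemany("INSERT INTO runs VALUES (?, ?, ?)", rows)
best = conn.execute(
    "SELECT run, error FROM runs ORDER BY error LIMIT 1"
).fetchone()
conn.close()

print(rows)
print(best)
```

Nothing here needs to be installed beyond Python itself, which is part of what makes the language convenient for gluing together the pieces of a computational workflow.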
Research in Scientific Computing
The minisymposium speakers covered both general-purpose tools and domain-specific projects (all materials are available online*). One group of speakers focused on interactive, exploratory computing and data visualization: Fernando Pérez of UC Berkeley discussed the IPython system of components for interactive computing, and Brian Granger of Cal Poly San Luis Obispo covered IPython's applications for high-level parallel computing as well as the design of distributed data structures. John Hunter of TradeLink, the lead developer of the matplotlib plotting package, presented an overview and hands-on demonstration of the project, whose goal is to provide exceptionally high-quality two-dimensional plots (see Figure 1).
Hank Childs of Lawrence Livermore National Laboratory followed up with a discussion of VisIt, a package for the analysis and visualization of large-scale three-dimensional data sets, such as those produced via adaptive mesh refinement on massively parallel machines. VisIt has its own GUI, along with a Python interface that makes it particularly easy to integrate with other programs. Figure 2 shows a visualization done with VisIt in which the high-quality VTK 3D graphics library combined multiple rendering options to display elevation data for Mount St. Helens.
Figure 2. Topographic visualization of Mount St. Helens performed with Lawrence Livermore Lab's VisIt software.
Speakers in another group covered Python tools for high-performance computing. Andreas Klöckner of Brown University demonstrated the easy access provided by his PyCUDA library to the capabilities of modern high-performance graphics cards for numerical computing. Pearu Peterson of the Institute of Cybernetics at Estonia's Tallinn University of Technology presented his research on the algorithmic and data structure problems involved in the design of sympycore, a fast library for symbolic computing in Python. The mpi4py library, which permits the development of MPI codes in Python, was covered in a presentation by Lisandro Dalcín of the Argentinean CIMEC research laboratory and Brian Granger. With mpi4py, MPI primitives can be called in pure Python, and C or Fortran MPI codes (or even both in the same process) can be accessed directly, often with minimal performance loss.
Tony Drummond of Lawrence Berkeley National Laboratory described the PyACTS project, which provides Python interfaces to the ACTS collection of high-performance codes (Aztec, Hypre, PETSc, SLEPc, ScaLAPACK, SUNDIALS, SuperLU, TAO, and OPT++). Similar efforts of this type include PyTrilinos and petsc4py for use of Trilinos and PETSc from Python. Jon Guyer of NIST described the architecture of the FiPy project, a finite volume PDE solver developed by his group. With an easy-to-use syntax for model description, FiPy exploits Python tools like NumPy, SciPy, matplotlib, and PyTrilinos to tackle problems in materials science.
Aric Hagberg of Los Alamos National Laboratory presented the NetworkX project, a library of algorithms for studying complex networks; a good illustration of Python's strengths, the project couples an algorithmic core with rich functionality to multiple visualization libraries that provide alternative means of looking at networks. With Python, visualization systems written in other languages can be used to render results from NetworkX via a unified interface.
Python is not only a research tool: The day was bracketed by two presentations illustrating the use of Python in scientific computing education. Hans Petter Langtangen of Norway's Simula Research Laboratory described the University of Oslo's implementation of a major reform of computational science teaching using Python as its foundation. Students learn Python and numerical methods in their first semester and apply these tools in a range of science courses across the university.
Toward the end of the day, Joe Harrington of the University of Central Florida described a 2007 attempt to replace IDL with Python in his course on astronomical data analysis, with poor results because of documentation issues. He responded by organizing and funding the SciPy Documentation Project during the summer of 2008. Results were dramatic, he said in Miami: Students in his fall 2008 class learned more in less time than students in the IDL-based class.
Python's use in education is growing. In the U.S. its impact can be seen at both ends of the spectrum: The National Science Foundation-funded SECANT project supports the development of a Python-based curriculum and workshops for interdisciplinary computational science education; the One Laptop Per Child project provides economically disadvantaged children with laptops loaded with Python-based software that they can inspect, learn from, and modify.
Open Tools and Reproducible Research
Throughout the SIAM CSE conference, speakers emphasized the growing awareness that computational research must be truly reproducible. Included in the registration packet of everyone who attended the conference, in fact, was the January issue of Computing in Science and Engineering, a special issue focused on reproducible research.
We believe that Python is an excellent platform for building a workflow in which every step can be validated and reproduced by anyone. Because it is open source and available at no cost, there are no financial barriers to its use and all of its internal components are open to inspection. By using open-source tools to build our computing foundation, we can facilitate this core principle of the scientific process in our discipline; accordingly, every project discussed in the minisymposia in Miami is freely available for all to download, use, verify, and improve. We hope that the community of scientists who build tools in this manner will continue to grow. The Python projects for scientific computing are rapidly maturing, becoming better integrated and documented, more powerful, and easier to use. Join us!
Readers interested in learning more about these tools can visit the SciPy Web site (http://www.scipy.org/), which hosts some of these tools and contains links to, and information about, many related projects.
Fernando Pérez is a research scientist at the University of California, Berkeley. Hans Petter Langtangen is director of the Center for Biomedical Computing at the Simula Research Laboratory, in Norway. Randy LeVeque is a professor of applied mathematics at the University of Washington, Seattle.