Argonne Theory Institute: Differentiation of Computational Approximations to Functions
January 22, 1999
A Theory Institute, "Differentiation of Computational Approximations to Functions," was held at Argonne National Laboratory, May 18-20, 1998. The workshop was organized by Christian Bischof (now at the Technical University Aachen in Aachen, Germany) and Paul Hovland of the Mathematics and Computer Science Division at Argonne National Laboratory.
The Theory Institute brought together 38 researchers from the United States, Great Britain, France, and Germany. Mathematicians, computer scientists, physicists, and engineers from diverse disciplines discussed advances in automatic differentiation (AD) theory and software, as well as benefits realized with AD methods in application contexts. Fluid mechanics, structural engineering, optimization, meteorology, and computational mathematics for the solution of ordinary differential equations or differential-algebraic equations (DAEs) were among the application areas considered.
This workshop was the fourth to be devoted to automatic differentiation. Earlier meetings were two SIAM conferences---in Breckenridge, Colorado (1991), and Santa Fe, New Mexico (1996)---and the first Argonne Theory Institute on computational differentiation, in 1993.
AD methods can be used whenever gradient information or higher-order derivative information must be computed. The problem is defined by a computer program (without gradient information) that is able to compute numerical values of some output variables for a given set of input variables. The application of AD methods to this computer program results in the automatic generation of a new computer program, which computes the derivatives of the output variables with respect to the input variables.
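As a rough illustration of the idea, the sketch below differentiates a small "program" in forward mode. It makes several assumptions that are not part of the workshop material: it uses operator overloading in Python rather than generating a new source program, and the names `Dual`, `sin`, `f`, and `partial_x` are hypothetical and chosen only for this example.

```python
import math


class Dual:
    """A value paired with its derivative with respect to one chosen input."""

    def __init__(self, val, dot=0.0):
        self.val = val
        self.dot = dot

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def sin(x):
    """Sine with the chain rule applied to the derivative part."""
    x = _lift(x)
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)


def f(x, y):
    """The 'original program': it contains no derivative code."""
    return x * y + sin(x)


def partial_x(x, y):
    """Derivative of f with respect to x, obtained by seeding x with dot = 1."""
    return f(Dual(x, 1.0), Dual(y, 0.0)).dot
```

Calling `partial_x(2.0, 3.0)` evaluates `f` once on `Dual` arguments and returns the value of y + cos(x) at that point, without any hand-written derivative of `f`.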
Recent experience with AD tools has shown that both the implementation frameworks and the algorithmic frameworks need to be expanded, with an eye toward exploiting the strengths of the different approaches. The many ways of computing derivatives that arise from the associativity of the chain rule, together with the difficulty of building systems that combine runtime and compile-time techniques, provide fertile ground for future research.
AD methods are widely used for solving problems in which the output variables are computed "directly" from the input variables. One thing is not clear, however: What happens when the output variables are computed iteratively or approximately? That is, do "differentiate" and "approximate" commute? To discuss this issue, the workshop was dedicated to the differentiation of computational approximations to functions. Observations and experiences indicate that AD tools often compute correct gradients for approximations without any human modification, but some of the problems reported require closer investigation.
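The question can be made concrete with a small, hypothetical example that was not presented at the workshop: the Newton (Heron) iteration for the square root of a. The sketch propagates the derivative with respect to a through every iteration by hand, which is what a forward-mode AD tool would do automatically, and the result can then be compared with the exact derivative 1/(2 sqrt(a)). The function name and the iteration count are assumptions made for illustration.

```python
import math


def sqrt_with_derivative(a, iterations=8):
    """Approximate sqrt(a) by Newton's iteration and differentiate the iteration.

    Returns (x, dx), where x approximates sqrt(a) and dx is the derivative
    of the computed x with respect to a.
    """
    x, dx = 1.0, 0.0  # the starting guess does not depend on a
    for _ in range(iterations):
        # Iteration step: x_new = (x + a / x) / 2
        x_new = 0.5 * (x + a / x)
        # Derivative of the step with respect to a (quotient rule on a / x)
        dx_new = 0.5 * (dx + (x - a * dx) / (x * x))
        x, dx = x_new, dx_new
    return x, dx
```

For this well-behaved iteration, the differentiated approximation converges to the derivative of the exact function, so the two operations commute in the limit. After only one or two iterations, however, `dx` is visibly less accurate than after eight, and iterations with adaptive elements can behave less kindly.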
During the three-day workshop, 20 talks were presented and discussed at length, and lively discussions continued during the breaks.
A. Griewank, one of the pioneers of AD, set the stage by describing the turbulent history of the field during the past few decades. Using the table of contents of his soon-to-appear book on automatic differentiation, he also summarized current AD-related research.
H.-G. Bock discussed the problem of evaluating gradients for functions computed by numerical time integration of ODEs. He presented an internal differentiation method based on a selective activation of variables during time integration in specialized algorithms. This issue was also discussed by the author, who described the differentiation of general-purpose integration routines for ODEs used in multi-body dynamics, identifying potential sources of problems and ways to overcome them. Although black-box integration of this type often is feasible, manual deactivation of differentiation of adaptive elements (e.g., step-size control) sometimes is required.
In a related context, L. Petzold explored the use of AD for the solution of DAEs. Typically, inaccurate finite differences are used within DAE solvers, but it has been shown that the gradients can be computed more efficiently and accurately with AD. For frequently arising problem structures in numerical integration schemes, A. Verma proposed methods for exploiting the structure of continuous problems that, for example, maintain the sparsity of the problems.
As an alternative to DAE methods that rely solely on first-order derivative information, J. Pryce described the use of Taylor-series expansions to solve smooth DAEs. AD is used to compute the Taylor coefficients, and interesting consistency conditions can be derived.
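How AD-style recurrences yield Taylor coefficients can be sketched on a much simpler problem than the DAEs treated in the talk. The example below is an assumption made for illustration only: it uses the scalar ODE y' = y*y with y(0) = y0, and the function name `taylor_coefficients` is hypothetical. The coefficients of the product y*y follow from the Cauchy product rule, and the ODE then gives each new coefficient of y.

```python
def taylor_coefficients(y0, order):
    """Taylor coefficients y_0, ..., y_order of the solution of y' = y*y, y(0) = y0."""
    y = [float(y0)]
    for k in range(order):
        # k-th Taylor coefficient of the product y*y (Cauchy product)
        sq_k = sum(y[j] * y[k - j] for j in range(k + 1))
        # y' = y*y implies (k + 1) * y_{k+1} = (y*y)_k
        y.append(sq_k / (k + 1))
    return y
```

For y0 = 1 the exact solution is 1/(1 - t), whose Taylor coefficients are all equal to 1, which gives a simple check on the recurrence.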
The capability of AD to compute higher-order derivatives accurately and efficiently also underpins the work of M. Berz and K. Makino. Berz presented rigorous methods for obtaining verified results for the simulation of particle accelerators using higher-order remainder algebra. Tight bounds can be obtained, with the accuracy determined by the order of the AD-computed Taylor coefficients. Makino described applications of these techniques to verified quadrature and the integration of ODEs.
Several talks, motivated by applications, illustrated the need for accurate derivatives in engineering contexts and the potential savings that can be achieved with AD-based approaches. O. Pironneau gave an introduction to the application of shape optimization to wing design and breakwater design. The results are very sensitive: It is crucial to detect and to avoid "unphysical" designs. B. Mohammadi discussed aerodynamic flow control problems and the use of AD tools for shape optimization, in which fixed as well as moving boundaries are considered to control the flow. In this context, Mohammadi considers it sufficient and more efficient to compute only an approximate gradient, while maintaining, of course, the reliability of the "leading digits."
T. Slawig, who also investigated shape optimization of airfoils, illustrated the benefits of computing the Jacobian matrix in the nonlinear BFGS Newton method via AD. G. Corliss and J. Walters presented a rigorous global search procedure that uses interval arithmetic, with applications in economics. Derivatives must be computed in both floating-point and interval arithmetic. Fortran 90 is used to interpret code lists and to do code transformation.
Two speakers discussed Burgers' equation, which is used for advection-diffusion systems in meteorology and fluid dynamics. S.K. Park discussed applications of AD in weather modeling and presented a new numerical scheme for an adjoint model. A. Walther described work on computing adjoints of discretizations of Burgers' equation, including the application of an optimal checkpointing scheme to overcome memory restrictions in the reverse mode.
While it is tempting to use AD tools in a black-box mode, this is not the most efficient use of the technology. B. Christianson pointed out that applications must expose more of their structure to AD tools. Such an approach not only yields more efficient code, as shown in Christianson's examples, but also can help the researcher become aware of potential pitfalls. In some cases a black-box mode may not be possible, as illustrated by C. Faure in her talk on the application of the AD tool Odyssee to an industrial thermohydraulic code, in which she discussed ways to overcome such problems as inaccessible sources for software libraries. She also showed that although the results are mostly very accurate, unphysical oscillations in the derivatives are observed for some input parameters.
S. Brown addressed the design of future AD tools, which will need to offer a high degree of flexibility, in, for example, mixed-mode computations or specialized evaluation strategies. There seems to be a trend leading from general black-box tools to more specialized tools in which user knowledge is introduced to optimize time and memory efficiency. AD tool design was also the focus of N. Di Cesare's talk on an AD implementation that uses operator overloading and expression templates in C++. This approach leads to a very flexible tool, at the expense of longer compilation times. J. Abate considered the computation of second-order derivatives. Whereas AD tools can be used repeatedly to compute Hessians, much better results can be achieved by exploiting the structure of the computation. The possible savings become even greater when only Hessian-vector products are required.
Finally, P. Hovland illustrated the benefits that arise from the differentiation of an algorithm. He described the application of AD to a successive overrelaxation algorithm to solve linear equations, where the relaxation parameter was adaptively modified to improve the convergence. To obtain improved values, Hovland used ADIFOR to compute the derivative of a cost function with respect to the relaxation parameter.
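The flavor of this computation can be sketched as follows. This is not Hovland's ADIFOR-based implementation: the choice of cost function (the squared residual after a fixed number of sweeps), the function name `sor_cost_and_derivative`, and the hand-written forward-mode propagation are all assumptions made for illustration.

```python
def sor_cost_and_derivative(A, b, omega, sweeps=2):
    """Run SOR sweeps on A x = b and differentiate the result with respect to omega.

    Returns (cost, dcost), where cost is the squared residual norm after
    the given number of sweeps and dcost is its derivative d(cost)/d(omega).
    """
    n = len(b)
    x = [0.0] * n   # iterate
    dx = [0.0] * n  # derivative of the iterate with respect to omega
    for _ in range(sweeps):
        for i in range(n):
            s, ds = b[i], 0.0
            for j in range(n):
                if j != i:
                    s -= A[i][j] * x[j]
                    ds -= A[i][j] * dx[j]
            gs = s / A[i][i]    # Gauss-Seidel value for component i
            dgs = ds / A[i][i]  # and its derivative
            # Update x_i <- (1 - omega) * x_i + omega * gs, differentiated
            # with the product rule before x_i is overwritten.
            dx_new = -x[i] + (1.0 - omega) * dx[i] + gs + omega * dgs
            x[i] = (1.0 - omega) * x[i] + omega * gs
            dx[i] = dx_new
    cost, dcost = 0.0, 0.0
    for i in range(n):
        r = b[i] - sum(A[i][j] * x[j] for j in range(n))
        dr = -sum(A[i][j] * dx[j] for j in range(n))
        cost += r * r
        dcost += 2.0 * r * dr
    return cost, dcost
```

The returned derivative indicates whether increasing or decreasing the relaxation parameter would reduce the residual, which is the kind of information an adaptive scheme could use to adjust the parameter.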
To summarize, there has been a big change in the AD community during the past decade. In the first phase in the development of the field, it was necessary to understand the basic mechanisms and applications of AD and to develop a "common language," enabling scientists from many different areas to communicate. In the second phase, AD was used successfully in some large applications; based on this experience, intensive discussions about the best way to design and implement tools were started.
Perhaps we have now entered the third phase, in which all basic methods and some efficient tools are available and AD has proven its reliability in some very complex problems. Nevertheless, many open questions remain, and scientists from all over the world are actively seeking solutions and pushing the envelope in this rapidly developing, and by its very nature interdisciplinary, field. The hope is that, in the not-too-distant future, it will be possible to compute derivatives so efficiently and conveniently that nobody will understand why, in the late 20th century, this task was such a serious problem in many fields of research.
Peter Eberhard (email@example.com) is a research scientist specializing in mechanics at the University of Stuttgart, Germany.
Further information about the Theory Institute is available at http://www.mcs.anl.gov/autodiff/workshop.html.