CSE 2011: Data-driven Art: Building Computational Models of the Face
June 21, 2011
The face is the most important part of any dramatic character, said Mark Sagar of Weta Digital near the beginning of his invited talk ("Reverse Engineering the Face") at the 2011 SIAM Conference on Computational Science and Engineering, held in Reno, February 28 to March 4. "We're especially sensitive to faces," he continued, "which artistically are one of the most difficult things to get right."
As the talk unfolded, it became clear that the artistic effects achieved by the two-time Academy Award-winning Sagar were thoroughly grounded in computational math. His avowal in an impromptu question-and-answer session after the talk that he sees faces "as multi-channel signalling devices," that what he does "is essentially signal processing," didn't come as a surprise.
Part of the appeal of the talk derived from numerous images from familiar movies of the recent past, including King Kong (2005) and Avatar (2009), with details about the approaches used, and the difficulties encountered, in building the models of characters' faces.
For many of the students and other early-career conference participants, landing a job like Sagar's may have hovered as a tantalizing possibility. In introducing him, Karen Willcox of MIT pointed out that Sagar's specialties at Weta are facial motion capture, animation, and rendering technologies; his current title is special projects supervisor. Lest the job appear hopelessly one-of-a-kind, he alluded in the course of the talk to his counterparts in modeling bodies, hair, cloth, and water.
Contrasting a "pure art" approach, in which everything is done by hand, with a "data-driven" approach, in which geometry, motion capture, reflectance, pattern tracking, and finite element control meshes come into play, Sagar took the audience through some of the most difficult aspects of face modeling. Among them is skin. Given the complex reflectance and subsurface scattering of light, he said, the creation of realistic faces in 3D, in any lighting, is an enormous challenge. For highly dynamic lighting over a very wide range of values---the ability to place, say, a sunlit face in a deeply shadowed environment---he uses mathematical recombination techniques to produce the desired effects.
One memorable project called for "dynamic wrinkling," in which a 20-year-old character was to age to 80. "Ten years before Benjamin Button"---that is, ten years before subsurface scattering techniques became available---the difficulty of the problem motivated Sagar to contact Paul Debevec, now at the Institute for Creative Technologies at the University of Southern California. Sagar especially admired Debevec's short 1999 film Fiat Lux, shown at SIGGRAPH, in which the object of the rendering, lighting, and modeling was an architectural masterpiece---St. Peter's Basilica in Rome.
Working with Debevec had at least one unexpected payoff for both: In 2009, Sagar and Debevec, with Tim Hawkins and John Monos, received a Scientific and Engineering Award from the Academy of Motion Picture Arts and Sciences, for the design and engineering of a lighting stage and facial rendering system. This was the first of Sagar's two consecutive Oscars---the other, which he received in Los Angeles about a month before the SIAM talk, was for work that "led to a method for transforming facial motion capture data into an expression-based, editable character animation system."
For Light Stage 1.0, Sagar explained in Reno, they built a stage with lighting from two thousand directions so that they would be able to show a face under just about any condition; the lighting was rotated around the subject. Light Stage 2.0, developed with Debevec at USC, breaks the light down into view-dependent and -independent, and diffuse and specular components. In a "complex pipeline," they build up "little reflectance functions for every area of the face."
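The talk did not spell out the recombination step, but the general idea behind light-stage relighting can be sketched in a few lines. Because light transport is linear, a face photographed once per light direction can be relit for a new environment by taking a weighted sum of those photographs; the intensities a single pixel takes across all directions form its "little reflectance function." The function name `relight`, the three-direction setup, and all pixel values below are illustrative assumptions, not Weta's or USC's pipeline.

```python
# Hypothetical illustration of image-based relighting by linear recombination.
# Each basis image records the face lit from one direction only. A new
# lighting environment is approximated by a weighted sum of the basis images.

def relight(basis_images, light_weights):
    """Return the weighted sum of basis images (flat lists of intensities)."""
    if len(basis_images) != len(light_weights):
        raise ValueError("need exactly one weight per basis image")
    n_pixels = len(basis_images[0])
    relit = [0.0] * n_pixels
    for image, weight in zip(basis_images, light_weights):
        if len(image) != n_pixels:
            raise ValueError("all basis images must have the same size")
        for p, value in enumerate(image):
            relit[p] += weight * value
    return relit


# Toy data: three light directions, four pixels each (made-up intensities).
basis = [
    [1.0, 0.5, 0.0, 0.0],  # lit from the left
    [0.0, 0.5, 1.0, 0.5],  # lit from above
    [0.0, 0.0, 0.5, 1.0],  # lit from the right
]

# Target environment (assumed): strong light from the left, dim from the right.
weights = [2.0, 0.0, 0.25]

relit = relight(basis, weights)
print(relit)  # [2.0, 1.0, 0.125, 0.25]
```

In this toy setup, column p of `basis` plays the role of the reflectance function of pixel p. A real stage would use thousands of directions and full-color, high-dynamic-range images.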
Among the other key challenges touched on in the Reno talk is facial expression. How do you classify what the face does? How do you transfer facial motion (e.g., from a human to a gorilla)? What is the fundamental information from the face?
As with other aspects in modeling the face, Sagar begins far under the surface to build facial expressions. The brain sends signals to the face, he pointed out; the muscular activation is the basis for geometric representation. Here, the Weta face modelers have a reference source: the facial action coding system, or FACS, developed in the 1970s by Paul Ekman and Wallace Friesen. Expressions are broken down into muscle groups, with a numeric value assigned to each; various groups can be combined to produce a particular expression.
Representing expressions on computer-generated faces at Weta begins with FACS. One of Sagar's examples, the character Neytiri from Avatar, started with brow expressions for anger, surprise, fear, sadness, and so on. The brow motion was then combined with different motions for the mouth, ears, and other features. "It takes a lot more effort to be angry than happy," he commented.
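One way to picture the FACS-based combination of muscle groups is as a weighted sum of per-action-unit displacements added to a neutral face. The sketch below is a minimal illustration under stated assumptions: the action unit (AU) numbers and names are standard FACS, but the 0-to-1 activation scale, the three-coordinate "face," the displacement values, and the function `apply_expression` are invented for the example and are not Weta's representation.

```python
# Hypothetical sketch: an expression as a weighted combination of FACS
# action units (AUs). AU numbers and names follow FACS; everything else
# (activation scale, displacements, toy face) is assumed for illustration.

AU_NAMES = {
    1: "inner brow raiser",
    2: "outer brow raiser",
    4: "brow lowerer",
    12: "lip corner puller",
}

# Made-up displacement of three coordinates (brow, cheek, mouth corner)
# produced by each AU at full activation.
AU_DELTAS = {
    1: [0.5, 0.0, 0.0],
    2: [0.25, 0.0, 0.0],
    4: [-0.5, 0.0, 0.0],
    12: [0.0, 0.25, 1.0],
}


def apply_expression(neutral, activations):
    """Add each active AU's displacement, scaled by its activation in [0, 1]."""
    shape = list(neutral)
    for au, level in activations.items():
        if au not in AU_DELTAS:
            raise KeyError(f"no displacement defined for AU{au}")
        if not 0.0 <= level <= 1.0:
            raise ValueError("activation must lie in [0, 1]")
        for i, delta in enumerate(AU_DELTAS[au]):
            shape[i] += level * delta
    return shape


neutral = [0.0, 0.0, 0.0]
raised_brow = {1: 1.0, 2: 1.0}   # brow part of a surprised look
half_smile = {12: 0.5}           # mouth part, at half strength

# Combine a brow motion with a mouth motion, as in the Neytiri example.
face = apply_expression(neutral, {**raised_brow, **half_smile})
print(face)  # [0.75, 0.125, 0.5]
```

A purely additive model like this ignores the nonlinear interactions between muscle groups that a production system has to handle; it is meant only to show how numeric values per group combine into one expression.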
The motion-capture data, from a setup that tracks many points on the faces of real actors as they assume a variety of expressions, can be noisy. "It matters what information you look at," Sagar said; "there's a lot of data---the problem is to define your parameter space." The real-time system tracks the live video and solves for the facial expressions to give live feedback. The video is also used for more accurate offline processing for the final shots. Complications are many and inevitable: the cameras are often knocked out of alignment during scenes and stunts, and the placement of the actors' markers varies from day to day.
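"Solving for the facial expressions" from noisy tracked points can be read as a fitting problem: find the expression weights whose combined marker displacements best match what the cameras observed. The talk did not describe Weta's solver, so the following is only a sketch of one common formulation, ordinary least squares, solved here through the normal equations with a small Gaussian elimination to stay within the Python standard library (a production system would more likely call an SVD-based routine). The function names, the two-expression basis, and the marker values are all assumptions.

```python
# Hypothetical sketch: recover expression weights from noisy marker data by
# least squares. Solves the normal equations (B B^T) w = B y, where row k of
# B holds the marker displacements of expression k at full strength.

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: expressions are not independent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_expression_weights(basis, observed):
    """Least-squares weights so that sum_k w[k] * basis[k] approximates observed."""
    k = len(basis)
    gram = [[sum(p * q for p, q in zip(basis[i], basis[j])) for j in range(k)]
            for i in range(k)]
    rhs = [sum(p * q for p, q in zip(basis[i], observed)) for i in range(k)]
    return solve_linear(gram, rhs)


# Toy basis: two expressions, four marker coordinates (made-up values).
basis = [
    [1.0, 0.0, 1.0, 0.0],  # e.g. a brow motion
    [0.0, 1.0, 0.0, 1.0],  # e.g. a mouth motion
]

# Observation generated from weights (0.6, 0.3) plus small made-up noise.
observed = [0.62, 0.29, 0.58, 0.31]

fitted = fit_expression_weights(basis, observed)
print([round(w, 3) for w in fitted])  # [0.6, 0.3]
```

The choice of basis is the "parameter space" Sagar refers to: with a well-chosen, low-dimensional basis, the many noisy marker coordinates average out into a few stable weights.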
Using the motion-capture data, an action performed by a human actor---chewing, say---can be transferred to a gorilla; Sagar mentioned the many nonlinearities that arose in simulating King Kong chewing. Eye motion is particularly important; the eyeballs and eyelids of a character looking to the left assume different shapes from those of a rightward glance, Sagar said. And the relation of eyelid to pupil reveals how intensely engaged a character is.
In the end, Sagar and his colleagues simulate the mechanics of the face from a structural anatomic foundation, taking into account facial anatomy and anatomic variations across the face, tissue variations, the structure of fat and its effects on shear, and aging-related anatomical changes. Here, too, a valuable resource is available for consultation: The Visible Human Female (a project of the U.S. National Institutes of Health) shows how muscle fibers attach to skin, which is useful information for biomechanical modeling of the face. Even with such exquisite attention to detail, realistic animation is labor-intensive. An artist must interpolate between the many different shapes produced in the simulation; respect for the underlying form is needed, along with an understanding of weight and dynamics.
Or not quite the end: With Cleve Moler in the audience, the question of software inevitably arose. Each actor is hooked up to two PCs, one for computer vision tracking and the other to do the solving, with results streamed to a central server. The team does use Matlab, along with basics from the BLAS, Intel libraries, and LAPACK, such as the SVD routines. As to finite elements, they write their own code; the resulting software is efficient, and, as Sagar pointed out: "If you write the code, you can also control it."
Looking to the future, Sagar mentioned a goal of improvements "on the data-reduction side," inviting comments from listeners and students interested in a career with Weta (firstname.lastname@example.org); inclusion of "cse: attn Mark Sagar" in the subject line will help get the message through to him.---GRC
Motion-capture data for Avatar characters Neytiri and Jake Sully was obtained from a real-time setup that tracked human actors (Zoe Saldana and Sam Worthington) as they assumed various facial expressions. © 2009 Twentieth Century Fox Film Corporation. All rights reserved. Images courtesy Weta Digital.