Imaging Science: Computational Metric Geometry: A New Tool in Imaging Science

July 23, 2010


Figure 1. How to add two cats? The notions of signal processing are often not applicable when it comes to the analysis of 3D shapes, which requires new and very different mathematical tools.
Michael Bronstein

The last century has seen an evolution in the dimensions of media: from the invention of the radio, bringing one-dimensional audio signals to practically every household on the planet, to the spread of television and the advent of the era of image and video signals. From a mathematical viewpoint, signal processing has traditionally been the kingdom of functional analysis, mainly because it is very natural to think of signals and images as functions. Recently, however, more and more efforts have been directed to the search for alternative mathematical models for the representation of signals. These winds of change emanate mostly from communities working on the processing and analysis of three-dimensional data, an increasingly popular area in imaging sciences, driven by the advance of 3D acquisition and display technology. The 2010 SIAM Conference on Imaging Science, held in Chicago in April, showcased some of these recent trends. [Other conference highlights will be featured in upcoming issues of SIAM News.---eds.]

Three-dimensional data differ from their one- and two-dimensional counterparts in some fundamental respects. Unlike the case for 1D signals and images, representing 3D shapes as functions is unnatural and inconvenient. Basic operations taken for granted in traditional signal processing, such as adding or subtracting two signals, are non-trivial when it comes to 3D objects: It is not clear, for example, what would result from the addition of two cats (Figure 1). Moreover, whereas signals and images are straightforwardly discretized as vectors of samples or matrices of pixels, the representation of 3D objects depends heavily on the applications. Thus, 3D models encountered in computer graphics usually appear as triangular meshes or clouds of points, and shapes of organs in medical imaging as voxels or implicit surfaces.

While the lack of a common comparable representation makes it difficult to define the similarity of two different 3D shapes, the "self-similarity" of a shape can be expressed by defining a distance structure that measures the "proximity" of a pair of points. Comparison of such structures, more formally called metric spaces, is a well-defined problem in metric geometry, a branch of theoretical mathematics that formalizes and studies the notions of "distance" and "similarity." In recent decades, this field has experienced revolutionary development, although it continues to be virtually unnoticed in the applied sciences. When 3D shapes are modeled as metric spaces, the representation problems can be forgotten and shape similarity can be thought of as the similarity of the corresponding metrics. The metric model also offers an elegant way to address the issues of invariance: defining the metric in such a way that it is unaffected by a class of transformations (e.g., non-rigid bending) of the shape, and describing invariance in terms of isometries (metric-preserving mappings) between metric spaces.

Applied more broadly, metric geometry can be considered a wide common denominator in computational science and engineering. Many problems that arise, especially in computer vision and pattern recognition, involve some underlying notion of similarity. In object detection and recognition, one of the classic problems in computer vision, similarity of regions in an image to some object model is used to decide whether an object is there. Different types of inverse problems involve a criterion of similarity between the observed data and the data to be estimated. Finally, in content-based image retrieval, some metric between images is used to rank the matches. The latter is an example of an emerging need to solve computer vision and pattern recognition problems on large-scale repositories of visual data, such as community photo albums, video blogs, social networks, web TV, and peer-to-peer networks, collectively known as "Internet vision."

Internet vision is among the most interesting and challenging applications of metric geometry. The scale, complexity, and diversity of such data make it very difficult or even impossible to model their metric structure. However, a sampling of the metric on a subset of the data is typically easily available in such applications. One example is the Video Genome project (Figure 2), aimed at the detection of pirated and illegal video content, a constant headache for web search and social network companies. An illegal video uploaded to YouTube can be very different from the original version because of changes in format, compression, editing, and post-production processing. It is easy to simulate typical transformations a video can undergo, thereby creating a set of examples of similar and dissimilar videos, from which a transformation-invariant metric can be learned. Embedding this metric in another, more convenient metric space can reduce the video representation storage complexity and speed up the search for similar video sequences.


Figure 2. The Video Genome project attempts to represent videos much as DNA sequences are represented and to apply algorithms from bioinformatics to detect copies or modifications. Modifications of the video sequence are regarded as similar to mutations of DNA. A central piece of this approach is a metric measuring mutation-invariant similarity of visual content, which is learned from examples.

With many real-life problems that can be approached from the geometric perspective on the one hand, and powerful theoretical tools on the other, I see much hope for metric geometry in the applied sciences. This is the idea I tried to convey in the Forward Looking Panel Discussion at the SIAM conference, having been invited by Fadil Santosa to present fresh or even controversial research directions. A big gap needs to be addressed by applied mathematicians if metric geometry is to become a true computational science: It is still not completely clear in which cases metric models can be applied, nor is it known how they can be approximated, discretized, and computed. What can be said for sure is that attempting to address these challenges offers an exciting playground for interdisciplinary research in the coming decades.

Michael Bronstein is currently a member of the Department of Computer Science at the Technion-Israel Institute of Technology. He is a co-author of the book Numerical Geometry of Non-rigid Shapes (Springer, 2008). Video Genome is a project developed in collaboration with Alexander Bronstein and Ron Kimmel by BBKTech; additional information can be found at www.v-nome.org.



Renew SIAM · Contact Us · Site Map · Join SIAM · My Account
Facebook Twitter Youtube