T1: Data Mining for Science and Engineering Applications

Chandrika Kamath, Lawrence Livermore National Laboratory

Data analysis techniques have long been used to analyze scientific and engineering data. With sensors becoming ubiquitous, computers simulating complex processes at an unprecedented pace, and data storage capabilities steadily improving, petabyte-scale datasets are becoming routine. This has led to the innovative application of data mining techniques to novel and challenging problems. In this tutorial, we will first give a brief introduction to data mining in the context of scientific and engineering applications. Using examples from diverse fields such as astronomy, biology, physics, and remote sensing, we will identify the common threads that run through the mining of scientific and engineering data. We will also illustrate how issues such as feature extraction and data fusion distinguish scientific data mining from its commercial counterpart. Our goal is to show that the diversity of applications, the richness of the problems faced by practitioners, and the opportunity to borrow ideas from other, more established areas of data analysis make scientific data mining an exciting and challenging field.

Presenter Bio

Chandrika Kamath received the Ph.D. degree in computer science from the University of Illinois at Urbana-Champaign in 1986. Prior to joining Lawrence Livermore National Laboratory in 1997, Chandrika was a Consulting Software Engineer at Digital Equipment Corporation. Her research interests are in large-scale data mining and pattern recognition, including image processing, feature extraction, dimension reduction, and classification and clustering algorithms. She is also interested in the practical application of these techniques. Chandrika is currently the project lead and an individual contributor for Sapphire, a project in large-scale data mining.