We are very pleased to present the proceedings of the 2003 SIAM International Conference on Data Mining. The field of Data Mining has seen a tremendous increase in interest in recent months. Applications of Data Mining are mentioned often in the daily press, especially in the fields of security and forensics. Thus, these are exciting times for researchers and practitioners in the area. We hope that the research captured by these proceedings helps in advancing this important field.
We received 106 paper submissions from 17 countries. Each submitted paper was reviewed by at least four members of the program committee. The reviewing period was followed by a discussion phase. Finally, 21 papers (19.8%) were selected to appear in the program as full papers, another 14 (13.2%) were accepted as poster presentations, and 7 (6.6%) were accepted as student papers. The last category is a new addition to this year’s conference, intended to give papers whose main authors were students the opportunity to be presented in a full-length talk at the conference. Student papers received five pages of the proceedings, as did poster papers.
The program of SIAM DM 2003 includes four keynote lectures, four tutorials, and two mini symposia. The mini symposia constitute a new addition to the format of this conference. We hope they will be an important forum for discussing topics of high relevance to the community. In addition, associated with the conference, we have planned six workshops, all in areas that are of current interest to data miners.
Several people have contributed to the success of this endeavor. First and foremost, we would like to thank the program committee members whose dedication and hard work made the selection of the program possible. We would also like to thank the members of the steering committee for their guidance and help. Special thanks to the conference Co-Chairs, Michael Berry and Rajeev Rastogi, who oversaw the process diligently. Thanks also to the tutorial chair, Joydeep Ghosh, who managed to put together a terrific set of tutorials, and to the workshop chair, Hillol Kargupta, for doing a great job with the associated workshops. Our warmest thanks go to Microsoft Corporation for providing the Conference Management Tool (CMT) that tremendously facilitated our work. In particular, we wish to acknowledge the support of Tim Olson from Microsoft, who spent time beyond the call of duty in helping us navigate through the CMT and was prompt to correct any problems along the way. We would also like to thank the staff at SIAM for their help in putting the proceedings and the conference together.
Finally, we are grateful to the authors and the participants who are the primary reason for the success of this conference. We hope you all enjoy the conference!
Daniel Barbará and Chandrika Kamath, Program Co-Chairs.
Decision Tree Classification of Spatial Data Patterns
from Videokeratography using Zernike Polynomials
M. D. Twa, S. Parthasarathy, and T. W. Raasch
Feature Mining Paradigms for Scientific Data
Ming Jiang, Tat-Sang Choy, Sameep Mehta, Matt Coatney, Steve Barr, Kaden Hazzard,
David Richie, Srinivasan Parthasarathy, Raghu Machiraju, David Thompson, John
Wilkins, and Boyd Gatlin
A Comparative Study of Anomaly Detection Schemes in
Network Intrusion Detection
Aleksander Lazarevic, Levent Ertöz, Vipin Kumar, Aysel Ozgur, and Jaideep
Srivastava
Fast Online SVD Revisions for Lightweight Recommender
Systems
Matthew Brand
Finding Clusters of Different Sizes, Shapes, and Densities
in Noisy, High Dimensional Data
Levent Ertöz, Michael Steinbach, and Vipin Kumar
Hierarchical Document Clustering using Frequent Itemsets
Benjamin C. M. Fung, Ke Wang, and Martin Ester
Scalable, Balanced Model-based Clustering
Shi Zhong and Joydeep Ghosh
A New Gravitational Clustering Algorithm
Jonathan Gomez, Dipankar Dasgupta, and Olfa Nasraoui
Mining Changes of Classification by Correspondence Tracing
Ke Wang, Senqiang Zhou, Chee Ada Fu, and Jeffrey Xu Yu
Dynamic Classification of Online Customers
Dimitris J. Bertsimas, Adam J. Mersereau, and Nitin R. Patel
Communication and Memory Efficient Parallel Decision
Tree Construction
Ruoming Jin and Gagan Agrawal
ATLaS: A Native Extension of SQL for Data Mining
Haixun Wang and Carlo Zaniolo
Approximate Query Answering by Model Averaging
Dmitry Pavlov and Padhraic Smyth
On using Page Cooccurrences for Computing Clickstream
Similarity
Ravi Kothari, Parul Mittal, Vivek Jain, and Mukesh Mohania
CloSpan: Mining Closed Sequential Patterns in Large
Databases
Xifeng Yan, Jiawei Han, and Ramin Afshar
StarClass: Interactive Visual Classification using Star
Coordinates
Soon Tee Teoh and Kwan-Liu Ma
Anytime Query-Tuned Kernel Machines via Cholesky Factorization
Dennis DeCoste
Estimation of Topological Dimension
D. R. Hundley and M. J. Kirby
Nonparametric Density Estimation: Toward Computational
Tractability
Alexander G. Gray and Andrew W. Moore
Generalized Sensitivity Analysis: A Framework for Evaluating
Data Analysis Results
Ronald K. Pearson
STAMP: On Discovery of Statistically Important Pattern
Repeats in Long Sequential Data
Jiong Yang, Wei Wang, and Philip S. Yu
Efficient Unsupervised Mining from Noisy Data Sets:
Application to Clustering Co-occurrence Data
Hiroshi Mamitsuka
Active Sampling: An Effective Approach to Feature Selection
Huan Li, Hongjun Lu, and Lei Yu
PageRank, HITS and a Unified Framework for Link Analysis
Chris Ding, Xiaofeng He, Parry Husbands, Hongyuan Zha, and Horst Simon
The Application of Text Mining Software to Examine Coded
Information
Patricia B. Cerrito and James Cox
Extracting Cyber Communities through Patterns
Tassos Argyros, Charis Ermopoulos, Vassiliki Pavlaki, and Nidal Al-Said
On the Techniques for Data Clustering with Numerical
Constraints
Bi-Ru Dai, Cheng-Ru Lin, and Ming-Syan Chen
The Analysis of Asthma and Exposure Data using Geographic
Information Systems and Data Mining Information
Patricia B. Cerrito, George R. Barnes, and Robert W. Forbes
Detecting Periodicity in Nonideal Datasets
R. K. Pearson, H. Lähdesmäki, H. Huttunen, and O. Yli-Harja
Detection of Underrepresented Biological Sequences using
Class-Conditional Distribution Models
Slobodan Vucetic, Dragoljub Pokrajac, Hongbo Xie, and Zoran Obradovic
Learning Bayesian Network Structure from Distributed
Data
R. Chen, K. Sivakumar, and H. Kargupta
Mixture Models and Frequent Sets: Combining Global and
Local Methods for 0-1 Data
Jaakko Hollmén, Jouni K. Seppänen, and Heikki Mannila
Field-Theoretic Methods for Intractable Probabilistic
Models
Dennis Lucarelli, Cheryl Resch, I-Jeng Wang, and Fernando J. Pineda
Data-Mining of a Large Virtual Community: Relationship
between Users DB and the Web-Log File
S. M. Savaresi, Simone Garatti, Sergio Bittanti, and Luca La Brocca
Cube Lattices: A Framework for Multidimensional Data
Mining
Alain Casali, Rosine Cicchetti, and Lotfi Lakhal
ApproxMAP: Approximate Mining of Consensus Sequential
Patterns
Hye-Chung (Monica) Kim, Jian Pei, Wei Wang, and Dean Duncan
Mining Frequent Sequential Patterns under Regular Expressions:
A Highly Adaptive Strategy for Pushing Constraints
Hunor Albert-Lorincz and Jean-François Boulicaut
Sort-Merge Feature Selection for Video Data
Yan Liu and John R. Kender
An Outlier-based Data Association Method for Linking
Criminal Incidents
Song Lin and Donald E. Brown
CPAR: Classification based on Predictive Association
Rules
Xiaoxin Yin and Jiawei Han
Mining Temporal Databases for Subsequence Patterns
Wen Niu and Raj Bhatnagar
Using Low-Memory Representations to Cluster Very Large
Data Sets
David Littau and Daniel Boley