SIAM International Conference on Data Mining (2003)

Proceedings

A Message from the Co-Chairs

We are very pleased to present the proceedings of the 2003 SIAM International Conference on Data Mining. The field of Data Mining has seen a tremendous increase in interest in recent months. Applications of Data Mining are mentioned often in the daily press, especially in the fields of security and forensics. Thus, these are exciting times for researchers and practitioners in the area. We hope that the research captured by these proceedings helps to advance this important field.

We received 106 paper submissions from 17 countries. Each submitted paper was reviewed by at least four members of the program committee. The reviewing period was followed by a discussion phase. Finally, 21 papers (19.8%) were selected to appear in the program as full papers, another 14 (13.2%) were accepted as poster presentations, and 7 (6.6%) were accepted as student papers. The latter category is a new addition to this year’s conference, intended to give papers whose main authors were students the opportunity to be presented through a full-length talk at the conference. Student papers received five pages of the proceedings, as did poster papers.

The program of SIAM DM 2003 includes four keynote lectures, four tutorials, and two mini symposia. The mini symposia constitute a new addition to the format of this conference. We hope they will be an important forum for discussing topics of high relevance to the community. In addition, associated with the conference, we have planned six workshops, all in areas that are of current interest to data miners.

Several people have contributed to the success of this endeavor. First and foremost, we would like to thank the program committee members whose dedication and hard work made the selection of the program possible. We would also like to thank the members of the steering committee for their guidance and help. Special thanks to the conference Co-Chairs, Michael Berry and Rajeev Rastogi, who oversaw the process diligently. Thanks also to the tutorial chair, Joydeep Ghosh, who managed to put together a terrific set of tutorials, and to the workshop chair, Hillol Kargupta, for doing a great job with the associated workshops. Our warmest thanks go to Microsoft Corporation for providing the Conference Management Tool (CMT) that tremendously facilitated our work. In particular, we wish to acknowledge the support of Tim Olson from Microsoft, who spent time beyond the call of duty in helping us navigate through the CMT and was prompt to correct any problems along the way. We would also like to thank the staff at SIAM for their help in putting the proceedings and the conference together.

Finally, we are grateful to the authors and the participants who are the primary reason for the success of this conference. We hope you all enjoy the conference!

Daniel Barbará and Chandrika Kamath, Program Co-Chairs.

Part I: Full Papers

Decision Tree Classification of Spatial Data Patterns from Videokeratography using Zernike Polynomials
M. D. Twa, S. Parthasarathy, and T. W. Raasch

Feature Mining Paradigms for Scientific Data
Ming Jiang, Tat-Sang Choy, Sameep Mehta, Matt Coatney, Steve Barr, Kaden Hazzard, David Richie, Srinivasan Parthasarathy, Raghu Machiraju, David Thompson, John Wilkins, and Boyd Gatlin

A Comparative Study of Anomaly Detection Schemes in Network Intrusion Detection
Aleksander Lazarevic, Levent Ertöz, Vipin Kumar, Aysel Ozgur, and Jaideep Srivastava

Fast Online SVD Revisions for Lightweight Recommender Systems
Matthew Brand

Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data
Levent Ertöz, Michael Steinbach, and Vipin Kumar

Hierarchical Document Clustering using Frequent Itemsets
Benjamin C. M. Fung, Ke Wang, and Martin Ester

Scalable, Balanced Model-based Clustering
Shi Zhong and Joydeep Ghosh

A New Gravitational Clustering Algorithm
Jonathan Gomez, Dipankar Dasgupta, and Olfa Nasraoui

Mining Changes of Classification by Correspondence Tracing
Ke Wang, Senqiang Zhou, Chee Ada Fu, and Jeffrey Xu Yu

Dynamic Classification of Online Customers
Dimitris J. Bertsimas, Adam J. Mersereau, and Nitin R. Patel

Communication and Memory Efficient Parallel Decision Tree Construction
Ruoming Jin and Gagan Agrawal

ATLaS: A Native Extension of SQL for Data Mining
Haixun Wang and Carlo Zaniolo

Approximate Query Answering by Model Averaging
Dmitry Pavlov and Padhraic Smyth

On using Page Cooccurrences for Computing Clickstream Similarity
Ravi Kothari, Parul Mittal, Vivek Jain, and Mukesh Mohania

CloSpan: Mining Closed Sequential Patterns in Large Databases
Xifeng Yan, Jiawei Han, and Ramin Afshar

StarClass: Interactive Visual Classification using Star Coordinates
Soon Tee Teoh and Kwan-Liu Ma

Anytime Query-Tuned Kernel Machines via Cholesky Factorization
Dennis DeCoste

Estimation of Topological Dimension
D. R. Hundley and M. J. Kirby

Nonparametric Density Estimation: Toward Computational Tractability
Alexander G. Gray and Andrew W. Moore

Generalized Sensitivity Analysis: A Framework for Evaluating Data Analysis Results
Ronald K. Pearson

STAMP: On Discovery of Statistically Important Pattern Repeats in Long Sequential Data
Jiong Yang, Wei Wang, and Philip S. Yu

Part II: Poster Presentations

Efficient Unsupervised Mining from Noisy Data Sets: Application to Clustering Co-occurrence Data
Hiroshi Mamitsuka

Active Sampling: An Effective Approach to Feature Selection
Huan Li, Hongjun Lu, and Lei Yu

PageRank, HITS and a Unified Framework for Link Analysis
Chris Ding, Xiaofeng He, Parry Husbands, Hongyuan Zha, and Horst Simon

The Application of Text Mining Software to Examine Coded Information
Patricia B. Cerrito and James Cox

Extracting Cyber Communities through Patterns
Tassos Argyros, Charis Ermopoulos, Vassiliki Pavlaki, and Nidal Al-Said

On the Techniques for Data Clustering with Numerical Constraints
Bi-Ru Dai, Cheng-Ru Lin, and Ming-Syan Chen

The Analysis of Asthma and Exposure Data using Geographic Information Systems and Data Mining Information
Patricia B. Cerrito, George R. Barnes, and Robert W. Forbes

Detecting Periodicity in Nonideal Datasets
R. K. Pearson, H. Lähdesmäki, H. Huttunen, and O. Yli-Harja

Detection of Underrepresented Biological Sequences using Class-Conditional Distribution Models
Slobodan Vucetic, Dragoljub Pokrajac, Hongbo Xie, and Zoran Obradovic

Learning Bayesian Network Structure from Distributed Data
R. Chen, K. Sivakumar, and H. Kargupta

Mixture Models and Frequent Sets: Combining Global and Local Methods for 0-1 Data
Jaakko Hollmén, Jouni K. Seppänen, and Heikki Mannila

Field-Theoretic Methods for Intractable Probabilistic Models
Dennis Lucarelli, Cheryl Resch, I-Jeng Wang, and Fernando J. Pineda

Data-Mining of a Large Virtual Community: Relationship between Users DB and the Web-Log File
S. M. Savaresi, Simone Garatti, Sergio Bittanti, and Luca La Brocca

Cube Lattices: A Framework for Multidimensional Data Mining
Alain Casali, Rosine Cicchetti, and Lotfi Lakhal

Part III: Student Papers

ApproxMAP: Approximate Mining of Consensus Sequential Patterns
Hye-Chung (Monica) Kum, Jian Pei, Wei Wang, and Dean Duncan

Mining Frequent Sequential Patterns under Regular Expressions: A Highly Adaptive Strategy for Pushing Constraints
Hunor Albert-Lorincz and Jean-François Boulicaut

Sort-Merge Feature Selection for Video Data
Yan Liu and John R. Kender

An Outlier-based Data Association Method for Linking Criminal Incidents
Song Lin and Donald E. Brown

CPAR: Classification based on Predictive Association Rules
Xiaoxin Yin and Jiawei Han

Mining Temporal Databases for Subsequence Patterns
Wen Niu and Raj Bhatnagar

Using Low-Memory Representations to Cluster Very Large Data Sets
David Littau and Daniel Boley