Proceedings: Data Mining 2003

Proceedings of the 2003 SIAM International Conference on Data Mining


Cathedral Hill Hotel, San Francisco, CA
May 1-3, 2003

Each link below is to a PDF of the paper as it was submitted. Papers are listed in program order. PDF file names consist of the Proceedings abbreviation and year (DM and 03), followed by the paper's order in the printed version (e.g., 001) and the first author's last name and first initial.

Message from the Conference Co-Chairs

Preface

Part I: Full Papers

3 Decision Tree Classification of Spatial Data Patterns from Videokeratography using Zernike Polynomials
M. D. Twa, S. Parthasarathy, and T. W. Raasch

13 Feature Mining Paradigms for Scientific Data
Ming Jiang, Tat-Sang Choy, Sameep Mehta, Matt Coatney, Steve Barr, Kaden Hazzard, David Richie, Srinivasan Parthasarathy, Raghu Machiraju, David Thompson, John Wilkins, and Boyd Gatlin

25 A Comparative Study of Anomaly Detection Schemes in Network Intrusion Detection
Aleksandar Lazarevic, Levent Ertöz, Vipin Kumar, Aysel Ozgur, and Jaideep Srivastava

37 Fast Online SVD Revisions for Lightweight Recommender Systems
Matthew Brand

47 Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data
Levent Ertöz, Michael Steinbach, and Vipin Kumar

59 Hierarchical Document Clustering using Frequent Itemsets
Benjamin C. M. Fung, Ke Wang, and Martin Ester

71 Scalable, Balanced Model-based Clustering
Shi Zhong and Joydeep Ghosh

83 A New Gravitational Clustering Algorithm
Jonathan Gomez, Dipankar Dasgupta, and Olfa Nasraoui

95 Mining Changes of Classification by Correspondence Tracing
Ke Wang, Senqiang Zhou, Ada Wai-Chee Fu, and Jeffrey Xu Yu

107 Dynamic Classification of Online Customers
Dimitris J. Bertsimas, Adam J. Mersereau, and Nitin R. Patel

119 Communication and Memory Efficient Parallel Decision Tree Construction
Ruoming Jin and Gagan Agrawal

130 ATLaS: A Native Extension of SQL for Data Mining
Haixun Wang and Carlo Zaniolo

142 Approximate Query Answering by Model Averaging
Dmitry Pavlov and Padhraic Smyth

154 On using Page Cooccurrences for Computing Clickstream Similarity
Ravi Kothari, Parul Mittal, Vivek Jain, and Mukesh Mohania

166 CloSpan: Mining Closed Sequential Patterns in Large Databases
Xifeng Yan, Jiawei Han, and Ramin Afshar

178 StarClass: Interactive Visual Classification using Star Coordinates
Soon Tee Teoh and Kwan-Liu Ma

186 Anytime Query-Tuned Kernel Machines via Cholesky Factorization
Dennis DeCoste

194 Estimation of Topological Dimension
D. R. Hundley and M. J. Kirby

203 Nonparametric Density Estimation: Toward Computational Tractability
Alexander G. Gray and Andrew W. Moore

212 Generalized Sensitivity Analysis: A Framework for Evaluating Data Analysis Results
Ronald K. Pearson

224 STAMP: On Discovery of Statistically Important Pattern Repeats in Long Sequential Data
Jiong Yang, Wei Wang, and Philip S. Yu

Part II: Poster Presentations

239 Efficient Unsupervised Mining from Noisy Data Sets: Application to Clustering Co-occurrence Data
Hiroshi Mamitsuka

244 Active Sampling: An Effective Approach to Feature Selection
Huan Liu, Hongjun Lu, and Lei Yu

249 PageRank, HITS and a Unified Framework for Link Analysis
Chris Ding, Xiaofeng He, Parry Husbands, Hongyuan Zha, and Horst Simon

254 The Application of Text Mining Software to Examine Coded Information
Patricia B. Cerrito and James Cox

259 Extracting Cyber Communities through Patterns
Tassos Argyros, Charis Ermopoulos, Vassiliki Pavlaki, and Nidal Al-Said

264 On the Techniques for Data Clustering with Numerical Constraints
Bi-Ru Dai, Cheng-Ru Lin, and Ming-Syan Chen

269 The Analysis of Asthma and Exposure Data using Geographic Information Systems and Data Mining Information
Patricia B. Cerrito, George R. Barnes, and Robert W. Forbes

274 Detecting Periodicity in Nonideal Datasets
R. K. Pearson, H. Lähdesmäki, H. Huttunen, and O. Yli-Harja

279 Detection of Underrepresented Biological Sequences using Class-Conditional Distribution Models
Slobodan Vucetic, Dragoljub Pokrajac, Hongbo Xie, and Zoran Obradovic

284 Learning Bayesian Network Structure from Distributed Data
R. Chen, K. Sivakumar, and H. Kargupta

289 Mixture Models and Frequent Sets: Combining Global and Local Methods for 0-1 Data
Jaakko Hollmén, Jouni K. Seppänen, and Heikki Mannila

294 Field-Theoretic Methods for Intractable Probabilistic Models
Dennis Lucarelli, Cheryl Resch, I-Jeng Wang, and Fernando J. Pineda

299 Data-Mining of a Large Virtual Community: Relationship between Users DB and the Web-Log File
S. M. Savaresi, Simone Garatti, Sergio Bittanti, and Luca La Brocca

304 Cube Lattices: A Framework for Multidimensional Data Mining
Alain Casali, Rosine Cicchetti, and Lotfi Lakhal

Part III: Student Papers

311 ApproxMAP: Approximate Mining of Consensus Sequential Patterns
Hye-Chung (Monica) Kum, Jian Pei, Wei Wang, and Dean Duncan

316 Mining Frequent Sequential Patterns under Regular Expressions: A Highly Adaptive Strategy for Pushing Constraints
Hunor Albert-Lorincz and Jean-François Boulicaut

321 Sort-Merge Feature Selection for Video Data
Yan Liu and John R. Kender

326 An Outlier-based Data Association Method for Linking Criminal Incidents
Song Lin and Donald E. Brown

331 CPAR: Classification based on Predictive Association Rules
Xiaoxin Yin and Jiawei Han

336 Mining Temporal Databases for Subsequence Patterns
Wen Niu and Raj Bhatnagar

341 Using Low-Memory Representations to Cluster Very Large Data Sets
David Littau and Daniel Boley
