Proceedings: Data Mining 2004

Proceedings of the 2004 SIAM International Conference on Data Mining

Each link below points to a PDF of the paper as it was submitted. Papers are listed in program order. PDF file names consist of the proceedings code (DM plus the two-digit year, 04), followed by the paper's order in the printed version (e.g. 001) and the first author's last name and first initial.

Message from the Conference Co-Chairs

Preface

1 Mining Relationships between Interacting Episodes
Carl Mooney and John F. Roddick

11 Making Time-Series Classification More Accurate Using Learned Constraints
Chotirat Ann Ratanamahatana and Eamonn Keogh

23 GRM: A New Model for Clustering Linear Sequences
Hansheng Lei and Venu Govindaraju

33 Nonlinear Manifold Learning for Data Stream
Martin H. C. Law, Nan Zhang, and Anil K. Jain

45 Text Mining from Site Invariant and Dependent Features for Information Extraction Knowledge Adaptation
Tak-Lam Wong and Wai Lam

57 Constructing Time Decompositions for Analyzing Time Stamped Documents
Parvathi Chundi and Daniel J. Rosenkrantz

69 Equivalence of Several Two-Stage Methods for Linear Discriminant Analysis
Peg Howland and Haesun Park

78 A Framework for Discovering Co-location Patterns in Data Sets with Extended Spatial Objects
Hui Xiong, Shashi Shekhar, Yan Huang, Vipin Kumar, Xiaobin Ma, and Jin Soung Yoo

90 A Top-Down Method for Mining Most Specific Frequent Patterns in Biological Sequences
Martin Ester and Xiang Zhang

102 Using Support Vector Machines for Classifying Large Sets of Multi-Represented Objects
Hans-Peter Kriegel, Peer Kröger, Alexej Pryakhin, and Matthias Schubert

114 Minimum Sum-Squared Residue Co-Clustering of Gene Expression Data
Hyuk Cho, Inderjit S. Dhillon, Yuqiang Guan, and Suvrit Sra

126 Training Support Vector Machine Using Adaptive Clustering
Daniel Boley and Dongwei Cao

138 IREP++, A Faster Rule Learning Algorithm
Oliver Dain, Robert K. Cunningham, and Stephen Boyer

147 GenIc: A Single Pass Generalized Incremental Algorithm for Clustering
Chetan Gupta and Robert Grossman

154 Conquest: A Distributed Tool for Constructing Summaries of High-Dimensional Discrete Attributed Datasets
Jie Chi, Mehmet Koyutürk, and Ananth Grama

166 Basic Association Rules
Guichong Li and Howard J. Hamilton

178 Hierarchical Clustering for Thematic Browsing and Summarization of Large Sets of Association Rules
Alípio Jorge

188 Quantitative Evaluation of Clustering Results Using Computational Negative Controls
Ronald K. Pearson, Tom Zylkin, James S. Schwaber, and Gregory E. Gonye

200 An Abstract Weighting Framework for Clustering Algorithms
Richard Nock and Frank Nielsen

210 RBA: An Integrated Framework for Regression Based on Association Rules
Aysel Ozgur, Pang-Ning Tan, and Vipin Kumar

222 Privacy-Preserving Multivariate Statistical Analysis: Linear Regression and Classification
Wenliang Du, Yunghsiang S. Han, and Shigang Chen

234 Clustering with Bregman Divergences
Arindam Banerjee, Srujana Merugu, Inderjit Dhillon, and Joydeep Ghosh

246 Density-Connected Subspace Clustering for High-Dimensional Data
Karin Kailing, Hans-Peter Kriegel, and Peer Kröger

257 Tessellation and Clustering by Mixture Models and Their Parallel Implementations
Qiang Du and Xiaoqiang Wang

269 Clustering Categorical Data Using the Correlated-Force Ensemble
Kun-Ta Chuang and Ming-Syan Chen

279 HICAP: Hierarchical Clustering with Pattern Preservation
Hui Xiong, Michael Steinbach, Pang-Ning Tan, and Vipin Kumar

291 Enhancing Communities of Interest Using Bayesian Stochastic Blockmodels
Deepak Agarwal and Daryl Pregibon

300 VEDAS: A Mobile and Distributed Data Stream Mining System for Real-Time Vehicle Monitoring
Hillol Kargupta, Ruchita Bhargava, Kun Liu, Michael Powers, Patrick Blair, Samuel Bushra, James Dull, Kakali Sarkar, Martin Klein, Mitesh Vasa, and David Handy

312 DOMISA: DOM-Based Information Space Adsorption for Web Information Hierarchy Mining
Hung-Yu Kao, Jan-Ming Ho, and Ming-Syan Chen

321 CREDOS: Classification Using Ripple Down Structure (A Case for Rare Classes)
Mahesh V. Joshi and Vipin Kumar

333 Active Semi-Supervision for Pairwise Constrained Clustering
Sugato Basu, Arindam Banerjee, and Raymond J. Mooney

345 Finding Frequent Patterns in a Large Sparse Graph
Michihiro Kuramochi and George Karypis

357 A General Probabilistic Framework for Mining Labeled Ordered Trees
Nobuhisa Ueda, Kiyoko F. Aoki, and Hiroshi Mamitsuka

369 Mixture Density Mercer Kernels: A Method to Learn Kernels Directly from Data
Ashok N. Srivastava

379 A Mixture Model for Clustering Ensembles
Alexander Topchy, Anil K. Jain, and William Punch

391 Visualizing RFM Segmentation
Ron Kohavi and Rajesh Parekh

400 Visually Mining through Cluster Hierarchies
Stefan Brecheisen, Hans-Peter Kriegel, Peer Kröger, and Martin Pfeifle

412 Class-Specific Ensembles for Active Learning in Digital Imagery
Amit Mandvikar and Huan Liu

422 Mining Text for Word Senses Using Independent Component Analysis
Reinhard Rapp

427 A Kernel-Based Semi-Naive Bayesian Classifier Using P-Trees
Anne Denton and William Perrizo

432 BAMBOO: Accelerating Closed Itemset Mining by Deeply Pushing the Length-Decreasing Support Constraint
Jianyong Wang and George Karypis

437 A General Framework for Adaptive Anomaly Detection with Evolving Connectionist Systems
Yihua Liao, V. Rao Vemuri, and Alejandro Pasos

442 R-MAT: A Recursive Model for Graph Mining
Deepayan Chakrabarti, Yiping Zhan, and Christos Faloutsos

447 Lazy Learning by Scanning Memory Image Lattice
Yiqiu Han and Wai Lam

452 Text Mining Using Non-negative Matrix Factorizations
V. Paul Pauca, Farial Shahnaz, Michael W. Berry, and Robert J. Plemmons

457 Active Mining of Data Streams
Wei Fan, Yi-an Huang, Haixun Wang, and Philip S. Yu

462 Learning to Read Between the Lines: The Aspect Bernoulli Model
A. Kabán, E. Bingham, and T. Hirsimäki

467 Exploiting Hierarchical Domain Values in Classification Learning
Yiqiu Han and Wai Lam

472 IFD: Iterative Feature and Data Clustering
Tao Li and Sheng Ma

477 Adaptive Filtering for Efficient Record Linkage
Lifang Gu and Rohan Baxter

482 A Foundational Approach to Mining Itemset Utilities from Databases
Hong Yao, Howard J. Hamilton, and Cory J. Butz

487 The Discovery of Generalized Causal Models with Mixed Variables Using MML Criterion
Gang Li and Honghua Dai

492 Reservoir-Based Random Sampling with Replacement from Data Stream
Byung-Hoon Park, George Ostrouchov, Nagiza F. Samatova, and Al Geist

497 Principal Component Analysis and Effective K-Means Clustering
Chris Ding and Xiaofeng He

502 Classifying Documents without Labels
Daniel Barbará, Carlotta Domeniconi, and Ning Kang

507 Data Reduction in Support Vector Machines by a Kernelized Ionic Interaction Model
Hyunsoo Kim and Haesun Park

512 Continuous-Time Bayesian Modeling of Clinical Data
Sathyakama Sandilya and R. Bharat Rao

517 Subspace Clustering of High Dimensional Data
Carlotta Domeniconi, Dimitris Papadopoulos, Dimitrios Gunopulos, and Sheng Ma

522 Privacy Preserving Naïve Bayes Classifier for Vertically Partitioned Data
Jaideep Vaidya and Chris Clifton

527 Resource-Aware Mining with Variable Granularities in Data Streams
Wei-Guang Teng, Ming-Syan Chen, and Philip S. Yu

532 Mining Patterns of Activity from Video Data
Michael C. Burl
