Proceedings of the 2005 SIAM International Conference on Data Mining

Each link below is to a PDF of the paper as it was submitted. Papers are listed in program order. PDF file names consist of the Proceedings code (DM plus the year, 05), followed by the paper's order in the printed version (e.g., 001) and the first author's last name and first initial.
Message from the Conference Co-Chairs
1 Computational Developments of ψ-learning
Sijin Liu, Xiaotong Shen, and Wing Hung Wong
12 A Random Walks Perspective on Maximizing Satisfaction and Profit
Matthew Brand
20 Surveying Data for Patchy Structure
Ronald K. Pearson
32 2-Dimensional Singular Value Decomposition for 2D Maps and Images
Chris Ding and Jieping Ye
44 Summarizing and Mining Skewed Data Streams
Graham Cormode and S. Muthukrishnan
56 Online Analysis of Community Evolution in Data Streams
Charu C. Aggarwal and Philip S. Yu
68 Mining Frequent Itemsets from Data Streams with a Time-Sensitive Sliding Window
Chih-Hsiang Lin, Ding-Ying Chiu, Yi-Hung Wu, and Arbee L. P. Chen
80 On Abnormality Detection in Spuriously Populated Data Streams
Charu C. Aggarwal
92 Privacy-Preserving Classification of Customer Data without Loss of Accuracy
Zhiqiang Yang, Sheng Zhong, and Rebecca N. Wright
103 Privacy-Aware Market Basket Data Set Generation: A Feasible Approach for Inverse Frequent Set Mining
Xintao Wu, Ying Wu, Yongge Wang, and Yingjiu Li
115 On Variable Constraints in Privacy Preserving Data Mining
Charu C. Aggarwal and Philip S. Yu
126 Clustering with Model-Level Constraints
David Gondek, Shivakumar Vaithyanathan, and Ashutosh Garg
138 Clustering with Constraints: Feasibility Issues and the k-Means Algorithm
Ian Davidson and S. S. Ravi
150 A Cutting Algorithm for the Minimum Sum-of-Squared Error Clustering
Jiming Peng and Yu Xia
161 Dynamic Classification of Defect Structures in Molecular Dynamics Simulation Data
Sameep Mehta, Steve Barr, Tat-Sang Choy, Hui Yang, Srinivasan Parthasarathy, Raghu Machiraju, and John Wilkins
173 Striking Two Birds with One Stone: Simultaneous Mining of Positive and Negative Spatial Patterns
Bavani Arunasalam, Sanjay Chawla, and Pei Sun
183 Finding Young Stellar Populations in Elliptical Galaxies from Independent Components of Optical Spectra
Ata Kabán, Louisa A. Nolan, and Somak Raychaudhury
195 Hybrid Attribute Reduction for Classification Based on a Fuzzy Rough Set Technique
Qinghua Hu, Daren Yu, and Zongxia Xie
205 HARMONY: Efficiently Mining the Best Rules for Classification
Jianyong Wang and George Karypis
217 On Error Correlation and Accuracy of Nearest Neighbor Ensemble Classifiers
Carlotta Domeniconi and Bojun Yan
227 Lazy Learning for Classification Based on Query Projections
Yiqiu Han and Wai Lam
239 Mining Non-derivable Association Rules
Bart Goethals, Juho Muhonen, and Hannu Toivonen
250 Depth-First Non-derivable Itemset Mining
Toon Calders and Bart Goethals
262 Exploiting Relationships for Domain-Independent Data Cleaning
Dmitri V. Kalashnikov, Sharad Mehrotra, and Zhaoqi Chen
274 A Spectral Clustering Approach to Finding Communities in Graphs
Scott White and Padhraic Smyth
286 Mining Behavior Graphs for "Backtrace" of Noncrashing Bugs
Chao Liu, Xifeng Yan, Hwanjo Yu, Jiawei Han, and Philip S. Yu
298 Learning to Refine Ontology for a New Web Site Using a Bayesian Approach
Tak-Lam Wong and Wai Lam
310 Exploiting Parameter Related Domain Knowledge for Learning in Graphical Models
Radu S. Niculescu, Tom M. Mitchell, and R. Bharat Rao
322 Exploiting Geometry for Support Vector Machine Indexing
Navneet Panda and Edward Y. Chang
334 Parallel Computation of RBF Kernels for Support Vector Classifiers
Shibin Qiu and Terran Lane
346 Loadstar: A Load Shedding Scheme for Classifying Data Streams
Yun Chi, Philip S. Yu, Haixun Wang, and Richard R. Muntz
358 Topic-Driven Clustering for Document Datasets
Ying Zhao and George Karypis
370 Variational Learning for Noisy-OR Component Analysis
Tomas Singliar and Milos Hauskrecht
380 Summarizing Sequential Data with Closed Partial Orders
Gemma Casas-Garriga
392 SUMSRM: A New Statistic for the Structural Break Detection in Time Series
Kwok Pan Pang and Kai Ming Ting
404 Markov Models for Identification of Significant Episodes
Robert Gwadera, Mikhail Atallah, and Wojciech Szpankowski
415 Efficient Mining of Maximal Sequential Patterns Using Multiple Samples
Congnan Luo and Soon M. Chung
427 Gaussian Processes for Active Data Mining of Spatial Aggregates
Naren Ramakrishnan, Chris Bailey-Kellogg, Satish Tadepalli, and Varun N. Pandey
439 Correlation Clustering for Learning Mixtures of Canonical Correlation Models
Xiaoli Z. Fern, Carla E. Brodley, and Mark A. Friedl
449 On Periodicity Detection and Structural Periodic Similarity
Michail Vlachos, Philip Yu, and Vittorio Castelli
461 Cross Table Cubing: Mining Iceberg Cubes from Data Warehouses
Moonjung Cho, Jian Pei, and David W. Cheung
466 Decision Tree Induction in High Dimensional, Hierarchically Distributed Databases
Amir Bar-Or, Assaf Schuster, Ran Wolff, and Daniel Keren
471 Slope One Predictors for Online Rating-Based Collaborative Filtering
Daniel Lemire and Anna Maclachlan
476 Sparse Fisher Discriminant Analysis for Computer Aided Detection
M. Murat Dundar, Glenn Fung, Jinbo Bi, Sandilya Sathyakama, and Bharat Rao
481 Expanding the Training Data Space Using Emerging Patterns and Genetic Methods
Hamad Alhammady and Kotagiri Ramamohanarao
486 Making Data Mining Models Useful to Model Non-paying Customers of Exchange Carriers
Wei Fan, Janak Mathuria, and Chang-tien Lu
491 Matrix Condition Number Prediction with SVM Regression and Feature Selection
Shuting Xu and Jun Zhang
496 Cluster Validity Analysis of Alternative Results from Multi-objective Optimization
Yimin Liu, Tansel Özyer, Reda Alhajj, and Ken Barker
501 ClosedPROWL: Efficient Mining of Closed Frequent Continuities by Projected Window List Technology
Kuo-Yu Huang, Chia-Hui Chang, and Kuo-Zui Lin
506 Three Myths about Dynamic Time Warping Data Mining
Chotirat Ann Ratanamahatana and Eamonn Keogh
511 PCA without Eigenvalue Calculations: A Case Study on Face Recognition
E. Kokiopoulou and Y. Saad
516 Mining Top-K Itemsets over a Sliding Window Based on Zipfian Distribution
Raymond Chi-Wing Wong and Ada Wai-Chee Fu
521 Hierarchical Document Classification Using Automatically Generated Hierarchy
Tao Li and Shenghuo Zhu
526 On Clustering Binary Data
Tao Li and Shenghuo Zhu
531 Time-Series Bitmaps: A Practical Visualization Tool for Working with Large Time Series Databases
Nitin Kumar, Venkata Nishanth Lolla, Eamonn Keogh, Stefano Lonardi, Chotirat Ann Ratanamahatana, and Li Wei
536 Pushing Feature Selection Ahead of Join
Rong She, Ke Wang, Yabo Xu, and Philip S. Yu
541 Discarding Insignificant Rules During Impact Rule Discovery in Large, Dense Databases
Shiying Huang and Geoffrey I. Webb
546 SPID4.7: Discretization Using Successive Pseudo Deletion at Maximum Information Gain Boundary Points
Somnath Pal and Himika Biswas
551 Iterative Mining for Rules with Constrained Antecedents
Zheng Sun, Philip S. Yu, and Xiang-Yang Li
556 Influence in Ratings-Based Recommender Systems: An Algorithm-Independent Approach
Al Mamunur Rashid, George Karypis, and John Riedl
561 Mining Unconnected Patterns in Workflows
Gianluigi Greco, Antonella Guzzo, Giuseppe Manco, and Domenico Saccà
566 The Best Nurturers in Computer Science Research
Bharath Kumar M. and Y. N. Srikant
571 Knowledge Discovery from Heterogeneous Dynamic Systems Using Change-Point Correlations
Tsuyoshi Idé and Keisuke Inoue
576 Building Decision Trees on Records Linked through Key References
Ke Wang, Yabo Xu, Philip S. Yu, and Rong She
581 Efficient Allocation of Marketing Resources Using Dynamic Programming
Giuliano Tirenni, Abderrahim Labbi, André Elisseeff, and Cèsar Berrospi
586 Near-Neighbor Search in Pattern Distance Spaces
Haixun Wang, Chang-Shing Perng, and Philip S. Yu
591 An Algorithm for Lattice-Structured Subspace Clusters
Haiyun Bian and Raj Bhatnagar
596 CBS: A New Classification Method by Using Sequential Patterns
Vincent S. M. Tseng and Chao-Hui Lee
601 SeqIndex: Indexing Sequences by Sequential Pattern Analysis
Hong Cheng, Xifeng Yan, and Jiawei Han
606 On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering
Chris Ding, Xiaofeng He, and Horst D. Simon
611 Kronecker Factorization for Speeding up Kernel Machines
Gang Wu, Zhihua Zhang, and Edward Chang
616 Symmetric Statistical Translation Models for Automatic Image Annotation
Feng Kang and Rong Jin
621 Correcting Sampling Bias in Structural Genomics through Iterative Selection of Underrepresented Targets
Kang Peng, Slobodan Vucetic, and Zoran Obradovic
626 Statistical Models for Unequally Spaced Time Series
Emre Erdogan, Sheng Ma, Alina Beygelzimer, and Irina Rish
631 CLSI: A Flexible Approximation Scheme from Clustered Term-Document Matrices
Dimitrios Zeimpekis and Efstratios Gallopoulos
636 WFIM: Weighted Frequent Itemset Mining with a Weight Range and a Minimum Weight
Unil Yun and John J. Leggett
641 Model-Based Clustering with Probabilistic Constraints
Martin H. C. Law, Alexander Topchy, and Anil K. Jain
