2008 Data Mining Proceedings

Proceedings of the 2008 SIAM International Conference on Data Mining

Each link below points to a PDF of the paper as it was submitted. Papers are listed in program order. PDF file names consist of the Proceedings identifier (DM and the year, 08), followed by the paper's order in the printed version (e.g. 001) and the first author's last name and first initial.

Message from the Conference Co-Chairs

Preface

Constraint-Based Clustering

1 Semi-Supervised Clustering via Matrix Factorization
Fei Wang, Tao Li, and Changshui Zhang

13 Creating a Cluster Hierarchy under Constraints of a Partially Known Hierarchy
Korinna Bade and Andreas Nürnberger

25 Constrained Co-clustering of Gene Expression Data
Ruggero G. Pensa and Jean-François Boulicaut

37 Data Peeler: Constraint-Based Closed Pattern Mining in n-ary Relations
Loïc Cerf, Jérémy Besson, Céline Robardet, and Jean-François Boulicaut

49 SpaRClus: Spatial Relationship Pattern-Based Hierarchical Clustering
Sangkyum Kim, Xin Jin, and Jiawei Han

Pattern Mining

61 Mining Tree Patterns with Almost Smallest Supertrees
Jeroen De Knijf

72 Maximal Quasi-Bicliques with Balanced Noise Tolerance: Concepts and Co-clustering Applications
Jinyan Li, Kelvin Sim, Guimei Liu, and Limsoon Wong

84 CISpan: Comprehensive Incremental Mining Algorithms of Closed Sequential Patterns for Multi-Versional Software Mining
Ding Yuan, Kyuhyung Lee, Hong Cheng, Gopal Krishna, Zhenmin Li, Xiao Ma, Yuanyuan Zhou, and Jiawei Han

96 Mining Association Rules of Simple Conjunctive Queries
Bart Goethals, Wim Le Page, and Heikki Mannila

108 Discovering Relational Item Sets Efficiently
Arne Koopman and Arno Siebes

Classification/Regression

120 A Stagewise Least Square Loss Function for Classification
Shuang-Hong Yang and Bao-Gang Hu

132 Semi-Supervised Learning Based on Semiparametric Regularization
Zhen Guo, Zhongfei (Mark) Zhang, Eric P. Xing, and Christos Faloutsos

143 Roughly Balanced Bagging for Imbalanced Data
Shohei Hido and Hisashi Kashima

153 An Efficient Local Algorithm for Distributed Multivariate Regression in Peer-to-Peer Networks
Kanishka Bhaduri and Hillol Kargupta

165 Aerosol Optical Depth Prediction from Satellite Observations by Multiple Instance Regression
Zhuang Wang, Vladan Radosavljevic, Bo Han, Zoran Obradovic, and Slobodan Vucetic

Feature Selection/Statistical Methods

177 Feature Selection with the logRatio Kernel
Julien Prados, Alexandros Kalousis, and Melanie Hilario

188 A RELIEF Based Feature Extraction Algorithm
Yijun Sun and Dapeng Wu

196 Deterministic Latent Variable Models and Their Pitfalls
Max Welling, Chaitanya Chemudugunta, and Nathan Sutter

208 Massive-Scale Kernel Discriminant Analysis: Mining for Quasars
Ryan Riegel, Alexander Gray, and Gordon Richards

219 Dynamic Non-Parametric Mixture Models and the Recurrent Chinese Restaurant Process: with Applications to Evolutionary Clustering
Amr Ahmed and Eric Xing

Outliers/Privacy

231 Latent Variable Mining with Its Applications to Anomalous Behavior Detection
Shunsuke Hirose and Kenji Yamanishi

243 Similarity Measures for Categorical Data: A Comparative Evaluation
Shyam Boriah, Varun Chandola, and Vipin Kumar

255 Gaussian Process Learning for Cyber-Attack Early Warning
Jian Zhang, Phillip Porras, and Johannes Ullrich

265 Practical Private Computation and Zero-Knowledge Tools for Privacy-Preserving Distributed Data Mining
Yitao Duan and John Canny

277 A Spamicity Approach to Web Spam Detection
Bin Zhou, Jian Pei, and Zhaohui Tang

Poster Session

289 Semantic Smoothing for Bayesian Text Classification with Small Training Data
Xiaohua Zhou, Xiaodan Zhang, and Xiaohua Hu

301 Clustering from Constraint Graphs
Ari Freund, Dan Pelleg, and Yossi Richter

313 Efficiently Mining Closed Subsequences with Gap Constraints
Chun Li and Jianyong Wang

323 Semi-Supervised Classification with Universum
Dan Zhang, Jingdong Wang, Fei Wang, and Changshui Zhang

334 Finding Subgroups having Several Descriptions: Algorithms for Redescription Mining
Arianna Gallo, Pauli Miettinen, and Heikki Mannila

346 The PageTrust Algorithm: How to rank web pages when negative links are allowed?
Cristobald de Kerchove and Paul Van Dooren

353 A pattern mining approach toward discovering generalized sequence signatures
Dietmar H. Dorr and Anne M. Denton

363 The Asymmetric Approximate Anytime Join: A New Primitive with Applications to Data Mining
Lexiang Ye, Xiaoyue Wang, Dragomir Yankov, and Eamonn Keogh

375 Preemptive Measures against Malicious Party in Privacy-Preserving Data Mining
Shuguo Han and Wee Keong Ng

387 A Range Query Approach for High Dimensional Euclidean Space Based on EDM Estimation
Kentarou Kido, Hiroshi Kuwajima, and Takashi Washio

399 A Bayesian Technique for Estimating the Credibility of Question Answerers
Byron Dom and Deepa Paranjpe

410 Semi-supervised Multi-label Learning by Solving a Sylvester Equation
Gang Chen, Yangqiu Song, Fei Wang, and Changshui Zhang

420 Exploiting Structured Reference Data for Unsupervised Text Segmentation with Conditional Random Fields
Chang Zhao, Jalal Mahmud, and I.V. Ramakrishnan

432 Graph Mining with Variational Dirichlet Process Mixture Models
Koji Tsuda and Kenichi Kurihara

443 Direct Density Ratio Estimation for Large-scale Covariate Shift Adaptation
Yuta Tsuboi, Hisashi Kashima, Shohei Hido, Steffen Bickel, and Masashi Sugiyama

455 ROC-tree: A Novel Decision Tree Induction Algorithm Based on Receiver Operating Characteristics to Classify Gene Expression Data
M. Maruf Hossain, Md. Rafiul Hassan, and James Bailey

466 Semi-supervised Learning of a Markovian Metric
Avleen S. Bijral, Manuel E. Lladser, and Gregory Grudic

472 Mining Abnormal Patterns from Heterogeneous Time-Series with Irrelevant Features for Fault Event Detection
Ryohei Fujimaki, Takayuki Nakata, Hidenori Tsukahara, and Akinori Sato

483 Outlier Detection with Uncertain Data
Charu C. Aggarwal and Philip S. Yu

494 Randomization of real-valued matrices for assessing the significance of data mining results
Markus Ojala, Niko Vuokko, Aleksi Kallio, Niina Haiminen, and Heikki Mannila

506 Theoretical Analysis of Subsequence Time-Series Clustering from a Frequency-Analysis Viewpoint
Ryohei Fujimaki, Shunsuke Hirose, and Takayuki Nakata

518 Active Learning with Model Selection in Linear Regression
Masashi Sugiyama and Neil Rubens

530 A Feature Selection Algorithm Capable of Handling Extremely Large Data Dimensionality
Yijun Sun, Sinisa Todorovic, and Steve Goodison

541 Generic Methods for Multi-criteria Evaluation
Niklas Lavesson and Paul Davidsson

547 A New Method for Rule Finding Via Bootstrapped Confidence Intervals
Norman Matloff

553 Mining and Ranking Generators of Sequential Patterns
David Lo, Siau-Cheng Khoo, and Jinyan Li

565 Type-Independent Correction of Sample Selection Bias via Structural Discovery and Re-balancing
Jiangtao Ren, Xiaoxiao Shi, Wei Fan, and Philip S. Yu

577 Exploration and Reduction of the Feature Space by Hierarchical Clustering
Dino Ienco and Rosa Meo

588 On the Dangers of Cross-Validation. An Experimental Evaluation
R. Bharat Rao and Glenn Fung

597 Mining Complex, Maximal and Complete Sub-graph and Sets of Correlated Variables with Applications to Feature Subset Selection
Florian Verhein

609 Spatio-Temporal Partitioning for Improving Aerosol Prediction Accuracy
Vladan Radosavljevic, Slobodan Vucetic, and Zoran Obradovic

621 On Indexing High Dimensional Data with Uncertainty
Charu C. Aggarwal and Philip S. Yu

632 Efficient Distribution Mining and Classification
Yasushi Sakurai and Rosalynn Chong

644 Mining Sequence Classifiers for Early Prediction
Zhengzheng Xing, Jian Pei, Guozhu Dong, and Philip Yu

656 Exact and Approximate Reverse Nearest Neighbor Search for Multimedia Data
Jessica Lin, David Etter, and David DeBarr

668 Finding a Haystack in Haystacks – Simultaneous Identification of Concepts in Large Bio-Medical Corpora
Ying Liu, Lucian V. Lita, R. Stefan Niculescu, Prasenjit Mitra, and C. Lee Giles

680 Learning Markov Network Structure using Few Independence Tests
Parichey Gandhi, Facundo Bromberg, and Dimitris Margaritis

Graphs/Networks

692 Statistical Density Prediction in Traffic Networks
Hans-Peter Kriegel, Matthias Renz, Matthias Schubert, and Andreas Zuefle

704 Proximity Tracking on Time-Evolving Bipartite Graphs
Hanghang Tong, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos

716 Integration of Multiple Networks for Robust Label Propagation
Tsuyoshi Kato, Hisashi Kashima, and Masashi Sugiyama

727 Spatial Scan Statistics for Graph Clustering
Bei Wang, Jeff M. Phillips, Robert Schreiber, and Dennis Wilkinson

739 Randomizing Social Networks: a Spectrum Preserving Approach
Xiaowei Ying and Xintao Wu

Clustering

751 Efficient Maximum Margin Clustering via Cutting Plane Algorithm
Bin Zhao, Fei Wang, and Changshui Zhang

763 Robust Clustering in Arbitrarily Oriented Subspaces
Elke Achtert, Christian Böhm, Jörn David, Peer Kröger, and Arthur Zimek

775 The Relevant-set Correlation Model for Data Clustering
Michael E. Houle

787 Cluster Ensemble Selection
Xiaoli Z. Fern and Wei Lin

798 Weighted Consensus Clustering
Tao Li and Chris Ding

Unsupervised Learning

810 A general framework for estimating similarity of datasets and decision trees: exploring semantic similarity of decision trees
Irene Ntoutsi, Alexandros Kalousis, and Yannis Theodoridis

822 A General Model for Multiple View Unsupervised Learning
Bo Long, Philip S. Yu, and Zhongfei (Mark) Zhang

834 Unsupervised Segmentation of Conversational Transcripts
Krishna Kummamuru, Deepak P, Shourya Roy, and L. Venkata Subramaniam

846 Large-Scale Many-Class Learning
Omid Madani and Michael Connor

858 Simultaneous Unsupervised Learning of Disparate Clusterings
Prateek Jain, Raghu Meka, and Inderjit S. Dhillon
