Proceedings of the 2008 SIAM International Conference on Data Mining

Each link below points to a PDF of the paper as it was submitted. Papers are listed in program order. PDF file names consist of the Proceedings identifier (DM and year 08), followed by the order in the printed version (e.g. 001) and the first author's last name and first initial.
Message from the Conference Co-Chairs
Constraint-Based Clustering
1 Semi-Supervised Clustering via Matrix Factorization
Fei Wang, Tao Li, and Changshui Zhang
13 Creating a Cluster Hierarchy under Constraints of a Partially Known Hierarchy
Korinna Bade and Andreas Nürnberger
25 Constrained Co-clustering of Gene Expression Data
Ruggero G. Pensa and Jean-François Boulicaut
37 Data Peeler: Constraint-Based Closed Pattern Mining in n-ary Relations
Loïc Cerf, Jérémy Besson, Céline Robardet, and Jean-François Boulicaut
49 SpaRClus: Spatial Relationship Pattern-Based Hierarchical Clustering
Sangkyum Kim, Xin Jin, and Jiawei Han
Pattern Mining
61 Mining Tree Patterns with Almost Smallest Supertrees
Jeroen De Knijf
72 Maximal Quasi-Bicliques with Balanced Noise Tolerance: Concepts and Co-clustering Applications
Jinyan Li, Kelvin Sim, Guimei Liu, and Limsoon Wong
84 CISpan: Comprehensive Incremental Mining Algorithms of Closed Sequential Patterns for Multi-Versional Software Mining
Ding Yuan, Kyuhyung Lee, Hong Cheng, Gopal Krishna, Zhenmin Li, Xiao Ma, Yuanyuan Zhou, and Jiawei Han
96 Mining Association Rules of Simple Conjunctive Queries
Bart Goethals, Wim Le Page, and Heikki Mannila
108 Discovering Relational Item Sets Efficiently
Arne Koopman and Arno Siebes
Classification/Regression
120 A Stagewise Least Square Loss Function for Classification
Shuang-Hong Yang and Bao-Gang Hu
132 Semi-Supervised Learning Based on Semiparametric Regularization
Zhen Guo, Zhongfei (Mark) Zhang, Eric P. Xing, and Christos Faloutsos
143 Roughly Balanced Bagging for Imbalanced Data
Shohei Hido and Hisashi Kashima
153 An Efficient Local Algorithm for Distributed Multivariate Regression in Peer-to-Peer Networks
Kanishka Bhaduri and Hillol Kargupta
165 Aerosol Optical Depth Prediction from Satellite Observations by Multiple Instance Regression
Zhuang Wang, Vladan Radosavljevic, Bo Han, Zoran Obradovic, and Slobodan Vucetic
Feature Selection/Statistical Methods
177 Feature Selection with the logRatio Kernel
Julien Prados, Alexandros Kalousis, and Melanie Hilario
188 A RELIEF Based Feature Extraction Algorithm
Yijun Sun and Dapeng Wu
196 Deterministic Latent Variable Models and Their Pitfalls
Max Welling, Chaitanya Chemudugunta, and Nathan Sutter
208 Massive-Scale Kernel Discriminant Analysis: Mining for Quasars
Ryan Riegel, Alexander Gray, and Gordon Richards
219 Dynamic Non-Parametric Mixture Models and the Recurrent Chinese Restaurant Process: with Applications to Evolutionary Clustering
Amr Ahmed and Eric Xing
Outliers/Privacy
231 Latent Variable Mining with Its Applications to Anomalous Behavior Detection
Shunsuke Hirose and Kenji Yamanishi
243 Similarity Measures for Categorical Data: A Comparative Evaluation
Shyam Boriah, Varun Chandola, and Vipin Kumar
255 Gaussian Process Learning for Cyber-Attack Early Warning
Jian Zhang, Phillip Porras, and Johannes Ulrich
265 Practical Private Computation and Zero-Knowledge Tools for Privacy-Preserving Distributed Data Mining
Yitao Duan and John Canny
277 A Spamicity Approach to Web Spam Detection
Bin Zhou, Jian Pei, and Zhaohui Tang
Poster Session
289 Semantic Smoothing for Bayesian Text Classification with Small Training Data
Xiaohua Zhou, Xiaodan Zhang, and Xiaohua Hu
301 Clustering from Constraint Graphs
Ari Freund, Dan Pelleg, and Yossi Richter
313 Efficiently Mining Closed Subsequences with Gap Constraints
Chun Li and Jianyong Wang
323 Semi-Supervised Classification with Universum
Dan Zhang, Jingdong Wang, Fei Wang, and Changshui Zhang
334 Finding Subgroups having Several Descriptions: Algorithms for Redescription Mining
Arianna Gallo, Pauli Miettinen, and Heikki Mannila
346 The PageTrust Algorithm: How to rank web pages when negative links are allowed?
Cristobald de Kerchove and Paul Van Dooren
353 A pattern mining approach toward discovering generalized sequence signatures
Dietmar H. Dorr and Anne M. Denton
363 The Asymmetric Approximate Anytime Join: A New Primitive with Applications to Data Mining
Lexiang Ye, Xiaoyue Wang, Dragomir Yankov, and Eamonn Keogh
375 Preemptive Measures against Malicious Party in Privacy-Preserving Data Mining
Shuguo Han and Wee Keong Ng
387 A Range Query Approach for High Dimensional Euclidean Space Based on EDM Estimation
Kentarou Kido, Hiroshi Kuwajima, and Takashi Washio
399 A Bayesian Technique for Estimating the Credibility of Question Answerers
Byron Dom and Deepa Paranjpe
410 Semi-supervised Multi-label Learning by Solving a Sylvester Equation
Gang Chen, Yangqiu Song, Fei Wang, and Changshui Zhang
420 Exploiting Structured Reference Data for Unsupervised Text Segmentation with Conditional Random Fields
Chang Zhao, Jalal Mahmud, and I.V. Ramakrishnan
432 Graph Mining with Variational Dirichlet Process Mixture Models
Koji Tsuda and Kenichi Kurihara
443 Direct Density Ratio Estimation for Large-scale Covariate Shift Adaptation
Yuta Tsuboi, Hisashi Kashima, Shohei Hido, Steffen Bickel, and Masashi Sugiyama
455 ROC-tree: A Novel Decision Tree Induction Algorithm Based on Receiver Operating Characteristics to Classify Gene Expression Data
M. Maruf Hossain, Md. Rafiul Hassan, and James Bailey
466 Semi-supervised Learning of a Markovian Metric
Avleen S. Bijral, Manuel E. Lladser, and Gregory Grudic
472 Mining Abnormal Patterns from Heterogeneous Time-Series with Irrelevant Features for Fault Event Detection
Ryohei Fujimaki, Takayuki Nakata, Hidenori Tsukahara, and Akinori Sato
483 Outlier Detection with Uncertain Data
Charu C. Aggarwal and Philip S. Yu
494 Randomization of real-valued matrices for assessing the significance of data mining results
Markus Ojala, Niko Vuokko, Aleksi Kallio, Niina Haiminen, and Heikki Mannila
506 Theoretical Analysis of Subsequence Time-Series Clustering from a Frequency-Analysis Viewpoint
Ryohei Fujimaki, Shunsuke Hirose, and Takayuki Nakata
518 Active Learning with Model Selection in Linear Regression
Masashi Sugiyama and Neil Rubens
530 A Feature Selection Algorithm Capable of Handling Extremely Large Data Dimensionality
Yijun Sun, Sinisa Todorovic, and Steve Goodison
541 Generic Methods for Multi-criteria Evaluation
Niklas Lavesson and Paul Davidsson
547 A New Method for Rule Finding Via Bootstrapped Confidence Intervals
Norman Matloff
553 Mining and Ranking Generators of Sequential Patterns
David Lo, Siau-Cheng Khoo, and Jinyan Li
565 Type-Independent Correction of Sample Selection Bias via Structural Discovery and Re-balancing
Jiangtao Ren, Xiaoxiao Shi, Wei Fan, and Philip S. Yu
577 Exploration and Reduction of the Feature Space by Hierarchical Clustering
Dino Ienco and Rosa Meo
588 On the Dangers of Cross-Validation. An Experimental Evaluation
R. Bharat Rao and Glenn Fung
597 Mining Complex, Maximal and Complete Sub-graph and Sets of Correlated Variables with Applications to Feature Subset Selection
Florian Verhein
609 Spatio-Temporal Partitioning for Improving Aerosol Prediction Accuracy
Vladan Radosavljevic, Slobodan Vucetic, and Zoran Obradovic
621 On Indexing High Dimensional Data with Uncertainty
Charu C. Aggarwal and Philip S. Yu
632 Efficient Distribution Mining and Classification
Yasushi Sakurai and Rosalynn Chong
644 Mining Sequence Classifiers for Early Prediction
Zhengzheng Xing, Jian Pei, Guozhu Dong, and Philip Yu
656 Exact and Approximate Reverse Nearest Neighbor Search for Multimedia Data
Jessica Lin, David Etter, and David DeBarr
668 Finding a Haystack in Haystacks – Simultaneous Identification of Concepts in Large Bio-Medical Corpora
Ying Liu, Lucian V. Lita, R. Stefan Niculescu, Prasenjit Mitra, and C. Lee Giles
680 Learning Markov Network Structure using Few Independence Tests
Parichey Gandhi, Facundo Bromberg, and Dimitris Margaritis
Graphs/Networks
692 Statistical Density Prediction in Traffic Networks
Hans-Peter Kriegel, Matthias Renz, Matthias Schubert, and Andreas Zuefle
704 Proximity Tracking on Time-Evolving Bipartite Graphs
Hanghang Tong, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos
716 Integration of Multiple Networks for Robust Label Propagation
Tsuyoshi Kato, Hisashi Kashima, and Masashi Sugiyama
727 Spatial Scan Statistics for Graph Clustering
Bei Wang, Jeff M. Phillips, Robert Schreiber, and Dennis Wilkinson
739 Randomizing Social Networks: a Spectrum Preserving Approach
Xiaowei Ying and Xintao Wu
Clustering
751 Efficient Maximum Margin Clustering via Cutting Plane Algorithm
Bin Zhao, Fei Wang, and Changshui Zhang
763 Robust Clustering in Arbitrarily Oriented Subspaces
Elke Achtert, Christian Böhm, Jörn David, Peer Kröger, and Arthur Zimek
775 The Relevant-set Correlation Model for Data Clustering
Michael E. Houle
787 Cluster Ensemble Selection
Xiaoli Z. Fern and Wei Lin
798 Weighted Consensus Clustering
Tao Li and Chris Ding
Unsupervised Learning
810 A general framework for estimating similarity of datasets and decision trees: exploring semantic similarity of decision trees
Irene Ntoutsi, Alexandros Kalousis, and Yannis Theodoridis
822 A General Model for Multiple View Unsupervised Learning
Bo Long, Philip S. Yu, and Zhongfei (Mark) Zhang
834 Unsupervised Segmentation of Conversational Transcripts
Krishna Kummamuru, Deepak P, Shourya Roy, and L. Venkata Subramaniam
846 Large-Scale Many-Class Learning
Omid Madani and Michael Connor
858 Simultaneous Unsupervised Learning of Disparate Clusterings
Prateek Jain, Raghu Meka, and Inderjit S. Dhillon
