Proceedings of the 2007 SIAM International Conference on Data Mining

Each link below is to a PDF of the paper as it was submitted. Papers are listed in program order. PDF file names consist of the Proceedings code (DM and the year, 07), followed by the paper's order in the printed version (e.g. 001) and the first author's last name and first initial.
Message from the Conference Co-Chairs
Long Papers
3 A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions
Jing Gao, Wei Fan, Jiawei Han and Philip S. Yu
15 Fast Counting with AV-Space for Efficient Rule Induction
Linyan Wang and Aijun An
27 Maximizing the Area under the ROC Curve with Decision Lists and Rule Sets
Henrik Bostrom
35 Maximum Margin Classifiers with Specified False Positive and False Negative Error Rates
J. Saketha Nath and C. Bhattacharyya
47 AC-Framework for Privacy-Preserving Collaboration
Wei Jiang and Chris Clifton
57 On Privacy-Preservation of Text and Sparse Binary Data with Sketches
Charu Aggarwal and Philip Yu
68 Preventing Information Leaks in Email
Vitor R. Carvalho and William W. Cohen
78 Towards Attack-Resilient Geometric Data Perturbation
Keke Chen, Gordon Sun, and Ling Liu
90 Adaptive Concept Learning through Clustering and Aggregation of Relational Data
Hichem Frigui and Cheul Hwang
102 RCMap: Efficiently Creating High-Quality Euclidean Embeddings
Arun Qamra and Edward Y. Chang
113 Active Learning of Constraints for Semi-supervised Text Clustering
Ruizhang Huang, Wai Lam and Zhigang Zhang
125 Mining Naturally Smooth Evolution of Clusters from Dynamic Data
Yi Wang, Shi-Xia Liu, Jianhua Feng, and Lizhu Zhou
135 Clustering by weighted cuts in directed graphs
Marina Meila and William Pentney
145 Multi-way Clustering on Relation Graphs
Arindam Banerjee, Sugato Basu and Srujana Merugu
157 Fast Multilevel Transduction on Graphs
Fei Wang and Changshui Zhang
169 Conical Dimension as an Intrinsic Dimension Estimator and its Applications
Xin Yang, Sebastien Michea and Hongyuan Zha
180 Nonlinear Dimensionality Reduction using Approximate Nearest Neighbors
Erion Plaku and Lydia Kavraki
192 On Point Sampling Versus Space Sampling for Dimensionality Reduction
Charu Aggarwal
204 An Analysis of Logistic Models: Exponential Family Connections and Online Performance
Arindam Banerjee
216 Bandits for Taxonomies: A Model-based Approach
Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti and Vanja Josifovski
228 Boosting Optimal Logical Patterns Using Noisy Data
Noam Goldberg and Chung-chieh Shan
237 Constraint-Based Pattern Set Mining
Luc De Raedt and Albrecht Zimmermann
249 Finding Motifs in a Database of Shapes
Xiaopeng Xi, Eamonn Keogh, Li Wei and Agenor Mafra-Neto
261 Incremental Spectral Clustering With Application to Monitoring of Evolving Blog Communities
Huazhong Ning, Wei Xu, Yun Chi, Yihong Gong and Thomas Huang
273 ROAM: Rule- and Motif-Based Anomaly Detection in Massive Moving Object Data Sets
Xiaolei Li, Jiawei Han, Sangkyum Kim and Hector Gonzalez
285 Segmentations with Rearrangements
Aristides Gionis and Evimaria Terzi
297 Efficient Multiclass Boosting Classification with Active Learning
Jian Huang, Seyda Ertekin, Yang Song, Hongyuan Zha and C. Lee Giles
309 Kernel Based Detection of Mislabeled Training Examples
Hamed Valizadegan and Pang-Ning Tan
320 On Sample Selection Bias and Its Efficient Correction via Model Averaging and Unlabeled Examples
Wei Fan and Ian Davidson
332 Probabilistic Joint Feature Selection for Multi-task Learning
Tao Xiong, Jinbo Bi, Bharat Rao and Vladimir Cherkassky
343 Fast Newton-type Methods for the Least Squares Nonnegative Matrix Approximation Problem
Dongmin Kim, Suvrit Sra and Inderjit S. Dhillon
355 Higher Order Orthogonal Iteration of Tensors (HOOI) and its Relation to PCA and GLRAM
Benard N. Sheehan and Yousef Saad
366 Less is More: Compact Matrix Decomposition for Large Sparse Graphs
Jimeng Sun, Yinglian Xie, Hui Zhang and Christos Faloutsos
378 Harmonium Models for Semantic Video Representation and Classification
Jun Yang, Yan Liu, Eric P. Xing and Alexander G. Hauptmann
390 Identifying Bundles of Product Options using Mutual Information Clustering
Claudia Perlich and Saharon Rosset
398 Lattice based Clustering of Temporal Gene-Expression Matrices
Yang Huang and Martin Farach-Colton
Short Papers
413 Robust, Complete, and Efficient Correlation Clustering
Elke Achtert, Christian Böhm, Hans-Peter Kriegel, Peer Kröger and Arthur Zimek
419 On Anonymization of String Data
Charu Aggarwal and Philip Yu
425 Load Shedding in Classifying Multi-Source Streaming Data: A Bayes Risk Approach
Yijian Bai, Haixun Wang and Carlo Zaniolo
431 Topic Models over Text Streams: A Study of Batch and Online Unsupervised Learning
Arindam Banerjee and Sugato Basu
437 Are approximation algorithms for consensus clustering worthwhile?
Michael Bertolacci and Anthony Wirth
443 Learning from Time-Changing Data with Adaptive Windowing
Albert Bifet and Ricard Gavaldà
449 WAT: Finding Top-K Discords in Time Series Database
Yingyi Bu, Tat-Wing Leung, Ada Wai-Chee Fu, Eamonn Keogh, Jian Pei and Sam Meshkin
455 A PAC Bound for Approximate Support Vector Machines
Dongwei Cao and Daniel Boley
461 Localized Support Vector Machine and Its Efficient Algorithm
Haibin Cheng, Pang-Ning Tan and Rong Jin
467 Understanding and Utilizing the Hierarchy of Abnormal BGP Events
Dejing Dou, Jun Li, Han Qin, Shiwoong Kim and Sheng Zhong
473 Distributed Top-K Outlier Detection from Astronomy Catalogs using the DEMAC System
Haimonti Dutta, Chris Giannella, Kirk Borne and Hillol Kargupta
479 Mining Visual and Textual Data for Constructing a Multi-Modal Thesaurus
Hichem Frigui and Joshua Caudill
485 HP2PC: Scalable Hierarchically-Distributed Peer-to-Peer Clustering
Khaled Hammouda and Mohamed Kamel
491 Bursty Feature Representation for Clustering Text Streams
Qi He, Kuiyu Chang, Ee-Peng Lim and Jun Zhang
497 Flexible Anonymization For Privacy Preserving Data Publishing: A Systematic Search Based Approach
Bijit Hore, Ravi Chandra Jammalamadaka and Sharad Mehrotra
503 A System for Keyword Search on Textual Streams
Vagelis Hristidis, Oscar Valdivia, Michail Vlachos and Philip S. Yu
509 Co-Preserving Patterns in Bipartite Partitioning for Topic Identification
Tianming Hu, Hui Xiong and Sam Yuan Sung
515 Change-Point Detection using Krylov Subspace Learning
Tsuyoshi Ide and Koji Tsuda
521 Approximating Representations for Large Numerical Databases
Szymon Jaroszewicz and Marcin Korzen
527 Distance Preserving Dimension Reduction for Manifold Learning
Hyunsoo Kim, Haesun Park and Hongyuan Zha
533 Stacked Graphical Models for Efficient Inference in Markov Random Fields
Zhenzhen Kou and William W. Cohen
539 Summarizing Review Scores of "Unequal" Reviewers
Hady W. Lauw, Ee-Peng Lim and Ke Wang
545 A Better Alternative to Piecewise Linear Time Series Segmentation
Daniel Lemire
551 Patterns of Cascading Behavior in Large Blog Graphs
Jure Leskovec, Mary McGlohon, Christos Faloutsos, Natalie Glance and Matthew Hurst
557 PoClustering: Lossless Clustering of Dissimilarity Data
Jinze Liu, Qi Zhang, Wei Wang, Leonard McMillan and Jan Prins
563 An incremental data-stream sketch using sparse random projections
Aditya Krishna Menon, Gia Vinh Anh Pham, Sanjay Chawla and Anastasios Viglas
569 Performance of Recommendation Systems in Dynamic Streaming Environments
Olfa Nasraoui, Jeff Cerwinske, Carlos Rojas and Fabio Gonzalez
575 Scalable Name Disambiguation using Multi-level Graph Partition
Byung-Won On and Dongwon Lee
581 Dynamic Algorithm for Graph Clustering Using Minimum Cut Tree
Barna Saha and Pabitra Mitra
587 Rank Aggregation for Similar Items
D. Sculley
593 Sketching Landscapes of Page Farms
Bin Zhou and Jian Pei
599 Estimating False Negatives for Classification Problems with Cluster Structure
Gyorgy J. Simon, Vipin Kumar, and Zhi-Li Zhang
605 Discriminating Subsequence Discovery for Sequence Clustering
Jianyong Wang, Yuzhou Zhang, Lizhu Zhou, George Karypis and Charu Aggarwal
611 Fast Best-Match Shape Searching in Rotation Invariant Metric Spaces
Dragomir Yankov, Eamonn Keogh, Li Wei, Xiaopeng Xi and Wendy Hodges
617 HACS: Heuristic Algorithm for Clustering Subsets
Ding Yuan and Nick Street
623 On Demand Phenotype Ranking through Subspace Clustering
Xiang Zhang, Wei Wang and Jun Huan
629 Semi-Supervised Dimensionality Reduction
Daoqiang Zhang, Zhi-Hua Zhou and Songcan Chen
635 Computing Statistical Profiles of Active Sites in Proteins
Chang Zhao, Jalal Mahmud, I.V. Ramakrishnan and Subramanyam Swaminathan
641 Semi-supervised Feature Selection via Spectral Analysis
Zheng Zhao and Huan Liu
