Proceedings of the Ninth SIAM International Conference on Data Mining

Each link below leads to a PDF of the paper as it was submitted. Papers are listed in program order. PDF file names consist of the proceedings code (DM plus the year, 09), followed by the order of appearance (e.g. 001) and the first author's last name and first initial.
Preface, Message from the Conference Co-Chair, Acknowledgments
Sessions:
S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11
2 GAD: General Activity Detection for Fast Clustering on Large Data
Xin Jin, Sangkyum Kim, Jiawei Han, Liangliang Cao, and Zhijun Yin
14 CORE: Nonparametric Clustering of Large Numeric Databases
Andrej Taliun, Michael H. Böhlen, and Arturas Mazeika
26 Constraint-Based Subspace Clustering
Elisa Fromont, Adriana Prado, and Céline Robardet
38 Integrated KL (K-means – Laplacian) Clustering: A New Clustering Approach by Combining Attribute Data and Pairwise Relations
Fei Wang, Chris Ding, and Tao Li
49 Hybrid Clustering of Text Mining and Bibliometrics Applied to Journal Sets
Xinhai Liu, Shi Yu, Yves Moreau, Bart De Moor, Wolfgang Glänzel, and Frizo Janssens
61 Event Discovery in Time Series
Dan Preston, Pavlos Protopapas, and Carla Brodley
73 FuncICA for Time Series Pattern Discovery
Nishant Mehta and Alexander Gray
85 Autocannibalistic and Anyspace Indexing Algorithms with Application to Sensor Data Mining
Lexiang Ye, Xiaoyue Wang, Eamonn Keogh, and Agenor Mafra-Neto
97 Proximity-Based Anomaly Detection Using Sparse Structure Learning
Tsuyoshi Idé, Aurelie C. Lozano, Naoki Abe, and Yan Liu
109 Optimal Distance Bounds on Time-Series Data
Michail Vlachos, Suleyman S. Kozat, and Philip S. Yu
Session S3: Statistical Methods and Applications
121 Application of Bayesian Partition Models in Warranty Data Analysis
Markus Mueller, Christoph Schlieder, and Axel Blumenstock
133 Learning Random-Walk Kernels for Protein Remote Homology Identification and Motif Discovery
Renqiang Min, Rui Kuang, Anthony Bonner, and Zhaolei Zhang
145 Outlier Detection with Globally Optimal Exemplar-Based GMM
Xingwei Yang, Longin Jan Latecki, and Dragoljub Pokrajac
155 Prior-Free Rare Category Detection
Jingrui He and Jaime Carbonell
164 A Family of Large Margin Linear Classifiers and Its Application in Dynamic Environments
Jianqiang Shen and Thomas G. Dietterich
Session S4: Unsupervised Learning and Clustering
173 DensEst: Density Estimation for Data Mining in High Dimensional Spaces
Emmanuel Müller, Ira Assent, Ralph Krieger, Stephan Günnemann, and Thomas Seidl
185 A Framework for Exploring Categorical Data
Varun Chandola, Shyam Boriah, and Vipin Kumar
197 Discovering Substantial Distinctions among Incremental Bi-Clusters
Faris Alqadah and Raj Bhatnagar
209 Bayesian Cluster Ensembles
Hongjun Wang, Hanhuai Shan, and Arindam Banerjee
221 Agglomerative Mean-Shift Clustering via Query Set Compression
Xiao-Tong Yuan, Bao-Gang Hu, and Ran He
Session S5: Data Stream Mining
233 Adaptive Concept Drift Detection
Anton Dries and Ulrich Rückert
245 Scalable Distributed Change Detection from Astronomy Data Streams Using Local, Asynchronous Eigen Monitoring Algorithms
Kamalika Das, Kanishka Bhaduri, Sugandha Arora, Wesley Griffin, Kirk Borne, Chris Giannella, and Hillol Kargupta
257 Positive Unlabeled Learning for Data Stream Classification
Xiao-Li Li, Philip S. Yu, Bing Liu, and See-Kiong Ng
269 Time-Decayed Correlated Aggregates over Data Streams
Graham Cormode, Srikanta Tirthapura, and Bojian Xu
281 Multi-Modal Hierarchical Dirichlet Process Model for Predicting Image Annotation and Image-Object Label Correspondence
Oksana Yakhnenko and Vasant Honavar
295 A Bayesian Approach to Graph Regression with Relevant Subgraph Selection
Silvia Chiappa, Hiroto Saigo, and Koji Tsuda
305 A Hybrid Data Mining Metaheuristic for the p-Median Problem
Alexandre Plastino, Erick R. Fonseca, Richard Fuchshuber, Simone de L. Martins, Alex A. Freitas, Martino Luis, and Said Salhi
317 A New Constraint for Mining Sets in Sequences
Boris Cule, Bart Goethals, and Céline Robardet
329 A Re-evaluation of the Over-Searching Phenomenon in Inductive Rule Learning
Frederik Janssen and Johannes Fürnkranz
341 A Semi-Supervised Framework for Feature Mapping and Multiclass Classification
Bo Chen, Wai Lam, Ivor Tsang, and Tak-Lam Wong
353 Aligned Graph Classification with Regularized Logistic Regression
Brian Quanz and Jun Huan
365 An Entity Based Model for Coreference Resolution
Michael Wick, Aron Culotta, Khashayar Rohanimanesh, and Andrew McCallum
377 Analyses for Service Interaction Networks with Applications to Service Delivery
S. Kameshwaran, Sameep Mehta, Vinayaka Pandit, Gyana Parija, Sudhanshu Singh, and N. Viswanadham
389 Change-Point Detection in Time-Series Data by Direct Density-Ratio Estimation
Yoshinobu Kawahara and Masashi Sugiyama
401 Context Aware Trace Clustering: Towards Improving Process Mining Results
R. P. Jagadeesh Chandra Bose and Wil M. P. van der Aalst
413 Detection and Characterization of Anomalies in Multivariate Time Series
Haibin Cheng, Pang-Ning Tan, Christopher Potter, and Steven Klooster
425 Discovery of Geospatial Discriminating Patterns from Remote Sensing Datasets
Wei Ding, Tomasz Stepinski, and Josue Salazar
437 Diversity-Based Weighting Schemes for Clustering Ensembles
Francesco Gullo, Andrea Tagarelli, and Sergio Greco
449 Divide and Conquer Strategies for Effective Information Retrieval
Jie Chen and Yousef Saad
461 Speeding Up Secure Computations via Embedded Caching
K. Zhai, W. K. Ng, A. R. Herianto, and S. Han
473 Exact Discovery of Time Series Motifs
Abdullah Mueen, Eamonn Keogh, Qiang Zhu, Sydney Cash, and Brandon Westover
485 Exploiting Semantic Constraints for Estimating Supersenses with CRFs
Gerhard Paaß and Frank Reichartz
497 Feature Weighted SVMs Using Receiver Operating Characteristics
Shaoyi Zhang, M. Maruf Hossain, Md. Rafiul Hassan, James Bailey, and Kotagiri Ramamohanarao
509 FEDRA: A Fast and Efficient Dimensionality Reduction Algorithm
Panagis Magdalinos, Christos Doulkeridis, and Michalis Vazirgiannis
521 Finding Representative Association Rules from Large Rule Collections
Warren L. Davis IV, Peter Schwarz, and Evimaria Terzi
533 FutureRank: Ranking Scientific Articles by Predicting their Future PageRank
Hassan Sayyadi and Lise Getoor
545 Highlighting Diverse Concepts in Documents
Kun Liu, Evimaria Terzi, and Tyrone Grandison
557 Identifying Information-Rich Subspace Trends in High-Dimensional Data
Snehal Pokharkar and Chandan K. Reddy
569 Low-Entropy Set Selection
Hannes Heikinheimo, Jilles Vreeken, Arno Siebes, and Heikki Mannila
581 Measuring Discrimination in Socially-Sensitive Decision Records
Dino Pedreschi, Salvatore Ruggieri, and Franco Turini
593 Mining Cohesive Patterns from Graphs with Feature Vectors
Flavia Moser, Recep Colak, Arash Rafiey, and Martin Ester
605 Mining Complex Spatio-Temporal Sequence Patterns
Florian Verhein
617 Mining for Surprise Events Within Text Streams
Paul Whitney, Dave Engel, and Nick Cramer
628 Multi-field Correlated Topic Modeling
Konstantin Salomatin, Yiming Yang, and Abhimanyu Lad
638 Multiple Kernel Clustering
Bin Zhao, James T. Kwok, and Changshui Zhang
650 MUSK: Uniform Sampling of k Maximal Patterns
Mohammad Al Hasan and Mohammed Zaki
662 Noise Robust Classification Based on Spread Spectrum
Joern David
673 Non-negative Matrix Factorization, Convexity and Isometry
Nikolaos Vasiloglou, Alexander G. Gray, and David V. Anderson
685 Non-parametric Information-Theoretic Measures of One-Dimensional Distribution Functions from Continuous Time Series
Paolo D’Alberto and Ali Dasdan
697 On Maximum Coverage in the Streaming Model & Application to Multi-topic Blog-Watch
Barna Saha and Lise Getoor
709 On Randomness Measures for Social Networks
Xiaowei Ying and Xintao Wu
721 On Segment-Based Stream Modeling and Its Applications
Charu C. Aggarwal
733 On the Comparison of Relative Clustering Validity Criteria
Lucas Vendramin, Ricardo J. G. B. Campello, and Eduardo R. Hruschka
745 Parallel Pairwise Clustering
Elad Yom-Tov and Noam Slonim
756 PICC Counting: Who Needs Joins When You Can Propagate Efficiently?
Jong Wook Kim and K. Selçuk Candan
768 Providing Privacy through Plausibly Deniable Search
Mummoorthy Murugesan and Chris Clifton
780 Randomization Techniques for Graphs
Sami Hanhijärvi, Gemma C. Garriga, and Kai Puolamäki
792 Semi-supervised Learning by Sparse Representation
Shuicheng Yan and Huan Wang
802 ShatterPlots: Fast Tools for Mining Large Graphs
Ana Paula Appel, Deepayan Chakrabarti, Christos Faloutsos, Ravi Kumar, Jure Leskovec, and Andrew Tomkins
814 Spatially Cost-Sensitive Active Learning
Alexander Liu, Goo Jun, and Joydeep Ghosh
826 Structure and Dynamics of Research Collaboration in Computer Science
Christian Bird, Earl Barr, and Andre Nash
838 Text Categorization with All Substring Features
Daisuke Okanohara and Jun’ichi Tsujii
847 The Set Classification Problem and Solution Methods
Xia Ning and George Karypis
859 Topic Evolution in a Stream of Documents
André Gohr, Alexander Hinneburg, René Schult, and Myra Spiliopoulou
871 Tracking User Mobility to Detect Suspicious Behavior
Gaurav Tandon and Philip K. Chan
Session S6: Supervised Learning
884 Toward Optimal Ordering of Prediction Tasks
Abhimanyu Lad, Yiming Yang, Rayid Ghani, and Bryan Kisiel
894 Hierarchical Linear Discriminant Analysis for Beamforming
Jaegul Choo, Barry L. Drake, and Haesun Park
906 Twin Vector Machines for Online Learning on a Budget
Zhuang Wang and Slobodan Vucetic
918 The Metric Dilemma: Competence-Conscious Associative Classification
Adriano Veloso, Mohammed Zaki, Wagner Meira Jr., and Marcos Gonçalves
Session S7: Privacy and Social Networks
930 AMORI: A Metric-Based One Rule Inducer
Niklas Lavesson and Paul Davidsson
942 Identifying Unsafe Routes for Network-Based Trajectory Privacy
Aris Gkoulalas-Divanis, Vassilios S. Verykios, and Mohamed F. Mokbel
954 Privacy Preservation in Social Networks with Sensitive Edge Weights
Lian Liu, Jie Wang, Jinze Liu, and Jun Zhang
966 Graph Generation with Prescribed Feature Constraints
Xiaowei Ying and Xintao Wu
978 Detecting Communities in Social Networks Using Max-Min Modularity
Jiyang Chen, Osmar R. Zaïane, and Randy Goebel
990 A Bayesian Approach Toward Finding Communities and Their Evolutions in Dynamic Social Networks
Tianbao Yang, Yun Chi, Shenghuo Zhu, Yihong Gong, and Rong Jin
Session S8: Relational Mining and High Performance Learning
1002 Efficient Discovery of Interesting Patterns Based on Strong Closedness
Mario Boley, Tamás Horváth, and Stefan Wrobel
1014 Efficient Computation of Partial-Support for Mining Interesting Itemsets
Ardian Kristanto Poernomo and Vivekanand Gopalkrishnan
1026 Grammar Mining
Siegfried Nijssen and Luc De Raedt
1038 Top-k Correlative Graph Mining
Yiping Ke, James Cheng, and Jeffrey Xu Yu
1050 High Performance Parallel/Distributed Biclustering Using Barycenter Heuristic
Arifa Nisar, Waseem Ahmad, Wei-keng Liao, and Alok Choudhary
Session S9: Mining Graphs and Semi Structured Data
1063 MultiVis: Content-Based Social Network Exploration through Multi-way Visual Analysis
Jimeng Sun, Spiros Papadimitriou, Ching-Yung Lin, Nan Cao, Shixia Liu, and Weihong Qian
1075 Near-optimal Supervised Feature Selection among Frequent Subgraphs
Marisa Thoma, Hong Cheng, Arthur Gretton, Jiawei Han, Hans-Peter Kriegel, Alex Smola, Le Song, Philip S. Yu, Xifeng Yan, and Karsten Borgwardt
1087 Polynomial-Delay and Polynomial-Space Algorithms for Mining Closed Sequences, Graphs, and Pictures in Accessible Set Systems
Hiroki Arimura and Takeaki Uno
1099 Link Propagation: A Fast Semi-supervised Learning Algorithm for Link Prediction
Hisashi Kashima, Tsuyoshi Kato, Yoshihiro Yamanishi, Masashi Sugiyama, and Koji Tsuda
1111 Understanding Importance of Collaborations in Co-authorship Networks: A Supportiveness Analysis Approach
Yi Han, Bin Zhou, Jian Pei, and Yan Jia
Session S10: Text Mining and Data Reduction
1123 Topic Cube: Topic Modeling for OLAP on Multidimensional Text Databases
Duo Zhang, Chengxiang Zhai, and Jiawei Han
1135 Local Relevance Weighted Maximum Margin Criterion for Text Classification
Quanquan Gu and Jie Zhou
1147 Multi-topic Based Query-Oriented Summarization
Jie Tang, Limin Yao, and Dewei Chen
1159 Straightforward Feature Selection for Scalable Latent Semantic Indexing
Jun Yan, Shuicheng Yan, Ning Liu, and Zheng Chen
1171 Parallel Large Scale Feature Selection for Logistic Regression
Sameer Singh, Jeremy Kubica, Scott Larsen, and Daria Sorokina
Session S11: Mining Spatio-Temporal Data and Efficient Learning
1183 Travel-Time Prediction Using Gaussian Process Regression: A Trajectory-Based Approach
Tsuyoshi Idé and Sei Kato
1195 Discretized Spatio-Temporal Scan Window
Seyed H. Mohammadi, Vandana P. Janeja, and Aryya Gangopadhyay
1207 Finding Links and Initiators: A Graph-Reconstruction Problem
Heikki Mannila and Evimaria Terzi
1218 Efficient Multiplicative Updates for Support Vector Machines
Vamsi K. Potluru, Sergey M. Plis, Morten Mørup, Vincent D. Calhoun, and Terran Lane
1230 Efficient Active Learning with Boosting
Zheng Wang, Yangqiu Song, and Changshui Zha
