Proceedings of the 2006 SIAM International Conference on Data Mining

Each link below is to a PDF of the paper as it was submitted. Papers are listed in program order. PDF file names consist of the proceedings code (DM and the year, 06), followed by the paper's order in the printed version (e.g. 001) and the first author's last name and first initial.

1 Area Under ROC Optimisation using a Ramp Approximation
Alan Herschtal, Bhavani Raskutti, Peter K. Campbell

12 On the Necessary and Sufficient Conditions of a Meaningful Distance Function for High Dimensional Data Space
Chih-Ming Hsu, Ming-Syan Chen

24 CPM: A Covariance-preserving Projection Method
Jieping Ye, Tao Xiong, Ravi Janardan

35 Transform Regression and the Kolmogorov Superposition Theorem
Edwin Pednault

47 A Latent Dirichlet Model for Unsupervised Entity Resolution
Indrajit Bhattacharya, Lise Getoor

59 Deriving Private Information from Randomly Perturbed Ratings
Sheng Zhang, James Ford, and Fillia Makedon

70 Name Reference Resolution in Organizational Email Archives
Christopher P. Diehl, Lise Getoor, Galileo Namata

82 Automated Knowledge Discovery from Simulators
M.C. Burl, D. DeCoste, B.L. Enke, D. Mazzoni, W.J. Merline, L. Scharenbroich

94 Mining for Outliers in Sequential Databases
Pei Sun, Sanjay Chawla, Bavani Arunasalam

106 Mining Control Flow Abnormality for Logic Error Isolation
Chao Liu, Xifeng Yan, Jiawei Han

118 Scan Detection: A Data Mining Approach
György J. Simon, Hui Xiong, Eric Eilertson, Vipin Kumar

130 Learning Bayesian Networks from Incomplete Data: An Efficient Method for Generating Approximate Predictive Distributions
Carsten Riggelsen

141 Efficient Markov Network Structure Discovery using Independence Tests
Facundo Bromberg, Dimitris Margaritis, Vasant Honavar

153 K-Means Clustering Over a Large, Dynamic Network
Souptik Datta, Chris Giannella, Hillol Kargupta

165 Adapting K-Medians to Generate Normalized Cluster Centers
Benjamin J. Anderson, Deborah S. Gross, David R. Musicant, Anna M. Ritz, Thomas G. Smith, Leah E. Steinberg

176 Advanced Prototype Machines: Exploring Prototypes for Classification
Hans-Peter Kriegel, Matthias Schubert

188 Toward Semantic XML Clustering
Andrea Tagarelli, Sergio Greco

200 A Semantic Approach for Mining Hidden Links from Complementary and Non-interactive Biomedical Literature
Xiaohua Hu, Xiaodan Zhang, Illhoi Yoo, Yanqing Zhang

210 Representation is Everything: Towards Efficient and Adaptable Similarity Measures for Biological Data
Charu C. Aggarwal

222 Mining Frequent Agreement Subtrees in Phylogenetic Databases
Sen Zhang and Jason T. L. Wang

234 Trend Relational Analysis and Grey-Fuzzy Clustering Method
Zhijie Chen, Weizhen Chen, Qile Chen and Mian-Yun Chen

246 Joint Cluster Analysis of Attribute Data and Relationship Data: the Connected k-Center Problem
Martin Ester, Rong Ge, Byron J. Gao, Zengjian Hu, Boaz Ben-Moshe

258 Weighted Clustering Ensembles
Muna Al-Razgan, Carlotta Domeniconi

270 Clustering in the Presence of Bridge-Nodes
Jerry Scripps and Pang-Ning Tan

282 Mining Interesting Patterns from Very High Dimensional Data: A Top-Down Row Enumeration Approach
Hongyan Liu, Jiawei Han, Dong Xin, Zheng Shao

294 Mining Frequent Patterns by Differential Refinement of Clustered Bitmaps
Jianwei Li, Alok Choudhary, Nan Jiang, Wei-keng Liao

306 Discovery of Co-evoluting Spatial Co-located Event Sets
Jin Soung Yoo, Shashi Shekhar, Sangho Kim, Mete Celik

316 Efficient Algorithms for Sequence Segmentation
Evimaria Terzi, Panayiotis Tsaparas

328 Density-Based Clustering over an Evolving Data Stream with Noise
Feng Cao, Martin Ester, Weining Qian, Aoying Zhou

340 A Random Walks Method for Text Classification
Yunpeng Xu, Xing Yi, Changshui Zhang

348 Efficient Mining of Temporally Annotated Sequences
Fosca Giannotti, Mirco Nanni, Dino Pedreschi

360 A Framework for Local Supervised Dimensionality Reduction of High Dimensional Data
Charu C. Aggarwal

372 Segmentation and dimensionality reduction
Ella Bingham, Aristides Gionis, Niina Haiminen, Heli Hiisilä, Heikki Mannila and Evimaria Terzi

384 Probabilistic Multi-State Split-Merge Algorithm for Coupling Parameter Estimates
Juan K. Lin

395 Item Sets that Compress
Arno Siebes, Jilles Vreeken, Matthijs van Leeuwen

407 Mining Approximate Frequent Itemsets In the Presence of Noise: Algorithm and Analysis
Jinze Liu, Susan Paulsen, Xing Sun, Wei Wang, Andrew Nobel, Jan Prins

419 Mining frequent closed itemsets out-of-core
Claudio Lucchese, Salvatore Orlando, Raffaele Perego

430 Local L2-Thresholding Based Data Mining in Peer-to-Peer Systems
Ran Wolff, Kanishka Bhaduri, Hillol Kargupta

442 Collaborative Information Extraction and Mining from Multiple Web Documents
Tak-Lam Wong, Wai Lam, Shing-Kit Chan

453 Collaborative Document Clustering
Khaled Hammouda, Mohamed Kamel

464 Cluster Description Formats, Problems and Algorithms
Byron J. Gao and Martin Ester

469 Positive Borders or Negative Borders: How to Make Lossless Generator Based Representations Concise
Guimei Liu, Jinyan Li, Limsoon Wong, Wynne Hsu

474 Bayesian K-Means as a "Maximization-Expectation" Algorithm
Max Welling, Kenichi Kurihara

479 A Framework for Clustering Massive Text and Categorical Data Streams
Charu C. Aggarwal and Philip S. Yu

484 Cone Cluster Labeling for Support Vector Clustering
Sei-Hyung Lee and Karen M. Daniels

489 Semi-Supervised Clustering with Partial Background Information
Jing Gao, Pang-Ning Tan, Haibin Cheng

494 A New Privacy-Preserving Distributed k-Clustering Algorithm
Geetha Jagannathan, Krishnan Pillaipakkamnatt, Rebecca N. Wright

499 ODAC: Hierarchical Clustering of Time Series Data Streams
Pedro Pereira Rodrigues, João Gama, João Pedro Pedroso

504 Detecting the Change of Clustering Structure in Categorical Data Streams
Keke Chen, Ling Liu

509 Dissimilarity Measures for Detecting Hepatotoxicity in Clinical Trial Data
Matthew Eric Otey, Srinivasan Parthasarathy, Donald C. Trost

514 Transductive De-Noising and Dimensionality Reduction using Total Bregman Regression
Sreangsu Acharyya

519 Robust Estimation for Mixture of Probability Tables based on beta-likelihood
Yu Fujimoto and Noboru Murata

524 Fast optimal bandwidth selection for kernel density estimation
Vikas Chandrakant Raykar, Ramani Duraiswami

529 Risk-Sensitive Learning via Expected Shortfall Minimization
Hisashi Kashima

534 On Approximate Solutions to Support Vector Machines
Dongwei Cao, Daniel Boley

539 Confidence Estimation Methods for Partially Supervised Information Extraction
Eugene Agichtein

544 Inference of Node Replacement Recursive Graph Grammars
Jacek P. Kukluk, Lawrence B. Holder, Diane J. Cook

549 Learning from Incomplete Ratings Using Non-negative Matrix Factorization
Sheng Zhang, Weihong Wang, James Ford, Fillia Makedon

554 Health monitoring of a shaft transmission system via hybrid models of PCR and PLS
Yi Fang, Hyun-Woo Cho, Myong Kee Jeong

559 Modeling Evolutionary Behaviors for Community-based Dynamic Recommendation
Xiaodan Song, Ching-Yung Lin, Belle L. Tseng, Ming-Ting Sun

564 A Systematic Cross-Comparison of Sequence Classifiers
Binyamin Rozenfeld, Ronen Feldman, Moshe Fresko

569 Data-Enhanced Predictive Modeling for Sales Targeting
Saharon Rosset, Richard D. Lawrence

574 Graph-based Methods for Orbit Classification
Abraham Bagherjeiran, Chandrika Kamath

579 Mining and Validating Localized Frequent Itemsets with Dynamic Tolerance
Olfa Nasraoui, Suchandra Goswami 

584 Profiling Protein Families from Partially Aligned Sequences
Saikat Mukherjee, Chang Zhao, I.V. Ramakrishnan

589 Personalized Knowledge Discovery: Mining Novel Association Rules from Text
Xin Chen, Yi-Fang Wu

594 A Novel Framework for Incorporating Labeled Examples into Anomaly Detection
Jing Gao, Haibin Cheng, Pang-Ning Tan

599 Towards the Prediction of Protein Abundance from Tandem Mass Spectrometry Data
Anthony J. Bonner, Han Liu

604 Using Compression to Identify Classes of Inauthentic Texts
Mehmet M. Dalkilic, Wyatt T. Clark, James C. Costello, Predrag Radivojac

609 Fast Mining of Distance-Based Outliers in High Dimensional Datasets
Amol Ghoting, Srinivasan Parthasarathy, and Matthew Eric Otey

614 Spatial Weighted Outlier Detection
Yufeng Kou, Chang-Tien Lu, Dechang Chen

619 Robust Clustering for Tracking Noisy Evolving Data Streams
Olfa Nasraoui, Carlos Rojas

624 WIP: mining Weighted Interesting Patterns with a strong weight and/or support affinity
Unil Yun and John J. Leggett

629 Discovering Frequent Tree Patterns over Data Streams
Mark Cheng-Enn Hsieh, Yi-Hung Wu, Arbee L.P. Chen

634 Finding Sequential Patterns from a Massive Number of Spatio-Temporal Events
Yan Huang, Liqin Zhang, and Pusheng Zhang

639 Mining Minimal Contrast Subgraph Patterns
Roger Ming Hieng Ting, James Bailey 
