Program
Wednesday, April 25, 2007
5:00PM – 7:00PM Registration Opens
Thursday, April 26, 2007
7:00AM – 7:30PM Registration
7:00AM – 5:30PM Internet Café
7:30AM – 8:00AM Continental Breakfast
8:00AM – 8:15AM Welcome Remarks
8:15AM – 9:30AM Invited Keynote
Machine Learning and Analyzing Human Brain Activity
Tom M. Mitchell, Carnegie Mellon University
Session Chair: Chid Apte
9:30AM – 10:00AM Coffee Break
10:00AM - 12:00PM Three parallel sessions S1, S2, S3
-----------------------------------------------------------------
S1. Classification (chair: Jaideep Srivastava)
Title: A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions
Authors: Jing Gao, Wei Fan, Jiawei Han and Philip S. Yu
Title: Fast Counting with AV-Space for Efficient Rule Induction
Authors: Linyan Wang and Aijun An
Title: Maximizing the Area under the ROC Curve with Decision Lists and Rule Sets
Authors: Henrik Bostrom
Title: Maximum Margin Classifiers with Specified False Positive and False Negative Error Rates
Authors: J. Saketha Nath and C. Bhattacharyya
-----------------------------------------------------------------
S2. Theoretical Foundations (chair: Michael Berry)
Title: An Analysis of Logistic Models: Exponential Family Connections and Online Performance
Authors: Arindam Banerjee
Title: Bandits for Taxonomies: A Model-based Approach
Authors: Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti and Vanja Josifovski
Title: Boosting Optimal Logical Patterns Using Noisy Data
Authors: Noam Goldberg and Chung-chieh Shan
Title: Constraint-Based Pattern Set Mining
Authors: Luc De Raedt and Albrecht Zimmermann
-----------------------------------------------------------------
S3. Clustering
Title: Adaptive Concept Learning through Clustering and Aggregation of Relational Data
Authors: Hichem Frigui and Cheul Hwang
Title: RCMap: Efficiently Creating High-Quality Euclidean Embeddings
Authors: Arun Qamra and Edward Chang
Title: Active Learning of Constraints for Semi-supervised Text Clustering
Authors: Ruizhang Huang, Wai Lam and Zhigang Zhang
Title: Mining Naturally Smooth Evolution of Clusters from Dynamic Data
Authors: Yi Wang, Shi-Xia Liu, Jianhua Feng, and Lizhu Zhou
-----------------------------------------------------------------
12:00PM - 1:30PM Lunch Break on your own
1:30PM - 2:45PM Invited Keynote
Predictive Learning via Rule Ensembles
Jerome H. Friedman, Stanford University
Session Chair: Vipin Kumar
2:45PM - 3:15PM Coffee Break
3:15PM - 4:45PM Two parallel sessions (S4 and S5) and Invited Session (IS)
-----------------------------------------------------------------
S4: Graphs (chair: Wei Wang)
Title: Clustering by weighted cuts in directed graphs
Authors: Marina Meila and William Pentney
Title: Multi-way Clustering on Relation Graphs
Authors: Arindam Banerjee, Sugato Basu and Srujana Merugu
Title: Fast Multilevel Transduction on Graphs
Authors: Fei Wang and Changshui Zhang
-----------------------------------------------------------------
S5: Applications (chair: Hui Yang)
Title: Harmonium-Based Models for Semantic Video Representation and Classification
Authors: Jun Yang, Yan Liu, Eric Xing and Alexander Hauptmann
Title: Identifying Bundles of Product Options using Mutual Information Clustering
Authors: Claudia Perlich and Saharon Rosset
Title: Lattice based Clustering of Temporal Gene-Expression Matrices
Authors: Yang Huang and Martin Farach-Colton
-----------------------------------------------------------------
IS: Invited Session on Statistical Learning (chair: Joe Verducci)
Title: A Large Margin Method for Semi-supervised Learning
Authors: Xiaotong Shen, Junhui Wang and Wei Pan
Title: Improved Centroids Estimation for the Nearest Shrunken Centroid Classifier
Authors: Sijian Wang and Ji Zhu*
Title: Classification with Reject Option
Authors: Radu Herbei
-----------------------------------------------------------------
4:45PM – 5:00PM Organizational Break
5:00PM – 6:20PM Poster Spotlights (Plenary) Chair: Dan Boley
- Robust, Complete, and Efficient Correlation Clustering - Elke Achtert, Christian Böhm, Hans-Peter Kriegel, Peer Kröger and Arthur Zimek
- On Anonymization of String Data - Charu Aggarwal and Philip Yu
- Load Shedding in Classifying Multi-Source Streaming Data: A Bayes Risk Approach - Yijian Bai, Haixun Wang and Carlo Zaniolo
- Topic Models over Text Streams: A Study of Batch and Online Unsupervised Learning - Arindam Banerjee and Sugato Basu
- Are Approximation Algorithms for Consensus Clustering Worthwhile? - Michael Bertolacci and Anthony Wirth
- Learning from Time-Changing Data with Adaptive Windowing - Albert Bifet and Ricard Gavaldà
- WAT: Finding Top-K Discords in Time Series Database - Yingyi Bu, Tat-Wing Leung, Ada Wai-Chee Fu, Eamonn Keogh, Jian Pei and Sam Meshkin
- A PAC Bound for Approximate Support Vector Machines - Dongwei Cao and Daniel Boley
- Localized Support Vector Machine and Its Efficient Algorithm - Haibin Cheng, Pang-Ning Tan and Rong Jin
- Understanding and Utilizing the Hierarchy of Abnormal BGP Events - Dejing Dou, Jun Li, Han Qin, Shiwoong Kim and Sheng Zhong
- Distributed Top-K Outlier Detection from Astronomy Catalogs using the DEMAC System - Haimonti Dutta, Chris Giannella, Kirk Borne and Hillol Kargupta
- Mining Visual and Textual Data for Constructing a Multi-Modal Thesaurus - Hichem Frigui and Joshua Caudill
- HP2PC: Scalable Hierarchically-Distributed Peer-to-Peer Clustering - Khaled Hammouda and Mohamed Kamel
- Bursty Feature Representation for Clustering Text Streams - Qi He, Kuiyu Chang, Ee-Peng Lim and Jun Zhang
- Flexible Anonymization For Privacy Preserving Data Publishing: A Systematic Search Based Approach - Bijit Hore, Ravi Chandra Jammalamadaka and Sharad Mehrotra
- A System for Keyword Search on Textual Streams - Vagelis Hristidis, Oscar Valdivia, Michail Vlachos and Philip S. Yu
- Co-Preserving Patterns in Bipartite Partitioning for Topic Identification - Tianming Hu, Hui Xiong and Sam Yuan Sung
- Change-Point Detection using Krylov Subspace Learning - Tsuyoshi Ide and Koji Tsuda
- Approximating Representations for Large Numerical Databases - Szymon Jaroszewicz and Marcin Korzen
- Distance Preserving Dimension Reduction for Manifold Learning - Hyunsoo Kim, Haesun Park and Hongyuan Zha
- Stacked Graphical Models for Efficient Inference in Markov Random Fields - Zhenzhen Kou and William W. Cohen
- Summarizing Review Scores of "Unequal" Reviewers - Hady W. Lauw, Ee-Peng Lim and Ke Wang
- A Better Alternative to Piecewise Linear Time Series Segmentation - Daniel Lemire
- Patterns of Cascading Behavior in Large Blog Graphs - Jure Leskovec, Mary McGlohon, Christos Faloutsos, Natalie Glance and Matthew Hurst
- PoClustering: Lossless Clustering of Dissimilarity Data - Jinze Liu, Qi Zhang, Wei Wang, Leonard McMillan and Jan Prins
- An incremental data-stream sketch using sparse random projections - Aditya Krishna Menon, Gia Vinh Anh Pham, Sanjay Chawla and Anastasios Viglas
- Performance of Recommendation Systems in Dynamic Streaming Environments - Olfa Nasraoui, Jeff Cerwinske, Carlos Rojas and Fabio Gonzalez
- Scalable Name Disambiguation using Multi-level Graph Partition - Byung-Won On and Dongwon Lee
- Dynamic Algorithm for Graph Clustering Using Minimum Cut Tree - Barna Saha and Pabitra Mitra
- Rank Aggregation for Similar Items - D. Sculley
- Sketching Landscapes of Page Farms - Bin Zhou and Jian Pei
- Estimating False Negatives for Classification Problems with Cluster Structure - Gyorgy J. Simon, Vipin Kumar, and Zhi-Li Zhang
- Discriminating Subsequence Discovery for Sequence Clustering - Jianyong Wang, Yuzhou Zhang, Lizhu Zhou, George Karypis and Charu Aggarwal
- Fast Best-Match Shape Searching in Rotation Invariant Metric Spaces - Dragomir Yankov, Eamonn Keogh, Li Wei, Xiaopeng Xi and Wendy Hodges
- HACS: Heuristic Algorithm for Clustering Subsets - Ding Yuan and Nick Street
- On Demand Phenotype Ranking through Subspace Clustering - Xiang Zhang, Wei Wang and Jun Huan
- Semi-Supervised Dimensionality Reduction - Daoqiang Zhang, Zhi-Hua Zhou and Songcan Chen
- Computing Statistical Profiles of Active Sites in Proteins - Chang Zhao, Jalal Mahmud, I.V. Ramakrishnan and Subramanyam Swaminathan
- Semi-supervised Feature Selection via Spectral Analysis - Zheng Zhao and Huan Liu
6:30PM – 8:30PM Welcome Reception and Poster Session
Friday, April 27, 2007
7:00AM – 4:00PM Registration
7:00AM – 5:30PM Internet Café
7:30AM – 8:00AM Continental Breakfast
8:00AM – 8:15AM Announcements
8:15AM – 9:30AM Invited Keynote
Deep Computing in Biology: Challenges and Progress
Dr. Ajay Royyuru, IBM Research
Session Chair: David Skillicorn
9:30AM – 10:00AM Break
10:00AM - 12:00PM Three parallel sessions S6, S7, S8
-----------------------------------------------------------------
S6: Privacy and Security
Title: AC-Framework for Privacy-Preserving Collaboration
Authors: Wei Jiang and Chris Clifton
Title: On Privacy-Preservation of Text and Sparse Binary Data with Sketches
Authors: Charu Aggarwal and Philip Yu
Title: Preventing Information Leaks in Email
Authors: Vitor Carvalho and William Cohen
Title: Towards Attack-Resilient Geometric Data Perturbation
Authors: Keke Chen, Gordon Sun, and Ling Liu
-----------------------------------------------------------------
S7: Spatial and Temporal Mining (chair: Sanjay Chawla)
Title: Finding Motifs in Database of Shapes
Authors: Xiaopeng Xi, Eamonn Keogh, Li Wei and Agenor Mafra-Neto
Title: Incremental Spectral Clustering With Application to Monitoring of Evolving Blog Communities
Authors: Huazhong Ning, Wei Xu, Chi Yun, Yihong Gong and Thomas Huang
Title: ROAM: Rule- and Motif-Based Anomaly Detection in Massive Moving Object Data Sets
Authors: Xiaolei Li, Jiawei Han, Sangkyum Kim and Hector Gonzalez
Title: Segmentations with rearrangements
Authors: Aristides Gionis and Evimaria Terzi
-----------------------------------------------------------------
S8: Learning (chair: Hui Xiong)
Title: Efficient Multiclass Boosting Classification with Active Learning
Authors: Jian Huang, Seyda Ertekin, Yang Song, Hongyuan Zha and C. Lee Giles
Title: Kernel-based Detection of Mislabeled Training Examples
Authors: Hamed Valizadegan and Pang-Ning Tan
Title: On Sample Selection Bias and Its Efficient Correction via Model Averaging and Unlabeled Examples
Authors: Wei Fan and Ian Davidson
Title: Probabilistic Joint Feature Selection for Multi-task Learning
Authors: Tao Xiong, Jinbo Bi, Bharat Rao and Vladimir Cherkassky
-----------------------------------------------------------------
12:00PM - 1:30PM Lunch Break on your own
1:30PM - 2:45PM Invited Keynote
The Next Algorithmic and Theoretical Challenges for Search Engines
Corinna Cortes, Google Research
Session Chair: Srinivasan Parthasarathy
2:45PM - 3:15PM Coffee Break
2:55PM-4:55PM Invited CRM Tutorial
Data Analytics for Marketing Decision Support
Presenters: Saharon Rosset (IBM) and Naoki Abe (IBM)
3:15PM - 4:45PM Two parallel sessions (S9 and S10)
-----------------------------------------------------------------
S9: Matrices and Tensors (chair: Arindam Banerjee)
Title: Fast Newton-type Methods for the Least Squares Nonnegative Matrix Approximation Problem
Authors: Dongmin Kim, Suvrit Sra and Inderjit Dhillon
Title: Higher Order Orthogonal Iteration of Tensors (HOOI) and its Relation to PCA and GLRAM
Authors: Benard Sheehan and Yousef Saad
Title: Less is More: Compact Matrix Decomposition for Large Sparse Graphs
Authors: Jimeng Sun, Yinglian Xie, Hui Zhang and Christos Faloutsos
-----------------------------------------------------------------
S10: Dimensionality (chair: Pang-Ning Tan)
Title: Conical Dimension as an Intrinsic Dimension Estimator and its Applications
Authors: Xin Yang, Sebastien Michea and Hongyuan Zha
Title: Nonlinear Dimensionality Reduction using Approximate Nearest Neighbors
Authors: Erion Plaku and Lydia Kavraki
Title: On Point Sampling Versus Space Sampling for Dimensionality Reduction
Authors: Charu Aggarwal
-----------------------------------------------------------------
4:45PM - 5:00PM Organizational Break
5:00PM - 6:15PM Panel
Data Mining Research: Current Status and Future Opportunities
Moderator: Haym Hirsh, NSF
Panelists: Ajay Royyuru - IBM Research
Jerry Friedman - Stanford University
Christos Faloutsos - CMU
Mehran Sahami - Google
6:30PM - Special reception and poster session sponsored by the Digital Technology Center (DTC) at the University of Minnesota to showcase data mining research at the University. This is not a SIAM event, but is open to all attendees of the conference.
Saturday, April 28, 2007
7:30AM – 4:00PM Registration
7:30AM – 4:00PM Internet Café
8:00AM-4:30PM Workshop on Text Mining
8:30AM-5:15PM Workshop on Biomedical Informatics
8:45AM-12:00PM Tutorial II
Mining Large Time-evolving Data Using Matrix and Tensor Tools
Presenters: Christos Faloutsos (CMU), Tamara G. Kolda (Sandia National Labs), and Jimeng Sun (CMU)
8:45AM -12:00PM Tutorial III
Dimensionality Reduction for Data Mining
Presenters: Lei Yu (Binghamton U), Jieping Ye (Arizona State U), and Huan Liu (Arizona State U)
10:00AM – 10:45AM Coffee Break
12:00PM – 1:30PM Lunch
1:30PM - 3:30PM Tutorial IV
A Statistical Framework for Mining Data Streams
Presenters: Simon Urbanek (AT&T Labs) and Tamraparni Dasu (AT&T Labs)
3:00PM – 3:45PM Coffee Break
End of Conference