SIAM Data Miners Meet in Atlanta for Eighth Annual Conference
September 24, 2008
The Eighth SIAM International Conference on Data Mining (SDM08) was held in Atlanta, Georgia, April 24–26. The conference drew nearly 250 participants from five continents.
The most popular features of SDM have always been the keynote talks, tutorials, and special-topic workshops. The keynote speakers this year were Ronald Coifman of Yale University ("Geometry and Analysis of Digital Data, Emergent Structures, and Knowledge Building"), Andrew Moore of Google ("Algorithms for Understanding the Sky"), Dianne O'Leary of the University of Maryland, College Park ("Mining Multilingual Documents"), and Krishna Rajan of Iowa State University ("Data Mining and Materials Informatics"). Several of the speakers have captured much of the excitement and energy of their fields in short articles for SIAM News (see other articles in this issue, and look for more in upcoming issues).
As in past years, peer-reviewed papers formed the core of the conference. The nearly 300 papers submitted were reviewed by 140 program committee members, with the assistance of 200 external reviewers. Coordination of the reviews, an immense task ably managed by program co-chairs Mohammed Zaki (RPI) and Ke Wang (Simon Fraser), entailed not only arranging for the reviews, but also soliciting author feedback on certain papers to clarify reviewers' questions. This very competitive process whittled the nearly 300 submissions down to 40 full papers, along with 37 posters (presented during an evening reception on the first day of the conference).
Being good data miners, the program co-chairs also did a simple clustering of the submissions. Papers on classification, at 29% of submissions, beat out papers on clustering, which accounted for 22%. Frequent pattern mining, at 11%, and Web mining and link analysis, at 10%, made up the majority of the remaining submissions. Geographically, a large number of submissions were from the U.S., followed by substantial numbers from Europe, Asia–Pacific, and North and South America.
After further scrutiny of the six top-ranked papers, the awards committee gave the Best Paper Award to "Proximity Tracking on Time-Evolving Bipartite Graphs," by Hanghang Tong, Spiros Papadimitriou, Philip Yu, and Christos Faloutsos. A runner-up award went to "Simultaneous Unsupervised Learning of Disparate Clusterings," by Prateek Jain, Raghu Meka, and Inderjit Dhillon, and the committee awarded honorable mention to "Robust Clustering in Arbitrarily Oriented Subspaces," by Elke Achtert, Christian Bohm, Jorn David, Peer Kroger, and Arthur Zimek. Revised and enhanced versions of these papers will appear in the Journal of Statistical Analysis and Data Mining later this year.
Four tutorials were presented at SDM08: Data Mining based Social Network Analysis from Online Behavior, Anomaly Detection, Fast N-body Algorithms for Massive Datasets, and Mining Massive Collections of Shapes and Time Series. Tutorial slides are available at the conference Web site (http://www.siam.org/meetings/sdm08/); papers from all the SDM conferences since 2002 can be found at http://www.siam.org/proceedings/.
A panel discussion on the second day led to a rather animated exchange on ways to increase the participation of statisticians in SDM, given the cultural differences in conferences organized by the two communities. Interestingly, statisticians in the audience volunteered to get their colleagues involved, citing educational benefits: the conference, they said, introduced them to very familiar topics, such as anomaly detection, from a very different viewpoint.
At a special afternoon session organized by universities in the Atlanta area, faculty presented research highlights from their institutions, followed by a poster session that gave students the opportunity to present their work and discuss new ideas with conference participants. Many attendees remained for the third day of the conference to attend various workshops, which covered such topics as biomedical informatics, link analysis, text mining, and privacy-preserving data mining.
The conference was supported by the American Statistical Association, Georgia Institute of Technology and its partner institutions in the greater Atlanta area, Thomson West, BeliefNetworks, and the SAS Institute. IBM Research and the U.S. National Science Foundation deserve special mention for providing funds for student travel.
We hope you will join us in Sparks–Reno–Lake Tahoe, Nevada, for the next SDM conference, which is scheduled for April 30 to May 2, 2009.
Chid Apte is a senior manager in the Data Analytics Center at the IBM T.J. Watson Research Center. Chandrika Kamath is a computer scientist in the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory.