Growing Interest in Information Retrieval Draws 70 to Raleigh Workshop

January 9, 2001

Michael W. Berry

The first Computational Information Retrieval Workshop (CIR'00) was held on Sunday, October 22, 2000, in Raleigh, North Carolina, immediately preceding the Seventh SIAM Conference on Applied Linear Algebra. The nearly 70 attendees represented universities, industry, and government laboratories. Invited and contributed talks focused on the role of linear algebra, computational statistics, and computer science in the development of algorithms and software systems for information (particularly text) retrieval. The workshop was sponsored by the SIAM Activity Group on Linear Algebra and supported by SIAM, the National Science Foundation, Boeing, M-CAM, Inc., and Telcordia Technologies, Inc.

Abstracts for all the talks presented at CIR'00 are available online, and SIAM will publish the proceedings of the workshop in early 2001. Space constraints allow the inclusion of only a few highlights here.

Many of the talks focused on the use of latent semantic indexing/analysis (LSI/A) and alternative vector space models for information retrieval. Approaches based on less computationally demanding techniques, such as concept decompositions, were shown to be quite effective for document clustering (Inderjit Dhillon, University of Texas, Austin). Researchers have made some progress in reducing the cost of the singular value decomposition (SVD) underlying LSI by using symbolic techniques common to sparse matrix factorization research and graph theory (Padma Raghavan, Pennsylvania State University). A different approach to reducing SVD complexity, based on fast bidiagonalization (short Krylov subspaces), was also presented (Axel Ruhe, Chalmers University of Technology, Sweden).

Efforts to demonstrate optimal performance of LSI/A across different text collections have produced inconsistent results (Elizabeth Jessup, University of Colorado, Boulder). Discussions arose at the workshop about a paradigm shift from "noise reduction," associated with low-rank subspace modeling, to "noise addition," with greater emphasis on the statistical significance of additive subspace dimensionality (Chris Ding, NERSC, Lawrence Berkeley National Laboratory, and Kyle Gallivan, Florida State University). Complete IR systems based on LSI/A have been designed at Minnesota (BIRDS project, Haesun Park) and Boeing (TRUST, Jason Wu), and emerging application areas include the automatic detection of warranty repair claims (William Pottenger, Lehigh University).

The workshop was quite successful in outlining future research directions in vector space IR modeling, and participants expressed considerable interest in holding a follow-up workshop next year.

Michael W. Berry, who chaired the organizing committee for the workshop, is an associate professor of computer science at the University of Tennessee.

