S. Muthukrishnan
Rutgers University and AT&T Shannon Labs
Say you go fishing. How do you quickly estimate the number of fish types that
are only a few (or the most abundant) in your catch? How do you quickly check
if your catch is similar to your colleague's? These problems are interesting
when fish types are numerous, you can not remember all the fish types in your
catch, and you do not have the time to sort through them.
Fishing puzzles above are examples of problems that arise when processing data
streams, motivated by applications in managing massive data, databases and networking.
The Theoretical Computer Science community has recently produced models, algorithms
and complexity results to reason about processing streaming data, be it for
estimating statistical parameters, clustering or accurate summarization of the
streaming signals. In this tutorial, I will present an overview of this emerging
theory, and the applications that drive some of the questions.
Problems become hard when one throws some of the catch back into the pond!