Outlier Detection for Temporal Data
Manish Gupta, University of Illinois at Urbana-Champaign, USA
Jing Gao, State University of New York, Buffalo, USA
Charu Aggarwal, IBM T. J. Watson Research Center, USA
Jiawei Han, University of Illinois at Urbana-Champaign, USA
Outlier (or anomaly) detection is a very broad field that has been studied in many research areas, including statistics, data mining, sensor networks, and distributed systems. While there have been many tutorials and surveys on general outlier detection, this tutorial focuses on outlier detection for temporal data. A large number of applications generate temporal datasets. Besides the initial work on time series, many other forms of temporal data have been studied, such as multiple data streams, spatio-temporal data, temporal network data, and temporal community distribution data. Techniques for temporal outlier detection differ substantially from those for general outlier detection; examples include AR models, Markov models, and evolutionary clustering. In this tutorial, we will present an organized picture of recent research in temporal outlier detection. We will begin by motivating the importance of temporal outlier detection and outlining the challenges it poses beyond those of usual outlier detection. Then, we will present a taxonomy of proposed techniques for temporal outlier detection and cover some of the techniques in detail. We will conclude by presenting a collection of applications where temporal outlier detection techniques have been applied to discover interesting outliers.
Manish Gupta received his Masters in Computer Science from IIT Bombay in 2007. He worked for Yahoo! Bangalore for two years. Since 2009, he has been working towards his Ph.D. with Dr. Jiawei Han at the Department of Computer Science, University of Illinois at Urbana-Champaign. He has interned over summers at Microsoft Research, IBM Research and NEC Labs America. His research interests are in the areas of data mining and information retrieval. Specifically, his recent interests are in the mining of information networks. He has published more than 20 research papers in refereed journals and conferences, including KDD, PKDD, SDM, and WWW.
Jing Gao is an assistant professor in the Computer Science and Engineering Department at the University at Buffalo, The State University of New York. She is broadly interested in data and information analysis with a focus on information integration, ensemble methods, transfer learning, anomaly detection and mining data streams. She obtained her PhD degree in Computer Science from the University of Illinois at Urbana-Champaign in 2011. Her thesis work was supported by an IBM PhD fellowship and led to a tutorial at the SDM’10 conference. She has published more than 40 papers in refereed journals and conferences.
Charu Aggarwal is a Research Scientist at the IBM T. J. Watson Research Center in Yorktown Heights, New York. He received his B.S. from IIT Kanpur in 1993 and his Ph.D. from Massachusetts Institute of Technology in 1996. His research interest during his Ph.D. years was in combinatorial optimization (network flow algorithms), and his thesis advisor was Professor James B. Orlin. He has since worked in the fields of performance analysis, databases, and data mining. He has published over 200 papers in refereed conferences and journals, and has applied for or been granted over 80 patents. Because of the commercial value of these patents, he has received several invention achievement awards and has thrice been designated a Master Inventor at IBM. He is a recipient of an IBM Corporate Award (2003) for his work on bio-terrorist threat detection in data streams, a recipient of the IBM Outstanding Innovation Award (2008) for his scientific contributions to privacy technology, and a recipient of an IBM Research Division Award (2008) for his scientific contributions to data stream research. He has served on the program committees of most major database/data mining conferences, and served as program vice-chair of the SIAM Conference on Data Mining, 2007, the IEEE ICDM Conference, 2007, the WWW Conference, 2009, and the IEEE ICDM Conference, 2009. He served as an associate editor of the IEEE Transactions on Knowledge and Data Engineering Journal from 2004 to 2008. He is an associate editor of the ACM TKDD Journal, an action editor of the Data Mining and Knowledge Discovery Journal, an associate editor of the ACM SIGKDD Explorations, and an associate editor of the Knowledge and Information Systems Journal. He is a fellow of the IEEE for "contributions to knowledge discovery and data mining techniques", and a life-member of the ACM.
Jiawei Han (Ph.D., Univ. of Wisconsin at Madison) is Abel Bliss Professor in Engineering in the Department of Computer Science at the University of Illinois. His research covers data mining, information network analysis, and database systems, with over 600 publications. He served as the founding Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data (TKDD) and on the editorial boards of several other journals. Jiawei has received IBM Faculty Awards, HP Innovation Awards, the ACM SIGKDD Innovation Award (2004), the IEEE Computer Society Technical Achievement Award (2005), the IEEE Computer Society W. Wallace McDowell Award (2009), and the Daniel C. Drucker Eminent Faculty Award at UIUC (2011). He is a Fellow of ACM and a Fellow of IEEE. He is currently the Director of the Information Network Academic Research Center (INARC), supported by the Network Science-Collaborative Technology Alliance (NS-CTA) program of the U.S. Army Research Lab. His book "Data Mining: Concepts and Techniques" (Morgan Kaufmann) has been used worldwide as a textbook.