Relational Data Mining

Prof. Saso Dzeroski
Jozef Stefan Institute, Ljubljana, Slovenia

Abstract:

Relational Data Mining (RDM) is the multi-disciplinary field dealing with knowledge discovery from relational databases consisting of multiple tables (relations). To emphasize the contrast with typical data mining approaches, which look for patterns in a single relation of a database, the name Multi-Relational Data Mining is often used as well. Mining data that consists of complex/structured objects also falls within the scope of this field, since the normalized representation of such objects in a relational database requires multiple tables. The field aims at integrating results from existing fields such as inductive logic programming (ILP), KDD, data mining, machine learning and relational databases; producing new techniques for mining multi-relational data; and applying such techniques in practice.

Current RDM approaches address all of the main data mining tasks, including association analysis, classification, clustering, learning probabilistic models and regression. The pattern languages used by single-table data mining approaches for these tasks have been extended to the multiple-table case. Relational pattern languages now include relational association rules, relational classification rules, relational decision trees, and probabilistic relational models, among others. RDM algorithms have been developed to mine for patterns expressed in these relational pattern languages. Typically, data mining algorithms have been upgraded from the single-table case: for example, distance-based algorithms for prediction and clustering have been upgraded by defining distance measures between examples/instances represented in relational logic. RDM methods have been successfully applied across many application areas, ranging from the analysis of business data, through bioinformatics (including the analysis of complete genomes) and pharmacology (drug design), to Web mining (e.g., information extraction from Web sources).
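
As a rough illustration of what "upgrading" a distance-based method to the multiple-table case can look like, the sketch below compares customers described by one row in a customer table plus a variable number of rows in an order table. The tables, attribute names, scaling constants and equal weighting are all hypothetical choices made for this example; the set distance is one simple option and is not meant to reproduce any specific published RDM distance measure.

```python
from statistics import mean

# Hypothetical two-table database: one row per customer, many orders per customer.
customers = {
    "c1": {"age": 30},
    "c2": {"age": 32},
    "c3": {"age": 60},
}
orders = [
    {"customer": "c1", "amount": 20.0},
    {"customer": "c1", "amount": 25.0},
    {"customer": "c2", "amount": 22.0},
    {"customer": "c3", "amount": 300.0},
]

AGE_SCALE = 100.0     # assumed range used to normalize age differences
AMOUNT_SCALE = 300.0  # assumed range used to normalize order amounts


def num_dist(a, b, scale):
    """Normalized absolute difference between two numbers, clipped to [0, 1]."""
    return min(abs(a - b) / scale, 1.0)


def related_amounts(cid):
    """Follow the foreign key from a customer to the amounts of its orders."""
    return [o["amount"] for o in orders if o["customer"] == cid]


def set_dist(xs, ys, scale):
    """Symmetric average of nearest-neighbour distances between two bags."""
    if not xs and not ys:
        return 0.0
    if not xs or not ys:
        return 1.0
    forward = mean(min(num_dist(x, y, scale) for y in ys) for x in xs)
    backward = mean(min(num_dist(y, x, scale) for x in xs) for y in ys)
    return (forward + backward) / 2.0


def relational_distance(a, b):
    """Combine the single-table attribute distance with the related-rows distance."""
    d_age = num_dist(customers[a]["age"], customers[b]["age"], AGE_SCALE)
    d_orders = set_dist(related_amounts(a), related_amounts(b), AMOUNT_SCALE)
    return 0.5 * d_age + 0.5 * d_orders


def nearest(cid):
    """Nearest neighbour of a customer under the relational distance."""
    others = [c for c in customers if c != cid]
    return min(others, key=lambda c: relational_distance(cid, c))
```

With such a distance in place, a standard nearest-neighbour or clustering procedure can be run unchanged; here `nearest("c1")` picks `"c2"`, whose age and orders are both close to those of `c1`.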

The tutorial will provide a coherent introduction to the basic concepts, techniques and applications of relational data mining.

Biography:

Saso Dzeroski is a Senior Scientific Associate in the Department of Intelligent Systems, Jozef Stefan Institute, Ljubljana, Slovenia. He is also an adjunct professor at the School of Environmental Sciences, Polytechnic Nova Gorica. He received his B.Sc. in 1989, M.Sc. in 1991, and Ph.D. in 1995, all in computer science, from the Faculty of Computer and Information Science, University of Ljubljana, Slovenia. For his dissertation "Numerical constraints and learnability in inductive logic programming", he received the 1996 Jozef Stefan Golden Emblem Prize, a Slovenian national prize awarded for dissertations in the area of natural and technical sciences. He has held visiting researcher positions at the Turing Institute, Glasgow, UK; Katholieke Universiteit Leuven, Belgium; and the German National Research Center for Computer Science, Sankt Augustin, Germany.

He has been active in the research areas of inductive logic programming (ILP) and, more recently, relational data mining (RDM). He was involved in several international projects related to ILP and was the scientific coordinator of ILPnet2: The Network of Excellence in ILP. He was co-chair of the Seventh and Ninth International Workshops on ILP (ILP-97 and ILP-99) and co-chair of the Sixteenth International Conference on Machine Learning (ICML-99). He has also co-organized a number of events related to RDM, such as the ILP&KDD Summer School in Prague in September 1997, the RDM Summer School in Helsinki in August 2002, and the Multi-Relational Data Mining Workshop at KDD-2002 in Edmonton in July 2002. He is the co-author/co-editor of three books in the areas of ILP/RDM: Inductive Logic Programming: Techniques and Applications, the first authored book on ILP; Learning Language in Logic, concerned with learning from natural language resources; and Relational Data Mining.
