Abstract: Data Mining Technology and Practice in the Real World


This tutorial will provide a detailed overview of practical data mining by walking through the steps of a data mining project (following a process model such as CRISP-DM) and describing the goals, techniques, and problems associated with each step. Practical and interesting examples will be drawn from the presenter's extensive real-world data mining experience in government and industry.

The tutorial has three parts: Background and Motivation, Technology and Practice, and Practical Tips for Data Mining Success. Topics specifically addressed include data conditioning, feature selection and enhancement, descriptive modeling, predictive modeling, validation methods, and integration into the enterprise environment. A taxonomy of modeling algorithms will be presented, enabling the modeler to answer the question, "When do I use which technique?" The final part of the tutorial will be a treatment of "lessons learned": tips and techniques for recognizing and avoiding problems on data mining projects. The presenter will describe in detail the simple mistakes by novice data miners that lead to trouble and, too often, project failure.


Monte F. Hancock, Jr.

Monte Hancock is Chief Scientist for CSI Corporation, a provider of high-end data mining analytics for government and industry. He served on the Program Committee for KDD-2002 and gave one of KDD-2002's six tutorials, titled "Common Reasons Data Mining Projects Fail." Monte holds an MS in Mathematics from Syracuse University. He serves on the adjunct faculty in Computer Science for Rollins College, and on the graduate faculties in Computer Science and Business for Webster University. He is coauthor (with R. Delmater) of "Data Mining Explained", published by Digital Press, January 2001.
