T4: Data Mining for Genomics

George Karypis, University of Minnesota-Twin Cities

Biological researchers are generating data at an explosive rate. Analyzing this volume of data and using it intelligently is a challenge because of its complexity, its multiple interdependent factors, the uncertainty of these dependencies, and the continuous evolution of our understanding of the data. The goal of this tutorial is to provide an introduction to the latest techniques for data mining and knowledge discovery and how they apply to genomics. The tutorial will describe a number of diverse data-mining algorithms and how they are currently applied to solving important biological problems. These will include clustering, classification, pattern discovery, and temporal prediction. The genomics applications and examples will include amino acid sequence analysis, homology detection, elucidation of biological function, protein structure prediction and identification of related proteins, analysis of biological effects, and DNA microarray analysis.

Presenter Bio

George Karypis is an assistant professor in the Department of Computer Science and Engineering at the University of Minnesota. His research interests span parallel algorithm design, data mining, applications of bioinformatics, information retrieval, parallel processing, scientific computing and optimization, and sparse matrix computations. His research includes the development of software libraries for serial and parallel graph partitioning (METIS and ParMETIS), hypergraph partitioning (hMETIS), and parallel Cholesky factorization (PSPASES). He has coauthored several journal articles and conference papers on these topics, as well as the book "Introduction to Parallel Computing" (Publ. Benjamin Cummings/Addison Wesley, 1994). He is a member of ACM, IEEE, and SIAM.