Safer Data Mining: A Tutorial on Algorithmic Techniques in Differential Privacy
Moritz Hardt, IBM Research, Almaden, San Jose, USA
Alexander Nikolov, Rutgers University, USA
We present recent algorithmic developments in privacy-preserving data analysis and data mining. The goal of privacy-preserving data analysis is to reveal useful statistics about a population while preserving the privacy of the individuals in it. Sensitive data sets are increasingly common: examples include surveys, census data, social networks, medical records, financial statements, and streams of search queries. The tutorial focuses on differential privacy as the formal privacy guarantee. Differential privacy is a strong guarantee that, intuitively speaking, hides the presence or absence of any individual in the data set. It has developed rapidly over the last seven years, drawing interest from several different communities, and the research frontier remains active with many important open problems. This tutorial presents the basics of differential privacy and state-of-the-art algorithms in three application areas: Supervised Machine Learning (with a focus on regularized regression), Unsupervised Machine Learning (with a focus on spectral methods), and Streaming Computation (with a focus on streaming data structures). We place particular emphasis on practical methods that will help practitioners apply differential privacy in academic and industrial applications.
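To make the intuition of hiding any one individual's presence or absence concrete, here is a minimal sketch of the Laplace mechanism applied to a counting query. The abstract does not describe this mechanism; it is used here only as a standard illustration. The sketch assumes that neighboring data sets differ by adding or removing one record, so a count has sensitivity 1 and Laplace noise of scale 1/epsilon gives epsilon-differential privacy. The names `laplace_noise` and `private_count` and the toy data are hypothetical, and the floating-point sampling is not hardened for production use.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    # expovariate(lambd) has mean 1/lambd, so each draw has mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`.

    Adding or removing one record changes the true count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy data: ages of survey respondents.
ages = [23, 35, 41, 29, 62, 57, 33]
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5,
                      rng=random.Random(0))
```

Smaller values of `epsilon` give stronger privacy and noisier answers. Each call spends privacy budget, so repeated queries on the same data need their epsilons accounted for together.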