Differential privacy (DP) is widely regarded as a gold standard for privacy-preserving computation over users’ data. It is a parameterized notion of database privacy that gives a rigorous worst-case bound on the information that can be learned about any one individual from the result of a data analysis task. Algorithmically, it is achieved by injecting carefully calibrated randomness into the analysis, balancing privacy protections against the accuracy of the results.
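For reference, the standard (epsilon, delta)-formulation of DP is textbook material rather than a contribution of the works surveyed here: a randomized mechanism M is (epsilon, delta)-DP if

    \Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S] + \delta

for every pair of datasets D and D' differing in one individual's record and every set of possible outputs S (pure DP is the case delta = 0).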
In this talk, we will survey recent progress on DP algorithms for three important statistical problems: online learning with bandit feedback, causal inference, and learning from imbalanced data. For the first problem, we will show that Thompson sampling, a standard bandit algorithm dating to the 1930s (sketched below), already satisfies DP thanks to the randomness inherent in the algorithm. For the second problem, causal inference and counterfactual estimation, we develop the first DP algorithms for synthetic control, a method that has been used non-privately for this task for decades. Finally, for the problem of imbalanced learning, where one class is severely underrepresented in the training data, we show that existing techniques such as minority oversampling perform very poorly when applied as pre-processing before a DP learning algorithm; instead, we propose novel approaches for privately generating synthetic minority points.
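As a minimal illustration of where the inherent randomness comes from, the following is a textbook Beta-Bernoulli Thompson sampling sketch; the bandit setup, variable names, and parameters are illustrative assumptions, not code from the works above, and no claim about its privacy parameters is made here.

    # Textbook Bernoulli Thompson sampling (illustrative sketch only).
    # The algorithm's only source of randomness is sampling from the Beta
    # posteriors, which is the "inherent randomness" referred to in the abstract.
    import numpy as np

    def thompson_sampling(reward_probs, horizon=1000, seed=0):
        rng = np.random.default_rng(seed)
        k = len(reward_probs)
        successes = np.ones(k)   # Beta(1, 1) uniform priors on each arm
        failures = np.ones(k)
        total_reward = 0
        for _ in range(horizon):
            # Sample a plausible mean reward for each arm from its posterior ...
            samples = rng.beta(successes, failures)
            # ... and play the arm whose sampled value is largest.
            arm = int(np.argmax(samples))
            reward = rng.binomial(1, reward_probs[arm])
            successes[arm] += reward
            failures[arm] += 1 - reward
            total_reward += reward
        return total_reward

    print(thompson_sampling([0.3, 0.5, 0.7]))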
Based on joint works with Marco Avella Medina, Vishal Misra, Yuliia Lut, Tingting Ou, Saeyoung Rho, Lucas Rosenblatt, and Ethan Turok.