Originally posted on the Google Developers Blog
By: Miguel Guevara, Product Manager, Privacy and Data Protection Office
Differentially private data analysis is a principled approach that enables organizations to learn from the majority of their data while ensuring that those results do not allow any individual's data to be distinguished or re-identified. This type of analysis can be implemented in a wide variety of ways and for many different purposes. For example, a health researcher may want to compare the average amount of time patients remain admitted across various hospitals to determine whether there are differences in care. Differential privacy is a high-assurance, analytic means of ensuring that use cases like this are addressed in a privacy-preserving manner.
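To make the hospital example concrete, here is a minimal sketch, in plain Python rather than the library's API, of how a differentially private average can be computed with the Laplace mechanism. The clamping bounds and the even split of the privacy budget between the sum and the count are illustrative assumptions:

```python
import numpy as np

def dp_average(stay_lengths_days, lower=0.0, upper=60.0, epsilon=1.0):
    """Epsilon-DP estimate of the average hospital stay, in days.

    Each stay is clamped to [lower, upper], so adding or removing one
    patient changes the sum by at most (upper - lower) and the count by
    at most 1. Half the privacy budget goes to the noisy sum and half
    to the noisy count; by composition the result is epsilon-DP.
    """
    clamped = np.clip(stay_lengths_days, lower, upper)
    noisy_sum = clamped.sum() + np.random.laplace(scale=(upper - lower) / (epsilon / 2))
    noisy_count = len(clamped) + np.random.laplace(scale=1.0 / (epsilon / 2))
    return noisy_sum / max(noisy_count, 1.0)
```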
Today, we’re rolling out the open-source version of the differential privacy library that helps power some of Google’s core products. To make the library easy for developers to use, we’re focusing on features that can be particularly difficult to execute from scratch, like automatically calculating bounds on user contributions. It is now freely available to any organization or developer that wants to use it.
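Contribution bounding matters because the scale of the noise depends on how much any one person can influence the result, and picking bounds by eyeballing the raw data leaks information itself. A simplified, hypothetical sketch of private bound inference, using a noisy logarithmic histogram, might look like the following; the bucketing scheme, threshold, and function name are assumptions for illustration, not the library's actual algorithm:

```python
import math
import numpy as np

def approx_upper_bound(values, epsilon=1.0, num_buckets=32, threshold=5.0):
    """Spend `epsilon` to pick an upper bound 2**i covering most values.

    Assumes non-negative values. Each record lands in exactly one bucket,
    so the histogram has L1 sensitivity 1 and Laplace(1/epsilon) noise per
    bucket makes the whole histogram epsilon-DP.
    """
    counts = [0] * num_buckets
    for v in values:
        # Bucket i holds values in (2**(i-1), 2**i]; bucket 0 holds v <= 1.
        i = 0 if v <= 1 else min(num_buckets - 1, math.ceil(math.log2(v)))
        counts[i] += 1
    noisy = [c + np.random.laplace(scale=1.0 / epsilon) for c in counts]
    # Highest bucket whose noisy count is confidently above the noise floor.
    top = max((i for i, c in enumerate(noisy) if c >= threshold), default=0)
    return 2.0 ** top
```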
A deeper look at the technology
Our open-source library was designed to meet the needs of developers. In addition to being freely accessible, we wanted it to be both useful and easy to deploy.
Here are some of the key features of the library:
- Statistical functions: This release supports most common data science operations; developers can use the library to compute counts, sums, averages, medians, and percentiles.
- Rigorous testing: Getting differential privacy right is challenging. In addition to an extensive test suite, we’ve included an extensible Stochastic Differential Privacy Model Checker library to help prevent mistakes; a simplified version of that idea is sketched after this list.
- Ready to use: The real utility of an open-source release is in answering the question “Can I use this?” That’s why we’ve included a PostgreSQL extension along with common recipes to get you started. We’ve described the details of our approach in a technical paper that we’ve just released today.
- Modular: We designed the library so that it can be extended to include other functionalities such as additional mechanisms, aggregation functions, or privacy budget management.
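As a flavor of what stochastic checking involves, the sketch below empirically tests the ε-DP inequality on two neighboring datasets by sampling a mechanism many times and comparing output histograms. It is a toy illustration under simplifying assumptions (a scalar-output mechanism and a fixed slack term for sampling error), not the checker that ships with the library:

```python
import numpy as np

def looks_dp(mechanism, db, neighbor_db, epsilon, trials=100_000, bins=50, slack=0.01):
    """Empirically check P[M(db) in bucket] <= e**eps * P[M(db') in bucket] + slack."""
    a = np.array([mechanism(db) for _ in range(trials)])
    b = np.array([mechanism(neighbor_db) for _ in range(trials)])
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    pa, _ = np.histogram(a, bins=bins, range=(lo, hi))
    pb, _ = np.histogram(b, bins=bins, range=(lo, hi))
    pa, pb = pa / trials, pb / trials
    # The epsilon-DP inequality must hold in both directions for every bucket.
    return (np.all(pa <= np.exp(epsilon) * pb + slack)
            and np.all(pb <= np.exp(epsilon) * pa + slack))

# Example: a properly noised count should typically pass this test,
# while an un-noised count (no Laplace term) should fail it.
noisy_count = lambda db: len(db) + np.random.laplace(scale=1.0)  # eps = 1
print(looks_dp(noisy_count, [1] * 100, [1] * 101, epsilon=1.0))
```

A sampling-based test like this can refute a privacy claim but never prove it, which is why it complements, rather than replaces, conventional unit tests.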
Investing in new privacy technologies
We have driven the research and development of practical, differentially private techniques since releasing RAPPOR to help improve Chrome in 2014, and we continue to spearhead their real-world application.
We’ve used differentially private methods to create helpful features in our products, like showing how busy a business is over the course of a day or how popular a particular restaurant’s dish is in Google Maps, and to improve Google Fi.
This year, we’ve announced several open-source privacy technologies, including TensorFlow Privacy, TensorFlow Federated, and Private Join and Compute, and today’s launch adds to this growing list. We're excited to make this library broadly available and hope developers will consider leveraging it as they build out their comprehensive data privacy strategies. From medicine, to government, to business, and beyond, it’s our hope that these open-source tools will help produce insights that benefit everyone.
Acknowledgements
Software Engineers: Alain Forget, Bryant Gipson, Celia Zhang, Damien Desfontaines, Daniel Simmons-Marengo, Ian Pudney, Jin Fu, Michael Daub, Priyanka Sehgal, Royce Wilson, William Lam