from Part III - Exploratory Attacks on Machine Learning
High-profile privacy breaches have trained the spotlight of public attention on data privacy. Until recently, privacy was a relative laggard within computer security: available technologies could offer only weak protection when releasing aggregate statistics on sensitive data, and until the mid-2000s definitions of privacy were merely syntactic. Contrast this with cryptography, which has long offered provably strong guarantees on the secrecy of encrypted information, based on the computational limitations of attackers. Proposed as an answer to the challenge of placing privacy on an equal footing with cryptography, differential privacy (Dwork et al. 2006; Dwork & Roth 2014) has quickly grown in stature owing to its formal nature and its guarantees against powerful attackers. This chapter continues the discussion begun in Section 3.7, including a case study on releasing trained support vector machine (SVM) classifiers while preserving the privacy of the training data. This chapter builds on (Rubinstein, Bartlett, Huang, & Taft 2012).
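As a point of reference for the guarantees discussed in the remainder of the chapter, recall the standard definition from Dwork et al. (2006); the notation here may differ slightly from the chapter's own. A randomized mechanism $M$ is $\epsilon$-differentially private if, for all pairs of databases $D, D'$ differing in a single record and all measurable sets $S$ of outputs,

$$\Pr[M(D) \in S] \le e^{\epsilon} \, \Pr[M(D') \in S].$$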
Privacy Breach Case Studies
We first review several high-profile privacy breaches achieved by privacy researchers. Together these have helped shape the discourse on privacy and in particular have led to important advancements in privacy-enhancing technologies. This section concludes with a discussion of lessons learned.
Massachusetts State Employees Health Records
An early privacy breach demonstrated the difficulty of defining the concept of personally identifiable information (PII) and led to the highly influential development of k-anonymity (Sweeney 2002).
In the mid-1990s the Massachusetts Group Insurance Commission released private health records of state employees, showing individual hospital visits, for the purpose of fostering health research. To mitigate privacy risks to state employees, the Commission scrubbed all suspected PII: names, addresses, and Social Security numbers. What remained was pure medical information together with seemingly innocuous demographics: birthdate, gender, and zipcode.
Security researcher Latanya Sweeney realized that the demographic attributes left in the release were in fact partial PII. To demonstrate the point, she obtained readily available public voter records for the city of Cambridge, Massachusetts, which included birthdates, zipcodes, and names. She then linked this public data to the "anonymized" hospital records, thereby re-identifying many of the employees, including her target, then-Governor William Weld, who had originally overseen the release of the health data.
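The attack is, in essence, a database join on quasi-identifiers. The following sketch illustrates the idea with entirely hypothetical records and column names; it is not the actual data or procedure Sweeney used.

```python
# Illustrative linkage (re-identification) attack: join an "anonymized"
# medical release with a public voter roll on the shared quasi-identifiers
# (birthdate, gender, zipcode). All records here are hypothetical.
import pandas as pd

# "Anonymized" hospital release: names and SSNs scrubbed, demographics kept.
hospital = pd.DataFrame({
    "birthdate": ["1945-07-31", "1960-02-14", "1972-11-03"],
    "gender":    ["M", "F", "M"],
    "zipcode":   ["02138", "02139", "02139"],
    "diagnosis": ["hypertension", "diabetes", "asthma"],
})

# Public voter roll for the same city: names alongside the same demographics.
voters = pd.DataFrame({
    "name":      ["W. Weld", "J. Doe", "R. Roe"],
    "birthdate": ["1945-07-31", "1960-02-14", "1972-11-03"],
    "gender":    ["M", "F", "M"],
    "zipcode":   ["02138", "02139", "02139"],
})

# Any record matching uniquely on all three quasi-identifiers is re-identified:
# the voter roll supplies the name, the hospital release supplies the diagnosis.
reidentified = hospital.merge(voters, on=["birthdate", "gender", "zipcode"])
print(reidentified[["name", "diagnosis"]])
```

A release is k-anonymous only if every combination of quasi-identifier values is shared by at least k records; joins like the one above are precisely what the generalization and suppression behind k-anonymity are designed to defeat.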