Controlling Privacy Loss in Survey Sampling

Uploaded On 2023-11-23



Presentation Transcript

1. Controlling Privacy Loss in Survey Sampling
Mark Bun (Boston University), Jörg Drechsler (IAB), Marco Gaboardi (Boston University), Audra McMillan (Apple), Jayshree Sarathy (Harvard University)

2. Outline
Background
Cluster sampling
Stratified sampling
General findings

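3. Differential Privacy
Definition: A mechanism $M$ is $\varepsilon$-differentially private if for all datasets $D, D'$ differing on one record, and for all sets $E$ of outputs,
$$\Pr[M(D) \in E] \le e^{\varepsilon} \, \Pr[M(D') \in E],$$
where the probabilities are taken (only) over the randomness in $M$.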
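The definition can be checked numerically on a small mechanism. The sketch below uses randomized response on a single bit, which is not part of the talk and is chosen here only because its output distribution is easy to write down; the function names are hypothetical. For a discrete output space it is enough to compare the probabilities of individual outcomes.

```python
import math


def randomized_response_probs(bit: int, epsilon: float) -> dict:
    """Output distribution of randomized response applied to one bit.

    The true bit is reported with probability e^eps / (1 + e^eps),
    and the flipped bit otherwise.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}


def max_privacy_loss(epsilon: float) -> float:
    """Largest |log-ratio| of outcome probabilities over the two
    neighboring inputs (bit = 0 versus bit = 1)."""
    p0 = randomized_response_probs(0, epsilon)
    p1 = randomized_response_probs(1, epsilon)
    return max(abs(math.log(p0[o] / p1[o])) for o in (0, 1))


if __name__ == "__main__":
    for eps in (0.1, 0.5, 1.0):
        # The worst-case log-ratio should match eps up to rounding error.
        print(eps, max_privacy_loss(eps))
```

The worst-case log-ratio equals the $\varepsilon$ passed in, which is exactly the inequality in the definition holding with equality for this mechanism.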

4. Framework

5. Framework

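6. Privacy amplification by subsampling
For simple random sampling with replacement, simple random sampling without replacement, and Poisson sampling,
$$\varepsilon' = \log\bigl(1 + r\,(e^{\varepsilon} - 1)\bigr) \approx r\,\varepsilon \quad \text{(for small } \varepsilon\text{)},$$
where $\varepsilon$ is the privacy loss parameter used by $M$, $r$ is the sampling rate, and $\varepsilon'$ is the resulting privacy loss guarantee.
Kasiviswanathan, Lee, Nissim, Raskhodnikova, and Smith (2008) … Balle, Barthe, and Gaboardi (2018)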
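As a rough illustration of how much amplification a given sampling rate buys, the sketch below evaluates the standard subsampling bound $\varepsilon' = \log(1 + r(e^{\varepsilon} - 1))$. It assumes this is the bound the slide refers to; the function name is hypothetical.

```python
import math


def amplified_epsilon(epsilon: float, rate: float) -> float:
    """Privacy loss after subsampling at rate `rate`, assuming the
    bound eps' = log(1 + r * (e^eps - 1))."""
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must lie in (0, 1]")
    # log1p / expm1 keep precision when epsilon is small.
    return math.log1p(rate * math.expm1(epsilon))


if __name__ == "__main__":
    for eps in (0.1, 1.0, 5.0):
        for rate in (0.01, 0.1, 1.0):
            print(f"eps={eps}, r={rate}: eps'={amplified_epsilon(eps, rate):.4f}")
```

For small $\varepsilon$ the result is close to $r\varepsilon$; for large $\varepsilon$ the gain is much smaller than the factor $r$ would suggest.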

7. Related Work
Poisson sampling: Kasiviswanathan et al. 2008, Beimel et al. 2010
Simple random sampling with replacement: Bun et al. 2015
Simple random sampling without replacement: Beimel et al. 2010, Bassily et al. 2014, Wang et al. 2016
Unified analysis through probabilistic couplings: Balle et al. 2018

8. Using amplification results in practice

9. Using amplification in practice
Dataset S contains […] records; we want to compute an $\varepsilon$-DP proportion of employed individuals with $\varepsilon =$ […].
Add Laplace noise with scale 1/300.
But if S is a secret simple random sample from a population P of […] individuals:
  Laplace noise with scale 1/300 will yield $\varepsilon' =$ […].
  To get $\varepsilon' =$ […], add noise with scale only 1/3000.
Intuition: secrecy of the sample -> more privacy -> less noise needed.
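The budgeting step can be sketched as follows. The values n = 3000, target $\varepsilon' = 0.1$ and sampling rate r = 0.1 are assumptions picked only so that the unamplified Laplace scale comes out at 1/300; they are not taken from the slide. The sketch also assumes the bound $\varepsilon' = \log(1 + r(e^{\varepsilon} - 1))$ and a sensitivity of 1/n for a proportion. All function names are hypothetical.

```python
import math
import random


def amplified_epsilon(epsilon: float, rate: float) -> float:
    """Assumed subsampling bound: eps' = log(1 + r * (e^eps - 1))."""
    return math.log1p(rate * math.expm1(epsilon))


def base_epsilon_for_target(target_epsilon: float, rate: float) -> float:
    """Invert the bound: the eps the mechanism may spend on the sample
    so that the guarantee with respect to the population is target_epsilon."""
    return math.log1p(math.expm1(target_epsilon) / rate)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) draw: the difference of two independent
    exponentials with mean `scale` is Laplace distributed."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_proportion(bits, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a proportion (sensitivity 1/n)."""
    n = len(bits)
    return sum(bits) / n + laplace_noise(1.0 / (n * epsilon), rng)


# Hypothetical numbers, chosen so that 1 / (n * eps) = 1/300.
n, target_eps, rate = 3000, 0.1, 0.1
scale_no_sampling = 1.0 / (n * target_eps)
base_eps = base_epsilon_for_target(target_eps, rate)
scale_with_sampling = 1.0 / (n * base_eps)

if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.random() < 0.6 for _ in range(n)]
    print("scale without amplification:", scale_no_sampling)
    print("scale with amplification:   ", scale_with_sampling)
    print("noisy proportion:", private_proportion(sample, base_eps, rng))
```

Under the small-$\varepsilon$ approximation $\varepsilon' \approx r\varepsilon$ the mechanism could spend $\varepsilon = 1$ on the sample, which gives a scale of 1/3000 for these assumed numbers; the exact bound is more conservative and yields a somewhat larger scale, still well below 1/300.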

10. This work
Q: Can we extend these results to more complex sampling designs?
Privacy amplification is not guaranteed for more complex sampling designs.
Examples from cluster and stratified sampling.
Insight into more general sampling designs.

11. Cluster sampling
[Diagram of population P]

12. Cluster sampling
Negligible amplification if the size of the clusters is large!
Intuition:
  Large clusters -> less secrecy of the sample.
  Privacy of records within a cluster may not protect the privacy of the chosen clusters.

13. Multi-stage cluster sampling
[Diagram of population P]

14. Stratified sampling
[Diagram of population P]

15. Stratified differential privacy
Suppose there are […] strata. Let […] denote the stratum in question and […] denote an individual datapoint.
Definition: A mechanism satisfies […]-stratified differential privacy if for all datasets P and datapoints […], […] and […] are indistinguishable.
Allows for different values of the privacy parameter for each stratum.
Protects both the datapoint and the stratum it belongs to.

16. Stratified sampling, proportional allocation
Constant sampling rate $r$ within each stratum.
Deterministic rounding (i.e. taking the ceiling of $r$ times the stratum size) can leak information about the dataset.
Example: for neighboring datasets […] and […], always sample 1 record from the first and 2 records from the second.
No privacy amplification: […]!
How can we fix this?
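The leak from deterministic rounding can be reproduced with a small sketch. The stratum sizes 10 and 11 and the rate 1/10 are assumptions chosen to match the "1 record versus 2 records" pattern in the example; `Fraction` is used so that the ceiling is not affected by floating-point error.

```python
import math
from fractions import Fraction


def deterministic_sample_size(stratum_size: int, rate: Fraction) -> int:
    """Proportional allocation with deterministic rounding:
    take the ceiling of rate * stratum_size."""
    return math.ceil(rate * stratum_size)


# Hypothetical neighboring strata: they differ by a single record.
rate = Fraction(1, 10)
size_first = deterministic_sample_size(10, rate)
size_second = deterministic_sample_size(11, rate)

if __name__ == "__main__":
    # The sample size alone tells the two neighboring datasets apart.
    print(size_first, size_second)
```

Because the sample size is a deterministic function of the stratum size, an observer who sees how many records were drawn learns which of the two neighboring datasets was used.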

17. Stratified sampling, proportional allocation
Constant sampling rate $r$ within each stratum.
Randomized rounding: round $r$ times the stratum size up with probability equal to its fractional part, and down otherwise.
Example: for neighboring datasets […] and […], always sample 1 record from the first, and sample 1 or 2 records from the second.
Regain privacy amplification: […]!
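A minimal sketch of randomized rounding, assuming the usual scheme in which the target size is rounded up with probability equal to its fractional part. The stratum sizes 10 and 11 and the rate 1/10 are again assumptions chosen to match the example, and the function names are hypothetical.

```python
import math
import random
from fractions import Fraction


def sample_size_distribution(stratum_size: int, rate: Fraction) -> dict:
    """Distribution over sample sizes under randomized rounding.

    The target rate * stratum_size is rounded up with probability equal
    to its fractional part and down otherwise, so the expected sample
    size equals the target exactly.
    """
    target = rate * stratum_size
    low = math.floor(target)
    p_up = target - low
    if p_up == 0:
        return {low: Fraction(1)}
    return {low: 1 - p_up, low + 1: p_up}


def randomized_sample_size(stratum_size: int, rate: Fraction,
                           rng: random.Random) -> int:
    """Draw one sample size from the distribution above."""
    dist = sample_size_distribution(stratum_size, rate)
    sizes = list(dist)
    weights = [float(dist[s]) for s in sizes]
    return rng.choices(sizes, weights=weights, k=1)[0]


if __name__ == "__main__":
    rate = Fraction(1, 10)
    print(sample_size_distribution(10, rate))  # a single possible size
    print(sample_size_distribution(11, rate))  # two possible sizes
    rng = random.Random(0)
    print([randomized_sample_size(11, rate, rng) for _ in range(10)])
```

With these numbers the smaller stratum always yields one record, while the larger one yields one record most of the time and two records occasionally, so observing a sample of size one no longer identifies the dataset.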

18. Randomized data-independent sampling rate
Let $\mathcal{D}$ be a distribution over sample sizes.
Draw $m \sim \mathcal{D}$ and sample $m$ records from […] without replacement.
Then […], where […] for all […].
To get amplification, $\mathcal{D}$ must be concentrated away from […]!
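The sampling procedure itself (not the privacy bound) can be sketched as below. The distribution over sample sizes is passed in as a plain dictionary, which is an assumption made here for illustration, as are the function and parameter names.

```python
import random


def sample_with_random_size(records: list, size_distribution: dict,
                            rng: random.Random) -> list:
    """Draw a sample size from a data-independent distribution, then
    sample that many records without replacement.

    `size_distribution` maps each possible sample size to its probability
    and must not depend on the contents of `records`.
    """
    sizes = list(size_distribution)
    weights = [size_distribution[m] for m in sizes]
    m = rng.choices(sizes, weights=weights, k=1)[0]
    if m > len(records):
        raise ValueError("sample size exceeds number of records")
    return rng.sample(records, m)


if __name__ == "__main__":
    rng = random.Random(0)
    population = list(range(100))
    # Hypothetical distribution over sample sizes.
    dist = {8: 0.25, 10: 0.5, 12: 0.25}
    print(sample_with_random_size(population, dist, rng))
```

The key design point is that the size distribution is fixed in advance; only the identity of the sampled records depends on the data.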

19. Randomized data-dependent sampling rate
Let $f$ denote the data-dependent function that returns the sample size.
E.g. Neyman allocation: […]
Let $\tilde{f}$ denote a randomized approximation of $f$.
To get privacy amplification, $\tilde{f}$ needs to have large variance.
Challenge for budgeting in practice: may need to sample many more records than planned.
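To make the data dependence concrete, the sketch below contrasts proportional allocation with the textbook form of Neyman allocation, in which stratum h receives a share of the total sample proportional to its size times its standard deviation. This is the standard survey-sampling formula and is assumed here, since the slide's own formula is not shown; rounding is deliberately left out, and the function names are hypothetical.

```python
def proportional_allocation(total_sample: float, stratum_sizes: list) -> list:
    """Sample size per stratum proportional to stratum size only."""
    population = sum(stratum_sizes)
    return [total_sample * n_h / population for n_h in stratum_sizes]


def neyman_allocation(total_sample: float, stratum_sizes: list,
                      stratum_stds: list) -> list:
    """Textbook Neyman allocation: stratum h gets a share proportional
    to (stratum size) * (stratum standard deviation).

    The standard deviations are computed from the data, which is what
    makes the resulting sample sizes data-dependent.
    """
    weights = [n_h * s_h for n_h, s_h in zip(stratum_sizes, stratum_stds)]
    total_weight = sum(weights)
    return [total_sample * w / total_weight for w in weights]


if __name__ == "__main__":
    sizes = [100, 100]
    stds = [1.0, 3.0]  # hypothetical within-stratum standard deviations
    print(proportional_allocation(40, sizes))
    print(neyman_allocation(40, sizes, stds))
```

Changing a single record can shift a stratum's standard deviation and therefore the allocation, which is the channel through which the sample design itself can reveal information.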

20. Proportional vs. Neyman allocation
What is optimal in the non-private setting may not be optimal in the private setting.
Proportional allocation can perform better than Neyman allocation in high-privacy regimes.
Discussion question: what practical settings could be used to test this finding?

21. Conclusion
Intuitively, sampling can amplify privacy.
But privacy amplification is not guaranteed for complex sampling designs.
Amplification can be negligible even when only a small fraction of the population is included in the final sample.
If the sampling design is data-dependent, privacy degradation can occur: the sample design itself can reveal sensitive information.

22. Thank you!

23. Results
Positive results
  Stratified sampling + proportional allocation with randomized rounding
Negative results
  Cluster sampling
  Stratified sampling + proportional allocation with deterministic rounding
  Data-dependent sampling schemes (e.g. Neyman allocation)