SADET Module D5: Data Privacy


Presentation Transcript

Slide1

Data Privacy


Slide2

SADET

Module D5: Data Privacy

Dr. Balaji Palanisamy

Associate Professor

School of Computing and Information
University of Pittsburgh
bpalan@pitt.edu

Slides courtesy: Prof. James Joshi (University of Pittsburgh). Many slides in this lecture are adapted from the SIGMOD 2009 tutorial "Anonymized Data: Generation, Models, Usage" by Cormode & Srivastava, and from Indrajit Roy et al., NSDI 2010.


Slide3

Introduction to Privacy
Data Privacy
Anonymization techniques
Differential Privacy

Slide4

What is privacy?
Hard to define.
"Privacy is the claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others."
Alan Westin, Privacy and Freedom, 1967

Slide5

OECD Guidelines on the Protection of Privacy (1980)
Collection limitation
Data quality
Purpose specification
Use limitation
Security safeguards
Openness
Individual participation
Accountability

http://www.oecd.org/document/18/0,3343,en_2649_34255_1815186_1_1_1_1,00.html#part2

Slide6

Privacy Laws
EU: Comprehensive
European Directive on Data Protection
US: Sector-specific
HIPAA (Health Insurance Portability and Accountability Act of 1996): protects individually identifiable health information.
COPPA (Children's Online Privacy Protection Act of 1998): addresses the collection of personal information from children under 13, how to seek verifiable parental consent from their parents, etc.
GLB (Gramm-Leach-Bliley Act of 1999): requires financial institutions to provide consumers with a privacy policy notice, including what information is collected, where it is shared (affiliates and nonaffiliated third parties), how it is used, how it is protected, opt-out options, etc.
Fair Credit Reporting Act

Slide7

Anonymized Data: Generation, Models, Usage – Cormode & Srivastava


Why anonymize and how?

For data sharing: give real(istic) data to others to study without compromising the privacy of individuals in the data.
For data retention and usage: various requirements prevent companies from retaining customer information indefinitely.
Anonymization methods:
k-anonymity
l-diversity
differential privacy

Slide8


Tabular Data Example

Course record data recording scores and demographics.
Releasing the Student ID → Score association violates an individual's privacy.
Student ID is an identifier; Score is a sensitive attribute (SA).

Student ID | DOB     | Sex | ZIP   | Score
75835      | 9/28/96 | M   | 15213 | 70
14792      | 9/29/96 | F   | 15213 | 70
87593      | 1/21/95 | F   | 15212 | 80
87950      | 9/28/96 | M   | 15212 | 80
38833      | 5/25/92 | M   | 15206 | 90
68054      | 1/13/92 | F   | 15206 | 70
99316      | 7/28/92 | M   | 15207 | 80
51589      | 1/13/92 | F   | 15207 | 80
14941      | 1/13/98 | F   | 15232 | 90
22563      | 7/28/99 | M   | 15232 | 90
90652      | 1/22/99 | M   | 15231 | 90
12386      | 2/23/98 | F   | 15231 | 90

Slide9


Tabular Data Example: De-Identification

Course record: remove Student ID to create a de-identified table.
Does the de-identified table preserve an individual's privacy?
It depends on what other information an attacker knows.

DOB     | Sex | ZIP   | Score
9/28/96 | M   | 15213 | 70
9/29/96 | F   | 15213 | 70
1/21/95 | F   | 15212 | 80
9/28/96 | M   | 15212 | 80
5/25/92 | M   | 15206 | 90
1/13/92 | F   | 15206 | 70
7/28/92 | M   | 15207 | 80
1/13/92 | F   | 15207 | 80
1/13/98 | F   | 15232 | 90
7/28/99 | M   | 15232 | 90
1/22/99 | M   | 15231 | 90
2/23/98 | F   | 15231 | 90

Slide10


Tabular Data Example: Linking Attack

De-identified private data + publicly available data

Cannot uniquely identify either individual’s score

DOB is a quasi-identifier (QI)

De-identified private data:

DOB     | Sex | ZIP   | Score
9/28/96 | M   | 15213 | 70
9/29/96 | F   | 15213 | 70
1/21/95 | F   | 15212 | 80
9/28/96 | M   | 15212 | 80
5/25/92 | M   | 15206 | 90
1/13/92 | F   | 15206 | 70
7/28/92 | M   | 15207 | 80
1/13/92 | F   | 15207 | 80
1/13/98 | F   | 15232 | 90
7/28/99 | M   | 15232 | 90
1/22/99 | M   | 15231 | 90
2/23/98 | F   | 15231 | 90

Publicly available data:

Student ID | DOB     | Sex
75835      | 9/28/96 | M
51589      | 1/13/92 | M

Slide11


Tabular Data Example: Linking Attack

De-identified private data + publicly available data

Uniquely identified one individual’s score, but not the other’s

DOB, Sex are quasi-identifiers (QI)

De-identified private data:

DOB     | Sex | ZIP   | Score
9/28/96 | M   | 15213 | 70
9/29/96 | F   | 15213 | 70
1/21/95 | F   | 15212 | 80
9/28/96 | M   | 15212 | 80
5/25/92 | M   | 15206 | 90
1/13/92 | F   | 15206 | 70
7/28/92 | M   | 15207 | 80
1/13/92 | F   | 15207 | 80
1/13/98 | F   | 15232 | 90
7/28/99 | M   | 15232 | 90
1/22/99 | M   | 15231 | 90
2/23/98 | F   | 15231 | 90

Publicly available data:

Student ID | DOB     | Sex
75835      | 9/28/96 | M
22563      | 7/28/99 | M

Slide12


Tabular Data Example: Linking Attack

De-identified private data + publicly available data

Uniquely identified both individuals’ scores

The combination [DOB, Sex, ZIP] is unique for a large fraction of US residents [Sweeney 02].

De-identified private data:

DOB     | Sex | ZIP   | Score
9/28/96 | M   | 15213 | 70
9/29/96 | F   | 15213 | 70
1/21/95 | F   | 15212 | 80
9/28/96 | M   | 15212 | 80
5/25/92 | M   | 15206 | 90
1/13/92 | F   | 15206 | 70
7/28/92 | M   | 15207 | 80
1/13/92 | F   | 15207 | 80
1/13/98 | F   | 15232 | 90
7/28/99 | M   | 15232 | 90
1/22/99 | M   | 15231 | 90
2/23/98 | F   | 15231 | 90

Publicly available data:

Student ID | DOB     | Sex | ZIP
75835      | 9/28/96 | M   | 15213
22563      | 7/28/99 | M   | 15232
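This linking attack is mechanical: it is just a join between the release and the public data on the quasi-identifier columns. Below is a minimal sketch of the idea (assuming pandas is available; the tables mirror the example above, and all variable names are illustrative):

```python
import pandas as pd

# De-identified release: quasi-identifiers plus the sensitive attribute.
released = pd.DataFrame({
    "DOB": ["9/28/96", "9/29/96", "1/21/95", "9/28/96", "5/25/92", "1/13/92",
            "7/28/92", "1/13/92", "1/13/98", "7/28/99", "1/22/99", "2/23/98"],
    "Sex": ["M", "F", "F", "M", "M", "F", "M", "F", "F", "M", "M", "F"],
    "ZIP": ["15213", "15213", "15212", "15212", "15206", "15206",
            "15207", "15207", "15232", "15232", "15231", "15231"],
    "Score": [70, 70, 80, 80, 90, 70, 80, 80, 90, 90, 90, 90],
})

# Publicly available data: identifier plus the same quasi-identifiers.
public = pd.DataFrame({
    "StudentID": ["75835", "22563"],
    "DOB": ["9/28/96", "7/28/99"],
    "Sex": ["M", "M"],
    "ZIP": ["15213", "15232"],
})

# Join on the quasi-identifiers; a unique match re-identifies the score.
linked = public.merge(released, on=["DOB", "Sex", "ZIP"])
for sid, group in linked.groupby("StudentID"):
    if len(group) == 1:
        print(f"Student {sid} re-identified: Score = {group['Score'].iloc[0]}")
```

Both public rows match exactly one tuple of the release, so both scores leak, as the slide states.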

Slide13


k-Anonymization

[Samarati, Sweeney 98]

k-anonymity: Table T satisfies k-anonymity with quasi-identifier QI iff each tuple in (the multiset) T[QI] appears at least k times.
Protects against the "linking attack".
k-anonymization: Table T' is a k-anonymization of T if T' is a generalization/suppression of T, and T' satisfies k-anonymity.

Original table:

DOB     | Sex | ZIP   | Score
9/28/96 | M   | 15213 | 70
9/29/96 | F   | 15213 | 70
1/21/95 | F   | 15212 | 80
9/28/96 | M   | 15212 | 80
5/25/92 | M   | 15206 | 90
1/13/92 | F   | 15206 | 70
7/28/92 | M   | 15207 | 80
1/13/92 | F   | 15207 | 80
1/13/98 | F   | 15232 | 90
7/28/99 | M   | 15232 | 90
1/22/99 | M   | 15231 | 90
2/23/98 | F   | 15231 | 90

4-anonymization:

DOB   | Sex | ZIP   | Score
96-95 | *   | 1521* | 70
96-95 | *   | 1521* | 70
96-95 | *   | 1521* | 80
96-95 | *   | 1521* | 80
92    | *   | 1520* | 90
92    | *   | 1520* | 70
92    | *   | 1520* | 80
92    | *   | 1520* | 80
98-99 | *   | 1523* | 90
98-99 | *   | 1523* | 90
98-99 | *   | 1523* | 90
98-99 | *   | 1523* | 90
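Checking k-anonymity is a group-by-and-count over the QI columns. Here is a minimal sketch in plain Python (the function and variable names are mine, not from the slides) that verifies the 4-anonymization above:

```python
from collections import Counter

def is_k_anonymous(rows, qi_columns, k):
    """True iff every combination of QI values appears at least k times."""
    counts = Counter(tuple(row[c] for c in qi_columns) for row in rows)
    return all(count >= k for count in counts.values())

# The generalized table: DOB reduced to a year range, Sex suppressed,
# ZIP truncated to a prefix.
anonymized = (
    [{"DOB": "96-95", "Sex": "*", "ZIP": "1521*", "Score": s} for s in (70, 70, 80, 80)]
    + [{"DOB": "92", "Sex": "*", "ZIP": "1520*", "Score": s} for s in (90, 70, 80, 80)]
    + [{"DOB": "98-99", "Sex": "*", "ZIP": "1523*", "Score": s} for s in (90, 90, 90, 90)]
)

print(is_k_anonymous(anonymized, ["DOB", "Sex", "ZIP"], k=4))  # True
```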

Slide14


k-Anonymization and Uncertainty

Intuition: A k-anonymized table T' represents the set of all "possible world" tables Ti such that T' is a k-anonymization of Ti.
The table T from which T' was originally derived is one of the possible worlds.

One possibility (the original table T):

DOB     | Sex | ZIP   | Score
9/28/96 | M   | 15213 | 70
9/29/96 | F   | 15213 | 70
1/21/95 | F   | 15212 | 80
9/28/96 | M   | 15212 | 80
5/25/92 | M   | 15206 | 90
1/13/92 | F   | 15206 | 70
7/28/92 | M   | 15207 | 80
1/13/92 | F   | 15207 | 80
1/13/98 | F   | 15232 | 90
7/28/99 | M   | 15232 | 90
1/22/99 | M   | 15231 | 90
2/23/98 | F   | 15231 | 90

4-anonymization T':

DOB   | Sex | ZIP   | Score
96-95 | *   | 1521* | 70
96-95 | *   | 1521* | 70
96-95 | *   | 1521* | 80
96-95 | *   | 1521* | 80
92    | *   | 1520* | 90
92    | *   | 1520* | 70
92    | *   | 1520* | 80
92    | *   | 1520* | 80
98-99 | *   | 1523* | 90
98-99 | *   | 1523* | 90
98-99 | *   | 1523* | 90
98-99 | *   | 1523* | 90

Slide15


k-Anonymization and Uncertainty

Intuition: A k-anonymized table T' represents the set of all "possible world" tables Ti such that T' is a k-anonymization of Ti.
(Many) other tables are also possible.

Another possibility:

DOB     | Sex | ZIP   | Score
9/28/96 | M   | 15217 | 70
9/29/96 | F   | 15213 | 70
1/21/95 | F   | 15212 | 80
9/28/96 | F   | 15212 | 80
5/25/92 | M   | 15206 | 90
1/13/92 | F   | 15206 | 70
7/28/92 | M   | 15207 | 80
1/13/92 | F   | 15207 | 80
1/13/98 | M   | 15232 | 90
7/28/99 | M   | 15232 | 90
1/22/99 | M   | 15232 | 90
2/23/98 | F   | 15231 | 90

4-anonymization T':

DOB   | Sex | ZIP   | Score
96-95 | *   | 1521* | 70
96-95 | *   | 1521* | 70
96-95 | *   | 1521* | 80
96-95 | *   | 1521* | 80
92    | *   | 1520* | 90
92    | *   | 1520* | 70
92    | *   | 1520* | 80
92    | *   | 1520* | 80
98-99 | *   | 1523* | 90
98-99 | *   | 1523* | 90
98-99 | *   | 1523* | 90
98-99 | *   | 1523* | 90

Slide16


Homogeneity Attack

[Machanavajjhala+ 06]

Issue: k-anonymity requires each tuple in (the multiset) T[QI] to appear ≥ k times, but says nothing about the SA values.
If (almost) all SA values in a QI group are equal, privacy is lost!
The problem is with the choice of grouping, not the data.
Not OK: all scores in the QI group are 90!

Original table:

DOB     | Sex | ZIP   | Score
9/28/96 | M   | 15213 | 70
9/29/96 | F   | 15213 | 70
1/21/95 | F   | 15212 | 80
9/28/96 | M   | 15212 | 80
5/25/92 | M   | 15206 | 90
1/13/92 | F   | 15206 | 70
7/28/92 | M   | 15207 | 80
1/13/92 | F   | 15207 | 80
1/13/98 | F   | 15232 | 90
7/28/99 | M   | 15232 | 90
1/22/99 | M   | 15231 | 90
2/23/98 | F   | 15231 | 90

4-anonymization:

DOB   | Sex | ZIP   | Score
96-95 | *   | 1521* | 70
96-95 | *   | 1521* | 70
96-95 | *   | 1521* | 80
96-95 | *   | 1521* | 80
92    | *   | 1520* | 90
92    | *   | 1520* | 70
92    | *   | 1520* | 80
92    | *   | 1520* | 80
98-99 | *   | 1523* | 90
98-99 | *   | 1523* | 90
98-99 | *   | 1523* | 90
98-99 | *   | 1523* | 90
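A release can be screened for this attack by flagging QI groups whose sensitive attribute takes only one value. A small illustrative sketch (the names are mine; the rows are the 4-anonymization above):

```python
from collections import defaultdict

def homogeneous_groups(rows, qi_columns, sa_column):
    """Return the QI groups whose sensitive attribute has a single value."""
    sa_values = defaultdict(set)
    for row in rows:
        sa_values[tuple(row[c] for c in qi_columns)].add(row[sa_column])
    return [qi for qi, values in sa_values.items() if len(values) == 1]

anonymized = (
    [{"DOB": "96-95", "Sex": "*", "ZIP": "1521*", "Score": s} for s in (70, 70, 80, 80)]
    + [{"DOB": "92", "Sex": "*", "ZIP": "1520*", "Score": s} for s in (90, 70, 80, 80)]
    + [{"DOB": "98-99", "Sex": "*", "ZIP": "1523*", "Score": s} for s in (90, 90, 90, 90)]
)

# Flags the 98-99 / 1523* group: every score in it is 90.
print(homogeneous_groups(anonymized, ["DOB", "Sex", "ZIP"], "Score"))
```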

Slide17


Homogeneity Attack

[Machanavajjhala+ 06]

Issue: k-anonymity requires each tuple in (the multiset) T[QI] to appear ≥ k times, but says nothing about the SA values.
If (almost) all SA values in a QI group are equal, privacy is lost!
The problem is with the choice of grouping, not the data. For some groupings, there is no loss of privacy.
OK!

Original table:

DOB     | Sex | ZIP   | Score
9/28/96 | M   | 15213 | 70
9/29/96 | F   | 15213 | 70
1/21/95 | F   | 15212 | 80
9/28/96 | M   | 15212 | 80
5/25/92 | M   | 15206 | 90
1/13/92 | F   | 15206 | 70
7/28/92 | M   | 15207 | 80
1/13/92 | F   | 15207 | 80
1/13/98 | F   | 15232 | 90
7/28/99 | M   | 15232 | 90
1/22/99 | M   | 15231 | 90
2/23/98 | F   | 15231 | 90

Alternative anonymization:

DOB   | Sex | ZIP   | Score
95-99 | *   | 152** | 70
95-99 | *   | 152** | 70
95-99 | *   | 152** | 80
95-99 | *   | 152** | 80
92    | *   | 1520* | 90
92    | *   | 1520* | 70
92    | *   | 1520* | 80
92    | *   | 1520* | 80
95-99 | *   | 152** | 90
95-99 | *   | 152** | 90
95-99 | *   | 152** | 90
95-99 | *   | 152** | 90

At least 3 distinct values in each QI group: 3-diversity, with 3 distinct values for each QI group.

Slide18


Homogeneity and Uncertainty

Intuition: A k-anonymized table T' represents the set of all "possible world" tables Ti such that T' is a k-anonymization of Ti.
Lack of diversity of SA values implies that in a large fraction of possible worlds some fact is true, which can violate privacy.

Publicly available data:

Student ID | DOB     | Sex | ZIP
90652      | 1/22/99 | M   | 15231

Anonymized table:

DOB   | Sex | ZIP   | Score
96-95 | *   | 1521* | 70
96-95 | *   | 1521* | 70
96-95 | *   | 1521* | 80
96-95 | *   | 1521* | 80
92    | *   | 1520* | 90
92    | *   | 1520* | 70
92    | *   | 1520* | 80
92    | *   | 1520* | 80
98-99 | *   | 1523* | 90
98-99 | *   | 1523* | 90
98-99 | *   | 1523* | 90
98-99 | *   | 1523* | 90

Here the public record falls in the 98-99 / 1523* group, where every score is 90, so the attacker learns the score anyway.

Slide19


l-Diversity

[Machanavajjhala+ 06]

Intuition: the most frequent value does not appear too often compared to the less frequent values in a QI group.
l-Diversity Principle: a table is l-diverse if each of its QI groups contains at least l "well-represented" values for the SA.
"Well-represented" extensions (see the sketch below):
Distinct l-diversity: the simplest definition; at least l distinct values are represented in each QI group.
Entropy l-diversity: for each QI group g, entropy(g) ≥ log(l).
Recursive (c, l)-diversity: for each QI group g with m SA values, where r_i is the i-th highest frequency, r_1 < c(r_l + r_{l+1} + ... + r_m).

Anonymized table:

DOB   | Sex | ZIP   | Score
96-95 | *   | 1521* | 70
96-95 | *   | 1521* | 70
96-95 | *   | 1521* | 80
96-95 | *   | 1521* | 80
92    | *   | 1520* | 90
92    | *   | 1520* | 70
92    | *   | 1520* | 80
92    | *   | 1520* | 80
98-99 | *   | 1523* | 90
98-99 | *   | 1523* | 90
98-99 | *   | 1523* | 90
98-99 | *   | 1523* | 90
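Each "well-represented" notion can be checked per QI group. A minimal sketch under the definitions above (entropy taken with natural log; the function names are mine):

```python
import math
from collections import Counter

def distinct_l_diverse(sa_values, l):
    """At least l distinct SA values in the group."""
    return len(set(sa_values)) >= l

def entropy_l_diverse(sa_values, l):
    """entropy(g) >= log(l) over the SA value frequencies."""
    n = len(sa_values)
    entropy = -sum((c / n) * math.log(c / n) for c in Counter(sa_values).values())
    return entropy >= math.log(l)

def recursive_cl_diverse(sa_values, c, l):
    """r_1 < c * (r_l + ... + r_m), with frequencies sorted descending."""
    freqs = sorted(Counter(sa_values).values(), reverse=True)
    if len(freqs) < l:  # fewer than l distinct values cannot qualify
        return False
    return freqs[0] < c * sum(freqs[l - 1:])

# The homogeneous 98-99 / 1523* group fails; the 92 / 1520* group passes.
print(distinct_l_diverse([90, 90, 90, 90], l=3))  # False
print(distinct_l_diverse([90, 70, 80, 80], l=3))  # True
```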

Slide20

Background: Differential privacy
A mechanism is differentially private if every output is produced with similar probability whether any given input is included or not.
Cynthia Dwork. Differential Privacy. ICALP 2006.

Slide21

Differential privacy (intuition)
A mechanism is differentially private if every output is produced with similar probability whether any given input is included or not.

[Figure: inputs A, B, C feed a mechanism F(x), which produces an output distribution.]

Cynthia Dwork. Differential Privacy. ICALP 2006.

Slide22

Differential privacy (intuition)
A mechanism is differentially private if every output is produced with similar probability whether any given input is included or not.
Similar output distributions mean bounded risk for D if she includes her data!

[Figure: F(x) over inputs {A, B, C} and F(x) over inputs {A, B, C, D} produce similar output distributions.]

Cynthia Dwork. Differential Privacy. ICALP 2006.

Slide23

Achieving differential privacy
A simple differentially private mechanism: on the query "Tell me f(x)" over inputs x1, ..., xn, return f(x) + noise.
How much noise should one add?

Slide24

Achieving differential privacy
Function sensitivity (intuition): the maximum effect of any single input on the output.
Aim: conceal this effect to preserve privacy.
Example: computing the average height of the people in this room has low sensitivity, since any single person's height does not affect the final average by too much. Calculating the maximum height, by contrast, has high sensitivity.
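The contrast can be made concrete by measuring how much f moves when any single record is removed. An illustrative sketch (this removal-based probe mirrors the intuition on the slide; the names and numbers are mine):

```python
def removal_effect(f, data):
    """Largest change in f(data) caused by removing any one element."""
    full = f(data)
    return max(abs(full - f(data[:i] + data[i + 1:])) for i in range(len(data)))

heights_cm = [150, 160, 165, 170, 175, 210]

# The average barely moves when one person is removed (low sensitivity)...
print(removal_effect(lambda xs: sum(xs) / len(xs), heights_cm))  # about 7.7

# ...but the maximum can jump by a lot (high sensitivity).
print(removal_effect(max, heights_cm))  # 35
```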

Slide25

Achieving differential privacy
Function sensitivity (intuition): the maximum effect of any single input on the output.
Aim: conceal this effect to preserve privacy.
Example: SUM over input elements X1, X2, X3, X4 drawn from [0, M]. Sensitivity = M, since the maximum effect of any single input element on the sum is M.

Slide26

Achieving differential privacy
A simple differentially private mechanism: on the query "Tell me f(x)" over inputs x1, ..., xn, return f(x) + Lap(∆(f)).
Intuition: the noise is needed to mask the effect of any single input.
Lap = Laplace distribution; ∆(f) = sensitivity.
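A sketch of this mechanism (assuming NumPy; the noise scale ∆(f)/ε anticipates the calibration R = ∆f/ε derived on the following slides, and the names are illustrative):

```python
import numpy as np

def laplace_mechanism(data, f, sensitivity, epsilon, rng=None):
    """Return f(data) plus Laplace noise of scale sensitivity / epsilon."""
    rng = rng or np.random.default_rng()
    return f(data) + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: SUM over inputs drawn from [0, M]; sensitivity = M (previous slide).
M = 100
scores = [70, 70, 80, 80, 90, 70, 80, 80, 90, 90, 90, 90]
print(laplace_mechanism(scores, sum, sensitivity=M, epsilon=1.0))
```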

Slide27


Sensitivity of a Function f

How much can f(DB + Me) exceed f(DB - Me)?
Recall: K(f, DB) = f(DB) + noise.
The question asks: what difference must the noise obscure?

∆f = max over DB, Me of |f(DB + Me) - f(DB - Me)|

e.g., for Count, ∆f = 1.

Slide28


Calibrate Noise to Sensitivity

∆f = max over DB, Me of |f(DB + Me) - f(DB - Me)|

[Figure: two Laplace densities, centered at f(DB - Me) and f(DB + Me), plotted over an axis marked in multiples of R.]

Pr[K(f, DB - Me) = t] / Pr[K(f, DB + Me) = t]
  = exp(-(|t - f(DB - Me)| - |t - f(DB + Me)|) / R)
  ≤ exp(∆f / R)

Theorem: To achieve ε-differential privacy, use scaled symmetric noise Lap(R), with density proportional to exp(-|x|/R) and R = ∆f/ε.
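As a worked instance of the theorem (a sketch, assuming NumPy): for a counting query ∆f = 1, so ε-differential privacy calls for Lap(R) noise with R = 1/ε, and the probability ratio between neighboring databases stays within exp(ε) at every output t:

```python
import numpy as np

epsilon = 0.5
R = 1.0 / epsilon  # counting query: Delta_f = 1, so R = Delta_f / epsilon

def lap_density(t, center, scale=R):
    """Density of Laplace noise with the given scale, centered at `center`."""
    return np.exp(-abs(t - center) / scale) / (2 * scale)

# Neighboring databases: true counts differ by Delta_f = 1.
count_without_me, count_with_me = 41, 42
for t in (40.0, 41.5, 43.0):
    ratio = lap_density(t, count_without_me) / lap_density(t, count_with_me)
    # The theorem guarantees ratio <= exp(epsilon) for every output t.
    print(f"t={t}: ratio={ratio:.3f} <= bound={np.exp(epsilon):.3f}")
```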