Introduction to Security
Presentation Transcript

Slide1

Introduction to Security

Module 17 – Sharing Data While Preserving Privacy
some slides by Gen Bartlett
Jelena Mirkovic
USC CSCI 430

Slide2

Why Do We Want to Share?
Share existing data sets:
Research
Companies buy data from each other
Check out each other's assets before merges/buyouts
Start a new dataset:
Mutually beneficial relationships
Share data with me and you can use this service

2

Slide3

Sharing Everything?
Easy, but what are the ramifications?
Legal/policy may limit what can be shared/collected
IRBs: Institutional Review Boards
HIPAA: Health Insurance Portability and Accountability Act
HITECH: Health Information Technology for Economic and Clinical Health Act
Future use and protection of data?

3

Slide4

Mechanisms for Limited Sharing
Remove really sensitive stuff (sanitization)
PPI & PII (private personal information & personally identifiable information)
Without a crystal ball, this is hard
Anonymization: replace information to limit the ability to tie entities to meaningful identities
Aggregation: remove PII by only collecting/releasing statistics (a short sketch follows after this slide)

4
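To make the aggregation bullet concrete, here is a minimal Python sketch. The records and names are made up for illustration: identities are dropped and only per-condition counts are released.

```python
from collections import Counter

# Hypothetical raw records: (name, condition) -- the names are PII.
records = [
    ("Alice", "flu"),
    ("Bob", "flu"),
    ("Carol", "cancer"),
]

# Aggregation: drop the identities and release only statistics.
released = Counter(condition for _name, condition in records)
print(released)   # Counter({'flu': 2, 'cancer': 1})
```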

Slide5

Anonymization Example
Network trace:

PAYLOAD

5

Slide6

Anonymization Example
Network trace:

PAYLOAD

All sorts of PII and PPI in there!

6

Slide7

Anonymization Example
Network trace:

PAYLOAD

Routing information: IP addresses, TCP flags/options, OS fingerprinting

7

Slide8

Anonymization Example
Network trace:

PAYLOAD

Remove IPs? Anonymize IPs?

8

Slide9

Anonymization Example
Network trace:

PAYLOAD

Removing IPs severely limits what you can do with the data.

Replace with something identifying, but not the same data (see the sketch below).

IP1 = A

IP2 = B

Etc.

9
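A minimal Python sketch of this kind of consistent substitution. The class name and the A, B, AA labeling are illustrative choices, not the slides' tool; the point is only that the same address always maps to the same placeholder.

```python
import ipaddress
from itertools import count

class IPPseudonymizer:
    """Replace each distinct IP with a stable placeholder label (A, B, ..., AA, ...)."""

    def __init__(self):
        self._mapping = {}        # real IP -> pseudonym
        self._counter = count()   # drives label generation

    def _next_label(self):
        n = next(self._counter)
        label = ""
        while True:
            label = chr(ord("A") + n % 26) + label
            n = n // 26 - 1
            if n < 0:
                return label

    def pseudonymize(self, ip: str) -> str:
        ip = str(ipaddress.ip_address(ip))   # normalize, reject garbage
        if ip not in self._mapping:
            self._mapping[ip] = self._next_label()
        return self._mapping[ip]

anon = IPPseudonymizer()
print(anon.pseudonymize("192.0.2.10"))    # A
print(anon.pseudonymize("198.51.100.7"))  # B
print(anon.pseudonymize("192.0.2.10"))    # A again -- same host, same label
```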

Slide10

Aggregation Example
“Fewer U.S. Households Have Debt, But Those Who Do Have More, Census Bureau Reports”

10

Slide11

Methods Can Be Bad or Good
Just because someone uses aggregation or anonymization doesn't mean the data is safe
Example: release aggregate stats of people's favorite color?
11

Slide12

What Is Inferred?
Take 2 sources of information and correlate the data: X + Y = ...
Example: Google Street View + what my car looks like + where I live = you know where I was last year

12

Slide13

Another Example
Paula Broadwell, who had an affair with CIA director David Petraeus, similarly took extensive precautions to hide her identity. She never logged in to her anonymous e-mail service from her home network. Instead, she used hotel and other public networks when she e-mailed him. The FBI correlated hotel registration data from several different hotels -- and hers was the common name.

13

Slide14

Another Example: Netflix & IMDB
Netflix Prize: released an anonymized dataset
Correlated with IMDB: undid the anonymization (University of Texas)

14

Slide15

Designing Privacy-Preserving Systems
Aim for the minimum amount of information needed to achieve goals
Think through how info can be gained and inferred
Inference is often a gotcha! x + y = something private, but x and y by themselves don't seem all that special
Think through where information can be gained: on the wire? Stored in logs? At a router? At an ISP?

15

Slide16

Privacy and Stored Information
Data is only as safe as the system
How long the data is stored affects privacy
Longer term = bigger privacy risk (in general)
Longer time frame: more data to correlate & infer
Longer opportunity for data theft
Increased chances of mistakes, lapsed security, etc.

16

Slide17

Anonymized Data

Goal: release anonymized data
Remove identifying information, like name
Some diseases are still unique to one person

17

name            age    hosp. reason
Paul Smith      80     cancer
Jerry Goel      43     cancer
Marry Smith     32     flu
Amy Gilbert     21     flu
Theodore Tuck   74     gallbladder
Jennifer Dill   53     heart attack

Slide18

k-anonymity

OK to release data if a sensitive feature pertains to k or more people
Imagine k=2
18

name            age    hosp. reason
Paul Smith      80     cancer
Jerry Goel      43     cancer
Marry Smith     32     flu
Amy Gilbert     21     flu
Theodore Tuck   74     gallbladder
Jennifer Dill   53     heart attack

Slide19

k-anonymity

But there is only one person with age=80
If I were to observe my elderly neighbor go into that hospital, I can learn his condition from the anonymized data (see the sketch after the table)

19

name            age    hosp. reason
Paul Smith      80     cancer
Jerry Goel      43     cancer
Marry Smith     32     flu
Amy Gilbert     21     flu
Theodore Tuck   74     gallbladder
Jennifer Dill   53     heart attack
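A small Python sketch of this linkage problem, using the rows from the table above (the helper name is illustrative): every age appears only once, so the release is not even 2-anonymous with respect to age, and knowing a neighbor's age recovers his diagnosis.

```python
from collections import Counter

# Anonymized release from the slide: (age, hospitalization reason), names removed.
released = [
    (80, "cancer"),
    (43, "cancer"),
    (32, "flu"),
    (21, "flu"),
    (74, "gallbladder"),
    (53, "heart attack"),
]

def is_k_anonymous(rows, k, col):
    """True if every value in column `col` is shared by at least k rows."""
    counts = Counter(row[col] for row in rows)
    return all(c >= k for c in counts.values())

# Every age appears exactly once, so the release fails k=2 on age ...
print(is_k_anonymous(released, k=2, col=0))                  # False

# ... and an observer who knows a neighbor's age (80) recovers his condition.
print([reason for age, reason in released if age == 80])     # ['cancer']
```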

Slide20

k-anonymity

Anonymize age too
Good privacy, but I'm losing correlations in the data (see the sketch after the table)

20

name            age      hosp. reason
Paul Smith      20-80    cancer
Jerry Goel      20-80    cancer
Marry Smith     20-80    flu
Amy Gilbert     20-80    flu
Theodore Tuck   74       gallbladder
Jennifer Dill   53       heart attack
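A Python sketch of the generalization step. Here every age is folded into the single 20-80 range purely for illustration (the slide's table keeps the last two ages exact); the bucket choice is hypothetical.

```python
from collections import Counter

ages = [80, 43, 32, 21, 74, 53]   # exact ages from the table

def generalize(age, buckets=((20, 80),)):
    """Map an exact age onto a coarse range such as '20-80'."""
    for lo, hi in buckets:
        if lo <= age <= hi:
            return f"{lo}-{hi}"
    return "other"

coarse = [generalize(a) for a in ages]
print(Counter(coarse))   # Counter({'20-80': 6})
# Every row now shares the same age value, so age no longer singles anyone out,
# but any correlation between age and diagnosis is lost.
```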

Slide21

Differential Privacy

Allow queries on data
Add random noise to protect privacy
Amplitude of the noise ~ data distribution

21

name            age    hosp. reason
Paul Smith      80     cancer
Jerry Goel      43     cancer
Marry Smith     32     flu
Amy Gilbert     21     flu
Theodore Tuck   74     gallbladder
Jennifer Dill   53     heart attack

Slide22

Differential Privacy

22

The Laplace mechanism adds noise drawn from the Laplace distribution with scale parameter b = Δf/ε,
where Δf is the global sensitivity of the function f (the maximum change in f if any one row of the table is removed).
E.g., the sensitivity of a count is 1; the sensitivity of avg(age) in our table is 9.8.
A typical ε is 0.1
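A minimal sketch of the Laplace mechanism for a counting query, using numpy. The query and the ε value are illustrative; a count has global sensitivity 1, so the noise scale is 1/ε.

```python
import numpy as np

ages = np.array([80, 43, 32, 21, 74, 53])   # ages from the example table

def laplace_count(data, predicate, epsilon):
    """Answer 'how many rows satisfy predicate?' with Laplace noise.

    A count changes by at most 1 when any single row is removed,
    so its global sensitivity is 1 and the noise scale is 1/epsilon.
    """
    true_count = int(np.sum(predicate(data)))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# How many patients are older than 70? The true answer is 2;
# with epsilon = 0.1 the released answer is quite noisy.
print(laplace_count(ages, lambda a: a > 70, epsilon=0.1))
```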

Slide23

Differential Privacy

Current state of the art for privacy protection
Works well when you have a lot of data
Works well to learn about the average population, but not about outliers
Offers strong mathematical guarantees about privacy, not so much about utility
Adopted by all major companies: Microsoft, Apple, Google, Facebook

23