Presentation Transcript

Slide 1

The Price of Privacy and the Limits of LP Decoding
Kunal Talwar, MSR SVC
[Dwork, McSherry, Talwar, STOC 2007]


Slide 2: Teaser

Compressed Sensing: If x ∈ ℝ^N is k-sparse, take M ~ Ck log(N/k) random Gaussian measurements. Then L1 minimization recovers x.
For what k does this make sense (i.e. M < N)?
How small can C be?
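As a concrete illustration of the teaser, here is a minimal sketch of sparse recovery by L1 minimization, cast as a linear program with scipy; the sizes N, k, M and the seed are illustrative choices, not from the talk.

```python
# Recover a k-sparse x from M random Gaussian measurements by L1 minimization
# (min |x'|_1 s.t. Bx' = y), written as an LP over variables (x', u), u >= |x'|.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, k = 200, 5
M = 60                                # ~ C k log(N/k) measurements

x = np.zeros(N)
x[rng.choice(N, k, replace=False)] = rng.standard_normal(k)

B = rng.standard_normal((M, N))       # i.i.d. Gaussian measurement matrix
y = B @ x

# min sum(u)  s.t.  x' - u <= 0,  -x' - u <= 0,  Bx' = y
c = np.concatenate([np.zeros(N), np.ones(N)])
A_eq = np.hstack([B, np.zeros((M, N))])
I = np.eye(N)
A_ub = np.vstack([np.hstack([I, -I]), np.hstack([-I, -I])])
b_ub = np.zeros(2 * N)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
              bounds=[(None, None)] * N + [(0, None)] * N)
x_hat = res.x[:N]
print("recovery error:", np.linalg.norm(x_hat - x))   # ~0 when M is large enough
```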

Slide 3: Outline

- Privacy motivation
- Coding setting
- Results
- Proof sketch

Slide 4: Setting

- Database of information about individuals, e.g. medical history, census data, customer info.
- Need to guarantee confidentiality of individual entries.
- Want to make deductions about the database and learn large-scale trends, e.g. learn that drug V increases the likelihood of heart disease, while not leaking info about individual patients.
[Diagram: Curator answering queries from an Analyst]

Slide 5: Simple Model (easily justifiable)

- Database: n-bit binary vector x
- Query: vector a
- True answer: dot product a·x
- Response: a·x + e = True Answer + Noise
Blatant Non-Privacy: the attacker learns n − o(n) bits of x.

Theorem [Dinur and Nissim, 2003]: If all responses are within o(√n) of the true answer, then the algorithm is blatantly non-private, even against a polynomial-time adversary asking O(n log² n) random questions.
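A minimal sketch of a reconstruction attack in this model, assuming random ±1 queries and per-answer noise well below √n; it uses a least-squares-and-round step as a stand-in for the Dinur-Nissim linear program, and the sizes and noise bound are illustrative.

```python
# Ask q random dot-product questions, get noisy answers, reconstruct the bits.
import numpy as np

rng = np.random.default_rng(1)
n = 100
q = 8 * n                                  # number of random questions
x = rng.integers(0, 2, n)                  # secret n-bit database

A = rng.choice([-1.0, 1.0], size=(q, n))   # random query vectors
noise = rng.uniform(-2, 2, q)              # noise well below o(sqrt(n))
y = A @ x + noise                          # curator's noisy answers

# Reconstruct: least-squares candidate, then round each coordinate to a bit.
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
x_hat = (x_hat > 0.5).astype(int)
print("bits recovered:", (x_hat == x).mean())   # close to 1.0
```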

Slide 6: Implications

Privacy has a price:
- There is no safe way to avoid increasing the noise as the number of queries increases.
- This applies to the non-interactive setting: any non-interactive solution permitting answers that are "too accurate" to "too many" questions is vulnerable to the DiNi attack.

This work: what if most responses have small error, but some can be arbitrarily off?

Slide 7: Error-correcting codes: Model

- Real vector x ∈ ℝⁿ
- Matrix A ∈ ℝ^{m×n} with i.i.d. Gaussian entries
- Transmit codeword Ax ∈ ℝᵐ
- Channel corrupts the message; receive y = Ax + e
- Decoder must reconstruct x, assuming e has small support
- Small support: at most αm entries of e are non-zero
[Diagram: Encoder → Channel → Decoder]

Slide 8: The Decoding Problem

min support(e′) such that y = Ax′ + e′, x′ ∈ ℝⁿ
Solving this would give the original message x.

min |e′|₁ such that y = Ax′ + e′, x′ ∈ ℝⁿ
This is a linear program, solvable in polynomial time.
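Here is a minimal sketch of the ℓ₁ relaxation as an explicit LP in scipy, assuming i.i.d. Gaussian A and wild errors on an α fraction of entries; the sizes, error rate, and error magnitudes are illustrative.

```python
# LP decoding: min |e'|_1 s.t. y = Ax' + e', via variables (x', u), u >= |y - Ax'|.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
n, m = 50, 200                       # m = 4n, as in the Donoho/CRTV setting
alpha = 0.1                          # fraction of corrupted entries

A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
e = np.zeros(m)
bad = rng.choice(m, int(alpha * m), replace=False)
e[bad] = rng.uniform(-100, 100, bad.size)    # wild errors on alpha*m entries
y = A @ x + e

# min sum(u)  s.t.  Ax' - u <= y  and  -Ax' - u <= -y
c = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.vstack([np.hstack([ A, -np.eye(m)]),
                  np.hstack([-A, -np.eye(m)])])
b_ub = np.concatenate([y, -y])
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * n + [(0, None)] * m)
x_hat = res.x[:n]
print("decoding error:", np.linalg.norm(x_hat - x))   # ~0 below the threshold
```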

Slide 9: LP decoding works

Theorem [Donoho / Candès-Rudelson-Tao-Vershynin]: For an error rate α < 1/2000, LP decoding succeeds in recovering x (for m = 4n).

This talk: how large an error rate α can LP decoding tolerate?

Slide 10: Results

Let α* = 0.2390318914495168038956510438285657…

Theorem 1: For any α < α*, there exists c such that if A has i.i.d. Gaussian entries and m = cn rows, and if, for k = αm, the error e is within ℓ₁ distance δ of some vector e_k of support k (|e − e_k|₁ < δ), then LP decoding reconstructs x′ with |x′ − x|₂ = O(δ/√n).

Theorem 2: For any α > α*, LP decoding can be made to fail, even if m grows arbitrarily.

Slide 11: Results (continued)

In the privacy setting: suppose, for α < α*, the curator
- answers a (1 − α) fraction of questions within error o(√n), and
- answers an α fraction of the questions arbitrarily.
Then the curator is blatantly non-private.

Theorem 3: Similar LP decoding results hold when the entries of A are randomly chosen from ±1.

The attack works in the non-interactive setting as well, and also leads to error-correcting codes over finite alphabets.

Slide 12: In compressed sensing lingo

Theorem 1: For any α < α*, there exists c such that if B has i.i.d. Gaussian entries and M = (1 − c)N rows, then for k = αm, for any vector x ∈ ℝ^N, given Bx, LP decoding reconstructs x, with an error bound analogous to Theorem 1.

Slide 13: Rest of Talk

Let α* = 0.2390318914495168038956510438285657…

Theorem 1 (δ = 0): For any α < α*, there exists c such that if A has i.i.d. Gaussian entries with m = cn rows, and if the error vector e has support at most αm, then LP decoding exactly reconstructs x.

Proof sketch…

Slide 14: Scale and translation invariance

LP decoding is scale and translation invariant. Thus, without loss of generality, transmit x = 0, so the received word is y = Ax + e = e. If the decoder reconstructs some z ≠ 0, rescale so that |z|₂ = 1. Call such a z bad for A.
[Diagram: codewords Ax, Ax′ and the received vector y]

Slide 15: Proof Outline

- Any fixed z is very unlikely to be bad for A: Pr[z bad] ≤ exp(−cm).
- A net argument extends this to all of ℝⁿ: Pr[∃ bad z] ≤ exp(−c′m).
- Thus, with high probability, A is such that LP decoding never fails.

Slide 16: Suppose z is bad

z bad: |Az − e|₁ < |A·0 − e|₁, i.e. |Az − e|₁ < |e|₁.
Let e have support T. Without loss of generality (the adversary can choose e), take e|_T = Az|_T.
Thus z bad ⇒ |Az|_{T^c} < |Az|_T ⇒ |Az|_T > ½ |Az|₁.
[Figure: y = e = (e₁, …, e_m), supported on T and zero on T^c, alongside Az = (a₁·z, …, a_m·z)]

Slide 17: Suppose z is bad…

A i.i.d. Gaussian ⇒ each entry of Az is an i.i.d. Gaussian. Let W = Az; its entries W₁, …, W_m are i.i.d. Gaussians.
z bad ⇒ Σ_{i∈T} |W_i| > ½ Σ_i |W_i|. Recall |T| ≤ αm.
Define S_α(W) to be the sum of the magnitudes of the top α fraction of entries of W.
Thus z bad ⇒ S_α(W) > ½ S₁(W): few Gaussians with a lot of mass!

Slide 18: Defining α*

Let us look at E[S_α]. Let w* be the threshold at which the expected ℓ₁ mass of the entries of magnitude at least w* is half the total, and let α* = Pr[|W| ≥ w*]. Then E[S_{α*}] = ½ E[S₁].
Moreover, for any α < α*, E[S_α] ≤ (½ − δ) E[S₁] for some δ > 0.
[Figure: Gaussian density with the threshold w* marked; the tail mass above w* accounts for E[S_{α*}] = ½ E[S₁]]
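The constant can be checked in closed form: for a standard Gaussian W, E[|W|·1{|W| ≥ w}] = √(2/π)·e^{−w²/2}, so the half-mass condition gives w* = √(2 ln 2) and α* = Pr[|W| ≥ w*] = erfc(√(ln 2)). A short verification sketch:

```python
# Numerically confirm the threshold alpha* for a standard Gaussian W.
import math

w_star = math.sqrt(2 * math.log(2))              # half-mass threshold
alpha_star = math.erfc(math.sqrt(math.log(2)))   # Pr[|W| >= w*]
print(w_star)       # ~1.17741
print(alpha_star)   # 0.23903189144951685..., matching the constant above
```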

Slide 19: Concentration of measure

S_α depends on many independent Gaussians. The Gaussian isoperimetric inequality implies that, with high probability, S_α(W) is close to E[S_α]; S₁ is similarly concentrated. Thus Pr[z is bad] ≤ exp(−cm).
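A small simulation of this step, with illustrative m, α, and trial counts: for α below α*, the ratio S_α(W)/S₁(W) concentrates well below ½, so a fixed z is essentially never bad.

```python
# Empirically: S_alpha(W)/S_1(W) clusters tightly below 1/2 when alpha < alpha*.
import numpy as np

rng = np.random.default_rng(3)
m, alpha, trials = 2000, 0.15, 200
ratios = []
for _ in range(trials):
    W = np.abs(rng.standard_normal(m))
    top = np.sort(W)[::-1][: int(alpha * m)]   # top alpha fraction by magnitude
    ratios.append(top.sum() / W.sum())         # S_alpha(W) / S_1(W)
ratios = np.array(ratios)
print(ratios.mean(), ratios.std())             # mean well below 0.5, tiny spread
print((ratios > 0.5).mean())                   # empirical Pr[bad] = 0.0 here
```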

Slide 20: Proof wrap-up

- Any fixed z is very unlikely to be bad for A: Pr[z bad] ≤ exp(−cm).
- Union bound over a dense net of the unit ball in ℝⁿ (net size exp(c′n)): Pr[∃ bad z in the net] ≤ exp(−c′′m).
- A continuity-type argument shows that no z at all is bad.

Slide 21: Beyond α*

Now E[S_α] > (½ + δ) E[S₁]. A similar measure-concentration argument shows that any z is bad with high probability. Thus LP decoding fails w.h.p. beyond α*.
The Donoho/CRTV experiments used a random error model.

Slide 22: Teaser (revisited)

Compressed Sensing: If x ∈ ℝ^N is k-sparse, take M ~ Ck log(N/k) random Gaussian measurements. Then L1 minimization recovers x.
For what k does this make sense (i.e. M < N)? k < α*·N ≈ 0.239 N.
How small can C be? C > (α* log(1/α*))^{−1} ≈ 2.02.

Slide 23: Summary

- Tight threshold for Gaussian LP decoding.
- To preserve privacy: lots of error in lots of answers.
- Similar results hold for +1/−1 queries.
- Inefficient attacks can go much further: correct a (½ − δ) fraction of wild errors, or a (1 − δ) fraction of wild errors in the list-decoding sense.
- Efficient versions of these attacks? Dwork-Yekhanin: (½ − δ) using AG codes.

Slide 24: Setting, formally

- Database: a vector d ∈ D^N
- Mechanism: M : D^N → R
- Evaluating M(d) should not reveal specific info about tuples in d
[Diagram: Curator answering queries from an Analyst]

Slide 25: ε-Differential Privacy

When ε is small: for d, d′ ∈ D^N differing on one input, and any S ⊆ R,
Pr[M(d) ∈ S] ≤ (1 ± ε) × Pr[M(d′) ∈ S].
Probabilities are taken over the coins flipped by the curator: independent of other sources of data, other databases, or even knowledge of every other input in d.
"Anything, good or bad, is essentially equally likely to occur, whether I join the database or not."
Generalizes to groups of respondents, although if the group is large, then outcomes should differ.

Slide 26: Why Differential Privacy?

Dalenius' Goal: "Anything that can be learned about a respondent, given access to the statistical database, can be learned without access" is provably unachievable.
- Sam the smoker tries to buy medical insurance.
- The statistical DB teaches that smoking causes cancer.
- Sam is harmed: high premiums for medical insurance.
- Sam need not be in the database!
Differential privacy guarantees that the risk to Sam will not noticeably increase if he enters the DB. DBs have intrinsic social value.

Slide 27: An Ad Omnia Guarantee

- No perceptible risk is incurred by joining the data set.
- Anything the adversary can do to Sam, it could do even if his data were not in the data set.
[Figure: distribution Pr[r] over outcomes r, with the bad r's marked]

Slide 28: Achieving Differential Privacy

Suppose the analyst is interested in a counting query f(d) = Σ_i P[d_i] for some predicate P.
Example: P = 1 iff d_i smokes and has cancer.
The curator adds noise: scaled symmetric noise ~ Lap(s) with s = 1/ε.
[Figure: Laplace density p(x) ∝ exp(−|x|/s)]

Slide 29: Achieving Differential Privacy (continued)

The same counting query, on two adjacent databases: the true answers differ by at most 1, so the output densities p(x) ∝ exp(−|x|/s) and p(x) ∝ exp(−|x−1|/s) differ pointwise by a factor of at most e^{1/s} = e^ε.
[Figure: the two overlapping Laplace densities]
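A minimal sketch of this mechanism, with an illustrative 0/1 column has_condition and privacy parameter eps (both names are mine, not the talk's):

```python
# Laplace mechanism for a counting query: true count + Lap(1/eps) noise.
import numpy as np

rng = np.random.default_rng(4)

def private_count(bits, eps):
    """Counting query f(d) = sum_i P[d_i], sensitivity 1, with Lap(1/eps) noise."""
    true_count = int(np.sum(bits))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / eps)

has_condition = rng.integers(0, 2, 10_000)      # toy database of 0/1 bits
print(private_count(has_condition, eps=0.1))    # true count +/- ~10 typically
```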

Slide 30: Achieving Differential Privacy

For a general query f : D^N → ℝ^k, let
Δf = max_{d, d′ : |d − d′| = 1} |f(d) − f(d′)|₁.
Example: a histogram f has sensitivity Δf = 1.
The curator adds symmetric multidimensional noise ~ Lap(s)^k with s = Δf/ε.
Theorem: This gives ε-differential privacy.
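A minimal sketch of the multidimensional version for the histogram example (sensitivity Δf = 1); the data and bucket choices are illustrative:

```python
# Vector Laplace mechanism: i.i.d. Lap(Delta_f/eps) noise on each histogram bucket.
import numpy as np

rng = np.random.default_rng(5)

def private_histogram(values, bins, eps, delta_f=1.0):
    """Release a histogram with independent Laplace(delta_f/eps) noise per bucket."""
    counts, _ = np.histogram(values, bins=bins)
    return counts + rng.laplace(scale=delta_f / eps, size=counts.size)

ages = rng.integers(0, 100, 5000)
print(private_histogram(ages, bins=10, eps=0.5))
```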

Slide 31: Achieving Differential Privacy

- This allows fairly accurate reporting of insensitive functions.
- When asking e.g. independent counting questions, the noise grows linearly in the number of questions.
- Lots of algorithms/analyses can be written or rephrased to use a sequence of insensitive questions to the database: means/variances/covariances, the EM algorithm for k-means, PCA, a set of low-dimensional marginals.