Differential Privacy in the Streaming World

Uploaded On 2016-07-21



Presentation Transcript


Differential Privacy in the Streaming World

Aleksandar (Sasho) Nikolov
Rutgers University

The Streaming Model

Underlying frequency vector $A = (A[1], \dots, A[n])$; start with $A[i] = 0$ for all $i$.
We observe an online sequence of updates:
  - Increments only (cash register): update $i_t$ sets $A[i_t] := A[i_t] + 1$
  - Fully dynamic (turnstile): update $(i_t, \pm 1)$ sets $A[i_t] := A[i_t] \pm 1$
Requirements: compute statistics on $A$
  - Online, $O(1)$ passes over the updates
  - Sublinear space, $\mathrm{polylog}(n, m)$, where $m$ is the number of updates
Example stream of updates: 1, 4, 5, 19, 145, 14, 5, 5, 16, 4
Turnstile signs for the same updates: +, -, +, -, +, +, -, +, -, +
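
To make the two update models concrete, here is a minimal Python sketch that replays the example stream under both of them. The helper name `apply_updates` is hypothetical. Pairing the signs with the indices position by position is our reading of the slide's two rows. The vector is stored explicitly, which takes linear space, so this only illustrates the semantics that streaming algorithms must approximate.

```python
from collections import defaultdict


def apply_updates(updates):
    """Apply (index, delta) updates to a frequency vector A stored sparsely.

    Indices that never appear are implicitly 0, matching A[i] = 0 at the start.
    """
    A = defaultdict(int)
    for i, delta in updates:
        A[i] += delta
    return A


stream = [1, 4, 5, 19, 145, 14, 5, 5, 16, 4]
signs = [+1, -1, +1, -1, +1, +1, -1, +1, -1, +1]

# Cash register model: every update is an increment.
cash_register = apply_updates((i, 1) for i in stream)
# Turnstile model: each update carries its own sign.
turnstile = apply_updates(zip(stream, signs))
```

Under this reading, item 4 ends at 0 in the turnstile model and item 19 ends at -1. The sketch does not enforce nonnegative counts.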

Typical Problems

Frequency moments: $F_k = |A[1]|^k + \dots + |A[n]|^k$
  - related: $L_p$ norms
Distinct elements: $F_0 = \#\{i : A[i] \neq 0\}$
k-Heavy Hitters: output all $i$ such that $A[i] \geq F_1/k$
Median: smallest $i$ such that $A[1] + \dots + A[i] \geq F_1/2$
  - generalizes to quantiles
Different models:
  - Graph problems: a stream of edges, increments or dynamic; matchings, connectivity, triangle count
  - Geometric problems: a stream of points; various clustering problems
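
It helps to pin these definitions down before looking at any streaming algorithm. The sketch below computes them exactly from a fully stored frequency vector. That takes linear space, which is precisely what streaming algorithms avoid. The function names are hypothetical, and the median assumes nonnegative counts.

```python
from collections import Counter


def frequency_moment(A, k):
    """F_k = sum_i |A[i]|^k; for k = 0, count the nonzero entries (F_0)."""
    if k == 0:
        return sum(1 for a in A.values() if a != 0)
    return sum(abs(a) ** k for a in A.values())


def heavy_hitters(A, k):
    """All i with A[i] >= F_1 / k."""
    f1 = frequency_moment(A, 1)
    return sorted(i for i, a in A.items() if a >= f1 / k)


def median_index(A):
    """Smallest i with A[1] + ... + A[i] >= F_1 / 2 (assumes nonnegative counts)."""
    half = frequency_moment(A, 1) / 2
    prefix = 0
    for i in sorted(A):
        prefix += A[i]
        if prefix >= half:
            return i
    return None  # only reached when A is empty


# Frequency vector of the cash-register stream 1, 4, 5, 19, 145, 14, 5, 5, 16, 4.
A = Counter([1, 4, 5, 19, 145, 14, 5, 5, 16, 4])
```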

When do we need this?

The universe size $n$ is huge.
Fast arriving stream of updates:
  - IP traffic monitoring
  - Web searches, tweets
Large unstructured data, external storage:
  - multiple passes make sense
Streaming algorithms can provide a first rough approximation:
  - decide whether and when to analyze more
  - fine-tune a more expensive solution
Or they can be the only feasible solution.

Outline

Introduction to small space streaming

Small space & differential privacy

Privacy under continual observation

Pan-privacy

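A taste: the AMS sketch for $F_2$

[Alon Matias Szegedy 96]
$h : [n] \to \{\pm 1\}$ is 4-wise independent.
Keep a single counter $X$: an update to $i_t$ adds $h(i_t)$ to $X$ (or subtracts it, for a turnstile decrement), so $X = h(1)A[1] + \dots + h(n)A[n]$.
[Figure: the counter $X$ accumulating $h(i_1), h(i_2), h(i_3), h(i_4)$, each equal to $\pm 1$.]
$E[X^2] = F_2$
$E[X^4]^{1/2} \leq O(F_2)$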
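
Below is a minimal Python sketch of one AMS counter. It assumes a random cubic polynomial over the prime field GF(2^61 - 1) as the 4-wise independent family, and uses the parity of the polynomial's value as the $\pm 1$ sign. That mapping introduces a bias of order $1/P$, which we ignore. The class name and these choices are ours, not the slide's.

```python
import random

P = (1 << 61) - 1  # Mersenne prime; hash values live in GF(P)


class AMSCounter:
    """One AMS counter X = sum_i h(i) * A[i] with a 4-wise independent sign hash h."""

    def __init__(self, rng):
        # A uniformly random degree-3 polynomial over GF(P) is 4-wise independent.
        self.coeffs = [rng.randrange(P) for _ in range(4)]
        self.X = 0

    def sign(self, i):
        v = 0
        for c in self.coeffs:  # Horner's rule
            v = (v * i + c) % P
        return 1 if v & 1 else -1  # parity -> +1/-1 (bias of order 1/P)

    def update(self, i, delta=1):
        """Cash register: delta = 1; turnstile: delta = +1 or -1."""
        self.X += delta * self.sign(i)

    def estimate(self):
        return self.X ** 2  # E[X^2] = F_2 (up to the tiny hash bias)
```

A single counter has large variance. The median-of-averages trick combines many independent counters to get a $(1 \pm \alpha)$ estimate with probability $1 - \delta$.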

The Median of Averages Trick

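[Figure: a grid of independent copies $X_{11}, \dots, X_{54}$ of the estimator. Each row $j$ is averaged into $X_j$, and the output $X$ is the median of $X_1, \dots, X_5$. Each row holds $1/\alpha^2$ copies and there are $\ln(1/\delta)$ rows (up to constant factors).]
Average: multiplies the variance by $\alpha^2$.
Median: reduces the probability of a large error to $\delta$.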
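
The trick can be sketched generically in Python for any unbiased estimator that can be sampled independently. The constants `c_avg` and `c_med` are illustrative assumptions; a real analysis picks them from Chebyshev's inequality (for the averages) and a Chernoff bound (for the median). The function name is hypothetical.

```python
import math
import random
import statistics


def median_of_averages(sample, alpha, delta, c_avg=1.0, c_med=1.0):
    """Combine independent copies of an unbiased estimator.

    sample() returns one independent copy. Each row averages about c_avg / alpha^2
    copies; the output is the median of about c_med * ln(1/delta) row averages.
    """
    cols = max(1, math.ceil(c_avg / alpha ** 2))
    rows = max(1, math.ceil(c_med * math.log(1 / delta)))
    averages = [sum(sample() for _ in range(cols)) / cols for _ in range(rows)]
    return statistics.median(averages)


# Usage: a heavy-tailed unbiased estimator of 18 (exponential with mean 18).
rng = random.Random(1)
estimate = median_of_averages(lambda: rng.expovariate(1 / 18), alpha=0.1, delta=0.01)
```

For the AMS sketch, each call to `sample` would be a fresh counter's $X^2$. All counters must be fed the same stream.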

Outline

Introduction to small space streaming

Small space & differential privacy

Privacy under continual observation

Pan-privacy

Defining Privacy for Streams

We will use differential privacy.
The database is represented by a stream:
  - online stream of transactions
  - offline large unstructured database
Need to define neighboring inputs:
  - Event level privacy: differ in a single update
    1, 4, 5, 19, 145, 14, 5, 5, 16, 4
    1, 1, 5, 19, 145, 14, 5, 5, 16, 4
  - User level privacy: replace some updates to i with updates to j
    1, 4, 5, 19, 145, 14, 5, 5, 16, 4
    1, 4, 3, 19, 145, 14, 3, 5, 16, 4
  - We also allow the changed updates to be placed somewhere else
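
The two neighbor relations can be stated as small Python predicates. This is a sketch under a simplifying assumption: it only handles in-place changes to equal-length streams, and it ignores the extra allowance of moving the changed updates elsewhere. The function names are hypothetical.

```python
def differing_positions(s, t):
    """Positions where two equal-length streams disagree."""
    return [p for p, (a, b) in enumerate(zip(s, t)) if a != b]


def is_event_level_neighbor(s, t):
    """Event level: the streams differ in exactly one update."""
    return len(s) == len(t) and len(differing_positions(s, t)) == 1


def is_user_level_neighbor(s, t):
    """User level: some updates to one item i became updates to one item j, in place."""
    if len(s) != len(t):
        return False
    diff = differing_positions(s, t)
    return len(diff) > 0 and len({(s[p], t[p]) for p in diff}) == 1


original = [1, 4, 5, 19, 145, 14, 5, 5, 16, 4]
event_neighbor = [1, 1, 5, 19, 145, 14, 5, 5, 16, 4]
user_neighbor = [1, 4, 3, 19, 145, 14, 3, 5, 16, 4]
```

Every event-level neighbor is also a user-level neighbor. User-level privacy protects a larger set of neighboring pairs and is therefore the stronger guarantee.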

Streaming & DP?

Large unstructured database of transactions
Estimate how many distinct users initiated transactions?
  - i.e. $F_0$ estimation
Can we satisfy both the streaming and privacy constraints?
  - $F_0$ has sensitivity 1 (under user privacy)
  - Computing $F_0$ exactly takes $\Omega(n)$ space
  - Classic sketches from streaming may have large sensitivity

Oblivious Sketch

Flajolet and Martin [FM 85] show a sketch $f(S)$:
  - $O(\log n)$ bits of storage
  - $F_0/2 \leq f(S) \leq 2F_0$ with constant probability
  - Obliviousness: the distribution of $f(S)$ is entirely determined by $F_0$ (similar to functional privacy [Feigenbaum Ishai Malkin Nissim Strauss Wright 01])
Why it helps:
  - Pick noise $\eta$ from discretized $\mathrm{Lap}(1/\varepsilon)$
  - Create a new stream $S'$ to feed to $f$: if $\eta < 0$, ignore the first $|\eta|$ distinct elements; if $\eta > 0$, insert elements $n+1, \dots, n+\eta$
  - The distribution of $f(S')$ is a function of $\max\{F_0 + \eta, 0\}$: $\varepsilon$-DP (user level)
Error: $F_0/2 - O(1/\varepsilon) \leq f(S') \leq 2F_0 + O(1/\varepsilon)$
Space: $O(1/\varepsilon + \log n)$; can make it $O(\log n)$ w.h.p. by first inserting $O(1/\varepsilon)$ elements
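
Here is a Python sketch of the noisy-stream construction under several stated assumptions. The sketch `f` is the simple max-trailing-zeros variant of Flajolet-Martin, which is only accurate to a constant factor with constant probability. Keyed BLAKE2b stands in for a truly random hash; with a random hash, duplicates hash identically, so the output distribution depends only on the number of distinct items. Discretized Laplace noise is drawn as a two-sided geometric. The pre-insertion trick from the last line is not implemented, and all function names are hypothetical. The code needs Python 3.9+ for `randbytes`.

```python
import hashlib
import math
import random


def fm_sketch(stream, key):
    """Return 2^R, where R is the largest number of trailing zero bits among hashed items."""
    R = 0
    for x in stream:
        digest = hashlib.blake2b(str(x).encode(), digest_size=8, key=key).digest()
        h = int.from_bytes(digest, "big")
        zeros = (h & -h).bit_length() - 1 if h else 64
        R = max(R, zeros)
    return 2 ** R


def discrete_laplace(eps, rng):
    """Two-sided geometric noise: P(eta = k) is proportional to exp(-eps * |k|)."""
    q = math.exp(-eps)

    def geometric():
        # Number of failures before the first success, success probability 1 - q.
        return math.floor(math.log(1.0 - rng.random()) / math.log(q))

    return geometric() - geometric()


def private_distinct_count(stream, eps, n, rng):
    """Feed the sketch the modified stream S' built from noise eta, as on the slide."""
    key = rng.randbytes(16)
    eta = discrete_laplace(eps, rng)

    def modified_stream():
        skipped = set()  # the first |eta| distinct elements: the O(1/eps) space term
        for x in stream:
            if eta < 0 and (x in skipped or len(skipped) < -eta):
                skipped.add(x)
                continue
            yield x
        yield from range(n + 1, n + max(eta, 0) + 1)  # fresh elements outside [n]

    return fm_sketch(modified_stream(), key)


estimate = private_distinct_count(range(1000), eps=1.0, n=1000, rng=random.Random(0))
```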

Open Problems

When can a streaming estimate of a low-sensitivity function be computed privately, in small space?
  - Does privacy & small space ever require more error than either constraint alone?
Can we go beyond low sensitivity, and local sensitivity?
  - $F_2$ has high sensitivity and high local sensitivity
  - Lipschitz extensions [Kasiviswanathan Nissim Raskhodnikova Smith 13] relevant?
What can we say about graph problems, clustering problems?
  - Private coresets [Feldman Fiat Kaplan Nissim 09]

Outline

Introduction to small space streaming

Small space & differential privacy

Privacy under continual observation

Pan-privacy

Continual Observation

In an online stream, we often need to track the value of a statistic:
  - number of reported instances of a viral infection
  - sales over time
  - number of likes on Facebook
Privacy under continual observation [Dwork Naor Pitassi Rothblum 10]:
  - At each time step the algorithm outputs the value of the statistic
  - The entire sequence of outputs is $\varepsilon$-DP (usually event level)
Results:
  - A single counter (number of 1's in a bit stream) [DNPR10]
  - Time-decayed counters [Bolot Fawaz Muthukrishnan Nikolov Taft 13]
  - Online learning [DNPR10] [Jain Kothari Thakurta 12] [Smith Thakurta 13]
  - Generic transformation for monotone algorithms [DNPR10]

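Binary Tree Technique [DNPR10], [Chan Shi Song 10]

[Figure: a binary tree over the bit stream 1 0 1 1 1 0 0 1. Each internal node stores the sum of its children: 1+0, 1+1, 1+0, 0+1 on the first level, 1+2 and 1+1 on the second, and 3+2 at the root.]
Sensitivity of the tree: $\log m$ (each bit contributes to one node per level)
Add $\mathrm{Lap}(\log m/\varepsilon)$ to each node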
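
The tree construction can be sketched offline in Python, assuming $m$ is a power of two. We count the leaf level too, so each bit touches $\log_2 m + 1$ nodes and the sketch uses noise scale $(\log_2 m + 1)/\varepsilon$. That matches the slide's $\log m/\varepsilon$ up to the additive 1. Laplace noise is drawn as the difference of two exponentials. The function names are hypothetical.

```python
import random


def laplace(scale, rng):
    """Lap(scale) as the difference of two independent exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def tree_sums(bits):
    """levels[0] are the leaves; levels[j][k] is the sum of the k-th block of 2^j bits."""
    assert len(bits) & (len(bits) - 1) == 0, "assume m is a power of two"
    levels = [list(bits)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[2 * k] + prev[2 * k + 1] for k in range(len(prev) // 2)])
    return levels


def noisy_tree(bits, eps, rng):
    levels = tree_sums(bits)
    # One bit touches one node per level, so the L1 sensitivity is len(levels).
    scale = len(levels) / eps
    return [[s + laplace(scale, rng) for s in level] for level in levels]


noisy = noisy_tree([1, 0, 1, 1, 1, 0, 0, 1], eps=1.0, rng=random.Random(0))
```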

Binary Tree Technique

[Figure: the same tree over 1 0 1 1 1 0 0 1, with nodes 1+0, 1+1, 1+0, 0+1; 1+2, 1+1; 3+2.]
Each prefix: sum of $O(\log m)$ nodes
  - $\Rightarrow$ $\mathrm{polylog}(m)$ error per query
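
An online version can also be sketched in Python. Each node's noisy sum is released once, when its block of bits is complete. The prefix count at time $t$ sums the nodes given by the binary expansion of $t$, so at most $\log_2 m + 1$ noisy values per query. The class and function names are hypothetical. Pruning released children once their parent is complete is our own design choice to keep memory at $O(\log m)$.

```python
import random


def laplace(scale, rng):
    """Lap(scale) as the difference of two independent exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dyadic_prefix_nodes(t):
    """Nodes (level, index) whose blocks of 2^level bits exactly tile bits [0, t)."""
    nodes, start = [], 0
    for level in reversed(range(t.bit_length())):
        if (t >> level) & 1:
            nodes.append((level, start >> level))
            start += 1 << level
    return nodes


class BinaryTreeCounter:
    """Online counter over at most m = 2**depth bits; outputs a noisy prefix count per step."""

    def __init__(self, depth, eps, rng):
        self.depth, self.rng, self.t = depth, rng, 0
        self.scale = (depth + 1) / eps  # each bit touches one node on each of depth + 1 levels
        self.partial = {}  # (level, index) -> exact sum of a node still being filled
        self.noisy = {}    # (level, index) -> released noisy sum of a completed node

    def add(self, bit):
        if self.t >= 1 << self.depth:
            raise ValueError("stream longer than 2**depth")
        for level in range(self.depth + 1):
            node = (level, self.t >> level)
            self.partial[node] = self.partial.get(node, 0) + bit
        self.t += 1
        for level in range(self.depth + 1):
            if self.t % (1 << level) == 0:  # this level's node just filled up
                index = (self.t >> level) - 1
                exact = self.partial.pop((level, index))
                self.noisy[(level, index)] = exact + laplace(self.scale, self.rng)
                if level > 0:  # children are never queried again once the parent exists
                    self.noisy.pop((level - 1, 2 * index), None)
                    self.noisy.pop((level - 1, 2 * index + 1), None)
        return self.query()

    def query(self):
        return sum(self.noisy[node] for node in dyadic_prefix_nodes(self.t))


counter = BinaryTreeCounter(depth=3, eps=1.0, rng=random.Random(0))
noisy_prefix_counts = [counter.add(bit) for bit in [1, 0, 1, 1, 1, 0, 0, 1]]
```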

Open Problems

What is the optimal error possible for the counter problem?

Privacy under continual observation for statistics that are not easily decomposable?

User level?
Expect privacy under continual observation to be ever more relevant:
  - we usually want to track our statistics over time
  - Work on it!

Outline

Introduction to small space streaming

Small space & differential privacy

Privacy under continual observation

Pan-privacy

Pan Privacy

Differential privacy guarantees that the results of our computation are private.
What if the data is requested by subpoena, leaked after a security breach, or looked at by an unauthorized employee?
Can we guarantee that intermediate states are also private?
  - Makes sense for online data: not stored
Pan-privacy [Dwork Naor Pitassi Rothblum Yekhanin 10]:
  - For each $t$: the state of the algorithm after processing the $t$-th update and the final output are jointly $\varepsilon$-DP
  - Can be event level or user level
Strategy: keep private statistics on top of sketches

Warm-up: $F_0$ [DNPRY10]

Solution: randomized response
Two distributions $D_0$ and $D_1$ on $\{-1, 1\}$:
  - $D_0$ is 1 w.p. $1/2$; $D_1$ is 1 w.p. $(1 + \varepsilon)/2$
Store a big table $X[1], \dots, X[n]$:
  - Initialize all $X[i]$ from $D_0$
  - When update $i_t$ arrives, redraw $X[i_t]$ from $D_1$
Can compute an $O(n^{1/2}/\varepsilon)$ additive approximation:
  - $X = (X[1] + \dots + X[n])/\varepsilon$
  - $E[X] = F_0$ and $\mathrm{Var}[X] \leq n/\varepsilon^2$
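
The table and estimator can be sketched in Python. The state is the table itself: each cell is a single draw from $D_0$ or $D_1$ and never records how often an item appeared. This sketch implements only the table and estimator from the slide. It does not attempt any extra output noise the full pan-private construction may need, and the class name is hypothetical.

```python
import random


class PanPrivateDistinctCount:
    """Randomized-response table for F_0, following the slide's warm-up."""

    def __init__(self, n, eps, rng):
        self.n, self.eps, self.rng = n, eps, rng
        # Every cell starts as a draw from D_0 (+1 w.p. 1/2). Index 0 is unused, so items are 1..n.
        self.X = [0] + [self._draw(0.5) for _ in range(n)]

    def _draw(self, p_one):
        return 1 if self.rng.random() < p_one else -1

    def update(self, i):
        # Redraw the cell from D_1 (+1 w.p. (1 + eps)/2); repeated updates look the same.
        self.X[i] = self._draw((1 + self.eps) / 2)

    def estimate(self):
        # Each touched cell has mean eps, each untouched cell mean 0: E[estimate] = F_0.
        return sum(self.X) / self.eps


rng = random.Random(0)
sketch = PanPrivateDistinctCount(n=100_000, eps=0.5, rng=rng)
for item in range(1, 20_001):  # 20,000 distinct items, each seen twice
    sketch.update(item)
    sketch.update(item)
estimate = sketch.estimate()
```

Here the standard deviation is at most $\sqrt{n}/\varepsilon \approx 632$. The error is therefore small relative to $F_0 = 20{,}000$, but it would swamp a small $F_0$.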

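Cropped $F_1$ [Mir Muthukrishnan Nikolov Wright 11]

Cropped moments: $F_k(\tau) = |\min\{A[1], \tau\}|^k + |\min\{A[2], \tau\}|^k + \dots + |\min\{A[n], \tau\}|^k$
We'll be interested in $F_1(\tau)$
Can pan-privately compute $X$ s.t. $F_1(\tau)/2 - O(\tau n^{1/2}/\varepsilon) \leq X \leq F_1(\tau) + O(\tau n^{1/2}/\varepsilon)$
Idea: keep each $A[i] \bmod \tau$, with initial noise
  - What if $A[i] = \tau + 1$? (Then $A[i] \bmod \tau = 1$, although $\min\{A[i], \tau\} = \tau$.)
  - Multiply each $A[i]$ by a random $c_i$ uniform in $[1, 2]$
  - Small $A[i]$ ($\leq \tau/2$) get distorted by at most a factor of 2
  - For large $A[i]$, $c_i A[i] \bmod \tau$ is large on average
  - Range is $\tau$, so noise $O(\tau/\varepsilon)$ per modular counter suffices
[Figure: $A[i]$ and $c_i A[i]$ on a line marked $0$, $\tau$, $2\tau$.]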
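
The two facts behind the modular idea can be checked numerically in Python. This is not the full pan-private estimator, whose details are not given on the slide. It only computes $F_k(\tau)$ exactly and shows the effect of the random multiplier $c_i$. The function names and the choice $\tau = 100$ are illustrative.

```python
import random


def cropped_moment(A, k, tau):
    """F_k(tau) = sum_i |min(A[i], tau)|^k, computed exactly (non-private, linear space)."""
    return sum(abs(min(a, tau)) ** k for a in A)


def scaled_residue(a, tau, rng):
    """c * a mod tau with c drawn uniformly from [1, 2], as on the slide."""
    c = rng.uniform(1.0, 2.0)
    return (c * a) % tau


rng = random.Random(0)
tau = 100
plain_wraparound = (tau + 1) % tau  # without scaling, a count of tau + 1 looks tiny
average_large = sum(scaled_residue(tau + 1, tau, rng) for _ in range(10_000)) / 10_000
```

With $\tau = 100$, the unscaled residue of $A[i] = 101$ is 1. After scaling, the average residue is close to $\tau/2$. A small count such as 30 maps into $[30, 60]$, within a factor of 2 of its true value.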

Heavy Hitters [DNPRY10] [MMNW11]

Recall, the k-Heavy Hitters (k-HH) are the $i$ s.t. $A[i] \geq F_1/k$
  - there are at most $k$ of them
Approximate the number of k-HH
  - notation: $H_k$
  - a measure of how skewed the data is
Will get a pan-private estimator $X$ s.t. $H_k/2 - O(k^{1/2}) \leq X \leq H_{k \log k} + O(k^{1/2})$

k-HH and Cropped F1

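Say we want to compute an estimate $X$ in $[H_k, H_{ck}]$
Consider $\dfrac{F_1(F_1/k) - F_1(F_1/(ck))}{F_1/k - F_1/(ck)}$:
  - k-Heavy Hitters contribute 1
  - ck-Heavy Hitters contribute between 0 and 1
  - Anything else contributes 0
Error of $O(F_1 n^{1/2}/(k\varepsilon))$ for $F_1(F_1/k)$ is too much!
  - after dividing by $F_1/k - F_1/(ck)$ it becomes $O(n^{1/2}/\varepsilon)$, while $H_k \leq k$
Sketch to reduce the universe size $n$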
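
The identity can be checked exactly, without privacy or space limits, in Python. The example vector is made up for illustration, the function names are hypothetical, and $c > 1$ and nonnegative counts are assumed.

```python
def cropped_f1(A, tau):
    """F_1(tau) = sum_i min(A[i], tau) for nonnegative counts."""
    return sum(min(a, tau) for a in A)


def count_heavy(A, k):
    """H_k: the number of i with A[i] >= F_1 / k."""
    f1 = sum(A)
    return sum(1 for a in A if a >= f1 / k)


def hh_ratio(A, k, c):
    """(F_1(F_1/k) - F_1(F_1/(ck))) / (F_1/k - F_1/(ck)), which lies in [H_k, H_ck]."""
    f1 = sum(A)
    hi, lo = f1 / k, f1 / (c * k)
    return (cropped_f1(A, hi) - cropped_f1(A, lo)) / (hi - lo)


A = [50, 30, 10, 5, 3, 2]  # F_1 = 100
```

With $k = 4$ and $c = 4$, items 50 and 30 each contribute 1. Item 10 lies between $F_1/16$ and $F_1/4$ and contributes $(10 - 6.25)/18.75 = 0.2$. So the ratio is 2.2, between $H_4 = 2$ and $H_{16} = 3$.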

Idea: Use a (CM-type) Sketch

Hash $[n]$ into $[O(k)]$ buckets (with a pairwise-independent hash)
Compute the number of heavy buckets (weight $\geq F_1/k$):
  - at least $H_k/2$ (balls and bins)
  - no bucket containing only items of weight $\leq F_1/(k \log k)$ is heavy
Essentially keeping private statistics on a CM sketch
[Figure: items $A[1], \dots, A[10]$ hashed into buckets $B[1], \dots, B[4]$.]
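
The bucketing step can be sketched non-privately in Python. The private statistics layered on top are not shown. The code assumes the standard Carter-Wegman family $((a x + b) \bmod P) \bmod B$, which is pairwise independent up to rounding, with item indices below $P$. The bucket count of $8k$ in the usage is an illustrative choice, and the function names are hypothetical.

```python
import random

P = (1 << 61) - 1  # prime larger than the universe size


def make_pairwise_hash(num_buckets, rng):
    """h(x) = ((a*x + b) mod P) mod num_buckets, pairwise independent up to rounding."""
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda x: ((a * x + b) % P) % num_buckets


def heavy_bucket_count(A, k, num_buckets, rng):
    """A maps item -> count. Bucket weight B[j] sums A[i] over h(i) = j; count B[j] >= F_1/k."""
    h = make_pairwise_hash(num_buckets, rng)
    B = [0] * num_buckets
    for i, a in A.items():
        B[h(i)] += a
    f1 = sum(A.values())
    return sum(1 for w in B if w >= f1 / k)


# Two heavy items and 2,000 light ones: F_1 = 4000, so the k = 4 threshold is 1000.
A = {1: 1000, 2: 1000}
A.update({i: 1 for i in range(3, 2003)})
count = heavy_bucket_count(A, k=4, num_buckets=32, rng=random.Random(0))
```

Here $H_4 = 2$. The count is 2 unless the two heavy items collide, in which case it is 1. The light items spread over the buckets and stay far below the threshold.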

Lower bounds and Open Problems

The $O(n^{1/2})$ additive error for $F_0$ is optimal
  - also $O(k^{1/2})$ for $H_k$, by reduction
Idea: combine streaming-style lower bounds with reconstruction attacks [MMNW11]
  - stop the algorithm at some time step and grab the private state
  - different continuations of the stream: answer many counting queries from the same state
  - invoke [Dinur Nissim 03] type attacks
Lower bounds against many passes via connections to randomness extraction [McGregor Mironov Pitassi Reingold Talwar Vadhan 10]
Do all problems of low streaming complexity admit accurate pan-private algorithms?
  - intuitively: less state $\Rightarrow$ easier to make private

Summary

Private analysis of massive online data presents new challenges:
  - small space
  - continuous monitoring
Data is not stored: can ask for algorithms private inside and out
Tools from small-space streaming algorithms can be useful
  - but we need to view them from a new angle