How Robust are Linear Sketches to Adaptive Inputs?


Presentation Transcript

Slide1

How Robust are Linear Sketches to Adaptive Inputs?

Moritz Hardt, David P. Woodruff

IBM Research Almaden

Slide2

Two Aspects of Coping with Big Data

Efficiency: Handle enormous inputs

Robustness: Handle adverse conditions

Big Question: Can we have both?

Slide3

Algorithmic paradigm: Linear Sketches

Linear map A

Applications: compressed sensing, data streams, distributed computation, ...

Unifying idea: a small number of linear measurements applied to the data

Data vector x in R^n

Output: sketch y = Ax in R^r, with r << n

For this talk: the output can be any (not necessarily efficient) function of y
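As a concrete illustration (mine, not from the talk), here is a minimal numpy sketch of the paradigm: a random Gaussian map A compresses x in R^n to y = Ax in R^r with r << n, and |y|_2 already approximates |x|_2. The dimensions and scaling below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 10_000, 200  # ambient dimension vs. sketch dimension, r << n

# Random Gaussian sketching matrix with N(0, 1/r) entries,
# so that E[|Ax|_2^2] = |x|_2^2.
A = rng.normal(0.0, 1.0 / np.sqrt(r), size=(r, n))

x = rng.normal(size=n)  # data vector in R^n
y = A @ x               # sketch y = Ax in R^r

print(np.linalg.norm(x), np.linalg.norm(y))  # the two norms are close
```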

Slide4

“For each” correctness

For each x: Pr{ Alg(x) correct } > 1 - 1/poly(n), where the probability is over the randomly chosen matrix A

Does this imply correctness on many inputs?

No guarantee if input x2 depends on Alg(x1) for an earlier input x1

Only under a modeling assumption: inputs are chosen non-adaptively

Why not?

Slide5

Example: Johnson-Lindenstrauss Sketch

Goal: estimate |x|_2 from |Ax|_2

JL Sketch: if A is a k x n matrix of i.i.d. N(0, 1/k) random variables with k > log n, then Pr[ |Ax|_2 = (1 ± 1/2)|x|_2 ] > 1 - 1/poly(n)

Attack:

1. Query x = e_i and x = e_i + e_j for all standard unit vectors e_i and e_j. Learn |A_i|_2, |A_j|_2, and |A_i + A_j|_2, and hence <A_i, A_j>

2. Hence, learn A^T A, and with it the kernel of A

3. Query a vector x in kernel(A)
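A minimal numpy sketch of this attack (my illustration, not the authors' code): the attacker only ever sees |Ax|_2^2, recovers the Gram matrix A^T A via the polarization identity, and then queries a kernel vector, on which any norm estimate derived from Ax must fail.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 10
A = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, n))  # JL matrix, i.i.d. N(0, 1/k)

def sketch_norm_sq(x):
    """The attacker's oracle: it only ever sees |Ax|_2^2."""
    return float(np.sum((A @ x) ** 2))

# Step 1: query e_i and e_i + e_j; polarization gives
# <A_i, A_j> = (|A_i + A_j|^2 - |A_i|^2 - |A_j|^2) / 2.
E = np.eye(n)
col_norm_sq = [sketch_norm_sq(E[:, i]) for i in range(n)]
G = np.zeros((n, n))  # G will equal A^T A
for i in range(n):
    G[i, i] = col_norm_sq[i]
    for j in range(i + 1, n):
        cross = sketch_norm_sq(E[:, i] + E[:, j])
        G[i, j] = G[j, i] = (cross - col_norm_sq[i] - col_norm_sq[j]) / 2

# Step 2: kernel(A) = kernel(A^T A), read off the smallest eigenvector of G.
eigvals, eigvecs = np.linalg.eigh(G)
x = eigvecs[:, 0]  # unit vector with Ax ~ 0

# Step 3: x defeats the sketch: |x|_2 = 1 but the sketch sees ~0.
print(np.linalg.norm(x), sketch_norm_sq(x))
```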

Slide6

Benign/Natural: monitor traffic using a sketch, re-route traffic based on the output; this affects future inputs.

Adversarial: DoS attack on a network monitoring unit

Correlations arise in nearly any realistic setting

In this work: broad impossibility results

Can we thwart the attack? Can we prove correctness?

Slide7

Benchmark Problem

GapNorm(B): given x in R^n, decide if |x|_2 <= 1 (YES) or |x|_2 >= B (NO)

Easily solvable for B = 1 + ε using the “for each” guarantee, by a JL sketch with O(log n/ε^2) rows.

Goal: show impossibility for a very basic problem.

Slide8

Main Result

Theorem. For every B, given oracle access to a linear sketch of dimension r ≤ n - log(Bn), we can find in time poly(r, B) a distribution over inputs on which the sketch fails to solve GapNorm(B).

Corollary. The same result holds for any l_p-norm.

Corollary. The same result holds even if the algorithm uses internal randomness on each query.

The attack is efficient (this rules out cryptographic hardness), and even a slightly non-trivial sketching dimension is ruled out.

Slide9

Application to Compressed Sensing

Theorem. No linear sketch with o(n/C^2) rows guarantees l2/l2 sparse recovery with approximation factor C on a polynomial number of adaptively chosen inputs.

l2/l2 recovery: on input x, output x' for which |x - x'|_2 ≤ C · min_{k-sparse y} |x - y|_2

Note: impossible to achieve with a deterministic matrix A, but possible with the “for each” guarantee with r = k log(n/k) rows.

[Gilbert-Hemenway-Strauss-W-Wootters12] has some positive results

Slide10

Outline

Proof of Main Theorem for GapNorm: proved using a “Reconstruction Attack”

Sparse Recovery Result: by reduction from GapNorm (not in this talk)

Slide11

Computational Model

Definition (Sketch). An r-dimensional sketch is any function f satisfying f(x) = f(P_U x) for some r-dimensional subspace U. The sketch has unbounded computational power on top of P_U x.

Sketches Ax and U^T x are equivalent, where U^T has orthonormal rows and row-span(U^T) = row-span(A)

Sketch U^T x is equivalent to P_U x = U U^T x

Why?
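To answer the “Why?”: since U^T U = I, each of U^T x and P_U x = U U^T x determines the other, so they carry the same information. A small numpy check (my illustration, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 8, 3
# Orthonormal basis U for a random r-dimensional subspace of R^n.
U, _ = np.linalg.qr(rng.normal(size=(n, r)))

x = rng.normal(size=n)
ut_x = U.T @ x   # sketch U^T x, in R^r
pu_x = U @ ut_x  # projection P_U x = U U^T x, in R^n

# Each determines the other: U^T (P_U x) recovers U^T x,
# and U (U^T x) recovers P_U x.
assert np.allclose(U.T @ pu_x, ut_x)
assert np.allclose(U @ (U.T @ x), pu_x)
print("U^T x and P_U x are informationally equivalent")
```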

Slide12

Algorithm (Reconstruction Attack)

Input: oracle access to a sketch f using an unknown subspace U of dimension r

Put V_0 = {0}, the zero subspace

For t = 1 to r:

(Correlation Finding) Find vectors x_1, ..., x_m weakly correlated with the unknown subspace U, orthogonal to V_{t-1}

(Boosting) Find a single vector x strongly correlated with U, orthogonal to V_{t-1}

(Progress) Put V_t = span{V_{t-1}, x}

Output: the subspace V_r

Slide13

Algorithm (Reconstruction Attack), repeated from Slide12.

Slide14

Conditional Expectation Lemma

Lemma. Given a d-dimensional sketch f, we can find, using poly(d) queries, a distribution g such that E[ |g|^2 | f(g) = 1 ] ≥ E[ |g|^2 ] + Δ (an “advantage over random”).

Moreover,

1. Δ ≥ poly(1/d)

2. g = N(0, σ)^n for a carefully chosen σ unknown to the sketching algorithm
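An empirical illustration of the lemma on an assumed toy sketch f(x) = 1 iff |U^T x| ≥ θ (my choice of f, U, and θ; not the paper's construction): Gaussian queries that are accepted have a noticeably larger expected squared norm than unconditioned ones.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 5
U, _ = np.linalg.qr(rng.normal(size=(n, d)))  # unknown d-dim subspace
theta = 2.4                                   # toy acceptance threshold

# Gaussian queries g ~ N(0, I_n); the toy sketch accepts iff |U^T g| >= theta.
g = rng.normal(size=(50_000, n))
accepted = np.linalg.norm(g @ U, axis=1) >= theta

# The conditional mean exceeds the unconditional mean (n = 100) by ~3 here.
print("E[|g|^2]          ~", (g ** 2).sum(axis=1).mean())
print("E[|g|^2 | f(g)=1] ~", (g[accepted] ** 2).sum(axis=1).mean())
```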

Slide15

Simplification

Fact: if g is Gaussian, then P_U g = U U^T g is Gaussian as well.

Hence, we can think of the query distribution as choosing a random Gaussian g inside the subspace U. We drop the P_U projection operator for notational simplicity.

Slide16

The three step intuition

(Symmetry) Since the queries are random Gaussian inputs g with an unknown variance, by spherical symmetry the sketch f learns nothing more about the query distribution than the norm |g|

(Averaging) If |g| is larger than expected, the sketch is “more likely” to output 1

(Bayes) Hence, by a sort-of Bayes rule, conditioned on f(g) = 1 the expectation of |g| is likely to be larger

Slide17

Def. Let p(s) = Pr{ f(y) = 1 } for y uniformly random in U with |y|^2 = s

Fact. If g is Gaussian with E|g|^2 = t, then Pr{ f(g) = 1 } = ∫_0^∞ p(s) v_t(s) ds, where v_t is the density of the χ2-distribution with expectation t and d degrees of freedom

Slide18

[Plot: p(s) = Pr( f(y) = 1 ) for y uniformly random in U with |y|^2 = s, as a function of the norm s. By correctness of the sketch, p(s) is near 0 for small norms (around d/n) and near 1 for large norms (around Bd/n).]

Slide19

Sliding χ2-distributions

Δ(s) = ∫_r^ℓ (s - t) v_t(s) dt

Δ(s) < 0 unless s > r - O(r^{1/2} log r)

∫_0^∞ Δ(s) ds = ∫_0^∞ ∫_r^ℓ (s - t) v_t(s) dt ds = 0
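One way to see the last identity (left implicit on the slide): swap the order of integration and use that each v_t is a probability density with mean t, so every inner integral vanishes.

```latex
\int_0^\infty \Delta(s)\, ds
  = \int_r^{\ell} \left( \int_0^\infty s\, v_t(s)\, ds
      - t \int_0^\infty v_t(s)\, ds \right) dt
  = \int_r^{\ell} (t - t)\, dt = 0 .
```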

Slide20

Averaging Argument

h(s) = E[ f(g_t) | |g_t|^2 = s ]

Correctness: for small s, h(s) ≈ 0, while for large s, h(s) ≈ 1

∫_0^∞ h(s) Δ(s) ds = ∫_0^∞ ∫_r^ℓ h(s) (s - t) v_t(s) dt ds ≥ δ

Hence there exists a t so that ∫_0^∞ s h(s) v_t(s) ds ≥ t ∫_0^∞ h(s) v_t(s) ds + δ/(ℓ - r)

For this t, E[ |g_t|^2 | f(g_t) = 1 ] ≥ t + Δ

Slide21

Algorithm (Reconstruction Attack), repeated from Slide12.

Slide22

Boosting small correlations

Sample poly(r) vectors x_1, x_2, ..., x_m using the Conditional Expectation Lemma, and let M be the n x poly(r) matrix with columns x_1, ..., x_m.

Compute the top singular vector x of M.

Lemma: |P_U x| > 1 - poly(1/r)

Proof: discretization + concentration
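A runnable toy version of one round of correlation finding plus boosting (illustrative assumptions of mine: a threshold sketch f(x) = 1 iff |<u, x>| ≥ 1 and a one-dimensional U; the paper's attack handles arbitrary sketches): accepted Gaussian queries are slightly biased toward U, and the top singular vector of the matrix of accepted queries is strongly correlated with U.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
u = rng.normal(size=n)
u /= np.linalg.norm(u)            # unknown 1-dim subspace U = span{u}

# Toy sketch: f(x) = 1 iff |<u, x>| >= 1, i.e. it depends on x only
# through its projection onto U.
g = rng.normal(size=(20_000, n))  # Gaussian queries
accepted = np.abs(g @ u) >= 1.0   # evaluate f on every query

# Correlation finding: keep the accepted queries as the rows of M.
M = g[accepted]

# Boosting: the top right singular vector of M concentrates near ±u.
_, _, vt = np.linalg.svd(M, full_matrices=False)
x = vt[0]

print("correlation |<x, u>| =", abs(x @ u))  # close to 1
```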

Slide23

Implementation in poly(r) time

W.l.o.g. we can assume n = r + O(log nB): restrict the host space to the first r + O(log nB) coordinates.

Matrix M is now O(r) x poly(r), so the singular vector computation takes poly(r) time.

Slide24

Iterating previous steps

Generalize the Gaussian to a “subspace Gaussian”: a Gaussian vanishing on the maintained subspace V_t

Intuition: each step reduces the sketch dimension by one. After r steps:

1. The sketch has no dimensions left!

2. The host space still has n - r > O(log nB) dimensions

Slide25

Problem

The top singular vector is not exactly contained in U; formally, the sketch still has dimension r.

We can fix this by adding a small amount of Gaussian noise to all coordinates.

Slide26

Algorithm (Reconstruction Attack), repeated from Slide12.

Slide27

Open Problems

Achievable polynomial dependence still open

Optimizing efficiency alone may lead to non-robust algorithms

- What is the trade-off between robustness and efficiency in various settings of data analysis?