How Robust are Linear Sketches to Adaptive Inputs?
Moritz Hardt, David P. Woodruff
IBM Research Almaden
Two Aspects of Coping with Big Data

- Efficiency: handle enormous inputs
- Robustness: handle adverse conditions

Big Question: Can we have both?
Algorithmic paradigm: Linear Sketches

Unifying idea: a small number of linear measurements applied to the data.

- Data vector x in R^n
- Linear map A
- Output: sketch y = Ax in R^r, with r << n

Applications: compressed sensing, data streams, distributed computation, ...

For this talk: the output can be any (not necessarily efficient) function of y.
"For each" correctness

For each x: Pr{ Alg(x) correct } > 1 - 1/poly(n), where the probability is over the randomly chosen matrix A.

Does this imply correctness on many inputs? Why not?

- There is no guarantee if an input x2 depends on Alg(x1) for an earlier input x1.
- The guarantee holds only under a modeling assumption: inputs are chosen non-adaptively.
Example: Johnson-Lindenstrauss Sketch
Goal:
estimate |x|
2 from |Ax|2JL Sketch: if A is a k x n matrix of i.i.d. N(0, 1/k) random variable with k > log n, then Pr[|Ax|
2 = (1±1/2)|x|2] > 1-1/poly(n)Attack: 1. Query x = ei and x = ei + ej for all standard unit vectors ei and ejLearn |Ai|2, |Aj|2, |Ai + Aj|2, so learn <Ai, Aj> 2. Hence, learn AT A, and learn kernel of A 3. Query a vector x 2 kernel(A)Slide6
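A minimal numpy simulation of this attack (illustrative only: the "algorithm" here simply reports |Ax|_2, and all names and dimensions are our own):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 64, 16                                     # ambient dimension, sketch rows
A = rng.normal(0, 1 / np.sqrt(k), size=(k, n))    # JL matrix, entries N(0, 1/k)

def sketch_norm(x):
    """The honest 'for each' estimator: report |Ax|_2 as an estimate of |x|_2."""
    return np.linalg.norm(A @ x)

# Step 1: query e_i and e_i + e_j; by polarization this reveals <A_i, A_j>,
# where A_i denotes the i-th column of A.
col_norm2 = np.array([sketch_norm(np.eye(n)[:, i]) ** 2 for i in range(n)])
G = np.diag(col_norm2)                            # will hold A^T A
for i in range(n):
    for j in range(i + 1, n):
        e = np.zeros(n)
        e[i] = e[j] = 1.0
        G[i, j] = G[j, i] = (sketch_norm(e) ** 2 - col_norm2[i] - col_norm2[j]) / 2

# Step 2: kernel(A) = kernel(A^T A); take a right singular vector for a
# (numerically) zero singular value of G.
x_bad = np.linalg.svd(G)[2][-1]                   # unit vector, essentially in kernel(A)

# Step 3: the sketch grossly underestimates the norm of this adaptively chosen query.
print(np.linalg.norm(x_bad), sketch_norm(x_bad))  # ~1.0 vs ~0.0
```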
Benign/Natural: monitor traffic using a sketch, re-route traffic based on the output; this affects future inputs.

Adversarial: DoS attack on a network monitoring unit.

Correlations arise in nearly any realistic setting.

Can we thwart the attack? Can we prove correctness?

In this work: broad impossibility results.
Benchmark Problem

GapNorm(B): given x in R^n, decide whether |x|_2^2 ≥ B (YES) or |x|_2^2 ≤ 1 (NO).

Easily solvable for B = 1 + ε under the "for each" guarantee by a JL sketch with O(log n / ε^2) rows.

Goal: show impossibility for a very basic problem.
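For intuition, a minimal "for each" solver along these lines (our own sketch; the constant 64 and the decision threshold are arbitrary choices, and the guarantee holds only for non-adaptively chosen inputs):

```python
import numpy as np

def gapnorm_for_each(n, eps, seed=0):
    """A JL-based decision rule for GapNorm(1 + eps), correct with high
    probability for each FIXED input, i.e. only against non-adaptive queries."""
    k = int(np.ceil(64 * np.log(n) / eps ** 2))   # O(log n / eps^2) rows
    A = np.random.default_rng(seed).normal(0, 1 / np.sqrt(k), size=(k, n))
    def decide(x):
        est = np.linalg.norm(A @ x) ** 2          # concentrates around |x|_2^2
        return est >= 1 + eps / 2                 # YES iff the estimate clears the gap
    return decide

decide = gapnorm_for_each(n=1000, eps=0.5)
x_no = np.zeros(1000); x_no[0] = 1.0              # |x|^2 = 1            -> NO
x_yes = np.zeros(1000); x_yes[:2] = 1.0           # |x|^2 = 2 >= 1 + eps -> YES
print(decide(x_no), decide(x_yes))                # False True
```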
Main Result

Theorem. For every B, given oracle access to a linear sketch of dimension r ≤ n - log(Bn), we can find in time poly(r, B) a distribution over inputs on which the sketch fails to solve GapNorm(B).

Corollary. The same result holds for any l_p-norm.

Corollary. The same result holds even if the algorithm uses internal randomness on each query.

The attack is efficient (this rules out cryptographic workarounds), and even a slightly non-trivial sketching dimension is impossible.
Application to Compressed Sensing

Theorem. No linear sketch with o(n / C^2) rows guarantees l_2/l_2 sparse recovery with approximation factor C on a polynomial number of adaptively chosen inputs.

l_2/l_2 recovery: on input x, output x' for which |x - x'|_2 ≤ C * min_{k-sparse y} |x - y|_2.

Note: this guarantee is impossible to achieve with a deterministic matrix A, but is possible with the "for each" guarantee using r = k log(n/k) rows.

[Gilbert-Hemenway-Strauss-Woodruff-Wootters 12] has some positive results.
Outline

- Proof of the main theorem for GapNorm: proved via a "reconstruction attack".
- Sparse recovery result: by reduction from GapNorm (not in this talk).
Computational Model

Definition (Sketch). An r-dimensional sketch is any function f satisfying f(x) = f(P_U x) for some subspace U of dimension r. The sketch has unbounded computational power on top of P_U x.

- The sketches Ax and U^T x are equivalent, where U^T has orthonormal rows and row-span(U^T) = row-span(A).
- The sketch U^T x is equivalent to P_U x = U U^T x.

Why?
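A quick numerical check of these equivalences (our own illustration, using a QR factorization to obtain U):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 20, 5
A = rng.normal(size=(r, n))              # an arbitrary (full row rank) sketch matrix

# U has orthonormal columns spanning row-span(A), so U^T has orthonormal rows.
U = np.linalg.qr(A.T)[0]                 # n x r
x = rng.normal(size=n)

# Ax is determined by U^T x (each row of A lies in the span of U's columns):
assert np.allclose(A @ x, (A @ U) @ (U.T @ x))

# ... and U^T x determines the projection P_U x = U U^T x, so any function of
# the sketch Ax can be computed from P_U x alone.
print(np.linalg.norm(U @ (U.T @ x)))     # |P_U x|
```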
Algorithm (Reconstruction Attack)

Input: oracle access to a sketch f using an unknown subspace U of dimension r.

Put V_0 = {0}, the zero-dimensional subspace.
For t = 1 to r:
  (Correlation Finding) Find vectors x_1, ..., x_m weakly correlated with the unknown subspace U, orthogonal to V_{t-1}.
  (Boosting) Find a single vector x strongly correlated with U, orthogonal to V_{t-1}.
  (Progress) Put V_t = V_{t-1} + span{x}.

Output: the subspace V_r.
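In schematic form (a skeleton only: `sketch_oracle`, `find_correlated_queries`, and `boost` are our placeholders for the oracle and for the two steps developed on the following slides):

```python
import numpy as np

def reconstruction_attack(sketch_oracle, n, r, find_correlated_queries, boost):
    """Skeleton of the attack: after r rounds, span(V) is intended to capture
    (essentially all of) the unknown r-dimensional subspace U used by the sketch."""
    V = np.zeros((n, 0))                     # V_0 = {0}
    for t in range(1, r + 1):
        # (Correlation Finding) query vectors orthogonal to V_{t-1}, each weakly
        # correlated with U, obtained via the Conditional Expectation Lemma.
        X = find_correlated_queries(sketch_oracle, V)      # n x m matrix of queries
        # (Boosting) combine them into one direction strongly correlated with U.
        x = boost(X)                                       # unit vector
        # (Progress) grow the maintained subspace: V_t = V_{t-1} + span{x}.
        V = np.column_stack([V, x])
    return V                                               # the subspace V_r
```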
Conditional Expectation Lemma

Lemma. Given a d-dimensional sketch f, we can find, using poly(d) queries, a distribution g such that

  E[ |P_U g|^2 | f(g) = 1 ] ≥ E[ |P_U g|^2 ] + Δ.    ("advantage over random")

Moreover,
1. Δ ≥ poly(1/d), and
2. g = N(0, σ)^n for a carefully chosen σ unknown to the sketching algorithm.
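A toy illustration of this "advantage over random" (entirely our own construction: the sketch f below is a simple norm-threshold rule, and the parameters n, d, σ^2, and the threshold are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 50, 5
U = np.linalg.qr(rng.normal(size=(n, d)))[0]       # the unknown d-dimensional subspace

# A toy stand-in for the sketch f: output 1 iff the projected norm is large.
thresh = 6.5
def f(x):
    return np.linalg.norm(U.T @ x) ** 2 >= thresh

# The attacker's query distribution: g ~ N(0, sigma^2 I) for a chosen sigma.
sigma2, m = 1.2, 50_000
G = rng.normal(0.0, np.sqrt(sigma2), size=(m, n))
proj2 = np.linalg.norm(G @ U, axis=1) ** 2          # |P_U g|^2 for every sample
ones = np.array([f(g) for g in G])                  # f(g) for every sample

print("E|P_U g|^2            =", proj2.mean())          # about sigma2 * d = 6.0
print("E[|P_U g|^2 | f(g)=1] =", proj2[ones].mean())    # larger: the advantage Delta
```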
Simplification

Fact: if g is Gaussian, then P_U g = U U^T g is Gaussian as well.

Hence, we can think of the query distribution as choosing a random Gaussian g inside the subspace U. We drop the projection operator P_U from now on for notational simplicity.
The three-step intuition

1. (Symmetry) Since the queries are random Gaussian inputs g with an unknown variance, by spherical symmetry the sketch f learns nothing more about the query distribution than the norm |g|.
2. (Averaging) If |g| is larger than expected, the sketch is "more likely" to output 1.
3. (Bayes) Hence, by a sort-of Bayes rule, conditioned on f(g) = 1, the expectation of |g| is likely to be larger.
Def. Let p(s) = Pr{ f(y) = 1 } for y in U uniformly random with |y|^2 = s.

Fact. If g is Gaussian in U with E|g|^2 = t, then

  Pr[ f(g) = 1 ] = ∫ p(s) v_t(s) ds,

where v_t is the density of the χ^2-distribution with expectation t and d degrees of freedom.
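To make v_t concrete, here is a small scipy check (our own reading: v_t below is the χ^2_d density rescaled to have mean t, and the test function p(s) = 1{s ≥ 6.5} is just an illustrative choice):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

d, t = 5, 6.0                          # degrees of freedom and target expectation

def v(s, t):
    """Density of |g|^2 when g ~ N(0, (t/d) I_d): a chi^2_d rescaled to have mean t."""
    return (d / t) * stats.chi2.pdf(s * d / t, df=d)

# Sanity checks: v(., t) integrates to 1 and has expectation t.
print(quad(lambda s: v(s, t), 0, np.inf)[0])           # ~ 1.0
print(quad(lambda s: s * v(s, t), 0, np.inf)[0])       # ~ t

# The Fact, for the toy choice p(s) = 1{s >= 6.5}:  Pr[f(g) = 1] = ∫ p(s) v_t(s) ds.
lhs = stats.chi2.sf(6.5 * d / t, df=d)                 # Pr[|g|^2 >= 6.5] directly
rhs = quad(lambda s: v(s, t), 6.5, np.inf)[0]
print(lhs, rhs)                                        # these agree
```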
[Figure: p(s) = Pr( f(y) = 1 ), for y uniformly random in U with |y|^2 = s, plotted against the norm s (marks at d/n, Bd/n, r, and l). By correctness of the sketch, p(s) is near 0 for small norms and near 1 for large norms, with unknown behavior in between.]
Sliding χ^2-distributions

Define Δ(s) = ∫_r^l (s - t) v_t(s) dt. Then:

- Δ(s) < 0 unless s > r - O(r^{1/2} log r).
- ∫_0^∞ Δ(s) ds = ∫_0^∞ ∫_r^l (s - t) v_t(s) dt ds = 0, since each v_t is a density with expectation t.
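A quick numerical sanity check of these two claims, under the same rescaled-χ^2 reading of v_t as above (the range [r, l] and d are arbitrary placeholder values of ours):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

d = 5                                    # degrees of freedom (placeholder)
r_lo, l_hi = 5.0, 15.0                   # the sliding range [r, l] of expectations t

def v(s, t):
    """chi^2_d density rescaled to have expectation t."""
    return (d / t) * stats.chi2.pdf(s * d / t, df=d)

def Delta(s):
    """Delta(s) = integral over t in [r, l] of (s - t) v_t(s)."""
    return quad(lambda t: (s - t) * v(s, t), r_lo, l_hi)[0]

print(Delta(2.0), Delta(40.0))             # negative for small s, positive for large s
print(quad(Delta, 0, 200, limit=200)[0])   # total integral of Delta is ~ 0
```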
Averaging Argument

Let h(s) = E[ f(g_t) | |g_t|^2 = s ].

Correctness: for small s, h(s) ≈ 0, while for large s, h(s) ≈ 1. Therefore

  ∫_0^∞ h(s) Δ(s) ds = ∫_0^∞ ∫_r^l h(s) (s - t) v_t(s) dt ds ≥ δ,

so there exists a t for which

  ∫_0^∞ h(s) s v_t(s) ds ≥ t ∫_0^∞ h(s) v_t(s) ds + δ/(l - r).

For this t, E[ |g_t|^2 | f(g_t) = 1 ] ≥ t + Δ.
Algorithm (Reconstruction Attack)

Input: oracle access to a sketch f using an unknown subspace U of dimension r.

Put V_0 = {0}, the zero-dimensional subspace.
For t = 1 to r:
  (Correlation Finding) Find vectors x_1, ..., x_m weakly correlated with the unknown subspace U, orthogonal to V_{t-1}.
  (Boosting) Find a single vector x strongly correlated with U, orthogonal to V_{t-1}.
  (Progress) Put V_t = V_{t-1} + span{x}.

Output: the subspace V_r.
Boosting small correlations

- Sample m = poly(r) vectors x_1, x_2, ..., x_m using the Conditional Expectation Lemma and form the n x poly(r) matrix M with these vectors as columns.
- Compute the top singular vector x of M.

Lemma: |P_U x| > 1 - poly(1/r).

Proof idea: discretization + concentration.
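A toy version of the boosting step (our own model: the output of the Correlation Finding step is simulated as noise plus a weak component in U, with made-up parameters n, r, m, ε):

```python
import numpy as np

rng = np.random.default_rng(3)
n, r, m = 200, 10, 4000
U = np.linalg.qr(rng.normal(size=(n, r)))[0]     # the unknown subspace (known here only to evaluate the result)

# Simulated CoEx output: each column is mostly noise plus a weak component in U.
eps = 0.3
noise = rng.normal(size=(n, m)) / np.sqrt(n)
signal = U @ rng.normal(size=(r, m)) / np.sqrt(r)
M = noise + eps * signal                         # the n x poly(r) matrix of queries

# Boosting: take the top left singular vector of M.
x = np.linalg.svd(M, full_matrices=False)[0][:, 0]
print(np.linalg.norm(U.T @ x))                   # |P_U x|: close to 1
                                                 # (a random unit vector would give about sqrt(r/n) ~ 0.22)
```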
Implementation in poly(r) time

- W.l.o.g. we can assume n = r + O(log nB): restrict the host space to the first r + O(log nB) coordinates.
- The matrix M is then O(r) x poly(r), so the singular vector computation takes poly(r) time.
Iterating previous steps

Generalize the Gaussian queries to "subspace Gaussians": Gaussians vanishing on the maintained subspace V_t.

Intuition: each step reduces the sketch dimension by one. After r steps:
1. The sketch has no dimensions left!
2. The host space still has n - r = Ω(log nB) dimensions.
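One way to realize such a subspace Gaussian (a minimal helper of our own: sample a Gaussian and project out the component lying in V_t):

```python
import numpy as np

def subspace_gaussian(V, sigma, rng):
    """Sample a Gaussian with standard deviation sigma per coordinate that
    vanishes on span(V): draw g ~ N(0, sigma^2 I) and project out the V-part.

    V is an n x t matrix with orthonormal columns (the maintained subspace V_t)."""
    g = rng.normal(0.0, sigma, size=V.shape[0])
    return g - V @ (V.T @ g)

rng = np.random.default_rng(4)
V = np.linalg.qr(rng.normal(size=(30, 3)))[0]    # some 3-dimensional V_t in R^30
g = subspace_gaussian(V, sigma=1.0, rng=rng)
print(np.linalg.norm(V.T @ g))                   # ~ 0: no component inside V_t
```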
Problem

- The top singular vector is not exactly contained in U; formally, the sketch still has dimension r.
- Fix: add a small amount of Gaussian noise to all coordinates.
Open Problems

- The achievable polynomial dependence is still open.
- Optimizing efficiency alone may lead to non-robust algorithms. What is the trade-off between robustness and efficiency in various settings of data analysis?