Slide1
Estimating the Unseen: Sublinear Statistics
Paul Valiant

Slide2
Fisher’s Butterflies
Turing’s Enigma Codewords
How many new species if I observe for another period?
Probability mass of unseen codewords
[Figure: tally marks for the samples observed in the two periods]
F1 − F2 + F3 − F4 + F5 − …   (Fisher’s estimate of the number of new species)
F1 / (number of samples)   (Turing’s estimate of the unseen probability mass)
(“Fingerprint”: Fj = number of elements observed exactly j times)
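To make the two estimates concrete, here is a minimal Python sketch (illustrative code, not from the talk): it computes a sample’s fingerprint and evaluates Fisher’s series and Turing’s ratio on it.

```python
# A minimal sketch (not from the slides): compute the fingerprint of a sample
# and the two classical estimates built from it.
from collections import Counter

def fingerprint(samples):
    """F[j] = number of distinct elements observed exactly j times."""
    multiplicities = Counter(samples)        # element -> number of occurrences
    return Counter(multiplicities.values())  # j -> number of elements seen j times

def fisher_new_species(F):
    """Fisher's alternating series F1 - F2 + F3 - F4 + ...: expected number of
    new species in a second observation period of equal length."""
    return sum((-1) ** (j + 1) * F[j] for j in range(1, max(F, default=0) + 1))

def turing_unseen_mass(F, num_samples):
    """Good-Turing estimate F1 / (number of samples) of the unseen mass."""
    return F[1] / num_samples

samples = ["a", "a", "b", "c", "c", "d", "e"]
F = fingerprint(samples)                      # Counter({1: 3, 2: 2})
print(fisher_new_species(F))                  # 3 - 2 = 1
print(turing_unseen_mass(F, len(samples)))    # 3/7 ≈ 0.43
```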
Slide3

Characteristic Functions
For element pi:
  Pr[not seen in first period, but seen in second period]  ↔  F1 − F2 + F3 − F4 + F5 − …
  Pr[not seen] · pi  ↔  F1 / (number of samples)
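A quick numerical check of these per-element identities, in the same sketch style (the Poissonized sampling model, where element i is observed Poisson(k·pi) times, is an assumption of the check, not stated on the slide): the expectation of Fisher’s series equals Σi Pr[i not seen in the first period] · Pr[i seen in the second], and E[F1]/k equals Σi pi · Pr[i not seen].

```python
# Numerical check of the matching claimed above, under the Poissonization
# assumption. The distribution p and sample count k below are arbitrary.
import math

def poi(lam, j):
    return math.exp(-lam) * lam ** j / math.factorial(j)

p = [0.5, 0.3, 0.15, 0.05]
k = 10  # samples per observation period

# E[F1 - F2 + F3 - ...] vs. sum_i Pr[i unseen in period 1] * Pr[i seen in period 2]
lhs = sum(sum((-1) ** (j + 1) * poi(k * pi, j) for j in range(1, 60)) for pi in p)
rhs = sum(math.exp(-k * pi) * (1 - math.exp(-k * pi)) for pi in p)
print(lhs, rhs)  # the two agree

# E[F1] / k vs. sum_i pi * Pr[i unseen]: the expected unseen probability mass
lhs = sum(poi(k * pi, 1) for pi in p) / k
rhs = sum(pi * math.exp(-k * pi) for pi in p)
print(lhs, rhs)  # also agree
```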
Slide4
Other Properties?
Entropy: Σi pi log pi
Support size: Σi step(pi)   (step function: 1 for pi > 0, 0 at pi = 0)
Approximate the per-element functions: log pi for entropy; 1/pi (with value 0 at 0) for support size
Accurate to O(1) for x = Ω(1): linear samples
Exponentially hard to approximate below 1/k
Easier case? The L2 norm: Σi pi^2
Slide5
L2 Approximation
Works very well if we have a bound on the j’s encountered
L2 distance related to L1: ‖x‖2 ≤ ‖x‖1 ≤ √n · ‖x‖2
Yields 1-sided testers for L1; also, L1 distance to uniform; also, L1 distance to an arbitrary known distribution
[Batu, Fortnow, Rubinfeld, Smith, White ’00]
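One concrete way to get such an L2 estimate (an illustration; the slides do not pin down their statistic) is the standard collision estimator, which is itself a linear function of the fingerprint: an element seen j times contributes C(j, 2) collisions, and the total collision count divided by C(k, 2) is an unbiased estimate of Σi pi^2.

```python
# Minimal sketch of the collision-based L2 estimator (a standard construction,
# assumed here for illustration).
from collections import Counter
from math import comb

def l2_norm_squared_estimate(samples):
    """Unbiased estimate of sum_i pi^2 from k samples: #collisions / C(k, 2)."""
    k = len(samples)
    collisions = sum(comb(c, 2) for c in Counter(samples).values())
    return collisions / comb(k, 2)

print(l2_norm_squared_estimate(["a", "a", "b", "c", "a", "b"]))  # 4 / 15
```

The variance of this statistic grows with the largest multiplicity encountered, which is one way to read the slide’s requirement of a bound on the j’s.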
Slide6

Are good testers computationally trivial?

Slide7
Maximum Likelihood Distributions
[Orlitsky et al., Science, etc.]

Slide8
Relaxing the Problem
Given {Fj}, find a distribution p such that the expected fingerprint of k samples from p approximates {Fj}
By concentration bounds, the “right” distribution should also satisfy this, i.e., lie in the feasible region of the linear program
Yields: n/log n-sample estimators for entropy, support size, L1 distance, and anything similar
Does the extra computational power help??
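A minimal sketch of this relaxation as an explicit linear program, under assumptions the slide leaves implicit: Poissonized sampling, a fixed grid of candidate probabilities, and an L1 objective on the fingerprint mismatch. Everything here (the grid, the objective, the scipy usage) is illustrative, not the paper’s exact formulation.

```python
import numpy as np
from math import exp, factorial
from scipy.optimize import linprog

def plausible_histogram(F, k, grid):
    """Find h[x] >= 0 ("number" of elements at probability x, relaxed to reals)
    of total mass 1 whose expected fingerprint approximates F = [F1, ..., FJ]."""
    J, n = len(F), len(grid)
    # A[j][x] = Poi(j+1; k*x): expected contribution of one probability-x
    # element to fingerprint entry F_{j+1} under Poissonized sampling.
    A = np.array([[exp(-k * x) * (k * x) ** (j + 1) / factorial(j + 1)
                   for x in grid] for j in range(J)])
    # Variables: h (n entries), then slacks s with s_j >= |(A h - F)_j|.
    c = np.concatenate([np.zeros(n), np.ones(J)])         # minimize sum of slacks
    A_ub = np.block([[A, -np.eye(J)], [-A, -np.eye(J)]])  # +/-(A h - F) <= s
    b_ub = np.concatenate([F, -np.asarray(F, dtype=float)])
    A_eq = np.array([list(grid) + [0.0] * J])             # sum_x x * h[x] = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (n + J))
    return res.x[:n]  # plug into sum_x h[x] * phi(x) to estimate a property

# Example: fingerprint F1=3, F2=2 from k=7 samples, on a coarse probability grid
print(plausible_histogram([3, 2], k=7, grid=[0.01, 0.05, 0.1, 0.2, 0.5]))
```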
Slide9

Lower Bounds
Find not-large {ci} that minimize …   ←(DUAL)→   Find distributions y+, y− that maximize … while … is small
“Find distributions with very different property values, but almost identical fingerprint expectations”
NEEDS: Theorem: close expected fingerprints ⇒ indistinguishable
[Raskhodnikova, Ron, Shpilka, Smith ’07]

Slide10
“Roos’s Theorem”
Generalized Multinomial Distributions
Definition: a distribution expressible as Σi Zi, where each Zi ∈ {0, (1,0,0,0,…), (0,1,0,0,…), (0,0,1,0,…), …}
Includes fingerprint distributions
Also: binomial distributions, multinomial distributions, and any sums of such distributions
“Generalized multinomial distributions” appear all over CS, and characterizing them is central to many papers (for example, Daskalakis and Papadimitriou, Discretized multinomial distributions and Nash equilibria in anonymous games, FOCS 2008).
Comment (Thm): If there are bounds …, s.t. …, then … is multivariate Poisson to within …
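To see why fingerprints fit this definition: F = Σi Zi, where Zi is the j-th standard basis vector if element i is observed j times and 0 if it is unseen. The sketch below samples a fingerprint this way under Poissonized sampling (the independence of the Zi is exactly what Poissonization buys; the code is illustrative).

```python
# Sketch: the fingerprint as a generalized multinomial distribution, i.e. a sum
# of independent vectors Zi taking values in {0, e1, e2, ...}.
import numpy as np

rng = np.random.default_rng(0)

def poissonized_fingerprint(p, k, max_j=20):
    F = np.zeros(max_j, dtype=int)   # F[j-1] = number of elements seen j times
    for pi in p:                     # each element contributes one vector Zi
        j = rng.poisson(k * pi)      # multiplicity of element i ~ Poisson(k*pi)
        if 1 <= j <= max_j:
            F[j - 1] += 1            # Zi = e_j; unseen (j = 0) contributes 0
    return F

print(poissonized_fingerprint([0.4, 0.3, 0.2, 0.1], k=10))
```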
Slide11
Distributions of Rare Elements
Distribution of fingerprints ≈ multivariate Poisson, provided every element is rare, even in k samples
Yields best known lower bounds for non-trivial 1-sided testing problems: Ω(n^(2/3)) for L1 distance, Ω(n^(2/3) m^(1/3)) for “independence”
Note: impossible to confuse > log n with o(1). Can cut off above log n? Suggests these lower bounds are tight to within log n.
Can we do better?

Slide12
A Better Central Limit Theorem (?)
Roos’s Theorem: fingerprints are like Poissons (provided…)
Poissons: 1-parameter family
Gaussians: 2-parameter family
New CLT: fingerprints are like Gaussians (provided variance is high enough in every direction)
How to ensure high variance? “Fatten” distributions by adding elements at many different probabilities.
But: can’t use for 1-sided bounds

Slide13
Results
Additive estimates of entropy, support size, L1 distance: …
2-approximation of L1 distance to Um: …
All testers are linear expressions in the fingerprint
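“Linear in the fingerprint” means each such tester computes Σj cj·Fj for a property-specific coefficient vector {cj}. The sketch below shows only the shape; the coefficients are placeholders, not the ones derived in the paper.

```python
# Sketch of a linear-in-the-fingerprint estimator: the entire estimate is a
# fixed linear combination of fingerprint entries (coefficients illustrative).
def linear_estimator(F, coeffs):
    """F: dict mapping j -> Fj; coeffs: dict mapping j -> cj."""
    return sum(coeffs.get(j, 0.0) * Fj for j, Fj in F.items())

# Example: Turing's unseen-mass estimate F1/k is the linear estimator c1 = 1/k.
k = 7
print(linear_estimator({1: 3, 2: 2}, {1: 1 / k}))  # 3/7
```

Slide14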
Duality
Find not-large {ci} that minimize …   ←(DUAL)→   Find distributions y+, y− that maximize … while … is small
Yields an estimator when d < ½; yields a lower bound when d > ½
“When …, the optimum is log-convex”
Theorem: For a linear symmetric property π, ε > 0, and c > ½: if all p+, p− of support ≤ n with … are distinguishable w.p. > c via k samples, then there exists a linear estimator with error … using (1+o(1))·k samples, succeeding w.p. 1 − o(1/poly(k)).
Slide15
Open Problems
Dependence on ε (resolved for entropy)
Beyond additive estimates: “case-by-case optimal”?
We suspect linear programming is better than linear estimators
Leveraging these results for non-symmetric properties
Monotonicity, with respect to different posets
Practical applications!