Uploaded On 2017-01-12


Presentation Transcript

Slide1

CIS 700: “Algorithms for Big Data”

Grigory Yaroslavtsev
http://grigory.us

Lecture 13: $L_p$-testing and isotonic regression

Slides at http://grigory.us/big-data-class.html

Slide2

Testing Big Data

Q: How to make sense of big data?
Q: How to understand properties looking only at a small sample?
Q: How to ignore noise and outliers?
Q: How to minimize assumptions about the sample generation process?
Q: How to optimize running time?

Slide3

Which stocks were growing steadily?

Data from http://finance.google.com

Slide4

Property Testing [Goldreich, Goldwasser, Ron; Rubinfeld, Sudan]

Randomized Algorithm
YES: accept with probability $\ge 2/3$
NO: reject with probability $\ge 2/3$

Property Tester
YES: accept with probability $\ge 2/3$
$\epsilon$-close: don’t care
NO: reject with probability $\ge 2/3$

$\epsilon$-close: $\le \epsilon$ fraction has to be changed to become YES

Slide5

Which stocks were growing steadily?

Data from http://finance.google.com

Slide6

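Tolerant Property Testing [Parnas, Ron, Rubinfeld]

$\epsilon$-close: $\le \epsilon$ fraction has to be changed to become YES

Property Tester
YES: accept with probability $\ge 2/3$
$\epsilon$-close: don’t care
NO: reject with probability $\ge 2/3$

Tolerant Property Tester
YES and $\epsilon_1$-close: accept with probability $\ge 2/3$
Between $\epsilon_1$-close and $\epsilon_2$-close: don’t care
NO (not $\epsilon_2$-close): reject with probability $\ge 2/3$

Slide7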

Which stocks were growing steadily?

Data from http://finance.google.com

Slide8

$L_p$-Isotonic Regression

Running time [Ahuja, Orlin]
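As a concrete reference point, the least-squares ($p = 2$) case on a line can be solved with the classical pool-adjacent-violators method. The sketch below is an illustration only: it is not the [Ahuja, Orlin] algorithm cited on the slide, and the function name and the sample data are made up for the example.

```python
def isotonic_regression_l2(values):
    """Least-squares non-decreasing fit via pool-adjacent-violators (PAVA)."""
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        # Means are compared by cross-multiplication to avoid divisions.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)  # every point gets its block mean
    return fit


# Hypothetical price series, not data from the slides.
prices = [1.0, 3.0, 2.0, 4.0, 3.5, 5.0]
print(isotonic_regression_l2(prices))
```

Each value is pushed once and each merge removes a block, so the loop does linear work overall. Other values of $p$ need a different pooling rule (for example medians for $p = 1$).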

 Slide9

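Tolerant “$L_p$ Property Testing”

$\mathcal{P}$ = class of monotone functions
$\epsilon$-close: $L_p$-distance to $\mathcal{P}$ is at most $\epsilon$

Tolerant “$L_p$ Property Tester”
YES and $\epsilon_1$-close: accept with probability $\ge 2/3$
Between $\epsilon_1$-close and $\epsilon_2$-close: don’t care
NO (not $\epsilon_2$-close): reject with probability $\ge 2/3$

Slide10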

New $L_p$-Testing Model for Real-Valued Data

Generalizes standard Hamming testing
For $p > 0$ still has a probabilistic interpretation: $d_p(f, g) = (\mathbb{E}[|f - g|^p])^{1/p}$
Compatible with existing PAC-style learning models (preprocessing for model selection)
For Boolean functions, $d_0(f, g) = d_p(f, g)^p$.

Slide11

Our Contributions

Relationships between $L_p$-testing models
Algorithms
  $L_p$-testers for monotonicity, Lipschitz, convexity
  Tolerant $L_p$-tester for monotonicity in 1D (sublinear algorithm for isotonic regression)
  Our $L_p$-testers beat lower bounds for Hamming testers
  Simple algorithms backed up by involved analysis
  Uniformly sampled (or easy to sample) data suffices
Nearly tight lower bounds

Slide12

Implications for Hamming Testing

Some techniques/results carry over to Hamming testing
Improvement on Levin’s work investment strategy
  Connectivity of bounded-degree graphs [Goldreich, Ron ’02]
  Properties of images [Raskhodnikova ’03]
  Multiple-input problems [Goldreich ’13]
First example of monotonicity testing problem where adaptivity helps
Improvements to Hamming testers for Boolean functions

Slide13

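Definitions

$f : D \to [0, 1]$ (D = finite domain/poset)
$\|f\|_p = \left(\sum_{x \in D} |f(x)|^p\right)^{1/p}$, for $p \ge 1$
$\|f\|_0$ = Hamming weight (# of non-zero values)
Property $\mathcal{P}$ = class of functions (monotone, convex, linear, Lipschitz, …)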
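To make the distance notions concrete, here is a minimal sketch that computes a normalized $L_p$ distance between two functions given as lists of values in $[0, 1]$. The normalization by the domain size (so that distances lie in $[0, 1]$) and the convention that $p = 0$ means the fraction of differing points are assumptions of this sketch; the function name is hypothetical.

```python
def lp_distance(f, g, p):
    """Normalized L_p distance between two equal-length lists of values.

    p = 0 is treated as the Hamming distance: the fraction of points
    on which f and g differ. For p >= 1 the result is
    (average of |f(x) - g(x)|^p) ** (1/p).
    """
    if len(f) != len(g) or not f:
        raise ValueError("f and g must be non-empty and of equal length")
    n = len(f)
    diffs = [abs(a - b) for a, b in zip(f, g)]
    if p == 0:
        return sum(1 for d in diffs if d != 0) / n
    return (sum(d ** p for d in diffs) / n) ** (1.0 / p)


# Boolean example: Hamming and L_1 distances coincide.
f = [0, 1, 1, 0]
g = [0, 0, 1, 1]
print(lp_distance(f, g, 0), lp_distance(f, g, 1), lp_distance(f, g, 2))
```

For Boolean values every non-zero difference equals 1, so under this normalization the $p$-th power of the $L_p$ distance equals the Hamming distance.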

 Slide14

Relationships: $L_p$-Testing

$Q_p(\mathcal{P}, \epsilon)$ = query complexity of $L_p$-testing property $\mathcal{P}$ at distance $\epsilon$
(Cauchy-Schwarz)
Boolean functions

Slide15

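Relationships: Tolerant $L_p$-Testing

$Q_p(\mathcal{P}, \epsilon_1, \epsilon_2)$ = query complexity of tolerant $L_p$-testing property $\mathcal{P}$ with distance parameters $\epsilon_1 < \epsilon_2$
No general relationship between tolerant $L_1$-testing and tolerant Hamming testing
$L_p$-testing for $p > 1$ is close in complexity to $L_1$-testing
For Boolean functions $Q_0(\mathcal{P}, \epsilon_1, \epsilon_2) = Q_p(\mathcal{P}, \epsilon_1^{1/p}, \epsilon_2^{1/p})$

Slide16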

Testing Monotonicity

Line ($D = [n]$)

Upper bound [Ergün, Kannan, Kumar, Rubinfeld, Viswanathan ’00]
Lower bound [Fischer ’04]

Upper bound
Lower bound

Slide17

Monotonicity

Domain $D = [n]^d$ (vertices of $d$-dim hypercube)
A function $f : D \to [0, 1]$ is monotone if increasing a coordinate of $x$ does not decrease $f(x)$
Special case $d = 1$: $f : [n] \to [0, 1]$ is monotone $\Leftrightarrow$ $f(1), \ldots, f(n)$ is sorted.
One of the most studied properties in property testing
[Ergün Kannan Kumar Rubinfeld Viswanathan, Goldreich Goldwasser Lehman Ron, Dodis Goldreich Lehman Raskhodnikova Ron Samorodnitsky, Batu Rubinfeld White, Fischer Lehman Newman Raskhodnikova Rubinfeld Samorodnitsky, Fischer, Halevy Kushilevitz, Bhattacharyya Grigorescu Jung Raskhodnikova Woodruff, ..., Chakrabarty Seshadhri, Blais, Raskhodnikova Yaroslavtsev, Chakrabarty Dixit Jha Seshadhri]
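For the sorted special case, a query-efficient check in the spirit of the binary-search spot-checkers from the literature cited above can be sketched as follows. The sketch assumes distinct values; the function names and the number of trials are illustrative choices, not taken from the slides.

```python
import random


def binary_search_finds(a, i):
    """Binary search for a[i] in a; True iff the search lands on index i."""
    lo, hi = 0, len(a) - 1
    target = a[i]
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid == i
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return False


def sortedness_spot_check(a, trials, rng):
    """Accept (True) iff every randomly chosen index passes the search."""
    for _ in range(trials):
        i = rng.randrange(len(a))
        if not binary_search_finds(a, i):
            return False
    return True


rng = random.Random(0)
print(sortedness_spot_check(list(range(100)), 20, rng))         # sorted input
print(sortedness_spot_check(list(range(99, -1, -1)), 20, rng))  # reversed input
```

Each trial reads only about $\log_2 n$ entries. With distinct values, the indices that pass the search form an increasing subsequence, so a list that is far from sorted must contain many failing indices. A sorted list is never rejected.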

 

Slide18

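Monotonicity: Key Lemma

M = class of monotone functions
Boolean slicing operator $f_{(t)} : D \to \{0, 1\}$:
$f_{(t)}(x) = 1$ if $f(x) \ge t$, and $f_{(t)}(x) = 0$ otherwise.
Theorem: $d_1(f, M) = \int_0^1 d_0(f_{(t)}, M)\, dt$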
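The slicing idea can be checked numerically on a line. The sketch below assumes the statement being illustrated is that the $L_1$ distance to monotonicity equals the Hamming distances of the Boolean slices, integrated over thresholds. It further assumes values on a grid $0, 1, \ldots, k$ (so the integral becomes a sum over $k$ thresholds) and works with unnormalized distances. All function names are hypothetical.

```python
def slice_at(levels, j):
    """Boolean slice at threshold j: 1 where the level is at least j."""
    return [1 if v >= j else 0 for v in levels]


def hamming_to_monotone_boolean(bits):
    """Fewest entries to change so a 0/1 list becomes non-decreasing."""
    ones_left = 0
    zeros_right = bits.count(0)
    best = zeros_right  # target is all ones
    for b in bits:
        if b == 1:
            ones_left += 1
        else:
            zeros_right -= 1
        # Target is zeros on the prefix seen so far, ones on the rest.
        best = min(best, ones_left + zeros_right)
    return best


def l1_to_monotone(levels, k):
    """Smallest sum of |levels[i] - g[i]| over non-decreasing g in 0..k."""
    best = [0] * (k + 1)  # best[v]: min cost so far with last value <= v
    for x in levels:
        running = None
        new_best = []
        for v in range(k + 1):
            cost = best[v] + abs(x - v)  # cost when the current value is v
            running = cost if running is None else min(running, cost)
            new_best.append(running)
        best = new_best
    return best[k]


levels = [2, 0, 1]  # values 1.0, 0.0, 0.5 on a grid with k = 2
k = 2
slices_total = sum(hamming_to_monotone_boolean(slice_at(levels, j))
                   for j in range(1, k + 1))
print(l1_to_monotone(levels, k), slices_total)
```

Dividing both quantities by $k \cdot n$ recovers normalized distances for values in $[0, 1]$.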

 Slide19

Proof sketch: slice and conquer

Closest monotone function with minimal $L_2$-norm is unique (can be denoted as an operator).
This operator and the slicing operator commute.

Slide20

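$L_p$-Testers from Boolean Testers

Thm: A nonadaptive, 1-sided error $L_0$-test for monotonicity of $f : D \to \{0, 1\}$ is also an $L_1$-test for monotonicity of $f : D \to [0, 1]$.

Proof:
A violation $(x, y)$: $x \preceq y$ and $f(x) > f(y)$.
A nonadaptive, 1-sided error test queries a random set $Q \subseteq D$ and rejects iff $Q$ contains a violation.
If $f$ is monotone, $Q$ will not contain a violation.
If $d_1(f, M) \ge \epsilon$ then $\exists t^* : d_0(f_{(t^*)}, M) \ge \epsilon$.
W.p. $\ge 2/3$, set $Q$ contains a violation $(x, y)$ for $f_{(t^*)}$, which is also a violation for $f$.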
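The structure of such a test can be sketched on a one-dimensional domain: the query set is chosen without looking at the function, and the test rejects only when the queried points expose a violation. The uniform choice of the query set and all names below are assumptions made for illustration.

```python
import random


def boolean_slice(f, t):
    """Threshold a real-valued list into a 0/1 list: 1 where f(x) >= t."""
    return [1 if v >= t else 0 for v in f]


def contains_violation(f, queried):
    """True iff some queried pair x < y has f[x] > f[y]."""
    running_max = None
    for x in sorted(queried):
        if running_max is not None and f[x] < running_max:
            return True
        running_max = f[x] if running_max is None else max(running_max, f[x])
    return False


def one_sided_test(f, num_queries, rng):
    """Nonadaptive: the query set does not depend on f.
    One-sided: a monotone f is always accepted (returns True)."""
    queried = rng.sample(range(len(f)), min(num_queries, len(f)))
    return not contains_violation(f, queried)


rng = random.Random(7)
f = [0.8, 0.1, 0.5, 0.3, 0.9, 0.2]
q = [0, 1, 4]
# A violation in a Boolean slice is also a violation in f itself.
print(contains_violation(boolean_slice(f, 0.5), q), contains_violation(f, q))
print(one_sided_test([0.0, 0.2, 0.2, 0.9, 1.0], 3, rng))
```

The first printed pair shows the step the proof relies on: if a slice has value 1 at $x$ and 0 at $y$, then $f(x)$ is at least the threshold and $f(y)$ is below it, so the same pair is a violation for $f$.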

 

 

 

 Slide21

Our Results: Testing Monotonicity

Hypergrid ($D = [n]^d$)
Adaptive tester for Boolean functions

Upper bound [Dodis et al. ’99, …, Chakrabarty, Seshadhri ’13]
Lower bound [Dodis et al. ’99, …, Chakrabarty, Seshadhri ’13]

Non-adaptive 1-sided error
Upper bound
Lower bound

Slide22

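Testing Monotonicity of $f : [n]^d \to \{0, 1\}$

$e_i$ = $i$-th unit vector.
For $i \in [d]$: an axis-parallel line along dimension $i$ is a set of points that differ only by multiples of $e_i$
$L$ = set of all axis-parallel lines
Dimension reduction for $f : [n]^d \to \{0, 1\}$ [Dodis et al. ’99]: relates the distance of $f$ to the distances of its restrictions to axis-parallel lines
If a line restriction is far from monotone => a sample of its points detects a violation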
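The line-based approach can be sketched as: pick a random axis-parallel line, sample a few of its points, and look for a violation among them. The sketch uses 0-based coordinates $\{0, \ldots, n-1\}^d$ and takes the function as a Python callable; these conventions, the sample size parameter, and the function names are assumptions for illustration.

```python
import random


def random_axis_parallel_line(n, d, rng):
    """Points of a uniformly random axis-parallel line in {0..n-1}^d,
    listed in increasing order along the line."""
    axis = rng.randrange(d)
    base = [rng.randrange(n) for _ in range(d)]
    line = []
    for a in range(n):
        point = list(base)
        point[axis] = a  # only the chosen coordinate varies
        line.append(tuple(point))
    return line


def line_sample_has_violation(f, line, sample_size, rng):
    """Sample points on the line; True iff two sampled points are out of order."""
    positions = sorted(rng.sample(range(len(line)), min(sample_size, len(line))))
    values = [f(line[pos]) for pos in positions]
    # Any out-of-order pair implies an adjacent descent in the sampled values.
    return any(values[i] > values[i + 1] for i in range(len(values) - 1))


rng = random.Random(1)
line = random_axis_parallel_line(4, 3, rng)
print(line)
print(line_sample_has_violation(lambda p: sum(p), line, 3, rng))  # monotone f
```

Because the test only rejects on an observed violation, a monotone function is never rejected.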

 Slide23

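Testing Monotonicity on $[n]^d$

Dimension reduction [Dodis et al. ’99]: relates the distance of $f$ to the distances of its restrictions to axis-parallel lines
If a line restriction is far from monotone => a sample of its points can detect a violation
“Inverse Markov”: a lower bound on the probability that a bounded random variable is at least a fraction of its expectation
Test [Dodis et al.] via “Levin’s economical work investment strategy” (used in other papers for testing connectedness of a graph, properties of images, etc.)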
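For reference, one textbook form of an “inverse Markov” bound follows from applying Markov’s inequality to $1 - X$. This is a standard statement under the assumptions $X \in [0, 1]$, $\mathbb{E}[X] = \mu$ and $0 \le a < 1$; the exact form used on the slide may differ.

```latex
\begin{align*}
\Pr[X < a] &= \Pr[1 - X > 1 - a]
  \le \frac{\mathbb{E}[1 - X]}{1 - a} = \frac{1 - \mu}{1 - a}, \\
\Pr[X \ge a] &\ge 1 - \frac{1 - \mu}{1 - a} = \frac{\mu - a}{1 - a}.
\end{align*}
```

For example, with $\mu = 0.1$ and $a = 0.05$ the bound gives $\Pr[X \ge 0.05] \ge 0.05 / 0.95 \approx 0.053$.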

 Slide24

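Testing Monotonicity on $[n]^d$

“Discretized Inverse Markov”: the same kind of bound, stated for a discrete set of thresholds (buckets)
For each bucket $j$, pick samples whose size depends on $j$; the total over buckets determines the complexity
For the good bucket $j$ the test rejects with constant probability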

 Slide25

Distance Approximation and Tolerant Testing

Approximating $L_1$-distance to monotonicity [Saks, Seshadhri ’10]
Sublinear algorithm for isotonic regression
Time complexity of tolerant $L_p$-testing for monotonicity has a better dependence than what follows from distance approximation
Improves adaptive distance approximation of [Fattal, Ron ’10] for Boolean functions

 Slide26

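Distance Approximation

Theorem: with constant probability over the choice of a random sample $S$ of sufficient size, $|d_1(f|_S, M) - d_1(f, M)| < \delta$
Implies a tolerant tester by setting $\delta$ in terms of the gap between the two distance parameters
Improves previous algorithm [Fattal, Ron ’10]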
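The sampling idea can be sketched in one dimension: restrict the function to a uniformly random set of positions, and compute the distance to monotonicity of that restriction exactly. The dynamic program below relies on the known fact that an optimal $L_1$ fit can take its values from the data values. The sample size is left as a free parameter, and the names and the normalization by the number of points are assumptions of this sketch.

```python
import random


def l1_distance_to_monotone(values):
    """Normalized L_1 distance of a list to the nearest non-decreasing list."""
    if not values:
        return 0.0
    candidates = sorted(set(values))  # fitted values can be restricted to these
    best = [0.0] * len(candidates)    # best[j]: min cost with last value <= candidates[j]
    for x in values:
        running = float("inf")
        new_best = []
        for j, c in enumerate(candidates):
            running = min(running, best[j] + abs(x - c))
            new_best.append(running)
        best = new_best
    return best[-1] / len(values)


def estimate_distance(f, sample_size, rng):
    """Distance to monotonicity of f restricted to a random set of positions."""
    positions = sorted(rng.sample(range(len(f)), min(sample_size, len(f))))
    return l1_distance_to_monotone([f[i] for i in positions])


rng = random.Random(3)
f = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3]
print(l1_distance_to_monotone(f), estimate_distance(f, 4, rng))
```

The exact computation takes time quadratic in the number of sampled points, which is independent of the domain size once the sample size is fixed.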

 Slide27

Distance Approximation

For a function $f$, violation graph $G_f$: edge $(x, y)$ if $x \preceq y$ and $f(x) > f(y)$
MM(G) = maximum matching
VC(G) = minimum vertex cover
[Fischer et al. ’02]
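For a Boolean function on a line the violation graph is bipartite (ones on one side, zeros on the other), so by König’s theorem its maximum matching and minimum vertex cover have the same size, and that size is the number of values that must change to make the function monotone. The sketch below checks this on a small example using a simple augmenting-path matching; the restriction to one dimension and the function names are assumptions for illustration.

```python
def distance_to_monotone_boolean(bits):
    """Fewest entries to change so a 0/1 list becomes non-decreasing."""
    ones_left = 0
    zeros_right = bits.count(0)
    best = zeros_right
    for b in bits:
        if b == 1:
            ones_left += 1
        else:
            zeros_right -= 1
        best = min(best, ones_left + zeros_right)
    return best


def max_matching_in_violation_graph(bits):
    """Maximum matching size; edges join a 1 at index i to a 0 at index j > i."""
    ones = [i for i, b in enumerate(bits) if b == 1]
    zeros = [j for j, b in enumerate(bits) if b == 0]
    match_of_zero = {}  # zero index -> index of the one matched to it

    def try_augment(i, seen):
        for j in zeros:
            if j > i and j not in seen:
                seen.add(j)
                if j not in match_of_zero or try_augment(match_of_zero[j], seen):
                    match_of_zero[j] = i
                    return True
        return False

    return sum(1 for i in ones if try_augment(i, set()))


bits = [1, 0, 1, 0, 0, 1]
print(max_matching_in_violation_graph(bits), distance_to_monotone_boolean(bits))
```

Both printed numbers count the same thing from two sides: a matching gives disjoint violations that each force a change, and a vertex cover gives a set of points whose modification removes every violation.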

 Slide28

 

Define:

has hypergeometric distribution:

 Slide29

Experiments

Data: Apple stock price data (2005-2015) from Google Finance

Left: $L_1$-isotonic regression
Right: error vs. sample size

 Slide30

$L_p$-Testers for Other Properties

Via combinatorial characterization of $L_1$-distance to the property
  Lipschitz property
Via (implicit) proper learning: approximate in $L_p$ up to a small error, then test the approximation on a random sample
  Convexity (bound tight in some cases)
  Submodularity [Feldman, Vondrák ’13]

 Slide31

Open Problems

All our algorithms for $L_p$-testing for $p \ge 1$ were obtained directly from $L_1$-testers.
  Can one design better algorithms by working directly with $L_p$-distances?
Our complexity for $L_p$-testing convexity grows exponentially with d.
  Is there an $L_p$-testing algorithm for convexity with subexponential dependence on the dimension?
Our $L_1$-tester for monotonicity is nonadaptive, but we show that adaptivity helps for Boolean range.
  Is there a better adaptive tester?
We designed tolerant tester only for monotonicity (d=1,2).
  Tolerant testers for higher dimensions?
  Other properties?

 
