Presentation Transcript


Learning abductive reasoning using random examples

Brendan Juba

Washington University in St. Louis

Outline

Models of abductive reasoning
An abductive reasoning algorithm
"Not-easiness" of abducing conjunctions

Abductive reasoning: making plausible guesses

Abductive reasoning: given a conclusion c, find a "plausible" h that implies/leads to/… c.
Two varieties of "plausibility" are in common use:
Logical plausibility: a small h from which c follows
Bayesian plausibility: an h that has large posterior probability given c
In symbols… Pr[h | c true] > …
This requires a prior distribution over representations…

Why might we want a new model?

Existing models are only tractable in simple cases,
e.g. Horn rules (a⋀b⋀c⇒d, with no negations) and "nice" (conjugate) priors.
The choice of prior distribution really matters,
and it's difficult to specify by hand.

New model: abductive reasoning from random examples

Fix a set of attributes (propositional variables x_1, x_2, …, x_n).
An environment is modeled by an arbitrary, unknown distribution D over examples, i.e., settings of the n propositional variables.
Task: for a conclusion c, find an h such that
Plausibility: Pr_{x∈D}[h(x)=1] ≥ μ (for some given μ)
h almost entails c: Pr_{x∈D}[c(x)=1 | h(x)=1] ≥ 1-ε
All probabilities are over examples drawn from D.

Example: identifying a subgoal

Consider a blocks world. For t = 1, 2, …, T:
Propositional state vars. ("fluents"): ON_t(A,B), ON_t(A,TABLE), ON_t(C,A), etc.
Actions are also encoded by propositional vars.: PUT_t(B,A), PUT_t(C,TABLE), etc.
Given many examples of interaction…
Our goal c: ON_T(A,TABLE) ⋀ ON_T(B,A) ⋀ ON_T(C,B)
A perhaps plausibly good "subgoal" h:
[ON_{T-1}(B,A) ⋀ PUT_T(C,B)] ⋁ [PUT_{T-1}(B,A) ⋀ PUT_T(C,B)]
Or, even given by examples, not explicitly formulated…
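
To make the encoding concrete, here is a minimal Python sketch (not from the talk; the fluent names and the single example are invented for illustration) of how one such example x and the slide's candidate subgoal h could be represented and evaluated:

def h(x):
    """The candidate subgoal from the slide:
    [ON_{T-1}(B,A) AND PUT_T(C,B)] OR [PUT_{T-1}(B,A) AND PUT_T(C,B)]."""
    return ((x["ON_T-1(B,A)"] and x["PUT_T(C,B)"])
            or (x["PUT_T-1(B,A)"] and x["PUT_T(C,B)"]))

def c(x):
    """The goal: ON_T(A,TABLE) AND ON_T(B,A) AND ON_T(C,B)."""
    return x["ON_T(A,TABLE)"] and x["ON_T(B,A)"] and x["ON_T(C,B)"]

# One hypothetical example: an assignment to the propositional variables.
example = {
    "ON_T-1(B,A)": True, "PUT_T-1(B,A)": False, "PUT_T(C,B)": True,
    "ON_T(A,TABLE)": True, "ON_T(B,A)": True, "ON_T(C,B)": True,
}
print(h(example), c(example))  # True True: h fires and the goal indeed holds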

Formally: abductive reasoning from random examples, for a class H

Fix a class of Boolean representations H.
Given a Boolean formula c; ε, δ, μ ∈ (0,1); and independent examples x(1), …, x(m) ∈ D,
suppose that there exists an h* ∈ H such that
Plausibility: Pr_{x∈D}[h*(x)=1] ≥ μ
h* entails c: Pr_{x∈D}[c(x)=1 | h*(x)=1] = 1
Find an h (ideally in H) such that, with prob. 1-δ,
Plausibility: Pr_{x∈D}[h(x)=1] ≥ 1/poly(1/μ, 1/(1-ε), n)
h almost entails c: Pr_{x∈D}[c(x)=1 | h(x)=1] ≥ 1-ε
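
As a rough illustration of what these two conditions ask for, the following Python sketch (my own, not part of the talk) estimates both quantities from a finite sample; h and c are arbitrary Boolean functions of an example:

def empirical_abduction_check(h, c, examples, mu, eps):
    """Estimate Pr[h(x)=1] and Pr[c(x)=1 | h(x)=1] on a sample and test the
    abduction criteria: plausibility >= mu and near-entailment >= 1 - eps.
    These are sample estimates; the real guarantees are over the unknown D."""
    covered = [x for x in examples if h(x)]
    plausibility = len(covered) / len(examples)
    if not covered:
        return False
    entailment = sum(1 for x in covered if c(x)) / len(covered)
    return plausibility >= mu and entailment >= 1 - eps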

in pictures…

[Figure: in the space of examples x ∈ {0,1}^n, the region where h(x)=1 lies almost entirely inside the region where c(x)=1. Labels: c is the goal/observation…; h is the explanation/solution/…]

Outline

Models of abductive reasoning
An abductive reasoning algorithm
"Not-easiness" of abducing conjunctions

Theorem 1. If there is a k-DNF h* such that
Plausibility: Pr_{x∈D}[h*(x)=1] ≥ μ
h* entails c: Pr_{x∈D}[c(x)=1 | h*(x)=1] = 1
then using m = O(1/(με) (n^k + log 1/δ)) examples, in time O(m n^k), we can find a k-DNF h such that with probability 1-δ,
Plausibility: Pr_{x∈D}[h(x)=1] ≥ μ
h almost entails c: Pr_{x∈D}[c(x)=1 | h(x)=1] ≥ 1-ε

k-DNF: an OR of "terms of size k", i.e., ANDs of at most k "literals" (attributes or their negations).

Algorithm for k-DNF abduction

Start with h as an OR over all terms of size k.
For each example x(1), …, x(m):
  if c(x(i)) = 0, delete all terms T from h such that T(x(i)) = 1.
Return h.

A simple algorithm, first proposed by J.S. Mill, 1843.
The running time is clearly O(m n^k).
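
A minimal Python sketch of this elimination procedure, under my own representation choices (an example is a length-n tuple of 0/1 values; a term is a set of (index, required value) literals); it is an illustration, not the paper's code:

from itertools import combinations, product

def k_dnf_abduction(examples, c, n, k):
    """Start from the OR of every term of at most k literals, then delete any
    term that is satisfied by an example on which the conclusion c is false."""
    terms = set()
    for size in range(1, k + 1):
        for idx in combinations(range(n), size):
            for vals in product((0, 1), repeat=size):
                terms.add(frozenset(zip(idx, vals)))

    def term_true(term, x):
        return all(x[i] == v for i, v in term)

    for x in examples:
        if not c(x):  # counterexample: prune every term that fires here
            terms = {t for t in terms if not term_true(t, x)}

    # The returned h is the OR of the surviving terms.
    return lambda x: any(term_true(t, x) for t in terms)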

Analysis pt 1: Pr_{x∈D}[h(x)=1] ≥ μ

We are given that some k-DNF h* has
1. Plausibility: Pr_{x∈D}[h*(x)=1] ≥ μ
2. h* entails c: Pr_{x∈D}[c(x)=1 | h*(x)=1] = 1
Initially, every term of h* is in h.
Terms of h* are never true when c(x)=0, by 2, so every term of h* remains in h.
Hence h* implies h, so Pr_x[h(x)=1] ≥ Pr_x[h*(x)=1] ≥ μ.

Analysis pt 2: Pr_x[c(x)=1 | h(x)=1] ≥ 1-ε

Rewrite the conditional probability: Pr_{x∈D}[c(x)=0 ⋀ h(x)=1] ≤ ε Pr_{x∈D}[h(x)=1].
We'll show: Pr_{x∈D}[c(x)=0 ⋀ h(x)=1] ≤ εμ ( ≤ ε Pr_{x∈D}[h(x)=1] by part 1).
Consider any h' s.t. Pr_x[c(x)=0 ⋀ h'(x)=1] > εμ.
Since each x(i) is drawn independently from D,
Pr[no i has c(x(i))=0 ⋀ h'(x(i))=1] < (1-εμ)^m.
A term of h' is deleted whenever c=0 and h'=1.
So, h' is only possibly output w.p. < (1-εμ)^m.

Analysis pt 2, cont'd: Pr_x[c(x)=1 | h(x)=1] ≥ 1-ε

We'll show: Pr_{x∈D}[c(x)=0 ⋀ h(x)=1] ≤ εμ.
Consider any h' s.t. Pr_x[c(x)=0 ⋀ h'(x)=1] > εμ.
h' is only possibly output w.p. < (1-εμ)^m.
There are only 2^{O(n^k)} possible k-DNF h'.
Since (1-1/x)^x ≤ 1/e, m = O(1/(με) (n^k + log 1/δ)) examples suffice to guarantee that each such h' is only possibly output w.p. < δ/2^{O(n^k)}.
So, w.p. > 1-δ, our h has Pr_x[c(x)=0 ⋀ h(x)=1] ≤ εμ.
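
Spelling out the union-bound calculation behind this sample size (my own expansion of the slide's argument, in LaTeX):

\Pr[\text{some bad } h' \text{ survives all } m \text{ examples}]
  \le 2^{O(n^k)}\,(1-\varepsilon\mu)^m
  \le 2^{O(n^k)}\,e^{-\varepsilon\mu m} \le \delta
\quad\text{as soon as}\quad
\varepsilon\mu\, m \ge O(n^k)\ln 2 + \ln(1/\delta),
\quad\text{i.e.}\quad
m = O\!\left(\frac{n^k + \log(1/\delta)}{\mu\varepsilon}\right).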

Theorem 1. If there is a k-DNF h* such that
Plausibility: Pr_{x∈D}[h*(x)=1] ≥ μ
h* entails c: Pr_{x∈D}[c(x)=1 | h*(x)=1] = 1
then using m = O(1/(με) (n^k + log 1/δ)) examples, in time O(m n^k), we can find a k-DNF h such that with probability 1-δ,
Plausibility: Pr_{x∈D}[h(x)=1] ≥ μ
h almost entails c: Pr_{x∈D}[c(x)=1 | h(x)=1] ≥ 1-ε

k-DNF: an OR of "terms of size k", i.e., ANDs of at most k "literals" (attributes or their negations).

A version that tolerates exceptions is also possible; see the paper…

Outline

Models of abductive reasoning
An abductive reasoning algorithm
"Not-easiness" of abducing conjunctions

Theorem 2. Suppose that a polynomial-time algorithm exists for learning abduction from random examples for conjunctions.

Then there is a polynomial-time algorithm for PAC-learning DNF.

PAC Learning

[Diagram: the PAC-learning setup. The learner receives labeled examples (x(1), c(x(1))), (x(2), c(x(2))), …, (x(m), c(x(m))) for an unknown target c ∈ C, with each x(i) drawn from D; w.p. 1-δ over the examples it must output an f such that, w.p. 1-ε over a fresh x' = (x'_1, x'_2, …, x'_n) drawn from D, f(x') = c(x').]

Theorem 2. Suppose that a polynomial-time algorithm exists for learning abduction from random examples for conjunctions.

Then there is a polynomial-time algorithm for PAC-learning DNF.

Theorem (Daniely & Shalev-Shwartz '14). If there is a polynomial-time algorithm for PAC-learning DNF, then for every f(k) ⟶ ∞ there is a polynomial-time algorithm for refuting random k-SAT formulas with n^{f(k)} clauses.
☞ This is a new hardness assumption. Use at your discretion.

Key learning technique: Boosting (Schapire, 1990)

Suppose that there is a polynomial-time algorithm that, given examples of c ∈ C, w.p. 1-δ produces a circuit f s.t. Pr_x[f(x)=c(x)] > ½ + 1/poly(n). Then there is a polynomial-time PAC-learning algorithm for the class C.
That is, using the ability to produce such f's, we produce a g for which Pr_x[g(x)=c(x)] is "boosted" to any 1-ε we require.
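
For intuition only, here is a compact Python sketch of boosting in the AdaBoost style (a later variant due to Freund and Schapire, not the original 1990 construction the slide cites); the weak_learn interface is my own assumption:

import math

def boost(weak_learn, examples, labels, rounds):
    """Combine weak hypotheses into a strong one by reweighting the sample.
    labels are +1/-1; weak_learn(examples, labels, weights) returns f: x -> +1/-1
    that beats 1/2 accuracy on the weighted sample."""
    m = len(examples)
    w = [1.0 / m] * m
    hyps, alphas = [], []
    for _ in range(rounds):
        f = weak_learn(examples, labels, w)
        err = sum(wi for wi, x, y in zip(w, examples, labels) if f(x) != y)
        err = min(max(err, 1e-9), 1 - 1e-9)      # guard against degenerate weights
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight for this round
        w = [wi * math.exp(-alpha * y * f(x)) for wi, x, y in zip(w, examples, labels)]
        total = sum(w)
        w = [wi / total for wi in w]
        hyps.append(f)
        alphas.append(alpha)
    return lambda x: 1 if sum(a * f(x) for a, f in zip(alphas, hyps)) >= 0 else -1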

Sketch of learning DNF using conjunctive abduction

If c is a DNF and Pr[c(x)=1] > ¼, then some term T of c has Pr[T(x)=1] > 1/(4|c|)
…and otherwise f ≡ 0 satisfies Pr[f(x)=c(x)] > ½ + ¼.
Note: Pr[c(x)=1 | T(x)=1] = 1, and T is a conjunction.
Abductive reasoning therefore finds some h such that Pr[h(x)=1] > 1/poly(n) and Pr[c(x)=1 | h(x)=1] > ¾.
Return f: f(x)=1 whenever h(x)=1; if h(x)=0, f(x) = maj_{x: h(x)=0}{c(x)}.
Then Pr[f(x)=c(x)] > ½ + 1/(4 poly(n)).
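
A minimal Python sketch of the weak hypothesis constructed in the last step (my own rendering; it assumes h comes from the abduction step and that labels c(x) are 0/1):

def weak_learner_from_abduction(h, labeled_examples):
    """Predict 1 wherever the abduced h fires; elsewhere predict the majority
    label of c, estimated from the labeled sample restricted to h(x)=0."""
    outside = [label for x, label in labeled_examples if not h(x)]
    majority = (sum(outside) * 2 >= len(outside)) if outside else True

    def f(x):
        return True if h(x) else majority
    return f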

Recap: learning abduction from random examples

For a goal condition c, if there is an h* such that Pr_{x∈D}[h*(x)=1] ≥ μ & Pr_{x∈D}[c(x)=1 | h*(x)=1] = 1,
then, using examples x from D, find an h such that Pr_{x∈D}[h(x)=1] ≥ μ' & Pr_{x∈D}[c(x)=1 | h(x)=1] ≥ 1-ε.
Thm 1. An efficient algorithm exists for k-DNF h.
Thm 2. Unless DNF is PAC-learnable, there is no efficient algorithm for conjunctions.