Learning abductive reasoning using random examples
Brendan Juba
Washington University in St. Louis

Outline
Models of abductive reasoning
An abductive reasoning algorithm
"Not-easiness" of abducing conjunctions

Abductive reasoning: making plausible guesses
Abductive reasoning: given a conclusion c, find a "plausible" h that implies/leads to/… c.
Two varieties of "plausibility" are in common use:
Logical plausibility: a small h from which c follows.
Bayesian plausibility: an h that has large posterior probability given c. In symbols: Pr[h | c true] > …
This requires a prior distribution over representations…

Why might we want a new model?
Existing models are only tractable in simple cases,
e.g., Horn rules (a⋀b⋀c⇒d … no negations), "nice" (conjugate) priors.
The choice of prior distribution really matters,
and it's difficult to specify by hand.

New model: abductive reasoning from random examples
Fix a set of attributes (propositional variables x_1, x_2, …, x_n).
An environment is modeled by an arbitrary, unknown distribution D over examples, i.e., settings of the n propositional variables.
Task: for a conclusion c, find an h such that
Plausibility: Pr_{x∈D}[h(x)=1] ≥ μ (for some given μ)
h almost entails c: Pr_{x∈D}[c(x)=1 | h(x)=1] ≥ 1-ε
All probabilities are over examples drawn from D.

Example: identifying a subgoal
Consider a blocks world, for t = 1, 2, …, T.
Propositional state vars. ("fluents"): ON_t(A,B), ON_t(A,TABLE), ON_t(C,A), etc.
Actions are also encoded by propositional vars.: PUT_t(B,A), PUT_t(C,TABLE), etc.
Given many examples of interaction…
Our goal c: ON_T(A,TABLE) ⋀ ON_T(B,A) ⋀ ON_T(C,B)
A perhaps plausibly good "subgoal" h: [ON_{T-1}(B,A) ⋀ PUT_T(C,B)] ⋁ [PUT_{T-1}(B,A) ⋀ PUT_T(C,B)]
[Figure: blocks-world configuration with blocks A and B]
Or, even given by examples, not explicitly formulated…

Formally: abductive reasoning from random examples for a class H
Fix a class of Boolean representations H.
Given a Boolean formula c; ε, δ, μ ∈ (0,1); and independent examples x^(1), …, x^(m) drawn from D,
suppose that there exists an h* ∈ H such that
Plausibility: Pr_{x∈D}[h*(x)=1] ≥ μ
h* entails c: Pr_{x∈D}[c(x)=1 | h*(x)=1] = 1.
Find an h (ideally in H) such that, with prob. 1-δ,
Plausibility: Pr_{x∈D}[h(x)=1] ≥ 1/poly(1/μ, 1/(1-ε), n)
h almost entails c: Pr_{x∈D}[c(x)=1 | h(x)=1] ≥ 1-ε
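
To make the two conditions concrete, here is a minimal Python sketch, not from the paper, of how one could estimate both quantities from a sample; the function names and the example representation (tuples of 0/1 values) are illustrative choices of mine:

```python
def empirical_plausibility(h, examples):
    """Estimate Pr_{x~D}[h(x)=1]: the fraction of sample points satisfying h."""
    return sum(h(x) for x in examples) / len(examples)

def empirical_entailment(h, c, examples):
    """Estimate Pr_{x~D}[c(x)=1 | h(x)=1]: among points where h holds, how often c holds."""
    covered = [x for x in examples if h(x)]
    if not covered:
        return 1.0  # vacuous; such an h fails the plausibility condition anyway
    return sum(c(x) for x in covered) / len(covered)

def is_valid_abduction(h, c, examples, mu, eps):
    """h is an acceptable explanation if it is mu-plausible and (1-eps)-almost entails c."""
    return (empirical_plausibility(h, examples) >= mu
            and empirical_entailment(h, c, examples) >= 1 - eps)
```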

In pictures…
[Figure: Venn diagram over x ∈ {0,1}^n; the region h(x)=1 lies (almost entirely) inside the region c(x)=1, with c(x)=0 outside. c: goal/observation…; h: explanation/solution/…]

Outline
Models of abductive reasoning
An abductive reasoning algorithm
"Not-easiness" of abducing conjunctions

Theorem 1. If there is a k-DNF h* such that
Plausibility: Pr_{x∈D}[h*(x)=1] ≥ μ
h* entails c: Pr_{x∈D}[c(x)=1 | h*(x)=1] = 1,
then using m = O((1/με)(n^k + log 1/δ)) examples, in time O(mn^k), we can find a k-DNF h such that, with probability 1-δ,
Plausibility: Pr_{x∈D}[h(x)=1] ≥ μ
h almost entails c: Pr_{x∈D}[c(x)=1 | h(x)=1] ≥ 1-ε.
(k-DNF: an OR of "terms of size k", i.e., ANDs of at most k "literals": attributes or their negations.)

Algorithm for k-DNF abduction
Start with h as an OR over all terms of size k.
For each example x^(1), …, x^(m):
  If c(x^(i)) = 0, delete all terms T from h such that T(x^(i)) = 1.
Return h.
A simple algorithm, first proposed by J.S. Mill, 1843. The running time is clearly O(mn^k).
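
A minimal Python sketch of this elimination procedure; assume examples are tuples of 0/1 values, and represent a term as a set of literals (i, sign) meaning "x_i must equal sign". These representation choices are mine, not the slides':

```python
from itertools import combinations, product

def all_terms(n, k):
    """All ANDs of at most k literals over variables x_0, ..., x_{n-1}."""
    terms = set()
    for size in range(1, k + 1):
        for idxs in combinations(range(n), size):
            for signs in product([1, 0], repeat=size):
                terms.add(frozenset(zip(idxs, signs)))
    return terms

def term_satisfied(term, x):
    """A term holds on assignment x iff every literal agrees with x."""
    return all(x[i] == want for i, want in term)

def abduce_kdnf(examples, c, n, k):
    """Mill's elimination: keep exactly the size-<=k terms that are never
    true on a counterexample (an x with c(x) = 0)."""
    h = all_terms(n, k)
    for x in examples:
        if not c(x):
            h = {t for t in h if not term_satisfied(t, x)}
    return h  # the k-DNF hypothesis: OR of the surviving terms

def evaluate_kdnf(h, x):
    return any(term_satisfied(t, x) for t in h)
```

The set returned by abduce_kdnf is exactly the h analyzed on the next slides.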

Analysis pt 1: Pr_{x∈D}[h(x)=1] ≥ μ
We are given that some k-DNF h* has
1. Plausibility: Pr_{x∈D}[h*(x)=1] ≥ μ
2. h* entails c: Pr_{x∈D}[c(x)=1 | h*(x)=1] = 1.
Initially, every term of h* is in h. By 2, terms of h* are never true when c(x)=0, so every term of h* remains in h. Hence h* implies h, and Pr_x[h(x)=1] ≥ Pr_x[h*(x)=1] ≥ μ.

Analysis pt 2: Pr_x[c(x)=1 | h(x)=1] ≥ 1-ε
Rewrite the conditional probability: we need Pr_{x∈D}[c(x)=0 ⋀ h(x)=1] ≤ ε Pr_{x∈D}[h(x)=1].
We'll show: Pr_{x∈D}[c(x)=0 ⋀ h(x)=1] ≤ εμ ( ≤ ε Pr_{x∈D}[h(x)=1] by part 1).
Consider any h' s.t. Pr_x[c(x)=0 ⋀ h'(x)=1] > εμ. Since each x^(i) is drawn independently from D,
Pr[no i has c(x^(i))=0 ⋀ h'(x^(i))=1] < (1-εμ)^m.
A term of h' is deleted when c=0 and h'=1, so h' is only possibly output w.p. < (1-εμ)^m.

Analysis pt 2, cont'd: Pr_x[c(x)=1 | h(x)=1] ≥ 1-ε
We'll show: Pr_{x∈D}[c(x)=0 ⋀ h(x)=1] ≤ εμ.
Consider any h' s.t. Pr_x[c(x)=0 ⋀ h'(x)=1] > εμ: such an h' is only possibly output w.p. < (1-εμ)^m, and there are only 2^{O(n^k)} possible k-DNFs h'.
Since (1-1/x)^x ≤ 1/e, m = O((1/με)(n^k + log 1/δ)) examples suffice to guarantee that each such h' is only possibly output w.p. < δ/2^{O(n^k)}.
So, w.p. > 1-δ, our h has Pr_x[c(x)=0 ⋀ h(x)=1] ≤ εμ.
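
The sample bound is just this union bound made quantitative: we need 2^{#terms} · (1-εμ)^m ≤ δ, and (1-εμ)^m ≤ e^{-εμm}. A small Python calculation makes the arithmetic concrete; the parameter values below are my own illustrative picks, not the slides':

```python
import math
from math import comb

def num_terms(n, k):
    """Count the ANDs of at most k literals over n variables."""
    return sum(comb(n, s) * 2 ** s for s in range(1, k + 1))

def sufficient_m(n, k, eps, mu, delta):
    """Smallest m with 2^{num_terms} * (1 - eps*mu)^m <= delta, using
    (1 - eps*mu)^m <= exp(-eps*mu*m), i.e.
    m >= (1/(eps*mu)) * (num_terms * ln 2 + ln(1/delta))."""
    t = num_terms(n, k)
    return math.ceil((t * math.log(2) + math.log(1 / delta)) / (eps * mu))

print(sufficient_m(20, 2, 0.1, 0.1, 0.01))  # 55913: tens of thousands of examples
```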

Theorem 1 (restated). If there is a k-DNF h* such that
Plausibility: Pr_{x∈D}[h*(x)=1] ≥ μ
h* entails c: Pr_{x∈D}[c(x)=1 | h*(x)=1] = 1,
then using m = O((1/με)(n^k + log 1/δ)) examples, in time O(mn^k), we can find a k-DNF h such that, with probability 1-δ,
Plausibility: Pr_{x∈D}[h(x)=1] ≥ μ
h almost entails c: Pr_{x∈D}[c(x)=1 | h(x)=1] ≥ 1-ε.
A version that tolerates exceptions is also possible; see the paper…

Outline
Models of abductive reasoning
An abductive reasoning algorithm
"Not-easiness" of abducing conjunctions
Theorem 2. Suppose that a polynomial-time algorithm exists for learning abduction from random examples for conjunctions.
Then there is a polynomial-time algorithm for PAC-learning DNF.

PAC Learning
[Figure: the PAC-learning setup. Labeled examples (x^(1), c(x^(1))), (x^(2), c(x^(2))), …, (x^(m), c(x^(m))) are drawn from a distribution D for a target c ∈ C. The learner outputs f such that, w.p. 1-δ over the examples, a fresh x' = (x'_1, x'_2, …, x'_n) drawn from D satisfies f(x') = c(x') w.p. 1-ε.]

Theorem 2. Suppose that a polynomial-time algorithm exists for learning abduction from random examples for conjunctions. Then there is a polynomial-time algorithm for PAC-learning DNF.
Theorem (Daniely & Shalev-Shwartz '14). If there is a polynomial-time algorithm for PAC-learning DNF, then for every f(k) ⟶ ∞ there is a polynomial-time algorithm for refuting random k-SAT formulas of n^{f(k)} clauses.
☞ This is a new hardness assumption. Use at your discretion.

Key learning technique: Boosting (Schapire, 1990)
Suppose that there is a polynomial-time algorithm that, given examples of c ∈ C, w.p. 1-δ produces a circuit f s.t. Pr_x[f(x)=c(x)] > ½ + 1/poly(n). Then there is a polynomial-time PAC-learning algorithm for the class C.
I.e., using the ability to produce such f's, we produce a g for which Pr_x[g(x)=c(x)] is "boosted" to any 1-ε we require.
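
The slide states Schapire's original 1990 result; as a concrete instance of the boosting technique, here is a minimal AdaBoost-style sketch (Freund and Schapire's later, now-standard variant, not the 1990 construction). It assumes a weak_learner(examples, labels, weights) that returns a hypothesis with weighted error below ½; the interface is illustrative:

```python
import math

def boost(examples, labels, weak_learner, rounds):
    """Combine weak hypotheses, each only slightly better than coin-flipping,
    into an accurate weighted majority vote. Labels are in {0, 1}."""
    m = len(examples)
    weights = [1.0 / m] * m
    hyps, alphas = [], []
    for _ in range(rounds):
        f = weak_learner(examples, labels, weights)
        err = sum(w for w, x, y in zip(weights, examples, labels) if f(x) != y)
        if err >= 0.5:
            break  # no weak advantage left on this reweighted distribution
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-12))
        hyps.append(f)
        alphas.append(alpha)
        # Reweight: mistakes get heavier, correctly labeled examples lighter.
        weights = [w * math.exp(alpha if f(x) != y else -alpha)
                   for w, x, y in zip(weights, examples, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    def g(x):  # weighted majority vote of the weak hypotheses
        score = sum(a * (1 if f(x) == 1 else -1) for a, f in zip(alphas, hyps))
        return 1 if score >= 0 else 0
    return g
```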

Sketch of learning DNF using conjunctive abduction
If c is a DNF and Pr[c(x)=1] > ¼, then some term T of c has Pr[T(x)=1] > 1/(4|c|); otherwise, f ≡ 0 satisfies Pr[f(x)=c(x)] > ½ + ¼.
Note: Pr[c(x)=1 | T(x)=1] = 1, and T is a conjunction.
Abductive reasoning thus finds some h such that Pr[h(x)=1] > 1/poly(n) and Pr[c(x)=1 | h(x)=1] > ¾.
Return f: f(x) = 1 whenever h(x) = 1; if h(x) = 0, f(x) = maj_{x: h(x)=0}{c(x)}.
Then Pr[f(x)=c(x)] > ½ + 1/(4·poly(n)).
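
A Python sketch of the weak learner this reduction produces, assuming an abduce_conjunction oracle with the guarantee above; all names here are mine, for illustration:

```python
def weak_hypothesis(examples, labels, abduce_conjunction):
    """Build f with Pr[f(x)=c(x)] > 1/2 + 1/poly(n) from a conjunctive
    abduction oracle, following the sketch above; labels[i] = c(examples[i])."""
    # Case 1: c is rarely true, so the constant-0 predictor already has a weak edge.
    if sum(labels) / len(labels) <= 0.25:
        return lambda x: 0
    # Case 2: abduce h with Pr[h(x)=1] > 1/poly(n) and Pr[c(x)=1 | h(x)=1] > 3/4.
    h = abduce_conjunction(examples, labels)
    # Outside h, predict the majority label among examples with h(x) = 0.
    outside = [y for x, y in zip(examples, labels) if not h(x)]
    majority = 1 if sum(outside) * 2 >= len(outside) else 0
    return lambda x: 1 if h(x) else majority
```

Feeding this weak learner to a boosting procedure, such as the one sketched earlier, yields the claimed PAC-learner for DNF.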

Recap: learning abduction from random examples
For a goal condition c, if there is an h* such that Pr_{x∈D}[h*(x)=1] ≥ μ and Pr_{x∈D}[c(x)=1 | h*(x)=1] = 1, then, using examples x from D, find an h such that Pr_{x∈D}[h(x)=1] ≥ μ' and Pr_{x∈D}[c(x)=1 | h(x)=1] ≥ 1-ε.
Thm 1. An efficient algorithm exists for k-DNF h.
Thm 2. Unless DNF is PAC-learnable, there is no efficient algorithm for conjunctions.