
Presentation Transcript

Slide1

Thirty-Three Years of Knowledge-Based Machine Learning

Jude Shavlik, University of Wisconsin, USA

Slide2

Key Question of AI: How to Get Knowledge into Computers?

Hand coding

Supervised ML

How can we mix these two extremes?

Small ML subcommunity has looked at ways to do so

Slide3

The Standard Task for Supervised Machine Learning

Age     Weight    Height    BP        Sex    . . .    Outcome
38      142       64 in     150/88    F      . . .
27      189       70 in     125/72    M      . . .
. . .   . . .     . . .     . . .     . . .

Given training examples (rows = examples, columns = features)

Produce a 'Model' (that Classifies Future Examples)
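To make the setup concrete, here is a minimal scikit-learn sketch (mine, not from the talk; all feature values and labels are invented) of learning a model from such a table and classifying a future example:

```python
# Minimal sketch of the standard supervised-ML task (invented feature values).
from sklearn.tree import DecisionTreeClassifier

# Each row is one example: [age, weight, height_in, systolic_bp, sex (0=F, 1=M)]
X_train = [
    [38, 142, 64, 150, 0],
    [27, 189, 70, 125, 1],
]
y_train = ["positive", "negative"]          # hypothetical outcome labels

model = DecisionTreeClassifier().fit(X_train, y_train)

# The learned model classifies a future (unseen) example.
print(model.predict([[45, 160, 66, 140, 0]]))
```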

Slide4

Learning with Data in Multiple Tables (Relational ML)

[Figure: a Patients table linked to tables of Previous Mammograms, Previous Blood Tests, and Previous Rx]

Key challenge: a different amount of data for each patient
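A made-up miniature of that relational setting (not the speaker's data): each patient owns a different number of rows in the linked tables, so the data cannot be flattened into one fixed-length feature vector per patient.

```python
# Miniature relational dataset (invented): one row per patient in the main table,
# but a varying number of linked rows per patient in the other tables.
patients = [
    {"id": 1, "age": 38, "sex": "F"},
    {"id": 2, "age": 27, "sex": "M"},
]
previous_mammograms = [
    {"patient_id": 1, "result": "benign"},
    {"patient_id": 1, "result": "suspicious"},
]                                            # patient 2 has none
previous_blood_tests = [
    {"patient_id": 1, "wbc": 6.1},
    {"patient_id": 2, "wbc": 7.4},
    {"patient_id": 2, "wbc": 8.0},
]

for p in patients:
    linked = [r for r in previous_blood_tests if r["patient_id"] == p["id"]]
    print(p["id"], "has", len(linked), "previous blood tests")  # differs per patient
```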

Slide5

How Does the 'Domain Expert' Impart His/Her Expertise?

Choose Features
Choose and Label Examples

Can AI'ers Allow More?
Knowledge-Based Artificial Neural Networks
Knowledge-Based Support Vector Machines
Markov Logic Networks (Domingos' group)

Slide6

Two Underexplored Questions in ML

How can we go beyond teaching machines solely via I/O pairs? ('advice giving')

How can we understand what an ML algorithm has discovered? ('rule extraction')

Slide7

Outline

Explanation-Based Learning (1980's)
Knowledge-Based Neural Nets (1990's)
Knowledge-Based SVMs (2000's)
Markov Logic Networks (2010's)

Slide8

Explanation-Based Learning (EBL) – My PhD Years, 1983-1987

The EBL Hypothesis: by understanding why an example is a member of a concept, one can learn the essential properties of the concept

Trade-Off: trade the need to collect many examples for the ability to 'explain' single examples (a 'domain theory')

Ie, assume a smarter learner

Slide9

Knowledge-Based Artificial Neural Networks, KBANN (1988-2001)

[Diagram: Initial Symbolic Domain Theory --Insert--> Initial Neural Network --Refine (with Examples)--> Trained Neural Network --Extract--> Final Symbolic Domain Theory]

Mooney, Pazzani, Cohen, etc

Slide10

What Inspired KBANN?

Geoff Hinton was an invited speaker at ICML-88

I recall him saying something like "one can backprop through any function"

And I thought "what would it mean to backprop through a domain theory?"

Slide11

Inserting Prior Knowledge into a Neural Network

Domain Theory                 Neural Network
Final Conclusions         ->  Output Units
Supporting Facts          ->  Input Units
Intermediate Conclusions  ->  Hidden Units

Slide12

Jumping Ahead a Bit

Notice that symbolic knowledge induces a graphical model, which is then numerically optimized

Similar perspective later followed in Markov Logic Networks (MLNs)

However, in MLNs symbolic knowledge is expressed in first-order logic

Slide13

Mapping Rules to Nets

If A and B then Z
If B and ¬C then Z

[Figure: a network over input units A, B, C, D. Each rule becomes a hidden unit: its positive antecedents get a large weight (4), negated antecedents get -4, irrelevant inputs get weight 0, and the bias (6 for "A and B", 2 for "B and ¬C") is set so the unit fires only when the rule body is satisfied; the hidden units feed unit Z.]

Maps propositional rule sets to neural networks
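A minimal sketch of this rule-to-network mapping (my reconstruction of the idea, not the KBANN code): each rule becomes a hidden unit whose positive antecedents get weight +w and negated antecedents -w, the bias is set so the unit fires only when the whole rule body holds, and rules sharing a consequent feed an OR-like unit. The weight value and helper names are illustrative.

```python
import numpy as np

W = 4.0  # illustrative "large" weight, as on the slide

def rule_to_unit(pos, neg, features):
    """Map one propositional rule (pos/neg antecedents) to (weights, bias) for a
    sigmoid unit that is ~1 iff the rule body holds."""
    w = np.zeros(len(features))
    for a in pos:
        w[features.index(a)] = W
    for a in neg:
        w[features.index(a)] = -W
    # Threshold between (|pos| - 1)*W and |pos|*W so the unit fires only when
    # all positive antecedents are true and all negated ones are false.
    bias = (len(pos) - 0.5) * W
    return w, bias

features = ["A", "B", "C", "D"]
# If A and B then Z;  If B and not C then Z
h1_w, h1_b = rule_to_unit(["A", "B"], [], features)
h2_w, h2_b = rule_to_unit(["B"], ["C"], features)
z_w, z_b = np.array([W, W]), 0.5 * W          # Z is an OR of the two hidden units

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def net(x):                                   # x = 0/1 truth values for A, B, C, D
    h = sigmoid(np.array([h1_w @ x - h1_b, h2_w @ x - h2_b]))
    return sigmoid(z_w @ h - z_b)

print(net(np.array([1, 1, 0, 0])))            # A, B true -> Z near 1
print(net(np.array([0, 0, 1, 0])))            # neither rule fires -> Z near 0
```

These weights are only the starting point; backprop on the training examples then refines them.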

Slide14

Case Study: Learning to Recognize Genes (Towell, Shavlik & Noordewier, AAAI-90)

[Figure: domain theory drawn over a DNA sequence, with promoter at the top, contact and conformation beneath it, and minus_35 and minus_10 beneath contact]

promoter :- contact, conformation.
contact :- minus_35, minus_10.
<4 rules for conformation>
<4 rules for minus_35>
<4 rules for minus_10>

(Halved error rate of standard BP)

We compile rules to a more basic language, but here we compile for refinability

Slide15

Learning Curves (similar results on many tasks and with other advice-taking algo's)

[Plot: testset errors vs. amount of training data for KBANN, STD. ANN, and the DOMAIN THEORY alone; annotations mark the gains for a fixed amount of data and for a given error-rate spec]

Slide16

From Prior Knowledge to Advice

(Maclin PhD 1995)

Originally the 'theory refinement' community assumed domain knowledge was available before learning starts (prior knowledge)

When applying KBANN to reinforcement learning, we began to realize you should be able to provide domain knowledge to a machine learner whenever you think of something to say

Changing the metaphor: commanding vs. advising computers

Continual (ie, lifelong) Human Teacher – Machine Learner Cooperation

Slide17

What Would You Like to Say to This Penguin?

IF    a Bee is (Near and West) &
      an Ice is (Near and North)
THEN  BEGIN  Move East  Move North  END

Slide18

Some Advice for Soccer-Playing Robots

if   distanceToGoal ≤ 10
and  shotAngle ≥ 30
then prefer shoot over all other actions

Slide19

Some Sample Results

[Plot: reinforcement-learning curves with advice vs. without advice]

Slide20

Overcoming Bad Advice

[Plot: learning curves showing bad advice eventually overcome, compared with no advice]

Slide21

Rule Extraction

Initially Geoff Towell (PhD, 1991) viewed this as simplifying the trained neural network (M-of-N rules)

Mark Craven (PhD, 1996) realized this is simply another learning task! Ie, learn what the neural network computes:
Collect I/O pairs from the trained neural network
Give them to a decision-tree learner

Applies to SVMs, decision forests, etc

Slide22

Understanding What a Trained ANN has Learned
- Human 'Readability' of Trained ANNs is Challenging

Rule Extraction (Craven & Shavlik, 1996)

Roughly speaking, train ID3 to learn the I/O behavior of the neural network – note we can generate as many labeled training ex's as desired by forward prop'ing through the trained ANN!

Could be an ENSEMBLE of models
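A minimal sketch of this rule-extraction recipe (my illustration, not Craven's TREPAN code): label as many sampled inputs as we like by querying the trained network, then fit a decision tree to that I/O behavior. Data, model sizes, and the sampling scheme are all made up.

```python
# Sketch: extract an approximate symbolic description of a trained black-box model
# by training a decision tree on its input/output behavior (illustrative only).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.random((500, 4))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)            # some hidden concept
black_box = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000).fit(X, y)

# Generate as many labeled examples as we like by querying the trained network.
X_query = rng.random((5000, 4))
y_query = black_box.predict(X_query)

tree = DecisionTreeClassifier(max_depth=3).fit(X_query, y_query)
print(export_text(tree))        # human-readable rules approximating the ANN
```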

Slide23

KBANN Recap

Use symbolic knowledge to make an initial guess at the concept description (standard neural-net approaches make a random guess)

Use training examples to refine the initial guess ('early stopping' reduces overfitting)

Nicely maps to incremental (aka online) machine learning

Valuable to show the user the learned model expressed in symbols rather than numbers

Slide24

Knowledge-Based Support Vector Machines (2001-2011)

Question arose during the 2001 PhD defense of Tina Eliassi-Rad: how would you apply the KBANN idea using SVMs?

Led to collaboration with Olvi Mangasarian (who has worked on SVMs for over 50 years!)

Slide25

Generalizing the Idea of a Training Example for SVM's

Can extend the SVM linear program to handle 'regions as training examples'

Slide26

Knowledge-Based Support Vector Regression

Add soft constraints to linear program (so need only follow advice approximately)

minimize    ||w||1 + C ||s||1 + penalty for violating advice
such that   f(x) = y ± s, plus constraints that represent the advice

[Plot: an advice region over the inputs where the output should exceed 4, shown alongside ordinary training points]

Advice: In this region, y should exceed 4
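A rough sketch of 'advice as soft constraints in the linear program' (mine, not Mangasarian's actual KBKR formulation, which handles the whole region via duality rather than sampled points). Written with cvxpy; the data, the advice region, and the penalty weights are invented.

```python
# Hedged sketch of knowledge-based support vector regression.
# The real formulation turns "advice over a region" into linear constraints via
# duality; here we simply penalize advice violations at sampled points.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(30, 1))              # training inputs
y = 0.5 * X[:, 0] + rng.normal(0, 0.3, size=30)   # training outputs

w = cp.Variable(1)
b = cp.Variable()
s = cp.Variable(30, nonneg=True)                  # slacks on the data fit
C, mu = 1.0, 5.0

# Advice: "for inputs in [8, 10], the output should exceed 4"
X_adv = np.linspace(8, 10, 20).reshape(-1, 1)
z = cp.Variable(20, nonneg=True)                  # slacks on the advice

fit = X @ w + b
constraints = [cp.abs(fit - y) <= s,
               X_adv @ w + b >= 4 - z]

objective = cp.Minimize(cp.norm(w, 1) + C * cp.sum(s) + mu * cp.sum(z))
cp.Problem(objective, constraints).solve()
print(w.value, b.value)
```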

Slide27

Sample Advice-Taking Results

if   distanceToGoal ≤ 10
and  shotAngle ≥ 30
then prefer shoot over all other actions

2 vs 1 BreakAway, rewards +1, -1

Advice constraints: Q(shoot) > Q(pass) and Q(shoot) > Q(move)

[Plot: learning curves for advice vs. std RL]

Maclin et al: AAAI '05, '06, '07
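To make the preference advice concrete, here is a small sketch (mine, with invented state and Q values) of imposing it as a soft hinge penalty on a learned Q-function: whenever a state satisfies the advice condition, Q(shoot) should beat Q(pass) and Q(move) by a margin.

```python
# Sketch: turn "if distanceToGoal <= 10 and shotAngle >= 30 then prefer shoot"
# into a soft penalty on a learned Q-function (illustrative values only).
def advice_penalty(state, q_values, margin=0.1):
    """state: dict of features; q_values: dict mapping action -> estimated Q."""
    if state["distanceToGoal"] <= 10 and state["shotAngle"] >= 30:
        penalty = 0.0
        for other in ("pass", "move"):
            # Hinge: penalize only when shoot is not ahead by at least `margin`.
            penalty += max(0.0, q_values[other] + margin - q_values["shoot"])
        return penalty
    return 0.0

state = {"distanceToGoal": 8, "shotAngle": 40}
q = {"shoot": 0.2, "pass": 0.6, "move": 0.1}
print(advice_penalty(state, q))   # positive: this Q-function violates the advice
```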

Slide28

Automatically Creating Advice

Interesting approach to transfer learning (Lisa Torrey, PhD 2009):
Learn in task A
Perform 'rule extraction'
Give as advice for related task B

Since advice is not assumed 100% correct, differences between tasks A and B are handled by training ex's for task B

So advice giving is done by MACHINE!

Slide29

Close-Transfer Scenarios

2-on-1 BreakAway

3-on-2 BreakAway

4-on-3 BreakAway

Slide30

Distant-Transfer Scenarios

3-on-2 BreakAway

3-on-2 KeepAway

3-on-2 MoveDownfield

Slide31

Some Results: Transfer to 3-on-2 BreakAway

Torrey, Shavlik, Walker & Maclin: ECML 2006, ICML Workshop 2006

Slide32

KBSVM Recap

Can view symbolic knowledge as a way to label regions of feature space (rather than solely labeling points)

Maximize: Model Simplicity + Fit to Advice + Fit to Training Examples

Note: does not fit the view of "guess initial model, then refine using training ex's"

Slide33

Markov Logic Networks, 2009+ (and statistical-relational learning in general)

My current favorite for combining symbolic knowledge and numeric learning

MLN = set of weighted FOPC sentences

    wgt = 3.2    ∀x,y,z  parent(x, y) ∧ parent(z, y) → married(x, z)

Have worked on speeding up MLN inference (via RDBMS) plus learning MLN rule sets
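A tiny sketch of what the weight means (mine; Tuffy and Alchemy are the real systems): an MLN defines a log-linear distribution over possible worlds, P(world) proportional to exp(sum_i w_i * n_i(world)), where n_i counts the true groundings of rule i. People, evidence, and candidate worlds below are invented.

```python
# Sketch: score possible worlds under a toy MLN with one weighted rule,
#   3.2 : parent(x, y) ^ parent(z, y) -> married(x, z)
import itertools
import math

people = ["ann", "bob", "cal"]
parent = {("ann", "cal"), ("bob", "cal")}     # evidence: ann and bob are cal's parents

def n_true_groundings(married):
    """Count groundings of the rule that are satisfied in this world."""
    count = 0
    for x, y, z in itertools.product(people, repeat=3):
        body = (x, y) in parent and (z, y) in parent
        count += (not body) or ((x, z) in married)    # truth value of the implication
    return count

w = 3.2
# Two candidate worlds that differ on whether ann and bob are married:
for married in [set(), {("ann", "bob"), ("bob", "ann")}]:
    score = math.exp(w * n_true_groundings(married))
    print(sorted(married), score)     # unnormalized probability; higher = more likely
```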

Slide34

Logic & Probability: Two Major Math Underpinnings of AI

[Diagram: add probabilities to Logic, or add relations to Probabilities, and both paths lead to Probabilistic Logic Learning; MLNs a popular approach]

Slide35

Learning a Set of First-Order Regression Trees (each path to a leaf is an MLN rule) – ICDM ‘11

[Diagram: boosting loop – compare Predicted Probs vs. the Data to get Gradients, learn a tree from the gradients, add it to the Current Rules, and iterate; the Final Ruleset is the sum of all learned trees]

Slide36

Relational Probability Trees

[Tree figure: internal tests such as knows(?X,?Y), job(?Y, politician), job(?X, politician), and speed(?X) > 75 mph, with yes/no branches and probabilities (0.01, 0.05, 0.17, 0.98, 0.99) at the leaves]

Blockeel & De Raedt, AIJ 1998; Neville et al, KDD 2003

Computing prob(getFine(?X) | evidence)

Slide37

Basic Idea of Boosted Relational Dependency Networks

1. Learn a relational regression tree
2. For each training example, collect the error in its predicted probability
3. Learn another first-order regression tree that fixes the errors collected in Step 2 (eg, for neg ex 9, predict -0.13)
4. Goto 2

[Figure: two successive relational regression trees with internal tests such as a(X,Y), b(Y,Z), c(X), d(Y), and e(Y,Z), true/false branches, and numeric values at the leaves]

Learns 'structure' and weights simultaneously
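A propositional, scikit-learn stand-in for that boosting loop (mine; the actual systems learn relational regression trees): fit a small tree to the gradient, i.e. the error in each example's predicted probability, add it to the ensemble, and repeat. Data and step size are illustrative.

```python
# Sketch of functional-gradient boosting of regression trees (a propositional
# stand-in for the relational version; invented data and step size).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.random((400, 5))
y = (X[:, 0] + X[:, 2] > 1.0).astype(float)       # 0/1 labels

trees, lr = [], 0.5
score = np.zeros(len(y))                          # summed tree outputs (log-odds scale)

for _ in range(10):
    prob = 1.0 / (1.0 + np.exp(-score))           # current predicted probabilities
    gradient = y - prob                           # error in each predicted probability
    tree = DecisionTreeRegressor(max_depth=3).fit(X, gradient)
    trees.append(tree)                            # "Final Ruleset" = sum of all trees
    score += lr * tree.predict(X)

final_prob = 1.0 / (1.0 + np.exp(-score))
print(np.mean((final_prob > 0.5) == y))           # training accuracy of the ensemble
```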

Slide38

Some Results

advisedBy(x, y)    AUC-PR         Training Time
MLN-BT             0.94 ± 0.06    18.4 sec
MLN-BC             0.95 ± 0.05    33.3 sec
Alch-D             0.31 ± 0.10    7.1 hrs
Motif              0.43 ± 0.03    1.8 hrs
LHL                0.42 ± 0.10    37.2 sec

Slide39

Illustrative Task

Given a Text Corpus of Sports Articles, Determine Who Won/Lost Each Week
winner(TeamName, GameID)
loser(TeamName, GameID)

Evidence: we ran the Stanford NLP Toolkit on the articles and produced lots of facts (nextWord, partOfSpeech, parseTree, etc)

DARPA gave us a few hundred pos ex's

Slide40

Some Expert-Provided 'Advice'

X 'beat' ⟶ winner(X, ArticleDate)
X 'was beaten' ⟶ loser(X, ArticleDate)

Home teams are more likely to win.

If you found a likely game winner, then if you see a different team mentioned nearby in the text, it is likely the game loser.

If a team didn't score any touchdowns, it probably lost.

If 'predict*' is in the sentence, ignore it.

Slide41

Slide42

One Great Rule (very high precision, but moderate recall)

Look, in short phrases, for TeamName1 * Number1 * TeamName2 * Number2, where Number1 > Number2

Infer with high confidence:
winningTeam(TeamName1, ArticleDate)
losingTeam(TeamName2, ArticleDate)
winnerScore(TeamName1, Number1, ArticleDate)
loserScore(TeamName2, Number2, ArticleDate)

Rule matches none of the DARPA-provided training examples!
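One way such a pattern could be operationalized (my sketch; the actual system expressed it as MLN rules over the NLP-toolkit facts): a regular expression over short phrases of the form TeamName1 … Number1 … TeamName2 … Number2, inferring the winner when Number1 > Number2. The team list and phrase are invented.

```python
# Sketch: high-precision extraction of winner/loser from score phrases
# such as "Packers 31, Bears 17" (team list and regex are illustrative).
import re

TEAMS = r"(Packers|Bears|Vikings|Lions)"
PATTERN = re.compile(TEAMS + r"\D{1,20}?(\d+)\D{1,20}?" + TEAMS + r"\D{1,20}?(\d+)")

def extract_result(phrase):
    m = PATTERN.search(phrase)
    if not m:
        return None
    team1, score1, team2, score2 = m.group(1), int(m.group(2)), m.group(3), int(m.group(4))
    if score1 > score2:                       # Number1 > Number2 => team1 won
        return {"winningTeam": team1, "losingTeam": team2,
                "winnerScore": score1, "loserScore": score2}
    return None

print(extract_result("Final: Packers 31, Bears 17"))
```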

Slide43

Differences from KBANN

Rules involve logical variables

During learning, we create new rules to correct errors in the initial rules

Recent followup: also refine the initial rules (note that KBSVMs also do NOT refine rules, though we had one AAAI paper on that)

Slide44

Wrapping Up

Symbolic knowledge refined/extended by:
Neural networks
Support-vector machines
MLN rule and weight learning

Variety of views taken:
Make initial guess at concept, then refine weights
Use advice to label a region in feature space
Make initial guess at concept, then add wgt'ed rules
Seeing what was learned – rule extraction

Applications in genetics, cancer, machine reading, robot learning, etc

Slide45

Some Suggestions

Allow humans to continually observe learning and provide symbolic knowledge at any time

Never assume symbolic knowledge is 100% correct

Allow the user to see what was learned in a symbolic representation, to facilitate additional advice

Put a graphic on every slide

Slide46

Thanks for Your Time

Questions?

Papers, PowerPoint presentations, and some software available on line:
pages.cs.wisc.edu/~shavlik/mlrg/publications.html
hazy.cs.wisc.edu/hazy/tuffy/
pages.cs.wisc.edu/~tushar/rdnboost/