Slide 1: Thirty-Three Years of Knowledge-Based Machine Learning

Jude Shavlik
University of Wisconsin, USA
Slide 2: Key Question of AI: How to Get Knowledge into Computers?

- Hand coding
- Supervised ML
- How can we mix these two extremes?
- A small ML subcommunity has looked at ways to do so
Slide 3: The Standard Task for Supervised Machine Learning

Given training examples (rows are examples, columns are features), produce a 'model' that classifies future examples.

Age | Weight | Height | BP     | Sex | ... | Outcome
----|--------|--------|--------|-----|-----|--------
38  | 142    | 64in   | 150/88 | F   | ... | ...
27  | 189    | 70in   | 125/72 | M   | ... | ...
... | ...    | ...    | ...    | ... | ... | ...
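Below is a minimal sketch of this task in code, assuming scikit-learn; the two feature rows come from the table above, while the outcome labels and the query example are invented for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Columns: Age, Weight, Height (in), Systolic BP, Sex (0 = F, 1 = M)
X = [[38, 142, 64, 150, 0],
     [27, 189, 70, 125, 1]]
y = ['outcome_a', 'outcome_b']  # outcome labels invented for illustration

model = DecisionTreeClassifier().fit(X, y)       # produce a 'model'
print(model.predict([[45, 160, 66, 140, 0]]))    # classify a future example
```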
Slide 4: Learning with Data in Multiple Tables (Relational ML)

- Patients
- Previous mammograms
- Previous blood tests
- Previous Rx

Key challenge: a different amount of data for each patient.
Slide 5: How Does the 'Domain Expert' Impart His/Her Expertise?

- Choose features
- Choose and label examples

Can AI'ers allow more?
- Knowledge-Based Artificial Neural Networks
- Knowledge-Based Support Vector Machines
- Markov Logic Networks (Domingos' group)
Slide 6: Two Underexplored Questions in ML

- How can we go beyond teaching machines solely via I/O pairs? ('advice giving')
- How can we understand what an ML algorithm has discovered? ('rule extraction')
Slide 7: Outline

- Explanation-Based Learning (1980s)
- Knowledge-Based Neural Nets (1990s)
- Knowledge-Based SVMs (2000s)
- Markov Logic Networks (2010s)
Slide 8: Explanation-Based Learning (EBL) – My PhD Years, 1983-1987

- The EBL Hypothesis: by understanding why an example is a member of a concept, one can learn the essential properties of the concept
- Trade-off: trade the need to collect many examples for the ability to 'explain' single examples (a 'domain theory')
- I.e., assume a smarter learner
Slide 9: Knowledge-Based Artificial Neural Networks, KBANN (1988-2001)

[Diagram: Initial Symbolic Domain Theory --(insert)--> Initial Neural Network --(refine, using Examples)--> Trained Neural Network --(extract)--> Final Symbolic Domain Theory]

(Related work: Mooney, Pazzani, Cohen, etc.)
Slide 10: What Inspired KBANN?

- Geoff Hinton was an invited speaker at ICML-88
- I recall him saying something like "one can backprop through any function"
- And I thought, "what would it mean to backprop through a domain theory?"
Slide 11: Inserting Prior Knowledge into a Neural Network

Domain Theory            | Neural Network
-------------------------|---------------
Final conclusions        | Output units
Intermediate conclusions | Hidden units
Supporting facts         | Input units
Slide 12: Jumping Ahead a Bit

- Notice that symbolic knowledge induces a graphical model, which is then numerically optimized
- A similar perspective was later followed in Markov Logic Networks (MLNs)
- However, in MLNs the symbolic knowledge is expressed in first-order logic
Slide 13: Mapping Rules to Nets

KBANN maps propositional rule sets to neural networks. For example:

If A and B then Z
If B and ¬C then Z

[Diagram: a network over inputs A, B, C, D with output unit Z. Each rule becomes a hidden unit: its antecedents get weight 4 (negated antecedents get -4), irrelevant inputs (here D) get weight 0, and the bias weight is set so the unit fires only when its rule is satisfied (bias 6 for 'A and B'; bias 2 for 'B and ¬C').]
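Below is a minimal sketch of this rules-to-weights compilation, assuming the scheme shown in the diagram (antecedent weights of ±4, and a bias of (P − 1/2)·4 for a rule with P positive antecedents); the function names are illustrative.

```python
import numpy as np

OMEGA = 4.0  # weight magnitude for antecedents that appear in a rule

def rule_to_unit(antecedents, features):
    """Compile one conjunctive rule, e.g. {'A': True, 'C': False} for
    'if A and not C then ...', into a sigmoid unit's weights and bias."""
    weights = np.array([OMEGA if antecedents.get(f) else
                        -OMEGA if f in antecedents else 0.0
                        for f in features])
    num_pos = sum(antecedents.values())
    bias = (num_pos - 0.5) * OMEGA   # unit fires only if the rule holds
    return weights, bias

def activation(weights, bias, x):
    return 1.0 / (1.0 + np.exp(-(weights @ x - bias)))   # sigmoid

features = ['A', 'B', 'C', 'D']          # D is irrelevant: weight 0
w1, b1 = rule_to_unit({'A': True, 'B': True}, features)   # bias 6
w2, b2 = rule_to_unit({'B': True, 'C': False}, features)  # bias 2

x = np.array([1.0, 1.0, 0.0, 1.0])       # A, B true; C false
print(activation(w1, b1, x))             # ~0.9: 'A and B' rule fires
print(activation(w2, b2, x))             # ~0.9: 'B and not C' rule fires
```

Z itself would become an OR unit over the two rule units, again with hand-set weights and bias; backpropagation then refines all of these numbers.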
Slide 14: Case Study: Learning to Recognize Genes (Towell, Shavlik & Noordewier, AAAI-90)

Domain theory for recognizing promoters in DNA sequences:

promoter :- contact, conformation.
contact :- minus_35, minus_10.
<4 rules for conformation>
<4 rules for minus_35>
<4 rules for minus_10>

(Halved the error rate of standard backpropagation)

We compile rules to a more basic language, but here we compile for refinability.
Slide 15: Learning Curves

(Similar results on many tasks and with other advice-taking algorithms)

[Figure: testset errors vs. amount of training data for KBANN, a standard ANN, and the domain theory alone; annotations mark the comparison at a fixed amount of data and at a given error-rate spec.]
Slide 16: From Prior Knowledge to Advice (Maclin PhD 1995)

- Originally the 'theory refinement' community assumed domain knowledge was available before learning starts (prior knowledge)
- When applying KBANN to reinforcement learning, we began to realize you should be able to provide domain knowledge to a machine learner whenever you think of something to say
- Changing the metaphor: commanding vs. advising computers
- Continual (i.e., lifelong) human teacher – machine learner cooperation
Slide 17: What Would You Like to Say to This Penguin?

IF   a Bee is (Near and West) AND
     an Ice is (Near and North)
THEN BEGIN
       Move East
       Move North
     END
Slide 18: Some Advice for Soccer-Playing Robots

if   distanceToGoal ≤ 10
and  shotAngle ≥ 30
then prefer shoot over all other actions
Slide 19: Some Sample Results

[Figure: learning curves with advice vs. without advice.]
Slide 20: Overcoming Bad Advice

[Figure: learning curves comparing a learner given bad advice with one given no advice; because advice is followed only softly, the bad advice is eventually overcome.]
Slide 21: Rule Extraction

- Initially Geoff Towell (PhD, 1991) viewed this as simplifying the trained neural network (M-of-N rules)
- Mark Craven (PhD, 1996) realized this is simply another learning task! I.e., learn what the neural network computes:
  - Collect I/O pairs from the trained neural network
  - Give them to a decision-tree learner
- Applies to SVMs, decision forests, etc.
Slide 22: Understanding What a Trained ANN Has Learned

Human 'readability' of trained ANNs is challenging.

Rule extraction (Craven & Shavlik, 1996): roughly speaking, train ID3 to learn the I/O behavior of the neural network. Note we can generate as many labeled training examples as desired by forward propagating through the trained ANN! (The ANN could be an ENSEMBLE of models.)
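A minimal sketch of this 'learn what the network computes' idea, assuming scikit-learn (a CART tree stands in for ID3, and an MLP stands in for whatever trained model or ensemble we want to explain); all names and data are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Pretend this network was trained earlier on some task.
X_train = rng.normal(size=(500, 4))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                    random_state=0).fit(X_train, y_train)

# Generate as many oracle-labeled examples as desired by querying the net.
X_query = rng.normal(size=(20000, 4))
y_oracle = net.predict(X_query)

# Fit a shallow tree to the network's I/O behavior; its paths are 'rules'.
tree = DecisionTreeClassifier(max_depth=3).fit(X_query, y_oracle)
print(export_text(tree, feature_names=['f0', 'f1', 'f2', 'f3']))
print('fidelity:', tree.score(X_query, y_oracle))  # agreement with the net
```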
Slide 23: KBANN Recap

- Use symbolic knowledge to make an initial guess at the concept description (standard neural-net approaches make a random guess)
- Use training examples to refine the initial guess ('early stopping' reduces overfitting)
- Nicely maps to incremental (aka online) machine learning
- Valuable to show the user the learned model expressed in symbols rather than numbers
Slide 24: Knowledge-Based Support Vector Machines (2001-2011)

- The question arose during the 2001 PhD defense of Tina Eliassi-Rad: how would you apply the KBANN idea using SVMs?
- Led to a collaboration with Olvi Mangasarian (who has worked on SVMs for over 50 years!)
Slide 25: Generalizing the Idea of a Training Example for SVMs

Can extend the SVM linear program to handle 'regions as training examples'.
Slide 26: Knowledge-Based Support Vector Regression

Add soft constraints to the linear program (so we need only follow advice approximately):

  minimize   ||w||_1 + C ||s||_1 + penalty for violating advice
  such that  |f(x) - y| ≤ s
             constraints that represent advice

Advice: in this region, y should exceed 4.

[Figure: a 1-D input-output plot showing the fitted function pushed above y = 4 over the advice region.]
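Below is a minimal sketch of this linear program for a 1-D model f(x) = w·x + b, using scipy; it enforces the advice softly at sampled points of the region rather than via the exact region constraints of the KBSVR papers, and all data and penalty values are invented.

```python
import numpy as np
from scipy.optimize import linprog

x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([1.0, 1.8, 3.2, 3.9, 5.1])
advice_x = np.linspace(1.0, 2.0, 5)      # sampled points in advice region
C, D = 1.0, 10.0                         # data-fit and advice penalties
n, m = len(x), len(advice_x)

# Variables (all >= 0): [w+, w-, b+, b-, s_1..s_n, z_1..z_m]
# with w = w+ - w- and b = b+ - b-; minimize |w| + C*sum(s) + D*sum(z).
c = np.concatenate([[1.0, 1.0, 0.0, 0.0], C * np.ones(n), D * np.ones(m)])
A_ub, b_ub = [], []

for i in range(n):                       # |f(x_i) - y_i| <= s_i
    up = np.zeros(4 + n + m); up[:4] = [x[i], -x[i], 1, -1]; up[4 + i] = -1
    lo = -up.copy(); lo[4 + i] = -1
    A_ub += [up, lo]; b_ub += [y[i], -y[i]]

for j, a in enumerate(advice_x):         # f(a) >= 4 - z_j  (soft advice)
    r = np.zeros(4 + n + m); r[:4] = [-a, a, -1, 1]; r[4 + n + j] = -1
    A_ub.append(r); b_ub.append(-4.0)

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub))
w, b = res.x[0] - res.x[1], res.x[2] - res.x[3]
print(f"f(x) = {w:.2f}*x + {b:.2f}")
print("f on advice region:", np.round(w * advice_x + b, 2))
```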
Slide 27: Sample Advice-Taking Results

Advice:

if   distanceToGoal ≤ 10
and  shotAngle ≥ 30
then prefer shoot over all other actions

i.e., Q(shoot) > Q(pass) and Q(shoot) > Q(move)

[Figure: learning curves for 2-vs-1 BreakAway (rewards +1, -1), advice vs. standard RL.]

Maclin et al: AAAI '05, '06, '07
Slide 28: Automatically Creating Advice

An interesting approach to transfer learning (Lisa Torrey, PhD 2009):

1. Learn in task A
2. Perform 'rule extraction'
3. Give the result as advice for related task B

Since the advice is not assumed 100% correct, differences between tasks A and B are handled by the training examples for task B.

So advice giving is done by the MACHINE!
Slide 29: Close-Transfer Scenarios

- 2-on-1 BreakAway
- 3-on-2 BreakAway
- 4-on-3 BreakAway
Slide 30: Distant-Transfer Scenarios

- 3-on-2 BreakAway
- 3-on-2 KeepAway
- 3-on-2 MoveDownfield
Slide 31: Some Results: Transfer to 3-on-2 BreakAway

Torrey, Shavlik, Walker & Maclin: ECML 2006, ICML Workshop 2006
Slide 32: KBSVM Recap

- Can view symbolic knowledge as a way to label regions of feature space (rather than solely labeling points)
- Maximize: model simplicity + fit to advice + fit to training examples
- Note: does not fit the view of 'guess initial model, then refine using training examples'
Slide 33: Markov Logic Networks, 2009+ (and statistical-relational learning in general)

- My current favorite for combining symbolic knowledge and numeric learning
- An MLN is a set of weighted FOPC sentences, e.g.

    wgt = 3.2   ∀x,y,z  parent(x, y) ∧ parent(z, y) → married(x, z)

- Have worked on speeding up MLN inference (via RDBMSs) plus learning MLN rule sets
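A minimal sketch of what that weighted sentence means, assuming the usual MLN semantics (a world's unnormalized log-probability sums weight × number-of-satisfied-groundings over the formulas); the constants and facts are invented.

```python
from itertools import product

people = ['ann', 'bob', 'cal']
parent = {('ann', 'cal'), ('bob', 'cal')}     # parent(x, y) facts
married = {('ann', 'bob'), ('bob', 'ann')}    # married(x, z) facts

def satisfied_groundings():
    # Groundings of: parent(x,y) ^ parent(z,y) -> married(x,z)
    count = 0
    for xx, yy, zz in product(people, repeat=3):
        body = (xx, yy) in parent and (zz, yy) in parent
        if not body or (xx, zz) in married:   # the implication holds
            count += 1
    return count

weight = 3.2
print(weight * satisfied_groundings())  # 3.2 * 25 = 80.0 for this world
```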
Logic & Probability: Two Major Math Underpinnings of AI
Logic
Probabilities
Add Probabilities
Add Relations
Probabilistic Logic
Learning
MLNs a popular
approachSlide35
Slide 35: Learning a Set of First-Order Regression Trees (each path to a leaf is an MLN rule) – ICDM '11

[Diagram of the boosting loop: compare the data with the predicted probabilities to get gradients; learn a regression tree that fits those gradients; add it to the current rules; iterate. The final ruleset is the sum of all the learned trees.]
Slide 36: Relational Probability Trees
(Blockeel & De Raedt, AIJ 1998; Neville et al, KDD 2003)

Computing prob(getFine(?X) | evidence).

[Tree diagram: internal nodes test first-order conditions such as speed(?X) > 75 mph, knows(?X,?Y), job(?Y, politician), and job(?X, politician); each yes/no path ends in a leaf holding a probability (values shown: 0.01, 0.05, 0.17, 0.98, 0.99).]
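Below is a minimal sketch of using such a tree, walking from the root to a leaf probability; the tree shape and numbers loosely follow the slide's getFine example but are illustrative, not the exact tree.

```python
def prob_get_fine(x, evidence):
    """Walk the tree to a leaf probability (tree shape illustrative)."""
    if evidence['speed'].get(x, 0) > 75:
        # Existential test: does ?X know some ?Y whose job is politician?
        knows_politician = any(evidence['job'].get(y) == 'politician'
                               for y in evidence['knows'].get(x, []))
        return 0.17 if knows_politician else 0.99
    return 0.01

evidence = {
    'speed': {'bob': 80},
    'knows': {'bob': ['ann']},
    'job':   {'ann': 'politician'},
}
print(prob_get_fine('bob', evidence))   # 0.17: fast, but knows a politician
```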
Slide 37: Basic Idea of Boosted Relational Dependency Networks

1. Learn a relational regression tree
2. For each training example, collect the error in its predicted probability
3. Learn another first-order regression tree that fixes the errors collected in Step 2 (e.g., for neg ex #9, predict -0.13)
4. Go to 2

[Diagram: two example first-order regression trees with true/false branches, internal nodes such as a(X,Y), d(Y), b(Y,Z), e(Y,Z), c(X), and real-valued leaves (e.g., 0.13, 0.14, 0.74, 0.35; -0.32, 0.25, 0.32, -0.27).]

Learns 'structure' and weights simultaneously.
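Below is a minimal sketch of this boosting loop shown propositionally (scikit-learn regression trees stand in for the first-order trees that RDN-Boost actually learns); the data are invented.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] - X[:, 3] > 0).astype(float)

trees, F = [], np.zeros(len(y))          # F holds the summed tree outputs
for _ in range(10):                      # "learn, iterate"
    p = 1.0 / (1.0 + np.exp(-F))         # predicted probabilities
    gradients = y - p                    # e.g., a neg example predicted at
                                         # 0.13 gets gradient -0.13
    t = DecisionTreeRegressor(max_depth=2).fit(X, gradients)
    trees.append(t)                      # final ruleset = sum of trees
    F += t.predict(X)

print('final train accuracy:', ((F > 0) == (y == 1)).mean())
```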
Slide 38: Some Results

Predicting advisedBy(x, y):

System | AUC-PR      | Training Time
-------|-------------|--------------
MLN-BT | 0.94 ± 0.06 | 18.4 sec
MLN-BC | 0.95 ± 0.05 | 33.3 sec
Alch-D | 0.31 ± 0.10 | 7.1 hrs
Motif  | 0.43 ± 0.03 | 1.8 hrs
LHL    | 0.42 ± 0.10 | 37.2 sec
Slide 39: Illustrative Task

Given a text corpus of sports articles, determine who won/lost each week:

winner(TeamName, GameID)
loser(TeamName, GameID)

Evidence: we ran the Stanford NLP Toolkit on the articles and produced lots of facts (nextWord, partOfSpeech, parseTree, etc). DARPA gave us a few hundred positive examples.
Slide 40: Some Expert-Provided 'Advice'

- X 'beat' ⟶ winner(X, ArticleDate)
- X 'was beaten' ⟶ loser(X, ArticleDate)
- Home teams are more likely to win
- If you found a likely game winner, then if you see a different team mentioned nearby in the text, it is likely the game loser
- If a team didn't score any touchdowns, it probably lost
- If 'predict*' is in the sentence, ignore it
Slide 42: One Great Rule (very high precision, but moderate recall)

Look, in short phrases, for

  TeamName1 * Number1 * TeamName2 * Number2   where Number1 > Number2

and infer with high confidence

  winningTeam(TeamName1, ArticleDate)
  losingTeam(TeamName2, ArticleDate)
  winnerScore(TeamName1, Number1, ArticleDate)
  loserScore(TeamName2, Number2, ArticleDate)

This rule matches none of the DARPA-provided training examples!
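Below is a minimal sketch of this rule as a regular expression over short phrases, assuming simple one-word team names; the pattern details are illustrative.

```python
import re

# TeamName1 ... Number1 ... TeamName2 ... Number2, within a short phrase
PATTERN = re.compile(
    r'\b(?P<t1>[A-Z][a-z]+)\b[^.;]{0,30}?\b(?P<n1>\d{1,2})\b'
    r'[^.;]{0,30}?\b(?P<t2>[A-Z][a-z]+)\b[^.;]{0,30}?\b(?P<n2>\d{1,2})\b')

def game_result(phrase):
    m = PATTERN.search(phrase)
    if m and int(m.group('n1')) > int(m.group('n2')):   # Number1 > Number2
        return {'winningTeam': m.group('t1'), 'winnerScore': int(m.group('n1')),
                'losingTeam':  m.group('t2'), 'loserScore':  int(m.group('n2'))}
    return None

print(game_result('Packers 31, Bears 10'))
# -> {'winningTeam': 'Packers', 'winnerScore': 31,
#     'losingTeam': 'Bears', 'loserScore': 10}
```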
Slide 43: Differences from KBANN

- Rules involve logical variables
- During learning, we create new rules to correct errors in the initial rules
- Recent followup: also refine the initial rules (note that KBSVMs also do NOT refine rules, though we had one AAAI paper on that)
Slide 44: Wrapping Up

Symbolic knowledge refined/extended by:
- Neural networks
- Support-vector machines
- MLN rule and weight learning

Variety of views taken:
- Make an initial guess at the concept, then refine the weights
- Use advice to label a region in feature space
- Make an initial guess at the concept, then add weighted rules
- Seeing what was learned: rule extraction

Applications in genetics, cancer, machine reading, robot learning, etc.
Slide 45: Some Suggestions

- Allow humans to continually observe learning and provide symbolic knowledge at any time
- Never assume symbolic knowledge is 100% correct
- Allow the user to see what was learned, in a symbolic representation, to facilitate additional advice
- Put a graphic on every slide
Slide 46: Thanks for Your Time. Questions?

Papers, PowerPoint presentations, and some software available online:
pages.cs.wisc.edu/~shavlik/mlrg/publications.html
hazy.cs.wisc.edu/hazy/tuffy/
pages.cs.wisc.edu/~tushar/rdnboost/