Appears in Proceedings of the Eleventh International Workshop on Machine Learning, pp. 343-351, Rutgers, NJ, July, 1994

Combining Top-down and Bottom-up Techniques in Inductive Logic Programming

John M. Zelle, Raymond J. Mooney and Joshua B. Konvisser
Department of Computer Sciences
University of Texas
Austin, TX 78712
{zelle,mooney,konviss}@cs.utexas.edu

Abstract

This paper describes a new method for inducing logic programs from examples which attempts to integrate the best aspects of existing ILP methods into a single coherent framework. In particular, it combines a bottom-up method similar to Golem with a top-down method similar to Foil. It also includes a method for predicate invention similar to Champ and an elegant solution to the "noisy oracle" problem which allows the system to learn recursive programs without requiring a complete set of positive examples. Systematic experimental comparisons to both Golem and Foil on a range of problems are used to clearly demonstrate the advantages of the approach.

1 INTRODUCTION

Inductive Logic Programming (ILP) research investigates the construction of first-order, definite-clause

logic programs from a set of examples and given background information. As such, it stands at the intersection of the traditional fields of machine learning and logic programming. The explosion of recent research in this area has clustered around two basic induction methods, bottom-up and top-down.

Bottom-up methods, drawing heavily on logic programming theory, search for program clauses by considering generalizations created by inverting logical resolution or, more generally, implication. A successful representative of this class is Muggleton and Feng's Golem (Muggleton and Feng, 1992). Top-down methods, in contrast, learn clauses by searching from general to specific in a manner analogous to traditional machine-learning approaches for inducing decision trees. Perhaps the best-known example is Quinlan's Foil (Quinlan, 1990; Cameron-Jones and Quinlan, 1994), which uses an information heuristic to guide search through the space of possible program clauses.

While both Golem and Foil have been successful, each of these approaches has weaknesses. Golem is based on the construction of relative least-general generalizations, rlggs

(Plotkin, 1970), which forces the background knowledge to be expressed extensionally as a set of ground facts. This explicit model of background knowledge can be excessively large, and the clauses constructed from such models can grow explosively. A partial answer to the efficiency problem is the restriction of hypotheses to the so-called ij-determinate clauses. A related problem is sensitivity to the distribution of input examples. If only a random sample of positive examples is presented, the resulting model of the predicate to be learned is incomplete, and Golem may fail to create sufficiently general hypotheses, resulting in diminished performance.

Foil also uses extensional background knowledge, but this requirement is for efficiency reasons; top-down algorithms can easily use intensionally defined background predicates to evaluate various competing hypotheses. A more fundamental weakness is that Foil constructs clauses which are function-free. Any functions (e.g. list structures) must be handled by including explicit constructor predicates as part of the background knowledge. The proliferation of constructor predicates can

significantly degrade Foil's performance. In addition, Foil suffers its own version of the incomplete model problem when trying to learn recursive predicates. Recursive hypotheses are evaluated by using the positive examples as a model of the predicate being learned. When the examples are incomplete, they provide a "noisy oracle" and Foil has difficulty learning even simple recursive concepts (Cohen, 1993).

This paper describes a new ILP algorithm, Chillin (for Chill INduction algorithm; Chill is a language acquisition system based on learning control rules for logic programs (Zelle and Mooney, 1993)), which combines elements of top-down and bottom-up induction methods. This algorithm has been used as an element of a larger system which learns to parse natural language (Zelle and Mooney, 1993). Chillin's combination of techniques offers several advantages. Chillin learns with intensionally expressed background knowledge and can handle examples containing functions without explicit constructor predicates. Additionally, Chillin provides a simple, efficient solution to the noisy-oracle problem, resulting in better performance than Golem or Foil when learning from random examples. Finally, Chillin provides a simple framework for demand-driven invention of new predicates.

The following section presents the Chillin algorithm and discusses the ways in which it improves on strictly top-down or bottom-up methods. Section 3 presents experimental results across a range of domains comparing Chillin to Foil and Golem. Section 4 briefly discusses some related work. Section 5 outlines future work, and Section 6 presents our conclusions.

2 THE CHILLIN ALGORITHM

2.1 OVERVIEW

The input to Chillin is a set of

positive and negative examples of a concept expressed as ground facts, and a set of background predicates expressed as definite clauses. Chillin constructs a definite-clause concept definition which entails the positive examples, but not the negative.

Chillin is, at its core, a compaction algorithm which tries to construct a small, simple program that covers the positive examples. The algorithm starts with a most specific definition (the set of positive examples) and introduces generalizations which make the definition more compact. Compactness is measured by a Cigol-like size metric (Muggleton and Buntine, 1988), which is a simple measure of the syntactic size of the program. The search for more general definitions is carried out in a hill-climbing fashion. At each step, Chillin considers a number of possible generalizations; the one producing the greatest compaction is implemented, and the process repeats.

Generalizations are produced under the notion of empirical subsumption. Intuitively, the algorithm attempts to construct a clause that, when added to the current definition, renders other clauses superfluous. The superfluous clauses are

then eliminated to produce a more compact definition. Formally, we define empirical subsumption as follows: Given a set of clauses Γ = {C1, C2, ..., Cn} and a set of positive examples E provable from Γ, a clause G empirically subsumes Ci iff ∀e ∈ E : ((Γ − {Ci}) ∪ {G}) ⊢ e. Throughout the rest of the paper, unless otherwise noted, the term "subsumption" should be interpreted in this empirical sense.

DEF := {E :- true | E ∈ Pos}
Repeat
  PAIRS := a sampling of pairs of clauses from DEF
  GENS := {G = build_gen(Ci, Cj, DEF, Pos, Neg) for <Ci, Cj> ∈ PAIRS}
  G := clause in GENS yielding most compaction
  DEF := (DEF ∪ {G}) − (clauses subsumed by G)
Until no further compaction

Figure 1: Basic Induction Algorithm

Figure 1 shows the basic compaction loop of Chillin. Like Golem, Chillin constructs generalizations from a random sampling of pairs of clauses in the current definition. The number of pairs chosen is a user-settable parameter which defaults to 15. The best generalization from these pairs is used to reduce DEF. The reduction is performed by adding G at the top of the definition and then using the standard Prolog proof strategy to find the first proof of each positive example; any clause which is not used in one of these proofs is then deleted from the definition.

2.2 CONSTRUCTING GENERALIZATIONS

The build_gen algorithm is shown in Figure 2. There are three basic processes involved. First is the construction of a simple generalization of the input clauses. If this generalization covers no negative examples, it is returned. If the initial generalization is too general, an attempt is made to specialize it by adding antecedents. If the expanded clause is still too general, it is passed to a routine which invents a new predicate that further specializes the clause so that it covers no negative examples. These three processes are explained in detail in the

following subsections.

2.2.1 Constructing an Initial Generalization

The initial generalization of the input clauses is computed by finding the least-general-generalization (LGG) of the input clauses under theta-subsumption (Plotkin, 1970). The LGG of two clauses is the least general clause which subsumes (in the usual, non-empirical sense) both of them. The LGG is easily computed by "matching" compatible literals of the clauses; wherever the literals have differing structure, the LGG contains a variable. When identical pairings of differing structures occur, the same variable is used for the pair in all locations.

For example, when learning the definition of member, the initial definition may contain clauses such as:

member(1,[1,2,3]):-true.
member(2,[1,2,3]):-true.
member(3,[3]):-true.

The LGG of the first and third clauses is: member(A,[A|B]):-true. Here the new variable, A, has been used for both pairings of the constants 1 and 3.

function build_gen(Ci, Cj, DEF, Pos, Neg)
  GEN := LGG(Ci, Cj) under θ-subsumption
  CNEGS := negatives covered by GEN
  if CNEGS = {} return GEN
  GEN := add_antecedents(Pos, CNEGS, GEN)
  CNEGS := negatives covered by GEN
  if CNEGS = {} return GEN
  REDUCED := DEF − (clauses subsumed by GEN)
  CPOS := {e ∈ Pos | REDUCED ⊬ e}
  LITERAL := invent_predicate(CPOS, CNEGS, GEN)
  GEN := GEN ∪ {LITERAL}
  return GEN

Figure 2: Build_gen Algorithm

This LGG is a valid generalization (it covers no negative examples of member) and, in fact, represents the correct base case for the usual definition. Of course, such generalizations are not always correct. The LGG of the second and third clauses is: member(A,[B|C]):-true, which asserts that anything is a member of a list which contains at least one element. This generalization requires further refinement to prevent coverage of negative

examples.

Although the initial definitions in Chillin consist of unit clauses (the only antecedent being true), as the definition becomes more compact, the clauses from which LGGs are constructed may contain non-trivial conditions. LGG construction is still straightforward; we need only be sure that consistent variable usage is maintained between as well as within literals. For example, consider two clauses defining the concept of uncle:

uncle(X,Y):- sib(X,Z),parent(Z,Y),male(X).
uncle(X,Y):- married(X,Z),sib(Z,W),parent(W,Y),male(X).

The LGG of these clauses yields:

uncle(A,B):- sib(C,D),parent(D,B),male(A).

Here A replaces the pair <X,X>, B replaces <Y,Y>, etc. Unlike the RLGGs used by Golem, LGGs are independent of any background knowledge and efficiently computable from the input clauses. At this point, GEN is guaranteed to be at least as general as either input clause, but may also cover negative examples. The process also effectively introduces relevant variables which directly decompose functional structures; this allows subsequent specialization without resort to explicit flattening through constructor predicates.

2.2.2 Adding Antecedents

As its name

implies, add_antecedents attempts to specialize GEN by adding new literals as antecedents. The goal is to minimize coverage of negative examples while insuring that the clause still subsumes existing clauses. Add_antecedents employs a Foil-like mechanism which adds literals derivable either from background or previously invented predicates. Antecedents are added one at a time using a hill-climbing process; at each step a literal is added that maximizes a heuristic gain metric. The gain metric employed in Chillin is a modification of the Foil information-theoretic gain metric. The count of positive tuples (loosely, the number of covered positive examples) in the Foil metric is replaced with an estimate of the number of clauses in DEF which are subsumed by GEN. This estimate is obtained by partitioning the set of positive examples according to the first clause in DEF which covers a given example. The entire set of examples is proved once using DEF and subsequent testing of GEN extensions is performed over the partitioned examples.

As an example of this process, consider learning the two-clause definition of uncle illustrated above. Initially, DEF

contains unit clauses representing the positive examples. Constructing the LGG of two clauses, say uncle(john,jay):-true and uncle(bill,sue):-true, produces the unhelpful generalization uncle(A,B):-true, which covers all examples. Partitioning the positive examples according to the first covering-clause results in associating each example with the unit clause constructed from the example. Thus, the count of subsumed clauses for any given extension of GEN is just the count of positive examples covered by GEN; add_antecedents will perform analogously to Foil, adding literals to GEN one at a time to maintain maximal coverage of positive examples and eliminate coverage of negatives. Depending on the distribution of examples, add_antecedents might produce either of the uncle clauses. When the outer loop of Chillin adds this clause to the definition, all of the unit clauses for the examples covered by the clause are removed.

In the next cycle, the LGG of remaining unit clauses may again produce the GEN uncle(A,B):-true. This time, the partitioning in add_antecedents associates numerous examples with the first clause (the previously constructed generalization) and associates one example with each of the unit clauses remaining in DEF. In extending GEN at this point, the antecedents which had high gain previously will now look very poor; although they cover many positive examples, they only allow subsumption of one clause (namely the clause previously found). On the other hand, the antecedents of the second uncle disjunct will have very high gain as they allow the subsumption of the remaining unit clauses while eliminating negative examples. Add_antecedents will then learn the second clause, and the definition is

complete. Of course, Chillin will try one more round of compaction before discovering that the two disjuncts of uncle may not be combined to form an even more compact definition.

This discussion has assumed that add_antecedents has predicates available which will allow it to completely discriminate between the positive and negative examples; however, this is not always the case. In such situations, add_antecedents may add a few antecedents and then be unable to extend the clause further because no literal has positive gain. This partially completed

clause is then passed to invent_predicate for completion.

2.2.3 Inventing New Predicates

Predicate invention is carried out in a manner analogous to Champ (Kijsirikul et al., 1992). The first step is to find a minimal-arity projection of the clause variables such that the set of projected tuples (a tuple is a particular instantiation of variables appearing in the clause) from proofs of positive examples is disjoint with the projected tuples from proofs of negative examples. These ground tuple sets form the positive and negative example sets for the new predicate. The top-level induction algorithm is recursively invoked to create a definition of the predicate.

Champ searches for a minimal projection of variables by greedily removing variables that are not necessary to maintain the disjointness of the tuple sets. Chillin differs in that it starts with no variables and greedily adds those variables which help separate the examples and also minimize the set of positive examples for the new predicate. Variables are chosen to maximize the number of negative examples eliminated per additional positive tuple created. The search terminates

when all negative examples have been eliminated or there are no more variables to add.

Suppose that we are trying to learn uncle but do not have a definition of male. After add_antecedents, GEN might be:

uncle(A,B):- sibling(A,C),parent(C,B).

which still covers some negative examples. Using this clause to prove some positive and negative examples might result in the set of bindings shown here in tabular form:

Set    A      B     C
Pos    john   jay   mary
       bill   sue   bruce
Neg    liz    jay   mary
       liz    sue   mary

Notice that variables B and C have overlap between positive and negative examples, while A's values are disjoint. Invent_predicate will select the single variable A as the minimal projection and create positive examples p1(john) and p1(bill), along with the negative example p1(liz). Calling the top-level induction algorithm on these examples produces no compaction, so the learned definition of p1 will just be a listing of the positive examples. Finally, build_gen completes its clause by adding the final literal p1(A), which is the newly invented predicate representing male.

Once a predicate has been invented and found useful for compressing the definition, it is made available for use in further

generalizations. This enables the induction of clauses having multiple invented antecedents, something which is not possible in the strictly top-down framework of Champ.

2.2.4 Handling Recursion

When introducing clauses with recursive antecedents, care must be taken to avoid unfounded recursion. Foil deals with this issue by attempting to establish an ordering on the arguments which may appear in a literal. Chillin takes a much simpler approach based on structure reduction: each recursive literal must have an argument that is a proper subterm of the corresponding argument in the head of the clause. For example, in the clause member(A,[B|C]):-member(A,C), the second argument of the recursive literal is structure reducing, and any recursive chaining of this clause must eventually "bottom out." Well-founded recursion among multiple clauses is guaranteed by ensuring that every recursive literal has at least one argument that is structure reducing, and for all other recursive literals in the definition, the same argument is a (possibly improper) subterm of the corresponding argument in the head of the containing clause. This

property is maintained by dropping any unsound recursive literals produced by the LGG operation and only considering addition of recursive antecedents which meet the structure-reducing conditions. These restrictions on recursive definitions are strictly stronger than those imposed by Foil, but this simple approach works well on a large class of problems.

The evaluation of recursive clauses also requires some consideration. Testing the coverage of a non-recursive clause is easily achieved by unifying the head of a clause with an example and then attempting to prove the body of the clause using the background theory. Evaluation of a recursive clause, however, requires a definition of the concept being learned. As mentioned in the introduction, Foil and Golem rely on the extensional definition provided by the positive examples, giving rise to the noisy-oracle problem when these examples are incomplete. Chillin, on the other hand, is able to use the current definition of the predicate being learned, which is guaranteed to be at least as general as the extensional definition. Coverage of recursive clauses is tested by temporarily adding the clause

to the existing definition and evaluating the antecedents in the context of the background knowledge and the current (extended) definition. In this way, generalization of the original examples (say, the discovery of the recursive base-case) can significantly improve the coverage achieved by correct recursive clauses. This approach gives Chillin a significant advantage in learning recursive concepts from random examples.

This approach to recursion has proven effective in practice, although it is not without shortcomings. If a recursive clause is introduced and subsequent generalizations expand the coverage of the recursive call, the resulting definition could cover negative training examples; the current implementation does not check to insure that new generalizations maintain global consistency (although this would be easy to do). Such undesirable ordering effects do not often arise because recursive clauses do not generally show high gain until adequate base-cases have been constructed. Of course, if the data allow overly-general base-cases, then it is possible that the recursive clause may not be generated at all.
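The structure-reduction test described above is easy to state concretely. The sketch below is an illustrative reconstruction, not the authors' Quintus Prolog code: the term encoding (compound terms as tuples, variables as strings) and the function names are my own.

```python
# Illustrative sketch of the structure-reduction test for recursive literals.
# A compound term f(t1, ..., tn) is encoded as the tuple ("f", t1, ..., tn);
# variables are plain strings. This encoding is an assumption for exposition.

def is_subterm(sub, term):
    """True if sub occurs anywhere in term (improper occurrence allowed)."""
    if sub == term:
        return True
    if isinstance(term, tuple):  # compound term: search its arguments
        return any(is_subterm(sub, arg) for arg in term[1:])
    return False

def is_proper_subterm(sub, term):
    return sub != term and is_subterm(sub, term)

def structure_reducing(head_args, literal_args):
    """A recursive literal is admitted only if at least one of its arguments
    is a proper subterm of the corresponding argument in the clause head."""
    return any(is_proper_subterm(lit, hd)
               for hd, lit in zip(head_args, literal_args))

# member(A,[B|C]) :- member(A,C): encode the list [B|C] as cons(B, C).
head = ["A", ("cons", "B", "C")]
recursive_literal = ["A", "C"]
print(structure_reducing(head, recursive_literal))  # True: C reduces [B|C]
```

Under this test a clause such as member(A,C):-member(A,C) is rejected, since no argument of the recursive literal is a proper subterm of the corresponding head argument, so its recursive chaining could never "bottom out."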

2.3 EFFICIENCY CONSIDERATIONS

The actual implementation of Chillin is somewhat more complicated than the abstract description presented so far. As the above discussion indicates, the process of constructing a generalization involves three steps: form an LGG, add antecedents, and invent a new predicate. If this much effort was expended for a reasonable sampling of clause pairs on every iteration of the compaction loop, the algorithm would be intolerably slow. The current implementation provides two remedies for this problem.

First, the outer compaction loop is initially performed using only the LGG construction process to find generalizations. When no more compaction is found using simple LGGs, the more sophisticated refinement mechanisms are tried. Significant compaction is often obtainable in the initial phase, reducing the size of the theory on which the subsequent (more intensive) processing is done. Examples in the control-rule domains for which Chillin was designed are often highly structured, and this initial pass can reduce thousands of examples to a definition containing only tens of unit clauses.

A second conservation of effort is achieved by interleaving the building of generalizations. A given iteration of the compaction loop begins by gathering a sampling of clause pairs from which LGGs are immediately constructed. These generalizations form a pool of clauses which may need further refinement. Chillin proceeds by repeatedly removing the most promising clause and extending it with a single antecedent. The resulting clause is then returned to the pool and the process continues. If the selected clause is unextendable, it is set aside as a candidate for predicate invention. Predicate invention is invoked only if the pool of clauses has been exhausted without finding a valid generalization. Using this more efficient approximation of the abstract algorithm, the current system, implemented in Quintus Prolog, running on a SparcStation 2, can handle induction problems involving thousands of examples.

3 EXPERIMENTAL EVALUATION

Chillin has been primarily used within a larger control-rule learning framework. However, we have undertaken a series of experiments to compare Chillin's performance with Golem and Foil on some benchmark ILP tasks. We chose to compare against these

systems because they are well-known, and arguably the most mature and efficient ILP platforms developed to date.

3.1 EXPERIMENTAL DESIGN

There is, as yet, no standard approach to the evaluation of ILP systems. We are primarily interested in the ability of systems to perform "realistic" learning tasks. That is, given some (random) sampling of examples, how well do the learned hypotheses characterize the entire example space? Therefore, we have adopted an experimental strategy, common in propositional learning, of randomly splitting the example space into disjoint training and testing sets. The systems were trained on progressively larger portions of the training examples and the performance of the learned rules assessed on the independent testing set. This process of splitting, training and testing was repeated and the results averaged over 10 trials to produce learning curves for each of the systems on several benchmark problems. It is important to note that ILP systems are often tested using a set of complete or carefully chosen positive examples. We would not necessarily expect the systems to perform as well under the more

realistic conditions of random selection used here.

The number of training examples in successive training sets was chosen experimentally to highlight the interesting parts of the learning curves. Except where indicated, enough training examples were provided so that the system having the best accuracy achieved a perfect score on the majority of the runs. The distribution of positive examples in many relational domains is quite sparse, and a relatively large number of positive examples are required for each of these ILP systems. In order to insure a reasonable number of positive training examples, training sets were always selected to be one-fifth positive and four-fifths negative examples. Testing sets included an equal number of positive and negative examples to test the ability of the resulting rules to recognize instances of the concept and reject non-instances.

Our experiments were performed using version 5.0 of Foil and version 1.0 alpha of Golem, both of which are written in C. All of the algorithms were run with default settings of the various parameters. No extra mode, type, or bias information was provided besides the examples and background predicates. While all of the algorithms can make use of additional constraints, they do not necessarily do so in consistent ways; therefore, providing no extra information to any algorithm allows for a more direct comparison.

3.2 ACCURACY RESULTS

3.2.1 Learning Recursive Programs

The first three learning problems tested the ability to learn simple recursive concepts. We chose three problems widely used in the ILP literature: the list predicates member and append, and the arithmetic predicate multiply. For the list predicates,

the data consisted of all lists of length 0-3 defined over three constants. The background information consisted of definitions of list construction predicates: null, which holds for an empty list, and components, which decomposes a list into its head and tail. The results for these two problems were approximately the same. The learning curves for member are presented in the left-hand graph of Figure 3. As expected, with random examples, Chillin was able to learn accurate definitions with fewer examples than the other systems, and without using the background predicates.

The domain for the multiply problem consisted of integers in the range from zero to ten. The definition was to be learned in terms of background predicates: plus, decrement, zero, and one. We expected Foil and Golem to do well on this problem as it is a standard benchmark which both systems have been shown capable of learning. Chillin, in its current form, is not able to formulate the correct recursive definition for this predicate, since the required recursive clause does not meet the structure-reducing conditions. The learning curves, shown on the right-hand side of Figure 3, turned

out to be quite surprising. None of the systems showed the ability to learn this concept accurately from random examples. Chillin quickly converged to definitions that were 90 percent correct for the limited domain, and was unable to improve. Its inaccurate definitions, however, were much better than those found by either of the other systems. Further experimentation showed that Foil kept improving as the training set grew, but it was only reliable in generating correct definitions with nearly complete training sets. Golem was unable to learn the correct definition without additional guidance.

3.2.2 Learning with Nondeterminate Literals

Another traditional testbed for relational learners is the domain of family relationships. We performed experiments with an extended family tree in which the target predicate was either grandfather or uncle, and the background consisted of facts concerning the relations: parent, sibling, married, male, and female. This domain is interesting because it requires the use of literals which violate determinacy conditions used by Golem and other bottom-up ILP systems. As expected, Chillin and Foil do quite well on these problems,

and Golem is unable to learn any reasonable definitions. On the uncle problem, both Foil and Chillin learned accurate definitions from 100 training examples, with Foil having a slight edge over the 10-trial average. Rather surprisingly, however, Foil seemed to have more trouble on the simpler grandfather definition. As can be seen in the learning curves in the left-hand graph of Figure 4, Foil's performance takes a mysterious dip at 75 training examples before catching up with Chillin at 125 examples. Even at 175 examples, where Chillin succeeds in finding a correct definition in all 10 trials, Foil is only learning the correct definition half of the time. These experiments indicate that Chillin, like Foil, is able to learn definitions containing nondeterminate literals.

3.2.3 Control-rule Learning

The previous experiments concerned learning well-defined concepts containing only one or two clauses. Chillin was originally designed for learning control rules from structured examples where the definition of the correct concept is not necessarily simple, and certainly is not known a priori. For the last experiment, we wanted to compare the performance of these systems

on this type of problem. We chose a relatively simple task of determining when a shift-reduce parser should perform a shift operation in parsing a simple, regular corpus of active sentences. Chillin typically learns a five or six clause definition for this concept.

The data for this problem was modified slightly so that the only logical functions appearing in the examples are list constructions. Golem and Chillin can both handle these structures without explicit constructor predicates. Unfortunately, it is not possible to run Foil on this data. Foil requires extensionally expressed constructor predicates; the components relation over lists of the required size (up to 8) constructible from the set of 34 constants appearing in these examples would require trillions of background facts. This illustrates the difficulties posed by the extensional background requirement.

The right-hand graph of Figure 4 shows the learning curves. On this problem, Chillin tends to invent new predicates. For direct comparison, we performed the experiments with two versions of Chillin; the curve labeled "Chillin--" is Chillin with predicate invention turned off. The

learning curves show that Chillin rapidly converges to very good definitions.

(Footnote: This data is derived from a framework for parsing sentences from (McClelland and Kawamoto, 1986), as described in (Zelle and Mooney, 1993).)
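The "trillions of background facts" claim above can be checked with a quick back-of-the-envelope computation. This is only a sketch: it assumes, per the text, that an extensional components relation needs one ground fact per non-empty list of length at most 8 built from the 34 constants.

```python
# One ground fact of the components relation per non-empty list of
# length <= 8 over the 34 constants appearing in the examples.
N_CONSTANTS, MAX_LEN = 34, 8
facts = sum(N_CONSTANTS ** k for k in range(1, MAX_LEN + 1))
print(facts)  # 1839908871710 -- roughly 1.8 trillion facts
```

The count is dominated by the 34^8 lists of maximum length, so the total indeed runs to trillions, far beyond what Foil's extensional background representation can accommodate.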
Figure 3: Accuracy on Recursive Definitions (left-hand graph: member; right-hand graph: multiply)

Figure 4: Concept Accuracy for grandfather and control rule (left-hand graph: grandfather; right-hand graph: shift)
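To make the control-rule setting behind Figure 4's right-hand graph concrete, the following sketch shows where a learned shift/reduce decision sits in a parser's main loop. The `should_shift` predicate here is a hand-written, hypothetical stand-in, not the five- or six-clause definition Chillin actually induces:

```python
def should_shift(stack, buffer):
    # Hypothetical stand-in for the induced control rule: shift while
    # input remains and fewer than two items sit on the stack.
    return bool(buffer) and len(stack) < 2

def parse(tokens):
    """Skeletal shift-reduce loop; reduce simply pairs the top two items."""
    stack, buffer = [], list(tokens)
    while buffer or len(stack) > 1:
        if should_shift(stack, buffer):
            stack.append(buffer.pop(0))      # shift next input token
        else:
            right, left = stack.pop(), stack.pop()
            stack.append((left, right))      # reduce top two items
    return stack[0]

print(parse(["the", "dog", "ran"]))  # (('the', 'dog'), 'ran')
```

In the learning task, positive and negative examples of parser states (stack/buffer configurations) where a shift is or is not appropriate play the role of this predicate's training data.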
Figure 5: Timing Results (left-hand graph: member; right-hand graph: grandfather)

Disabling predicate invention had only a minor impact (1%) on accuracy with smaller training sets, and no difference was detectable for larger sets. Golem, on the other hand, never achieves greater than 80% accuracy and displays erratic learning behavior in this domain.

3.3 TIMING RESULTS

Given the differences in implementation, we expected Foil and Golem to be considerably faster than Chillin. However, this was not the case. On all problems where Golem was learning useful rules, it was significantly slower than Chillin, often by a factor of 10 or more. While Foil tended to be a bit faster than Chillin, the learning times for the two systems were generally comparable. The timing curves for the member experiments shown in the left-hand graph of Figure 5 are typical. This graph shows

the time in seconds required to learn a set of rules as a function of training-set size. In the experiments where Foil had more difficulty learning accurate rules, such as multiply and grandfather, Chillin actually ran faster than Foil at some data points. The right-hand graph of Figure 5 shows timing results for grandfather. Note that the run-time for Golem is lower here because it is not learning a definition, but rather just memorizing the examples.

RELATED RESEARCH

Like Chillin, Series (Wirth and O'Rorke, 1991) and, later, Indico (Stahl et al., 1993) make use of LGGs of examples to construct clause heads containing functions. However, both of these systems precompute a set of clause heads for which bodies are subsequently induced. The approach taken by Chillin interleaves the bottom-up and top-down mechanisms, handling a larger class of concepts.

A number of recent investigations have considered the noisy-oracle problem in the induction of recursive definitions (Cohen, 1993; Lapointe and Matwin, 1992; Muggleton, to appear). However, the proposed mechanisms either severely limit the class of learnable programs (e.g., to single clause,

linearly recursive) or rely on computationally expensive matching of subterms, or both. None has yet been implemented and tested in a system for large-scale induction over hundreds or thousands of examples.

Predicate invention is also an area of considerable interest. Like Chillin and Champ, Series and Indico employ demand-driven predicate invention. These systems differ significantly in the heuristics used to select arguments for the new predicate. Another approach to invention is the use of the intra-construction operator of inverse resolution (Muggleton and Buntine, 1988; Wirth, 1988; Rouveirol, 1992; Banerji, 1992). In this approach, new predicates are invented through a restructuring of an existing definition, usually to make it more compact. Unfortunately, we are not aware of any work that has systematically evaluated the competing approaches or the practical utility of predicate invention.

FUTURE WORK

The top-down component of Chillin could clearly be improved by adding more of Foil's current features, such as exploiting mode and type information, dealing with noise, and checking termination of recursive clauses (Cameron-Jones and

Quinlan, 1994). Enhancements are also needed for multi-predicate learning (De Raedt et al., 1993), particularly for inventing predicates useful in the definition of multiple concepts. Finally, further experimental evaluation on a wider range
of more realistic problems is needed. In particular, ablation studies on the specific utility of predicate invention are indicated.

CONCLUSION

The Chillin ILP algorithm attempts to integrate the best aspects of existing ILP methods into a coherent, novel framework that includes both top-down and bottom-up search, predicate invention, and a solution to the noisy-oracle problem. Our current experimental results indicate that it is a robust and efficient system which can learn a range of logic programs (including recursive and nondeterminate ones) from random examples more effectively than current methods such as Golem and Foil. It has also recently been used to learn natural language parsers from real text corpora, requiring induction over thousands of complex, structured examples (Zelle and Mooney, 1994). Consequently, we believe it provides an important foundation for continued progress on robust and efficient induction of complex relational and recursive concepts.

Acknowledgments

Thanks to Ross Quinlan and Mike Cameron-Jones for Foil, and Stephen Muggleton and Cao Feng for Golem. This research was supported by the National Science Foundation under grant IRI-9102926 and the Texas Advanced Research Program under grant 003658114.

References

Banerji, R. B. (1992). Learning theoretical terms. In Muggleton, S., editor, Inductive Logic Programming, 93-110. New York, NY: Academic Press.

Cameron-Jones, R. M., and Quinlan, J. R. (1994). Efficient

top-down induction of logic programs. SIGART Bulletin, 5(1):33-42.

Cohen, W. W. (1993). Pac-learning a restricted class of recursive logic programs. In Proceedings of the National Conference on Artificial Intelligence, 86-92. Washington, D.C.

De Raedt, L., Lavrac, N., and Dzeroski, S. (1993). Multiple predicate learning. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, 1037-1042. Chambery, France.

Kijsirikul, B., Numao, M., and Shimura, M. (1992). Discrimination-based constructive induction of logic programs. In Proceedings of the Tenth National Conference on Artificial Intelligence, 44-49. San Jose, CA.

Lapointe, S., and Matwin, S. (1992). Sub-unification: A tool for efficient induction of recursive programs. In Proceedings of the Ninth International Workshop on Machine Learning, 273-281. Aberdeen, Scotland.

McClelland, J. L., and Kawamoto, A. H. (1986). Mechanisms of sentence processing: Assigning roles to constituents of sentences. In Rumelhart, D. E., and McClelland, J. L., editors, Parallel Distributed Processing, Vol. II, 318-362. Cambridge, MA: MIT Press.

Muggleton, S. (to appear). Inverting

implication. Artificial Intelligence.

Muggleton, S., and Buntine, W. (1988). Machine invention of first-order predicates by inverting resolution. In Proceedings of the Fifth International Conference on Machine Learning, 339-352. Ann Arbor, MI.

Muggleton, S., and Feng, C. (1992). Efficient induction of logic programs. In Muggleton, S., editor, Inductive Logic Programming, 281-297. New York: Academic Press.

Plotkin, G. D. (1970). A note on inductive generalization. In Meltzer, B., and Michie, D., editors, Machine Intelligence (Vol. 5). New York: Elsevier North-Holland.

Quinlan, J. (1990). Learning logical definitions from relations. Machine Learning, 5(3):239-266.

Rouveirol, C. (1992). Extensions of inversion of resolution applied to theory completion. In Muggleton, S., editor, Inductive Logic Programming, 63-86. New York, NY: Academic Press.

Stahl, I., Tausend, B., and Wirth, R. (1993). Two methods for improving inductive logic programming systems. In Machine Learning: ECML-93, 41-55. Vienna.

Wirth, R. (1988). Learning by failure to prove. In Proceedings of EWSL 88, 237-251. Pitman.

Wirth, R., and O'Rorke, P. (1991). Constraints on predicate invention.

In Proceedings of the Eighth International Workshop on Machine Learning, 457-461. Evanston, Ill.

Zelle, J. M., and Mooney, R. J. (1993). Learning semantic grammars with constructive inductive logic programming. In Proceedings of the Eleventh National Conference on Artificial Intelligence, 817-822. Washington, D.C.

Zelle, J. M., and Mooney, R. J. (1994). Inducing deterministic Prolog parsers from treebanks: A machine learning approach. In Proceedings of the National Conference on Artificial Intelligence. Seattle, WA.