Coarse-to-Fine Inference and Learning for First-Order Probabilistic Models
Chloé Kiddon

Abstract: Coarse-to-fine approaches use sequences of increasingly fine approximations to control the complexity of inference and learning. These techniques are often used in NLP and vision applications. However, no coarse-to-fine infer…


Weighted First-Order Logic Rules                          Evidence
1.4  TAs(x, c) ∧ Teaches(y, c) ⇒ Advises(y, x)            TAs(Anna, AI101)
4.3  Publication(t, x) ∧ Advises(y, x) ⇒ Publication(t, y)

Table 1: Example of a Markov logic network and evidence. Free variables are implicitly universally quantified.

[…] any probabilistic inference algorithm. Our framework uses and generalizes hierarchical models, which are widespread in machine learning and statistics (e.g., Gelman and Hill (2006), Pfeffer et al. (1999)). Our approach also incorporates many of the advantages of lazy inference (Poon, Domingos, and Sumner 2008).

Our algorithms are formulated in terms of Markov logic (Domingos and Lowd 2009). The generality and simplicity of Markov logic make it an attractive foundation for a coarse-to-fine inference and learning framework. In particular, our approach directly applies to all representations that are special cases of Markov logic, including standard graphical models, probabilistic context-free grammars, relational models, etc. However, our framework could also be formulated using other relational probabilistic languages.

We begin with the necessary background, present the framework, and then provide bounds on the approximation error. We then report our experiments applying CFPI with lifted belief propagation on two real-world domains (a social network one and a molecular biology one). Our results show that our approach can be more effective than lifted belief propagation without CFPI.

Background

Graphical models compactly represent the joint distribution of a set of variables X = (X1, X2, ..., Xn) ∈ 𝒳 as a product of factors (Pearl 1988): P(X = x) = (1/Z) Π_k f_k(x_k), where each factor f_k is a non-negative function of a subset of the variables x_k, and Z is a normalization constant. If P(X = x) > 0 for all x, the distribution can be equivalently represented as a log-linear model: P(X = x) = (1/Z) exp(Σ_i w_i g_i(x)), where the features g_i(x) are arbitrary functions of (a subset of) the state. The factor graph representation of a graphical model is a bipartite graph with a node for each variable and factor in the model (Kschischang, Frey, and Loeliger 2001). (For convenience, we consider one factor f_i(x) = exp(w_i g_i(x)) per feature g_i(x), i.e., we do not aggregate features over the same variables into a single factor.) Undirected edges connect variables with the appropriate factors. The main inference task in graphical models is to compute the conditional probability of some variables (the query) given the values of others (the evidence), by summing out the remaining variables. Inference methods for graphical models include belief propagation and MCMC.

A first-order knowledge base (KB) is a set of sentences or formulas in first-order logic. Constants represent objects in the domain of interest (e.g., people: Amy, Bob, etc.). Variables range over the set of constants. A predicate is a symbol that represents a relation among objects (e.g., Advises) or an attribute of an object (e.g., Student), together with its arity (the number of arguments it takes). An atom is a predicate applied to a tuple of variables or objects (e.g., Advises(Amy, y)) of the proper arity. A clause is a disjunction of atoms, each of which can either be negated or not. A ground atom is an atom with only constants as arguments. A ground clause is a disjunction of ground atoms or their negations.

First-order probabilistic languages combine graphical models with elements of first-order logic by defining template features that apply to whole classes of objects at once. A simple and powerful such language is Markov logic (Domingos and Lowd 2009). A Markov logic network (MLN) is a set of weighted first-order clauses. Given a set of constants, an MLN defines a Markov network with one node per ground atom and one feature per ground clause. The weight of a feature is the weight of the first-order clause that originated it. The probability of a state x is given by P(x) = (1/Z) exp(Σ_i w_i g_i(x)), where w_i is the weight of the ith clause, g_i = 1 if the ith clause is true, and g_i = 0 otherwise. Table 1 shows an example of a simple MLN representing an academia model. An example of a ground atom, given as evidence, is shown. States of the world where more advisees TA for their advisors, and advisees and their advisors coauthor publications, are more probable. Inference in Markov logic can be carried out by creating and running inference over the ground network, but this can be extremely inefficient because the size of the ground network is O(d^c), where d is the number of objects in the domain and c is the highest clause arity. Lifted inference establishes a more compact version of the ground network in order to make inference more efficient. In lifted belief propagation (LBP), subsets of components in the ground network are identified that will send and receive identical messages during belief propagation (Singla and Domingos 2008).

Representation

The standard definition of an MLN assumes an undifferentiated set of constants. We begin by extending it to allow for a hierarchy of constant types.

Definition 1. A type is a set of constants t = {k1, ..., kn}. A type t is a subtype of another type t′ iff t ⊆ t′. A type t is a supertype of another type t′ iff t′ ⊆ t. A refinement of a type t is a set of types {t1, ..., tm} such that ∀i, j: ti ∩ tj = ∅ and t = t1 ∪ t2 ∪ ... ∪ tm.

Definition 2. A typed predicate is a tuple a = (a0, t1, ..., tn), where a0 is a predicate, n is a0's arity, and ti is the type of a0's ith argument. A typed atom is a typed predicate applied to a tuple of variables or objects of the proper arity and types. A typed clause is a tuple c = (c0, t1, ..., tn), where c0 is a first-order clause, n is the number of unique variables in c0, and ti is the type of the ith variable in c0. The […]
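To make Definition 1 concrete, here is a minimal sketch of the type machinery in Python. The representation (types as frozensets of constants) and the function names are illustrative choices of ours, not the paper's implementation:

```python
# Minimal sketch of Definition 1: a type is a set of constants, subtyping is
# set inclusion, and a refinement partitions a type into disjoint subtypes.
# Names and representation here are illustrative, not from the paper.

def is_subtype(t, t_sup):
    """t is a subtype of t' iff t is a subset of t'."""
    return t <= t_sup

def is_refinement(t, parts):
    """{t1, ..., tm} refines t iff the ti are pairwise disjoint
    and t = t1 ∪ t2 ∪ ... ∪ tm."""
    parts = list(parts)
    pairwise_disjoint = all(
        parts[i].isdisjoint(parts[j])
        for i in range(len(parts)) for j in range(i + 1, len(parts)))
    return pairwise_disjoint and frozenset().union(*parts) == t

person = frozenset({"Amy", "Bob", "Carl", "Dana"})
student = frozenset({"Amy", "Carl"})
professor = frozenset({"Bob", "Dana"})

assert is_subtype(student, person)
assert is_refinement(person, [student, professor])
assert not is_refinement(person, [student, frozenset({"Bob"})])  # misses Dana
```

A typed clause c = (c0, t1, ..., tn) from Definition 2 then simply pairs a first-order clause with one such type per variable.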
(Proofs of theorems are provided in the appendix.) When atoms are pruned, the set of possible worlds shrinks and the probabilities of the remaining possible worlds must be renormalized. Intuitively, errors stem from pruning possible worlds that have non-zero probability (or pruning worlds where P(x) ≠ 1 for the high-probability case). We can bound the probability mass of pruned worlds based on the weight approximations and the number of previously pruned atoms. In turn, we can use those bounds to bound errors in atom marginals.

Infer() can be any lifted probabilistic inference algorithm (or even propositionalization followed by ground inference, although this is unlikely to scale even in the context of CFPI). If the inference algorithm is exact (e.g., FOVE (de Salvo Braz, Amir, and Roth 2007)), the error is 0 in the above bound. However, realistic domains generally require approximate inference. In this paper, we use lifted belief propagation (Singla and Domingos 2008). We call CFPI applied to lifted belief propagation CFLBP (Coarse-to-Fine Lifted Belief Propagation).

We now provide an error bound for CFLBP. While Theorem 1 provides an intuitive error bound that is independent of the inference method used with CFPI, Theorem 2 provides a tighter bound when the error is calculated concurrently with inference. We base our algorithm on Theorem 15 of Ihler et al. (2005), which bounds errors on atom marginals due to multiplicative errors in the messages passed during BP. Since lifted BP computes the same marginals as ground BP, for the purposes of a proof, the former can be treated as the latter. We can view the errors in the messages passed during BP in level k of CFLBP as multiplicative errors on the messages from factors to nodes at each step of BP, due to weight approximations at that level and the loss of pruned atoms.

Theorem 2. For the network at level k of CFPI, let p_x^k be the probability estimated by BP at convergence, p̂_x^k be the probability estimated by CFLBP after n iterations of BP, Θ− and Θ+ be the sets of low- and high-probability atoms pruned in CFLBP's previous k − 1 runs of BP, δ_f^k be the difference in weight of factor f between level k and the final level K, and γ be the pruning threshold. For a binary node x, p_x^k can be bounded as follows:

For x ∈ Θ−:  0 ≤ p_x^k ≤ γ.
For x ∈ Θ+:  1 − γ ≤ p_x^k ≤ 1.
And for x ∉ Θ− ∪ Θ+:

  p_x^k ≥ 1 / ((ζ_x^{k,n})^2 [(1/p̂_x^k) − 1] + 1) = lb(p_x^k)  and
  p_x^k ≤ 1 / ((1/ζ_x^{k,n})^2 [(1/p̂_x^k) − 1] + 1) = ub(p_x^k),

where

  log ζ_x^{k,n} = Σ_{f ∈ nb(x)} log ε_{f,x}^{k,n};
  ε_{x,f}^{k,1} = d(f)^2;
  log ε_{x,f}^{k,i+1} = Σ_{h ∈ nb(x)\{f}} log ε_{h,x}^{k,i};
  log ε_{f,x}^{k,i+1} = log[ (d(f)^2 ε̃_{f,x}^{k,i} + 1) / (d(f)^2 + ε̃_{f,x}^{k,i}) ] + log d(e_{f,x}^k);
  log ε̃_{f,x}^{k,i} = Σ_{y ∈ nb(f)\{x}} log ε_{y,f}^{k,i};
  d(e_{f,x}^k) = γ^{−|Θ−_f|/2} (1 − γ)^{−|Θ+_f|/2} e^{δ_f^k / 2};

and nodes are only pruned at level k′ when either ub(p_x^{k′}) ≤ γ or lb(p_x^{k′}) ≥ 1 − γ.

Although Theorem 2 does not have a particularly intuitive form, it yields much tighter bounds than Theorem 1 if we perform the bound computations as we run BP. If no atoms are pruned at previous levels, the fixed-point beliefs returned from CFLBP on its kth level of BP after n iterations will be equivalent to those returned by BP after n iterations on the network at that level.

Coarse-to-Fine Learning

The critical assumption invoked by the inference framework is that objects of the same type tend to act in similar manners. In terms of a typed MLN, stronger weights on clauses over coarser types allow pruning decisions to be made earlier, which speeds up later iterations of inference. To achieve models of this type, we learn weights in a coarse-to-fine manner through a series of successive refinements of clauses. The weights for clauses at each iteration of learning are learned with all weights learned from preceding iterations held fixed. The effect is that a weight learned for a typed clause is the additional weight given to a clause grounding based on having that new type information. As the weights are learned for clauses over finer and finer type signatures, these weights should become successively smaller as the extra type information is less important. A benefit of this coarse-to-fine approach to learning is that as soon as refining a typed clause does not give any new information (e.g., all direct refinements of a clause are learned to have 0 weight), the typed clause need not be refined further. The result is a sparser model that will correspond to fewer possible refinements during the inference process and therefore more efficient inference.

Proposition 1. For a typed MLN M_T learned in the coarse-to-fine framework, there is an equivalent type-flattened MLN M′_T such that no clause c ∈ M′_T can be obtained through a series of direct clause refinements of any other clause c′ ∈ M′_T.

When Refine(M, M_T) replaces a clause c in M by its direct clause refinements, the weight of each new typed clause c′_i added to M is w + w′_i, where w is the weight of c in M and w′_i is the weight of c′_i in M_T. When there are no more refinements, the resulting typed MLN will be a subset of the type-flattened MLN M′_T, accounting for pruned clauses. Coarse-to-fine learning is not essential, but it greatly improves the efficiency of coarse-to-fine inference. By design it yields a model that is equivalent to the type-flattened one, and so incurs no loss in accuracy. We note that using regularization while learning causes the typed MLN to only be approximately equivalent to the type-flattened MLN but can […]

[…] type signatures. The full database contains 4,000 predicate groundings, including type predicates. To evaluate inference over different numbers of objects in the domain, we randomly selected graph cuts of various sizes from the domain. Figure 2(a) shows a comparison of the runtimes of CFLBP and LBP for different sized cuts of the UW-CSE dataset. We ran CFLBP with pruning thresholds of γ = 0.01 and γ = 0.001. The time is the sum of both initialization of the network and the inference itself; the times for CFLBP also include the refinement times after each level. For each cut of the UW-CSE dataset, the average conditional log-likelihood (CLL) of the results returned by CFLBP with either pruning threshold was virtually the same as the average conditional log-likelihood returned by LBP. Table 2 summarizes the results of the UW-CSE link prediction experiment over the full UW-CSE dataset. The full dataset contained 815 objects, including 265 people, and 3,833 evidence predicates. With γ = 0.01, we achieve an order-of-magnitude speedup.

Biomolecular Event Prediction

As new biomedical literature accumulates at a rapid pace, the importance of text mining systems tailored to the domain of molecular biology is increasing. One important task is the identification and extraction of biomolecular events from text. Event prediction is a challenging task (Kim et al. 2003) and is not the focus of this paper. Our simplified task is to predict which entities are the causes and themes of identified events contained in the text, represented by two predicates: Cause(event, entity) and Theme(event, entity). We used the GENIA event corpus, which marks linguistic expressions that identify biomedical events in scientific literature spanning 1,000 Medline abstracts; there are 36,114 events labeled, and the corpus contains a full type hierarchy of 32 entity types and 28 event types (Kim, Ohta, and Tsujii 2008). Our features include semantic co-occurrence and direct semantic dependencies with a set of key stems (e.g., Subj(entity, stem, event)). We also learned global features that represent the roles that certain entities tend to fill. We used the Stanford parser[2] for dependency parsing and a Porter stemmer[3] to identify key stems. We restricted our focus to events with one cause and one theme, or no cause and two themes, where we could extract interesting semantic information at our simple level. The model was learned over half the GENIA event corpus and tested on the other half; abstract samples of varying sizes were randomly generated. From 13 untyped clauses, the type-flattened MLN had 38,020 clauses.

Figure 2(b) shows a comparison of the runtimes of CFLBP with γ = 0.01 and LBP. For each test set where both CFLBP and LBP finished, the average conditional log-likelihoods were almost identical. The largest difference in average conditional log-likelihood was 0.019, with a dataset of 175 objects; in all other tests, the difference between the averages was never more than 0.001. Table 2 summarizes the results of the largest GENIA event prediction experiment where both LBP and CFLBP finished without running out of memory. This test set included 125 events and 164 entities.

[2] http://nlp.stanford.edu/software/lex-parser.shtml
[3] http://tartarus.org/martin/PorterStemmer

Conclusion and Future Work

We presented a general framework for coarse-to-fine inference and learning. We provided bounds on the approximation error incurred by this framework. We also proposed a simple weight learning method that maximizes the gains obtainable by this type of inference. Experiments on two domains show the benefits of our approach. Directions for future work include: inducing the type hierarchy from data for use in CFPI; broadening the kinds of type structure allowed by CFPI (e.g., multiple inheritance); and applying CFPI to other lifted probabilistic inference algorithms besides LBP.

Acknowledgements

We thank Aniruddh Nath for his help with Theorem 2. This research was partly funded by ARO grant W911NF-08-1-0242, AFRL contract FA8750-09-C-0181, NSF grant IIS-0803481, ONR grant N00014-08-1-0670, and the National Science Foundation Graduate Research Fellowship under Grant No. DGE-0718124. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ARO, DARPA, AFRL, NSF, ONR, or the United States Government.

Appendix: Proofs of Theorems

Proof of Theorem 1. The probability of an atom is the probability of all the worlds (i.e., atom assignments x) in which that atom is true:

  P(x_i) = Σ_{x ∈ 𝒳} x_i P(x),

where x_i is 1 if x_i is true in x and 0 otherwise. Assume that x_i is pruned at level k if its approximate marginal probability, P̂^k(x_i), falls below γ after running inference at level k. (We will consider pruning high-probability atoms later.) If x_i is pruned, then the probability of all the worlds where x_i is true is set to 0; these worlds are essentially pruned from the set of possible worlds. Let W be a set of worlds. Let P′(x_i) be the marginal probability of x_i given that the worlds in W have been pruned (i.e., the probability of each world in W is set to 0). Then,

  P′(x_i) = [Σ_{x ∈ 𝒳\W} x_i e^{Σ_j w_j f_j(x)}] / [Σ_{x ∈ 𝒳\W} e^{Σ_j w_j f_j(x)}].

At each level k, the weight w_j is approximated by some ŵ_j with at most difference δ_w^k: |w_j − ŵ_j| ≤ δ_w^k. Assume now that W is the set of worlds that have been pruned in levels 1 through k − 1. If x_i is pruned at level […]
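The renormalized marginal P′(x_i) in the proof above can be checked by brute force on a tiny model. The two-atom model below is purely hypothetical (ours, not the paper's): pruning a low-probability atom zeroes the worlds where it is true and renormalizes the rest, and the key quantity in the proof, the probability mass of the pruned worlds, stays small, so the surviving marginals barely move.

```python
import itertools
import math

# Brute-force sketch of the renormalized marginal P'(x_i): worlds in W
# (here: every world in which a pruned atom is true) get probability 0,
# and the remaining worlds are renormalized.

def marginals(n_atoms, features, pruned=()):
    """features: list of (w_j, f_j) with f_j(world) -> 0/1;
    pruned: indices of atoms whose true-worlds are removed."""
    weight = {}
    for world in itertools.product([0, 1], repeat=n_atoms):
        if any(world[i] for i in pruned):
            continue  # world is in W: pruned from the set of possible worlds
        weight[world] = math.exp(sum(w * f(world) for w, f in features))
    z = sum(weight.values())  # renormalization over X \ W
    return [sum(wt for world, wt in weight.items() if world[i]) / z
            for i in range(n_atoms)]

# Atom 0 is low-probability (strong negative weight), so pruning it sets its
# marginal to 0 while only slightly perturbing atom 1's marginal.
features = [(-4.0, lambda x: x[0]),
            (1.5, lambda x: x[1]),
            (1.0, lambda x: x[0] * x[1])]
exact = marginals(2, features)
approx = marginals(2, features, pruned=(0,))
assert approx[0] == 0.0
assert exact[0] < 0.05                   # mass of the pruned worlds is small
assert abs(exact[1] - approx[1]) < 0.01  # remaining marginals barely move
```

Without the interaction feature the two atoms would be independent and pruning would not perturb atom 1 at all; the interaction makes the (small) renormalization error visible.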
References

de Salvo Braz, R.; Natarajan, S.; Bui, H.; Shavlik, J.; and Russell, S. 2009. Anytime lifted belief propagation. In Proceedings of the Sixth International Workshop on Statistical Relational Learning.
Domingos, P., and Lowd, D. 2009. Markov Logic: An Interface Layer for Artificial Intelligence. Morgan Kaufmann.
Felzenszwalb, P. F., and Huttenlocher, D. P. 2006. Efficient belief propagation for early vision. International Journal of Computer Vision 70(1):41–54.
Felzenszwalb, P.; Girshick, R.; and McAllester, D. 2010. Cascade object detection with deformable part models. In IEEE Conference on Computer Vision and Pattern Recognition, 2241–2248.
Gelman, A., and Hill, J. 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
Getoor, L., and Taskar, B., eds. 2007. Introduction to Statistical Relational Learning. MIT Press.
Ihler, A. T.; Fisher III, J. W.; and Willsky, A. S. 2005. Loopy belief propagation: Convergence and effects of message errors. Journal of Machine Learning Research 6:905–936.
Kersting, K.; Ahmadi, B.; and Natarajan, S. 2009. Counting belief propagation. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, 277–284.
Kim, J.-D.; Ohta, T.; Tateisi, Y.; and Tsujii, J. 2003. GENIA corpus – a semantically annotated corpus for bio-textmining. Bioinformatics 19(1):i180–i182.
Kim, J.-D.; Ohta, T.; and Tsujii, J. 2008. Corpus annotation for mining biomedical events from literature. BMC Bioinformatics 9(1):10.
Kisynski, J., and Poole, D. 2009. Lifted aggregation in directed first-order probabilistic models. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, 1922–1929.
Kok, S.; Sumner, M.; Richardson, M.; Singla, P.; Poon, H.; Lowd, D.; and Domingos, P. 2007. The Alchemy system for statistical relational AI. Technical report, Department of Computer Science and Engineering, University of Washington, Seattle, WA. http://alchemy.cs.washington.edu.
Kschischang, F. R.; Frey, B. J.; and Loeliger, H.-A. 2001. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory 47(2):498–519.
Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
Petrov, S., and Klein, D. 2007. Learning and inference for hierarchically split PCFGs. In Proceedings of the Twenty-Second National Conference on Artificial Intelligence, 1663–1666.
Petrov, S.; Sapp, B.; Taskar, B.; and Weiss, D. (organizers). 2010. NIPS 2010 Workshop on Coarse-to-Fine Learning and Inference. Whistler, B.C.
Pfeffer, A.; Koller, D.; Milch, B.; and Takusagawa, K. T. 1999. SPOOK: A system for probabilistic object-oriented knowledge representation. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 541–550.
Poole, D. 2003. First-order probabilistic inference. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, 985–991.
Poon, H.; Domingos, P.; and Sumner, M. 2008. A general method for reducing the complexity of relational inference and its application to MCMC. In Proceedings of the Twenty-Third National Conference on Artificial Intelligence, 1075–1080.
Raphael, C. 2001. Coarse-to-fine dynamic programming. IEEE Transactions on Pattern Analysis and Machine Intelligence 23(12):1379–1390.
Sen, P.; Deshpande, A.; and Getoor, L. 2009. Bisimulation-based approximate lifted inference. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, 496–505.
Singla, P., and Domingos, P. 2008. Lifted first-order belief propagation. In Proceedings of the Twenty-Third National Conference on Artificial Intelligence, 1094–1099.
Singla, P. 2009. Markov Logic: Theory, Algorithms and Applications. Ph.D. dissertation, Department of Computer Science & Engineering, University of Washington, Seattle, WA.
Staab, S., and Studer, R. 2004. Handbook on Ontologies (International Handbooks on Information Systems). Springer Verlag.
Weiss, D., and Taskar, B. 2010. Structured prediction cascades. In International Conference on Artificial Intelligence and Statistics, 916–923.