Learning from Dyadic Data
Presentation Transcript

Learning from Dyadic Data

To appear in: Advances in Neural Information Processing Systems 11, MIT Press

Thomas Hofmann, Jan Puzicha, Michael I. Jordan
Center for Biological and Computational Learning, M.I.T., Cambridge, MA, {hofmann, jordan}@ai.mit.edu
Institut für Informatik III, Universität Bonn, Germany, jan@cs.uni-bonn.de

Abstract

Dyadic data refers to a domain with two finite sets of objects in which observations are made for dyads, i.e., pairs with one element from either set. This type of data arises naturally in many applications ranging from computational linguistics and information retrieval to preference analysis and computer vision. In this paper, we present a systematic, domain-independent framework of learning from dyadic data by statistical mixture models. Our approach covers different models with flat and hierarchical latent class structures. We propose an annealed version of the standard EM algorithm for model fitting which is empirically evaluated on a variety of data sets from different domains.

1 Introduction

Over the past decade learning from data has become a highly active field of research distributed over many disciplines like pattern recognition, neural computation, statistics, machine learning, and data mining. Most domain-independent learning architectures, as well as the underlying theories of learning, have been focusing on a feature-based data representation by vectors in a Euclidean space. For this restricted case substantial progress has been achieved. However, a variety of important problems does not fit into this setting and far fewer advances have been made for data types based on different representations.

In this paper, we will present a general framework for unsupervised learning from dyadic data. The notion refers to a domain with two (abstract) sets of objects, X = {x_1, ..., x_N} and Y = {y_1, ..., y_M}, in which observations S are made for dyads (x_i, y_k). In the simplest case, on which we focus, an elementary observation consists just of (x_i, y_k) itself, i.e., a co-occurrence of x_i and y_k, while other cases may also provide a scalar value w (strength of preference or association). Some exemplary application areas are: (i) Computational linguistics, with the corpus-based statistical analysis of word co-occurrences and applications in language modeling, word clustering, word sense disambiguation, and thesaurus construction. (ii) Text-based information retrieval, where X may correspond to a document collection, Y to keywords, and (x_i, y_k) would represent the occurrence of a term y_k in a document x_i. (iii) Modeling of preferences and consumption behavior, by identifying X with individuals and Y with objects or stimuli, as in collaborative filtering. (iv) Computer vision, in particular in the context of image segmentation, where X corresponds to image locations, Y to discretized or categorical feature values, and a dyad (x_i, y_k) represents a feature y_k observed at a particular location x_i.

2 Mixture Models for Dyadic Data

Across different domains there are at least two tasks which play a fundamental role in unsupervised learning from dyadic data: (i) probabilistic modeling, i.e., learning a joint or conditional probability model over X × Y, and (ii) structure discovery, e.g., identifying clusters and data hierarchies. The key problem in probabilistic modeling is data sparseness: How can probabilities for rarely observed or even unobserved co-occurrences be reliably estimated? As an answer we propose a model-based approach and formulate latent class or mixture models. The latter have the further advantage of offering a unifying method for probabilistic modeling and structure discovery. There are at least three (four, if both variants in (ii) are counted) different ways of defining latent class models:

i. The most direct way is to introduce an (unobserved) mapping c : X × Y → {c_1, ..., c_K} that partitions X × Y into K classes. This type of model is called an aspect model, and the pre-image c^{-1}(c_α) is referred to as an aspect.

ii. Alternatively, a class can be defined as a subset of one of the spaces, i.e., c : X → {c_1, ..., c_K} (or of Y by symmetry, yielding a different model), which induces a unique partitioning of X × Y. This model is referred to as one-sided clustering, and c^{-1}(c_α) is called a cluster.

iii. If latent classes are defined for both sets, c^x : X → {c^x_1, ..., c^x_K} and c^y : Y → {c^y_1, ..., c^y_L}, respectively, this induces a mapping (c^x, c^y) and hence a partitioning of X × Y. This model is called two-sided clustering.
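To make the setting concrete before turning to the individual models, here is a minimal sketch (an illustration added to this transcript, not part of the paper; the toy data and variable names are invented) of building the empirical count table n(x_i, y_k) that all of the models below are fitted to:

```python
import numpy as np

# Invented toy sample S: each elementary observation is a dyad (x, y),
# with one element from each of the two finite object sets X and Y.
S = [("doc1", "learning"), ("doc1", "data"), ("doc2", "data"), ("doc2", "data")]

# Enumerate the two object sets and index their elements.
X = sorted({x for x, _ in S})
Y = sorted({y for _, y in S})
x_index = {x: i for i, x in enumerate(X)}
y_index = {y: k for k, y in enumerate(Y)}

# Empirical counts n(x_i, y_k): how often each dyad occurs in S.
n = np.zeros((len(X), len(Y)))
for x, y in S:
    n[x_index[x], y_index[y]] += 1
```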
2.1 Aspect Model for Dyadic Data

In order to specify an aspect model we make the assumption that all co-occurrences in the sample set S are i.i.d. and that x_i and y_k are conditionally independent given the class. With parameters P(x_i | c_α), P(y_k | c_α) for the class-conditional distributions and prior probabilities P(c_α), the complete data probability can be written as

  P(S, c) = \prod_{i,k} [ P(c_{ik}) P(x_i | c_{ik}) P(y_k | c_{ik}) ]^{n(x_i, y_k)},   (1)

where n(x_i, y_k) are the empirical counts for dyads in S. By summing over the latent variables c, the usual mixture formulation is obtained. Following the standard Expectation Maximization approach for maximum likelihood estimation [Dempster et al., 1977], the E-step equations for the class posterior probabilities are given by

  P(c_α | x_i, y_k) = P(c_α) P(x_i | c_α) P(y_k | c_α) / \sum_ν P(c_ν) P(x_i | c_ν) P(y_k | c_ν).   (2)

(Footnote: In the case of multiple observations of a dyad it has been assumed that each observation may have a different latent class. If only one latent class variable is introduced for each dyad, slightly different equations are obtained.)

[Figure 1: Some aspects of the Bible bigram data; each aspect c_α is shown with its prior P(c_α) and the words with maximal P(x_i | c_α) and P(y_k | c_α). Probability tables omitted.]

It is straightforward to derive the M-step re-estimation formulae, e.g.,

  P(x_i | c_α) ∝ \sum_k n(x_i, y_k) P(c_α | x_i, y_k),   (3)

and an analogous equation for P(y_k | c_α). By re-parameterization the aspect model can also be characterized by a cross-entropy criterion. Moreover, formal equivalence to the aggregate Markov model, independently proposed for language modeling in [Saul, Pereira, 1997], has been established (cf. [Hofmann, Puzicha, 1998] for details).
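As an added illustration of equations (1)-(3), a minimal NumPy sketch of this EM loop follows (not the authors' implementation; it assumes a dense count matrix `n` as built in the earlier sketch and `K` aspects):

```python
import numpy as np

def aspect_model_em(n, K, iters=50, seed=0):
    """Fit the aspect model to a count matrix n(x_i, y_k) by EM."""
    rng = np.random.default_rng(seed)
    N, M = n.shape
    # Random normalized initialization of P(c), P(x|c), P(y|c).
    Pc = np.full(K, 1.0 / K)
    Px_c = rng.random((N, K)); Px_c /= Px_c.sum(axis=0)
    Py_c = rng.random((M, K)); Py_c /= Py_c.sum(axis=0)
    for _ in range(iters):
        # E-step, eq. (2): posterior P(c | x_i, y_k) for every dyad.
        joint = Pc[None, None, :] * Px_c[:, None, :] * Py_c[None, :, :]
        post = joint / joint.sum(axis=2, keepdims=True)   # shape (N, M, K)
        # M-step, eq. (3) and its analogues: expected counts, renormalized.
        weighted = n[:, :, None] * post
        Px_c = weighted.sum(axis=1); Px_c /= Px_c.sum(axis=0)
        Py_c = weighted.sum(axis=0); Py_c /= Py_c.sum(axis=0)
        Pc = weighted.sum(axis=(0, 1)); Pc /= Pc.sum()
    return Pc, Px_c, Py_c
```

Each EM iteration is guaranteed not to decrease the training likelihood; generalization to held-out data is a separate issue, which the annealed variant of Section 4 addresses.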
2.2 One-Sided Clustering Model

The complete data model proposed for the one-sided clustering model is

  P(S, c) = \prod_i ( P(c(x_i)) \prod_k [ P(x_i) P(y_k | c(x_i)) ]^{n(x_i, y_k)} ),   (5)

where we have made the assumption that observations (x_i, y_k) for a particular x_i are conditionally independent given c(x_i). This effectively defines the mixture

  P(S_i) = \sum_α P(c_α) \prod_k [ P(x_i) P(y_k | c_α) ]^{n(x_i, y_k)},   (6)

where S_i are all observations involving x_i. Notice that co-occurrences in S_i are not independent (as they are in the aspect model), but get coupled by the (shared) latent class c(x_i). As before, it is straightforward to derive an EM algorithm with update equations (7). The one-sided clustering model is similar to the distributional clustering model [Pereira et al., 1993]; however, there are two important differences: (i) the number of likelihood contributions in (7) scales with the number of observations, a fact which follows from Bayes' rule, and (ii) mixing proportions are missing in the original distributional clustering model. The one-sided clustering model corresponds to an unsupervised version of the naive Bayes classifier, if we interpret Y as a feature space for the objects in X. There are also ways to weaken the conditional independence assumption, e.g., by utilizing a mixture of tree dependency models [Meila, Jordan, 1998].
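For contrast with the aspect model, a corresponding sketch of EM for the one-sided model is given below (again an added illustration under the same assumptions; the smoothing constant `eps` is an extra numerical safeguard, not part of the model). The E-step computes one posterior over clusters per object x_i, coupling all of its co-occurrences as in equation (6):

```python
import numpy as np

def one_sided_em(n, K, iters=50, seed=0):
    """EM sketch for the one-sided clustering model: one latent class per x_i."""
    rng = np.random.default_rng(seed)
    N, M = n.shape
    eps = 1e-12  # smoothing to avoid log(0) for degenerate classes
    Pc = np.full(K, 1.0 / K)
    Py_c = rng.random((M, K)); Py_c /= Py_c.sum(axis=0)
    for _ in range(iters):
        # E-step: log P(c_a) + sum_k n(x_i, y_k) log P(y_k | c_a), per object x_i.
        log_post = np.log(Pc)[None, :] + n @ np.log(Py_c)   # shape (N, K)
        log_post -= log_post.max(axis=1, keepdims=True)     # numerical stability
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: mixing proportions and class-conditionals from soft assignments.
        Pc = post.mean(axis=0) + eps; Pc /= Pc.sum()
        Py_c = n.T @ post + eps
        Py_c /= Py_c.sum(axis=0)
    return Pc, Py_c
```

Note how the whole count row n[i, :] enters each posterior: this is exactly the coupling through the shared latent class c(x_i) described above, in contrast to the per-dyad posteriors of the aspect model.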
[Figure 2: Exemplary segmentation results on aerial images obtained by one-sided clustering; images omitted.]

2.3 Two-Sided Clustering Model

The latent variable structure of the two-sided clustering model significantly reduces the degrees of freedom in the specification of the class-conditional distributions. We propose the following complete data model

  P(S, c) = \prod_{i,k} [ P(x_i) P(y_k) π_{c(x_i), c(y_k)} ]^{n(x_i, y_k)},   (8)

where the π_{αβ} are cluster association parameters. In this model the latent variables in the X and Y spaces are coupled by the π parameters. Therefore, there exists no simple mixture model representation for P(S). Skipping some of the technical details (cf. [Hofmann, Puzicha, 1998]), one obtains M-step equations for P(x_i), P(y_k), and the π_{αβ}. To preserve tractability for the remaining problem of computing the posterior probabilities in the E-step, we apply a factorial approximation (mean field approximation), i.e., the posterior is approximated by a product of marginals over the individual latent variables c(x_i) and c(y_k). This results in coupled approximation equations for the marginal posterior probabilities of c(x_i), and a similar equation for c(y_k). The resulting approximate EM algorithm performs updates according to the sequence (X-posteriors, π, Y-posteriors, π): the (probabilistic) clustering in one set is optimized in alternation for a given clustering in the other space, and vice versa. The two-sided clustering model can also be shown to maximize a mutual information criterion [Hofmann, Puzicha, 1998].

2.4 Comparison: Aspects and Clusters

To better understand the differences between the presented models it is elucidating to systematically compare the conditional probabilities P(c_α | x_i) and P(c_α | y_k):

                    Aspect model                     X-clustering model
  P(c_α | x_i)      P(x_i | c_α) P(c_α) / P(x_i)     posterior P{c(x_i) = c_α}
  P(c_α | y_k)      P(y_k | c_α) P(c_α) / P(y_k)     P(y_k | c_α) P(c_α) / P(y_k)

As can be seen from the above table, the probabilities P(c_α | x_i) and P(c_α | y_k) correspond to posterior probabilities of latent variables if clusters are defined in the X- and Y-space, respectively. Otherwise, they are computed from model parameters. This is a crucial difference: for example, the posterior probabilities approach Boolean values in the infinite data limit, and P(y_k | x_i) then converges to one of the class-conditional distributions. Yet, in the aspect model P(c_α | x_i) and P(c_α | y_k) do typically not peak more sharply with an increasing number of observations. In the aspect model, the conditionals P(y_k | x_i) are inherently a weighted sum of the 'prototypes' P(y_k | c_α). Cluster models in turn ultimately look for the 'best' class-conditional, and weights are only indirectly induced by the posterior uncertainty.

[Figure 3: Two-sided clustering of LOB: the π matrix and the most probable words (nouns and adjectives) per cluster; matrix and word lists omitted.]

3 The Cluster-Abstraction Model

The models discussed in Section 2 all define a non-hierarchical, 'flat' latent class structure. However, for structure discovery it is important to find hierarchical data organizations. There are well-known architectures like the Hierarchical Mixtures of Experts [Jordan, Jacobs, 1994] which fit hierarchical models. Yet, in the case of dyadic data there is an alternative possibility to define a hierarchical model. The Cluster-Abstraction Model (CAM) is a clustering model (e.g., in X) where the conditionals are themselves cluster-specific aspect mixtures, P(y_k | c_α, x_i) = \sum_a P(a | c_α, x_i) P(y_k | a), with a latent aspect mapping a. To obtain a hierarchical organization, clusters are identified with the terminal nodes of a hierarchy (e.g., a complete binary tree) and aspects with inner and terminal nodes. As a compatibility constraint it is imposed that P(a | c_α, x_i) = 0 whenever the node corresponding to aspect a is not on the path to the terminal node of cluster c_α. In effect, conditioned on a 'horizontal' clustering, all observations y_k for a particular x_i have to be generated from one of the 'vertical' abstraction levels on the path to c(x_i). Since different clusters share aspects according to their topological relation, this favors a meaningful hierarchical organization of clusters. Moreover, aspects at inner nodes do not simply represent averages over the clusters in their subtree, as they are forced to explicitly represent what is common to all subsequent clusters. Skipping the technical details, the E-step is given by

  P(c_α | x_i) ∝ P(c_α) \prod_k [ \sum_a P(a | c_α, x_i) P(y_k | a) ]^{n(x_i, y_k)},   (12)

and the M-step formulae for P(c_α), P(a | c_α, x_i), and P(y_k | a) follow analogously.

[Figure 4: Parts of the top levels of a hierarchical clustering solution for the document collection of neural network paper abstracts; aspects are represented by their 5 most probable word stems. Word-stem listing omitted.]

4 Annealed Expectation Maximization

Annealed EM is a generalization of EM based on the idea of deterministic annealing [Rose et al., 1990] that has been successfully applied as a heuristic optimization technique to many clustering and mixture problems. Annealing reduces the sensitivity to local maxima but, even more importantly in this context, it may also improve the generalization performance compared to maximum likelihood estimation. The key idea in annealed EM is to introduce an (inverse temperature) parameter β and to replace the negative (averaged) complete data log-likelihood by a substitute known as the free energy (both are in fact equivalent at β = 1). This effectively results in a simple modification of the E-step: the likelihood contribution in Bayes' rule is taken to the power of β. In order to determine the optimal value for β we used an additional validation set in a cross-validation procedure.
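The modification is small enough to show directly. Below is an illustrative sketch of the annealed E-step for the one-sided model from the earlier sketches (`Pc` and `Py_c` are the current parameter estimates and are assumed strictly positive); only the likelihood term is tempered by beta:

```python
import numpy as np

def annealed_posteriors(n, Pc, Py_c, beta):
    """Annealed E-step sketch: raise the likelihood contribution in Bayes'
    rule to the power beta; beta = 1.0 recovers the standard E-step."""
    log_post = np.log(Pc)[None, :] + beta * (n @ np.log(Py_c))
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)
```

As described above, β itself is not optimized by the algorithm but selected on a held-out validation set.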
5 Results and Conclusions

In our experiments we have utilized the following real-world data sets: (i) Cran: a standard test collection from information retrieval (N = 1400, M = 4898); (ii) adjective-noun co-occurrences from the Penn Treebank corpus (N = 6931, M = 4995) and the LOB corpus (N = 5448, M = 6052); (iii) a document collection with abstracts of journal papers on neural networks (N = 1278, M = 6065); (iv) word bigrams from the Bible edition of the Gutenberg project (N = M = 12858); (v) textured aerial images for segmentation (N = 128 × 128, M = 192).

In Fig. 1 we have visualized an aspect model fitted to the bigram data. Notice that, although both sets correspond to the same vocabulary, the roles of the preceding and the subsequent words in bigrams are quite different. Segmentation results obtained by applying the one-sided clustering model are depicted in Fig. 2. A multi-scale Gabor filter bank (3 octaves, 4 orientations) was utilized as the image representation (cf. [Hofmann et al., 1998]). In Fig. 3 a two-sided clustering solution of LOB is shown. Fig. 4 shows the top levels of the hierarchy found by the Cluster-Abstraction Model: inner node distributions provide resolution-specific descriptors for the documents in the corresponding subtree, which can be utilized, e.g., in interactive browsing for information retrieval. Fig. 5 shows typical test set perplexity curves of the annealed EM algorithm for the aspect and clustering models (perplexity: the exponential of the negative per-observation log-likelihood). At β = 1 (standard EM) overfitting is clearly visible, an effect that vanishes with decreasing β. Annealed learning also performs better than standard EM with early stopping. Tab. 1 systematically summarizes perplexity results for different models and data sets. Moreover, the tree topology for the CAM is heuristically grown via phase transitions.

[Figure 5: Perplexity curves for annealed EM (aspect model (a), (b) and one-sided clustering model (c)); axes: annealed EM iterations vs. perplexity, with legends K = 32 to 256 and β = 0.075 to 1.0. Plots omitted.]

[Table 1: Perplexity results for different models (aspect, one-sided clustering, CAM) on the Cran data (predicting words conditioned on documents) and the Penn data (predicting nouns conditioned on adjectives); numerical entries omitted.]

In conclusion, mixture models for dyadic data have shown a broad application potential. Annealing yields a substantial improvement in generalization performance compared to standard EM, in particular for clustering models, and also outperforms a complexity control via the number of classes K. In terms of perplexity, the aspect model has the best performance. Detailed performance studies and comparisons with other state-of-the-art techniques will appear in forthcoming papers.

References

[Dempster et al., 1977] Dempster, A.P., Laird, N.M., Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39, 1-38.

[Hofmann, Puzicha, 1998] Hofmann, T., Puzicha, J. (1998). Statistical models for co-occurrence data. Tech. rep., Artificial Intelligence Laboratory Memo 1625, M.I.T.

[Hofmann et al., 1998] Hofmann, T., Puzicha, J., Buhmann, J.M. (1998). Unsupervised texture segmentation in a deterministic annealing framework. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 803-818.

[Jordan, Jacobs, 1994] Jordan, M.I., Jacobs, R.A. (1994). Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6(2), 181-214.

[Meila, Jordan, 1998] Meila, M., Jordan, M.I. (1998). Estimating dependency structure as a hidden variable. In: Advances in Neural Information Processing Systems 10.

[Pereira et al., 1993] Pereira, F.C.N., Tishby, N.Z., Lee, L. (1993). Distributional clustering of English words. Pages 183-190 of: Proceedings of the ACL.

[Rose et al., 1990] Rose, K., Gurewitz, E., Fox, G. (1990). Statistical mechanics and phase transitions in clustering. Physical Review Letters, 65(8), 945-948.

[Saul, Pereira, 1997] Saul, L., Pereira, F. (1997). Aggregate and mixed-order Markov models for statistical language processing. In: Proceedings of the 2nd International Conference on Empirical Methods in Natural Language Processing.