JMLR Workshop and Conference Proceedings ACML Co
…neumann@uni-bonn.de, University of Bonn, Germany
Roman Garnett, rgarnett@uni-bonn.de, University of Bonn, Germany
Kristian Kersting, kristian.kersting@cs.tu-dortmund.de, Technical University of Dortmund, Germany
Editors: Cheng Soon Ong and Tu Bao Ho

Neumann Garnett Kersting

Exploiting this assumption is often profitable, provided that within a small neighbourhood of each unlabeled node, we have a sufficient amount of label evidence to make confident predictions. Due to the enormous size of present-day networks, however, it is common to have only very few labeled nodes, resulting in having too few observations near many unlabeled nodes to effectively apply Hypothesis 1. In turn, classification gets increasingly more difficult. When data are sparsely labeled, we therefore have to do more than exploiting close-by labels to accurately classify unlabeled nodes. Previous work in this direction has introduced latent graphs by adding additional edges (Gallagher et al., 2008; Shi et al., 2011), run multiple random walks with restarts (Lin and Cohen, 2010a), and suggested schemes for active learning (Ji and Han, 2012).

Here we propose an alternative approach. We move beyond the straightforward homophily assumption, to say that not only are nearby nodes likely to have the same label, but also nodes with similar local structure, where we define structure to be the arrangement and connectivity of labels on nearby nodes.

Hypothesis 2 (Label-structure similarity) Nodes with similarly arranged labels in their local neighbourhoods are likely to have the same label.

Our main contribution is a new kernel, the coinciding walk kernel (cwk),¹ that uses short random walks to quantify how similarly the labels surrounding each node are arranged. Random walks (rws) in general enjoy huge popularity in graph-based learning and have proven a powerful tool both for defining kernels on graphs (defined between nodes of a graph) and graph kernels (where graphs are themselves inputs to the kernel).² A common idea in the graph kernel community is to measure the similarity of two labeled graphs by analyzing the labels encountered during random walks on the respective graphs (Gärtner et al., 2003; Kashima et al., 2003; Neumann et al., 2012); the last reference used this idea to design a kernel among partially labeled graphs. cwks are inspired by the construction of these graph kernels; however, they define a kernel among the nodes of a graph.

Common kernels on graphs include the diffusion kernel (Kondor and Lafferty, 2002), the p-step random walk kernel (Smola and Kondor, 2003), and the Moore-Penrose pseudoinverse of the Laplacian, $L^+$ (Fouss et al., 2012), which is a limiting case of the regularized Laplacian kernel (Smola and Kondor, 2003). All of these kernels have random-walk interpretations; however, none of them considers known labels during their computation, and as a result, they cannot take advantage of Hypothesis 2.

We view known node labels as providing valuable information that should be considered in the construction of a kernel used for node-label prediction. More precisely, partially absorbing random walks (parws), where, with some probability, the walks stop progressing once they hit a label, give the known labels influence over the walk process (Zhu et al., 2003; Wu et al., 2012). We consider the distribution over sequences of labels encountered during a parw from a node as encoding its label structure. To address Hypothesis 2, we then define the cwk between two nodes to be the probability that parallel parws leaving from those nodes coincide, that is, hit the same label at the same time. By lifting the random walk […]

1. Portions of this work appeared in (Neumann et al., 2013).
2. We make this distinction between kernels on graphs and graph kernels throughout.

Coinciding Walk Kernels

[…] rws with transition probabilities as given in Eq. (1) until convergence, then assigning the most probable absorbing label to the nodes in $V_U$. For the rest of this paper we will refer to this label-absorbing random walk just as an absorbing random walk.

2.2. Partially Absorbing Random Walks

Recall that our main goal is to define a kernel on a graph to perform learning tasks like node classification in sparsely labeled networks based on autocorrelation and label-structure similarity. Utilizing rws with fully absorbing states at the labeled nodes as defined above, however, is somewhat restrictive towards this goal: only the first label encountered will have any impact on the evolution of a particular rw. This is compatible with the homophily hypothesis, but not very useful for capturing the structure of surrounding labels. Hence, we have to soften the definition of absorbing states. This can be naturally achieved by employing partially absorbing random walks (parws) (Wu et al., 2012).

The simplest way to define parws, in the setting of label-absorbing rws considered here, is to extend our graph $G$ by adding a special node for each label in $[k]$ and adding edges from each labeled node $i \in V_L$ to its respective label node. We then make these auxiliary nodes absorbing states and vary the transition probabilities from the labeled nodes to them. The transition probabilities in this graph $\tilde{G} = (V \cup [k], \tilde{E})$ are given by $\tilde{T}$ having the following block structure:

$$\tilde{T} = \begin{bmatrix} T_{U,U} & T_{U,L} & 0 \\ (1-\alpha)\,T_{L,U} & (1-\alpha)\,T_{L,L} & \alpha\,Y_L \\ 0 & 0 & I \end{bmatrix}, \qquad (3)$$

where $\alpha \in [0,1]$ is the absorbing probability and the block $\alpha\,Y_L$ (with $Y_L$ taken here to be the $|V_L| \times k$ indicator matrix of the known labels) sends each labeled node to its own label node. Note that by setting $\alpha = 1$ we can exactly model the fully absorbing rws defined previously. On the other hand, by setting $\alpha = 0$ we get a simple power iteration with constant steady-state distribution. When using the latter setting for learning it is crucial to apply some kind of early termination in order to learn meaningful clusters or class labels (Szummer and Jaakkola, 2001; Lin and Cohen, 2010b). We will utilize parws for our coinciding walk kernel on graphs by combining both techniques, partial label propagation and early stopping, into a measure for local structure similarity of the nodes in a graph.

2.3. Parallel Absorbing Random Walks

The final ingredient we need are parallel random walks, as they allow one to refer to the sequences of states of two or more random walks of the same length. Co-occurring rws can be used to describe the similarity of either entire graphs or nodes in a graph based on the structure of the local neighbourhood of the nodes. These similarities will be the basis of the coinciding walk kernel defined in the next section.

Let us now give a formal definition of parallel random walks. A parallel random walk of length $t_{\max}$ among a set of nodes $S$ is given by the sequences $\{X_t^{(i)}\}_{0 \le t \le t_{\max}}$ of $t_{\max}$ states visited by the random walks starting at the respective nodes $i \in S$. Parallel partially absorbing random walks are given by straightforwardly combining the according definitions.
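The two ingredients introduced so far, the extended transition matrix of Eq. (3) and parallel walks started from a set of nodes, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `parw_transition` and `parallel_walk_distributions`, the convention of marking unlabeled nodes with `-1`, and the toy path graph are assumptions made only for this example. The sketch also keeps the nodes in their original order rather than sorting them into unlabeled and labeled blocks, and it tracks the state distributions of the walks rather than sampling individual paths.

```python
import numpy as np

def parw_transition(A, labels, k, alpha):
    """Transition matrix of the extended graph: original nodes, then k label nodes.

    A      : (n, n) adjacency matrix; every node needs at least one neighbour
    labels : length-n integer array, class in 0..k-1, or -1 if unlabeled (assumed convention)
    alpha  : absorbing probability in [0, 1]
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    n = A.shape[0]
    T = A / A.sum(axis=1, keepdims=True)     # plain random-walk matrix
    T_ext = np.zeros((n + k, n + k))
    T_ext[:n, :n] = T
    lab = np.flatnonzero(labels >= 0)        # indices of labeled nodes
    T_ext[lab, :n] *= 1.0 - alpha            # keep walking with probability 1 - alpha
    T_ext[lab, n + labels[lab]] = alpha      # jump to own label node with probability alpha
    T_ext[n:, n:] = np.eye(k)                # label nodes are absorbing
    return T_ext

def parallel_walk_distributions(T_ext, starts, t_max):
    """State distributions of walks started at `starts`, for t = 0..t_max.

    Returns an array of shape (t_max + 1, len(starts), n + k).
    """
    D = np.zeros((len(starts), T_ext.shape[0]))
    D[np.arange(len(starts)), starts] = 1.0  # each walk starts at its own node
    out = [D]
    for _ in range(t_max):
        D = D @ T_ext                        # advance all walks by one step
        out.append(D)
    return np.stack(out)

# Toy path graph 0 - 1 - 2 - 3; node 0 has class 0, node 3 has class 1.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
labels = np.array([0, -1, -1, 1])
T_ext = parw_transition(A, labels, k=2, alpha=0.5)
dists = parallel_walk_distributions(T_ext, starts=[1, 2], t_max=3)
```

Under these assumptions, `alpha=1.0` puts all outgoing mass of a labeled node on its label node, which corresponds to the fully absorbing case, while `alpha=0.0` leaves the ordinary random walk on the original nodes unchanged.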
Theorem 1 $K_{cw}$ as defined in Eq. (4) is positive-semidefinite (i.e., is a valid Mercer kernel).

It is obvious that $K_{cw}$ is a positive-semidefinite kernel as it is the scaled sum of polynomial kernels $k(x, y) = (x^\top y + c)^d$, with $c = 0$ and $d = 1$, i.e., $K_{cw}(i, j) \propto \sum_{t=0}^{t_{\max}} (P_t)_i (P_t)_j^\top$.

The computation of cwk on a graph $G$ is summarized in Algorithm 1. The computational complexities of the required naïve calculations are $O(k\,t_{\max}\,|E|\,n)$ for the one step transition and $O(k\,t_{\max}\,n^2)$ for the kernel contribution, where $|E|$ is the number of edges. It is worth mentioning that for most learning tasks it is sufficient to compute the train-train and train-test fractions of the kernel matrix. This can be accomplished efficiently by precomputing the $\{P_t\}$ and summing only the required outer products with a complexity of $O(k\,t_{\max}\,|V_L|\,n)$. Algorithm 1 has an overall computational complexity of $O(k\,t_{\max}\,|E|\,n)$; however, the kernel computation for sparse graphs (small $|E|$) with few labeled nodes ($|V_L| \ll n$) is efficient.

In Figure 1 we provide an illustration of cwk on a subgraph of a labeled graph built from concepts in the dbpedia ontology marked as populated places.⁴ Each concept is a node in our graph and is backed by a Wikipedia page. We added an undirected edge between two places if one of their corresponding Wikipedia pages links to the other. The dbpedia ontology further divides populated places into countries, administrative regions, cities, towns, and villages; these five labels serve as class labels. This example was chosen because the resulting graph does not necessarily exhibit homophily; for example, villages (approximately half the dataset) are much more likely to link to countries than to other villages.

For our illustration, we built a graph with $|V| = 500$ nodes by taking a breadth-first search from Atlanta. We then calculate the pseudoinverse of the Laplacian kernel ($L^+$) as well as the coinciding walk kernel (with $\alpha = 0.5$ and $t_{\max} = 10$), using a random selection of 20% of the nodes for $V_L$. Atlanta was not among the labeled nodes. The rows of $K_{cw}$ corresponding to $K(\text{Atlanta}, \cdot)$ are illustrated in Figure 1(b) and (d). One can clearly see that cwk is able to capture structure similarity as several distant nodes have high values and nearby nodes including nodes in the direct neighbourhood of Atlanta show low values. The rows of $L^+$ are shown in Figure 1(c) and (e). We can see that $L^+$ (on average) decreases smoothly with increasing distance from Atlanta (reflecting the homophily assumption); whereas the value of $K_{cw}$ also shows some highly correlated far-away nodes, as well as less correlated nearby nodes. Moreover, the magnitude of $K_{cw}$ is highly correlated with the correct label (city): the highest kernel values are exclusively achieved by other cities throughout the network, exactly the behavior desired for predicting Atlanta's label. It is also interesting to note that the lowest kernel values are exclusively among nodes in the town class, perhaps due to strikingly different label structure in their neighbourhoods.

3.2. Learning with Structure Similarity

Before presenting our experimental results, we will describe one of the baseline approaches and briefly review related work on learning with structure similarity. The closest approach to cwks, also incorporating local structure similarity, is introduced in (Desrosiers and Karypis, […]

4. An implementation of the used dbpedia (www.dbpedia.org) graph extractor is available at https://github.com/rmgarnett/dbpedia_graph_extractor.

Table 1: Dataset properties. pp-xk is short for populated-places-xk. #graphs indicates the number of connected components and $P_{\mathrm{freq}}$ the proportion of the most frequent class. $P_{\mathrm{switch}}$ reflects the probability of adjacent nodes switching their labels.
dataset     #nodes   #edges   #labels   #graphs   P_freq   P_switch
pp-1k         1000     5253         5         1      43%        69%
pp-3k         3000    16546         5         1      50%        66%
pp-5k         5000    26648         5         1      53%        70%
webkb         1462    61766         6         4      28%        33%
dblp          1711     2898         4         1      36%        21%
cora          2708     5278         7        78      30%        18%
citeseer      3264     4536         6       390      21%        26%
pp-100k     100000   374480   (used for runtime analysis, cf. Fig. 4(b))

[…] which is the von Neumann diffusion kernel (vnd) (Fouss et al., 2012) on the normalized adjacency matrix. Note that vnd is closely related to the regularized Laplacian kernel⁷ (Smola and Kondor, 2003). Hence, we choose vnd, diff and l+ to represent existing successful kernels on graphs. All kernel-based predictions (cwk, vnd, diff, and l+) are achieved via support vector machine (svm) classification.

The following graph datasets are used for evaluation: populated-places⁸ (link graph extracted from dbpedia, described above), webkb⁹ (cocitation graph of web pages from computer science departments of four universities), dblp¹⁰ (connected coauthor graph extracted from the dblp database), cora¹¹ (citation network of scientific papers), and citeseer¹¹ (citation network of scientific papers). To measure homophily in the datasets, we compute a statistic, $P_{\mathrm{switch}}$, as the probability that a mixed random walk switches labels on adjacent nodes. That is, if $P_s$ is the stationary distribution of the random walk¹² and $P_{\mathrm{switch}|i}$ is the vector of conditional probabilities of switching labels from a given node, then $P_{\mathrm{switch}} = P_s^\top P_{\mathrm{switch}|i}$. Low values of this measure signal the presence of homophily (favouring Hypothesis 1); whereas high values indicate a lack of label smoothness (rejecting Hypothesis 1). For datasets with high $P_{\mathrm{switch}}$, exploiting label-structure similarity (Hypothesis 2) may be more beneficial. $P_{\mathrm{switch}}$ and other properties of all datasets are summarized in Table 1. For the populated-places dataset, we created graphs of varying sizes by performing a breadth-first search from the first node in the graph […]

7. $K_{\mathrm{reglap}} = (I + \alpha\tilde{L})^{-1} = (I + \alpha D^{-1/2} L D^{-1/2})^{-1} = (I + \alpha(I - D^{-1/2} A D^{-1/2}))^{-1} = ((1+\alpha)I - \alpha D^{-1/2} A D^{-1/2})^{-1}$, where $\tilde{L}$ is the normalized Laplacian.
8. http://www-kd.iai.uni-bonn.de/pubattachments/727/populated_places.tar.xz
9. http://www.netkit-srl.sourceforge.net/data.html
10. http://www.cs.illinois.edu/homes/mingji1/DBLP_four_area.zip
11. http://www.cs.umd.edu/projects/linqs/projects/lbc/index.html
12. $P_s$ can be calculated as the normalized eigenvector of $T^\top$ with maximal eigenvalue.

Table 3: Average accuracies (%) on 20 test sets of the datasets dblp, webkb, cora, and citeseer. Italic indicates statistically significant best performance among the kernel methods and bold indicates statistically significant best performance among all methods, both under a paired t-test ($p < 0.05$).

                          5% labeled                                  10% labeled
dataset     cwk   diff    l+   vnd    rl   lgc    lp      cwk   diff    l+   vnd    rl   lgc    lp
dblp       62.8   55.6  60.2  56.7  61.7  64.0  61.0     69.3   65.5  67.9  66.2  67.9  69.4  69.1
webkb      61.5   47.7  38.8  54.4  57.2  61.6  43.4     63.2   52.6  49.0  59.0  61.0  65.6  45.7
cora       72.1   70.6  57.2  59.9  67.9  73.8  73.2     76.0   77.2  67.4  70.9  73.5  78.2  78.2
citeseer   53.5   50.8  50.9  49.0  51.2  51.6  50.9     57.8   55.4  52.8  55.2  55.5  55.4  54.9

Figure 3: Average accuracies (and standard errors) on (a) webkb and (b) citeseer. The dotted lines indicate 5% and 10% training data, corresponding to the results reported in Table 3. Best viewed in colour.

4.3. Parameter Analysis and Runtimes

To analyse the sensitivity of cwk's predictive power with respect to changes in the kernel parameters, we computed the average accuracies over 10 randomly generated test sets for all combinations of $\alpha$ and $t_{\max}$, where $\alpha \in \{0.0, 0.01, \ldots, 1.0\}$ and $t_{\max} \in \{0, 1, \ldots, 100\}$, on the cora dataset with 5% labeled data. A heatmap of the results is shown in Figure 4(a). Whereas the highest accuracy (72.1%) is achieved for an absorbing probability of $\alpha = 0.75$ and a maximum walk length of $t_{\max} = 59$, we see that for all $\alpha \ge 0.4$ and $t_{\max} \ge 2$, the accuracy is higher than 65%. This shows that cwk is not overly sensitive to its parameters. The slight slope of the isoperformance curves suggests that walks of a given length and absorbing probability behave somewhat like slightly longer walks with a slightly smaller absorbing probability, which agrees with intuition.

In the following we analyse the scalability of the cwk computation for sparsely labeled networks. As investigating fast and scalable kernel methods goes beyond the scope of our work, we focus our analysis on the scalability of the kernel computation. We compare the runtimes for calculating all tested kernels on the populated-places dataset […]

[…] Finally, hybrid semi-supervised support vector machines also constitute a great framework to investigate the power of cwks in semi-supervised learning.

Acknowledgments

This work was supported by the European Commission under FP7-248258-First-MM, the German Federal Office for Agriculture and Food (ble) under 2815411310, the German Science Foundation (dfg) under GA 1615/1-1, and the Fraunhofer attract grant stream.

References

O. Chapelle, V. Sindhwani, and S. S. Keerthi. Optimization Techniques for Semi-Supervised Support Vector Machines. Journal of Machine Learning Research, 9:203-233, 2008.

C. Desrosiers and G. Karypis. Within-Network Classification Using Local Structure Similarity. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD-09), pages 260-275, 2009.

F. Fouss, K. Françoisse, L. Yen, A. Pirotte, and M. Saerens. An Experimental Investigation of Kernels on Graphs for Collaborative Recommendation and Semisupervised Classification. Neural Networks, 31:53-72, 2012.

B. Gallagher, H. Tong, T. Eliassi-Rad, and C. Faloutsos. Using Ghost Edges for Classification in Sparsely Labeled Networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-08), pages 256-264, 2008.

T. Gärtner, P. Flach, and S. Wrobel. On Graph Kernels: Hardness Results and Efficient Alternatives. In Computational Learning Theory and Kernel Machines, Proceedings of the 16th Annual Conference on Computational Learning Theory and 7th Kernel Workshop (COLT/Kernel-03), pages 129-143, 2003.

T. Hwang and R. Kuang. A Heterogeneous Label Propagation Algorithm for Disease Gene Discovery. In Proceedings of the SIAM International Conference on Data Mining (SDM-10), pages 583-594, 2010.

M. Ji and J. Han. A Variance Minimization Criterion to Active Learning on Graphs. Journal of Machine Learning Research - Proceedings Track (AISTATS-12), 22:556-564, 2012.

H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized Kernels Between Labeled Graphs. In Machine Learning, Proceedings of the Twentieth International Conference (ICML-03), pages 321-328, 2003.

R. I. Kondor and J. D. Lafferty. Diffusion Kernels on Graphs and Other Discrete Input Spaces. In Machine Learning, Proceedings of the Nineteenth International Conference (ICML-02), pages 315-322, 2002.

F. Lin and W. W. Cohen. Semi-Supervised Classification of Network Data Using Very Few Labels. In International Conference on Advances in Social Networks Analysis and Mining (ASONAM-10), pages 192-199, 2010a.