Clustering with Bregman Divergences

Uploaded On 2017-04-27



Presentation Transcript

Clustering with Bregman Divergences
Joydeep Ghosh, University of Texas at Austin, ghosh@ece.utexas.edu
Joint work with Arindam Banerjee, Inderjit Dhillon, Srujana Merugu, Dharmendra Modha

Introduction
- Techniques in clustering, approximation, regression, prediction, etc., use squared Euclidean distance to measure error or loss
  - k-means clustering, least-squares regression, Wiener filtering
- Squared loss is not appropriate in many situations
  - Sparse, high-dimensional data
  - Probability distributions, non-negative matrices
  - Counting measures
- Can we use other loss functions?

KMeans Clustering
1. Initialize representatives ("means")
2. Assign to nearest representative
3. Re-estimate means
4. Assign to nearest mean (again)
5. Re-estimate means
The objective function is $\sum_{h=1}^{k} \sum_{x \in \mathcal{X}_h} \|x - \mu_h\|^2$

Agenda and Overview of Results
- Background: Bregman divergences
- Part I: Generalization of centroid-based clustering
  - Hard clustering
  - Bregman divergences ⇔ Exponential families
  - Soft clustering
- Part II: Connections and Extensions
  - Information theory
  - Co-clustering

Bregman Divergences
[Figure: plot of $\phi(z)$ with points $x$ and $y$ marked; $d_\phi(x, y)$ is the vertical gap at $x$ between $\phi(x)$ and the tangent to $\phi$ at $y$]
- $\phi$ is strictly convex, differentiable
- $d_\phi(x, y) = \phi(x) - \phi(y) - \langle x - y, \nabla\phi(y) \rangle$

Examples
- $\phi(x) = \|x\|^2$ is strictly convex and differentiable on $\mathbb{R}^d$
  $d_\phi(x, y) = \|x - y\|^2$ [squared Euclidean distance]
- $\phi(p) = \sum_{j=1}^{d} p_j \log p_j$ (negative entropy) is strictly convex and differentiable on the $d$-simplex
  $d_\phi(p, q) = \sum_{j=1}^{d} p_j \log(p_j / q_j)$ [KL-divergence]
- $\phi(x) = -\sum_{j=1}^{d} \log x_j$ is strictly convex and differentiable on $\mathbb{R}^d_{++}$
  $d_\phi(x, y) = \sum_{j=1}^{d} \left( x_j / y_j - \log(x_j / y_j) - 1 \right)$ [Itakura-Saito distance]

Bregman Hard Clustering
- For squared loss, the mean is the best constant predictor:
  $\mu = \frac{1}{n} \sum_{i=1}^{n} x_i = \arg\min_{c} \sum_{i=1}^{n} \|x_i - c\|^2$
- Theorem: For all Bregman divergences,
  $\mu = \frac{1}{n} \sum_{i=1}^{n} x_i = \arg\min_{c} \sum_{i=1}^{n} d_\phi(x_i, c)$

Bregman Hard Clustering Algorithm
- Initialize $\{\mu_h\}_{h=1}^{k}$
- Repeat until convergence
  - {Assignment step} Assign $x$ to the nearest cluster $\mathcal{X}_h$, where $h = \arg\min_{h'} d_\phi(x, \mu_{h'})$
  - {Re-estimation step} For all $h$, recompute the mean $\mu_h$ as $\mu_h = \frac{1}{n_h} \sum_{x \in \mathcal{X}_h} x$, with $n_h = |\mathcal{X}_h|$
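The hard clustering algorithm above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`bregman_hard_clustering`, `squared_euclidean`, `itakura_saito`), the caller-supplied initial means, and the toy data are all hypothetical and are not taken from the slides. The divergence is passed in as a function, so swapping the loss does not change the loop.

```python
import math

def squared_euclidean(x, y):
    """d_phi for phi(x) = ||x||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def itakura_saito(x, y):
    """d_phi for phi(x) = -sum_j log x_j (all entries must be positive)."""
    return sum(a / b - math.log(a / b) - 1.0 for a, b in zip(x, y))

def mean(points):
    """Arithmetic mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]

def bregman_hard_clustering(points, init_means, divergence, max_iter=100):
    """Alternate assignment and re-estimation until assignments stop changing.

    init_means is supplied by the caller (an assumption of this sketch; the
    slides do not say how to initialize).
    """
    means = [list(m) for m in init_means]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: nearest representative under the chosen divergence.
        new_assignment = [
            min(range(len(means)), key=lambda h: divergence(x, means[h]))
            for x in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Re-estimation step: the arithmetic mean, whatever the divergence.
        for h in range(len(means)):
            members = [x for x, a in zip(points, assignment) if a == h]
            if members:  # keep the old mean if a cluster is empty
                means[h] = mean(members)
    return means, assignment

# Toy data (hypothetical): two well-separated groups of positive vectors.
points = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [5.0, 5.0], [5.2, 4.8], [4.8, 5.2]]
means_sq, labels_sq = bregman_hard_clustering(points, [points[0], points[3]], squared_euclidean)
means_is, labels_is = bregman_hard_clustering(points, [points[0], points[3]], itakura_saito)
```

Note that the re-estimation step never looks at the divergence: by the theorem on the preceding slide, the arithmetic mean is the minimizer for every Bregman divergence, which is why only the assignment step changes between losses.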
Properties
- Guarantee: Monotonically decreases objective function till convergence
- Scalability: Every iteration is linear in the size of the input
- Exhaustiveness: If such an algorithm exists for a loss function $L(x, \mu)$, then $L$ has to be a Bregman divergence
- Linear separators: Clusters are separated by hyperplanes
- Mixed data types: Allows appropriate Bregman divergence for subsets of features

The Exponential Family
- Definition: A multivariate parametric family with density
  $p_{(\psi,\theta)}(x) = \exp(\langle x, \theta \rangle - \psi(\theta)) \, p_0(x)$
- $\psi$ is the cumulant or log-partition function
- $\psi$ uniquely determines a family
  - Examples: Gaussian, Bernoulli, Multinomial, Poisson
- $\theta$ fixes a particular distribution in the family
- $\psi$ is a strictly convex function

A Connection
- Theorem: For any regular exponential family $p_{(\psi,\theta)}$, for all $x \in \mathrm{dom}(\phi)$,
  $p_{(\psi,\theta)}(x) = \exp(-d_\phi(x, \mu)) \, b_\phi(x)$
  for a uniquely determined $b_\phi$, where $\theta$ is the natural parameter and $\mu$ is the expectation parameter
[Figure: Bregman divergence $d_\phi(x, \mu)$ ⇔ convex function $\phi(\mu)$ ⇔ (Legendre duality) ⇔ cumulant $\psi(\theta)$ ⇔ exponential family $p_{(\psi,\theta)}(x)$]

Examples
  Regular exponential families    Regular Bregman divergences
  Gaussian                        Squared loss
  Multinomial                     KL-divergence
  Geometric                       Itakura-Saito distance
  Poisson                         I-divergence

Bregman Soft Clustering Algorithm
- Initialize $\{\pi_h, \mu_h\}_{h=1}^{k}$
- Repeat until convergence
  - {Expectation step} For all $x, h$, the posterior probability is
    $p(h|x) = \pi_h \exp(-d_\phi(x, \mu_h)) / Z(x)$,
    where $Z(x)$ is the normalization function
  - {Maximization step} For all $h$,
    $\pi_h = \frac{1}{n} \sum_{x} p(h|x)$
    $\mu_h = \frac{\sum_{x} p(h|x) \, x}{\sum_{x} p(h|x)}$

Properties
- Guarantee: Monotonically decreases objective function till convergence
- Scalability: Every iteration is linear in the size of the inputs
- Interpretability: Exponential family ⇔ Bregman divergence
- Mixed data types: Combination of different exponential models for various attributes

Conclusions so Far
- Results
  - KMeans type algorithm for all Bregman divergences
  - Bijection between Bregman divergences and exponential families
  - Efficient learning of mixture of exponential distributions

Part II: Connections and Extensions
- Information theory
- Co-clustering and matrix approximation
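Looking back at the Bregman soft clustering algorithm from Part I, the expectation and maximization steps can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function name `bregman_soft_clustering`, the fixed iteration count, the uniform initial priors, and the toy data are assumptions made for illustration. Because $b_\phi(x)$ does not depend on the cluster index, it cancels in the posterior and never has to be computed.

```python
import math

def squared_euclidean(x, y):
    """d_phi for phi(x) = ||x||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def bregman_soft_clustering(points, init_means, divergence, n_iter=50):
    """EM-style loop: posteriors from exp(-d_phi), then weighted means.

    Running a fixed number of iterations (instead of testing convergence)
    and starting from uniform priors are simplifications of this sketch.
    """
    k, n, dim = len(init_means), len(points), len(points[0])
    means = [list(m) for m in init_means]
    priors = [1.0 / k] * k
    posteriors = []
    for _ in range(n_iter):
        # Expectation step: p(h|x) proportional to pi_h * exp(-d_phi(x, mu_h)).
        posteriors = []
        for x in points:
            log_w = [math.log(max(priors[h], 1e-300)) - divergence(x, means[h])
                     for h in range(k)]
            top = max(log_w)  # subtract the max for numerical stability
            w = [math.exp(v - top) for v in log_w]
            z = sum(w)        # plays the role of the normalization Z(x)
            posteriors.append([v / z for v in w])
        # Maximization step: new priors and posterior-weighted means.
        for h in range(k):
            mass = max(sum(p[h] for p in posteriors), 1e-300)
            priors[h] = mass / n
            means[h] = [
                sum(p[h] * x[j] for p, x in zip(posteriors, points)) / mass
                for j in range(dim)
            ]
    return priors, means, posteriors

# Toy data (hypothetical): two well-separated groups.
points = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [5.0, 5.0], [5.2, 4.8], [4.8, 5.2]]
priors, means, posteriors = bregman_soft_clustering(
    points, [[0.0, 0.0], [6.0, 6.0]], squared_euclidean)
```

With squared Euclidean distance this reduces to fitting a mixture of fixed-variance spherical Gaussians, which is the Gaussian row of the examples table in Part I.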
Rate Distortion Theory
[Figure: source $X$ passed through a lossy encoder to the reproduction $\hat{X}$]
- Rate = number of bits/symbol
- Distortion = $E[d(X, \hat{X})]$
- For $E[d(X, \hat{X})] \le D$, what is the minimum bits/symbol required?
- Goal: Encode a source distribution using as few bits as possible without too much distortion
- The rate distortion function [S'48]:
  $R(D) = \min_{p(\hat{x}|x) :\, E_{X,\hat{X}}[d(X, \hat{X})] \le D} I(X; \hat{X})$

Rate Distortion with Bregman Divergences
- Theorem: If distortion is a Bregman divergence,
  - Either, $R(D)$ is equal to the Shannon-Bregman lower bound
  - Or, $|\hat{\mathcal{X}}|$ is finite
- When $|\hat{\mathcal{X}}|$ is finite
  - Bregman divergences ⇔ Exponential family distributions
  - Rate distortion with Bregman divergences ⇔ Modeling with mixture of exponential family distributions
- $R(D)$ can be obtained either analytically or computationally
- Compression vs. loss in Bregman information formulation
- Information bottleneck as a special case

Potential Applications
- Rate distortion results for Bregman divergences
  - Designing quantizers based on Bregman divergences
- Rate distortion ⇔ maximum likelihood mixture estimation
  - New learning techniques based on existing lossy coding methods
- Compression vs. Bregman information trade-off
  - IB style techniques for Bregman divergences

Rate Distortion Theory
- Source $X \sim p(x)$ over $\mathcal{X}$, reproduction alphabet $\hat{\mathcal{X}}$, distortion measure $d$
- Rate: average # bits to encode a symbol; Distortion: $E[d(X, \hat{X})]$
- The main question: What is the minimum number of bits required to encode $X$ such that $E[d(X, \hat{X})] \le D$?
- Answer: Rate-distortion function
  $R(D) = \min_{p(\hat{x}|x) :\, E_{X,\hat{X}}[d(X, \hat{X})] \le D} I(X; \hat{X})$
- The minimizer $p^*(\hat{x}|x)$ determines the optimal encoding scheme and the optimal reproduction support
  $\hat{\mathcal{X}}_s = \{\hat{x} \mid \hat{x} \in \hat{\mathcal{X}},\ p^*(\hat{x}) > 0\}$
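To make the two quantities in the rate-distortion function concrete, the sketch below evaluates the rate $I(X;\hat{X})$ (in bits) and the expected distortion for a given encoder $p(\hat{x}|x)$. It only evaluates one candidate encoder; it does not perform the minimization that defines $R(D)$. The function name `rate_and_distortion` and the four-point source are hypothetical choices made for illustration.

```python
import math

def rate_and_distortion(p_x, cond, xs, xhats, d):
    """Return (I(X; Xhat) in bits, E[d(X, Xhat)]) for an encoder.

    cond[i][j] is p(xhat_j | x_i); p_x[i] is the source probability of xs[i].
    """
    n, m = len(xs), len(xhats)
    # Marginal of the reproduction: p(xhat_j) = sum_i p(x_i) p(xhat_j | x_i).
    p_xhat = [sum(p_x[i] * cond[i][j] for i in range(n)) for j in range(m)]
    rate, distortion = 0.0, 0.0
    for i in range(n):
        for j in range(m):
            joint = p_x[i] * cond[i][j]
            if joint > 0.0:
                rate += joint * math.log2(cond[i][j] / p_xhat[j])
                distortion += joint * d(xs[i], xhats[j])
    return rate, distortion

def squared_loss(x, xhat):
    return (x - xhat) ** 2

# Hypothetical source: four equally likely scalars, two reproduction points.
xs, p_x = [0.0, 1.0, 4.0, 5.0], [0.25] * 4
xhats = [0.5, 4.5]

# Encoder 1: each symbol goes to its nearest reproduction point.
hard = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
# Encoder 2: ignore the input and pick a reproduction point at random.
blind = [[0.5, 0.5]] * 4

hard_rate, hard_dist = rate_and_distortion(p_x, hard, xs, xhats, squared_loss)
blind_rate, blind_dist = rate_and_distortion(p_x, blind, xs, xhats, squared_loss)
```

The nearest-point encoder spends 1 bit per symbol for a distortion of 0.25, while the blind encoder spends 0 bits at a distortion of 8.25; $R(D)$ traces the best achievable trade-off between such extremes.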
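Maximum Likelihood Mixture Estimation
- Given: Finite set of i.i.d. samples $\mathcal{X} = \{x_i\}_{i=1}^{n}$, mixture model cardinality $k$, parametric family $\mathcal{F}$
- Objective: Find the mixture model of $k$ distributions from $\mathcal{F}$ and optimal assignments such that the data likelihood of $\mathcal{X}$ is maximized
- [Neal et al., '97] The MLME problem is exactly equivalent to the minimum variational free energy problem
  $\min_{\hat{\mathcal{X}}_s,\ p(\hat{x}|x),\ |\hat{\mathcal{X}}_s| = k} \left\{ -E_{X,\hat{X}}[\log p(X, \hat{X})] - H(\hat{X}|X) \right\}$
  - $X$: empirical distribution $p$; $\hat{X}$: choice of mixture component
  - $\hat{\mathcal{X}}_s$: mixture model $p$; $p(X, \hat{X}) = p(\hat{X}) \, p(X|\hat{X})$ s.t. $p(X|\hat{X}) \in \mathcal{F}$

Rate Distortion with Finite Cardinality Support
- Given: Source $X \sim$ empirical distribution over a finite set of i.i.d. samples $\mathcal{X} = \{x_i\}_{i=1}^{n}$, reproduction support cardinality $k$, distortion measure $d$, tolerable distortion value $D$
- Objective: Find the reproduction support $\hat{\mathcal{X}}_s$ of cardinality $k$ and the optimal assignments $p(\hat{x}|x)$ such that $R(D)$ is achieved, i.e.,
  $\min_{\hat{\mathcal{X}}_s,\ p(\hat{x}|x),\ |\hat{\mathcal{X}}_s| = k} \left\{ I(X; \hat{X}) + \beta_D \, E_{X,\hat{X}}[d(X, \hat{X})] \right\}$
  where $\beta_D$ is the optimal variational parameter for distortion $D$

MLME ⇔ RDFC
  MLME                        RDFC
  Empirical distribution      Source
  Mixture model               Reproduction support set
  Choice of $\mathcal{F}$     Choice of $d$ and $D$
  Variational free energy     Rate distortion objective function
- Connecting missing links: [Banerjee et al., '04] Bijection between exponential densities $p_{(\psi,\theta)}(\cdot)$ and Bregman divergences $d_\phi(\cdot, \mu)$
  $\log p_{(\psi,\theta)}(x) = -d_\phi(x, \mu) + \log b_\phi(x)$
  $(\psi, \phi)$ are Legendre duals and $\mu = \nabla\psi(\theta)$, $\theta = \nabla\phi(\mu)$

Equivalence Theorem
- For a given empirical distribution and mixture model/support set cardinality
  - (A) RDFC problem for Bregman divergence $d_\phi$ and distortion level $D$
  - (B) MLME problem based on the scaled exponential family $\mathcal{F}$ (cumulant $\psi$, scale $\beta$)
- Theorem. Problems A and B are exactly equivalent when $\beta = \beta_D$, the variational parameter corresponding to $D$, and $\phi$, $\psi$ are Legendre duals
- Follows since negative log-likelihood ≡ distortion ⇒ maximum likelihood ≡ minimum distortion

Bregman Information [Banerjee et al., '04]
- The Bregman information of a random variable $X$ is the expected Bregman divergence to the mean, i.e.,
  $I_\phi(X) = E[d_\phi(X, E[X])]$
- Examples:
  - Variance: $\phi(x) = \|x\|^2 \Rightarrow I_\phi(X) = E[\|X - E[X]\|^2]$
  - Mutual information: For $Z \sim p(x)$ over conditional densities $\{p(Y|x)\}$,
    $\phi(z) = \phi(p(Y|x)) = -H(p(Y|x))$
    $E[Z] = E_X[p(Y|X)] = p(Y)$
    $\Rightarrow I_\phi(Z) = E_X[\mathrm{KL}(p(Y|X) \,\|\, p(Y))] = I(X; Y)$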
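The two Bregman information examples can be checked numerically. The sketch below is illustrative only: `bregman_information` and the small distributions are hypothetical, chosen so that the variance case is easy to verify by hand and the mutual information case can be compared against a direct computation from the joint distribution (both in nats).

```python
import math

def squared_euclidean(x, y):
    """d_phi for phi(x) = ||x||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def kl_divergence(p, q):
    """d_phi for phi(p) = sum_j p_j log p_j on the simplex (natural log)."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0.0)

def bregman_information(values, weights, divergence):
    """Expected divergence of a discrete random vector to its own mean."""
    dim = len(values[0])
    mean = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return sum(w * divergence(v, mean) for w, v in zip(weights, values))

# Variance: four equally likely scalars with mean 3.
variance = bregman_information([[1.0], [2.0], [3.0], [6.0]], [0.25] * 4,
                               squared_euclidean)

# Mutual information: Z takes the value p(Y|x) with probability p(x).
p_x = [0.5, 0.5]
p_y_given_x = [[0.9, 0.1], [0.2, 0.8]]
info = bregman_information(p_y_given_x, p_x, kl_divergence)

# Direct computation of I(X; Y) from the joint distribution, for comparison.
p_y = [sum(p_x[i] * p_y_given_x[i][j] for i in range(2)) for j in range(2)]
mutual_information = sum(
    p_x[i] * p_y_given_x[i][j]
    * math.log(p_x[i] * p_y_given_x[i][j] / (p_x[i] * p_y[j]))
    for i in range(2) for j in range(2)
)
```

Here `variance` comes out as 3.5, and `info` agrees with `mutual_information` up to floating-point rounding.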
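Compression vs. Loss in Bregman Information
- Theorem. Expected distortion between source and reproduction random variables is equal to the loss in the Bregman information
  $E_{X,\hat{X}}[d_\phi(X, \hat{X})] = I_\phi(X) - I_\phi(\hat{X})$ when $\hat{X} = E_{X|\hat{X}}[X]$
- Leads to the constrained RDFC problem:
  $\min_{p(\hat{x}|x)} \left\{ I(X; \hat{X}) + \beta \, E_{X,\hat{X}}[d_\phi(X, \hat{X})] \right\} \equiv \min_{p(\hat{x}|x)} \left\{ I(X; \hat{X}) - \beta \, I_\phi(\hat{X}) \right\}$
- Compression ↔ $I(X; \hat{X})$
- Bregman information ↔ $I_\phi(\hat{X})$

Information Bottleneck: A Special Case
- Source $X$, reproduction $\hat{X}$, any other random variable $Y$
- $Z = Z(X) = p(Y|X)$ and $\hat{Z} = \hat{Z}(\hat{X}) = p(Y|\hat{X})$
- Theorem. For KL-divergence, the constrained RDFC problem is
  $\min_{p(\hat{x}|x)} \left\{ I(X; \hat{X}) - \beta \, I(\hat{X}; Y) \right\}$
- Follows since $d_\phi(Z, \hat{Z}) = \mathrm{KL}(Z \,\|\, \hat{Z})$ and $I_\phi(\hat{Z}) = I(\hat{X}; Y)$
- IB assumptions
  - Mutual information holds all relevant information ⇒ $p(Y|X)$ is the appropriate sufficient statistic representation
  - KL-divergence is the appropriate distortion measure
  - Conditional independence relation $\hat{X} \leftrightarrow X \leftrightarrow Y$ ⇒ $\hat{Z} = E_{X|\hat{X}}[Z]$

Conclusions
- Rate distortion for Bregman divergences
- Rate distortion ⇔ maximum likelihood mixture estimation
- Compression vs. loss in Bregman information trade-off

Computational Biology
[Figure: two panels of a gene expression matrix, genes × experimental conditions]

Co-clustering: Basic Intuition

Original matrix (rows 1-5, columns 1-6):
        1     2     3     4     5     6
  1   -66    54   -63    93    51    96
  2    35    87    37   -26    84   -22
  3   -68    56   -64    92    52    94
  4    30    83    32   -24    80   -21
  5   -63    55   -60    92    53    95

Reordered matrix (rows 4, 2, 5, 1, 3; columns 1, 3, 5, 2, 4, 6):
        1     3     5     2     4     6
  4    30    32    80    83   -24   -21
  2    35    37    84    87   -26   -22
  5   -63   -60    53    55    92    95
  1   -66   -63    51    54    93    96
  3   -68   -64    52    56    92    94

A low parameter, co-clustering based approximation: (row clustering) × (low parameter matrix) × (column clustering)

Row clustering (rows 1-5 × row clusters 1-2):
        1   2
  1     0   1
  2     1   0
  3     0   1
  4     1   0
  5     0   1

Low parameter matrix (row clusters 1-2 × column clusters 1-3):
          1      2      3
  1    33.5   83.5  -23.3
  2   -64.0   53.5   93.7

Column clustering (column clusters 1-3 × columns 1-6):
        1   2   3   4   5   6
  1     1   0   1   0   0   0
  2     0   1   0   0   1   0
  3     0   0   0   1   0   1

Main Results: Co-clustering
- Co-clustering based on Bregman divergences
- Minimum Bregman information principle that generalizes the max-entropy and the least-squares principles
- Meta co-clustering algorithm with local optimality properties
- Known co-clustering algorithms (sum-square residue, information theoretic) are special cases
- New class of matrix approximation techniques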
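The low parameter matrix in the basic-intuition example can be reproduced by averaging the original matrix over each (row cluster, column cluster) block. The sketch below assumes that block averaging is how the slide's numbers were obtained (it matches them to one decimal place) and takes the row and column assignments as given rather than learning them; the helper names `block_means` and `reconstruct` are hypothetical.

```python
# Original 5 x 6 matrix from the co-clustering example.
A = [
    [-66, 54, -63, 93, 51, 96],
    [35, 87, 37, -26, 84, -22],
    [-68, 56, -64, 92, 52, 94],
    [30, 83, 32, -24, 80, -21],
    [-63, 55, -60, 92, 53, 95],
]
# Cluster index (0-based) of each row and each column, read off the
# row clustering and column clustering indicator matrices.
row_cluster = [1, 0, 1, 0, 1]
col_cluster = [0, 1, 0, 2, 1, 2]

def block_means(matrix, row_cluster, col_cluster, k, l):
    """Average the entries of each (row cluster, column cluster) block."""
    sums = [[0.0] * l for _ in range(k)]
    counts = [[0] * l for _ in range(k)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            sums[row_cluster[i]][col_cluster[j]] += value
            counts[row_cluster[i]][col_cluster[j]] += 1
    return [[sums[g][h] / counts[g][h] for h in range(l)] for g in range(k)]

def reconstruct(means, row_cluster, col_cluster):
    """Expand the k x l block means back to the full matrix shape."""
    return [[means[g][h] for h in col_cluster] for g in row_cluster]

low_parameter = block_means(A, row_cluster, col_cluster, k=2, l=3)
approximation = reconstruct(low_parameter, row_cluster, col_cluster)
```

`low_parameter` evaluates to approximately `[[33.5, 83.5, -23.25], [-64.0, 53.5, 93.67]]`, so the 30 original entries are summarized by 6 numbers plus the two cluster assignments.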
Applications
- Problems
  - Compression while preserving summary statistics
  - Learning from sparse, high dimensional data
  - Missing value prediction
  - Learning correlations
- Domains
  - Microarray analysis (genes and conditions)
  - Text analysis (words and documents)
  - Recommender systems (users and items)
  - Market analysis (customers and products)

Dimensionality Reduction
[Table: three 3 × 3 confusion matrices, for (3, 20), (3, 500) and (3, 2500) clusters; the individual entries are not recoverable from the transcript]
- Confusion matrices for the Classic3 dataset with different numbers of document clusters
- Clustering interleaved with implicit dimensionality reduction
- Superior performance as compared to one-sided clustering

Missing Value Prediction
  Algo     SqE1     SqE3     IDiv1    IDiv3    Pearson
  Error    0.8398   0.7639   0.8397   0.7723   1.4211
- Mean absolute error for ratings (0-5) in the EachMovie dataset
- SqE1/SqE3: squared Euclidean distance with schemes 1 and 3; IDiv1/IDiv3: I-divergence with schemes 1 and 3
- Assign zero measure for missing elements, co-cluster and use the reconstructed matrix for prediction
- Implicit discovery of correlated sub-matrices

Learning Correlations
[Figure: correlation map of movie clusters (1-10) against user clusters (1-10)]
- Cluster 1: It is a Wonderful Life, Casablanca, Life is Beautiful, An Affair to Remember
- Cluster 4: Usual Suspects, Manhattan Murder Mystery, Pulp Fiction, North by Northwest
- Cluster 7: Star Trek V, Blade Runner, The Terminator, A Clockwork Orange
- User cluster - movie cluster correlations for a subset of the EachMovie dataset
- Use low parameter representations to discover correlations between row and column entities
- Helpful in decision support systems

Extensions
- Applicable to multidimensional data cubes (yes)
- Soft co-clustering algorithms (yes)
- General co-clustering models, e.g., overlapping clusters (yes)
Summary
- Part I: Generalization of centroid-based clustering
  - KMeans type algorithm for all Bregman divergences
  - Bregman divergences ⇔ Exponential families
  - Bregman soft clustering ⇔ Mixture modeling with exponential family distributions
- Part II: Connections and Extensions
  - Mixture modeling ⇔ Rate distortion with Bregman divergences
  - Low parameter matrix approximations with loss measured by Bregman divergences
  - Minimum Bregman Information principle
  - Meta co-clustering algorithm that renders Part I as a special case

Historical References
- L. M. Bregman. "The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming." USSR Computational Mathematics and Mathematical Physics, 7:200-217, 1967.
- Problem: $\min \phi(x)$ subject to $a_i^T x = b_i$, $i = 0, \ldots, m-1$
- Iterative procedure:
  1. Start with $x^{(0)}$ that satisfies $\nabla\phi(x^{(0)}) = -A^T y$ for some $y$. Set $t = 0$.
  2. Compute $x^{(t+1)}$ to be the "Bregman" projection of $x^{(t)}$ onto the hyperplane $a_i^T x = b_i$, where $i = t \bmod m$. Set $t = t + 1$ and repeat.
- Converges to the globally optimal solution. This cyclic projection method can be extended to halfspace and convex constraints (where each projection is followed by a correction).
- Censor and Lent (1981) coined the term "Bregman distance"
- Natural question: How important are Bregman divergences in data analysis?
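As a small illustration of the cyclic projection procedure in the historical reference, the sketch below takes the special case $\phi(x) = \|x\|^2$, where the Bregman projection onto a hyperplane is the ordinary orthogonal projection (this special case is commonly known as the Kaczmarz method). The choice of $\phi$, the function names, and the 2 × 2 system are assumptions of this sketch; other choices of $\phi$ generally need a different projection step.

```python
def project_onto_hyperplane(x, a, b):
    """Orthogonal projection of x onto {z : a . z = b}.

    For phi(x) = ||x||^2 this is the Bregman projection, since
    d_phi(z, x) = ||z - x||^2.
    """
    residual = b - sum(ai * xi for ai, xi in zip(a, x))
    scale = residual / sum(ai * ai for ai in a)
    return [xi + scale * ai for xi, ai in zip(x, a)]

def cyclic_projections(A, b, n_steps):
    """Cycle through the constraints a_i . x = b_i with i = t mod m."""
    m = len(A)
    # x = 0 has gradient 0, which trivially lies in the range of A^T.
    x = [0.0] * len(A[0])
    for t in range(n_steps):
        i = t % m
        x = project_onto_hyperplane(x, A[i], b[i])
    return x

# Hypothetical system: x1 = 1 and x1 + x2 = 3, with solution (1, 2).
A = [[1.0, 0.0], [1.0, 1.0]]
b = [1.0, 3.0]
solution = cyclic_projections(A, b, n_steps=200)
```

Each projection satisfies one constraint exactly and moves the iterate closer to the intersection, so after enough sweeps `solution` is numerically indistinguishable from `[1.0, 2.0]`.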