Clustering with Bregman Divergences
Joydeep Ghosh
University of Texas at Austin
ghosh@ece.utexas.edu
Joint work with Arindam Banerjee, Inderjit Dhillon, Srujana Merugu, Dharmendra Modha

Introduction
Techniques in clustering, approximation, regression, prediction, etc., use squared Euclidean distance to measure error or loss
    k-means clustering, least-squares regression, Wiener filtering
Squared loss is not appropriate in many situations
    Sparse, high-dimensional data
    Probability distributions, non-negative matrices
    Counting measures
Can we use other loss functions?

KMeans Clustering
Initialize representatives (means)
Assign to nearest representative
Re-estimate means
Assign to nearest mean (again)
Re-estimate means
The objective function is
    $\sum_{h=1}^{k} \sum_{x \in \mathcal{X}_h} \|x - \mu_h\|^2$

Agenda and Overview of Results
Background: Bregman divergences
Part I: Generalization of centroid-based Clustering
    Hard Clustering
    Bregman divergences ↔ Exponential families
    Soft Clustering
Part II: Connections and Extensions
    Information Theory
    Co-clustering

Bregman Divergences
[Figure: a convex function $\phi(z)$ with its tangent at $y$; $d_\phi(x, y)$ is the gap at $x$ between $\phi(x)$ and the tangent value $\phi(y) + \langle x - y, \nabla\phi(y)\rangle$]
$\phi$ is strictly convex, differentiable
    $d_\phi(x, y) = \phi(x) - \phi(y) - \langle x - y, \nabla\phi(y)\rangle$

Examples
$\phi(x) = \|x\|^2$ is strictly convex and differentiable on $\mathbb{R}^d$
    $d_\phi(x, y) = \|x - y\|^2$ [squared Euclidean distance]
$\phi(p) = \sum_{j=1}^{d} p_j \log p_j$ (negative entropy) is strictly convex and differentiable on the $d$-simplex
    $d_\phi(p, q) = \sum_{j=1}^{d} p_j \log\frac{p_j}{q_j}$ [KL-divergence]
$\phi(x) = -\sum_{j=1}^{d} \log x_j$ is strictly convex and differentiable on $\mathbb{R}^d_{++}$
    $d_\phi(x, y) = \sum_{j=1}^{d} \left(\frac{x_j}{y_j} - \log\frac{x_j}{y_j} - 1\right)$ [Itakura-Saito distance]

Bregman Hard Clustering
For squared loss, the mean is the best constant predictor
    $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i = \arg\min_{c} \sum_{i=1}^{n} \|x_i - c\|^2$
Theorem: For all Bregman divergences
    $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i = \arg\min_{c} \sum_{i=1}^{n} d_\phi(x_i, c)$

Bregman Hard Clustering Algorithm
Initialize $\{\mu_h\}_{h=1}^{k}$
Repeat until convergence
    {Assignment Step} Assign $x$ to the nearest cluster $\mathcal{X}_h$, where $h = \arg\min_{h'} d_\phi(x, \mu_{h'})$
    {Re-estimation Step} For all $h$, recompute the mean $\mu_h$ as $\mu_h = \frac{1}{n_h} \sum_{x \in \mathcal{X}_h} x$
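The hard clustering algorithm above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `init_means` and `n_iter` parameters, and the random initialization from data points are all hypothetical choices made for this sketch. Only the divergence changes between variants; the re-estimation step is always the arithmetic mean, as the theorem states.

```python
import numpy as np

def squared_euclidean(X, mu):
    """d_phi(x, mu) = ||x - mu||^2 for every row x of X."""
    return np.sum((X - mu) ** 2, axis=1)

def kl_divergence(X, mu):
    """d_phi(p, q) = sum_j p_j log(p_j / q_j); assumes strictly positive rows on the simplex."""
    return np.sum(X * np.log(X / mu), axis=1)

def itakura_saito(X, mu):
    """d_phi(x, y) = sum_j (x_j / y_j - log(x_j / y_j) - 1); assumes strictly positive entries."""
    r = X / mu
    return np.sum(r - np.log(r) - 1.0, axis=1)

def bregman_hard_clustering(X, k, divergence, init_means=None, n_iter=100, seed=0):
    """Alternate assignment and mean re-estimation until the labels stop changing."""
    X = np.asarray(X, dtype=float)
    if init_means is None:
        # Assumption of this sketch: initialize with k distinct data points.
        rng = np.random.default_rng(seed)
        means = X[rng.choice(len(X), size=k, replace=False)].copy()
    else:
        means = np.array(init_means, dtype=float)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to the mean with the smallest divergence.
        dists = np.stack([divergence(X, m) for m in means], axis=1)
        new_labels = np.argmin(dists, axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Re-estimation step: the arithmetic mean, whatever the divergence is.
        for h in range(k):
            members = X[labels == h]
            if len(members) > 0:
                means[h] = members.mean(axis=0)
    return labels, means
```

Calling `bregman_hard_clustering(X, 3, kl_divergence)` on rows that are strictly positive probability vectors gives a KL-based variant, while `squared_euclidean` recovers ordinary k-means.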
Properties
Guarantee: Monotonically decreases objective function till convergence
Scalability: Every iteration is linear in the size of the input
Exhaustiveness: If such an algorithm exists for a loss function $L(x, \mu)$, then $L$ has to be a Bregman divergence
Linear Separators: Clusters are separated by hyperplanes
Mixed Data types: Allows appropriate Bregman divergence for subsets of features

The Exponential Family
Definition: A multivariate parametric family with density
    $p_{(\psi,\theta)}(x) = \exp\{\langle x, \theta\rangle - \psi(\theta)\}\, p_0(x)$
$\psi$ is the cumulant or log-partition function
$\psi$ uniquely determines a family
    Examples: Gaussian, Bernoulli, Multinomial, Poisson
$\theta$ fixes a particular distribution in the family
$\psi$ is a strictly convex function

A Connection
Theorem: For any regular exponential family $p_{(\psi,\theta)}$, for all $x \in \mathrm{dom}(\phi)$,
    $p_{(\psi,\theta)}(x) = \exp(-d_\phi(x, \mu))\, b_\phi(x)$
for a uniquely determined $b_\phi$, where $\theta$ is the natural parameter and $\mu$ is the expectation parameter
[Figure: the Bregman divergence $d_\phi(x, \mu)$ is generated by the convex function $\phi(\mu)$, which is the Legendre dual of the cumulant $\psi(\theta)$ of the exponential family $p_{(\psi,\theta)}(x)$]

Examples
Regular exponential families    Regular Bregman divergences
Gaussian                        Squared Loss
Multinomial                     KL-divergence
Geometric                       Itakura-Saito distance
Poisson                         I-divergence

Bregman Soft Clustering Algorithm
Initialize $\{\pi_h, \mu_h\}_{h=1}^{k}$
Repeat until convergence
    {Expectation Step} For all $x, h$, the posterior probability is $p(h|x) = \pi_h \exp(-d_\phi(x, \mu_h)) / Z(x)$, where $Z(x)$ is the normalization function
    {Maximization Step} For all $h$,
        $\pi_h = \frac{1}{n} \sum_{x} p(h|x)$
        $\mu_h = \frac{\sum_{x} p(h|x)\, x}{\sum_{x} p(h|x)}$

Properties
Guarantee: Monotonically decreases objective function till convergence
Scalability: Every iteration is linear in the size of inputs
Interpretability: Exponential family ↔ Bregman divergence
Mixed Data types: Combination of different exponential models for various attributes

Conclusions so Far
Results
    KMeans type algorithm for all Bregman divergences
    Bijection between Bregman divergences and exponential families
    Efficient learning of mixture of exponential distributions

Part II: Connections and Extensions
Information Theory
Co-clustering and Matrix Approximation
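The Bregman soft clustering algorithm of Part I can be sketched as a short EM loop. This is a minimal illustration under stated assumptions, not the authors' code: the function name, the `init_means` and `n_iter` parameters, the uniform initial priors and the log-domain normalization are hypothetical choices made for this sketch. The term $b_\phi(x)$ is dropped in the expectation step because it is the same for every cluster $h$ and cancels in $Z(x)$.

```python
import numpy as np

def squared_euclidean(X, mu):
    """d_phi(x, mu) = ||x - mu||^2 for every row x of X."""
    return np.sum((X - mu) ** 2, axis=1)

def bregman_soft_clustering(X, k, divergence, init_means=None, n_iter=50, seed=0):
    """EM-style loop: posteriors from exp(-d_phi), then weighted-mean updates."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    if init_means is None:
        # Assumption of this sketch: initialize with k distinct data points.
        rng = np.random.default_rng(seed)
        means = X[rng.choice(n, size=k, replace=False)].copy()
    else:
        means = np.array(init_means, dtype=float)
    priors = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # Expectation step: p(h|x) is proportional to pi_h * exp(-d_phi(x, mu_h)).
        divs = np.stack([divergence(X, m) for m in means], axis=1)
        log_post = np.log(priors) - divs
        log_post -= log_post.max(axis=1, keepdims=True)  # stabilize before exponentiating
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)          # division by Z(x)
        # Maximization step: priors are average posteriors, means are weighted averages.
        weights = post.sum(axis=0) + 1e-12               # guard against an empty cluster
        priors = weights / weights.sum()
        means = (post.T @ X) / weights[:, None]
    return priors, means, post
```

With `squared_euclidean` this behaves like EM for a mixture of spherical Gaussians with a fixed, shared variance; passing a different divergence changes only the expectation step.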
Rate Distortion Theory
[Figure: source $X$ passed through a lossy encoding to the reproduction $\hat{X}$]
Rate = Number of bits/symbol
Distortion = $E[d(X, \hat{X})]$
For $E[d(X, \hat{X})] \le D$, what is the minimum bits/symbol required?
Goal: Encode a source distribution using as few bits as possible without too much distortion
The rate distortion function [S'48]
    $R(D) = \min_{p(\hat{x}|x):\, E[d(X,\hat{X})] \le D} I(X; \hat{X})$

Rate Distortion with Bregman Divergences
Theorem: If distortion is a Bregman divergence,
    Either, $R(D)$ is equal to the Shannon-Bregman lower bound
    Or, $|\hat{\mathcal{X}}|$ is finite
When $|\hat{\mathcal{X}}|$ is finite
    Bregman divergences ↔ Exponential family distributions
    Rate distortion with Bregman divergences ↔ Modeling with mixture of exponential family distributions
$R(D)$ can be obtained either analytically or computationally
Compression vs. loss in Bregman information formulation
Information bottleneck as a special case

Potential Applications
Rate distortion results for Bregman divergences
    Designing quantizers based on Bregman divergences
Rate distortion ↔ maximum likelihood mixture estimation
    New learning techniques based on existing lossy coding methods
Compression vs. Bregman information trade-off
    IB style techniques for Bregman divergences

Rate Distortion Theory
Source $p(x)$ over $\mathcal{X}$, Reproduction alphabet $\hat{\mathcal{X}}$, Distortion measure $d$
Rate: average #bits to encode a symbol, Distortion: $E[d(X, \hat{X})]$
The main question: What is the minimum number of bits required to encode $X$ such that $E[d(X, \hat{X})] \le D$?
Answer: Rate-distortion function
    $R(D) = \min_{p(\hat{x}|x):\, E[d(X,\hat{X})] \le D} I(X; \hat{X})$
The minimizer $p^*(\hat{x}|x)$ determines the optimal encoding scheme and the optimal reproduction support
    $\hat{\mathcal{X}}_s = \{\hat{x} \mid \hat{x} \in \hat{\mathcal{X}},\ p^*(\hat{x}) > 0\}$

Maximum Likelihood Mixture Estimation
Given: Finite set of i.i.d. samples $\mathcal{X} = \{x_i\}_{i=1}^{n}$, mixture model cardinality $k$, parametric family $\mathcal{F}$
Objective: Find the mixture model of $k$ distributions from $\mathcal{F}$ and optimal assignments such that the data likelihood of $\mathcal{X}$ is maximized
[Neal et al., '97] The MLME problem is exactly equivalent to the minimum variational free energy problem
    $\min_{\hat{\mathcal{X}}_s,\, |\hat{\mathcal{X}}_s| = k}\ \min_{p(\hat{x}|x)} \left\{ E[-\log p(X, \hat{X})] - H(\hat{X}|X) \right\}$
    $X$: empirical distribution; $\hat{X}$: choice of mixture component
    a mixture model: $p(x, \hat{x}) = p(\hat{x})\, p(x|\hat{x})$ s.t. $p(x|\hat{x}) \in \mathcal{F}$
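Rate Distortion with Finite Cardinality Support
Given: Source with empirical distribution over a finite set of i.i.d. samples $\mathcal{X} = \{x_i\}_{i=1}^{n}$, reproduction support cardinality $k$, distortion measure $d$, tolerable distortion value $D$
Objective: Find the reproduction support $\hat{\mathcal{X}}_s$ of cardinality $k$ and the optimal assignments $p(\hat{x}|x)$ such that $R(D)$ is achieved, i.e.,
    $\min_{\hat{\mathcal{X}}_s,\, |\hat{\mathcal{X}}_s| = k}\ \min_{p(\hat{x}|x)} \left\{ I(X; \hat{X}) + \beta_D\, E[d(X, \hat{X})] \right\}$
where $\beta_D$ is the optimal variational parameter for distortion $D$

MLME ↔ RDFC
MLME                         RDFC
Empirical distribution       Source
Mixture model                Reproduction support set
Choice of $\mathcal{F}$      Choice of $d$ and $D$
Variational free energy      Rate distortion objective function
Connecting missing links: [Banerjee et al., '04]
Bijection between exponential densities $p_{(\psi,\theta)}(\cdot)$ and Bregman divergences $d_\phi(\cdot, \mu)$
    $\log p_{(\psi,\theta)}(x) = -d_\phi(x, \mu) + \log b_\phi(x)$
$(\phi, \psi)$ are Legendre duals and $\mu = \nabla\psi(\theta)$, $\theta = \nabla\phi(\mu)$

Equivalence Theorem
For a given empirical distribution and mixture model/support set cardinality
    (A) RDFC problem for Bregman divergence $d_\phi$ and distortion level $D$
    (B) MLME problem based on the scaled exponential family $\mathcal{F}_{\beta\psi}$
Theorem. Problems A and B are exactly equivalent when $\beta = \beta_D$, the variational parameter corresponding to $D$, and $\phi, \psi$ are Legendre duals
Follows since negative log-likelihood ↔ distortion ⇒ maximum likelihood ↔ minimum distortion

Bregman Information [Banerjee et al., '04]
The Bregman Information of a random variable $X$ is the expected Bregman divergence to the mean, i.e.,
    $I_\phi(X) = E[d_\phi(X, E[X])]$
Examples:
Variance: $\phi(x) = \|x\|^2$ ⇒ $I_\phi(X) = E\|X - E[X]\|^2$
Mutual Information: For $Z \sim p(x)$ over conditional densities $\{p(Y|x)\}$,
    $\phi(z) = \phi(p(Y|x)) = -H(p(Y|x))$, $E[Z] = E_x[p(Y|x)] = p(Y)$
    ⇒ $I_\phi(Z) = E_x[KL(p(Y|x)\, \|\, p(Y))] = I(X; Y)$

Compression vs. Loss in Bregman Information
Theorem. Expected distortion between source and reproduction random variables is equal to the loss in the Bregman information
    $E[d_\phi(X, \hat{X})] = I_\phi(X) - I_\phi(\hat{X})$ when $\hat{X} = E[X \mid \hat{X}]$
Leads to the constrained RDFC problem:
    $\min_{p(\hat{x}|x)} \{ I(X; \hat{X}) + \beta\, E[d_\phi(X, \hat{X})] \} \equiv \min_{p(\hat{x}|x)} \{ I(X; \hat{X}) - \beta\, I_\phi(\hat{X}) \}$
Compression ↔ $I(X; \hat{X})$
Bregman Information ↔ $I_\phi(\hat{X})$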
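The compression vs. loss theorem can be checked numerically for a hard assignment in which every reproduction point is the mean of the samples mapped to it. The sketch below is illustrative only: the helper names and the uniform weighting of samples are assumptions of this example, and it relies on the identity $I_\phi(X) = E[\phi(X)] - \phi(E[X])$, which follows directly from the definition of the divergence.

```python
import numpy as np

def sq_euclid(x, y):
    """Squared Euclidean distance, row-wise."""
    return np.sum((x - y) ** 2, axis=-1)

def kl(p, q):
    """KL-divergence, row-wise; assumes strictly positive probability vectors."""
    return np.sum(p * np.log(p / q), axis=-1)

def bregman_information(X, divergence):
    """Expected divergence of the rows of X to their mean (uniform weights assumed)."""
    X = np.asarray(X, dtype=float)
    return float(np.mean(divergence(X, X.mean(axis=0))))

def distortion_and_information_loss(X, labels, divergence):
    """Return E[d_phi(X, X_hat)] and I_phi(X) - I_phi(X_hat) for cluster-mean reproductions."""
    X = np.asarray(X, dtype=float)
    X_hat = np.empty_like(X)
    for h in np.unique(labels):
        # Each sample is reproduced by the mean of its own cluster.
        X_hat[labels == h] = X[labels == h].mean(axis=0)
    expected_distortion = float(np.mean(divergence(X, X_hat)))
    loss = bregman_information(X, divergence) - bregman_information(X_hat, divergence)
    return expected_distortion, loss
```

For squared Euclidean distance the identity is the familiar split of total variance into within-cluster and between-cluster parts; the same two numbers also agree, up to floating point error, when `kl` is applied to rows drawn from the simplex.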
Information Bottleneck: A Special Case
Source $X$, Reproduction $\hat{X}$, any other random variable $Y$
    $Z = Z(X) = p(Y|X)$ and $\hat{Z} = Z(\hat{X}) = p(Y|\hat{X})$
Theorem. For KL-divergence, the constrained RDFC problem is
    $\min_{p(\hat{x}|x)} \{ I(X; \hat{X}) - \beta\, I(\hat{X}; Y) \}$
Follows since $d_\phi(Z, \hat{Z}) = KL(Z \| \hat{Z})$ and $I_\phi(\hat{Z}) = I(\hat{X}; Y)$
IB Assumptions
    Mutual information holds all relevant information ⇒ $p(Y|X)$ is the appropriate sufficient statistic representation
    KL-divergence is the appropriate distortion measure
    Conditional independence relation $\hat{X} \leftrightarrow X \leftrightarrow Y$ ⇒ $\hat{Z} = E[Z \mid \hat{X}]$

Conclusions
Rate distortion for Bregman divergences
Rate distortion ↔ maximum likelihood mixture estimation
Compression vs. loss in Bregman information trade-off

Computational Biology
[Figure: gene expression matrices, genes by experimental conditions]

Co-clustering: Basic Intuition
Original Matrix
          1     2     3     4     5     6
    1   -66    54   -63    93    51    96
    2    35    87    37   -26    84   -22
    3   -68    56   -64    92    52    94
    4    30    83    32   -24    80   -21
    5   -63    55   -60    92    53    95
Reordered Matrix
          1     3     5     2     4     6
    4    30    32    80    83   -24   -21
    2    35    37    84    87   -26   -22
    5   -63   -60    53    55    92    95
    1   -66   -63    51    54    93    96
    3   -68   -64    52    56    92    94
A low parameter, co-clustering based approximation
Row Clustering
          1     2
    1     0     1
    2     1     0
    3     0     1
    4     1     0
    5     0     1
Low Parameter Matrix
            1      2      3
    1    33.5   83.5  -23.3
    2   -64.0   53.5   93.7
Column Clustering
          1     2     3     4     5     6
    1     1     0     1     0     0     0
    2     0     1     0     0     1     0
    3     0     0     0     1     0     1

Main Results: Co-clustering
Co-clustering based on Bregman divergences
Minimum Bregman information principle that generalizes the max-entropy and the least-squares principles
Meta co-clustering algorithm with local optimality properties
Known co-clustering algorithms (sum-square residue, information theoretic) are special cases
New class of matrix approximation techniques

Applications
Problems
    Compression while preserving summary statistics
    Learning from sparse, high dimensional data
    Missing value prediction
    Learning correlations
Domains
    Microarray analysis (genes and conditions)
    Text analysis (words and documents)
    Recommender systems (users and items)
    Market analysis (customers and products)

Dimensionality Reduction
[Table: 3 x 3 confusion matrices for the settings (3,20), (3,500) and (3,2500)]
Confusion matrices for the Classic3 dataset with different number of document clusters
Clustering interleaved with implicit dimensionality reduction
Superior performance as compared to one-sided clustering
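The 5 x 6 example from the "Co-clustering: Basic Intuition" slide can be reproduced numerically. The sketch below assumes the simplest scheme, in which every co-cluster block is summarized by its arithmetic mean; the variable names are hypothetical, cluster indices are zero-based, and the matrix entries and cluster memberships are the ones shown on the slide.

```python
import numpy as np

# Original matrix from the slide (rows 1..5, columns 1..6).
Z = np.array([
    [-66, 54, -63,  93, 51,  96],
    [ 35, 87,  37, -26, 84, -22],
    [-68, 56, -64,  92, 52,  94],
    [ 30, 83,  32, -24, 80, -21],
    [-63, 55, -60,  92, 53,  95],
], dtype=float)

# Zero-based cluster labels: rows {2, 4} form row cluster 1, rows {1, 3, 5} row cluster 2;
# columns {1, 3}, {2, 5} and {4, 6} form column clusters 1, 2 and 3.
row_labels = np.array([1, 0, 1, 0, 1])
col_labels = np.array([0, 1, 0, 2, 1, 2])

R = np.eye(2)[row_labels]        # 5 x 2 row-clustering indicator matrix
C = np.eye(3)[col_labels].T      # 3 x 6 column-clustering indicator matrix

# Block sums divided by block sizes give the 2 x 3 low parameter matrix.
block_sizes = R.sum(axis=0)[:, None] * C.sum(axis=1)[None, :]
M = (R.T @ Z @ C.T) / block_sizes

# Expanding the block means back to 5 x 6 gives the co-clustering based approximation.
Z_approx = R @ M @ C
```

`M` matches the low parameter matrix on the slide to one decimal place, and `Z_approx` replaces each of the 30 entries by one of only 6 block means.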
Missing Value Prediction
Algo     SqE1     SqE3     IDiv1    IDiv3    Pearson
Error    0.8398   0.7639   0.8397   0.7723   1.4211
Mean absolute error for ratings (0-5) in EachMovie dataset
SqE1/SqE3 - squared Euclidean distance with schemes 1 and 3; IDiv1/IDiv3 - I-Divergence with schemes 1 and 3
Assign zero measure for missing elements, co-cluster and use reconstructed matrix for prediction
Implicit discovery of correlated sub-matrices

Learning Correlations
[Figure: correlations between user clusters 1-10 and movie clusters 1-10]
Cluster 1: It is a Wonderful Life, Casablanca, Life is Beautiful, An Affair to Remember
Cluster 4: Usual Suspects, Manhattan Murder Mystery, Pulp Fiction, North by NorthWest
Cluster 7: Star Trek V, Blade Runner, The Terminator, A Clockwork Orange
User cluster - Movie cluster correlations for subset of EachMovie dataset
Use low parameter representations to discover correlations between row and column entities
Helpful in decision support systems

Extensions
Applicable to multidimensional data cubes (yes)
Soft co-clustering algorithms (yes)
General co-clustering models, e.g., overlapping clusters (yes)

Summary
Part I: Generalization of centroid-based clustering
    KMeans type algorithm for all Bregman divergences
    Bregman divergences ↔ Exponential families
    Bregman soft clustering ↔ Mixture modeling with exponential family distributions
Part II: Connections and Extensions
    Mixture modeling ↔ Rate distortion with Bregman divergences
    Low parameter matrix approximations with loss measured by Bregman divergences
    Minimum Bregman Information principle
    Meta co-clustering algorithm that renders Part I as a special case
Historical References
L. M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7:200-217, 1967.
Problem: $\min \phi(x)$ subject to $a_i^{\top} x = b_i$, $i = 0, \ldots, m-1$
Iterative procedure:
    1. Start with $x^{(0)}$ that satisfies $\nabla\phi(x^{(0)}) = -A^{\top} z$ for some vector $z$. Set $t = 0$.
    2. Compute $x^{(t+1)}$ to be the Bregman projection of $x^{(t)}$ onto the hyperplane $a_i^{\top} x = b_i$, where $i = t \bmod m$. Set $t = t + 1$ and repeat.
Converges to globally optimal solution. This cyclic projection method can be extended to halfspace and convex constraints (where each projection is followed by a correction).
Censor and Lent (1981) coined the term Bregman distance
Natural Question: How important are Bregman Divergences in data analysis?
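Bregman's cyclic projection procedure is easiest to see in the special case $\phi(x) = \|x\|^2$, where the Bregman projection onto a hyperplane is the ordinary orthogonal projection. The sketch below covers only that special case, not the general method: the function name and the `n_sweeps` parameter are hypothetical, and the start $x^{(0)} = 0$ is chosen because it satisfies the starting condition trivially, since $\nabla\phi(0) = 0$.

```python
import numpy as np

def cyclic_projection(A, b, n_sweeps=500):
    """Cyclic orthogonal projections onto the hyperplanes a_i^T x = b_i.

    Special case phi(x) = ||x||^2 of the procedure on the slide; assumes the
    system A x = b is consistent and no row of A is zero.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    m, d = A.shape
    x = np.zeros(d)                  # x^(0) = 0, so grad phi(x^(0)) = 0
    for t in range(n_sweeps * m):
        i = t % m                    # i = t mod m, as in step 2
        a = A[i]
        # Orthogonal projection of x onto the hyperplane a^T x = b_i.
        x = x + (b[i] - a @ x) / (a @ a) * a
    return x
```

For a consistent system the iterates approach the minimum-norm solution of $Ax = b$, which is the minimizer of $\|x\|^2$ under the constraints and can be compared against `np.linalg.pinv(A) @ b`.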