The Optimality of Naive Bayes

Harry Zhang
Faculty of Computer Science
University of New Brunswick
Fredericton, New Brunswick, Canada E3B 5A3
email: hzhang@unb.ca

Abstract

Naive Bayes is one of the most efficient and effective inductive learning algorithms for machine learning and data mining. Its competitive performance in classification is surprising, because the conditional independence assumption on which it is based is rarely true in real-world applications. An open question is: what is the true reason for the surprisingly good performance of naive Bayes in classification?

In this paper, we propose a novel explanation of the superb classification performance of naive Bayes. We show that, essentially, the dependence distribution; i.e., how the local dependence of a node distributes in each [...]

[...] is called a Bayesian classifier.

Assume that all attributes are independent given the value of the class variable; that is, [...]

[...] to extend its structure to represent explicitly the dependencies among attributes. An augmented naive Bayesian network, or simply augmented naive Bayes (ANB), is an extended naive Bayes in which the class node directly points to all attribute nodes, and there exist links among attribute nodes. Figure 2 shows an example of ANB. From the view of probability, an ANB represents a joint probability distribution represented below.

[...]

[...] denotes an assignment to values of the parents of [...]. We use [...] to denote the parents of [...]. ANB is a special form of Bayesian networks, in which no node is specified as a class node. It has been shown that any Bayesian network can be represented by an ANB (Zhang & Ling 2001). Therefore, any joint probability distribution can be represented by an ANB.
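As a rough illustration of the ANB structure described above, the following Python sketch computes a joint probability in which the class is a parent of every attribute and one attribute additionally depends on another. It assumes the usual Bayesian-network factorization (class prior times one conditional term per attribute given its parents and the class); the names `P_CLASS`, `PARENTS`, `CPT`, `anb_joint` and `posterior_positive`, and every probability value, are hypothetical and chosen only for this example. With every entry of `PARENTS` empty, the same function reduces to a naive Bayes factorization.

```python
from itertools import product

# Hypothetical two-class ANB over binary attributes x1, x2 with a link x1 -> x2.
# Every number below is invented for illustration.
P_CLASS = {"+": 0.5, "-": 0.5}

# PARENTS[i] lists the attribute parents of attribute i
# (the class node is always an additional parent).
PARENTS = {0: (), 1: (0,)}

# CPT[i][(c, parent_values)] = probability that attribute i equals 1.
CPT = {
    0: {("+", ()): 0.8, ("-", ()): 0.3},
    1: {("+", (0,)): 0.2, ("+", (1,)): 0.9,
        ("-", (0,)): 0.1, ("-", (1,)): 0.7},
}


def anb_joint(example, c, p_class=P_CLASS, parents=PARENTS, cpt=CPT):
    """Joint probability p(c) * prod_i p(x_i | attribute parents of x_i, c)."""
    prob = p_class[c]
    for i, x in enumerate(example):
        pa_values = tuple(example[j] for j in parents[i])
        p_one = cpt[i][(c, pa_values)]
        prob *= p_one if x == 1 else 1.0 - p_one
    return prob


def posterior_positive(example):
    """p(+ | example) under the ANB, by normalising the joint over classes."""
    pos = anb_joint(example, "+")
    neg = anb_joint(example, "-")
    return pos / (pos + neg)


# Sanity check: the joint sums to one over all attribute values and classes.
total = sum(anb_joint(e, c) for e in product((0, 1), repeat=2) for c in P_CLASS)
print(total, posterior_positive((1, 1)))
```

Storing the conditional tables keyed by class and parent values keeps the naive Bayes case and the augmented case in one code path, which makes it easy to compare the two on the same example.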
Figure 2: An example of ANB

When we apply a logarithm to [...] in Equation 1, the resulting classifier is the same as [...], in the sense that an example belongs to the positive class if and only if [...]. [...] in Equation 2 is similar. In this paper, we assume that, given a classifier, an example belongs to the positive class if and only if [...]

Related Work

Many empirical comparisons between naive Bayes and modern decision tree algorithms such as C4.5 (Quinlan 1993) showed that naive Bayes predicts as well as C4.5 (Langley, Iba, & Thomas 1992; Kononenko 1990; Pazzani 1996). The good performance of naive Bayes is surprising because it makes an assumption that is almost always violated in real-world applications: given the class value, all attributes are independent.

An open question is: what is the true reason for the surprisingly good performance of naive Bayes on most classification tasks? Intuitively, since the conditional independence assumption on which it is based almost never holds, its performance may be poor. It has been observed, however, that its classification accuracy does not depend on the dependencies; i.e., naive Bayes may still have high accuracy on datasets in which strong dependencies exist among attributes (Domingos & Pazzani 1997).

Domingos and Pazzani (Domingos & Pazzani 1997) present an explanation that naive Bayes owes its good performance to the zero-one loss function. This function defines the error as the number of incorrect classifications (Friedman 1996). Unlike other loss functions, such as the squared error, the zero-one loss function does not penalize inaccurate probability estimation as long as the maximum probability is assigned to the correct class. This means that naive Bayes may change the posterior probabilities of each class, but the class with the maximum posterior probability is often unchanged. Thus, the classification is still correct, although the probability estimation is poor. For example, let us assume that the true probabilities [...] respectively, and that the probability estimates produced by naive Bayes are [...]. Obviously, the probability estimates are poor, but the classification (positive) is not affected.

Domingos and Pazzani's explanation (Domingos & Pazzani 1997) is verified by the work of Frank et al. (Frank 2000), which shows that the performance of naive Bayes is much worse when it is used for regression (predicting a continuous value). Moreover, evidence has been found that naive Bayes produces poor probability estimates (Bennett 2000; Monti & Cooper 1999).

In our opinion, however, Domingos and Pazzani's explanation (Domingos & Pazzani 1997) is still superficial, as it does not uncover why the strong dependencies among attributes could not flip the classification. For the example above, why could the dependencies not make the probability estimates produced by naive Bayes be [...]? The key point here is that we need to know how the dependencies affect the classification, and under what conditions the dependencies do not affect the classification.

There has been some work to explore the optimality of naive Bayes (Rachlin, Kasif, & Aha 1994; Garg & Roth 2001; Roth 1999; Hand & Yu 2001), but none of it gives an explicit condition for the optimality of naive Bayes.

In this paper, we propose a new explanation: the classification of naive Bayes is essentially affected by the dependence distribution, rather than by the dependencies among attributes. In addition, we present a sufficient condition for the optimality of naive Bayes under the Gaussian distribution, and show theoretically when naive Bayes works well.

A New Explanation on the Superb Classification Performance of Naive Bayes

In this section, we propose a new explanation for the surprisingly good classification performance of naive Bayes. The basic idea comes from the following observation. In a given dataset, two attributes may depend on each other, but the dependence may distribute evenly in each class. Clearly, in this case, the conditional independence assumption is violated, but naive Bayes is still the optimal classifier. Further, what eventually affects the classification is the combination of dependencies among all attributes. If we just look at two attributes, there may exist strong dependence between them that affects the classification. When the dependencies among all attributes work together, however, they [...]
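To make the zero-one-loss argument above concrete, here is a minimal Python sketch of a toy case, not taken from the paper: one binary attribute is an exact copy of another, so the conditional independence assumption is violated as strongly as possible. The priors, the table `P_X1_IS_1` and all function names are assumptions made up for this illustration. Under these assumed numbers, naive Bayes counts the same evidence twice, so its probability estimate is noticeably off, yet it lands on the same side of 0.5 as the true posterior.

```python
# Toy setting (all numbers invented): attribute x2 is an exact copy of x1,
# so the two attributes are as dependent as they can be.
PRIOR = {"+": 0.5, "-": 0.5}
P_X1_IS_1 = {"+": 0.8, "-": 0.3}  # p(x1 = 1 | class)


def likelihood_x1(x1, c):
    """p(x1 | class) for a binary attribute."""
    return P_X1_IS_1[c] if x1 == 1 else 1.0 - P_X1_IS_1[c]


def true_posterior_positive(x1):
    """p(+ | x1, x2 = x1): the copy x2 adds no evidence beyond x1."""
    scores = {c: PRIOR[c] * likelihood_x1(x1, c) for c in PRIOR}
    return scores["+"] / (scores["+"] + scores["-"])


def naive_bayes_posterior_positive(x1):
    """Naive Bayes estimate: x1 and its copy x2 are treated as independent
    given the class, so the same likelihood is multiplied in twice."""
    scores = {c: PRIOR[c] * likelihood_x1(x1, c) ** 2 for c in PRIOR}
    return scores["+"] / (scores["+"] + scores["-"])


for x1 in (0, 1):
    truth = true_posterior_positive(x1)
    estimate = naive_bayes_posterior_positive(x1)
    same_decision = (truth >= 0.5) == (estimate >= 0.5)
    print(x1, round(truth, 3), round(estimate, 3), same_decision)
```

The agreement here relies on the equal class priors assumed in `PRIOR`; with unequal priors the doubled evidence could outweigh the prior, so this toy case should not be read as a general guarantee.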
From Theorem 1, we know that, in fact, it is the dependence distribution factor [...] that determines the difference between an ANB and its correspondent naive Bayes in the classification. Further, [...] is the product of local dependence derivative ratios of all nodes. Therefore, it reflects the global dependence distribution (how each local dependence distributes in each class, and how all local dependencies work together). For example, when [...], [...] has the same classification as [...]. In fact, it is not necessary to require [...] in order for an ANB to have the same classification as its correspondent naive Bayes, as shown in the theorem below.

Theorem 2 Given an example (x_1, ..., x_n), an ANB is equal to its correspondent naive Bayes under zero-one loss; i.e., [...] (Definition 1), if and only if when [...]; or when [...]

Proof: The proof is straightforward by applying Definition 1 and Theorem 1.

From Theorem 2, if the distribution of the dependences among attributes satisfies certain conditions, then naive Bayes classifies exactly the same as the underlying ANB, even though there may exist strong dependencies among attributes. Moreover, we have the following results:

1. When [...], the dependencies in the ANB have no influence on the classification. That is, the classification is exactly the same as that of its correspondent naive Bayes. There exist three cases for [...]:
   - no dependence exists among attributes;
   - [...] for each attribute; that is, the local distribution of each node distributes evenly in both classes;
   - the influence that some local dependencies support [...] is canceled out by the influence that other local dependences support classifying [...].

2. [...] does not require that [...]. The precise condition is given by Theorem 2. That explains why naive Bayes still produces accurate classification even in datasets with strong dependencies among attributes (Domingos & Pazzani 1997).

3. The dependencies in an ANB flip (change) the classification of its correspondent naive Bayes only if the condition given by Theorem 2 is no longer true.

Theorem 2 represents a sufficient and necessary condition for the optimality of naive Bayes on an example. If, for each example in the example space, [...], then naive Bayes is globally optimal.

Conditions for the Optimality of Naive Bayes

In Section [...], we proposed that naive Bayes is optimal if the dependences among attributes cancel each other out. That is, under such circumstances, naive Bayes is still optimal even though the dependences do exist. In this section, we investigate naive Bayes under the multivariate Gaussian distribution and prove a sufficient condition for the optimality of naive Bayes, assuming the dependences among attributes do exist. That provides us with theoretical evidence that the dependences among attributes may cancel each other out.

Let us restrict our discussion to two attributes and assume that the class density is a multivariate Gaussian in both the positive and negative classes. That is,

$$p(x_1, x_2, +) = \frac{1}{2\pi |\Sigma_+|^{1/2}} e^{-\frac{1}{2}(x - \mu_+)^T \Sigma_+^{-1} (x - \mu_+)},$$

$$p(x_1, x_2, -) = \frac{1}{2\pi |\Sigma_-|^{1/2}} e^{-\frac{1}{2}(x - \mu_-)^T \Sigma_-^{-1} (x - \mu_-)},$$

where $x = (x_1, x_2)$, $\Sigma_+$ and $\Sigma_-$ are the covariance matrices in the positive and negative classes respectively, $|\Sigma_+|$ and $|\Sigma_-|$ are the determinants of $\Sigma_+$ and $\Sigma_-$, $\Sigma_+^{-1}$ and $\Sigma_-^{-1}$ are the inverses of $\Sigma_+$ and $\Sigma_-$, $\mu_+$ and $\mu_-$ collect the means of the attributes in the positive and negative classes respectively, and $(x - \mu_+)^T$ and $(x - \mu_-)^T$ are the transposes of $(x - \mu_+)$ and $(x - \mu_-)$.

We assume that the two classes have a common covariance [...], and [...] have the same variance [...] in both classes. Then, when applying a logarithm to the Bayesian classifier, defined in Equation 1, we obtain the classifier [...] below.

[...]

Then, because of the conditional independence assumption, we have the correspondent naive Bayesian classifier [...]

[...]

Assume that [...] are independent if [...]. If [...], we have

[...]

[...] can be simplified as below.
[...]

If [...], then [...]

It is easy to verify that, when [...], we can get the same [...] in Equation 15. Similarly, when [...], we have [...]

From Equations 15 and 16, we see that [...] is affected [...]. It is true that [...] increases as [...]creases. That means the absolute ratio of distances between the two means of the classes significantly affects the performance of naive Bayes. More precisely, the smaller the absolute ratio, the better the performance of naive Bayes.

In this paper, we propose a new explanation of the classification performance of naive Bayes. We show that, essentially, the dependence distribution; i.e., how the local dependence of a node distributes in each class, evenly or unevenly, and how the local dependencies of all nodes work together, consistently (supporting a certain classification) or inconsistently (canceling each other out), plays a crucial role in the classification. We explain why, even with strong dependencies, naive Bayes still works well; i.e., when those dependencies cancel each other out, there is no influence on the classification. In this case, naive Bayes is still the optimal classifier. In addition, we investigated the optimality of naive Bayes under the Gaussian distribution, and presented the explicit sufficient condition under which naive Bayes is optimal, even though the conditional independence assumption is violated.

References

Bennett, P. N. 2000. Assessing the calibration of Naive Bayes' posterior estimates. In Technical Report No. CMU-

Domingos, P., and Pazzani, M. 1997. Beyond independence: Conditions for the optimality of the simple Bayesian classifier. Machine Learning.

Frank, E.; Trigg, L.; Holmes, G.; and Witten, I. H. 2000. Naive Bayes for regression. Machine Learning.

Friedman, J. 1996. On bias, variance, 0/1-loss, and the curse of dimensionality. Data Mining and Knowledge Discovery.

Garg, A., and Roth, D. 2001. Understanding probabilistic classifiers. In Raedt, L. D., and Flach, P., eds., Proceedings of 12th European Conference on Machine Learning. Springer. 179-191.

Hand, D. J., and Yu, Y. 2001. Idiot's Bayes - not so stupid after all? International Statistical Review.

Kononenko, I. 1990. Comparison of inductive and naive Bayesian learning approaches to automatic knowledge acquisition. In Wielinga, B., ed., Current Trends in Knowledge Acquisition. IOS Press.

Langley, P.; Iba, W.; and Thomas, K. 1992. An analysis of Bayesian classifiers. In Proceedings of the Tenth National Conference of Artificial Intelligence. AAAI Press. 223-

Monti, S., and Cooper, G. F. 1999. A Bayesian network classifier that combines a finite mixture model and a Naive Bayes model. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann.

Pazzani, M. J. 1996. Search for dependencies in Bayesian classifiers. In Fisher, D., and Lenz, H. J., eds., Learning from Data: Artificial Intelligence and Statistics V. Springer Verlag.

Quinlan, J. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann: San Mateo, CA.

Rachlin, J. R.; Kasif, S.; and Aha, D. W. 1994. Toward a better understanding of memory-based reasoning systems. In Proceedings of the Eleventh International Machine Learning Conference. Morgan Kaufmann. 242-250.

Roth, D. 1999. Learning in natural language. In Proceedings of IJCAI'99. Morgan Kaufmann. 898-904.

Zhang, H., and Ling, C. X. 2001. Learnability of augmented Naive Bayes in nominal domains. In Brodley, C. E., and Danyluk, A. P., eds., Proceedings of the Eighteenth International Conference on Machine Learning. Morgan Kaufmann. 617-623.