Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, September, 1999. (Appearing in Japanese, translation by Naoki Abe.)

A Short Introduction to Boosting

Yoav Freund    Robert E. Schapire
AT&T Labs - Research, Shannon Laboratory
180 Park Avenue, Florham Park, NJ 07932 USA
www.research.att.com/~{yoav, schapire}
{yoav, schapire}@research.att.com

Abstract

Boosting is a general method for improving the accuracy of any given learning algorithm. This short overview paper introduces the boosting algorithm AdaBoost, and explains the underlying theory of boosting, including an explanation of why boosting often does not suffer from overfitting as well as boosting's relationship to support-vector machines. Some examples of recent applications of boosting are also described.

Introduction

A horse-racing gambler, hoping to maximize his winnings, decides to create a computer program that will accurately predict the winner of a horse race based on the usual information (number of races recently won by each horse, betting odds for each horse, etc.). To create such a program, he asks a highly successful expert gambler to explain his betting strategy. Not surprisingly, the expert is unable to articulate a grand set of rules for selecting a horse. On the other hand, when presented with the data for a specific set of races, the expert has no trouble coming up with a "rule of thumb" for that set of races (such as, "Bet on the horse that has recently won the most races" or "Bet on the horse with the most favored odds"). Although such a rule of thumb, by itself, is obviously very rough and inaccurate, it is not unreasonable to expect it to provide predictions that are at least a little bit better than random guessing. Furthermore, by repeatedly asking the expert's opinion on different collections of races, the gambler is able to extract many rules of thumb.

In order to use these rules of thumb to maximum advantage, there are two problems faced by the gambler: First, how should he choose the collections of races presented to the expert so as to extract rules of thumb from the expert that will be the most useful? Second, once he has collected many rules of thumb, how can they be combined into a single, highly accurate prediction rule?

Boosting refers to a general and provably effective method of producing a very accurate prediction rule by combining rough and moderately inaccurate rules of thumb in a manner similar to
that suggested above. This short paper overviews some of the recent work on boosting, focusing especially on the AdaBoost algorithm which has undergone intense theoretical study and empirical testing. After introducing AdaBoost, we describe some of the basic underlying theory of boosting, including an explanation of why it often tends not to overfit. We also describe some experiments and applications using boosting.

Background

Boosting has its roots in a theoretical framework for studying machine learning called the "PAC" learning model, due to Valiant [46]; see Kearns and Vazirani [32] for a good introduction to this model. Kearns and Valiant [30, 31] were the first to pose the question of whether a "weak" learning algorithm which performs just slightly better than random guessing in the PAC model can be "boosted" into an arbitrarily accurate "strong" learning algorithm. Schapire [38] came up with the first provable polynomial-time boosting algorithm in 1989. A year later, Freund [17] developed a much more efficient boosting algorithm which, although optimal in a certain sense, nevertheless suffered from certain practical drawbacks. The first experiments with these early boosting algorithms were carried out by Drucker, Schapire and Simard [16] on an OCR task.

The AdaBoost algorithm, introduced in 1995 by Freund and Schapire [23], solved many of the practical difficulties of the earlier boosting algorithms, and is the focus of this paper. Pseudocode for AdaBoost is given in Fig. 1. The algorithm takes as input a training set $(x_1, y_1), \ldots, (x_m, y_m)$ where each $x_i$ belongs to some domain or instance space $X$, and each label $y_i$ is in some label set $Y$. For most of this paper, we assume $Y = \{-1, +1\}$; later, we discuss extensions to the multiclass case. AdaBoost calls a given weak or base learning algorithm repeatedly in a series of rounds $t = 1, \ldots, T$. One of the main ideas of the algorithm is to maintain a distribution or set of weights over the training set. The weight of this distribution on training example $i$ on round $t$ is denoted $D_t(i)$. Initially, all weights are set equally, but on each round, the weights of incorrectly classified examples are increased so that the weak learner is forced to focus on the hard examples in the training set.

The weak learner's job is to find a weak hypothesis $h_t : X \to \{-1, +1\}$ appropriate for the distribution $D_t$. The goodness of a weak hypothesis is measured by its error

$$\epsilon_t = \Pr_{i \sim D_t}\left[h_t(x_i) \neq y_i\right] = \sum_{i : h_t(x_i) \neq y_i} D_t(i).$$

Notice that the error is measured with respect to the distribution $D_t$ on which the weak learner was trained. In practice, the weak learner may be an algorithm that can use the weights $D_t$ on the training examples. Alternatively, when this is not possible, a subset of the training examples can be sampled according to $D_t$, and these (unweighted) resampled examples can be used to train the weak learner.

Given: $(x_1, y_1), \ldots, (x_m, y_m)$ where $x_i \in X$, $y_i \in Y = \{-1, +1\}$.
Initialize $D_1(i) = 1/m$.
For $t = 1, \ldots, T$:
• Train weak learner using distribution $D_t$.
• Get weak hypothesis $h_t : X \to \{-1, +1\}$ with error $\epsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \neq y_i]$.
• Choose $\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$.
• Update:
$$D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } h_t(x_i) = y_i \\ e^{\alpha_t} & \text{if } h_t(x_i) \neq y_i \end{cases} = \frac{D_t(i)\,\exp(-\alpha_t\, y_i\, h_t(x_i))}{Z_t}$$
where $Z_t$ is a normalization factor (chosen so that $D_{t+1}$ will be a distribution).
Output the final hypothesis:
$$H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right).$$
Figure 1: The boosting algorithm AdaBoost.

Relating back to the horse-racing example, the instances $x_i$ correspond to descriptions of horse races (such as which horses are running, what are the odds, the track records of each horse, etc.) and the labels $y_i$ give the outcomes (i.e., the winners) of each race. The weak hypotheses are the rules of thumb provided by the expert gambler, where the subcollections that he examines are chosen according to the distribution $D_t$.
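To make the pseudocode concrete, the following is a minimal Python sketch of the loop in Fig. 1. It is an illustration, not the authors' implementation; in particular, the weak_learner interface (a function taking the data and the current distribution and returning a callable hypothesis with predictions in {-1, +1}) is an assumption of the sketch.

    import numpy as np

    def adaboost(X, y, weak_learner, T):
        """Minimal sketch of AdaBoost (Fig. 1) for labels y in {-1, +1}.

        weak_learner(X, y, D) is an assumed interface: it trains on the
        weighted sample and returns a callable h with h(X) -> array of
        predictions in {-1, +1}. Returns the final hypothesis H together
        with the weak hypotheses and weights it combines.
        """
        y = np.asarray(y)
        m = len(y)
        D = np.full(m, 1.0 / m)                 # D_1(i) = 1/m
        hypotheses, alphas = [], []
        for t in range(T):
            h = weak_learner(X, y, D)           # train weak learner on D_t
            pred = h(X)
            eps = D[pred != y].sum()            # weighted error eps_t
            if eps <= 0.0 or eps >= 0.5:        # perfect or edgeless: stop early
                if eps <= 0.0:
                    hypotheses, alphas = [h], [1.0]
                break
            alpha = 0.5 * np.log((1.0 - eps) / eps)   # alpha_t
            D = D * np.exp(-alpha * y * pred)   # raise weight on mistakes
            D = D / D.sum()                     # normalize by Z_t
            hypotheses.append(h)
            alphas.append(alpha)

        def H(X_new):
            # final hypothesis: sign of the weighted majority vote
            votes = np.zeros(len(X_new))
            for a, h in zip(alphas, hypotheses):
                votes += a * h(X_new)
            return np.sign(votes)

        return H, hypotheses, alphas

The early exit when $\epsilon_t \geq 1/2$ mirrors the fact that $\alpha_t$ would be non-positive there; returning the weak hypotheses and weights alongside $H$ is a convenience of the sketch so that quantities such as margins can be inspected later.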
Once the weak hypothesis $h_t$ has been received, AdaBoost chooses a parameter $\alpha_t$ as in the figure. Intuitively, $\alpha_t$ measures the importance that is assigned to $h_t$. Note that $\alpha_t \geq 0$ if $\epsilon_t \leq 1/2$ (which we can assume without loss of generality), and that $\alpha_t$ gets larger as $\epsilon_t$ gets smaller.

The distribution $D_t$ is next updated using the rule shown in the figure. The effect of this rule is to increase the weight of examples misclassified by $h_t$, and to decrease the weight of correctly classified examples. Thus, the weight tends to concentrate on "hard" examples.

The final hypothesis $H$ is a weighted majority vote of the $T$ weak hypotheses where $\alpha_t$ is the weight assigned to $h_t$.

Schapire and Singer [42] show how AdaBoost and its analysis can be extended to handle weak hypotheses which output real-valued or confidence-rated predictions. That is, for each instance $x$, the weak hypothesis $h_t$ outputs a prediction $h_t(x) \in \mathbb{R}$ whose sign is the predicted label ($-1$ or $+1$) and whose magnitude $|h_t(x)|$ gives a measure of "confidence" in the prediction. In this paper, however, we focus only on the case of binary ($\{-1, +1\}$) valued weak-hypothesis predictions.

Analyzing the training error

The most basic theoretical property of AdaBoost concerns its ability to reduce the training error. Let us write the error $\epsilon_t$ of $h_t$ as $\frac{1}{2} - \gamma_t$. Since a hypothesis that guesses each instance's class at random has an error rate of $1/2$ (on binary problems), $\gamma_t$ thus measures how much better than random are $h_t$'s predictions. Freund and Schapire [23] prove that the training error (the fraction of mistakes on the training set) of the final hypothesis $H$ is at most

$$\prod_t \left[2\sqrt{\epsilon_t(1-\epsilon_t)}\right] = \prod_t \sqrt{1 - 4\gamma_t^2} \leq \exp\left(-2\sum_t \gamma_t^2\right). \tag{1}$$

Thus, if each weak hypothesis is slightly better than random so that $\gamma_t \geq \gamma$ for some $\gamma > 0$, then the training error drops exponentially fast.

A similar property is enjoyed by previous boosting algorithms. However, previous algorithms required that such a lower bound $\gamma$ be known a priori before boosting begins. In practice, knowledge of such a bound is very difficult to obtain. AdaBoost, on the other hand, is adaptive in that it adapts to the error rates of the individual weak hypotheses. This is the basis of its name: "Ada" is short for "adaptive."

The bound given in Eq. (1), combined with the bounds on generalization error given below, prove that AdaBoost is indeed a boosting algorithm in the sense that it can efficiently convert a weak learning algorithm (which can always generate a hypothesis with a weak edge for any distribution) into a strong learning algorithm (which can generate a hypothesis with an arbitrarily low error rate, given sufficient data).
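To get a feel for the rate in Eq. (1), consider a worked instance under the added assumption that every weak hypothesis has the same edge $\gamma_t = \gamma = 0.1$ (that is, $\epsilon_t = 0.4$ for all $t$). Then

$$\prod_t 2\sqrt{\epsilon_t(1-\epsilon_t)} = \left(2\sqrt{0.4 \cdot 0.6}\right)^T = \left(1 - 4(0.1)^2\right)^{T/2} \leq e^{-2T(0.1)^2} = e^{-0.02\,T},$$

so after $T = 150$ rounds the training error of the final hypothesis is already below $e^{-3} \approx 0.05$, and after $T = 350$ rounds below $e^{-7} < 0.001$, regardless of the size of the training set.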
Generalization error

Freund and Schapire [23] showed how to bound the generalization error of the final hypothesis in terms of its training error, the size $m$ of the sample, the VC-dimension $d$ of the weak hypothesis space and the number of boosting rounds $T$. (The VC-dimension is a standard measure of the "complexity" of a space of hypotheses. See, for instance, Blumer et al. [5].) Specifically, they used techniques from Baum and Haussler [4] to show that the generalization error, with high probability, is at most

$$\widehat{\Pr}\left[H(x) \neq y\right] + \tilde{O}\left(\sqrt{\frac{Td}{m}}\right)$$

where $\widehat{\Pr}[\cdot]$ denotes empirical probability on the training sample. This bound suggests that boosting will overfit if run for too many rounds, i.e., as $T$ becomes large. In fact, this sometimes does happen. However, in early experiments, several authors [9, 15, 36] observed empirically that boosting often does not overfit, even when run for thousands of rounds. Moreover, it was observed that AdaBoost would sometimes continue to drive down the generalization error long after the training error had reached zero, clearly contradicting the spirit of the bound above. For instance, the left side of Fig. 2 shows the training and test curves of running boosting on top of Quinlan's C4.5 decision-tree learning algorithm [37] on the "letter" dataset.

Figure 2: Error curves and the margin distribution graph for boosting C4.5 on the letter dataset as reported by Schapire et al. [41]. Left: the training and test error curves (lower and upper curves, respectively) of the combined classifier as a function of the number of rounds of boosting. The horizontal lines indicate the test error rate of the base classifier as well as the test error of the final combined classifier. Right: The cumulative distribution of margins of the training examples after 5, 100 and 1000 iterations, indicated by short-dashed, long-dashed (mostly hidden) and solid curves, respectively.

In response to these empirical findings, Schapire et al. [41], following the work of Bartlett [2], gave an alternative analysis in terms of the margins of the training examples. The margin of example $(x, y)$ is defined to be

$$\operatorname{margin}(x, y) = \frac{y \sum_t \alpha_t h_t(x)}{\sum_t \alpha_t}. \tag{2}$$

It is a number in $[-1, +1]$ which is positive if and only if $H$ correctly classifies the example. Moreover, the magnitude of the margin can be interpreted as a measure of confidence in the prediction. Schapire et al. proved that larger margins on the training set translate into a superior upper bound on the generalization error. Specifically, the generalization error is at most

$$\widehat{\Pr}\left[\operatorname{margin}(x, y) \leq \theta\right] + \tilde{O}\left(\sqrt{\frac{d}{m\theta^2}}\right) \tag{3}$$

for any $\theta > 0$ with high probability. Note that this bound is entirely independent of $T$, the number of rounds of boosting. In addition, Schapire et al. proved that boosting is particularly aggressive at increasing the margins (in a quantifiable sense) since it concentrates on the examples with the smallest margins (whether positive or negative). Boosting's effect on the margins can be seen empirically, for instance, on the right side of Fig. 2, which shows the cumulative distribution of margins of the training examples on the "letter" dataset. In this case, even after the training error reaches zero, boosting continues to increase the margins of the training examples, effecting a corresponding drop in the test error. Attempts (not always successful) to use the insights gleaned from the theory of margins have been made by several authors [7, 27, 34].

The behavior of AdaBoost can also be understood in a game-theoretic setting as explored by Freund and Schapire [22, 24] (see also Grove and Schuurmans [27] and Breiman [8]). In particular, boosting can be viewed as repeated play of a certain game, and AdaBoost can be shown to be a special case of a more general algorithm for playing repeated games and for approximately solving a game. This also shows that boosting is closely related to linear programming and online learning.
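Empirical margin curves like those in Fig. 2 (right) are straightforward to compute. The following sketch evaluates the margins of Eq. (2), using the weak hypotheses and weights returned by the earlier adaboost sketch; it is an illustration under those assumed interfaces, not the authors' code.

    import numpy as np

    def margins(X, y, hypotheses, alphas):
        """Margins of Eq. (2): y * sum_t alpha_t h_t(x) / sum_t alpha_t.
        Positive iff the combined hypothesis is correct on the example;
        the magnitude acts as a measure of confidence. (All alpha_t are
        nonnegative in AdaBoost, so the plain sum is the right normalizer.)
        """
        y = np.asarray(y)
        alphas = np.asarray(alphas, dtype=float)
        votes = np.zeros(len(y))
        for a, h in zip(alphas, hypotheses):
            votes += a * h(X)
        return y * votes / alphas.sum()

    def cumulative_margin_curve(margin_values, thetas):
        """Fraction of training examples with margin <= theta for each theta,
        i.e., one curve of the margin-distribution graph in Fig. 2 (right)."""
        margin_values = np.asarray(margin_values)
        return np.array([(margin_values <= t).mean() for t in thetas])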
Relation to support-vector machines

The margin theory points to a strong connection between boosting and the support-vector machines of Vapnik and others [6, 12, 47]. To clarify the connection, suppose that we have already found the weak hypotheses that we want to combine and are only interested in choosing the coefficients $\alpha_t$. One reasonable approach suggested by the analysis of AdaBoost's generalization error is to choose the coefficients so that the bound given in Eq. (3) is minimized. In particular, suppose that the first term is zero and let us concentrate on the second term, so that we are effectively attempting to maximize the minimum margin of any training example.¹ To make this idea precise, let us denote the vector of weak-hypothesis predictions associated with the example $(x, y)$ by $\mathbf{h}(x) = \langle h_1(x), \ldots, h_T(x) \rangle$, which we call the instance vector, and the vector of coefficients by $\alpha = \langle \alpha_1, \ldots, \alpha_T \rangle$, which we call the weight vector. Using this notation and the definition of margin given in Eq. (2), we can write the goal of maximizing the minimum margin as

$$\max_{\alpha} \min_i \frac{y_i \,(\alpha \cdot \mathbf{h}(x_i))}{\|\alpha\| \, \|\mathbf{h}(x_i)\|} \tag{4}$$

where, for boosting, the norms in the denominator are defined as

$$\|\alpha\|_1 := \sum_t |\alpha_t|, \qquad \|\mathbf{h}(x)\|_\infty := \max_t |h_t(x)|.$$

(When the $h_t$'s all have range $\{-1, +1\}$, $\|\mathbf{h}(x)\|_\infty$ is simply equal to 1.) In comparison, the explicit goal of support-vector machines is to maximize a minimal margin of the form described in Eq. (4), but where the norms are instead Euclidean:

$$\|\alpha\|_2 := \sqrt{\sum_t \alpha_t^2}, \qquad \|\mathbf{h}(x)\|_2 := \sqrt{\sum_t h_t(x)^2}.$$

Thus, SVMs use the $\ell_2$ norm for both the instance vector and the weight vector, while AdaBoost uses the $\ell_\infty$ norm for the instance vector and the $\ell_1$ norm for the weight vector. When described in this manner, SVM and AdaBoost seem very similar. However, there are several important differences:

• Different norms can result in very different margins. The difference between the norms may not be very significant when one considers low dimensional spaces. However, in boosting or in SVM, the dimension is usually very high, often in the millions or more. In such a case, the difference between the norms can result in very large differences in the margin values.

¹Of course, AdaBoost does not explicitly attempt to maximize the minimum margin. Nevertheless, Schapire et al.'s [41] analysis suggests that the algorithm does try to make the margins of all the training examples as large as possible, so in this sense, we can regard this maximum minimal margin algorithm as an illustrative approximation of AdaBoost. In fact, algorithms that explicitly attempt to maximize the minimum margin have not been experimentally as successful as AdaBoost [7, 27].
This seems to be especially so when there are only a few relevant variables, so that $\alpha$ can be very sparse. For instance, suppose the weak hypotheses all have range $\{-1, +1\}$ and that the label on all examples can be computed by a majority vote of $k$ of the weak hypotheses. In this case, it can be shown that if the number $k$ of relevant weak hypotheses is a small fraction of the total number of weak hypotheses, then the margin associated with AdaBoost will be much larger than the one associated with support vector machines.

• The computation requirements are different. The computation involved in maximizing the margin is mathematical programming, i.e., maximizing a mathematical expression given a set of inequalities. The difference between the two methods in this regard is that SVM corresponds to quadratic programming, while AdaBoost corresponds only to linear programming. (In fact, as noted above, there is a deep relationship between AdaBoost and linear programming which also connects AdaBoost with game theory and online learning [22].)

• A different approach is used to search efficiently in high dimensional space. Quadratic programming is more computationally demanding than linear programming. However, there is a much more important computational difference between SVM and boosting algorithms. Part of the reason for the effectiveness of SVM and AdaBoost is that they find linear classifiers for extremely high dimensional spaces, sometimes spaces of infinite dimension. While the problem of overfitting is addressed by maximizing the margin, the computational problem associated with operating in high dimensional spaces remains. Support vector machines deal with this problem through the method of kernels, which allow algorithms to perform low dimensional calculations that are mathematically equivalent to inner products in a high dimensional "virtual" space. The boosting approach is instead to employ greedy search; from this perspective, the weak learner is an oracle for finding coordinates of $\mathbf{h}(x)$ that have a non-negligible correlation with the label. The reweighting of the examples changes the distribution with respect to which the correlation is measured, thus guiding the weak learner to find different correlated coordinates. Most of the actual work involved in applying SVM or AdaBoost to specific classification problems has to do with selecting the appropriate kernel function in the one case and weak learning algorithm in the other. As kernels and weak learning algorithms are very different, the resulting learning algorithms usually operate in very different spaces and the classifiers that they generate are extremely different.

Multiclass classification

So far, we have only considered binary classification problems in which the goal is to distinguish between only two possible classes. Many (perhaps most) real-world learning problems, however, are multiclass, with more than two possible classes. There are several methods of extending AdaBoost to the multiclass case. The most straightforward generalization [23], called AdaBoost.M1, is adequate when the weak learner is strong enough to achieve reasonably high accuracy, even on the hard distributions created by AdaBoost. However, this method fails if the weak learner cannot achieve at least 50% accuracy when run on these hard distributions.

For the latter case, several more sophisticated methods have been developed. These generally work by reducing the multiclass problem to a larger binary problem. Schapire and Singer's [42] algorithm AdaBoost.MH works by creating a set of binary problems, for each example $x$ and each possible label $y$, of the form: "For example $x$, is the correct label $y$ or is it one of the other labels?" Freund and Schapire's [23] algorithm AdaBoost.M2 (which is a special case of Schapire and Singer's [42] AdaBoost.MR algorithm) instead creates binary problems, for each example $x$ with correct label $y$ and each incorrect label $y'$, of the form: "For example $x$, is the correct label $y$ or $y'$?"
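The binary reduction behind AdaBoost.MH can be sketched as follows. This is a simplified, hypothetical helper illustrating only the expansion of the training set; the full algorithm of [42] boosts over these (instance, label) pairs with confidence-rated weak hypotheses, which is omitted here. Labels are assumed to be encoded as numbers so a candidate label can be appended to the feature vector.

    import numpy as np

    def expand_one_vs_all(X, y, labels):
        """One binary example per (instance, candidate label) pair,
        labeled +1 iff the candidate is the correct label, mirroring
        the question "for example x, is the correct label y?"."""
        X_bin, y_bin = [], []
        for x, correct in zip(X, y):
            for label in labels:
                X_bin.append(np.append(x, label))  # instance tagged with candidate
                y_bin.append(+1 if label == correct else -1)
        return np.array(X_bin), np.array(y_bin)

Boosting on the expanded set yields a scorer over (instance, label) pairs; a multiclass prediction for a new $x$ is then the candidate label whose pair receives the largest weighted vote.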
These methods require additional effort in the design of the weak learning algorithm. A different technique [39], which incorporates Dietterich and Bakiri's [14] method of error-correcting output codes, achieves similar provable bounds to those of AdaBoost.MH and AdaBoost.M2, but can be used with any weak learner which can handle simple, binary labeled data. Schapire and Singer [42] give yet another method of combining boosting with error-correcting output codes.

Experiments and applications

Practically, AdaBoost has many advantages. It is fast, simple and easy to program. It has no parameters to tune (except for the number of rounds $T$). It requires no prior knowledge about the weak learner and so can be flexibly combined with any method for finding weak hypotheses. Finally, it comes with a set of theoretical guarantees given sufficient data and a weak learner that can reliably provide only moderately accurate weak hypotheses. This is a shift in mindset for the learning-system designer: instead of trying to design a learning algorithm that is accurate over the entire space, we can instead focus on finding weak learning algorithms that only need to be better than random.

On the other hand, some caveats are certainly in order. The actual performance of boosting on a particular problem is clearly dependent on the data and the weak learner. Consistent with theory, boosting can fail to perform well given insufficient data, overly complex weak hypotheses or weak hypotheses which are too weak. Boosting seems to be especially susceptible to noise [13] (more on this later).

AdaBoost has been tested empirically by many researchers, including [3, 13, 15, 29, 33, 36, 45]. For instance, Freund and Schapire [21] tested AdaBoost on a set of UCI benchmark datasets [35] using C4.5 [37] as a weak learning algorithm, as well as an algorithm which finds the best "decision stump" or single-test decision tree (a minimal sketch of such a stump learner appears at the end of this section). Some of the results of these experiments are shown in Fig. 3. As can be seen from this figure, even boosting the weak decision stumps can usually give as good results as C4.5, while boosting C4.5 generally gives the decision-tree algorithm a significant improvement in performance.

Figure 3: Comparison of C4.5 versus boosting stumps and boosting C4.5 on a set of 27 benchmark problems as reported by Freund and Schapire [21]. Each point in each scatterplot shows the test error rate of the two competing algorithms on a single benchmark. The x-coordinate of each point gives the test error rate (in percent) of C4.5 on the given benchmark, and the y-coordinate gives the error rate of boosting stumps (left plot) or boosting C4.5 (right plot). All error rates have been averaged over multiple runs.

In another set of experiments, Schapire and Singer [43] used boosting for text categorization tasks. For this work, weak hypotheses were used which test on the presence or absence of a word or phrase. Some results of these experiments comparing AdaBoost to four other methods are shown in Fig. 4. In nearly all of these experiments and for all of the performance measures tested, boosting performed as well or significantly better than the other methods tested. Boosting has also been applied to text filtering [44], "ranking" problems [19] and classification problems arising in natural language processing [1, 28].

Figure 4: Comparison of error rates for AdaBoost and four other text categorization methods (naive Bayes, probabilistic TF-IDF, Rocchio and sleeping experts) as reported by Schapire and Singer [43]. The algorithms were tested on two text corpora, Reuters newswire articles (left) and AP newswire headlines (right), and with varying numbers of class labels as indicated on the x-axis of each figure.

The generalization of AdaBoost by Schapire and Singer [42] provides an interpretation of boosting as a gradient-descent method. A potential function is used in their algorithm to associate a cost with each example based on its current margin. Using this potential function, the operation of AdaBoost can be interpreted as a coordinate-wise gradient descent in the space of linear classifiers (over weak hypotheses). Based on this insight, one can design algorithms for learning popular classification rules. In recent work, Cohen and Singer [11] showed how to apply boosting to learn rule lists similar to those generated by systems like RIPPER [10], IREP [26] and C4.5rules [37]. In other work, Freund and Mason [20] showed how to apply boosting to learn a generalization of decision trees called "alternating trees."
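As promised above, here is a minimal sketch of a weighted decision stump of the kind boosted in the UCI experiments. It is an illustration compatible with the earlier adaboost sketch, not the exact search procedure of [21]; the brute-force scan over features, thresholds and polarities is an assumed simplification.

    import numpy as np

    def stump_learner(X, y, D):
        """Weighted decision stump (single-test decision tree).

        Tries every feature j, threshold v and polarity s, keeping the
        stump 'predict s if x_j <= v else -s' with the smallest weighted
        error under the distribution D.
        """
        m, d = X.shape
        best_err, best_params = np.inf, None
        for j in range(d):
            for v in np.unique(X[:, j]):
                for s in (+1, -1):
                    pred = np.where(X[:, j] <= v, s, -s)
                    err = D[pred != y].sum()
                    if err < best_err:
                        best_err, best_params = err, (j, v, s)
        j, v, s = best_params
        return lambda Xn: np.where(Xn[:, j] <= v, s, -s)

Plugged into the earlier sketch as H, hypotheses, alphas = adaboost(X, y, stump_learner, T=100), this reproduces the "boosting stumps" setup in spirit; the exhaustive search costs $O(m^2 d)$ per round, which is acceptable only for small benchmarks.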
A nice property of AdaBoost is its ability to identify outliers, i.e., examples that are either mislabeled in the training data, or which are inherently ambiguous and hard to categorize. Because AdaBoost focuses its weight on the hardest examples, the examples with the highest weight often turn out to be outliers. An example of this phenomenon can be seen in Fig. 5, taken from an OCR experiment conducted by Freund and Schapire [21].

Figure 5: A sample of the examples that have the largest weight on an OCR task as reported by Freund and Schapire [21]. These examples were chosen after 4 rounds of boosting (top line), 12 rounds (middle) and 25 rounds (bottom). Underneath each image is a line of the form $d{:}\ell_1/w_1, \ell_2/w_2$, where $d$ is the label of the example, $\ell_1$ and $\ell_2$ are the labels that get the highest and second highest vote from the combined hypothesis at that point in the run of the algorithm, and $w_1$, $w_2$ are the corresponding normalized scores.

When the number of outliers is very large, the emphasis placed on the hard examples can become detrimental to the performance of AdaBoost. This was demonstrated very convincingly by Dietterich [13]. Friedman et al. [25] suggested a variant of AdaBoost, called "Gentle AdaBoost," which puts less emphasis on outliers. In recent work, Freund [18] suggested another algorithm, called "BrownBoost," which takes a more radical approach that de-emphasizes outliers when it seems clear that they are "too hard" to classify correctly. This algorithm is an adaptive version of Freund's [17] "boost-by-majority" algorithm. This work, together with Schapire's [40] work on "drifting games," reveals some interesting new relationships between boosting, Brownian motion, and repeated games, while raising many new open problems and directions for future research.

References

[1] Steven Abney, Robert E. Schapire, and Yoram Singer. Boosting applied to tagging and PP attachment. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 1999.
[2] Peter L. Bartlett. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 44(2):525-536, March 1998.
[3] Eric Bauer and Ron Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, to appear.
[4] Eric B. Baum and David Haussler. What size net gives valid generalization? Neural Computation, 1(1):151-160, 1989.
[5] Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the Association for Computing Machinery, 36(4):929-965, October 1989.
[6] Bernhard E. Boser, Isabelle M. Guyon, and Vladimir N. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, pages 144-152, 1992.
[7] Leo Breiman. Arcing the edge. Technical Report 486, Statistics Department, University of California at Berkeley, 1997.
[8] Leo Breiman. Prediction games and arcing classifiers. Technical Report 504, Statistics Department, University of California at Berkeley, 1997.
[9] Leo Breiman. Arcing classifiers. The Annals of Statistics, 26(3):801-849, 1998.
[10] William Cohen. Fast effective rule induction. In Proceedings of the Twelfth International Conference on Machine Learning, pages 115-123, 1995.
[11] William W. Cohen and Yoram Singer. A simple, fast, and effective rule learner. In Proceedings of the Sixteenth National Conference on Artificial Intelligence, 1999.
[12] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273-297, September 1995.
[13] Thomas G. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, to appear.
[14] Thomas G. Dietterich and Ghulum Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2:263-286, January 1995.
[15] Harris Drucker and Corinna Cortes. Boosting decision trees. In Advances in Neural Information Processing Systems 8, pages 479-485, 1996.
[16] Harris Drucker, Robert Schapire, and Patrice Simard. Boosting performance in neural networks. International Journal of Pattern Recognition and Artificial Intelligence, 7(4):705-719, 1993.
[17] Yoav Freund. Boosting a weak learning algorithm by majority. Information and Computation, 121(2):256-285, 1995.
[18] Yoav Freund. An adaptive version of the boost by majority algorithm. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory, 1999.
[19] Yoav Freund, Raj Iyer, Robert E. Schapire, and Yoram Singer. An efficient boosting algorithm for combining preferences. In Machine Learning: Proceedings of the Fifteenth International Conference, 1998.
[20] Yoav Freund and Llew Mason. The alternating decision tree learning algorithm. In Machine Learning: Proceedings of the Sixteenth International Conference, 1999. To appear.
[21] Yoav Freund and Robert E. Schapire. Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference, pages 148-156, 1996.
[22] Yoav Freund and Robert E. Schapire. Game theory, on-line prediction and boosting. In Proceedings of the Ninth Annual Conference on Computational Learning Theory, pages 325-332, 1996.
[23] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, August 1997.
[24] Yoav Freund and Robert E. Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, to appear.
[25] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. Additive logistic regression: a statistical view of boosting. Technical report, 1998.
[26] Johannes Fürnkranz and Gerhard Widmer. Incremental reduced error pruning. In Machine Learning: Proceedings of the Eleventh International Conference, pages 70-77, 1994.
[27] Adam J. Grove and Dale Schuurmans. Boosting in the limit: Maximizing the margin of learned ensembles. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, 1998.
[28] Masahiko Haruno, Satoshi Shirai, and Yoshifumi Ooyama. Using decision trees to construct a practical parser. Machine Learning, 34:131-149, 1999.
[29] Jeffrey C. Jackson and Mark W. Craven. Learning sparse perceptrons. In Advances in Neural Information Processing Systems 8, pages 654-660, 1996.
[30] Michael Kearns and Leslie G. Valiant. Learning Boolean formulae or finite automata is as hard as factoring. Technical Report TR-14-88, Harvard University Aiken Computation Laboratory, August 1988.
[31] Michael Kearns and Leslie G. Valiant. Cryptographic limitations on learning Boolean formulae and finite automata. Journal of the Association for Computing Machinery, 41(1):67-95, January 1994.
[32] Michael J. Kearns and Umesh V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.
[33] Richard Maclin and David Opitz. An empirical evaluation of bagging and boosting. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pages 546-551, 1997.
[34] Llew Mason, Peter Bartlett, and Jonathan Baxter. Direct optimization of margins improves generalization in combined classifiers. Technical report, Department of Systems Engineering, Australian National University, 1998.
[35] C. J. Merz and P. M. Murphy. UCI repository of machine learning databases, 1998. www.ics.uci.edu/~mlearn/MLRepository.html.
[36] J. R. Quinlan. Bagging, boosting, and C4.5. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 725-730, 1996.
[37] J. Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[38] Robert E. Schapire. The strength of weak learnability. Machine Learning, 5(2):197-227, 1990.
[39] Robert E. Schapire. Using output codes to boost multiclass learning problems. In Machine Learning: Proceedings of the Fourteenth International Conference, pages 313-321, 1997.
[40] Robert E. Schapire. Drifting games. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory, 1999.
[41] Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651-1686, October 1998.
[42] Robert E. Schapire and Yoram Singer. Improved boosting algorithms using confidence-rated predictions. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pages 80-91, 1998. To appear, Machine Learning.
[43] Robert E. Schapire and Yoram Singer. BoosTexter: A boosting-based system for text categorization. Machine Learning, to appear.
[44] Robert E. Schapire, Yoram Singer, and Amit Singhal. Boosting and Rocchio applied to text filtering. In SIGIR '98: Proceedings of the 21st Annual International Conference on Research and Development in Information Retrieval, 1998.
[45] Holger Schwenk and Yoshua Bengio. Training methods for adaptive boosting of neural networks. In Advances in Neural Information Processing Systems 10, pages 647-653, 1998.
[46] L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134-1142, November 1984.
[47] Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.
[26]JohannesF¬urnkranzandGerhardWidmer.Incrementalreducederrorpruning.InMachineLearning:ProceedingsoftheEleventhInternationalConference,pages70Ð77,1994.[27]AdamJ.GroveandDaleSchuurmans.Boostinginthelimit:Maximizingthemarginoflearnedensembles.InProceedingsoftheFifteenthNationalConferenceonArtiÞcialIntelli-gence,1998.[28]MasahikoHaruno,SatoshiShirai,andYoshifumiOoyama.Usingdecisiontreestoconstructapracticalparser.MachineLearning,34:131Ð149,1999.[29]JeffreyC.JacksonandMarkW.Craven.Learningsparseperceptrons.InAdvancesinNeuralInformationProcessingSystems8,pages654Ð660,1996.[30]MichaelKearnsandLeslieG.Valiant.LearningBooleanformulaeorÞniteautomataisashardasfactoring.TechnicalReportTR-14-88,HarvardUniversityAikenComputationLaboratory,August1988.[31]MichaelKearnsandLeslieG.Valiant.CryptographiclimitationsonlearningBooleanformu-laeandÞniteautomata.JournaloftheAssociationforComputingMachinery,41(1):67Ð95,January1994.[32]MichaelJ.KearnsandUmeshV.Vazirani.AnIntroductiontoComputationalLearningThe-.MITPress,1994.[33]RichardMaclinandDavidOpitz.Anempiricalevaluationofbaggingandboosting.InProceedingsoftheFourteenthNationalConferenceonArtiÞcialIntelligence,pages546Ð551,[34]LlewMason,PeterBartlett,andJonathanBaxter.DirectoptimizationofmarginsimprovesgeneralizationincombinedclassiÞers.Technicalreport,DeparmentofSystemsEngineering,AustralianNationalUniversity,1998.[35]C.J.MerzandP.M.Murphy.UCIrepositoryofmachinelearningdatabases,1998.www.ics.uci.edu/mlearn/MLRepository.html.[36]J.R.Quinlan.Bagging,boosting,andC4.5.InProceedingsoftheThirteenthNationalConferenceonArtiÞcialIntelligence,pages725Ð730,1996.[37]J.RossQuinlan.C4.5:ProgramsforMachineLearning.MorganKaufmann,1993.[38]RobertE.Schapire.Thestrengthofweaklearnability.MachineLearning,5(2):197Ð227,[39]RobertE.Schapire.Usingoutputcodestoboostmulticlasslearningproblems.InMachineLearning:ProceedingsoftheFourteenthInternationalConference,pages313Ð321,1997.[40]RobertE.Schapire.Driftinggames.InProceedingsoftheTwelfthAnnualConferenceonComputationalLearningTheory,1999. [41]RobertE.Schapire,YoavFreund,PeterBartlett,andWeeSunLee.Boostingthemar-gin:Anewexplanationfortheeffectivenessofvotingmethods.TheAnnalsofStatistics26(5):1651Ð1686,October1998.[42]RobertE.SchapireandYoramSinger.ImprovedboostingalgorithmsusingconÞdence-ratedpredictions.InProceedingsoftheEleventhAnnualConferenceonComputationalLearning,pages80Ð91,1998.Toappear,MachineLearning[43]RobertE.SchapireandYoramSinger.BoosTexter:Aboosting-basedsystemfortextcatego-MachineLearning,toappear.[44]RobertE.Schapire,YoramSinger,andAmitSinghal.BoostingandRocchioappliedtotextÞltering.InSIGIRÕ98:Proceedingsofthe21stAnnualInternationalConferenceonResearchandDevelopmentinInformationRetrieval,1998.[45]HolgerSchwenkandYoshuaBengio.Trainingmethodsforadaptiveboostingofneuralnet-works.InAdvancesinNeuralInformationProcessingSystems10,pages647Ð653,1998.[46]L.G.Valiant.Atheoryofthelearnable.CommunicationsoftheACM,27(11):1134Ð1142,November1984.[47]VladimirN.Vapnik.TheNatureofStatisticalLearningTheory.Springer,1995.