Journal of Machine Learning Research 5 (2004) 143-151. Submitted 1/03; Revised 8/03; Published 2/04.

Lossless Online Bayesian Bagging

Herbert K. H. Lee (herbie@ams.ucsc.edu)
Department of Applied Math and Statistics
School of Engineering
University of California, Santa Cruz
Santa Cruz, CA 95064, USA

Merlise A. Clyde (clyde@stat.duke.edu)
Institute of Statistics and Decision Sciences
Box 90251, Duke University
Durham, NC 27708, USA

Editor: Bin Yu

Abstract

Bagging frequently improves the predictive performance of a model. An online version has recently been introduced, which attempts to gain the benefits of an online algorithm while approximating regular bagging. However, regular online bagging is an approximation to its batch counterpart and so is not lossless with respect to the bagging operation. By operating under the Bayesian paradigm, we introduce an online Bayesian version of bagging which is exactly equivalent to the batch Bayesian version, and thus when combined with a lossless learning algorithm gives a completely lossless online bagging algorithm. We also note that the Bayesian formulation resolves a theoretical problem with bagging, produces less variability in its estimates, and can improve predictive performance for smaller data sets.

Keywords: Classification Tree, Bayesian Bootstrap, Dirichlet Distribution

1. Introduction

In a typical prediction problem, there is a trade-off between bias and variance, in that after a certain amount of fitting, any increase in the precision of the fit will cause an increase in the prediction variance on future observations. Similarly, any reduction in the prediction variance causes an increase in the expected bias for future predictions. Breiman (1996) introduced bagging as a method of reducing the prediction variance without affecting the prediction bias. As a result, predictive performance can be significantly improved. Bagging, short for "Bootstrap AGGregatING", is an ensemble learning method. Instead of making predictions from a single model fit on the observed data, bootstrap samples are taken of the data, the model is fit on each sample, and the predictions are averaged over all of the fitted models to get the bagged prediction. Breiman (1996) explains that bagging works well for unstable modeling procedures, i.e. those for which the conclusions are sensitive to small changes in
the data. He also gives a theoretical explanation of how bagging works, demonstrating the reduction in mean-squared prediction error for unstable procedures. Breiman (1994) demonstrated that tree models, among others, are empirically unstable.

©2004 Herbert K. H. Lee and Merlise A. Clyde.

Online bagging (Oza and Russell, 2001) was developed to implement bagging sequentially, as the observations appear, rather than in batch once all of the observations have arrived. It uses an asymptotic approximation to mimic the results of regular batch bagging, and as such it is not a lossless algorithm.

Online algorithms have many uses in modern computing. By updating sequentially, the update for a new observation is relatively quick compared to re-fitting the entire database, making real-time calculations more feasible. Such algorithms are also advantageous for extremely large data sets, where reading through the data just once is time-consuming, so batch algorithms which would require multiple passes through the data would be infeasible.

In this paper, we consider a Bayesian version of bagging (Clyde and Lee, 2001) based on the Bayesian bootstrap (Rubin, 1981). This overcomes a technical difficulty with the usual bootstrap in bagging. It also leads to a theoretical reduction in variance over the bootstrap for certain classes of estimators, and a significant reduction in observed variance and error rates for smaller data sets. We present an online version which, when combined with a lossless online model-fitting algorithm, continues to be lossless with respect to the bagging operation, in contrast to ordinary online bagging. The Bayesian approach requires the learning base algorithm to accept weighted samples.

In the next section we review the basics of the bagging algorithm, of online bagging, and of Bayesian bagging. Next we introduce our online Bayesian bagging algorithm. We then demonstrate its efficacy with classification trees on a variety of examples.

2. Bagging

In ordinary (batch) bagging, bootstrap re-sampling is used to reduce the variability of an unstable estimator. A particular model or algorithm, such as a classification tree, is specified for learning from a set of data and producing predictions. For a particular data set $X_m$, denote the vector of predictions (at the observed sites or at new locations) by $G(X_m)$. Denote the observed data by $X = (x_1, \ldots, x_n)$. A bootstrap sample of the data is a sample with replacement, so that $X_m = (x_{m_1}, \ldots, x_{m_n})$, where $m_i \in \{1, \ldots, n\}$ with repetitions allowed. $X_m$ can also be thought of as a re-weighted version of $X$, where the weights $\omega^{(m)}_i$ are drawn from the set $\{0, \frac{1}{n}, \frac{2}{n}, \ldots, 1\}$; i.e., $n\omega^{(m)}_i$ is the number of times that $x_i$ appears in the $m$th bootstrap sample. We denote the weighted sample as $(X, \omega^{(m)})$. For each bootstrap sample, the model produces predictions $G(X_m) = (G(X_m)_1, \ldots, G(X_m)_P)$, where $P$ is the number of prediction sites. $M$ total bootstrap samples are used. The bagged predictor for the $j$th element is then
$$\frac{1}{M}\sum_{m=1}^{M} G(X_m)_j = \frac{1}{M}\sum_{m=1}^{M} G(X, \omega^{(m)})_j,$$
or in the case of classification, the $j$th element is predicted to belong to the category most frequently predicted by $G(X_1)_j, \ldots, G(X_M)_j$.

A version of pseudocode for implementing bagging is:

1. For $m \in \{1, \ldots, M\}$,
   (a) Draw a bootstrap sample, $X_m$, from $X$.
   (b) Find predicted values $G(X_m)$.
2. The bagging predictor is $\frac{1}{M}\sum_{m=1}^{M} G(X_m)$.

Equivalently, the bootstrap sample can be converted to a weighted sample $(X, \omega^{(m)})$, where the weights $\omega^{(m)}_i$ are found by taking the number of times $x_i$ appears in the bootstrap sample and dividing by $n$. Thus the weights will be drawn from $\{0, \frac{1}{n}, \frac{2}{n}, \ldots, 1\}$ and will sum to 1. The bagging predictor using the weighted formulation is $\frac{1}{M}\sum_{m=1}^{M} G(X, \omega^{(m)})$ for regression, or the plurality vote for classification.

2.1 Online Bagging

Online bagging (Oza and Russell, 2001) was recently introduced as a sequential approximation to batch bagging. In batch bagging, the entire data set is collected, and then bootstrap samples are taken from the whole database. An online algorithm must process observations as they arrive, and thus each observation must be resampled a random number of times when it arrives. The algorithm proposed by Oza and Russell resamples each observation according to a Poisson random variable with mean 1, i.e., $P(K_m = k) = \exp(-1)/k!$, where $K_m$ is the number of resamples in "bootstrap sample" $m$, $K_m \in \{0, 1, \ldots\}$. Thus as each observation arrives, it is added $K_m$ times to $X_m$, and then $G(X_m)$ is updated, and this is done for $m \in \{1, \ldots, M\}$.

Pseudocode for online bagging is:

For $i \in \{1, \ldots, n\}$,
1. For $m \in \{1, \ldots, M\}$,
   (a) Draw a weight $K_m$ from a Poisson(1) random variable and add $K_m$ copies of $x_i$ to $X_m$.
   (b) Find predicted values $G(X_m)$.
2. The current bagging predictor is $\frac{1}{M}\sum_{m=1}^{M} G(X_m)$.

Ideally, step 1(b) is accomplished with a lossless online update that incorporates the $K_m$ new points without refitting the entire model. We note that $n$ may not be known ahead of time, but the bagging predictor is a valid approximation at each step.

Online bagging is not guaranteed to produce the same results as batch bagging. In particular, it is easy to see that after $n$ points have been observed, there is no guarantee that $X_m$ will contain exactly $n$ points, as the Poisson weights are not constrained to add up to $n$ like a regular bootstrap sample. While it has been shown (Oza and Russell, 2001) that these samples converge asymptotically to the appropriate bootstrap samples, there may be some discrepancy in practice. Thus while it can be combined with a lossless online learning algorithm (such as for a classification tree), the bagging part of the online ensemble procedure is not lossless.

2.2 Bayesian Bagging

Ordinary bagging is based on the ordinary bootstrap, which can be thought of as replacing the original weights of $\frac{1}{n}$ on each point with weights from the set $\{0, \frac{1}{n}, \frac{2}{n}, \ldots, 1\}$, with the total of all weights summing to 1. A variation is to replace the ordinary bootstrap with the Bayesian bootstrap (Rubin, 1981). The Bayesian approach treats the vector of weights $\omega$ as unknown parameters and derives a posterior distribution for $\omega$, and hence $G(X, \omega)$. The non-informative prior $\prod_{i=1}^{n} \omega_i^{-1}$, when combined with the multinomial likelihood, leads to a Dirichlet$_n(1, \ldots, 1)$ distribution for the posterior distribution of $\omega$. The full posterior distribution of $G(X, \omega)$ can be estimated by Monte Carlo methods: generate $\omega^{(m)}$ from a Dirichlet$_n(1, \ldots, 1)$ distribution and then calculate $G(X, \omega^{(m)})$ for each sample. The average of $G(X, \omega^{(m)})$ over the $M$ samples corresponds to the Monte Carlo estimate of the posterior mean of $G(X, \omega)$ and can be viewed as a Bayesian analog of bagging (Clyde and Lee, 2001).

In practice, we may only be interested in a point estimate, rather than the full posterior distribution. In this case, the Bayesian bootstrap can be seen as a continuous version of the regular bootstrap. Thus Bayesian bagging can be achieved by generating $M$ Bayesian bootstrap samples, and taking the average or majority vote of the $G(X, \omega^{(m)})$. This is identical to regular bagging except that the weights are continuous-valued on $(0, 1)$, instead
of being restricted to the discrete set $\{0, \frac{1}{n}, \frac{2}{n}, \ldots, 1\}$. In both cases, the weights must sum to 1. In both cases, the expected value of a particular weight is $\frac{1}{n}$ for all weights, and the expected correlation between weights is the same (Rubin, 1981). Thus Bayesian bagging will generally have the same expected point estimates as ordinary bagging. The variability of the estimate is slightly smaller under Bayesian bagging, as the variability of the weights is $\frac{n}{n+1}$ times that of ordinary bagging. As the sample size grows large, this factor becomes arbitrarily close to one, but we do note that it is strictly less than one, so the Bayesian approach does give a further reduction in variance compared to the standard approach. In practice, for smaller data sets, we often find a significant reduction in variance, possibly because the use of continuous-valued weights leads to fewer extreme cases than discrete-valued weights.

Pseudocode for Bayesian bagging is:

1. For $m \in \{1, \ldots, M\}$,
   (a) Draw random weights $\omega^{(m)}$ from a Dirichlet$_n(1, \ldots, 1)$ to produce the Bayesian bootstrap sample $(X, \omega^{(m)})$.
   (b) Find predicted values $G(X, \omega^{(m)})$.
2. The bagging predictor is $\frac{1}{M}\sum_{m=1}^{M} G(X, \omega^{(m)})$.

Use of the Bayesian bootstrap does have a major theoretical advantage, in that for some problems, bagging with the regular bootstrap is actually estimating an undefined quantity. To take a simple example, suppose one is bagging the fitted predictions for a point $y$ from a least-squares regression problem. Technically, the full bagging estimate is $\frac{1}{M_0}\sum_m \hat{y}_m$, where $m$ ranges over all possible bootstrap samples, $M_0$ is the total number of possible bootstrap samples, and $\hat{y}_m$ is the predicted value from the model fit using the $m$th bootstrap sample. The issue is that one of the possible bootstrap samples contains the
first data point replicated $n$ times, and no other data points. For this bootstrap sample, the regression model is undefined (since at least two different points are required), and so $\hat{y}$, and thus the bagging estimator, are undefined. In practice, only a small sample of the possible bootstrap samples is used, so the probability of drawing a bootstrap sample with an undefined prediction is very small. Yet it is disturbing that in some problems, the bagging estimator is technically not well-defined. In contrast, the use of the Bayesian bootstrap completely avoids this problem. Since the weights are continuous-valued, the probability that any weight is exactly equal to zero is zero. Thus with probability one, all weights are strictly positive, and the Bayesian bagging estimator will be well-defined (assuming the ordinary estimator on the original data is well-defined).

We note that the Bayesian approach will only work with models that have learning algorithms that handle weighted samples. Most standard models either have such algorithms readily available, or their algorithms are easily modified to accept weights, so this restriction is not much of an issue in practice.

3. Online Bayesian Bagging

Regular online bagging cannot be exactly equivalent to the batch version because the Poisson counts cannot be guaranteed to sum to the number of actual observations. Gamma random variables can be thought of as continuous analogs of Poisson counts, which motivates our derivation of Bayesian online bagging. The key is to recall a fact from basic probability: a set of independent gamma random variables divided by its sum has a Dirichlet distribution, i.e., if $w_i \sim \Gamma(\alpha_i, 1)$, then
$$\left(\frac{w_1}{\sum w_i}, \frac{w_2}{\sum w_i}, \ldots, \frac{w_k}{\sum w_i}\right) \sim \text{Dirichlet}_k(\alpha_1, \alpha_2, \ldots, \alpha_k).$$
(See, for example, Hogg and Craig, 1995, pp. 187-188.) This relationship is a common method for generating random draws from a Dirichlet distribution, and so is also used in the implementation of batch Bayesian bagging in practice.

Thus in the online version of Bayesian bagging, as each observation arrives, it has a realization of a Gamma(1) random variable associated with it for each bootstrap sample, and the model is updated after each new weighted observation. If the implementation of the model requires weights that sum to one, then within each (Bayesian) bootstrap sample, all weights can be renormalized with the new sum of gammas before the model is updated. At any point in time, the current predictions are those aggregated across all bootstrap samples, just as with batch bagging. If the model is fit with an ordinary lossless online algorithm, as exists for classification trees (Utgoff et al., 1997), then the entire online Bayesian bagging procedure is completely lossless relative to batch Bayesian bagging. Furthermore, since batch Bayesian bagging gives the same mean results as ordinary batch bagging, online Bayesian bagging also has the same expected results as ordinary batch bagging.

Pseudocode for online Bayesian bagging is:

For $i \in \{1, \ldots, n\}$,
1. For $m \in \{1, \ldots, M\}$,
   (a) Draw a weight $\omega^{(m)}_i$ from a Gamma(1, 1) random variable, associate the weight with $x_i$, and add $x_i$ to $X$.
   (b) Find predicted values $G(X, \omega^{(m)})$ (renormalizing weights if necessary).
2. The current bagging predictor is $\frac{1}{M}\sum_{m=1}^{M} G(X, \omega^{(m)})$.

In step 1(b), the weights may need to be renormalized (by dividing by the sum of all current weights) if the implementation requires weights that sum to one. We note that for many models, such as classification trees, this renormalization is not a major issue; for a tree, each split only depends on the relative weights of the observations at that node, so nodes not involving the new observation will have the same ratio of weights before and after renormalization, and the rest of the tree structure will be unaffected. In practice, in most implementations of trees (including that used in this paper), renormalization is not necessary. We discuss the possibility of renormalization in order to be consistent with the original presentation of the bootstrap and Bayesian bootstrap, and we note that ordinary online bagging implicitly deals with this issue equivalently.

The computational requirements of Bayesian versus ordinary online bagging are comparable. The procedures are quite similar, with the main difference being that the fitting algorithm must handle non-integer weights for the Bayesian version. For models such as trees, there is no significant additional computational burden for using non-integer weights.

4. Examples

We demonstrate the effectiveness of online Bayesian bagging using classification trees. Our implementation uses the lossless online tree learning algorithms (ITI) of Utgoff et al. (1997) (available at http://www.cs.umass.edu/~lrn/iti/). We compared Bayesian bagging to a single tree, ordinary batch bagging, and ordinary online bagging, all three of which were done using the minimum description length criterion (MDL), as implemented in the ITI code, to determine the optimal size for each tree. To implement Bayesian bagging, the code was modified to account for weighted observations. We use a generalized MDL to determine the optimal tree size at each stage, replacing all counts of observations with the sum of the weights of the observations at that node or leaf with the same response category. Replacing the total count directly with the sum of the weights is justified by looking at the multinomial likelihood when written as an exponential family in canonical form; the weights enter through the dispersion parameter, and it is easily seen that the unweighted counts are replaced by the sums of the weights of the observations that go into each count.

To be more specific, a decision tree typically operates with a multinomial likelihood,
$$\prod_{j \in \text{leaves}} \prod_{k \in \text{classes}} p_{jk}^{n_{jk}},$$
where $p_{jk}$ is the true probability that an observation in leaf $j$ will be in class $k$, and $n_{jk}$ is the count of data points in leaf $j$ in class $k$. This is easily re-written as the product over all observations, $\prod_{i=1}^{n} p_i^*$, where if observation $i$ is in leaf $j$ and a member of class $k$, then $p_i^* = p_{jk}$. For simplicity, we consider the case $k = 2$, as the generalization to larger $k$ is straightforward. Now consider a single point, $y$, which takes values 0 or 1 depending on which class it is a member of. Transforming to the canonical parameterization, let $\theta = \log\frac{p}{1-p}$,
where $p$ is the true probability that $y = 1$. Writing the likelihood in exponential family form gives
$$\exp\left\{\left(y\theta + \log\frac{1}{1+\exp\{\theta\}}\right)\Big/\, a\right\},$$
where $a$ is the dispersion parameter, which would be equal to 1 for a standard data set, but would be the reciprocal of the weight for that observation in a weighted data set. Thus the likelihood for an observation $y$ with weight $w$ is
$$\exp\left\{\left(y\theta + \log\frac{1}{1+\exp\{\theta\}}\right)\Big/\,(1/w)\right\} = p^{wy}(1-p)^{w(1-y)},$$
and so, returning to the full multinomial, the original counts are simply replaced by the weighted counts. As MDL is a penalized likelihood criterion, we thus use the weighted likelihood and replace each count with a sum of weights. We note that for ordinary online bagging, using a single Poisson weight $K$ with our generalized MDL is exactly equivalent to including $K$ copies of the data point in the data set and using regular MDL.

Table 1 shows the data sets we used for classification problems, the number of classes in each data set, and the sizes of their respective training and test partitions. Table 2 displays the results of our comparison study. All of the data sets, except the final one, are available online at http://www.ics.uci.edu/~mlearn/MLRepository.html, the UCI Machine Learning Repository. The last data set is described in Lee (2001). We compare the results of training a single classification tree, ordinary batch bagging, online bagging, and Bayesian online bagging (or equivalently Bayesian batch). For each of the bagging techniques, 100 bootstrap samples were used. For each data set, we repeated 1000 times the following procedure: randomly choose a training/test partition; fit a single tree, a batch bagged tree, an online bagged tree, and a Bayesian bagged tree; compute the misclassification error rate for each fit. Table 2 reports the average error rate for each method on each data set, as well as the estimated standard error of this error rate.

Data Set             Number of    Size of Training   Size of Test
                     Classes      Data Set           Data Set
Breast cancer (WI)   2            299                400
Contraceptive        3            800                673
Credit (German)      2            200                800
Credit (Japanese)    2            290                400
Dermatology          6            166                200
Glass                7            164                50
House votes          2            185                250
Ionosphere           2            200                151
Iris                 3            90                 60
Liver                3            145                200
Pima diabetes        2            200                332
SPECT                2            80                 187
Wine                 3            78                 100
Mushrooms            2            1000               7124
Spam                 2            2000               2601
Credit (American)    2            4000               4508

Table 1: Sizes of the example data sets
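The weighted-likelihood identity derived above can be checked numerically. The following is an illustrative sketch only, not part of the authors' modified ITI code; the function names are ours. It verifies that the exponential-family form with dispersion $a = 1/w$ reproduces $p^{wy}(1-p)^{w(1-y)}$, which is what justifies replacing counts with sums of weights in the generalized MDL.

```python
import math

def expfam_lik(y, p, w):
    # Exponential-family form with canonical parameter theta = log(p/(1-p))
    # and dispersion a = 1/w (a = 1 recovers the unweighted Bernoulli case).
    theta = math.log(p / (1.0 - p))
    return math.exp((y * theta + math.log(1.0 / (1.0 + math.exp(theta)))) / (1.0 / w))

def weighted_lik(y, p, w):
    # Direct weighted Bernoulli likelihood: p^{wy} (1-p)^{w(1-y)}.
    return p ** (w * y) * (1.0 - p) ** (w * (1.0 - y))

# The two forms agree for any class label, success probability, and weight.
checks = [(y, p, w) for y in (0, 1) for p in (0.1, 0.5, 0.9) for w in (0.25, 1.0, 2.7)]
ok = all(math.isclose(expfam_lik(y, p, w), weighted_lik(y, p, w)) for y, p, w in checks)
```

With integer weights this reduces to replicating the data point, which is why a Poisson weight $K$ in ordinary online bagging is equivalent to including $K$ copies of the point.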
Data Set             Single Tree    Batch Bagging   Online Bagging   Bayesian Online/Batch Bagging
Breast cancer (WI)   0.055 (.020)   0.045 (.010)    0.045 (.010)     0.041 (.009)
Contraceptive        0.522 (.019)   0.499 (.017)    0.497 (.017)     0.490 (.016)
Credit (German)      0.318 (.022)   0.295 (.017)    0.294 (.017)     0.285 (.015)
Credit (Japanese)    0.155 (.017)   0.148 (.014)    0.147 (.014)     0.145 (.014)
Dermatology          0.099 (.033)   0.049 (.017)    0.053 (.021)     0.047 (.019)
Glass                0.383 (.081)   0.357 (.072)    0.361 (.074)     0.373 (.075)
House votes          0.052 (.011)   0.049 (.011)    0.049 (.011)     0.046 (.010)
Ionosphere           0.119 (.026)   0.094 (.022)    0.099 (.022)     0.096 (.021)
Iris                 0.062 (.029)   0.057 (.026)    0.060 (.025)     0.058 (.025)
Liver                0.366 (.036)   0.333 (.032)    0.336 (.034)     0.317 (.033)
Pima diabetes        0.265 (.027)   0.250 (.020)    0.247 (.021)     0.232 (.017)
SPECT                0.205 (.029)   0.200 (.030)    0.202 (.031)     0.190 (.027)
Wine                 0.134 (.042)   0.094 (.037)    0.101 (.037)     0.085 (.034)
Mushrooms            0.004 (.003)   0.003 (.002)    0.003 (.002)     0.003 (.002)
Spam                 0.099 (.008)   0.075 (.005)    0.077 (.005)     0.077 (.005)
Credit (American)    0.350 (.007)   0.306 (.005)    0.306 (.005)     0.305 (.006)

Table 2: Comparison of average classification error rates (with standard errors)

We note that in all cases, both online bagging techniques produce results similar to ordinary batch bagging, and all bagging methods significantly improve upon the use of a single tree. However, for smaller data sets (all but the last three), online/batch Bayesian bagging typically both improves prediction performance and decreases prediction variability.

5. Discussion

Bagging is a useful ensemble learning tool, particularly when models sensitive to small changes in the data are used. It is sometimes desirable to be able to use the data in an online fashion. By operating in the Bayesian paradigm, we can introduce an online algorithm that will exactly match its batch Bayesian counterpart. Unlike previous versions of online bagging, the Bayesian approach produces a completely lossless bagging algorithm. It can also lead to increased accuracy and decreased prediction variance for smaller data sets.

Acknowledgments

This research was partially supported by NSF grants DMS 0233710, 9873275, and 9733013. The authors would like to thank two anonymous referees for their helpful suggestions.
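The full online procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions: a weighted mean stands in for the weight-accepting base learner $G(X, \omega)$ (the paper uses classification trees fit by the ITI algorithm), and all names here are ours, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def weighted_learner(x, w):
    # Stand-in for a weight-accepting base learner G(X, w); here a weighted
    # mean, where the paper uses classification trees with weighted MDL.
    return float(np.dot(w, x) / np.sum(w))

stream = rng.normal(loc=1.0, size=25)  # toy data arriving one point at a time
M = 50                                 # number of Bayesian bootstrap samples

weights = [[] for _ in range(M)]       # Gamma(1,1) weights, one list per sample
X = []
for xi in stream:                      # online loop over arriving observations
    X.append(xi)
    for m in range(M):
        weights[m].append(rng.gamma(1.0))  # attach a Gamma(1,1) weight to x_i
    # A valid bagged predictor is available after every arrival:
    current = np.mean([weighted_learner(np.asarray(X), np.asarray(w))
                       for w in weights])

# Lossless property: normalizing each sample's gammas by their sum yields
# exactly a Dirichlet_n(1,...,1) draw, i.e. batch Bayesian bootstrap weights,
# so the final online predictor coincides with the batch predictor.
batch = np.mean([weighted_learner(stream, np.asarray(w) / np.sum(w))
                 for w in weights])
```

Because the base learner here depends only on relative weights, `current` after the last arrival and `batch` agree to floating-point precision, mirroring the exact equivalence argued in Section 3.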
References

L. Breiman. Heuristics of instability in model selection. Technical report, University of California at Berkeley, 1994.

L. Breiman. Bagging predictors. Machine Learning, 26(2):123-140, 1996.

M. A. Clyde and H. K. H. Lee. Bagging and the Bayesian bootstrap. In T. Richardson and T. Jaakkola, editors, Artificial Intelligence and Statistics 2001, pages 169-174, 2001.

R. V. Hogg and A. T. Craig. Introduction to Mathematical Statistics. Prentice-Hall, Upper Saddle River, NJ, 5th edition, 1995.

H. K. H. Lee. Model selection for neural network classification. Journal of Classification, 18:227-243, 2001.

N. C. Oza and S. Russell. Online bagging and boosting. In T. Richardson and T. Jaakkola, editors, Artificial Intelligence and Statistics 2001, pages 105-112, 2001.

D. B. Rubin. The Bayesian bootstrap. Annals of Statistics, 9:130-134, 1981.

P. E. Utgoff, N. C. Berkman, and J. A. Clouse. Decision tree induction based on efficient tree restructuring. Machine Learning, 29:5-44, 1997.