Decreasingly naive Bayes: Aggregating n-dependence estimators

Geoffrey I. Webb, Janice R. Boughton, Fei Zheng, Kai Ming Ting, Houssam Salem
Faculty of Information Technology, Monash University, VIC 3800, Australia
{Geoff.Webb, Janice.Boughton, Kaiming.Ting}@monash.edu

KNOWSYS Lab Technical Report 2010-01, Monash University, 2010

Abstract

Averaged n-Dependence Estimators (AnDE) is an approach to probabilistic classification learning that learns without search. It utilizes a single parameter that transforms the approach between a low-variance high-bias learner (Naive Bayes) and a high-variance low-bias learner with Bayes optimal asymptotic error. It extends the underlying strategy of Averaged One-Dependence Estimators (AODE), which relaxes the Naive Bayes independence assumption while retaining many of Naive Bayes' desirable computational and theoretical properties. AnDE further relaxes the independence assumption by generalizing AODE to higher levels of dependence. Extensive experimental evaluation shows that the bias-variance trade-off for Averaged 2-Dependence Estimators results in strong predictive accuracy over a wide range of datasets. It has training time linear with respect to the number of examples, supports incremental learning, handles missing values directly, and is robust in the face of noise. Beyond the practical utility of its lower-order variants, AnDE is of interest in that it demonstrates that it is possible to create low-bias high-variance generative learners and suggests strategies for developing even more powerful classifiers.

1 Introduction

This paper presents an alternative to the classical classification learning paradigm of learning as search. This alternative paradigm supports learning without search. Given the large number of existing classification learning algorithms, the reader would be forgiven for wondering why there is a need for a new algorithm, let alone for another learning paradigm and resulting family of algorithms. This alternative paradigm is of potential interest to the community for both theoretical and practical reasons.
The paradigm is of theoretical interest because it shows that there is a fundamental alternative to the dominant approach to classification learning. The dominant approach performs search through a hypothesis space to identify the hypothesis that optimizes some objective function with respect to the training data. This alternative performs no search.

The paradigm is of practical interest because it gives rise to a family of algorithms with a unique combination of features that is well suited to many applications. We discuss these features in more detail below. Notable amongst them are training complexity linear with respect to the number of training examples; direct capacity for incremental learning; and accuracy that is competitive with the state-of-the-art.

The family contains algorithms that range from low variance coupled with high bias through to high variance coupled with low bias. Successive members of the family will be best suited to differing quantities of data, starting with low variance for small data, with successively lower bias but higher variance suiting ever increasing data quantities [1]. The asymptotic error of the lowest bias variant is Bayes optimal.

One member of this family of algorithms, naive Bayes (NB), is already well known. A second member, Averaged One-Dependence Estimators (AODE) [2], has enjoyed considerable popularity since its introduction in 2005 [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]. The work presented in this paper arises from the realization that NB and AODE are but two instances of a family of algorithms, which we call AnDE.

The AnDE algorithms, including NB and AODE, operate by a fundamentally different principle to the majority of classification learning algorithms. Rather than performing search to select a hypothesis about the relationship between the attributes and the class, the AnDE algorithms utilize a predefined function to extrapolate from observed low-level relationships to the required multivariate relationship.

In Section 2 we explain how it is possible to learn without search, and define the AnDE family of algorithms. The AnDE family of algorithms builds upon the method pioneered by AODE [2]. In Section 3 we discuss how the AnDE algorithms relate to Feating [26], a generic approach to ensembling that also builds upon techniques pioneered by AODE. In Section 4 we present an extensive evaluation of the AnDE family of algorithms, comparing their performance to relevant Bayesian techniques, to Feating and to the state-of-the-art Random Forest classifier. Section 5 presents conclusions and directions for future research.

2 Learning without search: The AnDE family of algorithms

We wish to estimate from a training sample $T$ of classified objects the probability $P(y \mid x)$ that an example $x = \langle x_1, \ldots, x_a \rangle$ belongs to class $y$, where $x_i$ is the value of the $i$th attribute and $y \in \{c_1, \ldots, c_k\}$, which are the $k$ classes. We use $v$ to denote the average number of values per attribute. From the definition of conditional probability we have

$$P(y \mid x) = P(y, x) / P(x) \tag{1}$$

As $P(x) = \sum_{i=1}^{k} P(c_i, x)$, we can always estimate Eq 1 from estimates of $P(y, x)$ for each class using

$$P(y, x) / P(x) = P(y, x) \Big/ \sum_{i=1}^{k} P(c_i, x) \tag{2}$$

In consequence, in the remainder of this paper we consider only the problem of estimating $P(y, x)$, thereby setting the work in a generative framework.

We define the order of a probability or probability estimate as the number of attributes in the distribution to which the probability or estimate relates. Hence, the order of $P(c_i, x)$ is $a + 1$. If the training data does not contain sufficient examples of $x$ to directly derive accurate estimates of each $P(c_i, x)$, we must extrapolate these estimates from observations of lower-order statistics of the data. All other things being equal, an estimate of a lower-order probability from a given finite training set will be more accurate than an estimate of a higher-order probability, and estimates of higher-order probabilities will vary more from training sample to training sample. Hence, models derived from lower-order probability estimates are likely to have lower variance than models derived from higher-order probability estimates. On the other hand, models derived from higher-order probabilities are likely to have lower bias, as less restrictive assumptions are made about the form of the probability distribution.

This is illustrated in Figure 1, which shows a simple attribute-space with three ternary attributes and a binary class. To classify a new object with attribute-values Age=Old, Pulse=Slow and Temperature=High, one wishes to infer the class distribution in the cell highlighted in Figure 1(a), which is a fourth-order probability distribution. If there are insufficient examples to directly estimate that distribution, it might be extrapolated from any of a number of lower order probability distributions. The prior class distribution $P(y)$ is a first-order probability distribution that can be estimated from the entire attribute-space (Figure 1(b)). The second-order probabilities $P(y \wedge \text{Age} = \text{Old})$, $P(y \wedge \text{Pulse} = \text{Slow})$ and $P(y \wedge \text{Temperature} = \text{High})$ can be estimated from the regions depicted in Figure 1(c-e). The regions associated with the third-order probabilities $P(y \wedge \text{Age} = \text{Old} \wedge \text{Pulse} = \text{Slow})$, $P(y \wedge \text{Age} = \text{Old} \wedge \text{Temperature} = \text{High})$ and $P(y \wedge \text{Pulse} = \text{Slow} \wedge \text{Temperature} = \text{High})$ are illustrated in Figure 1(f-h).

Figure 1: Statistics of varying order for an attribute-space with three ternary attributes and a binary class. Panels: (a) $P(y \wedge \text{Age} = \text{Old} \wedge \text{Pulse} = \text{Slow} \wedge \text{Temperature} = \text{High})$; (b) $P(y)$; (c) $P(y \wedge \text{Age} = \text{Old})$; (d) $P(y \wedge \text{Pulse} = \text{Slow})$; (e) $P(y \wedge \text{Temperature} = \text{High})$; (f) $P(y \wedge \text{Age} = \text{Old} \wedge \text{Pulse} = \text{Slow})$; (g) $P(y \wedge \text{Age} = \text{Old} \wedge \text{Temperature} = \text{High})$; (h) $P(y \wedge \text{Pulse} = \text{Slow} \wedge \text{Temperature} = \text{High})$.

By application of the product rule we have the following.

$$P(y, x) = P(y)\, P(x \mid y) \tag{3}$$

If the number of classes, $k$, is small, it should be possible to obtain a sufficiently accurate estimate of $P(y)$ from the sample frequencies. However, we still have
the problem that $x$ may not occur sufficiently frequently in the training data and hence accurate estimates of $P(x \mid y)$ cannot be obtained directly from the sample frequencies. The solution used by NB is to extrapolate to $P(x \mid y)$ from each second-order probability $P(x_i \mid y)$ by assuming the attributes are independent given the class. From this assumption it follows that

$$P(x \mid y) = \prod_{i=1}^{a} P(x_i \mid y) \tag{4}$$

Hence we classify using

$$\mathrm{NB}(y, x) = P(y) \prod_{i=1}^{a} P(x_i \mid y) \tag{5}$$

With reference to Figure 1, NB estimates the distribution in (a) by extrapolation from the distributions in (b) (giving $P(y)$), (c) (giving $P(\text{Age} = \text{Old} \mid y)$), (d) (giving $P(\text{Pulse} = \text{Slow} \mid y)$) and (e) (giving $P(\text{Temperature} = \text{High} \mid y)$).

The independence assumption is a very strong assumption about the underlying probability distribution. As a result, NB has very high bias. However, due to the low order of the base statistics from which the model is estimated, it has low variance.

2.1 AODE

Averaged One-Dependence Estimators (AODE) [2] extends to third-order probabilities NB's search-free strategy of extrapolation from lower-order probabilities. It does so by averaging the estimates of all of a class of third-order estimators.

A Super-Parent One-Dependence Estimator (SPODE) is a 1-DBC (a 1-dependence Bayesian classifier, as defined in Section 2.4, and hence a third-order probability estimator) that relaxes the assumption of conditional independence by making all other attributes independent given the class and one privileged attribute, the super-parent, $x_\alpha$. This is a weaker conditional independence assumption than NB's, as it is necessarily true if NB's is true and may also be true when NB's is not. It uses

$$P(y, x) = P(y, x_\alpha)\, P(x \mid y, x_\alpha) \tag{6}$$

together with an independence assumption that entitles

$$P(x \mid y, x_\alpha) = \prod_{i=1}^{a} P(x_i \mid y, x_\alpha) \tag{7}$$

As this is a weaker assumption than that which entitles (4), the bias of the model should be lower than that of NB. However, it is derived from higher-order probability estimates and hence its variance should be higher.
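The two extrapolations described so far can be made concrete with a small sketch. The Python below estimates the NB joint probability of Eq. 5 and a single SPODE joint probability of Eqs. 6 and 7 from raw relative frequencies. The function names, the toy dataset (loosely modelled on the Age, Pulse and Temperature attributes of Figure 1) and the use of unsmoothed counts are illustrative assumptions, not the implementation evaluated in this paper.

```python
# Hypothetical toy data: (attribute tuple, class label); not taken from the paper.
toy_data = [
    (("old", "slow", "high"), "sick"),
    (("old", "slow", "low"), "sick"),
    (("old", "fast", "high"), "well"),
    (("young", "slow", "high"), "well"),
]
query = ("old", "slow", "high")


def nb_joint(data, x, y):
    """Estimate P(y, x) as P(y) * prod_i P(x_i | y) (Eq. 5), using raw frequencies."""
    in_class = [row for row, label in data if label == y]
    if not in_class:
        return 0.0
    p = len(in_class) / len(data)  # P(y)
    for i, xi in enumerate(x):
        # P(x_i | y): fraction of class-y rows whose i-th value matches
        p *= sum(1 for row in in_class if row[i] == xi) / len(in_class)
    return p


def spode_joint(data, x, y, alpha):
    """Estimate P(y, x) as P(y, x_alpha) * prod_i P(x_i | y, x_alpha) (Eqs. 6-7)."""
    # Rows that share both the class y and the super-parent value x_alpha
    parent = [row for row, label in data if label == y and row[alpha] == x[alpha]]
    if not parent:
        return 0.0
    p = len(parent) / len(data)  # P(y, x_alpha)
    for i, xi in enumerate(x):
        # For i == alpha this factor is 1 by construction
        p *= sum(1 for row in parent if row[i] == xi) / len(parent)
    return p


print(nb_joint(toy_data, query, "sick"), spode_joint(toy_data, query, "sick", alpha=2))
```

Every factor in the SPODE estimate is computed from the smaller set of rows that share the super-parent value, which is why it relies on higher-order statistics than NB and can be expected to vary more from sample to sample.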
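AODE exploits the lower bias of SPODEs while addressing their higher variance by averaging over all estimates of $P(y, x)$ produced by using different super-parents. AODE seeks to use

$$P(y, x) = \sum_{\alpha=1}^{a} P(y, x_\alpha)\, P(x \mid y, x_\alpha) \Big/ a \tag{8}$$

However, in practice it is desirable to only use estimates of probabilities for which relevant examples occur in the data. Hence, AODE actually uses

$$\mathrm{AODE}(y, x) = \begin{cases} \sum_{\alpha=1}^{a} \delta(x_\alpha)\, P(y, x_\alpha)\, P(x \mid y, x_\alpha) \Big/ \sum_{\alpha=1}^{a} \delta(x_\alpha) & : \sum_{\alpha=1}^{a} \delta(x_\alpha) > 0 \\ \mathrm{NB}(y, x) & : \text{otherwise} \end{cases} \tag{9}$$

where $\delta(x_\alpha)$ is 1 if attribute-value $x_\alpha$ is present in the data, otherwise 0. That is, it averages over all super-parents whose value occurs in the data, and defaults to NB if there are no such super-parents.

As AODE uses all of a predefined family of estimators, each of which extrapolates the desired high-order probability from lower-order probabilities, it does not perform search.

In terms of the example attribute space, AODE extrapolates to Figure 1(a) from the lower-order statistics illustrated in Figure 1(c-h), with (f) conditioned on (c) and (d), (g) conditioned on (c) and (e), and (h) conditioned on (d) and (e).

AODE has demonstrated strong prediction accuracy (both zero-one loss and mean-squared error) with relatively modest computational requirements for low dimensional data [2]. In consequence, it has enjoyed substantial uptake [2, 3, 5, 6, 7, 8, 9, 10, 12, 11, 14, 16, 18, 19, 21].

2.2 AnDE

In this paper we generalize to higher-order probabilities the strategy of search-free extrapolation from lower-order probabilities. For notational convenience we define

$$x_{i,j,\ldots,q} = \langle x_i, x_j, \ldots, x_q \rangle \tag{10}$$

For example, $x_{2,3,5} = \langle x_2, x_3, x_5 \rangle$.

AnDE aims to use

$$\mathrm{AnDE}(y, x) = \sum_{s \in S_n} P(y, x_s)\, P(x \mid y, x_s) \Big/ \binom{a}{n} \tag{11}$$

where $S_n$ indicates all subsets of size $n$ of the set $\{1, \ldots, a\}$.

However, in practice we also need to avoid using combinations of super-parents whose values do not occur in the data, and hence use

$$\mathrm{AnDE}(y, x) = \begin{cases} \sum_{s \in S_n} \delta(x_s)\, P(y, x_s)\, P(x \mid y, x_s) \Big/ \sum_{s \in S_n} \delta(x_s) & : \sum_{s \in S_n} \delta(x_s) > 0 \\ \mathrm{A}(n-1)\mathrm{DE}(y, x) & : \text{otherwise} \end{cases} \tag{12}$$

Attributes are assumed independent given the super-parents and the class. Hence, $P(x \mid y, x_s)$ is estimated by

$$P(x \mid y, x_s) = \prod_{i=1}^{a} P(x_i \mid y, x_s) \tag{13}$$

Note that $P(x_i \mid y, x_s) = 1$ when $x_i \in x_s$, and that smoothed estimates should not be used in this case.

A0DE = NB and A1DE = AODE.

In terms of the simple attribute-space depicted in Figure 1, A2DE extrapolates to (a) from (a) conditioned on each of (f), (g) and (h), and A3DE makes inferences directly from the class distribution in (a).

$S_a = \{\{1, \ldots, a\}\}$ and hence when $n = a$, $x_s = x$. Therefore, the ultimate expression of AnDE, AaDE, seeks to classify using

$$\mathrm{AaDE}(y, x) = P(y, x)\, P(x \mid y, x) \Big/ \binom{a}{a} \tag{14}$$

where $P(y, x)$ is estimated directly from $D$, cascading to ever lower dependence estimators should the combination of attribute-values not be present in $D$. As $P(x \mid y, x)$ and $\binom{a}{a}$ both equal 1, it classifies using only its direct estimate of $P(y, x)$.

Observation 1. The asymptotic classification performance of AaDE equals that of the Bayes optimal classifier.

Proof 1. AaDE classifies using

$$\operatorname*{argmax}_{y} \left( \hat{P}(y, x) \Big/ \sum_{z \in Y} \hat{P}(z, x) \right)$$

where each $\hat{P}(\cdot)$ is directly estimated from the observed data and hence approaches $P(\cdot)$ as the quantity of data approaches infinity. Hence, in the limit, AaDE approaches

$$\operatorname*{argmax}_{y} \left( P(y, x) \Big/ \sum_{z \in Y} P(z, x) \right)$$

which is the Bayes optimal classifier.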
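Equations 12 and 13, together with the cascade to lower orders and the normalization of Eq. 2, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the Weka implementation used in this paper: $\delta(x_s)$ is taken to be 1 when the value combination $x_s$ occurs in at least one training example of any class, probabilities are unsmoothed relative frequencies, and a uniform posterior is returned when every joint estimate is zero. The names `ande_joint`, `posterior` and the toy rows are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: (attribute tuple, class label); not taken from the paper.
rows = [
    (("old", "slow", "high"), "sick"),
    (("old", "slow", "low"), "sick"),
    (("old", "fast", "high"), "well"),
    (("young", "slow", "high"), "well"),
]
example = ("old", "slow", "high")


def ande_joint(data, x, y, n):
    """Sketch of Eq. 12: average the n-dependence estimates of P(y, x)."""
    a = len(x)
    total, used = 0.0, 0
    for s in combinations(range(a), n):
        # delta(x_s): assumed to mean "x_s occurs somewhere in the data"
        if not any(all(row[i] == x[i] for i in s) for row, _ in data):
            continue
        used += 1
        match = [row for row, label in data
                 if label == y and all(row[i] == x[i] for i in s)]
        if not match:
            continue  # P(y, x_s) is estimated as 0 for this subset
        p = len(match) / len(data)  # P(y, x_s)
        for i in range(a):
            # P(x_i | y, x_s) as in Eq. 13; equals 1 whenever i is in s
            p *= sum(1 for row in match if row[i] == x[i]) / len(match)
        total += p
    if used > 0:
        return total / used
    if n == 0:
        return 0.0  # no data at all
    return ande_joint(data, x, y, n - 1)  # cascade to A(n-1)DE


def posterior(data, x, classes, n):
    """Normalize the joint estimates into P(y | x) as in Eq. 2."""
    joint = {c: ande_joint(data, x, c, n) for c in classes}
    z = sum(joint.values())
    # Uniform fallback is an assumption made here for the all-zero case
    return {c: (joint[c] / z if z > 0 else 1.0 / len(classes)) for c in classes}


for order in range(4):
    print(order, posterior(rows, example, ["sick", "well"], order))
```

With `n = 0` the single empty parent set reproduces NB, with `n = 1` the loop averages the SPODEs of AODE, and with `n` equal to the number of attributes the estimate is the direct relative frequency of $(y, x)$, which is the case covered by Observation 1.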
However, assuming there is sufficient data to compute the necessary statistics, and we wish to store the necessary statistics rather than computing them as required for classification, the space complexity of AaDE is $O\left(k \prod_{i=1}^{a} v_i\right)$, with $v_i$ the number of values of attribute $i$. This is because joint frequencies must be stored for every combination of attribute value and class value. Except in cases of low dimensional data, even the computational requirements of A3DE defeat our Weka implementation, and hence in this paper we present only results for A2DE with some illustrative examples of A3DE.

AnDE forms an $(n+2)$-dimensional probability table containing the observed frequency for each combination of $n+1$ attribute values and the class values. The space complexity of the table is $O\left(k \binom{a}{n+1} v^{n+1}\right)$ and the time complexity of compiling it is $O\left(t \binom{a}{n+1}\right)$, where $t$ is the number of training examples, as we need to update each entry for every combination of the $n+1$ attribute-values for every instance. The time complexity of classifying a single example is $O\left(k a \binom{a}{n}\right)$, as we need to consider each attribute for every qualified combination of $n$ parent attributes within each class.

We demonstrate that as $n$ increases, averaged n-dependence estimators achieve lower bias at the cost of higher variance. In consequence, the ideal order of dependence will depend on the degree to which the underlying probability distribution fits the assumptions of the n-dependence estimator, the quantity of data available to estimate the base probabilities, and the computational demands of averaging over higher-order estimators.

2.3 Weighted averaging

AODE and its generalization AnDE perform an unweighted average of the component n-dependence estimators. It has been demonstrated that weighted averaging can improve upon the accuracy of AODE's estimates [27, 28, 29]. The empirical evidence suggests that the Bayesian model averaging of Maximum a Posteriori Linear Mixture of Generative Distributions (MAPLMG) is the most effective of current approaches [27, 29]. It seems likely that similar approaches will be equally effective with n-dependence estimators.

It is notable that the introduction of Bayesian model averaging to the AnDE framework introduces both search and discriminative learning, as a search is performed for the set of weights that optimize the posterior probabilities relative to the training data. Doing so can be expected to reduce bias at the cost of introducing variance.

One of the interesting questions that this paper investigates is the relative payoff for the investment of additional computation in either performing Bayesian model averaging on AnDE or increasing $n$ and using A(n+1)DE. Both approaches can be expected to reduce bias at the cost of an increase in both variance and computation. Which provides the more effective trade-off?

2.4 Tree Augmented Naive Bayes

An n-dependence Bayesian classifier (n-DBC) [30] is a Bayesian network in which each attribute depends upon the class and at most $n$ other non-class attributes. An n-DBC uses $(n+2)$-order probabilities. Within this framework, NB is a 0-DBC, AODE is a 1-DBC and the full Bayesian classifier is an $(a-1)$-DBC.

An alternative to the AnDE approach to relaxing NB's independence assumption is to use search to select a single model that adds selected interdependencies between attributes. Tree Augmented Naive Bayes (TAN) [31] is a popular approach of this type. It uses conditional mutual information to select a best single parent for each attribute, in addition to the class. Thus, it is a 1-DBC.

It is interesting to consider how search for a single Bayesian classifier model compares with averaging over a class of Bayesian classifier models of the same level of dependence or of a higher level of dependence. This paper also investigates this issue.

3 Relationship to Feating

Feating [26] is a generic ensemble learning technique that also builds upon the ensembling strategy of AODE. Like AnDE, Feating operates by building a local model for each combination of $n$ attribute values. To classify a new instance, Feating applies all applicable local models and aggregates the results by performing a majority vote of the resulting classifications. AnDE is similar to Feating NB. However, Feating aggregates the predictions of its base learners by taking the mode of the class predictions. For probabilistic classifiers, these correspond to the maximum posterior probability. In contrast, AnDE uses the ensemble to estimate the joint probability, $P(x, y)$, for each class, and then calculates its estimate of the posterior probability from this ensembled estimate of the joint probability. A generic ensembling technique, such as Feating, cannot work by calculating an ensemble estimate of the joint probabilities because many classifiers do not produce appropriate probability estimates.

Despite the close relationship to Feating, AnDE is worthy of study in its own right for three reasons. First, irrespective of which aggregation method is used, coupling the search-less ensembling strategy embodied by Feating with the search-less base learner NB creates a learner that can deliver low bias without search. Hence, AnDE provides an example of an alternative to the traditional search-based learning paradigm which is able to deliver low bias classifiers.

Second, as already noted, AnDE utilizes a different aggregation method to Feating. It is interesting to examine the consequences of these differences. Cerquides and de Mantaras [27] found that weighted ensembles of joint probability estimates achieved lower error than weighted ensembles of posterior probability estimates, so there is some evidence that the outcomes may be substantially different.

Third, as there is overlap in the information required by each of its local models, AnDE can use a single compiled matrix of joint frequencies, saving considerable space relative to storing all of the local models independently. The space complexity of an AnDE model is $O\left(k \binom{a}{n+1} v^{n+1}\right)$ whereas the space complexity of Feating NB to level $n$ is $O\left(k (a-n) \binom{a}{n} v^{n+1}\right)$, which is $(n+1)$ times the space complexity of AnDE, since $(a-n)\binom{a}{n} = (n+1)\binom{a}{n+1}$. Most base models formed by Feating will not have this property, and hence AnDE is a notable special case.

4 Evaluation

In this section, we evaluate the efficacy of AnDE. Due to the relatively high time complexity of higher-order estimators, the highest level of AnDE with which we perform detailed assessment is A2DE. The primary metrics we use are bias, variance, zero-one loss and RMSE. To assess computational overheads we use total training and classification times divided by the number of examples.

We first study the performance of NB, AODE and A2DE to reveal how performance varies as $n$ increases within the AnDE framework. TAN and MAPLMG are studied to show how the search-free generative AnDE strategy compares with, respectively, discriminative search for a single Bayesian network classifier of the same order of dependence, and discriminative search for a weighted classifier of the next lower order of dependence. We also compare A2DE with variants that calculate the mean posterior probability, rather than the mean joint probability, of the submodels (PA2DE) and that perform Feating of NB by taking the mode of the class predictions of the submodels (FA2DE). Finally, to explore how the classification performance of A2DE compares to state-of-the-art classifiers, we also study Random Forest [32] with ten trees (RF10) and Random Forest with 100 trees (RF100).

We compare these algorithms implemented in the Weka workbench [33] on the 62 datasets described in Table 1 that have been used previously in related research [2, 34, 35, 36, 37, 38]. Each algorithm is tested on each dataset using the repeated cross-validation bias-variance estimation method [39]. In order to maximize the variation in the training data from trial to trial, we use two-fold cross validation. To minimize the variance in our measurements, we report average values over 50 cross-validation trials. We also form learning curves for NB, AODE, A2DE and A3DE on the Adult dataset to further investigate how increasing $n$ within the AnDE framework affects performance as the quantity of data increases.

The current implementations of AODE and A2DE are limited to categorical data. A number of approaches have been developed for extending AODE to numeric data [40]. These could be generalized to the AnDE framework, but how best to do so is a matter for future research. Hence, we assess only the relative capacities of these algorithms with respect to categorical data. To this end, all numeric attributes are discretized. When MDL discretization [41], a common discretization method for NB, is used to discretize quantitative attributes within each cross-validation fold, many attributes have only one value. In these experiments, we discretize quantitative attributes using three-bin equal-frequency discretization prior to classification.

Table 1: Datasets used for experiments

| No. | Domain | Case | Att | Class | No. | Domain | Case | Att | Class |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Abalone | 4177 | 9 | 3 | 32 | Letter Recognition | 20000 | 17 | 26 |
| 2 | Adult | 48842 | 15 | 2 | 33 | Liver Disorders (Bupa) | 345 | 7 | 2 |
| 3 | Annealing | 898 | 39 | 6 | 34 | Lung Cancer | 32 | 57 | 3 |
| 4 | Audiology | 226 | 70 | 24 | 35 | Lymphography | 148 | 19 | 4 |
| 5 | Auto Imports | 205 | 26 | 7 | 36 | MAGIC Gamma Telescope | 19020 | 11 | 2 |
| 6 | Balance Scale | 625 | 5 | 3 | 37 | Mushrooms | 8124 | 23 | 2 |
| 7 | Breast Cancer (Wisconsin) | 699 | 10 | 2 | 38 | Nettalk (Phoneme) | 5438 | 8 | 52 |
| 8 | Car Evaluation | 1728 | 8 | 4 | 39 | New-Thyroid | 215 | 6 | 3 |
| 9 | Census-Income (KDD) | 299285 | 40 | 2 | 40 | Nursery | 12960 | 9 | 5 |
| 10 | Connect-4 Opening | 67557 | 43 | 3 | 41 | Optical Digits | 5620 | 49 | 10 |
| 11 | Contact-lenses | 24 | 5 | 3 | 42 | Page Blocks | 5473 | 11 | 5 |
| 12 | Contraceptive Method Choice | 1473 | 10 | 3 | 43 | Pen Digits | 10992 | 17 | 10 |
| 13 | Covertype | 581012 | 55 | 7 | 44 | Pima Indians Diabetes | 768 | 9 | 2 |
| 14 | Credit Screening | 690 | 16 | 2 | 45 | Postoperative Patient | 90 | 9 | 3 |
| 15 | Cylinder Bands | 540 | 40 | 2 | 46 | Primary Tumor | 339 | 18 | 22 |
| 16 | Dermatology | 366 | 35 | 6 | 47 | Promoter Gene Sequences | 106 | 58 | 2 |
| 17 | Echocardiogram | 131 | 7 | 2 | 48 | Segment | 2310 | 20 | 7 |
| 18 | German | 1000 | 21 | 2 | 49 | Sick-euthyroid | 3772 | 30 | 2 |
| 19 | Glass Identification | 214 | 10 | 3 | 50 | Sign | 12546 | 9 | 3 |
| 20 | Haberman's Survival | 306 | 4 | 2 | 51 | Sonar Classification | 208 | 61 | 2 |
| 21 | Heart Disease (Cleveland) | 303 | 14 | 2 | 52 | SPAM E-mail | 4601 | 58 | 2 |
| 22 | Hepatitis | 155 | 20 | 2 | 53 | Splice-junction Gene Sequences | 3190 | 62 | 3 |
| 23 | Horse Colic | 368 | 22 | 2 | 54 | Syncon | 600 | 61 | 6 |
| 24 | House Votes 84 | 435 | 17 | 2 | 55 | Teaching Assistant Evaluation | 151 | 6 | 3 |
| 25 | Hungarian | 294 | 14 | 2 | 56 | Tic-Tac-Toe Endgame | 958 | 10 | 2 |
| 26 | Hypothyroid (Garavan) | 3772 | 30 | 4 | 57 | Vehicle | 846 | 19 | 4 |
| 27 | Ionosphere | 351 | 35 | 2 | 58 | Volcanoes | 1520 | 4 | 4 |
| 28 | Iris Classification | 150 | 5 | 3 | 59 | Vowel | 990 | 14 | 11 |
| 29 | King-rook-vs-king-pawn | 3196 | 37 | 2 | 60 | Waveform-5000 | 5000 | 41 | 3 |
| 30 | Labor negotiations | 57 | 17 | 2 | 61 | Wine Recognition | 178 | 14 | 3 |
| 31 | LED | 1000 | 8 | 10 | 62 | Zoo | 101 | 17 | 7 |

The base probabilities are estimated using m-estimation [42] ($m = 1$), as it
often appears to lead to more accurate probabilities than Laplace estimation for NB and AODE. An exception is that we always use 1.0 for $P(x_i \mid y, x_s)$ when $x_i \in x_s$.

The above experiments were conducted on a single-CPU, single-core virtual Linux machine running on a Dell PowerEdge 1950 with dual quad-core Intel Xeon E5410 processors running at 2333 MHz with 32 GB of RAM. Due to technical issues including memory leaks in the Weka implementation of Random Forest, it was not possible to complete all 50 runs of 2-fold cross validation for RF10 on Covertype and RF100 on Covertype and Census-Income (KDD). These experiments were instead completed on a Linux cluster of Xeon 2.8 GHz CPUs, an environment that does not allow reliable time measurements to be taken. For RF10 and RF100 on Covertype, compute times were estimated by averaging over those runs that could be completed on the virtual machine. No runs could be completed on the virtual machine for RF100 on Census-Income (KDD) and so no time results are reported.

Average values for each combination of metric, algorithm and dataset are provided in the Appendix. Summary results are provided in the text.

4.1 Varying n within AnDE

We first consider the relative performance of the three variants of AnDE. For each performance measure, the number of datasets for which A2DE has lower, equal or higher outcomes relative to AODE and NB are summarized into win/draw/loss records, and likewise for AODE relative to NB. These results are presented in Table 2. As expected, we see that increasing $n$ from 0 (NB) to 1 (AODE) to 2 (A2DE) consistently decreases bias at the cost of an increase in variance.

As we believe that different bias and variance profiles suit different data quantities [1], we believe that the zero-one loss and RMSE results tell us as much about the composition of the data collection as they do about the algorithms. Specifically, we contend that whether one algorithm or another will win on a given dataset is determined by how well the two algorithms' learning biases match the underlying distribution, by their variance, and by the quantity of data. A low variance algorithm will usually have an advantage for small data while a low bias algorithm will usually be advantaged by large data. For our datasets, both AODE and A2DE reduce both zero-one loss and RMSE significantly often relative to NB. While A2DE obtains lower zero-one loss and RMSE than AODE more often than the reverse, this difference is not significant.

Table 2: Win/Draw/Loss: AnDE, n = 0, 1 and 2, on all 62 datasets

| | A2DE vs AODE W/D/L | p | A2DE vs NB W/D/L | p | AODE vs NB W/D/L | p |
|---|---|---|---|---|---|---|
| Bias | 47/0/15 | 0.001 | 49/2/11 | 0.001 | 48/0/14 | 0.001 |
| Variance | 19/1/42 | 0.001 | 15/0/47 | 0.001 | 20/1/41 | 0.005 |
| Zero-one loss | 33/2/27 | 0.259 | 42/1/19 | 0.002 | 44/1/17 | 0.001 |
| RMSE | 35/1/26 | 0.153 | 15/0/47 | 0.001 | 49/1/12 | 0.001 |

To investigate in greater detail our expectation that algorithms with lower variance will be advantaged for small data and those with lower bias for larger data, we form learning curves for Adult, replicating the method of Webb et al. [2]. 1000 objects are selected at random as a test set and training sets are sampled from the remaining objects. The training set size starts from 23 and then doubles up to 47104, this being a progression that ends with as close to all the available data as possible once the 1000 test cases are removed. This process is repeated 50 times and each algorithm is evaluated on the resulting training-test set pairs. The learning curves of zero-one loss and RMSE for NB, AODE, A2DE and A3DE are presented in Figure 2.

Figure 2: Zero-one loss and RMSE of NB and AnDE on the Adult dataset, as a function of training set size. (Two panels, Error % and RMSE %, each plotted against training set size for NB, AODE, A2DE and A3DE.)

The plots for zero-one loss clearly show the predicted trade-off for increasing $n$. At the smallest data size, where low variance is more important than low bias, zero-one loss is minimized by $n = 0$ (NB) and increases as $n$ increases. At the largest data size, where low bias is most important, this order is reversed. A similar trend is shown with respect to RMSE, although the algorithms have not yet achieved their asymptotic rates at the largest data sizes available.

Figure 3: Average per-example training and classification times for NB, AODE and A2DE. (Two panels, Training Time and Classification Time, showing average time in seconds for each algorithm.)

It is interesting to see how the relative bias/variance trade-offs of increasing $n$ play off when NB's attribute independence assumption holds. The LED
dataset has a specific configuration of attribute-values for each class, making the attributes conditionally independent given the class. Each attribute has 10% noise added. AODE and A2DE are able to overfit this noise, leading to increased error. NB's 0-1 loss is 0.2627, AODE's is 0.2639 and A2DE's is 0.2667. These outcomes are with training set sizes of 500. Using the UCI data generator, we generated 10 LED datasets comprising 2,000 instances and 10 comprising 4,000 instances and repeated the cross-validation experiments thereon. For the datasets of 2,000, the training set size is 1,000 and the mean and standard deviation of the respective 0-1 loss is NB: 0.2603 ± 0.0099, AODE: 0.2601 ± 0.0101 and A2DE: 0.2603 ± 0.0102. For the datasets of 4,000, the training set size is 2,000 and the mean and standard deviation of the respective 0-1 loss is NB: 0.2597 ± 0.0049, AODE: 0.2598 ± 0.0051 and A2DE: 0.2603 ± 0.0053. It seems clear that increasing training set sizes rapidly reduce the magnitude of the error advantage that NB enjoys in this context where its conditional attribute independence assumption is satisfied.

As a final confirmation that higher $n$ is best suited to larger data, on the ten largest datasets, those with more than 8,000 examples, A2DE consistently achieves lower 0-1 loss and RMSE than AODE (p = 0.001).

As expected, both training and classification compute times increase as $n$ increases. Figure 3 shows the grand averages for the per-example training and classification times for each algorithm.

4.2 Comparison with TAN

We wish to explore the relative benefits of discriminative search for a single best Bayesian classifier model against AnDE's search-free approach of averaging over a class of Bayesian classifier models. To this end we compare AODE and A2DE with TAN. Table 3 presents win/draw/loss results comparing AODE and A2DE to TAN.

Table 3: Win/Draw/Loss: AnDE, n = 1 and 2 vs TAN on all 62 datasets

| | A2DE vs TAN W/D/L | p | AODE vs TAN W/D/L | p |
|---|---|---|---|---|
| Bias | 34/0/28 | 0.263 | 20/1/41 | 0.005 |
| Variance | 48/0/14 | 0.001 | 52/1/9 | 0.001 |
| Zero-one loss | 48/0/14 | 0.001 | 43/1/18 | 0.001 |
| RMSE | 43/1/18 | 0.001 | 40/1/21 | 0.010 |

TAN has an advantage in bias but a disadvantage in variance relative to AODE. When using search to select a single 1-DBC model is compared to averaging over a class of 2-DBCs, the bias advantage is lost but the variance
disadvantage remains. The relative bias-variance trade-offs of AODE and TAN result in a general error advantage to AODE. When TAN loses its bias advantage by moving from AODE to A2DE, the error advantage of the AnDE classifier becomes even more consistent.

Figure 4 shows the relative training and classification times for AODE, A2DE and TAN. It is clear that A2DE has considerably greater computational requirements both for training and classification. However, this disadvantage disappears when we consider only the ten lowest dimensional datasets, also illustrated in this figure.

Figure 4: Average per-example training and classification times for AODE, A2DE and TAN. Two values are shown for each algorithm, the average across all datasets and the average across the ten lowest-dimensional (4-7 attributes) datasets. (Two panels, Training Time and Classification Time, showing average time in seconds.)

4.3 Comparison with MAPLMG

As discussed above, we wish to investigate the relative payoffs obtained by investing computation beyond that required by AODE, either by using discriminative learning of weights or by increasing the order of the probabilities from which the posterior probability is extrapolated. To this end, Table 4 presents win/draw/loss results comparing A2DE and AODE to MAPLMG.

Table 4: Win/Draw/Loss: AnDE, n = 1 and 2 vs MAPLMG on all 62 datasets

| | A2DE vs MAPLMG W/D/L | p | AODE vs MAPLMG W/D/L | p |
|---|---|---|---|---|
| Bias | 40/0/22 | 0.015 | 17/4/41 | 0.001 |
| Variance | 19/1/42 | 0.002 | 36/5/21 | 0.031 |
| Zero-one loss | 30/1/31 | 0.500 | 22/4/36 | 0.043 |
| RMSE | 34/1/28 | 0.263 | 19/0/39 | 0.006 |

As established by previous research [27], MAPLMG's approach of using discriminative learning of weights for the AODE linear combination significantly reduces bias relative to AODE at the cost of an increase in variance. However, relative to this discriminative approach to extrapolating from third-order probabilities, A2DE's search-less approach to extrapolating from fourth-order probabilities further reduces bias at the cost of an increase in variance. While the resulting difference in error is not significant across the full suite of 62 datasets, when the ten largest datasets are considered, the lower bias algorithm, A2DE, consistently achieves lower 0-1 loss and RMSE than MAPLMG (p = 0.001).

MAPLMG's Bayesian model averaging comes at considerable cost in training time. Figure 5 shows the average per-example training and test times for AODE, A2DE and MAPLMG. Note that MAPLMG is implemented as an external function to Weka, and hence is likely to be inherently more efficient. The training and test times include a substantial fixed overhead, and hence the per-instance training times should decrease if the complexity is linear with respect to the training set size. However, MAPLMG's super-linear training complexity minimizes this effect, demonstrating that it will not be feasible to apply it to very large data.

Figure 5: Average per-example training and classification times for AODE, A2DE and MAPLMG. In addition to the times for all datasets, training times are shown for the ten largest datasets. (Two panels, Training Time and Classification Time, showing average time in seconds.)

Table 5: Win/Draw/Loss: A2DE vs PA2DE and FA2DE

| | A2DE vs PA2DE W/D/L | p | A2DE vs FA2DE W/D/L | p |
|---|---|---|---|---|
| Bias | 41/2/19 | 0.001 | 46/1/15 | 0.001 |
| Variance | 21/3/38 | 0.018 | 22/1/39 | 0.020 |
| Zero-one loss | 33/1/28 | 0.304 | 36/2/24 | 0.078 |
| RMSE | 30/0/32 | 0.449 | 49/0/13 | 0.001 |
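The text of this section does not state how the p values in the win/draw/loss tables are computed. The reported values look consistent with a one-tailed binomial sign test that ignores draws: for example, 30/1/31 in Table 4 gives exactly 0.500 and 33/2/27 in Table 2 gives about 0.259. The Python sketch below assumes that reading; the function name `sign_test_p` is hypothetical and the choice of test is an inference from the numbers, not a statement of the authors' procedure.

```python
from math import comb


def sign_test_p(wins, losses):
    """One-tailed binomial sign test with draws excluded (assumed procedure).

    Returns P(X >= max(wins, losses)) for X ~ Binomial(wins + losses, 0.5),
    i.e. the chance of a split at least this lopsided if neither algorithm
    were more likely to win on any given dataset.
    """
    n = wins + losses
    k = max(wins, losses)
    # Sum the upper tail exactly with integer binomial coefficients
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n


# Win/loss counts taken from the tables above (draws dropped)
for label, w, l in [("Table 2, A2DE vs AODE, zero-one loss", 33, 27),
                    ("Table 4, A2DE vs MAPLMG, zero-one loss", 30, 31)]:
    print(label, round(sign_test_p(w, l), 3))
```

Under this reading, a clean sweep of ten datasets gives 1/1024, which matches the p = 0.001 quoted for the ten largest datasets.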
4.4 Comparison with Feating

To understand how the AnDE approach performs relative to Feating NB, we compare A2DE with a variant that calculates the mean of the posterior probabilities, rather than the mean of the joint probabilities (PA2DE), and one that calculates the mode of the classes predicted by the submodels (FA2DE). As described in Section 3 and the start of Section 4, these embody the two main differences between AnDE and Feating NB. Table 5 shows the win/draw/loss results comparing A2DE to these variants.

It is clear that both variants have higher bias but lower variance than A2DE. It is straightforward to understand why Feating would have lower variance. The mode is a much more stable estimator of central tendency than the mean, which can be greatly influenced by a single outlier. It is less obvious why lower variance should result from averaging over the estimates of the posterior rather than of the joint probability. Nonetheless, the result is consistent with Cerquides and de Mantaras' [27] finding that a linear combination of joint probability estimates resulted in higher accuracy than a linear combination of posterior probability estimates. This remains an interesting unexplained phenomenon worthy of further investigation.

Over the full range of datasets these differences in bias and variance profiles do not result in significant differences on either measure of error, except with respect to RMSE for Feating. This reflects the manner in which Feating selects a single class rather than producing a distribution of class probabilities.

Due to its lower bias, A2DE achieves lower 0-1 loss than PA2DE and FA2DE on the ten largest datasets (WDL vs PA2DE = 8/0/2, p = 0.054; WDL vs FA2DE = 8/1/1, p = 0.019). This outcome is statistically significant at the 0.05 level with respect to FA2DE, but narrowly misses out on being significant with respect to PA2DE. A2DE achieves lower RMSE than both the alternatives on nine of the ten largest datasets, and draws on the remaining dataset, p = 0.002. Hence, the evidence is strongly suggestive that the AnDE approach is preferable for large data.

4.5 Comparison with the state-of-the-art

In addition to the relative performance of these related algorithms, it is useful to understand how the performance compares to well known examples of the state-of-the-art. We choose Random Forests [32] as the comparator algorithm because it is relatively unparameterized and hence readily produces clearly understood performance outcomes. We use Random Forests with both the default setting of 10 trees (RF10) and with 100 trees (RF100), allowing us to explore the relative computational/accuracy trade-offs.

Table 6 shows the win/draw/loss results for each of A2DE, AODE and NB against RF10 and RF100 for each of zero-one loss, bias, variance and RMSE.

Table 6: Win/Draw/Loss: AnDE, n = 0, 1 and 2, vs RF10 and RF100 on all 62 datasets

| Algorithm | Measure | AnDE vs RF10 W/D/L | p | AnDE vs RF100 W/D/L | p |
|---|---|---|---|---|---|
| A2DE | Bias | 18/1/43 | 0.001 | 22/2/38 | 0.026 |
| A2DE | Variance | 57/0/5 | 0.001 | 45/1/16 | 0.001 |
| A2DE | Zero-one loss | 42/0/20 | 0.004 | 36/3/23 | 0.059 |
| A2DE | RMSE | 40/0/22 | 0.015 | 35/0/27 | 0.187 |
| AODE | Bias | 16/0/46 | 0.001 | 20/0/42 | 0.004 |
| AODE | Variance | 57/1/4 | 0.001 | 47/0/15 | 0.001 |
| AODE | Zero-one loss | 41/0/21 | 0.008 | 33/1/28 | 0.304 |
| AODE | RMSE | 39/0/23 | 0.028 | 34/0/28 | 0.263 |
| NB | Bias | 14/1/47 | 0.001 | 16/1/45 | 0.001 |
| NB | Variance | 56/0/6 | 0.001 | 51/0/11 | 0.001 |
| NB | Zero-one loss | 33/0/29 | 0.352 | 30/1/31 | 0.500 |
| NB | RMSE | 30/0/32 | 0.450 | 28/0/34 | 0.263 |

All three levels of AnDE have higher bias but lower variance than both levels of Random Forest. This trade-off delivers lower error significantly more often than not for both A2DE and AODE relative to RF10. Both deliver lower error more often than RF100, but not significantly so. Notably, NB achieves higher error almost as often as lower relative to both RF10 and RF100. This illustrates the weaknesses of such 'bake-offs' with respect to error. As we have argued above, low variance algorithms such as NB will be advantaged by the relatively small datasets used in this study.

To assess this effect, we repeated the error comparisons using only the ten largest datasets, those containing more than 8000 examples. The results are shown in Table 7. For these larger datasets, both RF10 and RF100 achieve lower error more often than all three of A2DE, AODE and NB, significantly so with respect to AODE and NB and when comparing RF100 to A2DE on 0-1 loss.

Table 7: Win/Draw/Loss: AnDE, n = 0, 1 and 2, vs RF10 and RF100 on large datasets

| Algorithm | Measure | AnDE vs RF10 W/D/L | p | AnDE vs RF100 W/D/L | p |
|---|---|---|---|---|---|
| A2DE | Zero-one loss | 2/0/8 | 0.055 | 1/1/8 | 0.020 |
| A2DE | RMSE | 3/0/7 | 0.172 | 2/0/8 | 0.055 |
| AODE | Zero-one loss | 0/0/10 | 0.001 | 0/0/10 | 0.001 |
| AODE | RMSE | 1/0/9 | 0.011 | 0/0/10 | 0.001 |
| NB | Zero-one loss | 0/0/10 | 0.001 | 0/0/10 | 0.001 |
| NB | RMSE | 0/0/10 | 0.001 | 0/0/10 | 0.001 |
However, Random Forest's error advantage for large data comes at a cost in training time. Figure 6 shows the training and classification times for AODE, A2DE, RF10 and RF100. It is apparent that, overall, RF100 has very high training times. While A2DE's training time does approach RF100's for high dimensional data, for small data and low dimensional data its training times are competitive with RF10. On the other hand, A2DE requires substantially more classification time on average than Random Forest. This requirement grows greatly with high-dimensional data. A2DE will not be feasible for classification of large numbers of high-dimensional objects. In contrast, its classification time is very competitive with low-dimensional data.

5 Conclusions and Directions for Future Research

AnDE provides an attractive framework for developing machine learning techniques. A single parameter n controls a bias-variance trade-off such that n = a provides a classifier whose asymptotic error is the Bayes optimal error rate. However, for high-dimensional data only very low-order forms of AnDE are feasible. Nonetheless, we have established that higher-order variants are likely to deliver greater accuracy than lower-order alternatives when the number of training examples is high. In consequence, a promising direction for future research is to develop computationally efficient techniques for approximating AnDE for high values of n.

A further unresolved issue is how to select an appropriate value of n for any specific dataset D. Are there more computationally efficient approaches than a simple wrapper-based comparison of each possible value?

A number of techniques have been developed for extending AODE to handle numeric data [40]. There is a need to extend this work to the more general AnDE framework.

Figure 6: Average per-example training and classification times for AODE, A2DE, RF10 and RF100. Training times are presented for all, the ten largest (excluding Census Income, for which RF100 could not be executed on a machine for which reliable times could be obtained, thus 5,620 to 581,012 examples), the ten lowest dimensional (5 to 7 attributes) and the ten highest dimensional (43 to 70 attributes) datasets. Classification times are presented for all, the ten lowest dimensional and the ten highest dimensional datasets.
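The growth of A2DE's classification time with dimensionality, noted in Section 4.5, can be made concrete by counting how many submodels AnDE averages over. The Python sketch below is a rough illustration only. It assumes that each of the C(a, n) subsets of n parent attributes defines one submodel, and that each submodel evaluates one conditional probability per remaining attribute and per class. The function name `ande_lookups` and this cost model are assumptions, not measurements or definitions taken from the paper.

```python
from math import comb


def ande_lookups(num_attributes: int, n: int, num_classes: int) -> int:
    """Approximate count of conditional-probability lookups per classified example.

    Assumed cost model: every subset of n parent attributes defines one
    submodel, and each submodel multiplies one term for each of the
    remaining (a - n) attributes, for every class.
    """
    a = num_attributes
    if n > a:
        raise ValueError("n cannot exceed the number of attributes")
    submodels = comb(a, n)
    return num_classes * submodels * (a - n)


if __name__ == "__main__":
    # Attribute counts taken from the ranges quoted in the Figure 6 caption.
    for a in (5, 7, 43, 70):
        nb, aode, a2de = (ande_lookups(a, n, num_classes=2) for n in (0, 1, 2))
        print(f"a={a:2d}  NB={nb:6d}  AODE={aode:6d}  A2DE={a2de:7d}")
```

Under this model the count for n = 2 grows roughly with the cube of the number of attributes, which matches the observation that A2DE classifies low-dimensional data quickly but becomes expensive on high-dimensional data.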
[Figure 6 panels: Training Time (all datasets, largest, low-dimensional, high-dimensional) and Classification Time (all datasets, high-dimensional, low-dimensional); vertical axis: average time (seconds); horizontal axis: AODE, A2DE, RF10, RF100.]

We have presented a strategy for learning without search. We do not argue, however, that search should necessarily be avoided. Indeed, it has been demonstrated that both appropriate feature selection [43, 44] and weighting of the submodels [27, 28, 29] can reduce the error of AODE. Therefore, it is likely to be worthwhile to explore efficient methods for each of these strategies for higher values of n. If fast classification is required, and time for training is less constrained, approaches that use search to select a small number of submodels from an AnDE model are likely to be desirable. Where there is sufficient training time available, search for appropriate submodel weights is also likely to be useful.

We have developed a generative learning algorithm that generalizes the principles that underlie AODE to ever higher levels of dependence. It has the following desirable features:

- computational complexity is linear with respect to the number of training examples;
- direct prediction of class probabilities;
- integrated handling of missing values;
- robustness in the face of noise;
- other than the choice of which instantiation (choice of n) and choice of smoothing technique, the approach uses no tuneable parameters: there is no model selection;
- a simple mechanism controls the bias/variance trade-off;
- incremental learning;
- learning and classification can readily utilize parallel computation; and
- there is a direct theoretical basis that provides optimal prediction except insofar as clearly specified assumptions are violated.

A single parameter n provides control over a bias-variance trade-off, such that higher values of n are appropriate for greater numbers of training cases. AnDE demonstrates that it is possible to develop competitive learners without using search. Of further interest, this family of algorithms shows that it is possible to develop low bias algorithms in a generative framework. Finally, A2DE proves to be a computationally tractable version of AnDE that delivers strong classification accuracy for large data without any parameter tuning.

Acknowledgements

This research has been supported by Australian Research Council grant DP110101427. We are grateful to Joao Gama and Kevin Korb for insightful discussions on this research and feedback on drafts of this paper.

A Detailed Results

Detailed results for Bias, Variance, 0-1 Loss, RMSE, Training Time and Classification Times are presented in Tables 8 to 13. The datasets are listed in ascending order on number of instances.

References

[1] Brain, D., Webb, G.I.: The need for low bias algorithms in classification learning from large data sets. In: Proceedings of the Sixth European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2002), Berlin, Springer-Verlag (2002) 62-73
[2] Webb, G.I., Boughton, J., Wang, Z.: Not so naive Bayes: Aggregating one-dependence estimators. Machine Learning 58(1) (2005) 5-24
[3] Nikora, A.P.: Classifying requirements: Towards a more rigorous analysis of natural-language specifications. In: Proceedings of the Sixteenth IEEE International Symposium on Software Reliability Engineering, Washington, DC, USA, IEEE Computer Society (2005) 291-300
[4] Camporelli, M.: Using a Bayesian classifier for probability estimation: Analysis of the AMIS score for risk stratification in myocardial infarction. Diploma Thesis, Department of Informatics, University of Zurich (2006)
Table 8: Bias

Dataset | NB | AODE | A2DE | PA2DE | FA2DE | TAN | MAPLMG | RF10 | RF100
Contact-lenses | 0.1897 | 0.1970 | 0.1967 | 0.2018 | 0.2002 | 0.3245 | 0.1913 | 0.1710 | 0.1728
Lung Cancer | 0.3413 | 0.3423 | 0.3443 | 0.3702 | 0.3726 | 0.3174 | 0.3428 | 0.3454 | 0.3739
Labor negotiations | 0.0555 | 0.0608 | 0.0650 | 0.0662 | 0.0654 | 0.0807 | 0.0627 | 0.1009 | 0.1011
Postoperative Patient | 0.2730 | 0.2893 | 0.2949 | 0.2941 | 0.2923 | 0.2717 | 0.2878 | 0.2809 | 0.2907
Zoo | 0.0382 | 0.0308 | 0.0311 | 0.0317 | 0.0315 | 0.0518 | 0.0308 | 0.0302 | 0.0329
Promoter Gene Sequences | 0.0692 | 0.0902 | 0.2299 | 0.1624 | 0.1737 | 0.0808 | 0.0934 | 0.0901 | 0.0576
Echocardiogram | 0.2653 | 0.2678 | 0.2628 | 0.2613 | 0.2637 | 0.2416 | 0.2616 | 0.2500 | 0.2543
Lymphography | 0.1286 | 0.1202 | 0.1164 | 0.1153 | 0.1181 | 0.1256 | 0.1185 | 0.1307 | 0.1373
Iris Classification | 0.0308 | 0.0286 | 0.0296 | 0.0307 | 0.0293 | 0.0270 | 0.0288 | 0.0306 | 0.0308
Teaching Assistant Evaluation | 0.3567 | 0.3075 | 0.2997 | 0.2999 | 0.3066 | 0.3280 | 0.3109 | 0.2969 | 0.3001
Hepatitis | 0.1368 | 0.1262 | 0.1230 | 0.1276 | 0.1267 | 0.1008 | 0.1244 | 0.1307 | 0.1327
Wine Recognition | 0.0430 | 0.0387 | 0.0405 | 0.0391 | 0.0413 | 0.0406 | 0.0388 | 0.0434 | 0.0435
Auto Imports | 0.2615 | 0.1775 | 0.1680 | 0.1724 | 0.1740 | 0.1996 | 0.1768 | 0.1380 | 0.1362
Sonar Classification | 0.2227 | 0.1610 | 0.1422 | 0.1501 | 0.1508 | 0.1439 | 0.1593 | 0.1221 | 0.1344
Glass Identification | 0.2133 | 0.2124 | 0.2110 | 0.2075 | 0.2105 | 0.2142 | 0.2126 | 0.2041 | 0.2074
New-Thyroid | 0.0997 | 0.0944 | 0.0960 | 0.0933 | 0.0927 | 0.1026 | 0.0947 | 0.0981 | 0.0998
Audiology | 0.2193 | 0.2196 | 0.2193 | 0.2179 | 0.2179 | 0.1817 | 0.2196 | 0.1568 | 0.1712
Hungarian | 0.1232 | 0.1225 | 0.1232 | 0.1240 | 0.1237 | 0.1262 | 0.1219 | 0.1458 | 0.1496
Heart Disease (Cleveland) | 0.1490 | 0.1470 | 0.1454 | 0.1483 | 0.1483 | 0.1397 | 0.1468 | 0.1514 | 0.1530
Haberman's Survival | 0.2416 | 0.2408 | 0.2413 | 0.2425 | 0.2420 | 0.2406 | 0.2418 | 0.2342 | 0.2369
Primary Tumor | 0.3811 | 0.3819 | 0.3810 | 0.3827 | 0.3815 | 0.3764 | 0.3833 | 0.3719 | 0.3824
Liver Disorders (Bupa) | 0.3269 | 0.3209 | 0.3204 | 0.3213 | 0.3201 | 0.3137 | 0.3202 | 0.3203 | 0.3242
Ionosphere | 0.1758 | 0.0688 | 0.0615 | 0.0633 | 0.0636 | 0.0476 | 0.0672 | 0.0547 | 0.0585
Dermatology | 0.0107 | 0.0120 | 0.0123 | 0.0113 | 0.0116 | 0.0130 | 0.0119 | 0.0219 | 0.0195
Horse Colic | 0.1735 | 0.1527 | 0.1457 | 0.1446 | 0.1448 | 0.1412 | 0.1516 | 0.1283 | 0.1343
House Votes 84 | 0.0917 | 0.0471 | 0.0367 | 0.0436 | 0.0408 | 0.0450 | 0.0415 | 0.0299 | 0.0339
Cylinder Bands | 0.1548 | 0.1317 | 0.1315 | 0.1378 | 0.1374 | 0.3326 | 0.1293 | 0.1862 | 0.2007
Syncon | 0.1018 | 0.0536 | 0.0436 | 0.0728 | 0.0740 | 0.0300 | 0.0486 | 0.0384 | 0.0382
Balance Scale | 0.1381 | 0.1453 | 0.1505 | 0.1503 | 0.1508 | 0.1432 | 0.1454 | 0.1429 | 0.1436
Credit Screening | 0.1266 | 0.1197 | 0.1191 | 0.1204 | 0.1209 | 0.1181 | 0.1188 | 0.1137 | 0.1165
Breast Cancer (Wisconsin) | 0.0255 | 0.0276 | 0.0263 | 0.0261 | 0.0255 | 0.0269 | 0.0272 | 0.0307 | 0.0314
Pima Indians Diabetes | 0.2499 | 0.2406 | 0.2276 | 0.2259 | 0.2295 | 0.2189 | 0.2393 | 0.2107 | 0.2122
Vehicle | 0.3849 | 0.2948 | 0.2821 | 0.2876 | 0.2854 | 0.2678 | 0.2901 | 0.2404 | 0.2458
Annealing | 0.0908 | 0.0676 | 0.0671 | 0.0647 | 0.0654 | 0.0625 | 0.0673 | 0.0969 | 0.0986
Tic-Tac-Toe Endgame | 0.2509 | 0.2010 | 0.0719 | 0.1003 | 0.1114 | 0.1693 | 0.1961 | 0.0400 | 0.0275
Vowel | 0.3308 | 0.1370 | 0.1194 | 0.1200 | 0.1244 | 0.1674 | 0.1189 | 0.1054 | 0.1056
LED | 0.2261 | 0.2268 | 0.2270 | 0.2272 | 0.2266 | 0.2241 | 0.2269 | 0.2261 | 0.2277
German | 0.2100 | 0.2010 | 0.1970 | 0.2023 | 0.2035 | 0.1840 | 0.2012 | 0.1826 | 0.1977
Contraceptive Method Choice | 0.4683 | 0.4339 | 0.4074 | 0.4168 | 0.4105 | 0.3830 | 0.4247 | 0.3682 | 0.3754
Volcanoes | 0.5034 | 0.4630 | 0.4577 | 0.4577 | 0.4583 | 0.4557 | 0.4656 | 0.4539 | 0.4577
Car Evaluation | 0.1061 | 0.0591 | 0.0411 | 0.0379 | 0.0417 | 0.0482 | 0.0520 | 0.0395 | 0.0388
Segment | 0.1605 | 0.1211 | 0.1162 | 0.1177 | 0.1178 | 0.1104 | 0.1156 | 0.0925 | 0.0933
Splice-junction Gene Sequences | 0.0383 | 0.0304 | 0.0266 | 0.0581 | 0.0569 | 0.0376 | 0.0302 | 0.0355 | 0.0259
King-rook-vs-king-pawn | 0.1074 | 0.0716 | 0.0541 | 0.0722 | 0.0858 | 0.0626 | 0.0458 | 0.0079 | 0.0062
Hypothyroid (Garavan) | 0.0743 | 0.0738 | 0.0737 | 0.0740 | 0.0742 | 0.0746 | 0.0739 | 0.0739 | 0.0741
Sick-euthyroid | 0.0663 | 0.0624 | 0.0613 | 0.0618 | 0.0623 | 0.0589 | 0.0603 | 0.0590 | 0.0590
Abalone | 0.4737 | 0.3925 | 0.3902 | 0.3860 | 0.3903 | 0.3907 | 0.3921 | 0.3745 | 0.3769
SPAM E-mail | 0.3322 | 0.3324 | 0.3323 | 0.3322 | 0.3322 | 0.3310 | 0.3324 | 0.3308 | 0.3312
Waveform-5000 | 0.2070 | 0.1692 | 0.1625 | 0.1854 | 0.1879 | 0.1569 | 0.1629 | 0.1312 | 0.1531
Nettalk (Phoneme) | 0.1957 | 0.1476 | 0.1512 | 0.1322 | 0.1373 | 0.1878 | 0.1361 | 0.1063 | 0.1098
Page Blocks | 0.0721 | 0.0771 | 0.0774 | 0.0767 | 0.0781 | 0.0795 | 0.0776 | 0.0775 | 0.0776
Optical Digits | 0.0804 | 0.0286 | 0.0229 | 0.0410 | 0.0421 | 0.0382 | 0.0279 | 0.0243 | 0.0221
Mushrooms | 0.0205 | 0.0002 | 0.0000 | 0.0000 | 0.0000 | 0.0002 | 0.0002 | 0.0000 | 0.0000
Pen Digits | 0.1562 | 0.0601 | 0.0474 | 0.0516 | 0.0508 | 0.0770 | 0.0531 | 0.0330 | 0.0341
Sign | 0.4530 | 0.4001 | 0.3661 | 0.3685 | 0.3678 | 0.3859 | 0.3833 | 0.3378 | 0.3391
Nursery | 0.0901 | 0.0655 | 0.0433 | 0.0454 | 0.0427 | 0.0546 | 0.0668 | 0.0105 | 0.0084
MAGIC Gamma Telescope | 0.2387 | 0.2171 | 0.1992 | 0.2073 | 0.2057 | 0.2135 | 0.2166 | 0.2045 | 0.2049
Letter Recognition | 0.4450 | 0.3740 | 0.3473 | 0.3568 | 0.3635 | 0.3668 | 0.3697 | 0.2564 | 0.2591
Adult | 0.1777 | 0.1668 | 0.1579 | 0.1652 | 0.1661 | 0.1552 | 0.1600 | 0.1397 | 0.1416
Connect-4 Opening | 0.2646 | 0.2250 | 0.2107 | 0.2421 | 0.2584 | 0.2224 | 0.2203 | 0.1304 | 0.1427
Census-Income (KDD) | 0.2358 | 0.0853 | 0.0604 | 0.1318 | 0.1224 | 0.0628 | 0.0806 | 0.0429 | 0.0447
Covertype | 0.3444 | 0.3180 | 0.3075 | 0.3335 | 0.3444 | 0.3233 | 0.3114 | 0.2363 | 0.2369

Table 9: Variance

Dataset | NB | AODE | A2DE | PA2DE | FA2DE | TAN | MAPLMG | RF10 | RF100
Contact-lenses | 0.1419 | 0.1472 | 0.1675 | 0.1415 | 0.2098 | 0.0972 | 0.1379 | 0.1873 | 0.1647
Lung Cancer | 0.1794 | 0.1921 | 0.1963 | 0.1967 | 0.1967 | 0.2157 | 0.1916 | 0.2596 | 0.2218
Labor negotiations | 0.0393 | 0.0483 | 0.0470 | 0.0415 | 0.0433 | 0.0902 | 0.0468 | 0.1001 | 0.0827
Postoperative Patient | 0.0859 | 0.0782 | 0.0876 | 0.0832 | 0.0822 | 0.1027 | 0.0810 | 0.1091 | 0.0961
Zoo | 0.0343 | 0.0308 | 0.0302 | 0.0302 | 0.0324 | 0.0472 | 0.0308 | 0.0478 | 0.0386
Promoter Gene Sequences | 0.0693 | 0.1119 | 0.1180 | 0.0905 | 0.0897 | 0.1562 | 0.1140 | 0.1661 | 0.0909
Echocardiogram | 0.0636 | 0.0697 | 0.0823 | 0.0858 | 0.0857 | 0.0868 | 0.0742 | 0.1109 | 0.1043
Lymphography | 0.0410 | 0.0444 | 0.0479 | 0.0421 | 0.0443 | 0.1041 | 0.0457 | 0.0898 | 0.0622
Iris Classification | 0.0112 | 0.0110 | 0.0119 | 0.0131 | 0.0152 | 0.0171 | 0.0116 | 0.0126 | 0.0127
Teaching Assistant Evaluation | 0.1649 | 0.1958 | 0.1945 | 0.1980 | 0.1911 | 0.1998 | 0.1965 | 0.2151 | 0.2108
Hepatitis | 0.0237 | 0.0327 | 0.0374 | 0.0333 | 0.0338 | 0.0622 | 0.0334 | 0.0605 | 0.0464
Wine Recognition | 0.0174 | 0.0183 | 0.0207 | 0.0199 | 0.0194 | 0.0364 | 0.0184 | 0.0389 | 0.0203
Auto Imports | 0.1371 | 0.1375 | 0.1378 | 0.1325 | 0.1311 | 0.1273 | 0.1375 | 0.1550 | 0.1289
Sonar Classification | 0.0756 | 0.0883 | 0.0781 | 0.0843 | 0.0847 | 0.1197 | 0.0865 | 0.1380 | 0.0904
Glass Identification | 0.0484 | 0.0400 | 0.0449 | 0.0466 | 0.0455 | 0.0565 | 0.0428 | 0.0618 | 0.0616
New-Thyroid | 0.0192 | 0.0258 | 0.0269 | 0.0247 | 0.0242 | 0.0338 | 0.0256 | 0.0298 | 0.0287
Audiology | 0.1014 | 0.0995 | 0.0986 | 0.0990 | 0.1008 | 0.1588 | 0.0997 | 0.1619 | 0.1104
Hungarian | 0.0249 | 0.0291 | 0.0381 | 0.0370 | 0.0363 | 0.0596 | 0.0313 | 0.0710 | 0.0568
Heart Disease (Cleveland) | 0.0209 | 0.0299 | 0.0395 | 0.0348 | 0.0343 | 0.0635 | 0.0316 | 0.0867 | 0.0708
Haberman's Survival | 0.0411 | 0.0569 | 0.0639 | 0.0646 | 0.0595 | 0.0645 | 0.0505 | 0.0853 | 0.0808
Primary Tumor | 0.1594 | 0.1582 | 0.1607 | 0.1576 | 0.1589 | 0.2001 | 0.1580 | 0.2361 | 0.2029
Liver Disorders (Bupa) | 0.0968 | 0.1075 | 0.1100 | 0.1111 | 0.1133 | 0.1186 | 0.1086 | 0.1183 | 0.1155
Ionosphere | 0.0326 | 0.0117 | 0.0168 | 0.0096 | 0.0097 | 0.0198 | 0.0131 | 0.0378 | 0.0177
Dermatology | 0.0095 | 0.0111 | 0.0112 | 0.0106 | 0.0102 | 0.0288 | 0.0111 | 0.0364 | 0.0179
Horse Colic | 0.0245 | 0.0369 | 0.0436 | 0.0335 | 0.0332 | 0.0587 | 0.0382 | 0.0499 | 0.0313
House Votes 84 | 0.0061 | 0.0096 | 0.0137 | 0.0098 | 0.0113 | 0.0204 | 0.0112 | 0.0173 | 0.0084
Cylinder Bands | 0.1065 | 0.1112 | 0.1073 | 0.0954 | 0.0960 | 0.0703 | 0.1100 | 0.1272 | 0.0910
Syncon | 0.0092 | 0.0199 | 0.0196 | 0.0166 | 0.0163 | 0.0259 | 0.0197 | 0.0582 | 0.0254
Balance Scale | 0.0555 | 0.0606 | 0.0712 | 0.0724 | 0.0718 | 0.0672 | 0.0622 | 0.0828 | 0.0808
Credit Screening | 0.0256 | 0.0270 | 0.0281 | 0.0278 | 0.0272 | 0.0451 | 0.0275 | 0.0578 | 0.0441
Breast Cancer (Wisconsin) | 0.0016 | 0.0095 | 0.0127 | 0.0076 | 0.0073 | 0.0241 | 0.0105 | 0.0138 | 0.0076
Pima Indians Diabetes | 0.0254 | 0.0349 | 0.0508 | 0.0513 | 0.0478 | 0.0576 | 0.0369 | 0.0772 | 0.0729
Vehicle | 0.0857 | 0.0986 | 0.1049 | 0.0997 | 0.1023 | 0.1233 | 0.1024 | 0.1461 | 0.1305
Annealing | 0.0117 | 0.0174 | 0.0169 | 0.0182 | 0.0179 | 0.0206 | 0.0162 | 0.0452 | 0.0383
Tic-Tac-Toe Endgame | 0.0407 | 0.0508 | 0.0536 | 0.0562 | 0.0599 | 0.0787 | 0.0541 | 0.0809 | 0.0490
Vowel | 0.2268 | 0.1637 | 0.1451 | 0.1398 | 0.1444 | 0.2153 | 0.1528 | 0.1424 | 0.1183
LED | 0.0367 | 0.0371 | 0.0397 | 0.0398 | 0.0392 | 0.0465 | 0.0376 | 0.0551 | 0.0509
German | 0.0481 | 0.0540 | 0.0602 | 0.0519 | 0.0507 | 0.0969 | 0.0546 | 0.0993 | 0.0689
Contraceptive Method Choice | 0.0719 | 0.1036 | 0.1315 | 0.1190 | 0.1258 | 0.1592 | 0.1132 | 0.1982 | 0.1875
Volcanoes | 0.0829 | 0.1333 | 0.1415 | 0.1415 | 0.1413 | 0.1413 | 0.1305 | 0.1469 | 0.1427
Car Evaluation | 0.0506 | 0.0472 | 0.0436 | 0.0405 | 0.0452 | 0.0335 | 0.0456 | 0.0517 | 0.0389
Segment | 0.0198 | 0.0184 | 0.0166 | 0.0184 | 0.0189 | 0.0217 | 0.0178 | 0.0269 | 0.0245
Splice-junction Gene Sequences | 0.0083 | 0.0097 | 0.0126 | 0.0175 | 0.0172 | 0.0232 | 0.0098 | 0.0824 | 0.0222
King-rook-vs-king-pawn | 0.0199 | 0.0204 | 0.0200 | 0.0179 | 0.0188 | 0.0155 | 0.0128 | 0.0112 | 0.0067
Hypothyroid (Garavan) | 0.0008 | 0.0006 | 0.0007 | 0.0006 | 0.0007 | 0.0016 | 0.0005 | 0.0076 | 0.0063
Sick-euthyroid | 0.0145 | 0.0034 | 0.0028 | 0.0029 | 0.0033 | 0.0053 | 0.0033 | 0.0085 | 0.0073
Abalone | 0.0234 | 0.0837 | 0.0866 | 0.0917 | 0.0859 | 0.0864 | 0.0839 | 0.1052 | 0.1032
SPAM E-mail | 0.0051 | 0.0050 | 0.0050 | 0.0050 | 0.0051 | 0.0074 | 0.0050 | 0.0067 | 0.0061
Waveform-5000 | 0.0132 | 0.0292 | 0.0330 | 0.0166 | 0.0161 | 0.0587 | 0.0328 | 0.1071 | 0.0475
Nettalk (Phoneme) | 0.1076 | 0.1167 | 0.1295 | 0.1132 | 0.1246 | 0.1602 | 0.0985 | 0.0865 | 0.0686
Page Blocks | 0.0177 | 0.0076 | 0.0063 | 0.0070 | 0.0057 | 0.0043 | 0.0061 | 0.0051 | 0.0049
Optical Digits | 0.0099 | 0.0086 | 0.0076 | 0.0090 | 0.0091 | 0.0142 | 0.0089 | 0.0419 | 0.0118
Mushrooms | 0.0029 | 0.0001 | 0.0000 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0000
Pen Digits | 0.0079 | 0.0064 | 0.0059 | 0.0054 | 0.0064 | 0.0170 | 0.0068 | 0.0186 | 0.0131
Sign | 0.0113 | 0.0100 | 0.0160 | 0.0123 | 0.0189 | 0.0136 | 0.0174 | 0.0193 | 0.0176
Nursery | 0.0077 | 0.0088 | 0.0113 | 0.0104 | 0.0117 | 0.0146 | 0.0123 | 0.0231 | 0.0160
MAGIC Gamma Telescope | 0.0095 | 0.0095 | 0.0182 | 0.0080 | 0.0117 | 0.0079 | 0.0071 | 0.0098 | 0.0089
Letter Recognition | 0.0520 | 0.0478 | 0.0473 | 0.0439 | 0.0469 | 0.0768 | 0.0477 | 0.1000 | 0.0917
Adult | 0.0051 | 0.0085 | 0.0105 | 0.0079 | 0.0071 | 0.0140 | 0.0097 | 0.0370 | 0.0331
Connect-4 Opening | 0.0146 | 0.0189 | 0.0212 | 0.0146 | 0.0145 | 0.0144 | 0.0196 | 0.0831 | 0.0449
Census-Income (KDD) | 0.0033 | 0.0121 | 0.0110 | 0.0055 | 0.0064 | 0.0077 | 0.0130 | 0.0139 | 0.0102
Covertype | 0.0029 | 0.0028 | 0.0038 | 0.0031 | 0.0024 | 0.0035 | 0.0045 | 0.0149 | 0.0141

Table 10: 0-1 Loss

Dataset | NB | AODE | A2DE | PA2DE | FA2DE | TAN | MAPLMG | RF10 | RF100
Contact-lenses | 0.3317 | 0.3442 | 0.3642 | 0.3433 | 0.4100 | 0.4217 | 0.3292 | 0.3583 | 0.3375
Lung Cancer | 0.5206 | 0.5344 | 0.5406 | 0.5669 | 0.5694 | 0.5331 | 0.5344 | 0.6050 | 0.5956
Labor negotiations | 0.0947 | 0.1091 | 0.1119 | 0.1077 | 0.1088 | 0.1709 | 0.1095 | 0.2011 | 0.1839
Postoperative Patient | 0.3589 | 0.3676 | 0.3824 | 0.3773 | 0.3744 | 0.3744 | 0.3689 | 0.3900 | 0.3869
Zoo | 0.0725 | 0.0616 | 0.0614 | 0.0620 | 0.0640 | 0.0990 | 0.0616 | 0.0780 | 0.0715
Promoter Gene Sequences | 0.1385 | 0.2021 | 0.3479 | 0.2528 | 0.2634 | 0.2370 | 0.2074 | 0.2562 | 0.1485
Echocardiogram | 0.3289 | 0.3376 | 0.3450 | 0.3472 | 0.3495 | 0.3284 | 0.3359 | 0.3609 | 0.3586
Lymphography | 0.1696 | 0.1646 | 0.1643 | 0.1574 | 0.1624 | 0.2297 | 0.1642 | 0.2204 | 0.1995
Iris Classification | 0.0420 | 0.0396 | 0.0415 | 0.0437 | 0.0445 | 0.0441 | 0.0404 | 0.0432 | 0.0435
Teaching Assistant Evaluation | 0.5216 | 0.5033 | 0.4942 | 0.4979 | 0.4977 | 0.5278 | 0.5074 | 0.5119 | 0.5109
Hepatitis | 0.1605 | 0.1588 | 0.1604 | 0.1609 | 0.1605 | 0.1630 | 0.1578 | 0.1912 | 0.1791
Wine Recognition | 0.0604 | 0.0570 | 0.0612 | 0.0590 | 0.0608 | 0.0770 | 0.0572 | 0.0824 | 0.0638
Auto Imports | 0.3986 | 0.3149 | 0.3059 | 0.3049 | 0.3051 | 0.3268 | 0.3143 | 0.2931 | 0.2651
Sonar Classification | 0.2983 | 0.2493 | 0.2204 | 0.2344 | 0.2355 | 0.2637 | 0.2458 | 0.2601 | 0.2248
Glass Identification | 0.2617 | 0.2524 | 0.2559 | 0.2541 | 0.2560 | 0.2707 | 0.2554 | 0.2659 | 0.2690
New-Thyroid | 0.1190 | 0.1202 | 0.1229 | 0.1180 | 0.1169 | 0.1365 | 0.1204 | 0.1279 | 0.1285
Audiology | 0.3206 | 0.3191 | 0.3179 | 0.3169 | 0.3188 | 0.3404 | 0.3193 | 0.3188 | 0.2816
Hungarian | 0.1482 | 0.1516 | 0.1614 | 0.1610 | 0.1599 | 0.1858 | 0.1532 | 0.2167 | 0.2064
Heart Disease (Cleveland) | 0.1699 | 0.1769 | 0.1849 | 0.1832 | 0.1826 | 0.2032 | 0.1784 | 0.2381 | 0.2238
Haberman's Survival | 0.2827 | 0.2977 | 0.3052 | 0.3071 | 0.3014 | 0.3051 | 0.2923 | 0.3195 | 0.3178
Primary Tumor | 0.5405 | 0.5401 | 0.5417 | 0.5403 | 0.5405 | 0.5765 | 0.5412 | 0.6079 | 0.5853
Liver Disorders (Bupa) | 0.4237 | 0.4283 | 0.4305 | 0.4323 | 0.4334 | 0.4323 | 0.4288 | 0.4386 | 0.4397
Ionosphere | 0.2084 | 0.0806 | 0.0782 | 0.0729 | 0.0733 | 0.0674 | 0.0803 | 0.0925 | 0.0762
Dermatology | 0.0203 | 0.0231 | 0.0235 | 0.0219 | 0.0218 | 0.0418 | 0.0230 | 0.0584 | 0.0374
Horse Colic | 0.1979 | 0.1897 | 0.1893 | 0.1781 | 0.1780 | 0.1998 | 0.1898 | 0.1782 | 0.1656
House Votes 84 | 0.0978 | 0.0567 | 0.0503 | 0.0534 | 0.0521 | 0.0654 | 0.0527 | 0.0471 | 0.0423
Cylinder Bands | 0.2613 | 0.2430 | 0.2388 | 0.2331 | 0.2334 | 0.4029 | 0.2392 | 0.3134 | 0.2917
Syncon | 0.1110 | 0.0735 | 0.0633 | 0.0894 | 0.0904 | 0.0559 | 0.0684 | 0.0966 | 0.0636
Balance Scale | 0.1936 | 0.2059 | 0.2217 | 0.2228 | 0.2226 | 0.2104 | 0.2076 | 0.2258 | 0.2244
Credit Screening | 0.1523 | 0.1467 | 0.1473 | 0.1482 | 0.1482 | 0.1632 | 0.1463 | 0.1714 | 0.1606
Breast Cancer (Wisconsin) | 0.0271 | 0.0371 | 0.0390 | 0.0337 | 0.0329 | 0.0509 | 0.0377 | 0.0445 | 0.0390
Pima Indians Diabetes | 0.2752 | 0.2754 | 0.2784 | 0.2772 | 0.2772 | 0.2765 | 0.2763 | 0.2879 | 0.2851
Vehicle | 0.4707 | 0.3933 | 0.3870 | 0.3873 | 0.3878 | 0.3911 | 0.3924 | 0.3865 | 0.3764
Annealing | 0.1025 | 0.0850 | 0.0840 | 0.0829 | 0.0833 | 0.0831 | 0.0834 | 0.1420 | 0.1369
Tic-Tac-Toe Endgame | 0.2916 | 0.2519 | 0.1255 | 0.1566 | 0.1713 | 0.2480 | 0.2502 | 0.1209 | 0.0766
Vowel | 0.5577 | 0.3007 | 0.2645 | 0.2598 | 0.2687 | 0.3827 | 0.2717 | 0.2477 | 0.2239
LED | 0.2627 | 0.2639 | 0.2667 | 0.2670 | 0.2659 | 0.2706 | 0.2644 | 0.2811 | 0.2786
German | 0.2581 | 0.2550 | 0.2573 | 0.2542 | 0.2542 | 0.2809 | 0.2559 | 0.2819 | 0.2666
Contraceptive Method Choice | 0.5402 | 0.5375 | 0.5389 | 0.5358 | 0.5363 | 0.5422 | 0.5379 | 0.5664 | 0.5630
Volcanoes | 0.5863 | 0.5963 | 0.5992 | 0.5992 | 0.5995 | 0.5970 | 0.5962 | 0.6007 | 0.6004
Car Evaluation | 0.1567 | 0.1063 | 0.0846 | 0.0785 | 0.0869 | 0.0818 | 0.0976 | 0.0912 | 0.0776
Segment | 0.1803 | 0.1394 | 0.1328 | 0.1361 | 0.1367 | 0.1321 | 0.1334 | 0.1194 | 0.1178
Splice-junction Gene Sequences | 0.0466 | 0.0401 | 0.0392 | 0.0755 | 0.0741 | 0.0608 | 0.0400 | 0.1179 | 0.0480
King-rook-vs-king-pawn | 0.1273 | 0.0920 | 0.0741 | 0.0901 | 0.1046 | 0.0780 | 0.0586 | 0.0191 | 0.0129
Hypothyroid (Garavan) | 0.0751 | 0.0744 | 0.0744 | 0.0746 | 0.0749 | 0.0761 | 0.0745 | 0.0815 | 0.0804
Sick-euthyroid | 0.0808 | 0.0657 | 0.0641 | 0.0647 | 0.0656 | 0.0643 | 0.0636 | 0.0675 | 0.0664
Abalone | 0.4971 | 0.4762 | 0.4769 | 0.4777 | 0.4762 | 0.4770 | 0.4760 | 0.4796 | 0.4801
SPAM E-mail | 0.3373 | 0.3373 | 0.3373 | 0.3372 | 0.3373 | 0.3384 | 0.3373 | 0.3375 | 0.3373
Waveform-5000 | 0.2203 | 0.1984 | 0.1955 | 0.2020 | 0.2040 | 0.2155 | 0.1957 | 0.2383 | 0.2006
Nettalk (Phoneme) | 0.3032 | 0.2643 | 0.2807 | 0.2454 | 0.2618 | 0.3480 | 0.2346 | 0.1927 | 0.1784
Page Blocks | 0.0898 | 0.0847 | 0.0836 | 0.0837 | 0.0837 | 0.0839 | 0.0838 | 0.0826 | 0.0825
Optical Digits | 0.0903 | 0.0372 | 0.0305 | 0.0500 | 0.0512 | 0.0524 | 0.0367 | 0.0662 | 0.0339
Mushrooms | 0.0234 | 0.0003 | 0.0000 | 0.0001 | 0.0001 | 0.0003 | 0.0003 | 0.0001 | 0.0000
Pen Digits | 0.1641 | 0.0665 | 0.0534 | 0.0570 | 0.0572 | 0.0940 | 0.0599 | 0.0517 | 0.0472
Sign | 0.4643 | 0.4101 | 0.3821 | 0.3808 | 0.3867 | 0.3995 | 0.4006 | 0.3571 | 0.3567
Nursery | 0.0979 | 0.0743 | 0.0546 | 0.0558 | 0.0544 | 0.0692 | 0.0791 | 0.0336 | 0.0244
MAGIC Gamma Telescope | 0.2482 | 0.2265 | 0.2174 | 0.2153 | 0.2174 | 0.2215 | 0.2237 | 0.2143 | 0.2139
Letter Recognition | 0.4969 | 0.4217 | 0.3946 | 0.4007 | 0.4104 | 0.4436 | 0.4173 | 0.3564 | 0.3508
Adult | 0.1828 | 0.1753 | 0.1684 | 0.1731 | 0.1732 | 0.1692 | 0.1698 | 0.1767 | 0.1747
Connect-4 Opening | 0.2792 | 0.2440 | 0.2319 | 0.2567 | 0.2729 | 0.2368 | 0.2400 | 0.2135 | 0.1876
Census-Income (KDD) | 0.2391 | 0.0974 | 0.0714 | 0.1373 | 0.1289 | 0.0706 | 0.0936 | 0.0569 | 0.0550
Covertype | 0.3473 | 0.3208 | 0.3113 | 0.3366 | 0.3468 | 0.3268 | 0.3159 | 0.2512 | 0.2510

Table 11: RMSE

Dataset | NB | AODE | A2DE | PA2DE | FA2DE | TAN | MAPLMG | RF10 | RF100
Contact-lenses | 0.3782 | 0.3907 | 0.3972 | 0.3924 | 0.4109 | 0.4462 | 0.3837 | 0.4098 | 0.3941
Lung Cancer | 0.5623 | 0.5705 | 0.5736 | 0.4966 | 0.4991 | 0.5169 | 0.5706 | 0.4875 | 0.4736
Labor negotiations | 0.2684 | 0.2820 | 0.2840 | 0.2924 | 0.2857 | 0.3516 | 0.2822 | 0.3708 | 0.3592
Postoperative Patient | 0.4205 | 0.4266 | 0.4355 | 0.4274 | 0.4510 | 0.4213 | 0.4274 | 0.4448 | 0.4377
Zoo | 0.1245 | 0.1163 | 0.1140 | 0.1125 | 0.1149 | 0.1408 | 0.1160 | 0.1312 | 0.1248
Promoter Gene Sequences | 0.3342 | 0.3987 | 0.4265 | 0.4159 | 0.4168 | 0.4455 | 0.4019 | 0.4237 | 0.3946
Echocardiogram | 0.4645 | 0.4743 | 0.4841 | 0.4795 | 0.5276 | 0.4769 | 0.4745 | 0.5041 | 0.4993
Lymphography | 0.2601 | 0.2542 | 0.2557 | 0.2421 | 0.2462 | 0.2895 | 0.2546 | 0.2814 | 0.2710
Iris Classification | 0.1450 | 0.1415 | 0.1441 | 0.1472 | 0.1526 | 0.1472 | 0.1424 | 0.1564 | 0.1559
Teaching Assistant Evaluation | 0.5157 | 0.5060 | 0.5078 | 0.4876 | 0.5164 | 0.4875 | 0.5038 | 0.4868 | 0.4825
Hepatitis | 0.3654 | 0.3548 | 0.3561 | 0.3427 | 0.3492 | 0.3543 | 0.3538 | 0.3642 | 0.3516
Wine Recognition | 0.1775 | 0.1721 | 0.1784 | 0.1773 | 0.1767 | 0.1952 | 0.1723 | 0.2100 | 0.1965
Auto Imports | 0.3053 | 0.2710 | 0.2686 | 0.2490 | 0.2515 | 0.2711 | 0.2706 | 0.2441 | 0.2330
Sonar Classification | 0.4954 | 0.4425 | 0.4176 | 0.3911 | 0.3932 | 0.4490 | 0.4384 | 0.4164 | 0.3935
Glass Identification | 0.3618 | 0.3568 | 0.3573 | 0.3541 | 0.3811 | 0.3603 | 0.3575 | 0.3678 | 0.3645
New-Thyroid | 0.2427 | 0.2440 | 0.2463 | 0.2444 | 0.2662 | 0.2592 | 0.2438 | 0.2535 | 0.2518
Audiology | 0.1463 | 0.1462 | 0.1461 | 0.1412 | 0.1498 | 0.1409 | 0.1462 | 0.1398 | 0.1333
Hungarian | 0.3403 | 0.3374 | 0.3431 | 0.3420 | 0.3545 | 0.3710 | 0.3380 | 0.3911 | 0.3799
Heart Disease (Cleveland) | 0.3714 | 0.3679 | 0.3718 | 0.3685 | 0.3827 | 0.3871 | 0.3687 | 0.4118 | 0.3973
Haberman's Survival | 0.4552 | 0.4644 | 0.4801 | 0.4814 | 0.5422 | 0.4703 | 0.4626 | 0.4887 | 0.4853
Primary Tumor | 0.1810 | 0.1801 | 0.1801 | 0.1792 | 0.1935 | 0.1813 | 0.1802 | 0.1934 | 0.1890
Liver Disorders (Bupa) | 0.4962 | 0.4962 | 0.4986 | 0.4994 | 0.6126 | 0.5024 | 0.4965 | 0.5193 | 0.5168
Ionosphere | 0.4260 | 0.2673 | 0.2591 | 0.2606 | 0.2611 | 0.2369 | 0.2662 | 0.2822 | 0.2672
Dermatology | 0.0725 | 0.0755 | 0.0767 | 0.0797 | 0.0786 | 0.1009 | 0.0754 | 0.1417 | 0.1306
Horse Colic | 0.4067 | 0.3881 | 0.3877 | 0.3740 | 0.3774 | 0.4057 | 0.3885 | 0.3753 | 0.3667
House Votes 84 | 0.2988 | 0.2092 | 0.1947 | 0.2008 | 0.2016 | 0.2277 | 0.2019 | 0.1936 | 0.1842
Cylinder Bands | 0.4655 | 0.4437 | 0.4400 | 0.3974 | 0.4023 | 0.4900 | 0.4416 | 0.4440 | 0.4353
Syncon | 0.1831 | 0.1373 | 0.1275 | 0.1384 | 0.1399 | 0.1203 | 0.1318 | 0.1708 | 0.1565
Balance Scale | 0.3155 | 0.3108 | 0.3099 | 0.3103 | 0.3416 | 0.3148 | 0.3117 | 0.3177 | 0.3150
Credit Screening | 0.3414 | 0.3358 | 0.3365 | 0.3326 | 0.3463 | 0.3572 | 0.3346 | 0.3628 | 0.3518
Breast Cancer (Wisconsin) | 0.1597 | 0.1777 | 0.1780 | 0.1778 | 0.1715 | 0.2002 | 0.1783 | 0.1915 | 0.1834
Pima Indians Diabetes | 0.4329 | 0.4309 | 0.4315 | 0.4297 | 0.4794 | 0.4309 | 0.4309 | 0.4542 | 0.4513
Vehicle | 0.4314 | 0.3560 | 0.3506 | 0.3483 | 0.3684 | 0.3555 | 0.3542 | 0.3654 | 0.3583
Annealing | 0.1535 | 0.1429 | 0.1411 | 0.1417 | 0.1514 | 0.1373 | 0.1416 | 0.1863 | 0.1843
Tic-Tac-Toe Endgame | 0.4336 | 0.4044 | 0.3176 | 0.3455 | 0.3385 | 0.4108 | 0.4029 | 0.3132 | 0.2921
Vowel | 0.2556 | 0.1945 | 0.1838 | 0.1855 | 0.1871 | 0.2178 | 0.1869 | 0.1832 | 0.1747
LED | 0.1986 | 0.1993 | 0.2003 | 0.2007 | 0.2193 | 0.2033 | 0.1994 | 0.2099 | 0.2088
German | 0.4204 | 0.4185 | 0.4223 | 0.4128 | 0.4283 | 0.4470 | 0.4193 | 0.4377 | 0.4218
Contraceptive Method Choice | 0.4647 | 0.4536 | 0.4540 | 0.4537 | 0.5108 | 0.4586 | 0.4533 | 0.5029 | 0.4970
Volcanoes | 0.4116 | 0.4125 | 0.4147 | 0.4149 | 0.5430 | 0.4135 | 0.4125 | 0.4162 | 0.4158
Car Evaluation | 0.2292 | 0.2083 | 0.1922 | 0.1905 | 0.1786 | 0.1848 | 0.1963 | 0.1884 | 0.1782
Segment | 0.1922 | 0.1682 | 0.1629 | 0.1664 | 0.1821 | 0.1628 | 0.1630 | 0.1538 | 0.1525
Splice-junction Gene Sequences | 0.1539 | 0.1430 | 0.1428 | 0.1980 | 0.1951 | 0.1751 | 0.1427 | 0.2827 | 0.2601
King-rook-vs-king-pawn | 0.3049 | 0.2719 | 0.2541 | 0.2712 | 0.2604 | 0.2392 | 0.2354 | 0.1449 | 0.1265
Hypothyroid (Garavan) | 0.1865 | 0.1846 | 0.1843 | 0.1848 | 0.1921 | 0.1866 | 0.1848 | 0.1917 | 0.1909
Sick-euthyroid | 0.2385 | 0.2302 | 0.2266 | 0.2281 | 0.2441 | 0.2264 | 0.2247 | 0.2317 | 0.2305
Abalone | 0.4801 | 0.4347 | 0.4318 | 0.4325 | 0.5161 | 0.4328 | 0.4314 | 0.4366 | 0.4362
SPAM E-mail | 0.4524 | 0.4523 | 0.4522 | 0.4521 | 0.5800 | 0.4534 | 0.4523 | 0.4529 | 0.4526
Waveform-5000 | 0.3415 | 0.3058 | 0.3022 | 0.3072 | 0.3160 | 0.3230 | 0.3023 | 0.3321 | 0.3146
Nettalk (Phoneme) | 0.0953 | 0.0884 | 0.0909 | 0.0883 | 0.0886 | 0.0986 | 0.0844 | 0.0760 | 0.0731
Page Blocks | 0.1614 | 0.1598 | 0.1593 | 0.1591 | 0.1753 | 0.1593 | 0.1595 | 0.1590 | 0.1588
Optical Digits | 0.1227 | 0.0767 | 0.0699 | 0.0853 | 0.0852 | 0.0921 | 0.0762 | 0.1301 | 0.1172
Mushrooms | 0.1322 | 0.0153 | 0.0064 | 0.0295 | 0.0288 | 0.0165 | 0.0142 | 0.0123 | 0.0086
Pen Digits | 0.1652 | 0.1001 | 0.0895 | 0.0964 | 0.0966 | 0.1186 | 0.0958 | 0.0922 | 0.0886
Sign | 0.4369 | 0.4169 | 0.4062 | 0.4050 | 0.4741 | 0.4150 | 0.4114 | 0.3911 | 0.3908
Nursery | 0.1770 | 0.1583 | 0.1424 | 0.1428 | 0.1301 | 0.1425 | 0.1482 | 0.1092 | 0.1009
MAGIC Gamma Telescope | 0.4083 | 0.3956 | 0.3917 | 0.3914 | 0.4184 | 0.3949 | 0.3942 | 0.3896 | 0.3893
Letter Recognition | 0.1573 | 0.1453 | 0.1404 | 0.1421 | 0.1517 | 0.1489 | 0.1444 | 0.1359 | 0.1344
Adult | 0.3664 | 0.3519 | 0.3442 | 0.3467 | 0.3649 | 0.3422 | 0.3460 | 0.3542 | 0.3516
Connect-4 Opening | 0.3591 | 0.3381 | 0.3303 | 0.3458 | 0.3783 | 0.3322 | 0.3351 | 0.3215 | 0.3057
Census-Income (KDD) | 0.4628 | 0.2731 | 0.2308 | 0.3058 | 0.3085 | 0.2275 | 0.2660 | 0.2098 | 0.2048
Covertype | 0.2604 | 0.2482 | 0.2445 | 0.2545 | 0.2940 | 0.2546 | 0.2460 | 0.2205 | 0.2204

Table 12: Per Instance Training Time

Dataset | NB | AODE | A2DE | PA2DE | FA2DE | TAN | MAPLMG | RF10 | RF100
Contact-lenses | 0.00571 | 0.00471 | 0.00554 | 0.00521 | 0.01246 | 0.00683 | 0.24567 | 0.01429 | 0.04650
Lung Cancer | 0.00409 | 0.00553 | 0.71631 | 0.64828 | 0.69800 | 0.02534 | 0.43744 | 0.03013 | 0.25059
Labor negotiations | 0.00212 | 0.00219 | 0.00349 | 0.00333 | 0.00295 | 0.00447 | 0.10225 | 0.01247 | 0.09011
Postoperative Patient | 0.00151 | 0.00146 | 0.00166 | 0.00160 | 0.00124 | 0.00220 | 0.05910 | 0.00791 | 0.05682
Zoo | 0.00131 | 0.00129 | 0.00216 | 0.00223 | 0.00196 | 0.00287 | 0.09247 | 0.00739 | 0.04889
Promoter Gene Sequences | 0.00192 | 0.00180 | 0.18512 | 0.15537 | 0.17721 | 0.01249 | 0.22453 | 0.01630 | 0.14082
Echocardiogram | 0.00088 | 0.00096 | 0.00104 | 0.00111 | 0.00088 | 0.00157 | 0.04179 | 0.00626 | 0.05076
Lymphography | 0.00091 | 0.00093 | 0.00205 | 0.00186 | 0.00355 | 0.00228 | 0.06530 | 0.00889 | 0.07238
Iris Classification | 0.00102 | 0.00070 | 0.00099 | 0.00083 | 0.00071 | 0.00121 | 0.03442 | 0.00291 | 0.01227
Teaching Assistant Evaluation | 0.00100 | 0.00083 | 0.00125 | 0.00104 | 0.00077 | 0.00427 | 0.03600 | 0.00742 | 0.07446
Hepatitis | 0.00100 | 0.00079 | 0.00283 | 0.00174 | 0.00160 | 0.00192 | 0.05664 | 0.00864 | 0.07545
Wine Recognition | 0.00074 | 0.00070 | 0.00266 | 0.00106 | 0.00097 | 0.00146 | 0.04089 | 0.00509 | 0.04271
Auto Imports | 0.00074 | 0.00082 | 0.01486 | 0.00898 | 0.01964 | 0.00272 | 0.12048 | 0.01619 | 0.14782
Sonar Classification | 0.00075 | 0.00113 | 0.05417 | 0.05225 | 0.06712 | 0.00514 | 0.20219 | 0.01983 | 0.26942
Glass Identification | 0.00064 | 0.00053 | 0.00072 | 0.00072 | 0.00062 | 0.00100 | 0.02960 | 0.00514 | 0.04545
New-Thyroid | 0.00057 | 0.00051 | 0.00066 | 0.00062 | 0.00052 | 0.00251 | 0.02459 | 0.00275 | 0.01636
Audiology | 0.00068 | 0.00552 | 0.63403 | 0.58634 | 0.55959 | 0.01365 | 2.24021 | 0.04293 | 0.45907
Hungarian | 0.00043 | 0.00041 | 0.00074 | 0.00079 | 0.00065 | 0.00121 | 0.02706 | 0.00712 | 0.06659
Heart Disease (Cleveland) | 0.00045 | 0.00041 | 0.00074 | 0.00092 | 0.00071 | 0.00087 | 0.02678 | 0.00631 | 0.06091
Haberman's Survival | 0.00041 | 0.00038 | 0.00048 | 0.00041 | 0.00039 | 0.00058 | 0.01670 | 0.00245 | 0.01966
Primary Tumor | 0.00039 | 0.00047 | 0.00130 | 0.00183 | 0.00239 | 0.00119 | 0.13380 | 0.01654 | 0.16519
Liver Disorders (Bupa) | 0.00035 | 0.00030 | 0.00041 | 0.00080 | 0.00037 | 0.00059 | 0.01695 | 0.00328 | 0.02885
Ionosphere | 0.00040 | 0.00056 | 0.00463 | 0.00556 | 0.00674 | 0.00175 | 0.07723 | 0.01104 | 0.10150
Dermatology | 0.00040 | 0.00073 | 0.02099 | 0.01990 | 0.02824 | 0.00230 | 0.15539 | 0.01026 | 0.09502
Horse Colic | 0.00037 | 0.00040 | 0.00162 | 0.00161 | 0.00214 | 0.00100 | 0.04104 | 0.02723 | 0.25749
House Votes 84 | 0.00029 | 0.00032 | 0.00086 | 0.00080 | 0.00075 | 0.00070 | 0.02766 | 0.00395 | 0.03675
Cylinder Bands | 0.00030 | 0.00266 | 0.52135 | 0.47019 | 0.15990 | 0.01125 | 0.16161 | 0.16264 | 1.39439
Syncon | 0.00029 | 0.00087 | 0.12126 | 0.11658 | 0.11387 | 0.00397 | 0.47564 | 0.01881 | 0.18113
Balance Scale | 0.00023 | 0.00018 | 0.00023 | 0.00021 | 0.00020 | 0.00029 | 0.00980 | 0.00203 | 0.01860
Credit Screening | 0.00023 | 0.00022 | 0.00065 | 0.00065 | 0.00063 | 0.00048 | 0.02263 | 0.00878 | 0.08624
Breast Cancer (Wisconsin) | 0.00019 | 0.00019 | 0.00036 | 0.00036 | 0.00084 | 0.00042 | 0.01505 | 0.00252 | 0.02392
Pima Indians Diabetes | 0.00019 | 0.00016 | 0.00024 | 0.00024 | 0.00023 | 0.00030 | 0.01179 | 0.00446 | 0.04201
Vehicle | 0.00017 | 0.00020 | 0.00093 | 0.00092 | 0.00127 | 0.00052 | 0.03882 | 0.01148 | 0.11048
Annealing | 0.00019 | 0.00036 | 0.01070 | 0.01043 | 0.01415 | 0.00082 | 0.18188 | 0.05706 | 0.62783
Tic-Tac-Toe Endgame | 0.00014 | 0.00013 | 0.00024 | 0.00024 | 0.00022 | 0.00025 | 0.01190 | 0.00430 | 0.03970
Vowel | 0.00014 | 0.00016 | 0.00062 | 0.00058 | 0.00071 | 0.00044 | 0.04711 | 0.00907 | 0.09508
LED | 0.00012 | 0.00011 | 0.00017 | 0.00017 | 0.00018 | 0.00022 | 0.01648 | 0.00380 | 0.03551
German | 0.00017 | 0.00019 | 0.00114 | 0.00113 | 0.00139 | 0.00051 | 0.03143 | 0.00958 | 0.09459
Contraceptive Method Choice | 0.00010 | 0.00010 | 0.00020 | 0.00020 | 0.00018 | 0.00029 | 0.01211 | 0.00636 | 0.06713
Volcanoes | 0.00009 | 0.00008 | 0.00009 | 0.00008 | 0.00008 | 0.00011 | 0.00551 | 0.00093 | 0.00834
Car Evaluation | 0.00008 | 0.00008 | 0.00011 | 0.00012 | 0.00011 | 0.00014 | 0.01192 | 0.00244 | 0.02308
Segment | 0.00008 | 0.00013 | 0.00125 | 0.00094 | 0.00111 | 0.00030 | 0.06186 | 0.00608 | 0.06088
Splice-junction Gene Sequences | 0.00015 | 0.00058 | 0.06743 | 0.06842 | 0.06433 | 0.00200 | 0.23156 | 0.02216 | 0.21648
King-rook-vs-king-pawn | 0.00010 | 0.00024 | 0.00440 | 0.00446 | 0.00463 | 0.00062 | 0.10376 | 0.01554 | 0.15649
Hypothyroid (Garavan) | 0.00009 | 0.00017 | 0.00290 | 0.00265 | 0.00277 | 0.00041 | 0.09119 | 0.02219 | 0.22663
Sick-euthyroid | 0.00009 | 0.00016 | 0.00244 | 0.00248 | 0.00254 | 0.00043 | 0.05123 | 0.01783 | 0.17659
Abalone | 0.00005 | 0.00005 | 0.00012 | 0.00012 | 0.00011 | 0.00008 | 0.01016 | 0.00293 | 0.02821
SPAM E-mail | 0.00013 | 0.00049 | 0.01767 | 0.01750 | 0.01689 | 0.00141 | 0.14233 | 0.11317 | 1.15056
Waveform-5000 | 0.00010 | 0.00028 | 0.00724 | 0.00687 | 0.00710 | 0.00076 | 0.11678 | 0.02298 | 0.21661
Nettalk (Phoneme) | 0.00004 | 0.00031 | 0.01817 | 0.01616 | 0.01508 | 0.00169 | 0.05430 | 0.01762 | 0.17862
Page Blocks | 0.00004 | 0.00005 | 0.00017 | 0.00017 | 0.00017 | 0.00009 | 0.01504 | 0.00271 | 0.02672
Optical Digits | 0.00012 | 0.00040 | 0.04919 | 0.04899 | 0.04632 | 0.00126 | 0.45907 | 0.02529 | 0.21022
Mushrooms | 0.00006 | 0.00012 | 0.00138 | 0.00148 | 0.00149 | 0.00028 | 0.02641 | 0.00457 | 0.04698
Pen Digits | 0.00004 | 0.00008 | 0.00059 | 0.00059 | 0.00070 | 0.00016 | 0.05931 | 0.00776 | 0.07362
Sign | 0.00003 | 0.00003 | 0.00010 | 0.00010 | 0.00010 | 0.00006 | 0.01297 | 0.00360 | 0.03670
Nursery | 0.00003 | 0.00003 | 0.00011 | 0.00010 | 0.00010 | 0.00006 | 0.01743 | 0.00307 | 0.02825
MAGIC Gamma Telescope | 0.00003 | 0.00004 | 0.00015 | 0.00015 | 0.00015 | 0.00007 | 0.01042 | 0.00494 | 0.04798
Letter Recognition | 0.00005 | 0.00008 | 0.00067 | 0.00063 | 0.00070 | 0.00017 | 0.13907 | 0.01397 | 0.13481
Adult | 0.00004 | 0.00007 | 0.00043 | 0.00040 | 0.00042 | 0.00012 | 0.02277 | 0.01949 | 0.19112
Connect-4 Opening | 0.00010 | 0.00029 | 0.00722 | 0.00742 | 0.00725 | 0.00076 | 0.17049 | 0.03772 | 0.37281
Census-Income (KDD) | 0.00010 | 0.00033 | 0.01304 | 0.01326 | 0.01279 | 0.00078 | 0.11065 | 0.06108 | -
Covertype | 0.00013 | 0.00044 | 0.01723 | 0.01756 | 0.01707 | 0.00118 | 0.44194 | 0.10864 | 0.87340
Table 13: Per Instance Classification Time

Dataset | NB | AODE | A2DE | PA2DE | FA2DE | TAN | MAPLMG | RF10 | RF100
Contact-lenses | 0.00025 | 0.00029 | 0.00071 | 0.00025 | 0.00033 | 0.00033 | 0.00063 | 0.00008 | 0.00021
Lung Cancer | 0.00022 | 0.00416 | 0.14209 | 0.14331 | 0.14019 | 0.00041 | 0.00459 | 0.00009 | 0.00025
Labor negotiations | 0.00009 | 0.00026 | 0.00072 | 0.00100 | 0.00072 | 0.00012 | 0.00065 | 0.00004 | 0.00074
Postoperative Patient | 0.00014 | 0.00017 | 0.00058 | 0.00050 | 0.00050 | 0.00023 | 0.00044 | 0.00012 | 0.00058
Zoo | 0.00018 | 0.00093 | 0.00446 | 0.00466 | 0.00437 | 0.00018 | 0.00166 | 0.00010 | 0.00039
Promoter Gene Sequences | 0.00013 | 0.00281 | 0.15011 | 0.13926 | 0.14086 | 0.00025 | 0.00318 | 0.00010 | 0.00034
Echocardiogram | 0.00010 | 0.00015 | 0.00016 | 0.00015 | 0.00016 | 0.00008 | 0.00023 | 0.00009 | 0.00062
Lymphography | 0.00023 | 0.00076 | 0.00473 | 0.00459 | 0.00416 | 0.00018 | 0.00112 | 0.00016 | 0.00049
Iris Classification | 0.00015 | 0.00015 | 0.00023 | 0.00013 | 0.00011 | 0.00009 | 0.00023 | 0.00007 | 0.00029
Teaching Assistant Evaluation | 0.00026 | 0.00013 | 0.00019 | 0.00025 | 0.00014 | 0.00017 | 0.00028 | 0.00015 | 0.00047
Hepatitis | 0.00015 | 0.00046 | 0.00554 | 0.00317 | 0.00307 | 0.00012 | 0.00068 | 0.00012 | 0.00057
Wine Recognition | 0.00010 | 0.00042 | 0.00324 | 0.00184 | 0.00164 | 0.00012 | 0.00058 | 0.00013 | 0.00042
Auto Imports | 0.00019 | 0.00227 | 0.04120 | 0.02235 | 0.02201 | 0.00026 | 0.00296 | 0.00009 | 0.00068
Sonar Classification | 0.00016 | 0.00378 | 0.16124 | 0.16418 | 0.15268 | 0.00019 | 0.00338 | 0.00008 | 0.00067
Glass Identification | 0.00008 | 0.00027 | 0.00068 | 0.00068 | 0.00062 | 0.00008 | 0.00037 | 0.00012 | 0.00054
New-Thyroid | 0.00009 | 0.00011 | 0.00017 | 0.00017 | 0.00018 | 0.00010 | 0.00021 | 0.00008 | 0.00036
Audiology | 0.00073 | 0.04319 | 1.25924 | 1.32199 | 1.26469 | 0.00192 | 0.04573 | 0.00020 | 0.00169
Hungarian | 0.00008 | 0.00022 | 0.00070 | 0.00079 | 0.00067 | 0.00009 | 0.00038 | 0.00006 | 0.00066
Heart Disease (Cleveland) | 0.00006 | 0.00026 | 0.00127 | 0.00142 | 0.00126 | 0.00007 | 0.00040 | 0.00008 | 0.00054
Haberman's Survival | 0.00005 | 0.00004 | 0.00006 | 0.00007 | 0.00005 | 0.00005 | 0.00014 | 0.00006 | 0.00030
Primary Tumor | 0.00024 | 0.00300 | 0.01350 | 0.02424 | 0.01361 | 0.00045 | 0.00479 | 0.00022 | 0.00217
Liver Disorders (Bupa) | 0.00008 | 0.00012 | 0.00018 | 0.00028 | 0.00017 | 0.00005 | 0.00017 | 0.00009 | 0.00054
Ionosphere | 0.00013 | 0.00132 | 0.02340 | 0.02868 | 0.02342 | 0.00012 | 0.00140 | 0.00011 | 0.00058
Dermatology | 0.00017 | 0.00313 | 0.04434 | 0.04540 | 0.04470 | 0.00026 | 0.00373 | 0.00008 | 0.00052
Horse Colic | 0.00008 | 0.00041 | 0.00276 | 0.00279 | 0.00271 | 0.00011 | 0.00062 | 0.00030 | 0.00399
House Votes 84 | 0.00009 | 0.00039 | 0.00245 | 0.00255 | 0.00243 | 0.00010 | 0.00053 | 0.00007 | 0.00040
Cylinder Bands | 0.00010 | 0.00154 | 0.04088 | 0.04253 | 0.04065 | 0.00015 | 0.00203 | 0.00008 | 0.00076
Syncon | 0.00025 | 0.01067 | 0.33385 | 0.34002 | 0.34039 | 0.00044 | 0.01024 | 0.00010 | 0.00069
Balance Scale | 0.00003 | 0.00008 | 0.00008 | 0.00008 | 0.00007 | 0.00004 | 0.00016 | 0.00006 | 0.00042
Credit Screening | 0.00006 | 0.00031 | 0.00192 | 0.00198 | 0.00192 | 0.00008 | 0.00046 | 0.00008 | 0.00078
Breast Cancer (Wisconsin) | 0.00005 | 0.00017 | 0.00042 | 0.00045 | 0.00041 | 0.00007 | 0.00028 | 0.00004 | 0.00028
Pima Indians Diabetes | 0.00004 | 0.00015 | 0.00035 | 0.00036 | 0.00032 | 0.00007 | 0.00021 | 0.00007 | 0.00067
Vehicle | 0.00009 | 0.00085 | 0.00577 | 0.00607 | 0.00581 | 0.00011 | 0.00105 | 0.00009 | 0.00122
Annealing | 0.00014 | 0.00109 | 0.00335 | 0.00349 | 0.00337 | 0.00016 | 0.00237 | 0.00050 | 0.01097
Tic-Tac-Toe Endgame | 0.00005 | 0.00016 | 0.00045 | 0.00048 | 0.00045 | 0.00005 | 0.00026 | 0.00006 | 0.00059
Vowel | 0.00013 | 0.00120 | 0.00484 | 0.00505 | 0.00482 | 0.00019 | 0.00175 | 0.00011 | 0.00144
LED | 0.00009 | 0.00037 | 0.00058 | 0.00064 | 0.00057 | 0.00012 | 0.00077 | 0.00010 | 0.00076
German | 0.00005 | 0.00049 | 0.00463 | 0.00476 | 0.00468 | 0.00007 | 0.00063 | 0.00007 | 0.00078
Contraceptive Method Choice | 0.00004 | 0.00018 | 0.00053 | 0.00057 | 0.00053 | 0.00005 | 0.00036 | 0.00008 | 0.00134
Volcanoes | 0.00004 | 0.00007 | 0.00005 | 0.00006 | 0.00005 | 0.00004 | 0.00018 | 0.00004 | 0.00035
Car Evaluation | 0.00004 | 0.00014 | 0.00020 | 0.00021 | 0.00019 | 0.00005 | 0.00030 | 0.00005 | 0.00051
Segment | 0.00010 | 0.00149 | 0.01394 | 0.01096 | 0.01072 | 0.00017 | 0.00178 | 0.00009 | 0.00069
Splice-junction Gene Sequences | 0.00011 | 0.00464 | 0.20874 | 0.21927 | 0.21233 | 0.00024 | 0.00489 | 0.00008 | 0.00128
King-rook-vs-king-pawn | 0.00006 | 0.00120 | 0.02491 | 0.02568 | 0.02505 | 0.00011 | 0.00145 | 0.00009 | 0.00106
Hypothyroid (Garavan) | 0.00008 | 0.00153 | 0.01972 | 0.01805 | 0.01806 | 0.00014 | 0.00185 | 0.00016 | 0.00280
Sick-euthyroid | 0.00005 | 0.00080 | 0.01135 | 0.01198 | 0.01144 | 0.00009 | 0.00098 | 0.00012 | 0.00128
Abalone | 0.00003 | 0.00017 | 0.00042 | 0.00044 | 0.00042 | 0.00005 | 0.00029 | 0.00007 | 0.00064
SPAM E-mail | 0.00010 | 0.00343 | 0.11847 | 0.12060 | 0.11844 | 0.00016 | 0.00300 | 0.00031 | 0.00310
Waveform-5000 | 0.00009 | 0.00263 | 0.05807 | 0.05722 | 0.05632 | 0.00015 | 0.00248 | 0.00011 | 0.00293
Nettalk (Phoneme) | 0.00030 | 0.00220 | 0.00313 | 0.00337 | 0.00297 | 0.00061 | 0.00626 | 0.00024 | 0.00239
Page Blocks | 0.00005 | 0.00038 | 0.00118 | 0.00125 | 0.00118 | 0.00008 | 0.00057 | 0.00006 | 0.00053
Optical Digits | 0.00030 | 0.01246 | 0.28798 | 0.29286 | 0.28091 | 0.00055 | 0.01034 | 0.00015 | 0.00249
Mushrooms | 0.00004 | 0.00054 | 0.00597 | 0.00601 | 0.00581 | 0.00007 | 0.00071 | 0.00004 | 0.00034
Pen Digits | 0.00012 | 0.00164 | 0.00876 | 0.00923 | 0.00881 | 0.00020 | 0.00204 | 0.00012 | 0.00240
Sign | 0.00003 | 0.00016 | 0.00042 | 0.00044 | 0.00042 | 0.00004 | 0.00028 | 0.00007 | 0.00084
Nursery | 0.00004 | 0.00024 | 0.00054 | 0.00058 | 0.00055 | 0.00006 | 0.00045 | 0.00007 | 0.00121
MAGIC Gamma Telescope | 0.00003 | 0.00016 | 0.00061 | 0.00063 | 0.00060 | 0.00004 | 0.00025 | 0.00008 | 0.00094
Letter Recognition | 0.00029 | 0.00417 | 0.02229 | 0.02370 | 0.02235 | 0.00048 | 0.00496 | 0.00032 | 0.00595
Adult | 0.00004 | 0.00027 | 0.00169 | 0.00167 | 0.00160 | 0.00005 | 0.00037 | 0.00042 | 0.00803
Connect-4 Opening | 0.00011 | 0.00289 | 0.06617 | 0.06875 | 0.06519 | 0.00017 | 0.00256 | 0.00039 | 0.00682
Census-Income (KDD) | 0.00009 | 0.00177 | 0.05312 | 0.05313 | 0.05193 | 0.00013 | 0.00180 | 0.00060 | -
Covertype | 0.00022 | 0.00926 | 0.24854 | 0.24937 | 0.24217 | 0.00045 | 0.00889 | 0.00040 | 0.01561

[5] Flikka, K., Martens, L., Vandekerckhove, J., Gevaert, K., Eidhammer, I.: Improving the reliability and throughput of mass spectrometry-based proteomics by spectrum quality filtering. Proteomics 6(7) (2006) 2086-2094
[6] Orhan, Z., Altan, Z.: Impact of feature selection for corpus-based WSD in Turkish. In: Proceedings of the Fifth Mexican International Conference on Artificial Intelligence, Springer Berlin/Heidelberg (2006) 868-878
[7] Lasko, T.A., Atlas, S.J., Barry, M.J., Chueh, K.H.C.: Automated identification of a physician's primary patients. Journal of the American Medical Informatics Association 13(1) (2006) 74-79
[8] Hunt, K.: Evaluation of Novel Algorithms to Optimize Risk Stratification Scores in Myocardial Infarction. PhD thesis, Department of Informatics, University of Zurich (2006)
[9] Ferrari, L.D., Aitken, S.: Mining housekeeping genes with a naive Bayes classifier. BMC Genomics 7(1) (2006) 277
[10] Birzele, F., Kramer, S.: A new representation for protein secondary structure prediction based on frequent patterns. Bioinformatics 22(21) (2006) 2628-2634
[11] Kuncheva, L.I., Vilas, V.J.D.R., Rodríguez, J.J.: Diagnosing scrapie in sheep: A classification experiment. Computers in Biology and Medicine 37(8) (2007) 1194-1202
[12] Lau, Q.P., Hsu, W., Lee, M.L., Mao, Y., Chen, L.: Prediction of cerebral aneurysm rupture. In: Proceedings of the Nineteenth IEEE International Conference on Tools with Artificial Intelligence, Washington, DC, USA, IEEE Computer Society (2007) 350-357
[13] Masegosa, A., Joho, H., Jose, J.: Evaluating query-independent object features for relevancy prediction. Advances in Information Retrieval (2007) 283-294
[14] Wang, H., Klinginsmith, J., Dong, X., Lee, A., Guha, R., Wu, Y., Crippen, G., Wild, D.: Chemical data mining of the NCI human tumor cell line database. Journal of Chemical Information and Modeling 47(6) (2007) 2063-2076
[15] Eduardo, A., Iakes, E., Beatriz, G., Alfonso, V., David, J.: EcID. A database for the inference of functional interactions in E. coli. Nucleic Acids Research (2008)
[16] Garcia, B., Aler, R., Ledezma, A., Sanchis, A.: Protein-protein functional association prediction using genetic programming. In: Proceedings of the Tenth Annual Conference on Genetic and Evolutionary Computation, New York, NY, USA, ACM (2008) 347-348
[17] Tian, Y., Chen, C., Zhang, C.: AODE for source code metrics for improved software maintainability. In: Fourth International Conference on Semantics, Knowledge and Grid. (2008) 330-335
[18] Kurz, D., Bernstein, A., Hunt, K., Radovanovic, D., Erne, P., Siudak, Z., Bertel, O.: Simple point-of-care risk stratification in acute coronary syndromes: the AMIS model. British Medical Journal 95(8) (2009) 662
[19] Leon, A., et al.: EcID. A database for the inference of functional interactions in E. coli. Nucleic Acids Research 37(Database issue) (2009) D629
[20] Shahri, S., Jamil, H.: An Extendable Meta-learning Algorithm for Ontology Mapping. Flexible Query Answering Systems (2009) 418-430
[21] Simpson, M., Demner-Fushman, D., Sneiderman, C., Antani, S., Thoma, G.: Using non-lexical features to identify effective indexing terms for biomedical illustrations. In: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics (2009) 737-744
[22] Affendey, L., Paris, I., Mustapha, N., Sulaiman, M., Muda, Z.: Ranking of Influencing Factors in Predicting Students' Academic Performance. Information Technology Journal 9(4) (2010) 832-837
[23] García-Jimenez, B., Juan, D., Ezkurdia, I., Andres-Leon, E., Valencia, A.: Inference of Functional Relations in Predicted Protein Networks with a Machine L
earningApproach.PLoSONE(4)(2010)e9969[24]Hopfgartner,F.,Urruty,T.,Lopez,P.,Villa,R.,Jose,J.:Simulatedevaluationoffacetedbrowsingbasedonfeatureselection.MultimediaToolsandApplications47(3)(2010)631{662[25]Liew,C.,Ma,X.,Yap,C.:ConsensusmodelforidenticationofnovelPI3Kinhibitorsinlargechemicallibrary.Journalofcomputer-aidedmoleculardesign24(2)(2010)131{141[26]Ting,K.M.,Wells,J.R.,Tan,S.C.,Teng,S.W.,Webb,G.I.:Feature-subspaceaggregating:Ensemblesforstableandunstablelearners.MachineLearning(in-press)[27]Cerquides,J.,Mantaras,R.L.D.:RobustBayesianlinearclassierensem-bles.In:ProceedingsoftheSixteenthEuropeanConferenceonMachineLearning.(2005)70{81[28]Jiang,L.,Zhang,H.:Weightilyaveragedone-dependenceestimators.PRI-CAI2006:TrendsinArticialIntelligence970{974[29]Yang,Y.,Webb,G.I.,Cerquides,J.,Korb,K.B.,Boughton,J.,Ting,K.M.:Toselectortoweigh:Acomparativestudyoflinearcombinationschemesforsuperparent-one-dependenceestimators.IEEETransactionsonKnowl-edgeandDataEngineering19(12)(2007)1652{166529 [30]Sahami,M.:LearninglimiteddependenceBayesianclassiers.In:Pro-ceedingsoftheSecondInternationalConferenceonKnowledgeDiscoveryinDatabases,MenloPark,CA:AAAIPress(1996)334{338[31]Friedman,N.,Geiger,D.,Goldszmidt,M.:Bayesiannetworkclassiers.MachineLearning29(2)(1997)131{163[32]Breiman,L.:Randomforests.MachineLearning45(2001)5{32[33]Witten,I.H.,Frank,E.:DataMining:PracticalMachineLearningToolsandTechniques.MorganKaufmann(2005)[34]Langley,P.,Sage,S.:InductionofselectiveBayesianclassiers.In:Pro-ceedingsoftheTenthConferenceonUncertaintyinArticialIntelligence,MorganKaufmann(1994)399{406[35]Pazzani,M.J.:ConstructiveinductionofCartesianproductattributes.ISIS:Information,StatisticsandInductioninScience(1996)66{77[36]Domingos,P.,Pazzani,M.J.:Beyondindependence:ConditionsfortheoptimalityofthesimpleBayesianclassier.In:ProceedingsoftheThir-teenthInternationalConferenceonMachineLearning,MorganKaufmann(1996)105{112[37]Zheng,Z.,Webb,G.I.:LazylearningofBayesianrules.MachineLearning41(1)(2000)53{84[38]Yang,Y.,
Webb,G.,Cerquides,J.,Korb,K.,Boughton,J.,Ting,K.M.:Toselectortoweigh:AcomparativestudyofmodelselectionandmodelweighingforSPODEensembles.In:ProceedingsoftheSeventeenthEuro-peanConferenceonMachineLearning,Springer(2006)533{544[39]Webb,G.I.:Multiboosting:Atechniqueforcombiningboostingandwag-ging.MachineLearning40(2)(2000)159{196[40]Flores,M.,Gamez,J.,Martnez,A.,Puerta,J.:GAODEandHAODE:TwoproposalsbasedonAODEtodealwithcontinuousvariables.In:Pro-ceedingsofthe26thAnnualInternationalConferenceonMachineLearning,ACM(2009)313{320[41]Fayyad,U.M.,Irani,K.B.:Multi-intervaldiscretizationofcontinuous-valuedattributesforclassicationlearning.In:ProceedingsoftheThir-teenthInternationalJointConferenceonArticialIntelligence,MorganKaufmann(1993)1022{1029[42]Cestnik,B.:Estimatingprobabilities:Acrucialtaskinmachinelearning.In:ProceedingsoftheNinthEuropeanConferenceonArticialIntelligence,London:Pitman(1990)147{14930 [43]Zheng,F.,Webb,G.I.:Ecientlazyeliminationforaveraged-onede-pendenceestimators.In:ProceedingsoftheTwenty-thirdInternationalConferenceonMachineLearning,ACMPress(2006)1113{1120[44]Zheng,F.,Webb,G.I.:Findingtherightfamily:Parentandchildselectionforaveragedone-dependenceestimators.In:ProceedingsoftheEighteenthEuropeanConferenceonMachineLearning,SpringerBerlin/Heidelberg(2007)490{50131