Zhang, Burer and Street

(…Chawla et al., 2004). For example, ensemble-based distributed data-mining techniques enable large companies (like WalMart) that store data at hundreds of different locations to build learning models locally and then combine all the models for future prediction and knowledge discovery. The storage and computation time will become non-trivial under such circumstances.

In addition, it is not always true that the larger the size of an ensemble, the better it is. For example, the boosting algorithm focuses on those training samples that are misclassified by the previous classifier in each round of training and finally squeezes the training error to zero. If there is a certain amount of noise in the training data, the boosting ensemble will overfit (Opitz and Maclin, 1999; Dietterich, 2000). In such cases, it is better to reduce the complexity of the learning model in order to correct the overfitting, such as by pruning a decision tree. For a boosting ensemble, selecting a subset of classifiers may improve the generalization performance.

Ensemble methods have also been applied to mine streaming data (Street and Kim, 2001; Wang et al., 2003). The ensemble classifiers are trained from sequential chunks of the data stream. In a time-evolving environment, any change in the underlying data-generating pattern may make some of the old classifiers obsolete. It is better to have a screening process that keeps only those classifiers that match the current form of the drifting concept. A similar situation occurs when classifiers are shared among slightly different problem domains. For example, in a peer-to-peer spam email filtering system, each email user can introduce spam filters from other users and construct an ensemble filter. However, because of the differences of interest among email users, sharing filters indiscriminately is not a good solution. The sharing system should be able to pick filters that fit the individuality of each user.

All of the above reasons motivate the appearance of various ensemble pruning algorithms. A straightforward pruning method is to rank the classifiers according to their individual performance on a held-out test set and pick the best ones (Caruana et al., 2004). This simple approach may sometimes work well but is theoretically unsound. For example, an ensemble of three identical classifiers with 95% accuracy is worse than an ensemble of three classifiers with 67% accuracy and least pairwise correlated error (which is perfect!). Margineantu and Dietterich (1997) proposed four approaches to prune ensembles generated by Adaboost. KL-divergence pruning and Kappa pruning aim at maximizing the pairwise difference between the selected ensemble members. Kappa-error convex hull pruning is a diagram-based heuristic targeting a good accuracy-divergence trade-off among the selected subset. Back-fitting pruning essentially enumerates all the possible subsets, which is computationally too costly for large ensembles. Prodromidis et al. invented several pruning algorithms for their distributed data mining system (Prodromidis and Chan, 2000; Chan et al., 1999). One of the two algorithms they implemented is based on a diversity measure they defined, and the other is based on class specialty metrics. The major problem with the above algorithms is that when it comes to optimizing some criterion of the selected subset, they all resort to greedy search, which is at the lower end of optimization techniques and usually comes without either theoretical or empirical quality guarantees. Kim et al. used an evolutionary algorithm for ensemble pruning and it turned out to be effective (Kim et al., 2002). A similar approach can also be found in (Zhou et al., 2001). Unlike previous heuristic approaches, we formulate the ensemble pruning problem as a quadratic integer programming problem to look for a subset of classifiers that has the optimal […]

[…] showed that the generalization error of an ensemble is loosely bounded by
$\bar{\rho}(1-s^2)/s^2$, where $\bar{\rho}$ is the average correlation between classifiers and $s$ is the overall strength of the classifiers. For continuous prediction problems, there are even closed-form representations for the ensemble generalization performance based on individual error and diversity. Krogh and Vedelsby (1995) showed that for a neural network ensemble, the generalization error $E = \bar{E} - \bar{A}$, where $\bar{E}$ is the weighted average of the errors of the individual networks and $\bar{A}$ is the variance among the networks. Zhou et al. (2001) give another form,

$$ E = \sum_{i,j} C_{ij}, \qquad \text{where } C_{ij} = \int p(x)\,\big(f_i(x) - d(x)\big)\big(f_j(x) - d(x)\big)\,dx, $$

$p(x)$ is the density of input $x$, $f_i(x)$ is the output of the $i$th network and $d(x)$ is the true output. Note that $C_{ii}$ is the error of the $i$th network and $C_{ij}$, $i \neq j$, is a pairwise correlation-like measurement.

The problem is that the more accurate the classifiers are, the less different they become. Therefore, there must be a trade-off between the strength and the divergence of an ensemble. What we are looking for is a subset of classifiers with the best trade-off so that the generalization performance can be optimized.

In order to obtain the mathematical formulation of the ensemble pruning problem, we need to represent the error structure of the existing ensemble in a convenient way. Unlike the case of continuous prediction, there is no exact closed-form representation for the ensemble error in terms of strength and diversity for a discrete classification problem. However, we are still able to obtain some approximate metrics following the same idea. From the error analysis of continuous problems, we notice that the ensemble error can be represented by a linear combination of the individual accuracy terms and pairwise diversity terms. Therefore, if we are able to find strength and diversity measurements for a classification ensemble, a linear combination of them should serve as a good approximation of the overall ensemble error. Minimizing this approximate ensemble error function will be the objective of the mathematical programming formulation.

First, we record the misclassifications of each classifier on the training set in the error matrix $P$ as follows:

$$ P_{ij} = 0 \text{ if the } j\text{th classifier is correct on data point } i, \qquad P_{ij} = 1 \text{ otherwise.} \qquad (1) $$

Let $G = P^T P$. Thus, the diagonal term $G_{ii}$ is the total number of errors made by classifier $i$, and the off-diagonal term $G_{ij}$ is the number of common errors of the classifier pair $i$ and $j$. To put all the elements of the $G$ matrix on the same scale, we normalize them by

$$ \tilde{G}_{ii} = \frac{G_{ii}}{N}, \qquad \tilde{G}_{ij} = \frac{1}{2}\left(\frac{G_{ij}}{G_{ii}} + \frac{G_{ij}}{G_{jj}}\right), \; i \neq j, \qquad (2) $$

where $N$ is the number of training points.

[…] where $N_v$ is the number of vertices in the graph. Roughly speaking, this optimization involves partitioning the vertices via the assignment of $y_i = 1$ or $y_i = -1$ to each vertex $i$ in the graph (subject to the size requirement) and minimizing the sum $\sum_{ij} w_{ij} y_i y_j$, where $w_{ij}$ is the edge weight between vertices $i$ and $j$. Notice that the interaction term $y_i y_j$ equals $-1$ when $i$ and $j$ are in different sets of the partition, which, in the context of the minimization described, contributes to the maximization of edges crossing the partition. The MC-k problem is known to have a very good approximate solution algorithm based on semi-definite programming (SDP). The key point of the SDP approximation algorithm is to relax each binary variable $y_i \in \{-1, 1\}$ into a unit vector. Therefore, if we are able to transform the ensemble pruning formulation so that it fits into the framework of MC-k, we may obtain a good solution for the ensemble pruning problem. The above MC-k formulation (4) is equivalent to

$$ \min_y \; y^T W y \quad \text{s.t.} \quad \sum_i y_i = N_v - 2k, \; y_i \in \{-1, 1\}, \qquad (5) $$

where $W$ is the edge-weight matrix, with $w_{ii} = 0$. If we compare this formulation with that of the ensemble pruning problem, the only barrier that prevents the application of the SDP approximation algorithm to the ensemble pruning problem is the difference in the possible values of the binary variables. Specifically, in ensemble pruning, $x_i \in \{0, 1\}$, and in MC-k, $y_i \in \{-1, 1\}$. Therefore, we need to make a transformation of variables for the ensemble pruning problem. Let

$$ x_i = \frac{v_i + 1}{2}, \qquad (6) $$

with $v_i \in \{-1, 1\}$. Now the objective function becomes

$$ \frac{1}{4}(v + e)^T \tilde{G} (v + e), \qquad (7) $$

where $e$ is a column vector of all 1s. The cardinality constraint $\sum_i x_i = k$ can be rewritten in quadratic form as

$$ x^T I x = k, \qquad (8) $$

where $I$ is the identity matrix. After the variable transformation, this constraint becomes

$$ (v + e)^T I (v + e) = 4k. \qquad (9) $$

A variable expansion trick can be applied to put both the transformed objective function and the cardinality constraint back into a nice quadratic form. We expand the variable vector $v = (v_1, v_2, \ldots, v_n)$ into $v = (v_0, v_1, v_2, \ldots, v_n)$ and let $v_0 = 1$. We then construct a new matrix

$$ H_{(n+1)\times(n+1)} = \begin{pmatrix} e^T \tilde{G} e & e^T \tilde{G} \\ \tilde{G} e & \tilde{G} \end{pmatrix}. \qquad (10) $$
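The construction of the matrices in (1), (2) and (10) is mechanical; the following minimal sketch (not the authors' implementation) shows one way to build them, assuming numpy is available and that every classifier makes at least one error so the divisions in (2) are well defined:

```python
import numpy as np

def error_matrix(predictions, labels):
    """Eq. (1): P[i, j] = 0 if classifier j is correct on point i, 1 otherwise."""
    return (predictions != labels[:, None]).astype(float)

def normalized_g(P):
    """Eq. (2): normalize G = P^T P so all entries are on the same scale."""
    N, n = P.shape               # N data points, n classifiers
    G = P.T @ P
    Gt = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                Gt[i, i] = G[i, i] / N
            else:
                # average of the two conditional overlap ratios
                Gt[i, j] = 0.5 * (G[i, j] / G[i, i] + G[i, j] / G[j, j])
    return Gt

def expanded_h(Gt):
    """Eq. (10): embed G~ into the (n+1)x(n+1) matrix H obtained after the
    variable expansion v = (v0, v1, ..., vn) with v0 = 1."""
    e = np.ones(Gt.shape[0])
    top = np.concatenate(([e @ Gt @ e], e @ Gt))
    return np.vstack((top, np.column_stack((Gt @ e, Gt))))

# Tiny illustrative example: 4 points, 3 classifiers, each making one error.
labels = np.array([0, 1, 1, 0])
preds = np.array([[0, 1, 0],
                  [1, 1, 0],
                  [0, 1, 1],
                  [0, 0, 0]])
P = error_matrix(preds, labels)
H = expanded_h(normalized_g(P))
```

In this toy example no two classifiers share an error, so $\tilde{G}$ is diagonal and the top-left entry of $H$ is $e^T\tilde{G}e = 3 \times 0.25 = 0.75$.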
[…] MC-k instance; and (iii) transform back to a solution of our problem. Further, it is not difficult to see that the above three steps are equivalent to the more direct approach of relaxing (12) as an SDP and applying the randomization scheme of Feige and Langberg (2001) and Han, Ye, and Zhang (2002). In other words, it is not strictly necessary to convert to an instance of MC-k first.

For completeness, we now return our attention to the SDP relaxation itself. Problem (12) is equivalent to

$$ \min_v \; H \bullet vv^T \quad \text{s.t.} \quad D \bullet vv^T = 4k, \; v_0 = 1, \; v_i \in \{-1, 1\} \; \forall i \neq 0, \qquad (13) $$

where $A \bullet B = \sum_{i,j} A_{ij} B_{ij}$. To construct the relaxation, we first note that the constraint $v_0 = 1$ can be relaxed to $v_0 \in \{-1, 1\}$ without changing the problem, since $v$ is feasible for the remaining constraints if and only if $-v$ is, and since $H \bullet vv^T = H \bullet (-v)(-v)^T$. Next, we rewrite the constraints $v_i \in \{-1, 1\}$, $i = 0, 1, \ldots, n$, as the single, collective constraint $\mathrm{diag}(vv^T) = e$ to arrive at the following formulation:

$$ \min_v \; H \bullet vv^T \quad \text{s.t.} \quad D \bullet vv^T = 4k, \; \mathrm{diag}(vv^T) = e. \qquad (14) $$

We next substitute $V = vv^T$, and note that $V$ can be expressed as $vv^T$ if and only if $V \succeq 0$ with $\mathrm{rank}(V) = 1$, which gives us

$$ \min_V \; H \bullet V \quad \text{s.t.} \quad D \bullet V = 4k, \; \mathrm{diag}(V) = e, \; V \succeq 0, \; \mathrm{rank}(V) = 1. \qquad (15) $$

Although this problem is written in a different form, it is completely equivalent to our original 0-1 quadratic problem. The SDP relaxation is now obtained by dropping the rank constraint, which yields the following (convex) SDP:

$$ \min_V \; H \bullet V \quad \text{s.t.} \quad D \bullet V = 4k, \; \mathrm{diag}(V) = e, \; V \succeq 0. \qquad (16) $$

Now the original NP-hard problem (3) is relaxed into a convex SDP, which can be solved to any preset precision in polynomial time. We solve the SDP relaxation using the publicly available package SDPLR (Burer and Monteiro, 2003) and have implemented the approximation algorithm described in Han et al. (2002).

dataset     SDP-25         Div-25         Kappa-25        NoPruning
autompg     10.35 (0.62)   13.88 (0.83)   11.91 (1.52)    10.71 (0.70)
bupa        30.52 (1.43)   35.87 (1.25)   38.39 (2.65)    30.32 (0.43)
cmc         32.82 (0.87)   41.70 (1.66)   43.66 (1.37)    34.50 (1.19)
crx         13.88 (0.46)   22.40 (3.41)   21.78 (4.44)    13.58 (0.85)
glass       12.53 (1.12)   17.30 (2.88)   16.39 (2.76)    11.29 (0.70)
haberma     32.73 (2.21)   38.63 (1.30)   38.88 (3.09)    34.56 (1.74)
heart       20.13 (2.19)   28.09 (2.73)   27.81 (4.09)    20.40 (2.39)
hepatit     15.81 (1.17)   19.83 (2.66)   16.86 (2.00)    14.71 (1.84)
housing     11.03 (0.61)   12.37 (0.91)   12.21 (1.59)    10.67 (0.66)
ion
7.46 (2.17)    10.94 (5.13)   14.02 (10.61)   9.85 (7.72)
iris        12.20 (9.47)   15.00 (7.84)   19.60 (8.44)    13.40 (11.01)
pima        25.34 (0.67)   31.31 (3.78)   30.24 (2.90)    25.06 (0.78)
sonar       21.43 (1.68)   26.24 (4.32)   25.24 (2.31)    18.96 (1.13)
wdbc        3.38 (0.34)    3.51 (0.59)    3.76 (0.76)     2.88 (0.30)
wine1-2     2.92 (0.64)    6.15 (3.17)    12.62 (18.45)   11.85 (18.82)
wine1-3     1.11 (0.76)    1.27 (0.50)    2.58 (1.01)     1.31 (0.47)
wine2-3     3.05 (0.75)    4.56 (2.50)    5.67 (6.27)     2.86 (3.54)
wpbc        24.06 (2.51)   32.35 (1.88)   29.54 (3.52)    24.04 (2.79)
vehicle1-2  41.15 (2.62)   42.32 (1.01)   45.17 (4.65)    41.40 (1.28)
vehicle1-3  1.07 (0.42)    3.67 (3.79)    9.26 (12.11)    3.21 (3.24)
vehicle1-4  4.52 (0.36)    6.47 (1.24)    6.18 (2.42)     3.84 (0.21)
vehicle2-3  2.25 (0.55)    7.34 (4.01)    11.99 (7.06)    5.82 (4.36)
vehicle2-4  5.00 (1.28)    10.33 (3.26)   13.57 (12.65)   5.96 (4.35)
vehicle3-4  1.15 (0.11)    0.96 (0.41)    1.39 (1.63)     0.67 (0.39)
Absolute W-L-T     23-1-0   24-0-0   12-12-0
Significant W-L-T  9-0-15   8-0-16   2-0-22

Table 1: Comparison of SDP pruning, Diversity-based pruning, Kappa-pruning and the original ensembles, by % error and (standard deviation). The win-loss-tie counts (absolute, and at significance level) are attached at the bottom of the table.

Note that simple majority voting is used to combine the predictions of the ensembles. Prodromidis et al. (Prodromidis and Stolfo, 1998; Prodromidis and Chan, 2000) built a higher-level classifier (meta-classifier) to aggregate the output of the pruned ensembles. There has been other research in this direction (Wolpert, 1992; Bennett et al., 2000; Mason et al., 1999; Grove and Schuurmans, 1998). However, there is so far no strong evidence that such a meta-classifier is generally better than simple majority voting.

Table 3 shows that the performance of the SDP-based pruning is better than that of the other two algorithms for most of the data sets involved in the computational experiments. Also, although only a quarter of the classifiers are kept, the error of the ensemble pruned by SDP-based pruning is statistically the same as that of the original ensemble. In addition, we may conclude that the SDP-based pruning is more stable in terms of accuracy, judging by the error standard deviations of the three algorithms. The error fluctuation range of the other two algorithms is often much larger, which might explain why the SDP-based pruning is sometimes not statistically better.
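Returning to the algorithmic side: to extract a feasible 0-1 subset from the optimal $V$ of the SDP relaxation (16), a randomized rounding step is required. The sketch below is a simplified hyperplane-rounding routine in the spirit of the randomization schemes cited above; it is not the exact procedure of Han et al. (2002), and the cardinality repair step is a naive stand-in. It assumes numpy:

```python
import numpy as np

def round_sdp_solution(H, V, k, trials=100, seed=0):
    """Sample hyperplane roundings of the PSD matrix V, repair the
    cardinality constraint, and keep the candidate with the smallest
    objective v^T H v.  Returns a 0-1 vector x with sum(x) = k."""
    rng = np.random.default_rng(seed)
    w, U = np.linalg.eigh(V)                  # V = L L^T; tolerates rank deficiency
    L = U * np.sqrt(np.clip(w, 0.0, None))
    best_x, best_val = None, np.inf
    for _ in range(trials):
        v = np.sign(L @ rng.standard_normal(V.shape[0]))
        v[v == 0] = 1.0
        v *= v[0]                             # enforce v_0 = 1
        x = ((v[1:] + 1) / 2).astype(int)     # back to {0, 1} via Eq. (6)
        # Naive repair: randomly flip entries until sum(x) = k.
        ones, zeros = np.flatnonzero(x == 1), np.flatnonzero(x == 0)
        if len(ones) > k:
            x[rng.choice(ones, len(ones) - k, replace=False)] = 0
        elif len(ones) < k:
            x[rng.choice(zeros, k - len(ones), replace=False)] = 1
        v_full = np.concatenate(([1.0], 2.0 * x - 1.0))
        val = v_full @ H @ v_full
        if val < best_val:
            best_x, best_val = x, val
    return best_x

# Sanity check on a rank-1 feasible V built from a known v (so rounding
# must recover it): v = (1, 1, -1, 1) corresponds to x = (1, 0, 1), k = 2.
v_true = np.array([1.0, 1.0, -1.0, 1.0])
V = np.outer(v_true, v_true)
x = round_sdp_solution(np.eye(4), V, k=2)
```

When the SDP solution happens to be rank one, hyperplane rounding recovers it exactly; in general, multiple trials are kept and the best-scoring candidate wins.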
Size   Time (s)      Size   Time (s)
100    3             1500   4889
200    36            2000   23255
400    113           2500   42419
800    713           3000   83825

Table 2: Computation time of SDP-based pruning

Figure 1: Timing test of SDP-based ensemble pruning

Ensemble Pruning Via Semi-definite Programming

[…]mented the pruning algorithm in the following case study, where classifiers from different but closely-related problem domains are pooled together and a subset of them is then selected for each problem domain. This case study involves a much larger data set than the UCI sets used in the above experiments, and an original ensemble with a larger number of more divergent classifiers. The results of the study verify the algorithm's performance in real-world applications, and show one type of situation where ensemble pruning can be particularly useful.

As we know, ensemble methods not only improve classification accuracy, but also provide a way to share knowledge. If the data for a problem domain are distributed at different locations, classifiers can be trained locally and then combined to create an ensemble. This centralized ensemble gathers information from each site and can potentially be more powerful for further predictions. Now the question is: if we are working with several different but closely related problem domains, is sharing classifiers among those domains still a good idea?

The essence of sharing classifiers is sharing common knowledge among different but closely related problem domains. A famous example of research in this direction is the multi-task neural network (Caruana, 1997). Each problem domain is assigned one or more output nodes, while sharing some of the input and mid-layer nodes with other problem domains. Although this method is sometimes successful, it has several limitations. First, it requires that the data be at a central location for training. Second, the training speed becomes a big issue if the size of the data is large. The classifier-sharing strategy may avoid both of these drawbacks. Since the classifiers can be trained locally, the data need not be centralized. Moreover, one can use efficient algorithms like decision trees to train classifiers, so computation time becomes less of a problem.

The prerequisite for sharing classifiers among different problem domains is that the data schema for each problem domain should be identical, or at least very
similar. This ensures that the classifiers trained on one problem domain can also be applied to other problem domains, although the accuracy can possibly be low. Sometimes, a transformation of variables is required to satisfy this condition. It is in fact hard to tell a priori whether sharing classifiers among different problem domains will improve the overall performance. The success of this strategy depends on the connections among the problem domains and the self-completeness of information within each problem domain. However, with a screening process (ensemble pruning) that will be described later, the chance that sharing classifiers will eventually downgrade the overall performance is minimal.

The cross-domain classifier-sharing strategy is tested on a publicly available marketing data set. To address the concern that sharing classifiers blindly may sometimes do harm to some of the problem domains if there exist conflicting elements among them, we apply the SDP-based pruning algorithm to select a good subset of classifiers from the entire ensemble for each problem domain.

The data set is a catalog marketing data set from the Direct Marketing Association (DMEF Academic Data Set Three, Specialty Catalog Company, Code 03DMEF). The dependent variables are the customers' responses to a particular promotion held by a catalog company. There are 19 different categories of products involved in the promotion, which correspond to 19 response variables. Unfortunately, the identities of the product categories are not available. The data mining problem we try to solve is to predict which categories […]
[…] response rate. If these classifiers can be included in the ensembles of those categories without enough positive points, they may help hit the targets and improve the overall performance. Second, the purchase patterns of some different categories may be similar. For example, people are more likely to buy clothes when there are discount coupons. This may also be true for shoes. Therefore, a marketing model for clothes that stresses the importance of discount coupons may also work for shoes, although they belong to different product categories.

A naive way of sharing classifiers is to pool all the classifiers from the 19 categories into one big ensemble and use it for every category. However, there exist risks behind this sharing-all strategy. Specifically, when there are strikingly conflicting concepts among the problem domains, mixing classifiers blindly will degrade the effectiveness of the ensemble method. Therefore, it is safer to bring in the SDP-based screening process, which is able to select a subset of classifiers for each problem domain.

Unlike the experiments on the UCI data sets, here we use a held-out tuning set for each subset selection process, since there is a large amount of data. Therefore, the $P$ matrix in (1) is constructed based on classification results on the tuning set instead of the training set. Each tuning set reflects the original class distribution. Note that there are 19 categories in the marketing data, so there will be 19 tuning sets and the subset selection process will be repeated 19 times, once for each category.

There is still one more problem with the current setup of the $\tilde{G}$ matrix. Since the tuning data sets here reflect the original class distributions, which are highly biased towards non-buyers, the resulting $\tilde{G}$ matrix will reflect the performance of the ensemble on the negative points while almost ignoring the influence of the positive points. It is necessary to balance the influence of the positive points and the negative points on $\tilde{G}$. To achieve this goal, we define a third matrix $\hat{G}$ as a convex combination of the $\tilde{G}$ matrix on the positive points and the $\tilde{G}$ matrix on the negative points in the tuning set,

$$ \hat{G} = \lambda \tilde{G}_{pos} + (1 - \lambda) \tilde{G}_{neg}. $$

Note that $\tilde{G}_{pos}$ is computed based only on the positive points in the tuning set and $\tilde{G}_{neg}$ only on the negative points, using formulas (1) and (2). In the following computational experiments, $\lambda$ is set to 0.5.

The original data schema for each category is identical: the same independent variables (purchase history, promotions and demographic information) and a binary dependent variable indicating whether the customer buys products from this category. However, we found that some of the independent variables are category-specific. For example, there are 19 variables, each representing whether there is a discount for each category. Intuitively, the discount variable for Category $i$ is more informative to the prediction problem for Category $i$ than for other categories. There is likely a split on the discount variable for Category $i$ in the decision trees for Category $i$. Since this discount variable is probably (though not certainly) not relevant to other categories, the decision trees induced on Category $i$ are less likely to be useful for other categories. To make the decision trees more interchangeable among different categories, we did some transformations on those category-specific variables. In the data set for Category $i$, a copy of each category-specific variable related to Category $i$ is appended to the end of the data schema and labeled as "xxxx-for-this-category". The values of the original category-specific variables for this Category $i$ (which already have […]

Figure 2: % Difference of AUCs of the selective-sharing ensembles and the original bagging ensembles on 19 categories, $(AUC_{sel} - AUC_{orig}) / AUC_{orig}$

The results show that selective sharing of classifiers does improve the AUCs for almost half of the categories. Especially for those categories without enough positive information, the rise in AUC is substantial (Figure 2). The overall performance of the ensembles produced by selective sharing is almost as good as the naive-sharing ensemble. For Categories 15, 16 and 19, it is even statistically better. Note that the selected ensembles use only 25 classifiers each, as compared to 475 for the naive-sharing ensemble. There are two implications of this result. First, the ensemble pruning process is quite effective. Second, the reason that the including-all ensemble is better than the individual bagging ensembles is not simply because it is larger. There is truly useful additional information in the including-all ensemble that can be singled out by the pruning process. Moreover, selective sharing is a conservative version of sharing classifiers, and hence should be more robust. If there were wildly conflicting concepts among the categories, selective sharing would have performed better. In fact, the selective-sharing ensembles of Categories 15 and 16 do outperform the including-all ensemble by throwing away bad classifiers, as expected.

It is also interesting to look at the classifier sharing structure among the different categories. Table 5 provides a list of statistics that summarizes the sharing structure, averaged over the same five runs as in Table 4. The "Prior" column shows the response rate for each category. The "Used" column is the total number of classifiers trained on this category used for all categories. The "Used Own" column is the total number of classifiers trained on this category used for this category. The "Most Used" column shows the most common training category for the classifiers in each ensemble. Finally, the HI index is the Herfindahl index

Category   Prior (%)   Used   Used Own   Most Used   HI index
1          0.14        2      1          13          0.16
2          0.27        0      0          14          0.27
3          0.04        0      0          6           0.16
4          0.92        12     1          17          0.25
5          0.12        1      0          17          0.27
6          0.83        17     0          14          0.35
7          0.44        9      0          14          0.35
8          0.37        14     1          14          0.67
9          0.64        22     3          14          0.29
10         0.02        1      0          19          0.33
11         0.13        3      0          19          0.57
12         0.05        2      0          19          0.47
13         0.44        18     0          19          0.67
14         2.65        80     7          14          0.30
15         1.70        11     0          19          0.33
16         2.09        14     1          19          0.35
17         0.65        94     9          19          0.34
18         0.31        24     2          19          0.49
19         1.17        150    18         19          0.63
Table 5: Statistics of classifier sharing structure

(Rose and Engel, 2002), which is computed by the following formula:

$$ HI_i = \sum_{j=1}^{C} \left(\frac{n_{ij}}{N_e}\right)^2, $$

where $n_{ij}$ is the number of classifiers trained on the $j$th category used for the $i$th category, and $N_e$ is the size of the pruned ensemble, which is 25 in this case. The smaller the HI index, the more diverse the ensemble in terms of original training categories. From our observation, the classifier sharing structure is reasonably stable for most of the categories over the five runs. For instance, classifiers from Categories 14, 17 and 19 are always the top three used classifiers in the selected ensembles, the most-used column seldom varies among different runs, etc.

There are a few things that one can tell after studying these numbers. The most surprising fact shown by the table is that for most categories, the classifiers trained on a category are seldom chosen for its own ensemble. Only Categories 14, 17 and 19 use a reasonable number of their own classifiers. However, the performance of the pruned ensembles is close to that of the original bagging ensembles, as shown in Table 4. This may imply that some of the categories are closely related, and therefore their classifiers are highly interchangeable through combination. The higher the original response rate for each category, the more times its classifiers are used by other categories. Figure 3 plots the number of times its classifiers are used […]

Figure 3: Response rate of each category vs. total number of times its classifiers are used in the selected ensembles
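The Herfindahl index above is straightforward to compute. A minimal sketch (the category counts in the usage example are made up for illustration, not taken from Table 5):

```python
def herfindahl_index(counts, ensemble_size):
    """HI_i = sum_j (n_ij / N_e)^2, where counts[j] = n_ij is the number of
    classifiers trained on category j that are used for category i, and
    ensemble_size = N_e is the size of the pruned ensemble."""
    return sum((n / ensemble_size) ** 2 for n in counts)

# Extremes for a pruned ensemble of N_e = 25 classifiers:
hi_concentrated = herfindahl_index([25], 25)        # all from one category -> 1.0
hi_spread = herfindahl_index([5, 5, 5, 5, 5], 25)   # evenly from five categories -> 0.2
```

The two extremes illustrate the interpretation in the text: concentration on a single training category drives the index toward 1, while an even spread over many categories drives it toward 0.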
[…] HI index will be low, which means that the pruned ensemble picks classifiers from many categories. On the other hand, if a category needs specialized knowledge for its prediction, the HI index will be higher, which implies that the pruned ensemble picks classifiers only from a small number of categories that share some particular properties with the category in question. Usually, for those categories with a small response rate, the HI index is low, for example Category 1 and Category 3. Categories 8 and 13 have the highest HI indices. From the "Most Used" column, we know that Category 8 used classifiers mostly from Category 14. So it can be inferred that Category 8 and Category 14 are closely related. For the same reason, the customers of Category 13 may share some special properties with those of Category 19. These pieces of information might make more sense if we knew what these categories were. Unfortunately, this knowledge is currently unavailable.

The information gained by studying the classifier sharing structure may help improve the mailing and promotion strategy of the catalog company. For instance, customers that buy products from Categories 14, 17 and 19 seem to capture the main trend of the customer base, so they are likely the core customers of the company and may need extra attention. Customers of Categories 14 and 8 seem to be similar from some perspective. Therefore, a promotion strategy that proves to be successful for Category 14 may also work well for Category 8.

5. Conclusion and Future Work

This paper introduced a new ensemble pruning method to improve the efficiency and effectiveness of an existing ensemble. Unlike previous heuristic approaches, we formulate the ensemble pruning problem as a strict mathematical programming problem and apply SDP relaxation techniques to obtain a good approximate solution. The computational experiments on the UCI repository data sets show that this SDP-based pruning algorithm performs better than two other metric-based pruning algorithms. Its application in a classifier sharing study also indicates that this subset selection procedure is effective in picking classifiers that fit the needs of different problem domains. Besides the peer-to-peer email filtering problem mentioned before, this method can also be useful when a company is trying to promote a new product. Usually, there is only limited information about a new product's potential customers. However, if the company has good marketing models for its old products, especially those closely related to the new product, selecting some of the old models based on the limited data of the new product may be a better solution than building models directly.

There is still room for improvement in the current version of the algorithm. For example, there are several parameters in the model that can be fine-tuned, such as the method of normalization and the relative weight between the diagonal terms and off-diagonal terms. Another thing that is worth exploring is whether there exists a nice form of the objective function such that a "real" optimal subset can be found without enforcing the cardinality constraint. Under the current setup, removing the cardinality constraint will result in a trivial solution.

U. Feige and M. Langberg. Approximation algorithms for maximization problems arising in graph partitioning. Journal of Algorithms, 41:174-211, 2001.

J. L. Fleiss. Statistical Methods for Rates and Proportions. John Wiley & Sons, 1981.

Y. Freund and R. E. Schapire. Experiments with a new boosting algorithm. In International Conference on Machine Learning, pages 148-156, 1996. URL citeseer.nj.nec.com/freund96experiments.html.

M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42:1115-1145, 1995.

A. J. Grove and D. Schuurmans. Boosting in the limit: Maximizing the margin of learned ensembles. In AAAI/IAAI, pages 692-699, 1998.

Q. Han, Y. Ye, and J. Zhang. An improved rounding method and semidefinite programming relaxation for graph partition. Mathematical Programming, pages 509-535, 2002.

L. K. Hansen and P. Salamon. Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10):993-1001, 1990.

S. Hashem. Optimal linear combinations of neural networks. Neural Networks, 10(4):599-614, 1997.

T. K. Ho. The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8):832-844, 1998.

Y. Kim, N. W. Street, and F. Menczer. Meta-evolutionary ensembles. In IEEE International Joint Conference on Neural Networks, pages 2791-2796, 2002.

A. Krogh and J. Vedelsby. Neural network ensembles, cross validation, and active learning. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems, volume 7, pages 231-238. The MIT Press, 1995.

G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5:27-72, 2004.

D. D. Margineantu and T. G. Dietterich. Pruning adaptive boosting. In 14th International Conference on Machine Learning, pages 211-218, 1997.

L. Mason, P. Bartlett, and J. Baxter. Direct optimization of margins improves generalization in combined classifiers. Advances in Neural Information Processing Systems, 11:288-294, 1999.

A. McCallum. Multi-label text classification with a mixture model trained by EM. In AAAI'99 Workshop on Text Learning, 1999.

D. Opitz and R. Maclin. Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research, pages 169-198, 1999.