Ensemble Pruning Via Semi-definite Programming
Zhang, Burer and Street

… Chawla et al., 2004). For example, ensemble-based distributed data-mining techniques enable large companies (like WalMart) that store data at hundreds of different locations to build learning models locally and then combine all the models for future prediction and knowledge discovery. The storage and computation time become non-trivial under such circumstances. In addition, it is not always true that the larger the size of an ensemble, the better it is. For example, the boosting algorithm focuses on those training samples that are misclassified by the previous classifier in each round of training, and finally squeezes the training error to zero. If there is a certain amount of noise in the training data, the boosting ensemble will overfit (Opitz and Maclin, 1999; Dietterich, 2000). In such cases, it is better to reduce the complexity of the learning model in order to correct the overfitting, like pruning a decision tree. For a boosting ensemble, selecting a subset of classifiers may improve the generalization performance.

Ensemble methods have also been applied to mine streaming data (Street and Kim, 2001; Wang et al., 2003). The ensemble classifiers are trained from sequential chunks of the data stream. In a time-evolving environment, any change in the underlying data-generating pattern may make some of the old classifiers obsolete. It is better to have a screening process that keeps only classifiers that match the current form of the drifting concept. A similar situation occurs when classifiers are shared among slightly different problem domains. For example, in a peer-to-peer spam email filtering system, each email user can introduce spam filters from other users and construct an ensemble filter. However, because of the difference of interest among email users, sharing filters indiscriminately is not a good solution. The sharing system should be able to pick the filters that fit the individuality of each user.

All of the above reasons motivate the appearance of various ensemble pruning algorithms. A straightforward pruning method is to rank the classifiers according to their individual performance on a held-out test set and pick the best ones (Caruana et al., 2004). This simple approach may sometimes work well but is theoretically unsound. For example, an ensemble of three identical classifiers with 95% accuracy is worse than an ensemble of three classifiers with 67% accuracy and least pairwise correlated error (which is perfect!). Margineantu and Dietterich (1997) proposed four approaches to prune ensembles generated by AdaBoost. KL-divergence pruning and Kappa pruning aim at maximizing the pairwise difference between the selected ensemble members. Kappa-error convex hull pruning is a diagram-based heuristic targeting a good accuracy-divergence trade-off among the selected subset. Back-fitting pruning essentially enumerates all the possible subsets, which is computationally too costly for large ensembles. Prodromidis et al. invented several pruning algorithms for their distributed data mining system (Prodromidis and Chan, 2000; Chan et al., 1999). One of the two algorithms they implemented is based on a diversity measure they defined, and the other is based on class specialty metrics. The major problem with the above algorithms is that when it comes to optimizing some criterion of the selected subset, they all resort to greedy search, which is on the lower end of optimization techniques and usually comes without either theoretical or empirical quality guarantees. Kim et al. used an evolutionary algorithm for ensemble pruning, and it turned out to be effective (Kim et al., 2002). A similar approach can also be found in (Zhou et al., 2001). Unlike previous heuristic approaches, we formulate the ensemble pruning problem as a quadratic integer programming problem to look for a subset of classifiers that has the optimal accuracy-diversity trade-off.

Breiman (2001) showed that the generalization error of an ensemble is loosely bounded by $\bar{\rho}/s^2$, where $\bar{\rho}$ is the average correlation between classifiers and $s$ is the overall strength of the classifiers.
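Collecting the pieces that appear below (the normalized error matrix $\tilde{G}$, the 0-1 selection variables, and the cardinality constraint), the quadratic integer program referred to here can be written compactly as follows; this is a reconstruction from those surrounding definitions, not necessarily the verbatim statement of the paper's problem (3):

$$\min_{x}\; x^T \tilde{G}\, x \quad \text{s.t.} \quad \sum_{i=1}^{n} x_i = k, \qquad x_i \in \{0,1\},$$

where $x_i = 1$ means that classifier $i$ is kept and $k$ is the desired size of the pruned ensemble. The diagonal of $\tilde{G}$ contributes the strength (individual error) terms, and the off-diagonal entries contribute the pairwise diversity terms.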
For continuous prediction problems, there are even closed-form representations for the ensemble generalization performance based on individual error and diversity. Krogh and Vedelsby (1995) showed that for a neural network ensemble, the generalization error

$$E = \bar{E} - \bar{A},$$

where $\bar{E}$ is the weighted average of the error of the individual networks and $\bar{A}$ is the variance among the networks. Zhou et al. (2001) give another form,

$$E = \sum_{i,j} C_{ij}, \qquad C_{ij} = \int p(x)\,\big(f_i(x) - d(x)\big)\big(f_j(x) - d(x)\big)\,dx,$$

where $p(x)$ is the density of input $x$, $f_i(x)$ is the output of the $i$th network and $d(x)$ is the true output. Note that $C_{ii}$ is the error of the $i$th network and $C_{ij}$, $i \neq j$, is a pairwise correlation-like measurement.

The problem is that the more accurate the classifiers are, the less different they become. Therefore, there must be a trade-off between the strength and the divergence of an ensemble. What we are looking for is a subset of classifiers with the best trade-off, so that the generalization performance can be optimized.

In order to obtain the mathematical formulation of the ensemble pruning problem, we need to represent the error structure of the existing ensemble in a convenient way. Unlike the case of continuous prediction, there is no exact closed-form representation for the ensemble error in terms of strength and diversity for a discrete classification problem. However, we are still able to obtain some approximate metrics following the same idea. From the error analysis of continuous problems, we notice that the ensemble error can be represented by a linear combination of the individual accuracy terms and pairwise diversity terms. Therefore, if we are able to find strength and diversity measurements for a classification ensemble, a linear combination of them should serve as a good approximation of the overall ensemble error. Minimizing this approximate ensemble error function will be the objective of the mathematical programming formulation.

First, we record the misclassifications of each classifier on the training set in the error matrix $P$:

$$P_{ij} = \begin{cases} 0, & \text{if the } j\text{th classifier is correct on data point } i, \\ 1, & \text{otherwise.} \end{cases} \tag{1}$$

Let $G = P^T P$. Thus, the diagonal term $G_{ii}$ is the total number of errors made by classifier $i$, and the off-diagonal term $G_{ij}$ is the number of common errors of the classifier pair $i$ and $j$. To put all the elements of the $G$ matrix on the same scale, we normalize them by

$$\tilde{G}_{ii} = \frac{G_{ii}}{N}, \qquad \tilde{G}_{ij,\,i \neq j} = \frac{1}{2}\left(\frac{G_{ij}}{G_{ii}} + \frac{G_{ij}}{G_{jj}}\right), \tag{2}$$

where $N$ is the number of training points.
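As a concrete illustration of equations (1) and (2), here is a minimal Python sketch that builds $P$ and $\tilde{G}$ from a matrix of per-classifier predictions; the function and variable names are ours, and the guard against zero-error classifiers is an assumption the excerpt does not address:

```python
import numpy as np

def normalized_error_matrix(preds, y):
    """Equations (1)-(2): error matrix P and normalized matrix G-tilde.

    preds: (N, n) array, preds[i, j] = label predicted by classifier j on point i
    y:     (N,) array of true labels
    """
    P = (preds != y[:, None]).astype(float)  # P_ij = 1 iff classifier j errs on point i
    N, n = P.shape
    G = P.T @ P                              # G_ii: errors of i; G_ij: common errors of i and j
    G_tilde = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                G_tilde[i, i] = G[i, i] / N
            else:
                gii = max(G[i, i], 1.0)      # assumption: avoid dividing by zero
                gjj = max(G[j, j], 1.0)      # for classifiers that make no errors
                G_tilde[i, j] = 0.5 * (G[i, j] / gii + G[i, j] / gjj)
    return G_tilde
```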
… where $N_v$ is the number of the vertices in the graph. Roughly speaking, this optimization involves partitioning the vertices via the assignment of $y_i = 1$ or $y_i = -1$ to each vertex $i$ in the graph (subject to the size requirement) and minimizing the sum $\sum_{ij} w_{ij} y_i y_j$, where $w_{ij}$ is the edge weight between vertices $i$ and $j$. Notice that the interaction term $y_i y_j$ equals $-1$ when $i$ and $j$ are in different sets of the partition, which, in the context of the minimization described, contributes to the maximization of edges crossing the partition.

The MC-k problem is known to have a very good approximate solution algorithm based on semi-definite programming (SDP). The key point of the SDP approximation algorithm is to relax each binary variable $y_i \in \{-1,1\}$ into a unit vector. Therefore, if we are able to transform the ensemble pruning formulation so that it fits into the framework of MC-k, we may obtain a good solution for the ensemble pruning problem. The above MC-k formulation (4) is equivalent to

$$\min_{y} \; y^T W y \quad \text{s.t.} \quad \sum_i y_i = N_v - 2k, \quad y_i \in \{-1,1\}, \tag{5}$$

where $W$ is the edge-weight matrix, with $w_{i,i} = 0$. If we compare this formulation with that of the ensemble pruning problem, the only barrier that prevents the application of the SDP approximation algorithm to the ensemble pruning problem is the difference in the possible values of the binary variables. Specifically, in ensemble pruning, $x_i \in \{0,1\}$, and in MC-k, $y_i \in \{-1,1\}$. Therefore, we need to make a transformation of variables for the ensemble pruning problem. Let

$$x_i = \frac{v_i + 1}{2}, \tag{6}$$

with $v_i \in \{-1,1\}$. Now the objective function becomes

$$\frac{1}{4}\,(v + e)^T \tilde{G}\,(v + e), \tag{7}$$

where $e$ is a column vector of all 1s. The cardinality constraint $\sum_i x_i = k$ can be rewritten in quadratic form as

$$x^T I x = k, \tag{8}$$

where $I$ is the identity matrix. After the variable transformation, this constraint becomes

$$(v + e)^T I (v + e) = 4k. \tag{9}$$

A variable-expansion trick can be applied to put both the transformed objective function and the cardinality constraint back into a nice quadratic form. We expand the variable vector $v = (v_1, v_2, \ldots, v_n)$ into $v = (v_0, v_1, v_2, \ldots, v_n)$, and let $v_0 = 1$. We then construct a new matrix

$$H_{(n+1)\times(n+1)} = \begin{pmatrix} e^T \tilde{G} e & e^T \tilde{G} \\ \tilde{G} e & \tilde{G} \end{pmatrix}. \tag{10}$$

… MC-k instance; and (iii) transform back to a solution of our problem. Further, it is not difficult to see that the above three steps are equivalent to the more direct approach of relaxing (12) as an SDP and applying the randomization scheme of Feige and Langberg (2001) and Han, Ye, and Zhang (2002). In other words, it is not strictly necessary to convert to an instance of MC-k first. For completeness, we now return our attention to the SDP relaxation itself. Problem (12) is equivalent to

$$\min_{v} \; H \bullet vv^T \quad \text{s.t.} \quad D \bullet vv^T = 4k, \quad v_0 = 1, \quad v_i \in \{-1,1\} \;\; \forall i \neq 0, \tag{13}$$

where $A \bullet B = \sum_{i,j} A_{ij} B_{ij}$. To construct the relaxation, we first note that the constraint $v_0 = 1$ can be relaxed to $v_0 \in \{-1,1\}$ without changing the problem, since $-v$ is feasible for the remaining constraints if and only if $v$ is, and since $H \bullet vv^T = H \bullet (-v)(-v)^T$. Next, we rewrite the constraints $v_i \in \{-1,1\}$, $i = 0, 1, \ldots, n$, as the single, collective constraint $\mathrm{diag}(vv^T) = e$ to arrive at the following formulation:

$$\min_{v} \; H \bullet vv^T \quad \text{s.t.} \quad D \bullet vv^T = 4k, \quad \mathrm{diag}(vv^T) = e. \tag{14}$$

We next substitute $V = vv^T$, and note that $V$ can be expressed as $vv^T$ if and only if $V \succeq 0$ with $\mathrm{rank}(V) = 1$, which gives us

$$\min_{V} \; H \bullet V \quad \text{s.t.} \quad D \bullet V = 4k, \quad \mathrm{diag}(V) = e, \quad V \succeq 0, \quad \mathrm{rank}(V) = 1. \tag{15}$$

Although this problem is written in a different form, it is completely equivalent to our original 0-1 quadratic problem. The SDP relaxation is now obtained by dropping the rank constraint, which yields the following (convex) SDP:

$$\min_{V} \; H \bullet V \quad \text{s.t.} \quad D \bullet V = 4k, \quad \mathrm{diag}(V) = e, \quad V \succeq 0. \tag{16}$$

Now the original NP-hard problem (3) is relaxed into a convex SDP, which can be solved to any preset precision in polynomial time. We solve the SDP relaxation using the publicly available package SDPLR (Burer and Monteiro, 2003; SDPLR) and have implemented the approximation algorithm described in (Han et al., 2002).
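The relaxation (16) is small enough to state directly with an off-the-shelf modeling tool. The sketch below uses CVXPY (not the SDPLR package the authors use) and a Goemans-Williamson-style hyperplane rounding as a simplified stand-in for the Han-Ye-Zhang scheme; the matrix D is our reconstruction from constraint (9), since the page defining it is not reproduced above:

```python
import numpy as np
import cvxpy as cp

def sdp_prune(G_tilde, k, n_rounds=100, seed=0):
    """Solve relaxation (16) for V, then round to a size-k subset of classifiers."""
    n = G_tilde.shape[0]
    e = np.ones((n, 1))
    # Equation (10): expand G-tilde into the (n+1) x (n+1) matrix H.
    H = np.block([[e.T @ G_tilde @ e, e.T @ G_tilde],
                  [G_tilde @ e,       G_tilde      ]])
    # Assumed form of D: D . vv^T = n*v0^2 + 2*v0*e'v + v'v = (v+e)'(v+e) = 4k,
    # which matches constraint (9) after the variable expansion.
    D = np.block([[np.full((1, 1), float(n)), e.T      ],
                  [e,                         np.eye(n)]])
    # SDP relaxation (16): drop rank(V) = 1, keep V positive semidefinite.
    V = cp.Variable((n + 1, n + 1), symmetric=True)
    prob = cp.Problem(cp.Minimize(cp.trace(H @ V)),
                      [cp.trace(D @ V) == 4 * k, cp.diag(V) == 1, V >> 0])
    prob.solve()
    # Randomized hyperplane rounding, then projection to exactly k classifiers.
    L = np.linalg.cholesky(V.value + 1e-8 * np.eye(n + 1))
    rng = np.random.default_rng(seed)
    best_x, best_val = None, np.inf
    for _ in range(n_rounds):
        g = L @ rng.standard_normal(n + 1)
        scores = g[1:] * np.sign(g[0])       # orient so that v0 = +1
        x = np.zeros(n)
        x[np.argsort(-scores)[:k]] = 1.0     # keep the k best-aligned classifiers
        val = x @ G_tilde @ x                # approximate ensemble error x' G x
        if val < best_val:
            best_x, best_val = x, val
    return best_x.astype(int)
```

The final projection step enforces the cardinality constraint exactly, at the cost of deviating slightly from pure hyperplane rounding.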
  dataset      SDP-25         Div-25         Kappa-25        NoPruning
  autompg      10.35 (0.62)   13.88 (0.83)   11.91 (1.52)    10.71 (0.70)
  bupa         30.52 (1.43)   35.87 (1.25)   38.39 (2.65)    30.32 (0.43)
  cmc          32.82 (0.87)   41.70 (1.66)   43.66 (1.37)    34.50 (1.19)
  crx          13.88 (0.46)   22.40 (3.41)   21.78 (4.44)    13.58 (0.85)
  glass        12.53 (1.12)   17.30 (2.88)   16.39 (2.76)    11.29 (0.70)
  haberma      32.73 (2.21)   38.63 (1.30)   38.88 (3.09)    34.56 (1.74)
  heart        20.13 (2.19)   28.09 (2.73)   27.81 (4.09)    20.40 (2.39)
  hepatit      15.81 (1.17)   19.83 (2.66)   16.86 (2.00)    14.71 (1.84)
  housing      11.03 (0.61)   12.37 (0.91)   12.21 (1.59)    10.67 (0.66)
  ion           7.46 (2.17)   10.94 (5.13)   14.02 (10.61)    9.85 (7.72)
  iris         12.20 (9.47)   15.00 (7.84)   19.60 (8.44)    13.40 (11.01)
  pima         25.34 (0.67)   31.31 (3.78)   30.24 (2.90)    25.06 (0.78)
  sonar        21.43 (1.68)   26.24 (4.32)   25.24 (2.31)    18.96 (1.13)
  wdbc          3.38 (0.34)    3.51 (0.59)    3.76 (0.76)     2.88 (0.30)
  wine1-2       2.92 (0.64)    6.15 (3.17)   12.62 (18.45)   11.85 (18.82)
  wine1-3       1.11 (0.76)    1.27 (0.50)    2.58 (1.01)     1.31 (0.47)
  wine2-3       3.05 (0.75)    4.56 (2.50)    5.67 (6.27)     2.86 (3.54)
  wpbc         24.06 (2.51)   32.35 (1.88)   29.54 (3.52)    24.04 (2.79)
  vehicle1-2   41.15 (2.62)   42.32 (1.01)   45.17 (4.65)    41.40 (1.28)
  vehicle1-3    1.07 (0.42)    3.67 (3.79)    9.26 (12.11)    3.21 (3.24)
  vehicle1-4    4.52 (0.36)    6.47 (1.24)    6.18 (2.42)     3.84 (0.21)
  vehicle2-3    2.25 (0.55)    7.34 (4.01)   11.99 (7.06)     5.82 (4.36)
  vehicle2-4    5.00 (1.28)   10.33 (3.26)   13.57 (12.65)    5.96 (4.35)
  vehicle3-4    1.15 (0.11)    0.96 (0.41)    1.39 (1.63)     0.67 (0.39)

  Absolute W-L-T     23-1-0    24-0-0    12-12-0
  Significant W-L-T   9-0-15    8-0-16    2-0-22

Table 3: Comparison of SDP pruning, Diversity-based pruning, Kappa pruning and original ensembles, by % error and (standard deviation). A win-loss-tie summary (absolute, and at significance level) is attached at the bottom of the table. Note that simple majority voting is used to combine the predictions of the ensembles.

Prodromidis et al. (Prodromidis and Stolfo, 1998; Prodromidis and Chan, 2000) built a higher-level classifier (meta-classifier) to aggregate the output of the pruned ensembles. There has been other research in this direction (Wolpert, 1992; Bennett et al., 2000; Mason et al., 1999; Grove and Schuurmans, 1998). However, there is so far no strong evidence that such a meta-classifier is generally better than simple majority voting.

Table 3 shows that the performance of the SDP-based pruning is better than that of the other two algorithms for most of the data sets involved in the computational experiments. Also, although only a quarter of the classifiers are left, the error of the ensemble pruned by SDP-based pruning is statistically the same as that of the original ensemble. In addition, we may conclude that the SDP-based pruning is more stable in terms of accuracy, by looking at the error standard deviations of the three algorithms. The error fluctuation range of the other two algorithms is often much larger, which might explain why the SDP-based pruning is sometimes not statistically better.

  Size   Time (s)
   100         3
   200        36
   400       113
   800       713
  1500      4889
  2000     23255
  2500     42419
  3000     83825

Table 2: Computation time of SDP-based pruning

Figure 1: Timing test of SDP-based ensemble pruning
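For reference, the simple majority voting used to combine the (pruned) ensembles in the experiments above can be sketched in a few lines; the helper name is ours:

```python
import numpy as np

def majority_vote(preds):
    """Combine an ensemble's predictions by simple majority voting.

    preds: (N, n) array of class labels, one column per selected classifier.
    Returns the most frequent label in each row (ties go to the smaller label).
    """
    out = np.empty(preds.shape[0], dtype=preds.dtype)
    for i, row in enumerate(preds):
        labels, counts = np.unique(row, return_counts=True)
        out[i] = labels[np.argmax(counts)]
    return out
```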
We implemented the pruning algorithm in the following case study, where classifiers from different but closely related problem domains are pooled together and a subset of them is then selected for each problem domain. This case study involves a much larger data set than the UCI sets used in the above experiments, and an original ensemble with a larger number of more divergent classifiers. The results of the study verify the algorithm's performance in real-world applications, and show one type of situation where ensemble pruning can be particularly useful.

As we know, ensemble methods not only improve classification accuracy, but also provide a way to share knowledge. If the data for a problem domain are distributed at different locations, classifiers can be trained locally and then combined to create an ensemble. This centralized ensemble gathers information from each site and can potentially be more powerful for further predictions. Now the question is: if we are working with several different but closely related problem domains, is sharing classifiers among those domains still a good idea?

The essence of sharing classifiers is sharing common knowledge among different but closely related problem domains. A famous example of research in this direction is the multi-task neural network (Caruana, 1997). Each problem domain is assigned one or more output nodes, while sharing some of the input and mid-layer nodes with other problem domains. Although this method is sometimes successful, it has several limitations. First, it requires that the data be at a central location for training. Second, the training speed becomes a big issue if the size of the data is large. The classifier-sharing strategy may avoid both of these drawbacks. Since the classifiers can be trained locally, the data need not be centralized. Moreover, one can use efficient algorithms like decision trees to train classifiers, so computation time becomes less of a problem.

The prerequisite for sharing classifiers among different problem domains is that the data schema for each problem domain should be identical, or at least very similar. This ensures that the classifiers trained on one problem domain can also be applied to other problem domains, although the accuracy can possibly be low. Sometimes, this requires a transformation of variables to satisfy this condition. It is in fact hard to tell a priori whether sharing classifiers among different problem domains will improve the overall performance. The success of this strategy depends on the connections among the problem domains and the self-completeness of information within each problem domain. However, with a screening process (ensemble pruning) that will be described later, the chance that sharing classifiers will eventually downgrade the overall performance is minimal.

The cross-domain classifier-sharing strategy is tested on a publicly available marketing data set. To address the concern that sharing classifiers blindly may sometimes do harm to some of the problem domains if there exist conflicting elements among them, we apply the SDP-based pruning algorithm to select a good subset of classifiers from the entire ensemble for each problem domain.

The data set is a catalog marketing data set from the Direct Marketing Association (DMEF Academic Data Set Three, Specialty Catalog Company, Code 03DMEF). The dependent variables are the customers' responses to a particular promotion held by a catalog company. There are 19 different categories of products involved in the promotion, which correspond to 19 response variables. Unfortunately, the identities of the product categories are not available. The data mining problem we try to solve is to predict which categories …

… response rate. If these classifiers can be included in the ensembles of those categories without enough positive points, they may help hit the targets and improve the overall performance. Second, the purchase patterns of some different categories may be similar. For example, people are more likely to buy clothes when there are discount coupons. This may also be true for shoes. Therefore, a marketing model for clothes that stresses the importance of discount coupons may also work for shoes, although they belong to different product categories.

A naive way of sharing classifiers is to pool all the classifiers from the 19 categories into one big ensemble and use it for every category. However, there are risks behind this sharing-all strategy. Specifically, when there are strikingly conflicting concepts among the problem domains, mixing classifiers blindly will degrade the effectiveness of the ensemble method. Therefore, it is safer to bring in the SDP-based screening process, which is able to select a subset of classifiers for each problem domain.

Unlike the experiments on the UCI data sets, here we use a held-out tuning set for each subset selection process, since there is a large amount of data. Therefore, the $P$ matrix in (1) is constructed based on classification results on the tuning set instead of the training set. Each tuning set reflects the original class distribution. Note that there are 19 categories in the marketing data, so there will be 19 tuning sets, and the subset selection process will be repeated 19 times, once for each category.

There is still one more problem with the current setup of the $\tilde{G}$ matrix. Since the tuning data sets here reflect the original class distributions, which are highly biased towards non-buyers, the resulting $\tilde{G}$ matrix will reflect the performance of the ensemble on the negative points while almost ignoring the influence of the positive points. It is necessary to balance the influence of the positive points and the negative points on $\tilde{G}$. To achieve this goal, we define a third matrix $\hat{G}$ as a convex combination of the $\tilde{G}$ matrix on the positive points and the $\tilde{G}$ matrix on the negative points in the tuning set,

$$\hat{G} = \lambda \tilde{G}_{pos} + (1 - \lambda)\,\tilde{G}_{neg}.$$

Note that $\tilde{G}_{pos}$ is computed based only on the positive points in the tuning set and $\tilde{G}_{neg}$ only on the negative points, using formulas (1) and (2). In the following computational experiments, $\lambda$ is set to 0.5.
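A short sketch of the class-balanced matrix $\hat{G}$, reusing the normalized_error_matrix helper from the earlier sketch; splitting the tuning set by response and mixing with $\lambda = 0.5$ follows the convex combination above:

```python
import numpy as np

def balanced_G(preds_tune, y_tune, lam=0.5):
    """G-hat = lam * G-tilde(positives) + (1 - lam) * G-tilde(negatives)."""
    pos = (y_tune == 1)                      # buyers
    G_pos = normalized_error_matrix(preds_tune[pos], y_tune[pos])
    G_neg = normalized_error_matrix(preds_tune[~pos], y_tune[~pos])
    return lam * G_pos + (1.0 - lam) * G_neg
```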
The original data schema for each category is identical: the same independent variables (purchase history, promotions and demographic information) and a binary dependent variable indicating whether the customer buys products from this category. However, we found that some of the independent variables are category-specific. For example, there are 19 variables, each representing whether there is a discount for each category. Intuitively, the discount variable for Category $i$ is more informative to the prediction problem for Category $i$ than for other categories. There is likely a split on the discount variable for Category $i$ in the decision trees for Category $i$. Since this discount variable is probably (though not certainly) not relevant to other categories, the decision trees induced on Category $i$ are less likely to be useful for other categories. To make the decision trees more interchangeable among different categories, we did some transformations on those category-specific variables. In the data set for Category $i$, a copy of each category-specific variable related to Category $i$ is appended to the end of the data schema and labeled as "xxxx-for-this-category". The values of the original category-specific variables for this Category $i$ (which already have …

Figure 2: % difference of AUCs of the selective-sharing ensembles and the original bagging ensembles on 19 categories, $(\mathrm{AUC}_{sel} - \mathrm{AUC}_{orig})/\mathrm{AUC}_{orig}$

The results show that selective sharing of classifiers does improve the AUCs for almost half of the categories. Especially for those categories without enough positive information, the rise in AUC is substantial (Figure 2). The overall performance of the ensembles produced by selective sharing is almost as good as that of the naive-sharing ensemble. For Categories 15, 16 and 19, it is even statistically better. Note that the selected ensembles use only 25 classifiers each, as compared to 475 for the naive-sharing ensemble. There are two implications of this result. First, the ensemble pruning process is quite effective. Second, the reason that the including-all ensemble is better than the individual bagging ensembles is not simply that it is larger. There is truly useful additional information in the including-all ensemble that can be singled out by the pruning process. Moreover, selective sharing is a conservative version of sharing classifiers, hence it should be more robust. If there were wildly conflicting concepts among the categories, the selective sharing would have performed better. In fact, the selective-sharing ensembles of Categories 15 and 16 do outperform the including-all ensemble by throwing away bad classifiers, as expected.

It is also interesting to look at the classifier-sharing structure among different categories. Table 5 provides a list of statistics that summarize the sharing structure, averaged over the same five runs as in Table 4. The "Prior" column shows the response rate for each category. The "Used" column is the total number of classifiers trained on this category used for all categories. The "Used Own" column is the total number of classifiers trained on this category used for this category. The "Most Used" column shows the most common training category for the classifiers in each ensemble. Finally, the HI index is the Herfindahl index (Rose and Engel, 2002), which is computed by the following formula:

$$HI_i = \sum_{j=1}^{C} \left(\frac{n_{ij}}{N_e}\right)^2,$$

where $n_{ij}$ is the number of classifiers trained on the $j$th category used for the $i$th category, and $N_e$ is the size of the pruned ensemble, which is 25 in this case. The smaller the HI index, the more diverse the ensemble, in terms of original training categories.

  Category   Prior (%)   Used   Used Own   Most Used   HI index
   1           0.14         2        1        13          0.16
   2           0.27         0        0        14          0.27
   3           0.04         0        0         6          0.16
   4           0.92        12        1        17          0.25
   5           0.12         1        0        17          0.27
   6           0.83        17        0        14          0.35
   7           0.44         9        0        14          0.35
   8           0.37        14        1        14          0.67
   9           0.64        22        3        14          0.29
  10           0.02         1        0        19          0.33
  11           0.13         3        0        19          0.57
  12           0.05         2        0        19          0.47
  13           0.44        18        0        19          0.67
  14           2.65        80        7        14          0.30
  15           1.70        11        0        19          0.33
  16           2.09        14        1        19          0.35
  17           0.65        94        9        19          0.34
  18           0.31        24        2        19          0.49
  19           1.17       150       18        19          0.63

Table 5: Statistics of classifier sharing structure
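The Herfindahl index above reduces to a one-liner; for instance, an ensemble drawing all 25 classifiers from a single category gives $HI = 1$, while spreading them as evenly as possible over 19 categories gives roughly 0.06 (the helper name is ours):

```python
import numpy as np

def herfindahl_index(train_categories, ensemble_size=25, n_categories=19):
    """HI_i = sum_j (n_ij / N_e)^2 for one pruned ensemble.

    train_categories: training category (1..C) of each selected classifier;
    there should be ensemble_size of them (25 in the case study).
    """
    counts = np.bincount(np.asarray(train_categories),
                         minlength=n_categories + 1)[1:]
    return float(np.sum((counts / ensemble_size) ** 2))
```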
From our observation, the classifier-sharing structure is reasonably stable for most of the categories over the five runs. For instance, classifiers from Categories 14, 17 and 19 are always the top three used classifiers in the selected ensembles, the "Most Used" column seldom varies among different runs, etc.

There are a few things that one can tell after studying these numbers. The most surprising fact shown by the table is that for most categories, the classifiers trained on the category are seldom chosen for its own ensemble. Only Categories 14, 17 and 19 use a reasonable number of their own classifiers. However, the performance of the pruned ensembles is close to that of the original bagging ensembles, as shown in Table 4. This may imply that some of the categories are closely related, and therefore their classifiers are highly interchangeable through combination. The higher the original response rate for each category, the more times its classifiers are used by other categories. Figure 3 plots the number of times its classifiers are used …

Figure 3: Response rate of each category vs. total number of times its classifiers are used in the selected ensembles

… the HI index will be low, which means that the pruned ensemble picks classifiers from many categories. On the other hand, if a category needs specialized knowledge for its prediction, the HI index will be higher, which implies that the pruned ensemble picks classifiers only from a small number of categories that share some particular properties with the category in question. Usually, for those categories with a small response rate, the HI index is low, for example Category 1 and Category 3. Categories 8 and 13 have the highest HI index. From the "Most Used" column, we know that Category 8 used classifiers mostly from Category 14. So it can be inferred that Category 8 and Category 14 are closely related. For the same reason, the customers of Category 13 may share some special properties with those of Category 19. These pieces of information might make more sense if we knew what these categories were. Unfortunately, this knowledge is currently unavailable.

The information gained by studying the classifier-sharing structure may help improve the mailing and promotion strategy of the catalog company. For instance, customers that buy products from Categories 14, 17 and 19 seem to capture the main trend of the customer base, so they are likely the core customers of the company and may need extra attention. Customers of Categories 14 and 8 seem to be similar from some perspective. Therefore, a promotion strategy that proves to be successful for Category 14 may also work well for Category 8.

5. Conclusion and Future Work

This paper introduced a new ensemble pruning method to improve the efficiency and effectiveness of an existing ensemble. Unlike previous heuristic approaches, we formulate the ensemble pruning problem as a strict mathematical programming problem and apply SDP relaxation techniques to obtain a good approximate solution. The computational experiments on the UCI repository data sets show that this SDP-based pruning algorithm performs better than two other metric-based pruning algorithms. Its application in a classifier-sharing case study also indicates that this subset selection procedure is effective in picking classifiers that fit the needs of different problem domains. Besides the peer-to-peer email filtering problem mentioned before, this method can also be useful when a company is trying to promote a new product. Usually, there is only limited information about a new product's potential customers. However, if the company has good marketing models for its old products, especially those closely related to the new product, selecting some of the old models based on the limited data of the new product may be a better solution than building models directly.

There is still room for improvement in the current version of the algorithm. For example, there are several parameters in the model that can be fine-tuned, such as the method of normalization and the relative weight between the diagonal terms and off-diagonal terms. Another thing worth exploring is whether there exists a nice form of the objective function such that a "real" optimal subset can be found without enforcing the cardinality constraint. Under the current setup, removing the cardinality constraint will result in a trivial solution.
U. Feige and M. Langberg. Approximation algorithms for maximization problems arising in graph partitioning. Journal of Algorithms, 41:174–211, 2001.

J. L. Fleiss. Statistical Methods for Rates and Proportions. John Wiley & Sons, 1981.

Y. Freund and R. E. Schapire. Experiments with a new boosting algorithm. In International Conference on Machine Learning, pages 148–156, 1996. URL citeseer.nj.nec.com/freund96experiments.html.

M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42:1115–1145, 1995.

A. J. Grove and D. Schuurmans. Boosting in the limit: Maximizing the margin of learned ensembles. In AAAI/IAAI, pages 692–699, 1998.

Q. Han, Y. Ye, and J. Zhang. An improved rounding method and semidefinite programming relaxation for graph partition. Mathematical Programming, pages 509–535, 2002.

L. K. Hansen and P. Salamon. Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10):993–1001, 1990. ISSN 0162-8828.

S. Hashem. Optimal linear combinations of neural networks. Neural Networks, 10(4):599–614, 1997. ISSN 0893-6080.

T. K. Ho. The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8):832–844, 1998.

Y. Kim, N. W. Street, and F. Menczer. Meta-evolutionary ensembles. In IEEE International Joint Conference on Neural Networks, pages 2791–2796, 2002.

A. Krogh and J. Vedelsby. Neural network ensembles, cross validation, and active learning. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems, volume 7, pages 231–238. The MIT Press, 1995.

G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5:27–72, 2004.

D. D. Margineantu and T. G. Dietterich. Pruning adaptive boosting. In 14th International Conference on Machine Learning, pages 211–218, 1997.

L. Mason, P. Bartlett, and J. Baxter. Direct optimization of margins improves generalization in combined classifiers. Advances in Neural Information Processing Systems, 11:288–294, 1999.

A. McCallum. Multi-label text classification with a mixture model trained by EM. In AAAI'99 Workshop on Text Learning, 1999.

D. Opitz and R. Maclin. Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research, pages 169–198, 1999.