Journal of Machine Learning Research 10 (2009) 2273-2293. Submitted 11/08; Revised 2/09; Published 10/09.

Learning Nondeterministic Classifiers

Juan José del Coz (JUANJO@AIC.UNIOVI.ES)
Jorge Díez (JDIEZ@AIC.UNIOVI.ES)
Antonio Bahamonde (ANTONIO@AIC.UNIOVI.ES)
Artificial Intelligence Center, University of Oviedo at Gijón, Asturias, Spain

Editor: Lyle Ungar

Abstract

Nondeterministic classifiers are defined as those allowed to predict more than one class for some entries from an input space. Given that the true class should be included in predictions and that the number of classes predicted should be as small as possible, these kinds of classifiers can be considered as Information Retrieval (IR) procedures. In this paper, we propose a family of IR loss functions to measure the performance of nondeterministic learners. After discussing such measures, we derive an algorithm for learning optimal nondeterministic hypotheses. Given an entry from the input space, the algorithm requires the posterior probabilities to compute the subset of classes with the lowest expected loss. From a general point of view, nondeterministic classifiers provide an improvement in the proportion of predictions that include the true class compared to their deterministic counterparts; the price to be paid for this increase is usually a tiny proportion of predictions with more than one class. The paper includes an extensive experimental study using three deterministic learners to estimate posterior probabilities: a multiclass Support Vector Machine (SVM), a Logistic Regression, and a Naïve Bayes. The data sets considered comprise both UCI multi-class learning tasks and microarray expressions of different kinds of cancer. We successfully compare nondeterministic classifiers with other alternative approaches. Additionally, we shall see how the quality of posterior probabilities (measured by the Brier score) determines the goodness of nondeterministic predictions.

Keywords: nondeterministic classification, multiclassification, reject option, multi-label classification, posterior probabilities

1. Introduction

There are several learners that successfully solve classification tasks in which the number of classes is higher than two; see for instance Wu et al. (2004) and Lin et al. (2008). However, for each class C, most classification errors frequently occur between small subsets of classes that are somehow similar to C, regardless of the approach used. This fact suggests that multiclass classifiers would increase in reliability if they were allowed to express their doubts whenever they were asked to classify some entries.

In this paper we explore how to learn classifiers with multiple outcomes, like nondeterministic automata; we shall call them nondeterministic classifiers. Since they return a set of values, these classifiers could also be called set-valued classifiers. To fix ideas, let us consider a screening for a set of medical diseases (or other diagnostic situations); for some inputs, a nondeterministic classifier would be able to predict not just one single disease, but a set of options. These multiple predictions will be provided to domain experts when the classifier is not sure enough to give a unique class. Thus nondeterministic predictions may discard some options and allow domain experts to make practical decisions. Even when the nondeterministic classifier returns most of the available classes for the representation of an entry, we can read that the learned hypothesis is acknowledging its ignorance about how to deal with that entry.

It is evident that nondeterministic classifiers will include true classes in their predictions more frequently than deterministic hypotheses: the latter only have one possibility to be right. In this sense, nondeterministic predictions are backed by greater reliability. To be useful, however, nondeterministic classifiers should not only predict a set of classes containing the correct or true one, but their prediction sets should also be as small as possible. Notice that these requirements are common in algorithms designed for Information Retrieval. In this case, the queries are the entries to be classified, and Recall and Precision are then applied to each prediction. Hence, the loss functions for nondeterministic classifiers can be built as combinations of IR measures, as F_β functions are.

Starting from the distribution of posterior probabilities of classes, given one entry, we present an algorithm that computes the subset of classes with the lowest expected loss. In the experiments reported at the end of the paper, we employed three deterministic learners that provide posterior probabilities: Support Vector Machines (SVM), Logistic Regression (LR), and Naïve Bayes (NB). We successfully compared the achievements of our nondeterministic classifiers with those obtained by other alternative approaches.

The paper is organized as follows. In the next section, we present an overview of related work on classifiers that return subsets of classes instead of a single class. The formal settings both for nondeterministic classifiers and their loss functions are presented in the third section. After that, in Section 4, we derive an algorithm to learn nondeterministic hypotheses. Then, we conclude the paper with a section in which we report an experimental study of their performance. In addition to the comparison mentioned above, we discuss the role played by the deterministic learner that provides posterior probabilities. We see that the quality of posterior probabilities determines the goodness of nondeterministic predictions. The data sets used are publicly available and, in addition to a group of data sets from the UCI Repository (Asuncion and Newman, 2007), they include a group of classification tasks of cancer patients from gene expressions captured by microarrays.

2. Related Work

Nondeterministic classifiers are somehow related to classifiers with reject option (Chow, 1970). In this approach, the entries that are likely to be misclassified are rejected: they are not classified and can be handled by more sophisticated procedures, a manual classification, for instance. The core assumption is that the cost of making a wrong decision is 1, while the cost of using the reject option is given by some d, 0 < d < 1. In this context, provided that posterior probabilities are exactly known, an optimal rejection rule can be devised (Chow, 1970; Bartlett and Wegkamp, 2008): an entry is rejected if the maximum posterior probability is less than a threshold. Notice that classifiers with reject option are a relaxed version of nondeterministic classifiers. Rejection is a nondeterministic classification that includes the complete set of classes. On the other hand, instead of avoiding difficult classifications, for each entry, nondeterministic classifiers venture a set of possible classes, not necessarily the complete set.
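The optimal rejection rule just mentioned can be sketched in a few lines. This is an illustrative reading of Chow's rule, not code from the paper: it assumes exact posteriors, an error cost of 1, and a rejection cost d, so rejecting is preferable exactly when the top posterior falls below 1 − d; the function name is ours.

```python
def classify_with_reject(posteriors, d):
    """Chow-style reject rule.

    posteriors: list of Pr(C_j | x) values for one entry.
    d: cost of using the reject option (a wrong decision costs 1).
    Returns the index of the predicted class, or None for "reject".
    """
    best = max(range(len(posteriors)), key=lambda j: posteriors[j])
    # Expected cost of predicting `best` is 1 - posteriors[best];
    # rejecting costs d, so predict only when 1 - p_best <= d.
    return best if posteriors[best] >= 1.0 - d else None

# A confident entry is classified; an ambiguous one is rejected.
print(classify_with_reject([0.9, 0.1], d=0.2))    # -> 0
print(classify_with_reject([0.55, 0.45], d=0.2))  # -> None
```

Note that the rejected entry here would receive the complete set of classes from a nondeterministic classifier, which is the sense in which rejection is the extreme case of nondeterminism.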
However, predictors of more than one class are not completely new. Given an ε ∈ [0, 1], the so-called confidence machines make conformal predictions (Shafer and Vovk, 2008): they produce a set of labels containing the true class with a probability greater than 1 − ε.

To the best of our knowledge, the work most directly related to the approach presented in this paper is that of Zaffalon (2002) and Corani and Zaffalon (2008a,b). In these papers, the authors describe the Naïve Credal Classifier, a set-valued classifier which is an extension of the Naïve Bayes classifier to imprecise probabilities. The Naïve Credal Classifier models prior ignorance about the distribution of classes by means of a set of prior densities (also called the prior credal set), which is turned into a set of posterior probabilities by element-wise application of Bayes' rule. The classifier returns all the classes that are non-dominated by any other class according to the posterior credal set.

Another learning task that is related to this paper is multi-label classification. However, training instances in multi-label tasks can belong to more than one class, while nondeterministic training sets are the same as those of standard classification. In Tsoumakas and Katakis (2007), the authors provide an in-depth description of multi-label classification, enumerate several methods, and compare their performance using Information Retrieval measures. Some applications have likewise arisen within the context of hierarchical organization of biological objects: predicting gene functions (Clare and King, 2003), or mapping biological entities into ontologies (Kriegel et al., 2004).

The formal setting presented in this paper was previously introduced in Alonso et al. (2008). There, we dealt with an interesting application of nondeterministic classifiers, in which classes (or ranks, in that context) are linearly ordered. The aim was to predict the rank (on an ordered scale) of carcasses of beef cattle. This value determines, on the one hand, the prices to be obtained by carcasses and, on the other, the genetic value of animals in order to select studs for the next generation. In this application, nondeterministic classifiers return an interval of ranks. Interval predictions are useful even when the intervals comprise more than one rank. For instance, it is possible to reject an animal as a stud for the next generation when a prediction interval is included in the lowest part of the scale. However, if we need a unique rank, we may decide to appeal to an actual expert to resolve the ambiguity, an expensive classification procedure not always available in practice.

The novelty of this paper is that we now deal with a standard classification setting; that is, the sets of classes are not ordered. This fact is very important, as the search for the optimal prediction leads to a dramatic difference in complexity. Thus, if k is the number of classes, the search in the ordinal case is just of order k², while in the unordered case, at first glance, the search is of order 2^k. However, the Theorem of Correctness of Algorithm 1 proves that this search can be accomplished in polynomial time. Additionally, this paper reports an extensive experimental study. First, we test whether nondeterministic classifiers outperform Naïve Credal Classifiers and other alternative approaches. We then investigate the role played by the ingredients of nondeterministic classifiers.

3. Formal Presentation and Notation

Let X be an input space and Y = {C_1, ..., C_k} a finite set of classes. We consider a multiclassification task given by a training set S = {(x_1, y_1), ..., (x_n, y_n)} drawn from an unknown distribution Pr(X, Y) on the product X × Y. Within this context, we define
Definition 1: A nondeterministic hypothesis is a function h from the input space to the set of non-empty subsets of Y; in symbols, if P(Y) is the set of all subsets of Y,

    h : X → P(Y) \ {∅}.

The aim of such a learning task is to find a nondeterministic hypothesis h from a space H that optimizes the expected prediction performance (or risk) on samples S' independently and identically distributed (i.i.d.) according to the distribution Pr(X, Y):

    R(h) = ∫ Δ(h(x), y) d Pr(x, y),

where Δ(h(x), y) is a loss function that measures the penalty due to the prediction h(x) when the true value is y.

In nondeterministic classification, we would like to favor those decisions of h that contain the true classes, and a smaller rather than a larger number of classes. In other words, we interpret the output h(x) as an imprecise answer to a query about the right class of an entry x ∈ X. Thus, nondeterministic classification can be seen as a kind of Information Retrieval task for each entry.

Performance in Information Retrieval is compared using different measures in order to consider different perspectives. The most frequently used measures are Recall (the proportion of all relevant documents that are found by a search) and Precision (the proportion of retrieved documents that are relevant). The harmonic average of the two amounts is used to capture the goodness of a hypothesis in a single measure. In the weighted case, the measure is called F_β. The idea is to measure a trade-off between Recall and Precision.

For further reference, let us recall the formal definitions of these Information Retrieval measures. Thus, for a prediction of a nondeterministic hypothesis h(x) with x ∈ X, and a class y ∈ Y, we can compute the following contingency matrix, where z ∈ Y:

                 y = z    y ≠ z
    z ∈ h(x)       a        b
    z ∉ h(x)       c        d        (1)

in which each entry (a, b, c, d) is the number of times that the corresponding combination of memberships occurs. Notice that a can only be 1 or 0, depending on whether the class y is included in the prediction h(x) or not; b is the number of classes different from y included in h(x); c = 1 − a; and d is the number of classes different from y that are not included in h(x).

According to the matrix, Equation (1), if h is a nondeterministic hypothesis and (x, y) ∈ X × Y, we thus have the following definitions.

Definition 2: The Recall in a query (i.e., an entry x) is defined as the proportion of relevant classes (y) included in h(x):

    R(h(x), y) = a / (a + c) = a = [y ∈ h(x)].

Definition 3: The Precision is defined as the proportion of retrieved classes in h(x) that are relevant (y):

    P(h(x), y) = a / (a + b) = [y ∈ h(x)] / |h(x)|.

    h(x)       Precision   Recall   F1     F2
    [1,2,3]    0.33        1        0.50   0.71
    [1,2]      0.50        1        0.67   0.83
    [1]        1           1        1      1
    [2,3,4]    0           0        0      0

Table 1: The Precision, Recall, F1, and F2 for different predictions of a nondeterministic classifier h for an entry x with class 1 (y = 1).

In other words, given a hypothesis h, the Precision for an entry x, that is, P(h(x), y), is the probability of finding the true class (y) of the entry (x) by randomly choosing one of the classes of h(x).

Finally, the trade-off is formalized by

Definition 4: The F_β is defined, in general, by

    F_β(h(x), y) = (1 + β²) P R / (β² P + R) = (1 + β²) a / ((1 + β²) a + b + β² c).   (2)

Thus, for a nondeterministic classifier h and a pair (x, y),

    F_β(h(x), y) = (1 + β²) / (β² + |h(x)|)  if y ∈ h(x);  0 otherwise.   (3)

The most frequently used F-measure is F1. For ease of reference, let us state that

    F1(h(x), y) = 2 [y ∈ h(x)] / (1 + |h(x)|).

Notice that for deterministic classifiers, the accuracy is equal to Recall, Precision, and F_β, given that |h(x)| = 1.

To illustrate the use of the F-measures on an entry, let us consider an example. If we assume that the true class of an entry x is 1 (y = 1), then, depending on the value of h(x), Table 1 reports the Recall, Precision, F1, and F2. We observe that the reward attached to a prediction containing the true class with one extra class ranges from 0.667 for F1 to 0.833 for F2, whereas the amounts are lower when the prediction includes 2 extra classes.

Once we have the definition of F_β for individual entries, it is straightforward to extend it to a test set. Hence, when S' is a test set of size n, the average loss on it will be computed by

    R_ND(h, S') = (1/n) Σ_{j=1}^{n} Δ_ND(h(x'_j), y'_j)
                = (1/n) Σ_{j=1}^{n} (1 − F_β(h(x'_j), y'_j))                          (4)
                = (1/n) Σ_{j=1}^{n} (1 − (1 + β²)/(β² + |h(x'_j)|) · [y'_j ∈ h(x'_j)]).

The average Recall and Precision can be similarly defined. For ease of reference, let us remark that the Recall is the proportion of times that h(x') includes y', and is thus a generalization of the deterministic accuracy.

[Figure 1 omitted.] Figure 1: Conditional probabilities of class +1 given the discriminant value (horizontal axis) of entries x ∈ X. Vertical bars separate the region where both classes {−1, +1} have a probability of over 1/3.

3.1 Nondeterministic Classification in a Binary Task

To complete this section, let us show what nondeterministic classifiers look like in the simplest case, which will be further developed in the following sections. Let us assume that in a binary classification task (the classes are codified by −1 and +1) we have a loss of 1 for each false classification. On the other hand, we are allowed to predict both classes, in which case the loss will be 1/3: the F1 for a classification of 2 classes containing the true one; see Table 1. The extension for dealing with F_β, with β ≠ 1, is straightforward.

The optimum classifier will return only one class when it is sufficiently sure. In doubtful situations, however, the nondeterministic classifier should opt for predicting the 2 classes. This will be the case whenever the probability of error for both classes is higher than 1/3, since this is the loss for predictions of two classes; see Figure 1. Therefore, if we have the conditional probabilities of classes given the entries, the optimum classifier will be given by

    h_ND(x) = {−1}        if h(x) ≤ 1/3,
              {−1, +1}    if 1/3 < h(x) < 2/3,                                        (5)
              {+1}        if 2/3 ≤ h(x),

where we represent by h(x) the posterior probability h(x) = Pr(class = +1 | x). Notice that Equation (5) is equivalent to the generalized Bayes discriminant function described in Bartlett and Wegkamp (2008) when the cost of using the reject option is calculated using the F1 loss function.

Algorithm 1: The nondeterministic classifier nd, an algorithm for computing the prediction with one or more classes for an entry x, provided that the posterior probabilities of classes are given.

    Input: classes {C_j : j = 1, ..., k} sorted by Pr(C_j | x) in decreasing order
    Input: β, the trade-off between Recall and Precision
    Initialize i = 0, Δ_0 = 1
    repeat
        i = i + 1
        Δ_i = 1 − (1 + β²)/(β² + i) · Σ_{j=1}^{i} Pr(C_j | x)
    until (i == k) or (Δ_{i−1} ≤ Δ_i)
    if (Δ_{i−1} ≤ Δ_i) then
        return {C_j : j = 1, ..., i − 1}
    else
        return {C_j : j = 1, ..., k}
    end if
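Algorithm 1 can be sketched in Python as follows. This is an illustrative reading, not the authors' code; the names are ours, and ties between consecutive losses are broken in favor of the smaller prediction, as in the pseudocode.

```python
def nd_classify(posteriors, beta=1.0):
    """Return the class indices predicted by Algorithm 1.

    posteriors: list of Pr(C_j | x) values summing to 1.
    The candidates are the prefixes of the classes sorted by decreasing
    probability; delta is the expected 1 - F_beta loss of the top-i set.
    """
    order = sorted(range(len(posteriors)), key=lambda j: -posteriors[j])
    b2 = beta ** 2
    cumulative = 0.0
    previous = 1.0                     # Delta_0 = 1
    for i, j in enumerate(order, start=1):
        cumulative += posteriors[j]
        delta = 1.0 - (1.0 + b2) / (b2 + i) * cumulative
        if previous <= delta:          # loss stopped decreasing:
            return order[:i - 1]       # the previous prefix is optimal
        previous = delta
    return order                       # all k classes

# Binary case, matching Equation (5): a confident entry gets one class,
# a doubtful one (posterior strictly between 1/3 and 2/3) gets both.
print(nd_classify([0.8, 0.2]))   # -> [0]
print(nd_classify([0.6, 0.4]))   # -> [0, 1]
```

Note that the loop touches each class once, so the prediction is found in time linear in k once the posteriors are sorted.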
4. Nondeterministic Classification Using Multiclass Posterior Probabilities

In the general multiclass setting presented at the beginning of Section 3, let x be an entry of the input space, and let us now assume that we know the conditional probabilities of the classes given the entry, Pr(C_j | x). Additionally, we shall assume that the classes are ordered according to these probabilities. In this context, we wish to define the h(x) = Z ⊆ {C_1, ..., C_k} that minimizes the risk R(h) defined in Section 3 when we use the nondeterministic loss given by F_β (Equations 2, 3, and 4). We shall prove that such an h(x) can be computed by Algorithm 1, which does not need to search through all non-empty subsets of Y.

Theorem 1 (Correctness): If the conditional probabilities Pr(C_j | x) are known, Algorithm 1 returns the nondeterministic prediction for h(x) that minimizes the risk given by the loss 1 − F_β.

Proof: To minimize the risk, it suffices to compute

    Δ_x(Z) = Σ_{y ∈ Y} Δ_ND(Z, y) Pr(y | x),   (6)

with Z ⊆ {C_1, ..., C_k}. Then, we only have to define

    h(x) = argmin { Δ_x(Z) : Z ⊆ {C_1, ..., C_k} }.

The proof has two parts. First, we shall see that if h(x) has r classes, then those are the r classes with the highest probabilities; bearing in mind that the classes are ordered, h(x) = Z_r = {C_j : j = 1, ..., r}. For this purpose, we need to see that any other subset of r classes will increase the loss with respect to Z_r. This is a consequence of the following. The value of Equation (6) for Z_r is Δ_r in Algorithm 1. In fact, with the complementary probability of Σ_{j=1}^{r} Pr(C_j | x), we expect a loss of 1: the true class will not be one of the r first classes. On the other hand, with this sum of probabilities, the true class will be in h(x), and therefore the loss will be 1 minus the F_β of the prediction h(x) = {C_j : j = 1, ..., r}:

    Δ_x({C_j : j = 1, ..., r}) = (1 − Σ_{j=1}^{r} Pr(C_j | x)) + Σ_{j=1}^{r} Pr(C_j | x) · (1 − (1 + β²)/(β² + r))
                               = 1 − (1 + β²)/(β² + r) · Σ_{j=1}^{r} Pr(C_j | x) = Δ_r.

Notice that for any other subset of r classes, we would obtain a similar expression simply by modifying the set of posterior probabilities in the last sum. Therefore, to minimize the value of Equation (6) with r classes, we need those with the highest probability.

In the second step, we only have to show that the index r returned by the Algorithm is the right one. We shall see that the search for the best r can be accomplished in linear time, as in the Algorithm. In fact, we shall establish that once the Algorithm reaches a number of classes at which the loss increases, adding further classes will only increase the loss. In symbols, we shall prove that

    Δ_r ≤ Δ_{r+1}  ⟹  Δ_{r+1} ≤ Δ_{r+2}.

To do so, we shall next express the exit condition of the loop, Δ_r ≤ Δ_{r+1} with (r + 1) ≤ k, in a different way. The following expressions are equivalent:

    Δ_r ≤ Δ_{r+1}                                                                     (7)
    1 − (1 + β²)/(β² + r) · Σ_{j=1}^{r} Pr(C_j | x) ≤ 1 − (1 + β²)/(β² + r + 1) · Σ_{j=1}^{r+1} Pr(C_j | x)
    (β² + r + 1) Σ_{j=1}^{r} Pr(C_j | x) ≥ (β² + r) Σ_{j=1}^{r+1} Pr(C_j | x)
    Σ_{j=1}^{r} Pr(C_j | x) ≥ (β² + r) Pr(C_{r+1} | x).

Therefore, if Δ_r ≤ Δ_{r+1} and (r + 1) ≤ k, then

    Pr(C_{r+1} | x) + Σ_{j=1}^{r} Pr(C_j | x) ≥ (β² + r) Pr(C_{r+1} | x) + Pr(C_{r+1} | x).

However, bearing in mind that the classes are ordered, we have that Pr(C_{r+1} | x) ≥ Pr(C_{r+2} | x), and using Equation (7), we conclude that

    Σ_{j=1}^{r+1} Pr(C_j | x) ≥ (β² + r + 1) Pr(C_{r+2} | x)  ⟺  Δ_{r+1} ≤ Δ_{r+2}.  ∎
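The theorem can also be checked numerically: a brute-force search over all 2^k − 1 non-empty subsets attains the same minimal expected loss Δ_x(Z) as the best prefix of the probability-sorted classes. The sketch below is ours, for illustration under the assumption that the posteriors are exact.

```python
from itertools import combinations

def expected_loss(subset, posteriors, beta=1.0):
    """Delta_x(Z): expected 1 - F_beta loss of predicting `subset`."""
    b2 = beta ** 2
    reward = (1.0 + b2) / (b2 + len(subset))  # F_beta when the true class is in Z
    return sum(p * (1.0 - (reward if j in subset else 0.0))
               for j, p in enumerate(posteriors))

def min_loss_brute_force(posteriors, beta=1.0):
    """Minimal expected loss over every non-empty subset of classes."""
    k = len(posteriors)
    subsets = (set(c) for r in range(1, k + 1)
               for c in combinations(range(k), r))
    return min(expected_loss(z, posteriors, beta) for z in subsets)

def min_loss_prefix(posteriors, beta=1.0):
    """Minimal expected loss restricted to prefixes of the sorted classes."""
    order = sorted(range(len(posteriors)), key=lambda j: -posteriors[j])
    return min(expected_loss(set(order[:r]), posteriors, beta)
               for r in range(1, len(posteriors) + 1))

p = [0.5, 0.2, 0.15, 0.1, 0.05]
assert abs(min_loss_brute_force(p) - min_loss_prefix(p)) < 1e-12
```

The assertion compares 31 subsets against 5 prefixes; by the theorem the two minima coincide for any distribution and any β.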
LEARNINGNONDETERMINISTICCLASSIFIERS4.1CorollariesInordertodrawsomepracticalconsequences,letusrewordthepreviousTheorem.Itstatesthattheoptimumclassicationforaninputxisthesetofrclasseswiththehighestposteriorprobabilities,whereristhelowestintegerthatfulllsråj=1Pr(Cjjx)(b2+r)Pr(Cr+1jx);(8)orthesetofallclasseswhenthisconditionisnotfullledbyanyr.Expressedinthisway,itisstraightforwardtoseethatfortwoclasses,withb=1,Algorithm1coincideswiththeruledenedinEquation(5).Additionally,wewouldliketounderscorethatEquation(8)hinderstheuseofna¨vethresholdstocomputenondeterministicpredictions.Thus,anondeterministicclassierthatalwayspredictsthetoprclassesforaconstantvaluerisnotacorrectoption.Equation(8)showsthatr,atleast,dependsontheinputxMoreover,weshouldnotsearchforathresholdltoreturn,forallinputs,therstrclasseswhosesumofprobabilitiesisabovelråj=1Pr(Cjjx)l:(9)Notethatgivenalvaluein[0;1],Equation(9)straightforwardlygivesrisetoanondeterministicclassierasfollows.Foreachinputx,ifthesetofclassesisorderedaccordingtotheirposteriorprobabilities,wedenehl(x)=(C1;:::;Crråj=1Pr(Cjjx)lr1åj=1Pr(Cjjx)l):(10)Again,theright-handsideofEquation(8)showsthatthethreshold(l)woulddependonthenumberofclassespredicted,theprobabilityoftherstclassexcludedfromtheprediction,andtheparameterb:thetrade-offbetweenPrecisionandRecall.TheideabehindEquation(8)isthat,oncewehavedecidedtoincludethetoprclasses,toaddthe(r+1)thclassweshouldguaranteethatPr(Cr+1jx)isnotmuchsmallerthanthesumofprobabilitiesofthetoprclasses.However,itmaybearguedthattheinaccuracyofposteriorprobabilitieswouldpartiallyinvali-datetheprecedingtheoreticaldiscussion.Infact,posteriorprobabilitiesarenotknowninpractice:theyareestimatedbyalgorithmsthatfrequentlytrytooptimizetheclassicationaccuracyofahypothesisthatreturnstheclasswiththehighestprobability.Inotherwords,probabilitiesaredis-criminantvaluesinsteadofthoroughdescriptionsofthedistributionofclassesinalearningtask.Therefore,intheexperimentsreportedattheendofthepaper,weshallconsidertheclassiersdenedbyEquation
(10)asapossiblealternativemethodtothenondeterministicclassierofAlgo-rithm1.5.ExperimentalResultsInthissectionwereporttheresultsofasetofexperimentsconductedtoevaluatetheproposalsofthispaper.Thenextsubsectiondescribesthesettingsusedintheexperiments:deterministiclearners,datasets,procedurestosetparameters,andmethodstoestimatethescores.2281 DELCOZ,D´IEZANDBAHAMONDE Datasets#classes#samples#features zoo710116iris31504glass62149ecoli83367balancescale36254vehicle484618vowel1199011contraceptive314739yeast1014848car417286image7231019waveform3500040landsat6643536letterrecognition262000016 Table2:DescriptionofthedatasetsdownloadedfromtheUCIrepository.TheclassesarenotlinearlyseparableWehavetwogoalshere.Ontheonehand,wecompareourapproachwithtwoalternativemethods.Thecomparisonwillrstbeestablishedwithastate-of-the-artset-valuedalgorithm,theNa¨veCredalClassier(NCC)(Zaffalon,2002;CoraniandZaffalon,2008a,b).ThisalgorithmisanextensionofthetraditionalNa¨veBayesclassiertowardsimpreciseprobabilitiesandisdesignedtoreturnrobustset-valued(nondeterministic)classications.WeshowthatourmethodcanimprovetheperformanceofNCC.WethencontrastourmethodwithanimplementationofEquation(10);onceagainourproposalsoutperformthisalternativewaytolearnnondeterministicclassiers.Ontheotherhand,weanalyzetheinuenceofanumberoffactorsrelatedtonondeterministiclearners.Weaccordinglydiscusshowthescoresofanondeterministiclearnerareaffectedbythequalityofposteriorprobabilities.Weseethattheperformanceofanondeterministicclassierishighlycorrelatedwiththeaccuracyofitsdeterministiccounterpart.Thesectionendswithastudyofthemeaningoftheparameterb5.1ExperimentalSettingsWeusedthreedifferentmethodsforlearningposteriorprobabilitiesinordertobuildnondeter-ministicclassiers.First,weemployedtheNa¨veBayes(NB)usedbyNCCasitsdeterministiccounterpart(CoraniandZaffalon,2008b).TheseconddeterministiclearnerwasamulticlassSVMtheimplementationusedwaslibsvm(Wuetal.,2004)withthelinearkernel.Last,weemployedthelogisticregression(LR)ofLinetal.(2008).Itshoul
dbenotedthatwearenotonlyusingthemulti-classclassierslearnedbySVMorLR.Primarily,weapplythemechanismsthatprovideposteriorprobabilitiesfromtheiroutputs.Foreachoftheselearners,webuiltndd,wheredstandsforthenameofthedeterministiccoun-terpart,nbsvmorlr.RecallthatnddistheimplementationofAlgorithm1thataimstooptimizeF1;thatis,b=1.2282 LEARNINGNONDETERMINISTICCLASSIFIERS Datasets#classes#samples#featuresOriginalsourceUsedin brain5425597Pomeroyetal.(2002)[1]nci9607131Rossetal.(2000)[1,3,4]lung667016387Tamayoetal.(2007)leukemia337212582Armstrongetal.(2002)[2]lung44829036Tamayoetal.(2007)lung1111894459Tamayoetal.(2007)tumors111117412533Suetal.(2001)[1,2]tumors141419016063Ramaswamyetal.(2001)[1,2,4]lung1616201493Tamayoetal.(2007)leukemia7732712558Yeohetal.(2002)[2] Table3:Descriptionofcancermicroarraydatasetsusedintheexperimentsincludingtheoriginalsourcesandpapersfromwhichtheyaretaken.Forthesakeofbrevity,wehavedenotedthepapersasfollows:[1]TibshiraniandHastie(2007),[2]Tanetal.(2005),[3]Stauntonetal.(2001),[4]YeungandBumgarner(2003)Intheexperimentsthatfollow,weusedtwokindsofdatasets.First,weconsidereddatasetsdownloadedfromtheUCIrepository(AsuncionandNewman,2007),allofwhichhavemoreex-amplesthanattributes.Weincludedallthedatasetsthatfulllthefollowingrules:continuousorordinalattributevalues,nomorethan40attributesandnomorethan20000examples.Theinten-tionwastoconsidersmalldatasetsthatarenotlinearlyseparable.Additionally,weexcludedthoselearningtaskswithmissingvaluesorinwhicheverydeterministiclearnerconsidered(NB,SVM,LR)achievesaproportionofsuccessfulclassicationsofover95%;otherwisenondeterministiclearnerswouldbetoosimilartotheirdeterministiccounterpart.AdescriptionofthegroupofdatasetsconsideredcanbefoundinTable2.Wethenevaluatedtheperformanceonlearningtasksinwhichtheaimwastoclassifycancerpatientsfromgeneexpressionscapturedbymicroarrays.Unliketherstpackageofdatasets,alltheclassesarenowlinearlyseparablegiventhedimensionsoftheinputspaceandthenumberofentries.Table3showsthedetailsofthesedatasets
.Everytableofscores(Tables4,5,6,7,8)isdevotedtoreportingtheexperimentalresultsachievedinoneofthekindsofdatasetsbyoneofthedeterministiclearnersandbytwonondeter-ministicalgorithmsthataretobecompared.Allthetableshaveasimilarlayout.First,theycontainthescoresofthedeterministiclearnerd:theF1(oraccuracyorRecall),andtheBrierscore,amea-sureforthequalityofposteriorprobabilities(Brier,1950;Yeungetal.,2005),computedbymeansofBS=1 2nnåi=1kåj=1([yi=Cj]Pr(Cjjxi))2:Thenwereport,foreachnondeterministiclearner,theF1PrecisionRecall,andtheaveragenumberofclassespredicted(jh(x)j).Allthescoreswereestimatedbymeansofa5-foldcrossvalidationrepeated2times.Wedidnotusethe10-foldprocedure,sinceincertaindatasetstherearetoofewexamplesinsomeoftheclasses.FollowingDemsar(2006),weusedtheWilcoxonsignedrankstesttocomparetheperformanceoftwoclassierswhenthemeasurementsareF1PrecisionRecall,ortheaveragejh(x)j.Unless2283 DELCOZ,D´IEZANDBAHAMONDE NB NCC ndnbDataset F1BS F1PRjh(x)j F1PRjh(x)j zoo 95.00.03 92.390.5100.01.496 95.294.397.01.055iris 93.30.05 92.992.594.01.037 93.993.594.71.023glass 68.00.22 69.266.576.61.321 70.767.677.61.253ecoli 83.50.12 82.081.085.71.240 84.482.288.71.136balance 73.90.16 76.074.279.71.132 79.974.990.11.370vehicle 60.80.30 60.959.863.41.103 63.360.269.61.241vowel 62.10.25 64.662.669.81.296 65.560.975.51.429contra 50.00.30 50.350.150.61.013 56.647.974.61.670yeast 58.10.28 58.458.259.01.037 60.854.474.31.500car 86.80.11 87.387.087.81.017 83.476.698.01.487image 90.90.08 91.490.594.81.195 91.290.992.01.026waveform 80.10.17 80.180.080.41.007 80.980.082.51.051landsat 82.00.17 82.081.683.11.058 82.181.982.41.011letter 73.90.19 74.674.275.81.081 74.873.378.01.166 
Table4:ScoresobtainedbyNa¨veBayes,theNa¨veCredalClassierandnondeterministicclassi-ersonUCIdatasetsusinga5-foldcrossvalidationrepeated2times.Foreaseofreading,F1,Precision(P),andRecall(R)areexpressedaspercentages.ThebestnondeterministicF1foreachdatasetisboldfacedexplicitlystated,weusetheexpressionstatisticallysignicantdifferencestomeanthatp0:01.Additionally,inordertoprovideaquickviewoftheorderofmagnitudeofthescores,wehaveboldfacedthebestnondeterministicF1scoreforeachdataset.Toselecttheregularizationparameter,C,forSVMandLR,weuseda2-foldcrossvalidationrepeated5timesperformedontrainingsets.WesearchedwithinC2[102;:::;102]5.2NondeterministicClassiersvs.Na¨veCredalClassiersInthissubsection,wecompareournondeterministiclearnerwithNCC(CoraniandZaffalon,2008b),astate-of-the-artset-valued(nondeterministic)algorithm.Inordertoensureafaircomparison,ourapproachusestheNa¨veBayes(NB)employedbyNCCasitsdeterministiccounterpart.Table4reportsthescoresofNBNCCandouralgorithmndnbThenondeterministicndnbissignicantly(rememberthatweareusingWilcoxontests)betterthanNCCbothinRecallandF1.Moreover,ndnbwinsin12outof14datasetsinF1,andin11outof14inRecall.However,thescoresinPrecisionandsizeofpredictionsaremorebalanced;thedifferencesarenotsignicant.InPrecisionNCCwinsin5cases,losesin8,andthereis1tiesituation.ThesizescoresarefavorabletoNCCin8outof14datasets.Tocompletethecomparison,weshoulddiscusstheresultsachievedonhighdimensionaldatasets(Table3).Nevertheless,wedonotshowthescoresoneachdataset.ThecharacteristicsofthesetasksarenotappropriateforNa¨veBayes(alargenumberofattributeswithasmallnumberofexamples);therefore,theposteriorprobabilitiesofNBarepoor(theyaresignicantlyworsethanthoseachievedbySVMandLR)andthisaffectstheperformanceofournondeterministicalgorithmandNCC.Ourmethodtendstobealmostdeterministic,theaveragevalueforthesizeofpredictionsisjh(x)j=1:008.Thisisnotoptimal,asweshallseelater,butitisacceptablebehavior.However,2284 LEARNINGNONDETERMINISTICCLASSIFIERS SVM ndsvml ndsvmDataset F1BS F1PRjh(x)j 
F1PRjh(x)j zoo 94.00.08 38.924.6100.04.390 94.292.498.01.134iris 96.00.02 83.274.8100.01.510 97.696.799.31.053glass 61.70.26 63.753.385.01.711 63.055.977.31.484ecoli 86.50.11 75.166.397.21.854 87.485.092.31.152balance 91.70.06 83.877.598.71.528 91.389.098.11.272vehicle 79.80.13 79.671.097.81.576 82.577.992.01.297vowel 82.00.15 66.355.097.52.313 82.978.891.51.288contra 51.30.29 55.948.371.31.599 57.746.783.11.960yeast 59.00.27 60.650.482.21.817 62.453.481.61.706car 85.30.11 82.876.197.31.475 85.683.090.81.169image 95.90.03 84.879.199.81.579 96.195.397.91.058waveform 86.40.10 85.780.097.11.343 87.681.591.81.126landsat 86.80.09 84.478.497.61.453 87.885.791.91.139letter 85.80.11 71.064.398.22.949 86.376.791.01.186 Table5:ScoresobtainedbySVMlearnersonUCIdatasetsusinga5-foldcrossvalidationrepeated2times.Foreaseofreading,F1,Precision(P),andRecall(R)areexpressedaspercentages.ThebestnondeterministicF1foreachdatasetisboldfaced LR ndlrl ndlrDataset F1BS F1PRjh(x)j F1PRjh(x)j zoo 95.00.04 91.088.497.01.252 95.495.096.01.045iris 96.70.05 74.461.7100.01.767 94.492.299.01.137glass 60.30.27 61.549.386.01.844 63.051.885.51.774ecoli 87.50.11 76.968.396.11.668 87.084.492.11.173balance 86.70.11 88.987.492.61.185 88.787.790.91.136vehicle 77.00.16 74.864.995.31.674 79.274.189.71.342vowel 57.90.30 54.141.583.52.226 57.848.679.71.908contra 50.80.29 55.947.772.31.644 58.047.482.51.928yeast 58.40.28 59.449.080.91.818 61.052.279.91.713car 80.90.13 80.774.195.01.482 82.078.988.51.215image 88.40.11 72.360.998.71.915 88.085.193.81.196waveform 86.50.10 81.872.899.61.536 87.482.996.41.272landsat 77.70.18 68.658.193.81.940 76.671.786.91.387letter 71.80.24 49.336.590.73.253 70.364.982.51.556 
Table6:ScoresobtainedbyLRlearnersonUCIdatasetsusinga5-foldcrossvalidationrepeated2times.Foreaseofreading,F1,Precision(P),andRecall(R)areexpressedaspercentages.ThebestnondeterministicF1foreachdatasetisboldfacedthescoresofNCConthesedatasetsareinadmissible;theirclassierspredictalmostallclassesforeveryexample,theiraveragevaluesare:F1=25:73,P=15:39,R=100,andjh(x)j=8:58.Infact,thebehaviorofNCCisdifculttopredict,sometimesitisalmostadeterministicclas-sier,whereasinothertasksthenumberofclassespredictedbyNCCisveryhigh.Moreover,itsdegreeofnondeterminismisnotrelatedtothedifcultyofthelearningtask.Whentheaccuracyofthedeterministicclassiersdecreases,theaveragenumberofclassespredictedwouldbeexpected2285 DELCOZ,D´IEZANDBAHAMONDE SVM ndsvml ndsvmDataset F1BS F1PRjh(x)j F1PRjh(x)j brain 81.80.15 59.244.997.52.504 82.978.093.81.401nci 48.30.35 42.933.268.32.492 47.741.465.02.167lung6 72.10.21 65.757.985.71.907 73.070.478.61.221leukemia3 94.50.04 75.164.5100.01.862 95.794.997.31.049lung4 87.10.11 73.963.096.91.743 87.385.391.41.122lung11 58.40.31 49.336.584.22.656 60.453.878.01.903tumors11 89.60.13 30.619.199.76.135 88.987.192.81.199tumors14 70.00.26 45.035.395.04.550 66.560.284.72.021lung16 84.80.17 25.014.5100.07.440 87.383.195.81.266leukemia7 92.00.07 70.159.999.42.216 92.190.695.11.090 Table7:ScoresobtainedbySVMlearnersoncancermicroarraydatasetsusinga5-foldcrossvali-dationrepeated2times.Foreaseofreading,F1,Precision(P),andRecall(R)areexpressedaspercentages.ThebestnondeterministicF1foreachdatasetisboldfaced LR ndlrl ndlrDataset F1BS F1PRjh(x)j F1PRjh(x)j brain 86.80.11 86.382.794.01.274 86.184.589.21.106nci 55.80.33 56.755.858.31.142 57.856.760.01.158lung6 70.70.22 74.372.577.91.150 73.372.175.71.107leukemia3 97.30.02 92.288.8100.01.258 97.797.398.71.028lung4 88.90.10 88.986.493.91.153 88.887.890.81.061lung11 69.00.24 69.165.277.51.354 68.965.775.91.316tumors11 94.80.05 89.585.999.11.421 93.793.095.11.057tumors14 75.30.18 76.873.983.21.337 76.875.280.31.145lung16 88.10.10 
88.386.093.01.157 88.487.490.31.060leukemia7 91.90.07 90.687.996.31.202 91.991.493.11.040 Table8:ScoresobtainedbyLRlearnersoncancermicroarraydatasetsusinga5-foldcrossvalida-tionrepeated2times.Foreaseofreading,F1,Precision(P),andRecall(R)areexpressedaspercentages.ThebestnondeterministicF1foreachdatasetisboldfacedtoincrease.HoweverthecorrelationbetweentheaccuracyofNBandjh(x)jofNCCis0:24.Inthecaseofndnb,thiscorrelationis0:75:negativeandquitehigh.5.3ComparingndwithAnotherAlternativeMethodInaccordancewiththediscussioninSection4.1,weshallnowcomparethenondeterministicclas-sierslearnedbyAlgorithm1withthealternativeclassierdenedinEquation(10)thatusesathresholdlforthesumofposteriorprobabilities.Thecomparisonwillbeestablishedwithpos-teriorprobabilitiesprovidedbySVMandLRgiventhatbothoutperformtheaccuracyachievedbyNa¨veBayesclassiersinthedatasetsusedintheseexperiments.Thelnondeterministicclassierswillbedenotedbynddl,wheredstandsforthedeterministiccounterpart.Toselecttheparameterl,weuseagridsearchemployinga2-foldcrossvalidationrepeated5times,aimingtooptimizeF1.ThesearchingspacedependsonthelearningtaskS.Iftheproportionofsuccessfulclassicationsfordeterministicclassiers,theaccuracy,isa,thenwesearchwithin2286 LEARNINGNONDETERMINISTICCLASSIFIERSl2[a0;a1;:::;a5];sixoptionsdistributefromato0:99.Insymbols,a0=a;a5=0:99,andai+1ai=099a 
In UCI data sets, Tables 5 and 6, ndsvm and ndlr win against the corresponding nd^λ in 13 out of 14 data sets in F1 and Precision. In Recall we have the opposite situation: λ classifiers win in 13 out of 14 cases. Moreover, λ classifiers always predict more classes than ndsvm and ndlr. In other words, λ classifiers predict more classes than necessary. All differences are significant. Thus, our nd classifiers are better than those computed with the λ parameter.

In cancer microarray data, Tables 7 and 8, ndsvm always wins in F1, Precision, and average |h(x)|, while ndsvm always loses in Recall. All differences are again significant. However, when posterior probabilities are provided by LR, the differences are not significant in F1, although ndlr has 5 wins, 1 tie and 4 losses; in Precision and average size of predictions the differences are significant in favor of ndlr. Furthermore, as usual, the Recall is significantly higher for λ classifiers.

The conclusion is that λ classifiers seem to need more classes in their predictions than nd classifiers. In fact, Equation (9) only considers the Recall. In practice, this means more Recall, but less Precision and F1. Therefore, to optimize the F1 measure in an experimental environment, Equation (8) is more adequate than Equation (9), as we have conjectured theoretically in Section 4.1.

5.4 The Importance of Posterior Probabilities

The objective of this subsection is to experimentally investigate the degree of dependency between nondeterministic scores and the accuracy of posterior probabilities. In this study we again employ SVM and LR with the collection of data sets detailed in Tables 2 and 3.

Let us first consider the set of UCI data sets. Comparing the results in Tables 5 and 6, it can be seen that the scores of ndlr are significantly worse than those of ndsvm in F1, Precision, Recall (p < 0.03), and in average size of predictions. The general message is that ndlr includes unnecessary classes in its predictions. The base posterior probabilities seem to be the cause of this behavior: the Brier score of LR is significantly worse than that of SVM.

On the other hand, the scores obtained with cancer microarray data sets are shown in Tables 7 and 8. The characteristics of UCI and microarray data sets are quite different, and this affects the performance of classifiers. The main difference is that LR now has a significantly better Brier score than SVM. Moreover, the ndlr algorithm achieves better results than ndsvm. The differences are significant in F1, Precision, Recall (p < 0.02), and average |h(x)|. Yet again, inferior posterior probabilities seem to be responsible for the inclusion of unnecessary classes in nondeterministic predictions.

In the preceding discussion of the scores achieved by nondeterministic learners, we found significant differences when the Brier scores of the deterministic counterparts presented significant differences. In fact, the scores of a learner built with Algorithm 1 depend on the quality of the posterior probabilities supplied by the corresponding deterministic learner. It seems plausible to draw the conclusion that the better the posterior probabilities, the better the nondeterministic scores. In order to quantify this statement, we compared deterministic Brier scores with nondeterministic F1, Recall, and Precision values; see Figure 2. We separated the scores achieved by UCI and cancer data sets and included the scores of ndnb in UCI data sets. Similar results would be achieved if we compared nondeterministic scores with deterministic accuracy.

Figure 2: Correlation between Brier scores and F1, Recall, and Precision. The left column shows the results with UCI data sets, while the right column uses cancer data sets. Similar results would be achieved if we compared nondeterministic scores with deterministic accuracy.

We observed that the correlations between the Brier scores of deterministic learners and nondeterministic scores (F1, Recall, and Precision) are very high: their absolute values are in all cases greater than 0.89. Therefore, in order to choose a nondeterministic approach in a practical application, given a data set, it would be recommendable to first analyze the Brier score of different deterministic learners.

Figure 3: Evolution of F1, F2, Precision and Recall on two UCI data sets (yeast and vowel) for different β values and for the nondeterministic learners generated by SVM, LR, and NB.

5.5 The Meaning of β

In this subsection, we analyze, from the point of view of the user, the role played by the parameter β in Algorithm 1. Its theoretical aim is to control the size of predictions: as the β value increases, the size of predictions will become bigger and therefore the Recall scores will be higher; see Equation (8). The problem is that it is not always of interest to increase Recall values, since that would worsen F1 scores: adding more classes to predictions increases incorrect answers.

In Figure 3 we show the evolution of F1, F2, Precision and Recall on two UCI data sets (yeast and vowel) for different β values and for the nondeterministic learners generated by SVM, LR, and NB. Quite similar graphs could have been generated for the other data sets used in the experiments reported in this section.

Initially, β = 0 makes the nondeterministic classifiers deterministic. Therefore, the scores represented in the left-hand side of all the graphs in Figure 3 are all the same: the accuracy of the deterministic classifier. As β values become higher, the Recall increases and the Precision decreases. The main goal of the learning method proposed here is to look for a tradeoff of these measures that is determined by β, a user-modifiable parameter. In practice, the value of β that the classifier must aim to optimize should be fixed by an expert in the field of application in which the classifier is going to be employed. The kind of decisions that one would like to take from nondeterministic classifications must be considered.

It can be observed in the graphs in Figure 3 that the best scores in F1 are not always achieved for β = 1. With small values of β, F1 increases. However, when some point near 1 is exceeded, the F1 score of the nondeterministic learner typically falls below the accuracy of the corresponding deterministic learner. Nonetheless, optimal values are frequently reached around the nominal value: β = 1 (or 2, respectively). Slight improvements can be achieved in F1 (in general, Fβ) if we use a grid search for the β values to be used in Algorithm 1.
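The effect of β on prediction size can be made concrete with a small sketch. It uses the fact that, with a single true class, predicting the top k classes yields Fβ = (1 + β²)/(β² + k) when the true class is covered, so the expected score of a top-k prefix is that factor times the sum of the k largest posteriors. The code is an illustrative reconstruction in the spirit of Algorithm 1, not the implementation used in our experiments:

```python
def nd_predict(posteriors, beta=1.0):
    """Predict the top-k classes (sorted by posterior, descending) for the
    k that maximizes the expected F_beta:
        s_k * (1 + beta**2) / (beta**2 + k),
    where s_k is the sum of the k largest posterior probabilities."""
    ranked = sorted(range(len(posteriors)), key=lambda i: -posteriors[i])
    best_k, best_score, s = 1, -1.0, 0.0
    for k, i in enumerate(ranked, start=1):
        s += posteriors[i]  # cumulative posterior mass of the top-k prefix
        score = s * (1 + beta ** 2) / (beta ** 2 + k)
        if score > best_score:
            best_k, best_score = k, score
    return [ranked[j] for j in range(best_k)]
```

For posteriors (0.6, 0.3, 0.1), β = 1 keeps the single most probable class, while β = 2 lets the second class in: raising β enlarges predictions, which is the Recall growth discussed above.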
6. Conclusions

We have studied classifiers that are allowed to predict more than one class for entries from an input space: nondeterministic or set-valued classifiers. Using a clear analogy with Information Retrieval, we have proposed a family of loss functions based on Fβ measures. After discussing such measures, we derived an algorithm to learn optimal nondeterministic hypotheses. Given an entry from the input space, the algorithm requires the posterior probabilities to compute the subset of classes with the lowest expected loss.

The paper includes a set of experiments carried out on two collections of data sets. The first one was downloaded from the UCI repository, the classes of which are not linearly separable. The second group is formed by data sets whose input spaces represent microarray expressions of different kinds of cancer, the classes of which are separable.

Using these benchmarks, we first compared nondeterministic learners obtained from a Naïve Bayes with those learned by a state-of-the-art set-valued (nondeterministic) algorithm, the Naïve Credal Classifier (NCC) (Zaffalon, 2002; Corani and Zaffalon, 2008a,b), an extension of the traditional Naïve Bayes classifier designed to return robust set-valued classifications. We showed that, using the loss measures defined in this paper, our method can improve the performance of NCC. Additionally, an important advantage of our nondeterministic classifiers over NCC is that we can control the degree of nondeterministic behavior. We can regulate the number of classes predicted by fixing the Fβ to be optimized: as β is higher (the weight of Recall is increased in the harmonic average Fβ), the size of our predictions grows (see Section 5.5). However, the nondeterministic behavior of NCC is quite difficult to predict.

In addition to Naïve Bayes, we used a multiclass SVM and a Logistic Regression. With the posterior probabilities provided by these deterministic learners, we built another alternative method to predict more than one class: the set of classes with the highest posterior probabilities summing more than a threshold λ. We also found that the classifiers built with our algorithm outperform this option based on a threshold.
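For reference, the Brier score used throughout as the measure of probability quality can be computed for one example roughly as follows. This is a minimal sketch; the per-data-set averaging and any normalization used in our tables may differ:

```python
def brier_score(posteriors, true_class):
    """Multiclass Brier score for one example: squared error between the
    posterior vector and the one-hot encoding of the true class.
    Lower is better; a perfect probability vector scores 0."""
    target = [1.0 if i == true_class else 0.0
              for i in range(len(posteriors))]
    return sum((p - t) ** 2 for p, t in zip(posteriors, target))
```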
On the other hand, in the experiments reported in this paper, we studied the role of the deterministic learners that explicitly provide posterior probabilities. We found that the better the posterior probabilities, the better the nondeterministic classifiers. In fact, we obtained very high correlations between the Brier scores of deterministic probabilities and the F1, Precision and Recall values of their nondeterministic counterparts.

Acknowledgments

The research reported here is supported in part under grants TIN2005-08288 from the MEC (Ministerio de Educación y Ciencia, Spain) and TIN2008-06247 from the MICINN (Ministerio de Ciencia e Innovación, Spain). We would also like to acknowledge all those people who generously shared the data sets and software used in this paper, and the anonymous reviewers, whose comments significantly improved it.

References

J. Alonso, J. J. del Coz, J. Díez, O. Luaces, and A. Bahamonde. Learning to predict one or more ranks in ordinal regression tasks. Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD'08), LNAI 5211, pages 39–54. Springer, 2008.

S. A. Armstrong, J. E. Staunton, L. B. Silverman, R. Pieters, M. L. den Boer, M. D. Minden, S. E. Sallan, E. S. Lander, T. R. Golub, and S. J. Korsmeyer. MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nature Genetics, 30(1):41–47, 2002.

A. Asuncion and D. J. Newman. UCI machine learning repository. School of Information and Computer Sciences, University of California, Irvine, California, USA, 2007.

P. L. Bartlett and M. H. Wegkamp. Classification with a reject option using a hinge loss. Journal of Machine Learning Research, 9:1823–1840, 2008.

G. W. Brier. Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78:1–3, 1950.

C. Chow. On optimum recognition error and reject tradeoff. IEEE Transactions on Information Theory, 16(1):41–46, 1970.

A. Clare and R. D. King. Predicting gene function in Saccharomyces cerevisiae. Bioinformatics, 19(2):42–49, 2003.

G. Corani and M. Zaffalon. Learning reliable classifiers from small or incomplete data sets: The Naive Credal Classifier 2. Journal of Machine Learning Research, 9:581–621, 2008a.

G. Corani and M. Zaffalon. JNCC2: The Java implementation of Naive Credal Classifier 2. Journal of Machine Learning Research (Machine Learning Open Source Software), 9:2695–2698, 2008b.

J. Demšar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7:1–30, 2006.

H. P. Kriegel, P. Kröger, A. Pryakhin, and M. Schubert. Using support vector machines for classifying large sets of multi-represented objects. Proc. 4th SIAM Int. Conf. on Data Mining, pages 102–114, 2004.

C.-J. Lin, R. C. Weng, and S. S. Keerthi. Trust region Newton method for logistic regression. Journal of Machine Learning Research, 9(Apr):627–650, 2008.

S. L. Pomeroy, P. Tamayo, M. Gaasenbeek, L. M. Sturla, M. Angelo, M. E. McLaughlin, J. Y. H. Kim, L. C. Goumnerova, P. M. Black, C. Lau, J. C. Allen, D. Zagzag, J. M. Olson, T. Curran, C. Wetmore, J. A. Biegel, T. Poggio, S. Mukherjee, R. Rifkin, A. Califano, G. Stolovitzky, D. N. Louis, J. P. Mesirov, E. S. Lander, and T. R. Golub. Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature, 415(6870):436–442, 2002.

S. Ramaswamy, P. Tamayo, R. Rifkin, S. Mukherjee, C. H. Yeang, M. Angelo, C. Ladd, M. Reich, E. Latulippe, J. P. Mesirov, et al. Multiclass cancer diagnosis using tumor gene expression signatures. Proceedings of the National Academy of Sciences (PNAS), 98(26):15149–15154, 2001.

D. T. Ross, U. Scherf, M. B. Eisen, C. M. Perou, C. Rees, P. Spellman, V. Iyer, S. S. Jeffrey, M. Van de Rijn, M. Waltham, et al. Systematic variation in gene expression patterns in human cancer cell lines. Nature Genetics, 24(3):227–234, 2000.

G. Shafer and V. Vovk. A tutorial on conformal prediction. Journal of Machine Learning Research, 9:371–421, 2008.

J. E. Staunton, D. K. Slonim, H. A. Coller, P. Tamayo, M. J. Angelo, J. Park, U. Scherf, J. K. Lee, W. O. Reinhold, J. N. Weinstein, et al. Chemosensitivity prediction by transcriptional profiling. Proceedings of the National Academy of Sciences (PNAS), 98(19):10787–10792, 2001.

A. I. Su, J. B. Welsh, L. M. Sapinoso, S. G. Kern, P. Dimitrov, H. Lapp, P. G. Schultz, S. M. Powell, C. A. Moskaluk, H. F. Frierson, and G. M. Hampton. Molecular classification of human carcinomas by use of gene expression signatures. Cancer Research, 61(20):7388–7393, 2001.

P. Tamayo, D. Scanfeld, B. L. Ebert, M. A. Gillette, C. W. M. Roberts, and J. P. Mesirov. Metagene projection for cross-platform, cross-species characterization of global transcriptional states. Proceedings of the National Academy of Sciences (PNAS), 104(14):5959–5964, 2007.

A. C. Tan, D. Q. Naiman, L. Xu, R. L. Winslow, and D. Geman. Simple decision rules for classifying human cancers from gene expression profiles. Bioinformatics, 21(20):3896–3904, 2005.

R. Tibshirani and T. Hastie. Margin trees for high-dimensional classification. Journal of Machine Learning Research, 8:637–652, 2007.

G. Tsoumakas and I. Katakis. Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 3(3):1–13, 2007.

T.-F. Wu, C.-J. Lin, and R. C. Weng. Probability estimates for multi-class classification by pairwise coupling. Journal of Machine Learning Research, 5:975–1005, August 2004.

E. J. Yeoh, M. E. Ross, S. A. Shurtleff, W. K. Williams, D. Patel, R. Mahfouz, F. G. Behm, S. C. Raimondi, M. V. Relling, A. Patel, C. Cheng, D. Campana, D. Wilkins, X. Zhou, J. Li, H. Liu, C.-H. Pui, W. E. Evans, C. Naeve, L. Wong, and J. R. Downing. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell, 1(2):133–143, 2002.

K. Y. Yeung and R. E. Bumgarner. Multiclass classification of microarray data with repeated measurements: application to cancer. Genome Biology, 4(12):R83, 2003.

K. Y. Yeung, R. E. Bumgarner, and A. E. Raftery. Bayesian model averaging: development of an improved multi-class, gene selection and classification tool for microarray data. Bioinformatics, 21(10):2394–2402, 2005.

M. Zaffalon. The Naïve Credal Classifier. Journal of Statistical Planning and Inference, 105(1):5–21, 2002.