Active Learning with Hinted Support Vector Machine
Li, Ferng, and Lin

As suggested in Cohn et al. (1996); Xu et al. (2003), active learning can be improved by considering the unlabeled instances in order to query the instance that is not only uncertain to the available classifier but also "representative" of the global data distribution. There are many existing algorithms that use unlabeled information to improve the performance of active learning, such as representative sampling (Xu et al., 2003). Representative sampling makes querying decisions using not only the uncertainty of each instance, but also the representativeness, which is measured by determining whether the instances reside in a dense area. Typical representative sampling algorithms (Xu et al., 2003; Nguyen and Smeulders, 2004; Dasgupta and Hsu, 2008) estimate the underlying data distribution via clustering methods. However, the performance of these algorithms depends on the result of clustering, which is a sophisticated and non-trivial task, especially when the instances lie in a high-dimensional space. Another state-of-the-art algorithm (Huang et al., 2010) models the representativeness by estimating the potential label assignment of the unlabeled instances on the basis of the min-max view of active learning (Hoi et al., 2008). The performance of this algorithm depends on the results of estimating the label assignments, which is also a complicated task.

In this work, we propose a novel framework of active learning, hinted sampling, which considers the unlabeled instances as hints (Abu-Mostafa, 1995) of the global data distribution, instead of directly clustering them or estimating their label assignments. This leads to a simpler active learning algorithm. Similar to representative sampling, hinted sampling also considers both uncertainty and representativeness. Hinted sampling enjoys the advantage of simplicity by avoiding the clustering or label-assignment estimation steps. We demonstrate the effectiveness of hinted sampling by designing a novel algorithm with the support vector machine (SVM; Vapnik, 1998). In the algorithm, we extend the usual SVM to a novel formulation, HintSVM, which is easier to solve than either clustering or label-assignment estimation. We then study the hint selection strategy to improve the efficiency and effectiveness of the proposed algorithm. Experimental results demonstrate that the simple HintSVM with a proper hint selection strategy is comparable to the best of both uncertainty sampling and representative sampling algorithms, and results in better and more stable performance than other state-of-the-art active learning algorithms.

The rest of the paper is organized as follows. Section 2 introduces the formal problem definition and reviews the related works. Section 3 describes our proposed hinted sampling framework as well as the HintSVM algorithms. Section 4 elucidates the hint selection strategy. Section 5 reports experimental results and comparisons. Finally, Section 6 concludes this work.

2. Problem Definition and Related Works

In this work, we focus on pool-based active learning for binary classification, which is one of the most common setups in active learning (Lewis and Gale, 1994). At the initial stage of the setup, the learning algorithm is presented with a labeled data pool and an unlabeled data pool. We denote the labeled data pool by D_l = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)} and the unlabeled data pool by D_u = {x̃_1, x̃_2, ..., x̃_M}, where the input vectors x_i, x̃_j ∈ R^d and the labels y_i ∈ {-1, +1}. Usually, the labeled data pool D_l is relatively small or even empty, whereas the unlabeled data pool D_u is assumed to be large. Active learning is an
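To make the pool-based setup concrete, the following sketch runs the querying loop with plain uncertainty sampling: train an SVM on D_l, query the unlabeled instance nearest the decision boundary, and move it from D_u into D_l. This is a minimal illustration rather than the paper's code; scikit-learn's SVC (a LIBSVM wrapper) stands in for the SVM, and the toy pools and oracle are invented.

```python
# Minimal sketch of pool-based active learning with uncertainty sampling.
# Assumptions: scikit-learn's SVC stands in for the paper's SVM; the toy
# data and the "oracle" labels below are invented for illustration.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Labeled pool D_l (one instance per class) and unlabeled pool D_u.
X_l = np.array([[-2.0, 0.0], [2.0, 0.0]])
y_l = np.array([-1, +1])
X_u = rng.normal(0.0, 1.5, size=(50, 2))
oracle = np.where(X_u[:, 0] > 0, +1, -1)   # hidden true labels

for r in range(5):                         # R = 5 querying rounds
    clf = SVC(kernel="linear", C=5.0).fit(X_l, y_l)
    # Uncertainty = small |decision_function|: closest to the boundary.
    j = np.argmin(np.abs(clf.decision_function(X_u)))
    # Query the oracle for x_j and move it from D_u to D_l.
    X_l = np.vstack([X_l, X_u[j]])
    y_l = np.append(y_l, oracle[j])
    X_u = np.delete(X_u, j, axis=0)
    oracle = np.delete(oracle, j)

print(len(X_l), len(X_u))  # 7 labeled (2 initial + 5 queried), 45 remaining
```

Each round spends one query, so after R rounds the labeled pool grows by exactly R instances; the querying rule is the only piece the later sections replace.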
...representative sampling algorithms also fail to achieve decent performance. In other words, the clustering step is usually the bottleneck of representative sampling.

Huang et al. (2010) propose an improved algorithm that models representativeness without clustering. In the algorithm, the usefulness of each x̃_j, which implicitly contains both uncertainty and representativeness, is estimated by using a technique in semi-supervised learning (Hoi et al., 2008) that checks approximately all possible label assignments for each unlabeled x̃_j ∈ D_u. The querying algorithm Q proposed by Huang et al. (2010) is based on the usefulness of each x̃_j; the learning algorithm L is simply a stand-alone SVM. While the active learning algorithm of Huang et al. (2010) often achieves promising empirical results, its bottleneck is the label-estimation step, which is rather sophisticated and thus does not always reach a satisfactory performance easily.

Another improvement of representative sampling is presented by Donmez et al. (2007), who report that representative sampling is less efficient than uncertainty sampling in later iterations, in which the decision function is closer to the ideal one. To combine the best properties of uncertainty sampling and representative sampling, Donmez et al. (2007) propose a mixed algorithm by extending representative sampling (Nguyen and Smeulders, 2004). The proposed query algorithm Q (Donmez et al., 2007) is split into two stages. The first stage performs representative sampling (Nguyen and Smeulders, 2004) while estimating the expected error reduction. When the expected reduction is less than a given threshold, the querying algorithm Q switches to uncertainty sampling for fine-tuning the decision boundary. The bottleneck of the algorithm (Donmez et al., 2007) is still the clustering step in the first stage.

Instead of facing the challenges of either clustering or label estimation, we propose to view the information in D_u differently. In particular, the unlabeled instances x̃_j ∈ D_u are taken as hints (Abu-Mostafa, 1995) that guide the querying algorithm Q. The idea of using hints leads to a simpler active learning algorithm with better empirical performance, as introduced in the next sections.

3. Hinted Sampling Framework

First, we illustrate the potential drawback of uncertainty sampling with a linear SVM classifier (Vapnik, 1998), which is applied to a two-dimensional artificial dataset. Figure 1 shows the artificial dataset, which consists of three clusters, each of which contains instances of a particular class. We denote one class by a red cross and the other by a filled green circle. The labeled instances in D_l are marked with a blue square while the other instances are in D_u. In Figure 1(a), the initial two labeled instances reside in two of the clusters with different labels. The initial decision function f^(0) trained on the labeled instances (from the two clusters) is not aware of the third cluster. The decision function f^(0) then mis-classifies the instances in the third cluster, and causes the querying algorithm Q (which is based on f^(0)) to query only from the instances near the "wrong" boundary rather than exploring the third cluster. After several iterations, as shown in Figure 1(b), the uncertainty sampling algorithm still outputs an unsatisfactory decision function that mis-classifies the entire unqueried (third) cluster.

Figure 1: (a) The decision function (black) obtained from two labeled (blue) instances; (b) when using the decision function in (a) for uncertainty sampling, the upper-left cluster keeps being ignored.

Figure 2: (a) The hinted query function (dashed magenta line) that is aware of the upper-left cluster; (b) when using the hinted decision function in (a) for uncertainty sampling, all three clusters are explored.

The unsatisfactory performance of uncertainty sampling originates in its lack of awareness of candidate unlabeled instances that should be queried. When trained on only a few labeled instances, the resulting (linear) decision function is overly confident about the unlabeled instances that are far from the boundary. Intuitively, uncertainty sampling could be improved if the querying algorithm Q were aware of, and less confident about, the unqueried regions. Both clustering (Nguyen and Smeulders, 2004) and label estimation (Huang et al., 2010) are based on this intuition, but they explore the unlabeled regions in a rather sophisticated way.
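The failure mode of Figure 1 is easy to reproduce numerically: a linear SVM fit on the two initial labeled instances is confidently wrong on a third, unseen cluster, so uncertainty sampling never looks there. The sketch below is in the spirit of Figure 1 but with invented cluster locations; scikit-learn's SVC stands in for the SVM.

```python
# Companion to the Figure 1 illustration (cluster geometry invented here):
# a linear SVM trained on two labeled points misclassifies an entire
# unqueried cluster, yet assigns it large-magnitude decision values, so
# uncertainty sampling keeps querying near the "wrong" boundary instead.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# D_l: one instance of each class, drawn from two of the three clusters.
X_l = np.array([[0.0, 0.0], [3.0, 0.0]])
y_l = np.array([-1, +1])
# Third (upper-left) cluster: all unlabeled, true label +1.
third = rng.normal([0.0, 4.0], 0.3, size=(30, 2))

f0 = SVC(kernel="linear", C=5.0).fit(X_l, y_l)   # boundary near x = 1.5

wrong_frac = (f0.predict(third) == -1).mean()    # entire cluster mislabeled
mean_conf = f0.decision_function(third).mean()   # confidently negative
print(wrong_frac, mean_conf)
```

Because the decision values on the third cluster sit far from zero, the uncertainty criterion (smallest |decision value|) never selects its instances, which is exactly the blindness that hints are designed to cure.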
...but still passes through the same region because of the many hints. The instance to be queried is then the green square, which is close to x_i and arguably does not carry much additional information. To drive the query boundary away from the known x_i, the surrounding neighbors of x_i should be dropped from the hint pool D_h, as shown in Figure 3(c). Then, the boundary could assist the querying algorithm Q in querying other potentially more valuable instances that are far from x_i, such as the one marked by a square in Figure 3(c).

We implement the idea with a neighborhood function Φ_i : R^d → [0, 1] that measures the closeness of an unlabeled instance x̃_j to each given labeled instance x_i. Given the labeled pool D_l and the neighborhood function Φ_i of each x_i ∈ D_l, we propose dropping x̃_j from the hint pool with probability

    max_{x_i ∈ D_l} Φ_i(x̃_j).

That is, if x̃_j is close to some x_i (high Φ_i(x̃_j)), there is a high probability that x̃_j will be dropped from D_h. The neighborhood function Φ_i can be viewed as a "dropping recommendation" from x_i. We design Φ_i by requesting the function to satisfy three natural constraints: (1) Φ_i(x_i) = 1, which means a duplicate example should be dropped; (2) the Φ_i for the closest neighbor of x_i is P; (3) the Φ_i for the farthest neighbor of x_i is p, where p ≤ P. We model Φ_i by a radial basis function to satisfy the three constraints:

    Φ_i(x̃_j) = P^((r_ij)^(α_i)).    (2)

Here r_ij = ||x̃_j − x_i|| / d_i is the normalized distance of x̃_j given x_i, and d_i is the distance to the closest neighbor of x_i. Then, according to (2) and the constraints, we can easily solve

    α_i = log(log p / log P) / log R_i,

where R_i is the normalized distance of the farthest neighbor of x_i.

We now briefly compare four sampling strategies for ALHS: (1) ALL: include all unlabeled instances; (2) RAND: randomly drop instances from D_h with a fixed probability; (3) CLOSEST: drop a fixed number of neighbors closest to the queried instance; (4) SAMPLE: the proposed strategy. The results on two datasets are shown in Figure 4, and the detailed experimental settings are listed in Section 5. According to the experimental results, the ALL strategy is the worst because too many hints overwhelm HintSVM. The RAND strategy can resolve the weakness of the ALL strategy, but its performance at the earlier stage may be unsatisfactory if the current labeled instances are not considered. CLOSEST matches the characteristics of HintSVM, but it may be overkill to drop all neighbors based on only one queried instance. Among the strategies, SAMPLE performs the best. It drops the neighbor instances with the probabilities computed from the neighborhood functions, and has a chance to keep some neighbors as hints in dense regions.

Furthermore, based on the hint selection strategy, the hint pool D_h contains the most informative instances in D_u. Therefore, when D_h is non-empty, we propose to let Q select queries from D_h instead of D_u.

4.3. Hint Termination

After querying a sufficient number of instances, ALHS captures the underlying data distribution with high probability, and the classifier f^(r) on hand shall be close to the ideal one. At that time, all hints carry little information to assist ALHS and thus are not important. The querying algorithm Q in ALHS can then drop all the hints to switch to uncertainty sampling. This idea is similarly explored by Donmez et al. (2007), and we call it hint termination. We set a termination rule based on the proportion of the remaining hint instances. After we drop many hints by querying enough instances, the remaining hints are not important. The termination rule is

    |D_h| / (|D_l| + |D_u|) ≤ δ,

where δ is a given threshold. We examine two thresholds, δ = 0 (no termination) and δ = 0.5. As shown in Figure 5, the experimental results show that δ = 0.5 is comparable to δ = 0 and can even outperform δ = 0 in some cases. We observe similar results on the other datasets, and thus use δ = 0.5 in the following experiments.

Figure 4: Comparison of hint sampling methods for different datasets. (a) diabetes; (b) letterVvsY.
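The SAMPLE strategy and the termination rule above can be sketched directly from equation (2). The code below uses the paper's fixed values P = 0.5 and p = 0.01; the function and variable names are ours, not the paper's notation, and the toy pools are invented.

```python
# Sketch of the SAMPLE hint-selection strategy: each labeled x_i recommends
# dropping nearby unlabeled instances from D_h with probability
# Phi_i(x_j) = P ** (r_ij ** alpha_i), which equals P at the closest
# neighbor and p at the farthest, per the three constraints in the text.
import numpy as np

def neighborhood(x_i, X_u, P=0.5, p=0.01):
    """Phi_i(x_j) for every unlabeled x_j, given one labeled x_i."""
    dist = np.linalg.norm(X_u - x_i, axis=1)
    d_i = dist[dist > 0].min()            # distance to the closest neighbor
    r = dist / d_i                        # normalized distances (closest -> 1)
    R_i = r.max()                         # normalized farthest distance
    alpha_i = np.log(np.log(p) / np.log(P)) / np.log(R_i)
    return P ** (r ** alpha_i)

def select_hints(X_l, X_u, rng, P=0.5, p=0.01):
    """Drop x_j from D_h with probability max_i Phi_i(x_j)."""
    drop_prob = np.max([neighborhood(x_i, X_u, P, p) for x_i in X_l], axis=0)
    keep = rng.random(len(X_u)) >= drop_prob
    return X_u[keep]

def hints_terminated(D_h, D_l, D_u, delta=0.5):
    """Hint termination rule: |D_h| / (|D_l| + |D_u|) <= delta."""
    return len(D_h) / (len(D_l) + len(D_u)) <= delta

rng = np.random.default_rng(0)
X_u = rng.normal(size=(200, 2))           # toy unlabeled pool D_u
X_l = rng.normal(size=(3, 2))             # toy labeled pool D_l
D_h = select_hints(X_l, X_u, rng)
print(len(D_h), hints_terminated(D_h, X_l, X_u))
```

By construction, each Φ_i attains exactly P at x_i's closest unlabeled neighbor and exactly p at its farthest, so dense regions around queried points are thinned while remote hints survive with probability at least 1 − p.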
Figure 5: Comparison of different δ values. (a) diabetes; (b) letterVvsY.

Table 1: Comparison on accuracy (mean ± se, in %) after querying 5% of the unlabeled pool; the highest accuracy for each dataset is in boldface.

data        | UNCERTAIN       | REPRESENT       | QUIRE           | DUAL            | ALHS
australian  | 82.188 ± 1.571  | 83.739 ± 0.548  | 82.319 ± 1.126  | 81.304 ± 0.647  | 84.072 ± 0.454
breast      | 96.334 ± 0.278  | 95.264 ± 0.439  | 96.657 ± 0.187  | 96.408 ± 0.196  | 96.525 ± 0.219
diabetes    | 63.229 ± 2.767  | 66.758 ± 0.505  | 66.771 ± 0.960  | 65.143 ± 0.381  | 66.862 ± 1.632
german      | 69.060 ± 0.497  | 67.240 ± 1.099  | 68.750 ± 0.605  | 69.620 ± 0.323  | 69.750 ± 0.349
letterMvsN  | 89.632 ± 1.103  | 83.463 ± 1.348  | 81.372 ± 1.693  | 83.437 ± 1.211  | 91.919 ± 0.812
letterVvsY  | 79.245 ± 1.176  | 63.523 ± 2.335  | 68.516 ± 2.132  | 76.213 ± 1.549  | 79.381 ± 1.174
segment     | 95.437 ± 0.367  | 94.390 ± 0.482  | 96.074 ± 0.224  | 86.078 ± 2.834  | 96.095 ± 0.204
splice      | 74.430 ± 0.606  | 69.117 ± 1.452  | 70.340 ± 0.942  | 56.969 ± 0.576  | 75.506 ± 0.403
wdbc        | 93.842 ± 3.137  | 95.616 ± 0.711  | 96.613 ± 0.230  | 96.056 ± 0.250  | 96.921 ± 0.200

5. Experiment

We compared the proposed ALHS algorithm with the following active learning algorithms: (1) UNCERTAIN (Tong and Koller, 2000): uncertainty sampling with SVM; (2) REPRESENT (Xu et al., 2003): representative sampling with SVM and clustering; (3) DUAL (Donmez et al., 2007): mixture of uncertainty and representative sampling; (4) QUIRE (Huang et al., 2010): representative sampling with label estimation based on the min-max view.

We conducted experiments on nine UCI benchmarks (Frank and Asuncion, 2010): australian, breast, diabetes, german, splice, wdbc, letterMvsN, letterVvsY (Donmez et al., 2007; Huang et al., 2010) and segment-binary (Rätsch et al., 2001; Donmez et al., 2007), as chosen by other related works. For each dataset, we randomly divided it into two parts of equal size. One part was treated as the unlabeled pool D_u for the active learning algorithms. The other part was reserved as the test set. Before querying, we randomly labeled one positive instance and one negative instance to form the labeled pool D_l. For each dataset, we ran the algorithms 20 times with different random splits.

Because of the difficulty of locating the best parameters for each active learning algorithm in practice, we chose to compare all algorithms with fixed parameters. In the experiments, every SVM-based algorithm took LIBSVM (Chang and Lin, 2011) with the RBF kernel and the default parameters, except for C = 5. Correspondingly, the parameter λ in Donmez et al. (2007); Huang et al. (2010) was set to λ = 1/C. These parameters ensure that all four algorithms behave in a stable manner. For ALHS, we fixed δ = 0.5, P = 0.5 and p = 0.01, as discussed in the previous sections, with no further tuning for each dataset. For the other algorithms, we take the parameters used in the original papers.

Figure 6 presents the accuracy of the different active learning algorithms along with the number of rounds R, which equals the number of queried instances. Tables 1 and 2 list the mean and standard error of accuracy when R = |D_u| × 5% and R = |D_u| × 10%, respectively. The highest mean accuracy is shown in boldface for each dataset. We also conducted the t-test at the 95% significance level as described by Melville and Mooney (2004); Guo and Greiner (2007); Donmez et al. (2007). The t-test results are given in Table 3, which summarizes the number of datasets in which ALHS performs significantly better (or worse) than each of the other algorithms.

Figure 6: Comparison on different datasets. (a) australian; (b) breast; (c) diabetes; (d) letterMvsN; (e) letterVvsY; (f) segment; (g) wdbc; (h) splice.

Table 2: Comparison on accuracy (mean ± se, in %) after querying 10% of the unlabeled pool; the highest accuracy for each dataset is in boldface.

data        | UNCERTAIN       | REPRESENT       | QUIRE           | DUAL            | ALHS
australian  | 83.884 ± 0.460  | 84.884 ± 0.367  | 84.870 ± 0.455  | 81.174 ± 0.798  | 84.986 ± 0.314
breast      | 96.804 ± 0.188  | 96.378 ± 0.212  | 96.642 ± 0.179  | 96.422 ± 0.235  | 96.789 ± 0.175
diabetes    | 66.706 ± 2.632  | 66.484 ± 1.223  | 67.500 ± 1.337  | 65.143 ± 0.381  | 71.159 ± 1.224
german      | 71.410 ± 0.488  | 67.150 ± 0.773  | 70.250 ± 0.560  | 69.760 ± 0.299  | 71.690 ± 0.333
letterMvsN  | 95.369 ± 0.315  | 92.433 ± 0.777  | 95.114 ± 0.486  | 86.893 ± 0.870  | 95.648 ± 0.264
letterVvsY  | 88.213 ± 0.635  | 73.806 ± 1.551  | 84.723 ± 0.891  | 80.123 ± 1.359  | 88.697 ± 0.607
segment     | 96.528 ± 0.143  | 95.684 ± 0.155  | 96.658 ± 0.110  | 89.519 ± 1.760  | 96.545 ± 0.100
splice      | 79.931 ± 0.274  | 76.274 ± 0.895  | 78.560 ± 0.648  | 58.947 ± 0.853  | 80.635 ± 0.309
wdbc        | 97.155 ± 0.141  | 96.818 ± 0.191  | 96.862 ± 0.206  | 95.748 ± 0.247  | 97.111 ± 0.157

Table 3: ALHS versus the other algorithms based on the t-test at the 95% significance level (win/tie/loss).

Percentage of queries | UNCERTAIN | REPRESENT | QUIRE | DUAL
5%                    | 6/3/0     | 7/2/0     | 6/3/0 | 5/4/0
10%                   | 5/4/0     | 7/2/0     | 6/3/0 | 5/4/0

For some datasets, such as wdbc and breast in Figures 6(g) and 6(b), the representative sampling approaches (REPRESENT, DUAL and QUIRE) achieve a better performance, while the result for UNCERTAIN is unsatisfactory. This unsatisfactory performance is possibly caused by the lack of awareness of unlabeled instances, which echoes our illustration in Figure 1. ALHS improves on UNCERTAIN by using the hints, and is comparable to the other representative sampling algorithms.[1] On the other hand, in Figure 6(h), since splice is a larger and higher-dimensional dataset, the representative sampling algorithms that perform clustering (REPRESENT, DUAL) or label estimation (QUIRE) fail to reach a decent performance, while ALHS keeps a stable performance and slightly outperforms UNCERTAIN by using the hints. In Figure 6, we see that ALHS achieves results comparable to those of the best representative sampling and uncertainty sampling algorithms. As shown in Tables 1 and 2, after querying 5% of the unlabeled instances (Table 1), ALHS achieves the highest mean accuracy on 8 out of 9 datasets; after querying 10% of the unlabeled instances (Table 2), ALHS achieves the highest mean accuracy on 6 out of 9 datasets. Table 3 further confirms that ALHS usually outperforms each of the other algorithms at the 95% significance level.
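The evaluation protocol above (half the data as pool, half as test; D_l seeded with one instance per class; 20 random splits; mean ± standard error reported) can be sketched as follows. The query rule and learner are placeholders to keep the sketch self-contained; the dataset, the random query rule, and the majority-vote stand-in learner are our own assumptions, not the paper's algorithms.

```python
# Sketch of the experimental protocol: 50/50 pool/test split, D_l seeded
# with one positive and one negative instance, a fixed query budget, and
# mean +/- standard error over 20 random splits. The query rule (random)
# and learner (majority vote) are placeholders invented for illustration.
import numpy as np

def one_trial(X, y, rng, budget_frac=0.05):
    idx = rng.permutation(len(X))
    half = len(X) // 2
    pool, test = idx[:half], idx[half:]
    # Seed D_l with one randomly chosen instance of each class.
    labeled = [rng.choice(pool[y[pool] == +1]),
               rng.choice(pool[y[pool] == -1])]
    unlabeled = [i for i in pool if i not in labeled]
    budget = int(np.ceil(len(unlabeled) * budget_frac))
    for _ in range(budget):                        # R querying rounds
        j = unlabeled.pop(rng.integers(len(unlabeled)))  # placeholder rule
        labeled.append(j)
    # Placeholder learner: predict the majority label of D_l everywhere.
    guess = 1 if np.mean(y[labeled]) >= 0 else -1
    return np.mean(y[test] == guess)               # test accuracy

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] > 0, 1, -1)
accs = [one_trial(X, y, rng) for _ in range(20)]
print(f"{np.mean(accs):.3f} +/- {np.std(accs) / np.sqrt(20):.3f}")
```

Swapping the placeholder query rule for an actual strategy (UNCERTAIN, REPRESENT, QUIRE, DUAL or ALHS) and the placeholder learner for an SVM reproduces the shape of the numbers reported in Tables 1 and 2.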
[1] There are some more aggressive querying criteria (Tong and Koller, 2000) than UNCERTAIN, and we have compared with those in additional experiments. Our preliminary observation was that those criteria can be worse than UNCERTAIN with the soft-margin SVM, and hence we excluded them from the tables.

6. Conclusion

We propose a new framework of active learning, hinted sampling, which exploits the unlabeled instances as hints. Hinted sampling can take both uncertainty and representativeness into account concurrently in a more natural and simpler way. We design a novel active learning algorithm, ALHS, within the framework, and couple the algorithm with a promising hint selection strategy. Because ALHS models the representativeness by hints, it avoids the potential problems of the more sophisticated approaches that are employed by other representative sampling algorithms. Hence, ALHS results in significantly better and more stable performance than other state-of-the-art algorithms.

Given the simplicity and effectiveness of hinted sampling, the framework is worth studying further. An interesting research direction is to couple hinted sampling with other classification algorithms, and to investigate the hint selection strategies more deeply. While we use the SVM in ALHS, the framework could be generalized to other classification algorithms. In the future, we plan to investigate more general hint selection strategies and to extend hinted sampling from binary classification to other classification problems.

Acknowledgments

We thank Dr. Chih-Han Yu, the anonymous reviewers and the members of the NTU Computational Learning Lab for valuable suggestions. This work is supported by the National Science Council of Taiwan via the grant NSC 101-2628-E-002-029-MY2.

References

Y. S. Abu-Mostafa. Hints. Neural Computation, 4:639–671, 1995.

K. P. Bennett and A. Demiriz. Semi-supervised support vector machines. In Advances in Neural Information Processing Systems 11, pages 368–374, 1998.

C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, pages 27:1–27:27, 2011.

D. A. Cohn, Z. Ghahramani, and M. I. Jordan. Active learning with statistical models. Journal of Artificial Intelligence Research, 4:129–145, 1996.

S. Dasgupta and D. Hsu. Hierarchical sampling for active learning. In Proceedings of the 25th International Conference on Machine Learning, pages 208–215, 2008.

P. Donmez, J. G. Carbonell, and P. N. Bennett. Dual strategy active learning. In Proceedings of the 18th European Conference on Machine Learning, pages 116–127, 2007.

A. Frank and A. Asuncion. UCI machine learning repository, 2010.

Y. Guo and R. Greiner. Optimistic active learning using mutual information. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, pages 823–829, 2007.