Active Learning with Hinted Support Vector Machine

Li, Ferng, and Lin
As suggested in Cohn et al. (1996) and Xu et al. (2003), active learning can be improved by considering the unlabeled instances in order to query the instance that is not only uncertain to the available classifier but also "representative" of the global data distribution. There are many existing algorithms that use unlabeled information to improve the performance of active learning, such as representative sampling (Xu et al., 2003). Representative sampling makes querying decisions using not only the uncertainty of each instance but also its representativeness, which is measured by determining whether the instance resides in a dense area. Typical representative sampling algorithms (Xu et al., 2003; Nguyen and Smeulders, 2004; Dasgupta and Hsu, 2008) estimate the underlying data distribution via clustering methods. However, the performance of these algorithms depends on the result of clustering, which is a sophisticated and non-trivial task, especially when the instances lie in a high-dimensional space. Another state-of-the-art algorithm (Huang et al., 2010) models the representativeness by estimating the potential label assignments of the unlabeled instances on the basis of the min-max view of active learning (Hoi et al., 2008). The performance of this algorithm depends on the results of estimating the label assignments, which is also a complicated task.

In this work, we propose a novel framework of active learning, hinted sampling, which considers the unlabeled instances as hints (Abu-Mostafa, 1995) of the global data distribution, instead of directly clustering them or estimating their label assignments. This leads to a simpler active learning algorithm. Similar to representative sampling, hinted sampling also considers both uncertainty and representativeness. Hinted sampling enjoys the advantage of simplicity by avoiding the clustering or label-assignment estimation steps. We demonstrate the effectiveness of hinted sampling by designing a novel algorithm with the support vector machine (SVM; Vapnik, 1998). In the algorithm, we extend the usual SVM to a novel formulation, HintSVM, which is easier to solve than either clustering or label-assignment estimation. We then study the hint selection strategy to improve the efficiency and effectiveness of the proposed algorithm. Experimental results demonstrate that the simple HintSVM with a proper hint selection strategy is comparable to the best of both uncertainty sampling and representative sampling algorithms, and results in better and more stable performance than other state-of-the-art active learning algorithms.

The rest of the paper is organized as follows. Section 2 introduces the formal problem definition and reviews related works. Section 3 describes our proposed hinted sampling framework as well as the HintSVM algorithms. Section 4 elucidates the hint selection strategy. Section 5 reports experiment results and comparisons. Finally, Section 6 concludes this work.

2. Problem Definition and Related Works

In this work, we focus on pool-based active learning for binary classification, which is one of the most common setups in active learning (Lewis and Gale, 1994). At the initial stage of the setup, the learning algorithm is presented with a labeled data pool and an unlabeled data pool. We denote the labeled data pool by D_l = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)} and the unlabeled data pool by D_u = {x̃_1, x̃_2, ..., x̃_M}, where the input vectors x_i, x̃_j ∈ R^d and the labels y_i ∈ {−1, +1}. Usually, the labeled data pool D_l is relatively small or even empty, whereas the unlabeled data pool D_u is assumed to be large. Active learning is an ...
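The pool-based setup above amounts to a simple query-label-retrain loop. The sketch below is our illustrative rendering, not code from the paper: `train` plays the role of the learning algorithm L, `query` the querying algorithm Q, and `oracle` stands in for the human annotator.

```python
def pool_based_active_learning(D_l, D_u, train, query, oracle, rounds):
    """Generic pool-based active learning loop (illustrative sketch).

    D_l    -- labeled pool, a list of (x, y) pairs
    D_u    -- unlabeled pool, a list of input vectors x
    train  -- learning algorithm L: fits a classifier on D_l
    query  -- querying algorithm Q: returns an index into D_u
    oracle -- reveals the true label of a queried instance
    """
    D_l, D_u = list(D_l), list(D_u)   # work on copies
    f = train(D_l)
    for _ in range(rounds):
        j = query(f, D_u)             # Q selects one unlabeled instance
        x = D_u.pop(j)
        D_l.append((x, oracle(x)))    # the oracle labels it
        f = train(D_l)                # L retrains on the enlarged pool
    return f, D_l, D_u
```

Every algorithm compared in this paper fits this loop; they differ only in how `query` scores the unlabeled instances.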
... representative sampling algorithms also fail to achieve decent performance. In other words, the clustering step is usually the bottleneck of representative sampling. Huang et al. (2010) propose an improved algorithm that models representativeness without clustering. In the algorithm, the usefulness of each x̃_j, which implicitly contains both uncertainty and representativeness, is estimated by using a technique from semi-supervised learning (Hoi et al., 2008) that checks approximately all possible label assignments for each unlabeled x̃_j ∈ D_u. The querying algorithm Q proposed by Huang et al. (2010) is based on the usefulness of each x̃_j; the learning algorithm L is simply a stand-alone SVM. While the active learning algorithm of Huang et al. (2010) often achieves promising empirical results, its bottleneck is the label-estimation step, which is rather sophisticated and thus does not always reach a satisfactory performance.

Another improvement of representative sampling is presented by Donmez et al. (2007), who report that representative sampling is less efficient than uncertainty sampling in later iterations, in which the decision function is closer to the ideal one. To combine the best properties of uncertainty sampling and representative sampling, Donmez et al. (2007) propose a mixed algorithm by extending representative sampling (Nguyen and Smeulders, 2004). The proposed query algorithm Q (Donmez et al., 2007) is split into two stages. The first stage performs representative sampling (Nguyen and Smeulders, 2004) while estimating the expected error reduction. When the expected reduction is less than a given threshold, the querying algorithm Q switches to uncertainty sampling for fine-tuning the decision boundary. The bottleneck of the algorithm (Donmez et al., 2007) is still the clustering step in the first stage.

Instead of facing the challenges of either clustering or label estimation, we propose to view the information in D_u differently. In particular, the unlabeled instances x̃_j ∈ D_u are taken as hints (Abu-Mostafa, 1995) that guide the querying algorithm Q. The idea of using hints leads to a simpler active learning algorithm with better empirical performance, as introduced in the next sections.

3. Hinted Sampling Framework

First, we illustrate the potential drawback of uncertainty sampling with a linear SVM classifier (Vapnik, 1998) applied to a two-dimensional artificial dataset. Figure 1 shows the artificial dataset, which consists of three clusters, each of which contains instances of a particular class. We denote one class by a red cross and the other by a filled green circle. The labeled instances in D_l are marked with a blue square, while the other instances are in D_u. In Figure 1(a), the initial two labeled instances reside in two of the clusters with different labels. The initial decision function f^(0), trained on the labeled instances (from the two clusters), is not aware of the third cluster. The decision function f^(0) then misclassifies the instances in the third cluster, and causes the querying algorithm Q (which is based on f^(0)) to query only the instances near the "wrong" boundary rather than exploring the third cluster. After several iterations, as shown in Figure 1(b), the uncertainty sampling algorithm still outputs an unsatisfactory decision function that misclassifies the entire unqueried (third) cluster.

The unsatisfactory performance of uncertainty sampling originates in its lack of awareness of candidate unlabeled instances that should be queried. When trained on only a few labeled instances, the resulting (linear) decision function is overly confident about the unlabeled instances that are far from the boundary. Intuitively, uncertainty sampling could be improved if the querying algorithm Q were aware of and less confident about the unqueried regions. Both clustering (Nguyen and Smeulders, 2004) and label estimation (Huang et al., 2010) are based on this intuition, but they explore the unlabeled regions in a rather sophisticated way.

Figure 1: (a) The decision function (black) obtained from two labeled (blue) instances; (b) when using the decision function in (a) for uncertainty sampling, the upper-left cluster keeps being ignored.

Figure 2: (a) The hinted query function (dashed magenta line) that is aware of the upper-left cluster; (b) when using the hinted decision function in (a) for uncertainty sampling, all three clusters are explored.

... but still passes through the same region because of the many hints. The instance to be queried is then the green square, which is close to x_i and arguably does not carry much additional information. To drive the query boundary away from the known x_i, the surrounding neighbors of x_i should be dropped from the hint pool D_h, as shown in Figure 3(c). Then, the boundary could assist the querying algorithm Q in querying other, potentially more valuable instances that are far from x_i, such as the one marked by a square in Figure 3(c).

We implement the idea with a neighborhood function φ_i : R^d → [0, 1] that measures the closeness of an unlabeled instance x̃_j to a given labeled instance x_i. Given the labeled pool D_l and the neighborhood function φ_i of each x_i ∈ D_l, we propose dropping x̃_j from the hint pool with probability

    max_{x_i ∈ D_l} φ_i(x̃_j).

That is, if x̃_j is close to some x_i (high φ_i(x̃_j)), then x̃_j is dropped from D_h with a high probability. The neighborhood function φ_i can be viewed as a "dropping recommendation" from x_i. We design φ_i by requiring the function to satisfy three natural constraints: (1) φ_i(x_i) = 1, which means a duplicate example should always be dropped; (2) the φ_i for the closest neighbor of x_i is P; (3) the φ_i for the farthest neighbor of x_i is p, where p ≤ P. We model φ_i by a radial basis function that satisfies the three constraints:

    φ_i(x̃_j) = P^(r_ij^γ_i).    (2)

Here r_ij = ||x̃_j − x_i|| / d_i is the normalized distance of x̃_j from x_i, and d_i is the distance from x_i to its closest neighbor. Then, according to (2) and the constraints, we can easily solve γ_i = log(log p / log P) / log R_i, where R_i is the normalized distance of the farthest neighbor of x_i.

We now briefly compare four sampling strategies for ALHS: (1) ALL: include all unlabeled instances; (2) RAND: randomly drop instances from D_h with a fixed probability; (3) CLOSEST: drop a fixed number of the neighbors closest to the queried instance; (4) SAMPLE: the proposed strategy. The results on two datasets are shown in Figure 4, and the detailed experimental settings are listed in Section 5. According to the experimental results, the ALL strategy is the worst because too many hints overwhelm HintSVM. The RAND strategy remedies the weakness of the ALL strategy, but its performance at the earlier stages may be unsatisfactory because the current labeled instances are not considered. CLOSEST matches the characteristic of HintSVM, but it may be overkill to drop all neighbors based on only one queried instance. Among the strategies, SAMPLE performs the best: it drops the neighbor instances with the probabilities computed from the neighborhood functions, and has a chance to keep some neighbors as hints in dense regions.

Furthermore, based on the hint selection strategy, the hint pool D_h contains the most informative instances in D_u. Therefore, when D_h is non-empty, we propose to let Q select queries from D_h instead of D_u.

Figure 4: Comparison of hint sampling methods for different datasets: (a) diabetes; (b) letterVvsY.

4.3. Hint Termination

After querying a sufficient number of instances, ALHS captures the underlying data distribution with a high probability, and the classifier f^(r) on hand shall be close to the ideal one. At that point, all hints carry little information to assist ALHS and thus are no longer important. The querying algorithm Q in ALHS can then drop all the hints and switch to uncertainty sampling. This idea is similarly explored by Donmez et al. (2007), and we call it hint termination. We set a termination rule based on the proportion of remaining hint instances: after we drop many hints by querying enough instances, the remaining hints are not important. The termination rule is |D_h| / (|D_l| + |D_u|) ≤ η, where η is a given threshold. We examine two thresholds, η = 0 (no termination) and η = 0.5. As shown in Figure 5, the experiment results show that η = 0.5 is comparable to η = 0 and can even outperform η = 0 in some cases. We observe similar results on other datasets, and thus use η = 0.5 in the subsequent experiments.
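The SAMPLE dropping rule and the termination rule above can be condensed into a short sketch. The closed form for γ_i and the inequality direction of the termination rule are reconstructed from the stated constraints, and all function names here are ours, not the paper's:

```python
import math
import random

def gamma_i(P, p, R_i):
    # exponent chosen so that phi equals P at the closest neighbor
    # (normalized distance r = 1) and p at the farthest one (r = R_i)
    return math.log(math.log(p) / math.log(P)) / math.log(R_i)

def phi(r_ij, P, g):
    # neighborhood function (2): dropping recommendation for an
    # unlabeled instance at normalized distance r_ij from x_i
    return P ** (r_ij ** g)

def keep_as_hint(drop_probs, rng=random):
    # SAMPLE strategy: drop an instance with probability
    # max_i phi_i(x_j); drop_probs holds the phi_i values
    return rng.random() >= max(drop_probs)

def hints_terminated(D_h, D_l, D_u, eta=0.5):
    # hint termination: switch to pure uncertainty sampling once
    # |D_h| / (|D_l| + |D_u|) <= eta
    return len(D_h) <= eta * (len(D_l) + len(D_u))
```

For example, with P = 0.5, p = 0.01 and a farthest normalized distance R_i = 10, the dropping recommendation decays from 1 at x_i itself to 0.5 at the closest neighbor and 0.01 at the farthest one.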
Figure 5: Comparison of different η values: (a) diabetes; (b) letterVvsY.

Table 1: Comparison on accuracy (mean ± se, in %) after querying 5% of the unlabeled pool; the highest accuracy for each dataset is marked with *.

data        UNCERTAIN      REPRESENT      QUIRE          DUAL           ALHS
australian  82.188±1.571   83.739±0.548   82.319±1.126   81.304±0.647   84.072±0.454 *
breast      96.334±0.278   95.264±0.439   96.657±0.187 * 96.408±0.196   96.525±0.219
diabetes    63.229±2.767   66.758±0.505   66.771±0.960   65.143±0.381   66.862±1.632 *
german      69.060±0.497   67.240±1.099   68.750±0.605   69.620±0.323   69.750±0.349 *
letterMvsN  89.632±1.103   83.463±1.348   81.372±1.693   83.437±1.211   91.919±0.812 *
letterVvsY  79.245±1.176   63.523±2.335   68.516±2.132   76.213±1.549   79.381±1.174 *
segment     95.437±0.367   94.390±0.482   96.074±0.224   86.078±2.834   96.095±0.204 *
splice      74.430±0.606   69.117±1.452   70.340±0.942   56.969±0.576   75.506±0.403 *
wdbc        93.842±3.137   95.616±0.711   96.613±0.230   96.056±0.250   96.921±0.200 *

5. Experiment

We compared the proposed ALHS algorithm with the following active learning algorithms: (1) UNCERTAIN (Tong and Koller, 2000): uncertainty sampling with SVM; (2) REPRESENT (Xu et al., 2003): representative sampling with SVM and clustering; (3) DUAL (Donmez et al., 2007): a mixture of uncertainty sampling and representative sampling; (4) QUIRE (Huang et al., 2010): representative sampling with label estimation based on the min-max view.

We conducted experiments on nine UCI benchmarks (Frank and Asuncion, 2010): australian, breast, diabetes, german, splice, wdbc, letterMvsN, letterVvsY (Donmez et al., 2007; Huang et al., 2010), and segment-binary (Rätsch et al., 2001; Donmez et al., 2007), as chosen by other related works. For each dataset, we randomly divided it into two parts of equal size. One part was treated as the unlabeled pool D_u for the active learning algorithms; the other part was reserved as the test set. Before querying, we randomly labeled one positive instance and one negative instance to form the labeled pool D_l. For each dataset, we ran the algorithms 20 times with different random splits.

Due to the difficulty of locating the best parameters for each active learning algorithm in practice, we chose to compare all algorithms on fixed parameters. In the experiments, every SVM-based algorithm took LIBSVM (Chang and Lin, 2011) with the RBF kernel and the default parameters, except for C = 5. Correspondingly, the regularization parameter in Donmez et al. (2007) and Huang et al. (2010) was set to 1/C. These parameters ensure that all four algorithms behave in a stable manner. For ALHS, we fixed η = 0.5, P = 0.5 and p = 0.01, as discussed in the previous sections, with no further tuning for each dataset. For the other algorithms, we take the parameters from the original papers.

Figure 6 presents the accuracy of the different active learning algorithms along with the number of rounds R, which equals the number of queried instances. Tables 1 and 2 list the mean and standard error of accuracy when R = |D_u| × 5% and R = |D_u| × 10%, respectively. The highest mean accuracy is marked for each dataset. We also conducted the t-test at the 95% significance level, as described by Melville and Mooney (2004); Guo and Greiner (2007); Donmez et al. (2007). The t-test results are given in Table 3, which summarizes the number of datasets in which ALHS performs significantly better (or worse) than each of the other algorithms.

Figure 6: Comparison on different datasets: (a) australian; (b) breast; (c) diabetes; (d) letterMvsN; (e) letterVvsY; (f) segment; (g) wdbc; (h) splice.

Table 2: Comparison on accuracy (mean ± se, in %) after querying 10% of the unlabeled pool; the highest accuracy for each dataset is marked with *.

data        UNCERTAIN      REPRESENT      QUIRE          DUAL           ALHS
australian  83.884±0.460   84.884±0.367   84.870±0.455   81.174±0.798   84.986±0.314 *
breast      96.804±0.188 * 96.378±0.212   96.642±0.179   96.422±0.235   96.789±0.175
diabetes    66.706±2.632   66.484±1.223   67.500±1.337   65.143±0.381   71.159±1.224 *
german      71.410±0.488   67.150±0.773   70.250±0.560   69.760±0.299   71.690±0.333 *
letterMvsN  95.369±0.315   92.433±0.777   95.114±0.486   86.893±0.870   95.648±0.264 *
letterVvsY  88.213±0.635   73.806±1.551   84.723±0.891   80.123±1.359   88.697±0.607 *
segment     96.528±0.143   95.684±0.155   96.658±0.110 * 89.519±1.760   96.545±0.100
splice      79.931±0.274   76.274±0.895   78.560±0.648   58.947±0.853   80.635±0.309 *
wdbc        97.155±0.141 * 96.818±0.191   96.862±0.206   95.748±0.247   97.111±0.157

Table 3: ALHS versus the other algorithms based on the t-test at the 95% significance level (win/tie/loss).

Percentage of queries  UNCERTAIN  REPRESENT  QUIRE  DUAL
5%                     6/3/0      7/2/0      6/3/0  5/4/0
10%                    5/4/0      7/2/0      6/3/0  5/4/0

For some datasets, such as wdbc and breast in Figures 6(g) and 6(b), the representative sampling approaches (REPRESENT, DUAL and QUIRE) achieve a better performance, while the result of UNCERTAIN is unsatisfactory. This unsatisfactory performance is possibly caused by the lack of awareness of unlabeled instances, which echoes our illustration in Figure 1. ALHS improves on UNCERTAIN by using the hints, and is comparable to the other representative sampling algorithms.[1] On the other hand, in Figure 6(h), since splice is a larger and higher-dimensional dataset, the representative sampling algorithms that perform clustering (REPRESENT, DUAL) or label estimation (QUIRE) fail to reach a decent performance, while ALHS keeps a stable performance and slightly outperforms UNCERTAIN by using the hints. In Figure 6, we see that ALHS can achieve results comparable to those of the best representative sampling and uncertainty sampling algorithms.

As shown in Tables 1 and 2, after querying 5% of the unlabeled instances (Table 1), ALHS achieves the highest mean accuracy in 8 out of 9 datasets; after querying 10% of the unlabeled instances (Table 2), ALHS achieves the highest mean accuracy in 6 out of 9 datasets. Table 3 further confirms that ALHS usually outperforms each of the other algorithms at the 95% significance level.
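The data preparation just described (equal random split, one seeded instance per class) can be sketched as follows; the names and helper structure are ours, since the paper does not publish code:

```python
def make_experiment_split(data, rng):
    """Illustrative sketch of the protocol above: split `data`
    (a list of (x, y) pairs with y in {-1, +1}) into an unlabeled
    pool and a test set of equal size, then move one positive and
    one negative instance into the initial labeled pool D_l."""
    data = list(data)
    rng.shuffle(data)
    half = len(data) // 2
    pool, test = data[:half], data[half:]
    # seed D_l with one instance of each class, removed from the pool
    D_l = [next(ex for ex in pool if ex[1] == +1),
           next(ex for ex in pool if ex[1] == -1)]
    for ex in D_l:
        pool.remove(ex)
    D_u = [x for x, _ in pool]   # labels stay hidden from the learner
    return D_l, D_u, test
```

Repeating this with 20 different `rng` seeds (e.g. `random.Random(seed)`) mirrors the 20 random splits used in the experiments.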
[1] There are some more aggressive querying criteria (Tong and Koller, 2000) than UNCERTAIN, and we compared with those in additional experiments. Our preliminary observation was that those criteria can be worse than UNCERTAIN with a soft-margin SVM, and hence we excluded them from the tables.

6. Conclusion

We propose a new framework of active learning, hinted sampling, which exploits the unlabeled instances as hints. Hinted sampling can take both uncertainty and representativeness into account concurrently in a more natural and simpler way. We design a novel active learning algorithm, ALHS, within the framework, and couple the algorithm with a promising hint selection strategy. Because ALHS models the representativeness by hints, it avoids the potential problems of the more sophisticated approaches employed by other representative sampling algorithms. Hence, ALHS results in significantly better and more stable performance than other state-of-the-art algorithms.

Due to the simplicity and effectiveness of hinted sampling, the framework is worth studying further. An interesting research direction is to couple hinted sampling with other classification algorithms and to investigate the hint selection strategies more deeply. While we use SVM in ALHS, the framework could be generalized to other classification algorithms. In the future, we plan to investigate more general hint selection strategies and to extend hinted sampling from binary classification to other classification problems.

Acknowledgments

We thank Dr. Chih-Han Yu, the anonymous reviewers and the members of the NTU Computational Learning Lab for valuable suggestions. This work is supported by the National Science Council of Taiwan via the grant NSC 101-2628-E-002-029-MY2.

References

Y. S. Abu-Mostafa. Hints. Neural Computation, 4:639–671, 1995.

K. P. Bennett and A. Demiriz. Semi-supervised support vector machines. In Advances in Neural Information Processing Systems 11, pages 368–374, 1998.

C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, pages 27:1–27:27, 2011.

D. A. Cohn, Z. Ghahramani, and M. I. Jordan. Active learning with statistical models. Journal of Artificial Intelligence Research, 4:129–145, 1996.

S. Dasgupta and D. Hsu. Hierarchical sampling for active learning. In Proceedings of the 25th International Conference on Machine Learning, pages 208–215, 2008.

P. Donmez, J. G. Carbonell, and P. N. Bennett. Dual strategy active learning. In Proceedings of the 18th European Conference on Machine Learning, pages 116–127, 2007.

A. Frank and A. Asuncion. UCI machine learning repository, 2010.

Y. Guo and R. Greiner. Optimistic active learning using mutual information. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, pages 823–829, 2007.