Beyond Disagreement-based Agnostic Active Learning

Chicheng Zhang, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093
Kamalika Chaudhuri, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093

Abstract

We study agnostic active learning, where the goal is to learn a classifier in a pre-specified hypothesis class interactively with as few label queries as possible, while making no assumptions on the true function generating the labels. The main algorithm for this problem is disagreement-based active learning, which has a high label requirement. Thus a major challenge is to find an algorithm which achieves better label complexity, is consistent in an agnostic setting, and applies to general classification problems. In this paper, we provide such an algorithm. Our solution is based on two novel contributions: first, a reduction from consistent active learning to confidence-rated prediction with guaranteed error, and second, a novel confidence-rated predictor.

1 Introduction

In this paper, we study active learning of classifiers in an agnostic setting, where no assumptions are made on the true function that generates the labels. The learner has access to a large pool of unlabelled examples, and can interactively request labels for a small subset of these; the goal is to learn an accurate classifier in a pre-specified class with as few label queries as possible. Specifically, we are given a hypothesis class H and a target ε, and our aim is to find a binary classifier in H whose error is at most ε more than that of the best classifier in H, while minimizing the number of requested labels.

There has been a large body of previous work on active learning; see the surveys by [10, 28] for overviews. The main challenge in active learning is ensuring consistency in the agnostic setting while still maintaining low label complexity. In particular, a very natural approach to active learning is to view it as a generalization of binary search [17, 9, 27]. While this strategy has been extended to several different noise models [23, 27, 26], it is generally inconsistent in the agnostic case [11].

The primary algorithm for agnostic active learning is called disagreement-based active learning. The main idea is as follows. A set V of possible risk minimizers is maintained over time, and the label of an example x is queried if there exist two hypotheses h1, h2 ∈ V such that h1(x) ≠ h2(x). This algorithm is consistent in the agnostic setting [7, 2, 12, 18, 5, 19, 6, 24]; however, due to the conservative label query policy, its label requirement is high. A line of work due to [3, 4, 1] has provided algorithms that achieve better label complexity for linear classification on the uniform distribution over the unit sphere as well as log-concave distributions; however, their algorithms are limited to these specific cases, and it is unclear how to apply them more generally.

Thus, a major challenge in the agnostic active learning literature has been to find a general active learning strategy that applies to any hypothesis class and data distribution, is consistent in the agnostic case, and has a better label requirement than disagreement-based active learning. This has been mentioned as an open problem by several works, such as [2, 10, 4].

In this paper, we provide such an algorithm. Our solution is based on two key contributions, which may be of independent interest. The first is a general connection between confidence-rated predictors and active learning. A confidence-rated predictor is one that is allowed to abstain from prediction on occasion, and as a result, can guarantee a target prediction error. Given a confidence-rated predictor with guaranteed error, we show how to construct an active label query algorithm consistent in the agnostic setting. Our second key contribution is a novel confidence-rated predictor with guaranteed error that applies to any general classification problem. We show that our predictor is optimal in the realizable case, in the sense that it has the lowest abstention rate out of all predictors guaranteeing a certain error. Moreover, we show how to extend our predictor to the agnostic setting.

Combining the label query algorithm with our novel confidence-rated predictor, we get a general active learning algorithm consistent in the agnostic setting. We provide a characterization of the label complexity of our algorithm, and show that this is better than the bounds known for disagreement-based active learning in general. Finally, we show that for linear classification with respect to the uniform distribution and log-concave distributions, our bounds reduce to those of [3, 4].

2 Algorithm

2.1 The Setting

We study active learning for binary classification. Examples belong to an instance space X, and their labels lie in a label space Y = {−1, 1}; labelled examples are drawn from an underlying data distribution D on X × Y. We use D_X to denote the marginal of D on X, and D_{Y|X} to denote the conditional distribution on Y induced by D. Our algorithm has access to examples through two oracles: an example oracle U which returns an unlabelled example x ∈ X drawn from D_X, and a labelling oracle O which returns the label y of an input x ∈ X drawn from D_{Y|X}.

Given a hypothesis class H of VC dimension d, the error of any h ∈ H with respect to a data distribution Π over X × Y is defined as err_Π(h) = P_{(x,y)~Π}(h(x) ≠ y). We define h*(Π) = argmin_{h∈H} err_Π(h) and ν*(Π) = err_Π(h*(Π)). For a set S, we abuse notation and use S to also denote the uniform distribution over the elements of S. We define P_Π(·) := P_{(x,y)~Π}(·) and E_Π(·) := E_{(x,y)~Π}(·).

Given access to examples from a data distribution D through an example oracle U and a labelling oracle O, we aim to provide a classifier ĥ ∈ H such that, with probability ≥ 1 − δ, err_D(ĥ) ≤ ν*(D) + ε, for some target values of ε and δ; this is achieved in an adaptive manner by making as few queries to the labelling oracle O as possible. When ν*(D) = 0, we are said to be in the realizable case; in the more general agnostic case, we make no assumptions on the labels, and thus ν*(D) can be positive.

Previous approaches to agnostic active learning have frequently used the notion of disagreements. The disagreement between two hypotheses h1 and h2 with respect to a data distribution Π is the fraction of examples according to Π to which h1 and h2 assign different labels; formally, ρ_Π(h1, h2) = P_{(x,y)~Π}(h1(x) ≠ h2(x)). Observe that a data distribution Π induces a pseudo-metric ρ_Π on the elements of H; this is called the disagreement metric. For any r and any h ∈ H, define B_Π(h, r) to be the disagreement ball of radius r around h with respect to Π; formally, B_Π(h, r) = {h' ∈ H : ρ_Π(h, h') ≤ r}.

For notational simplicity, we assume that the hypothesis space is dense with respect to the data distribution D, in the sense that for all r > 0, sup_{h' ∈ B_D(h*(D), r)} ρ_D(h*(D), h') = r. Our analysis will still apply without the denseness assumption, but will be significantly messier. Finally, given a set of hypotheses V ⊆ H, the disagreement region of V is the set of all examples x such that there exist two hypotheses h1, h2 ∈ V for which h1(x) ≠ h2(x).

This paper establishes a connection between active learning and confidence-rated predictors with guaranteed error. A confidence-rated predictor is a prediction algorithm that is occasionally allowed to abstain from classification. We will consider such predictors in the transductive setting. Given a set V of
candidate hypotheses, an error guarantee η, and a set U of unlabelled examples, a confidence-rated predictor P either assigns a label or abstains from prediction on each unlabelled x ∈ U. The labels are assigned with the guarantee that the expected disagreement between the label assigned by P and any h ∈ V is at most η. Specifically, for all h ∈ V,

    P_{x~U}(h(x) ≠ P(x), P(x) ≠ 0) ≤ η.    (1)

This ensures that if some h ∈ V is the true risk minimizer, then the labels predicted by P on U do not differ very much from those predicted by h. The performance of a confidence-rated predictor which has a guarantee such as in Equation (1) is measured by its coverage, i.e. the probability of non-abstention P_U(P(x) ≠ 0); higher coverage implies better performance.

2.2 Main Algorithm

Our active learning algorithm proceeds in epochs, where the goal of epoch k is to achieve excess generalization error ε_k = ε·2^{k0−k+1} (where k0 = ⌈log(1/ε)⌉, as set in Algorithm 1), by querying a fresh batch of labels. The algorithm maintains a candidate set V_k that is guaranteed to contain the true risk minimizer. The critical decision at each epoch is how to select a subset of unlabelled examples whose labels should be queried. We make this decision using a confidence-rated predictor P. At epoch k, we run P with candidate hypothesis set V = V_k and error guarantee ε_k/256. Whenever P abstains, we query the label of the example. The number m_k of labels queried is adjusted so that it is enough to achieve excess generalization error ε_{k+1}. An outline is described in Algorithm 1; we next discuss each individual component in detail.
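A predictor satisfying a guarantee of the form (1) while maximizing coverage can be computed on a finite unlabelled set by linear programming; this is the route taken by Algorithm 3 in Section 2.3. The sketch below is ours, not the authors' code: it uses scipy.optimize.linprog, and the one-dimensional threshold class is a toy stand-in for the candidate set V.

```python
import numpy as np
from scipy.optimize import linprog

def confidence_rated_lp(V, U, eta):
    """For each unlabelled z_i, find probabilities (gamma_i, xi_i, alpha_i)
    of predicting +1, -1, and abstaining, maximizing coverage subject to:
    for every h in V, the expected disagreement with h's labels is <= eta."""
    m = len(U)
    # Variables laid out as [gamma_1..gamma_m, xi_1..xi_m, alpha_1..alpha_m].
    c = np.concatenate([np.zeros(2 * m), np.ones(m)])  # minimize total abstention
    A_ub = np.zeros((len(V), 3 * m))
    for j, h in enumerate(V):
        preds = np.array([h(z) for z in U])
        A_ub[j, m:2 * m] = (preds == 1)   # predicting -1 where h says +1 disagrees
        A_ub[j, :m] = (preds == -1)       # predicting +1 where h says -1 disagrees
    b_ub = np.full(len(V), eta * m)       # expected disagreement <= eta, scaled by m
    A_eq = np.zeros((m, 3 * m))           # gamma_i + xi_i + alpha_i = 1 per example
    for i in range(m):
        A_eq[i, [i, m + i, 2 * m + i]] = 1.0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=np.ones(m),
                  bounds=(0, 1))
    assert res.status == 0
    return res.x[:m], res.x[m:2 * m], res.x[2 * m:]

# Toy instance: threshold classifiers on the line, four unlabelled points.
thresholds = [0.5, 1.5, 2.5]
V = [lambda x, t=t: 1 if x > t else -1 for t in thresholds]
U = [0.0, 1.0, 2.0, 3.0]
gamma, xi, alpha = confidence_rated_lp(V, U, eta=0.25)
coverage = 1.0 - alpha.mean()
```

On this toy instance, randomizing the prediction on the two points in the disagreement region already satisfies the constraint, so no abstention is needed; shrinking eta forces the abstention probabilities upward.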
Algorithm 1: Active Learning Algorithm (Outline)

Inputs: example oracle U, labelling oracle O, hypothesis class H of VC dimension d, confidence-rated predictor P, target excess error ε and target confidence δ.
1. Set k0 = ⌈log(1/ε)⌉. Initialize the candidate set V1 = H.
2. For k = 1, 2, ..., k0:
   (a) Set ε_k = ε·2^{k0−k+1} and δ_k = δ/(k0 − k + 1)².
   (b) Call U to generate a fresh unlabelled sample U_k = {z_{k,1}, ..., z_{k,n_k}} of size n_k = Θ((d ln(1/ε_k) + ln(1/δ_k))/ε_k²).
   (c) Run the confidence-rated predictor P with inputs V = V_k, U = U_k and error guarantee ε_k/256 to get abstention probabilities γ_{k,1}, ..., γ_{k,n_k} on the examples in U_k. These probabilities induce a distribution Γ_k over U_k. Let φ_k = P_{U_k}(P(x) = 0) = (1/n_k)·Σ_i γ_{k,i}.
   (d) In the realizable case: set m_k = Θ((φ_k/ε_k)(d ln(φ_k/ε_k) + ln(1/δ_k))), draw m_k i.i.d. examples from Γ_k and query their labels to get a labelled dataset S_k. Update the candidate set: V_{k+1} = {h ∈ V_k : h(x) = y for all (x, y) ∈ S_k}.
   (e) In the non-realizable case: use Algorithm 2 with inputs hypothesis set V_k, distribution Γ_k, target excess error Θ(ε_k/φ_k), target confidence δ_k, and the labelling oracle O to get a new hypothesis set V_{k+1}.
3. Return an arbitrary ĥ ∈ V_{k0+1}.

Candidate Sets. At epoch k, we maintain a set V_k of candidate hypotheses guaranteed (w.h.p.) to contain the true risk minimizer h*(D). In the realizable case, we use a version space as our candidate set. The version space with respect to a set S of labelled examples is the set of all h ∈ H such that h(x) = y for all (x, y) ∈ S.

Lemma 1. Suppose we run Algorithm 1 in the realizable case with inputs example oracle U, labelling oracle O, hypothesis class H, confidence-rated predictor P, target excess error ε and target confidence δ. Then, with probability 1 − δ, h*(D) ∈ V_k for all k = 1, 2, ..., k0 + 1.

In the non-realizable case, the version space is usually empty; we instead use a (1 − δ)-confidence set for the true risk minimizer. Given a set S of n labelled examples, let C(S) ⊆ H be a function of S; C(S) is said to be a (1 − δ)-confidence set for the true risk minimizer if, for all data distributions Δ over X × Y,

    P_{S~Δⁿ}[h*(Δ) ∈ C(S)] ≥ 1 − δ.

Recall that h*(Δ) = argmin_{h∈H} err_Δ(h). In the non-realizable case, our candidate set V_k is a (1 − δ_k)-confidence set for h*(D). The precise setting of V_k is explained in Algorithm 2.

Lemma 2. Suppose we run Algorithm 1 in the non-realizable case with inputs example oracle U, labelling oracle O, hypothesis class H, confidence-rated predictor P, target excess error ε and target confidence δ. Then, with probability 1 − δ, h*(D) ∈ V_k for all k = 1, 2, ..., k0 + 1.

Label Query. We next discuss our label query procedure: which examples should we query labels for, and how many labels should we query at each epoch?

Which labels to query? Our goal is to query the labels of the most informative examples. To choose these examples while still maintaining consistency, we use a confidence-rated predictor P with guaranteed error. The inputs to the predictor are our candidate hypothesis set V_k, which contains (w.h.p.) the true risk minimizer, a fresh set U_k of unlabelled examples, and an error guarantee ε_k/256. For notational simplicity, assume the elements in U_k are distinct. The output is a sequence of abstention probabilities {γ_{k,1}, γ_{k,2}, ..., γ_{k,n_k}}, one for each example in U_k. It induces a distribution Γ_k over U_k, from which we independently draw examples for label queries.

How many labels to query? The goal of epoch k is to achieve excess generalization error ε_{k+1}. To achieve this, passive learning requires Õ(d/ε_{k+1}) labelled examples in the realizable case, and Õ(d(ν*(D) + ε_{k+1})/ε_{k+1}²) examples in the agnostic case. A key observation in this paper is that in order to achieve excess generalization error ε_{k+1} on D, it suffices to achieve a much larger excess generalization error O(ε_{k+1}/φ_k) on the data distribution induced by Γ_k and D_{Y|X}, where φ_k is the fraction of examples on which the confidence-rated predictor abstains.

In the realizable case, we achieve this by sampling m_k = Θ((φ_k/ε_k)(d ln(φ_k/ε_k) + ln(1/δ_k))) i.i.d. examples from Γ_k, and querying their labels to get a labelled dataset S_k. Observe that as φ_k is the abstention probability of P with guaranteed error at most ε_k/256, it is generally smaller than the measure of the disagreement region of the version space; this key fact results in improved label complexity over disagreement-based active learning. This sampling procedure has the following property:

Lemma 3. Suppose we run Algorithm 1 in the realizable case with inputs example oracle U, labelling oracle O, hypothesis class H, confidence-rated predictor P, target excess error ε and target confidence δ. Then, with probability 1 − δ, for all k = 1, 2, ..., k0 + 1 and for all h ∈ V_k, err_D(h) ≤ ε_k. In particular, the ĥ returned at the end of the algorithm satisfies err_D(ĥ) ≤ ε.

The agnostic case has an added complication: in practice, the value of ν* is not known ahead of time. Inspired by [24], we use a doubling procedure (stated in Algorithm 2) which adaptively finds the number m_k of labelled examples to be queried, and queries them. The following two lemmas illustrate its properties: that it is consistent, and that it does not use too many label queries.

Lemma 4. Suppose we run Algorithm 2 with inputs hypothesis set V, example distribution Δ, labelling oracle O, target excess error ε̃ and target confidence δ̃. Let Π be the joint distribution on X × Y induced by Δ and D_{Y|X}. Then there exists an event E with P(E) ≥ 1 − δ̃ such that on E, (1) Algorithm 2 halts, and (2) the returned set V' has the following properties: (2.1) if, for h ∈ H, err_Π(h) − err_Π(h*(Π)) ≤ ε̃/2, then h ∈ V'; (2.2) on the other hand, if h ∈ V', then err_Π(h) − err_Π(h*(Π)) ≤ ε̃. When the event E happens, we say Algorithm 2 succeeds.

Lemma 5. Suppose we run Algorithm 2 with inputs hypothesis set V, example distribution Δ, labelling oracle O, target excess error ε̃ and target confidence δ̃. There exists some absolute constant c1 > 0 such that, on the event that Algorithm 2 succeeds, the final sample size satisfies n ≤ c1·((d ln(1/ε̃) + ln(1/δ̃))/ε̃²)·(ν*(Π) + ε̃). Thus the total number of labels queried is Õ(((ν*(Π) + ε̃)/ε̃²)·(d ln(1/ε̃) + ln(1/δ̃))), where Õ(·) hides logarithmic factors.
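The doubling procedure can be illustrated for a finite hypothesis class with the following sketch. This is our simplification, not the paper's Algorithm 2: the variance-sensitive deviation bound σ(n, δ) is replaced by a plain Hoeffding-plus-union-bound radius, the constants are arbitrary, and the threshold class with flip noise is a toy example.

```python
import math
import random

def adaptive_label_query(V, draw_labelled, eps, delta):
    """Doubling sketch: at round j, query the labels of 2**j fresh draws, run
    ERM over the finite class V, and keep every hypothesis whose empirical
    excess error could still be within the target; stop once the deviation
    radius is small relative to eps. Returns the surviving candidate set."""
    j = 0
    while True:
        j += 1
        n_j = 2 ** j
        S = draw_labelled(n_j)            # fresh labelled sample of size 2^j
        delta_j = delta / (j * (j + 1))   # confidence split across rounds
        # Hoeffding + union bound over the finite class (simplified sigma).
        sigma = math.sqrt(math.log(2 * len(V) / delta_j) / (2 * n_j))
        errs = [sum(h(x) != y for (x, y) in S) / n_j for h in V]
        best = min(errs)                  # empirical risk of the ERM classifier
        V_j = [h for h, e in zip(V, errs) if e <= best + eps / 2 + 2 * sigma]
        if 2 * sigma <= eps / 4:          # deviation radius beat the target
            return V_j

# Toy run: thresholds on [0, 1]; labels come from threshold 0.3, flipped w.p. 0.1.
random.seed(0)
thresholds = [i / 10 for i in range(11)]
V = [lambda x, t=t: 1 if x > t else -1 for t in thresholds]

def draw_labelled(n):
    sample = []
    for _ in range(n):
        x = random.random()
        y = 1 if x > 0.3 else -1
        if random.random() < 0.1:         # label noise, so the best error is 0.1
            y = -y
        sample.append((x, y))
    return sample

V_out = adaptive_label_query(V, draw_labelled, eps=0.2, delta=0.05)
```

The surviving set plays the role of V_{k+1}: every near-optimal threshold stays in, and every threshold whose excess error clearly exceeds eps is eliminated.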
A naive approach (see Algorithm 4 in the Appendix), which uses an additive VC bound, gives a sample complexity of O((d ln(1/ε̃) + ln(1/δ̃))/ε̃²); Algorithm 2 gives a better sample complexity. The following lemma is a consequence of our label query procedure in the non-realizable case.

Lemma 6. Suppose we run Algorithm 1 in the non-realizable case with inputs example oracle U, labelling oracle O, hypothesis class H, confidence-rated predictor P, target excess error ε and target confidence δ. Then, with probability 1 − δ, for all k = 1, 2, ..., k0 + 1 and for all h ∈ V_k, err_D(h) ≤ err_D(h*(D)) + ε_k. In particular, the ĥ returned at the end of the algorithm satisfies err_D(ĥ) ≤ err_D(h*(D)) + ε.

Algorithm 2: An Adaptive Algorithm for Label Query Given Target Excess Error

Inputs: hypothesis set V of VC dimension d, example distribution Δ, labelling oracle O, target excess error ε̃, target confidence δ̃.
1. For j = 1, 2, ...:
   (a) Draw n_j = 2^j i.i.d. examples from Δ; query their labels from O to get a labelled dataset S_j. Denote δ̃_j = δ̃/(j(j+1)).
   (b) Train an ERM classifier ĥ_j ∈ V over S_j.
   (c) Define the set V_j as follows:
       V_j = {h ∈ V : err_{S_j}(h) ≤ err_{S_j}(ĥ_j) + ε̃/2 + σ(n_j, δ̃_j) + sqrt(σ(n_j, δ̃_j)·ρ_{S_j}(h, ĥ_j))},
       where σ(n, δ) = O((d ln n + ln(1/δ))/n) is a standard VC-style deviation bound.
   (d) If sup_{h ∈ V_j} (σ(n_j, δ̃_j) + sqrt(σ(n_j, δ̃_j)·ρ_{S_j}(h, ĥ_j))) ≤ ε̃/4, break.
2. Return V' = V_j.

2.3 Confidence-Rated Predictor

Our active learning algorithm uses a confidence-rated predictor with guaranteed error to make its label query decisions. In this section, we provide a novel confidence-rated predictor with guaranteed error. This predictor has optimal coverage in the realizable case, and may be of independent interest.

The predictor P receives as input a set V ⊆ H of hypotheses (which is likely to contain the true risk minimizer), an error guarantee η, and a set U of unlabelled examples. We consider a soft prediction algorithm; so, for each example in U, the predictor P outputs three probabilities that add up to 1: the probabilities of predicting 1, −1 and 0 (abstain). This output is subject to the constraint that the expected disagreement between the ±1 labels assigned by P and those assigned by any h ∈ V is at most η, and the goal is to maximize the coverage, i.e. the expected fraction of non-abstentions.

Our key insight is that this problem can be written as a linear program, which is described in Algorithm 3. There are three variables, γ_i, ξ_i and α_i, for each unlabelled z_i ∈ U; these are the probabilities with which we predict 1, −1 and 0 on z_i, respectively. Constraint (2) ensures that the expected disagreement between the label predicted and any h ∈ V is no more than η, while the LP objective maximizes the coverage under these constraints. Observe that the LP is always feasible. Although the LP has infinitely many constraints, the number of constraints in Equation (2) distinguishable by the examples in U is at most (em/d)^d, where d is the VC dimension of the hypothesis class H.

The performance of a confidence-rated predictor is measured by its error and coverage. The error of a confidence-rated predictor is the probability with which it predicts the wrong label on an example, while the coverage is its probability of non-abstention. We can show the following guarantee on the performance of the predictor in Algorithm 3.

Theorem 1. In the realizable case, if the hypothesis set V is the version space with respect to a training set, then P_{x~U}(P(x) ≠ h*(x), P(x) ≠ 0) ≤ η. In the non-realizable case, if the hypothesis set V is a (1 − δ)-confidence set for the true risk minimizer h*, then, w.p. ≥ 1 − δ, P_{(x,y)~U}(P(x) ≠ y, P(x) ≠ 0) ≤ P_{(x,y)~U}(h*(x) ≠ y) + η, where the probability is taken over the random choices made by P.

Algorithm 3: Confidence-rated Predictor

Inputs: hypothesis set V, unlabelled data U = {z_1, ..., z_m}, error bound η.
Solve the linear program:

    minimize   Σ_{i=1}^m α_i
    subject to γ_i + ξ_i + α_i = 1                                for each i,
               Σ_{i: h(z_i)=1} ξ_i + Σ_{i: h(z_i)=−1} γ_i ≤ ηm   for all h ∈ V,    (2)
               γ_i, ξ_i, α_i ≥ 0                                  for each i.

For each z_i ∈ U, output probabilities γ_i, ξ_i and α_i for predicting 1, −1 and 0, respectively.

In the realizable case, we can also show that our confidence-rated predictor has optimal coverage. Observe that we cannot directly show optimality in the non-realizable case, as the performance depends on the exact choice of the (1 − δ)-confidence set.

Theorem 2. In the realizable case, suppose that the hypothesis set V is the version space with respect to a training set. If P' is any confidence-rated predictor with error guarantee η, and if P is the predictor in Algorithm 3, then the coverage of P is at least as much as the coverage of P'.

3 Performance Guarantees

An essential property of any active learning algorithm is consistency: that it converges to the true risk minimizer given enough labelled examples. We observe that our algorithm is consistent provided we use any confidence-rated predictor P with guaranteed error as a subroutine. The consistency of our algorithm is a consequence of Lemmas 3 and 6, and is shown in Theorem 3.

Theorem 3 (Consistency). Suppose we run Algorithm 1 with inputs example oracle U, labelling oracle O, hypothesis class H, confidence-rated predictor P, target excess error ε and target confidence δ. Then with probability 1 − δ, the classifier ĥ returned by Algorithm 1 satisfies err_D(ĥ) − err_D(h*(D)) ≤ ε.

We now establish a label complexity bound for our algorithm; however, this label complexity bound applies only if we use the predictor described in Algorithm 3 as a subroutine.

For any hypothesis set V, data distribution D, and η, define Φ_D(V, η) to be the minimum abstention probability of a confidence-rated predictor which guarantees that the disagreement between its predicted labels and any h ∈ V under D_X is at most η. Formally,

    Φ_D(V, η) = min{ E_x[α(x)] : E_x[I(h(x) = +1)·ξ(x) + I(h(x) = −1)·γ(x)] ≤ η for all h ∈ V;
                     γ(x) + ξ(x) + α(x) ≡ 1; γ(x), ξ(x), α(x) ≥ 0 },

where the expectations are over x ~ D_X. Define φ(r, η) := Φ_D(B_D(h*, r), η). The label complexity of our active learning algorithm can be stated as follows.

Theorem 4 (Label Complexity). Suppose we run Algorithm 1 with inputs example oracle U, labelling oracle O, hypothesis class H, confidence-rated predictor P of Algorithm 3, target excess error ε and target confidence δ. Then there exist constants c1, c2 > 0 such that with probability 1 − δ:
(1) In the realizable case, the total number of labels queried by Algorithm 1 is at most

    c1 · Σ_{k=1}^{log(1/ε)} (φ(c2·ε_k, ε_k/256)/ε_k) · (d ln(φ(c2·ε_k, ε_k/256)/ε_k) + ln((log(1/ε) − k + 1)/δ)).

(2) In the agnostic case, the total number of labels queried by Algorithm 1 is at most

    c1 · Σ_{k=1}^{log(1/ε)} (φ(c2(2ν* + ε_k), ε_k/256)·(ν* + ε_k)/ε_k²) · (d ln(φ(c2(2ν* + ε_k), ε_k/256)/ε_k) + ln((log(1/ε) − k + 1)/δ)).

The label complexity of disagreement-based active learning is characterized in terms of the disagreement coefficient. Given a radius r, the disagreement coefficient θ(r) is defined as

    θ(r) = sup_{r' ≥ r} P_D(DIS(B_D(h*, r')))/r',

where, for any V ⊆ H, DIS(V) is the disagreement region of V. As P_D(DIS(B_D(h*, r))) = φ(r, 0) [13], in our notation θ(r) = sup_{r' ≥ r} φ(r', 0)/r'.

In the realizable case, the best known bound for the label complexity of disagreement-based active learning is Õ(θ(ε)·ln(1/ε)·(d ln θ(ε) + ln ln(1/ε))) [20]. Our label complexity bound may be simplified to

    Õ( (sup_{k ≤ log(1/ε)} φ(c2·ε_k, ε_k/256)/ε_k) · ln(1/ε) · (d ln(sup_{k ≤ log(1/ε)} φ(c2·ε_k, ε_k/256)/ε_k) + ln ln(1/ε)) ),

which is essentially the bound of [20] with θ(ε) replaced by sup_{k ≤ log(1/ε)} φ(c2·ε_k, ε_k/256)/ε_k. As enforcing a lower error guarantee requires more abstention, φ(r, η) is a decreasing function of η; as a result, sup_{k ≤ log(1/ε)} φ(c2·ε_k, ε_k/256)/ε_k ≤ O(θ(ε)), and our label complexity bound is better.

In the agnostic case, [12] provides a label complexity bound of Õ(θ(2ν* + ε)·(ν*²/ε² + 1)·(d ln(1/ε) + ln(1/δ))) for disagreement-based active learning. In contrast, by Proposition 1 our label complexity is at most

    Õ( (sup_{k ≤ log(1/ε)} φ(c2(2ν* + ε_k), ε_k/256)/(2ν* + ε_k)) · (ν*²/ε² + 1) · (d ln(1/ε) + ln(1/δ)) ).

Again, this is essentially the bound of [12] with θ(2ν* + ε) replaced by the smaller quantity sup_{k ≤ log(1/ε)} φ(c2(2ν* + ε_k), ε_k/256)/(2ν* + ε_k). [20] has provided a more refined analysis of disagreement-based active learning that gives a label complexity of Õ(θ(ν* + ε)·(ν*²/ε² + ln(1/ε))·(d ln θ(ν* + ε) + ln ln(1/ε))); observe that the dependence is still on θ(ν* + ε). We leave a more refined label complexity analysis of our algorithm for future work.

An important sub-case of learning from noisy data is learning under the Tsybakov noise conditions [30]. We defer this discussion to the Appendix.

3.1 Case Study: Linear Classification under Log-concave Distributions

We now consider learning linear classifiers with respect to log-concave data distributions on R^d. In this case, for any r, the disagreement coefficient satisfies θ(r) ≤ O(√d·ln(1/r)) [4]; however, for any η > 0, φ(r, η)/r ≤ O(ln(r/η)) (see Lemma 14 in the Appendix), which is much smaller so long as η/r is not too small. This leads to the following label complexity bounds.

Corollary 1. Suppose D_X is isotropic and log-concave on R^d, and H is the set of homogeneous linear classifiers on R^d. Then Algorithm 1 with inputs example oracle U, labelling oracle O, hypothesis class H, confidence-rated predictor P of Algorithm 3, target excess error ε and target confidence δ satisfies the following properties. With probability 1 − δ:
(1) In the realizable case, there exists some absolute constant c1 > 0 such that the total number of labels queried is at most c1·ln(1/ε)·(d + ln ln(1/ε) + ln(1/δ)).
Here the Õ(·) notation hides factors logarithmic in 1/δ.
(2) In the agnostic case, there exists some absolute constant c2 > 0 such that the total number of labels queried is at most Õ(c2·(ν*²/ε² + ln(1/ε))·(d + ln(1/δ))).
(3) If the (C0, κ)-Tsybakov noise condition holds for D with respect to H, then there exists some constant c3 > 0 (that depends on C0 and κ) such that the total number of labels queried is at most Õ(c3·ε^{2/κ−2}·ln(1/ε)·(d + ln(1/δ))).

In the realizable case, our bound matches [4]. For disagreement-based algorithms, the bound is Õ(d^{3/2}·ln²(1/ε)·(ln d + ln ln(1/ε))), which is worse by a factor of Õ(√d·ln(1/ε)). [4] does not address the fully agnostic case directly; however, if ν* is known a priori, then their algorithm can achieve roughly the same label complexity as ours.

For the Tsybakov noise condition with κ > 1, [3, 4] provide a label complexity bound of Õ(ε^{2/κ−2}·ln²(1/ε)·(d + ln ln(1/ε))) with an algorithm that has a priori knowledge of C0 and κ. We get a slightly better bound. On the other hand, a disagreement-based algorithm [20] gives a label complexity of Õ(ε^{2/κ−2}·d^{3/2}·ln²(1/ε)·(ln d + ln ln(1/ε))). Again our bound is better, by a factor of Õ(√d), over disagreement-based algorithms. For κ = 1, we can tighten our label complexity to get a bound of Õ(ln(1/ε)·(d + ln ln(1/ε) + ln(1/δ))), which again matches [4], and is better than the one provided by disagreement-based algorithms, Õ(d^{3/2}·ln²(1/ε)·(ln d + ln ln(1/ε))) [20].

4 Related Work

Active learning has seen a lot of progress over the past two decades, motivated by vast amounts of unlabelled data and the high cost of annotation [28, 10, 20]. According to [10], the two main threads of research are exploitation of cluster structure [31, 11], and efficient search in hypothesis space, which is the setting of our work. We are given a hypothesis class H, and the goal is to find an h ∈ H that achieves a target excess generalization error, while minimizing the number of label queries.

Three main approaches have been studied in this setting. The first and most natural one is generalized binary search [17, 8, 9, 27], which was analyzed in the realizable case by [9] and in various limited noise settings by [23, 27, 26]. While this approach has the advantage of low label complexity, it is generally inconsistent in the fully agnostic setting [11]. The second approach, disagreement-based active learning, is consistent in the agnostic PAC model. [7] provides the first disagreement-based algorithm for the realizable case. [2] provides an agnostic disagreement-based algorithm, which is analyzed in [18] using the notion of the disagreement coefficient. [12] reduces disagreement-based active learning to passive learning; [5] and [6] further extend this work to provide practical and efficient implementations. [19, 24] give algorithms that are adaptive to the Tsybakov noise condition. The third line of work [3, 4, 1] achieves a better label complexity than disagreement-based active learning for linear classifiers on the uniform distribution over the unit sphere and log-concave distributions. However, a limitation is that their algorithms apply only to these specific settings, and it is not apparent how to apply them generally.

Research on confidence-rated prediction has been mostly focused on empirical work, with relatively less theoretical development. Theoretical work on this topic includes KWIK learning [25], conformal prediction [29] and the weighted majority algorithm of [16]. The closest to our work is the recent learning-theoretic treatment by [13, 14]. [13] addresses confidence-rated prediction with guaranteed error in the realizable case, and provides a predictor that abstains in the disagreement region of the version space. This predictor achieves zero error, and coverage equal to the measure of the agreement region
. [14] shows how to extend this algorithm to the non-realizable case and obtain zero error with respect to the best hypothesis in H. Note that the predictors in [13, 14] generally achieve less coverage than ours for the same error guarantee; in fact, if we plug them into our Algorithm 1, then we recover the label complexity bounds of disagreement-based algorithms [12, 19, 24]. A formal connection between disagreement-based active learning in the realizable case and perfect confidence-rated prediction (with a zero error guarantee) was established by [15]. Our work can be seen as a step towards bridging these two areas, by demonstrating that active learning can be further reduced to imperfect confidence-rated prediction, with potentially higher label savings.

Acknowledgements. We thank NSF under IIS-1162581 for research support. We thank Sanjoy Dasgupta and Yoav Freund for helpful discussions. CZ would like to thank Liwei Wang for introducing the problem of selective classification to him.

References

[1] P. Awasthi, M.-F. Balcan, and P. M. Long. The power of localization for efficiently learning linear separators with noise. In STOC, 2014.
[2] M.-F. Balcan, A. Beygelzimer, and J. Langford. Agnostic active learning. J. Comput. Syst. Sci., 75(1):78–89, 2009.
[3] M.-F. Balcan, A. Z. Broder, and T. Zhang. Margin based active learning. In COLT, 2007.
[4] M.-F. Balcan and P. M. Long. Active and passive learning of linear separators under log-concave distributions. In COLT, 2013.
[5] A. Beygelzimer, S. Dasgupta, and J. Langford. Importance weighted active learning. In ICML, 2009.
[6] A. Beygelzimer, D. Hsu, J. Langford, and T. Zhang. Agnostic active learning without constraints. In NIPS, 2010.
[7] D. A. Cohn, L. E. Atlas, and R. E. Ladner. Improving generalization with active learning. Machine Learning, 15(2), 1994.
[8] S. Dasgupta. Analysis of a greedy active learning strategy. In NIPS, 2004.
[9] S. Dasgupta. Coarse sample complexity bounds for active learning. In NIPS, 2005.
[10] S. Dasgupta. Two faces of active learning. Theor. Comput. Sci., 412(19), 2011.
[11] S. Dasgupta and D. Hsu. Hierarchical sampling for active learning. In ICML, 2008.
[12] S. Dasgupta, D. Hsu, and C. Monteleoni. A general agnostic active learning algorithm. In NIPS, 2007.
[13] R. El-Yaniv and Y. Wiener. On the foundations of noise-free selective classification. JMLR, 2010.
[14] R. El-Yaniv and Y. Wiener. Agnostic selective classification. In NIPS, 2011.
[15] R. El-Yaniv and Y. Wiener. Active learning via perfect selective classification. JMLR, 2012.
[16] Y. Freund, Y. Mansour, and R. E. Schapire. Generalization bounds for averaged classifiers. The Ann. of Stat., 32, 2004.
[17] Y. Freund, H. S. Seung, E. Shamir, and N. Tishby. Selective sampling using the query by committee algorithm. Machine Learning, 28(2-3):133–168, 1997.
[18] S. Hanneke. A bound on the label complexity of agnostic active learning. In ICML, 2007.
[19] S. Hanneke. Adaptive rates of convergence in active learning. In COLT, 2009.
[20] S. Hanneke. A statistical theory of active learning. Manuscript, 2013.
[21] S. Hanneke and L. Yang. Surrogate losses in passive and active learning. CoRR, abs/1207.3772, 2012.
[22] D. Hsu. Algorithms for Active Learning. PhD thesis, UC San Diego, 2010.
[23] M. Kääriäinen. Active learning in the non-realizable case. In ALT, 2006.
[24] V. Koltchinskii. Rademacher complexities and bounding the excess risk in active learning. JMLR, 2010.
[25] L. Li, M. L. Littman, and T. J. Walsh. Knows what it knows: a framework for self-aware learning. In ICML, 2008.
[26] M. Naghshvar, T. Javidi, and K. Chaudhuri. Noisy Bayesian active learning. In Allerton, 2013.
[27] R. D. Nowak. The geometry of generalized binary search. IEEE Transactions on Information Theory, 57(12):7893–7906, 2011.
[28] B. Settles. Active learning literature survey. Technical report, University of Wisconsin-Madison, 2010.
[29] G. Shafer and V. Vovk. A tutorial on conformal prediction. JMLR, 2008.
[30] A. B. Tsybakov. Optimal aggregation of classifiers in statistical learning. Annals of Statistics, 32:135–166, 2004.
[31] R. Urner, S. Wulff, and S. Ben-David. PLAL: Cluster-based active learning. In COLT, 2013.