Beyond Disagreement-based Agnostic Active Learning

Chicheng Zhang
University of California, San Diego
9500 Gilman Drive, La Jolla, CA 92093

Kamalika Chaudhuri
University of California, San Diego
9500 Gilman Drive, La Jolla, CA 92093
kamalika@cs.ucsd.edu


Abstract

We study agnostic active learning, where the goal is to learn a classifier in a pre-specified hypothesis class interactively with as few label queries as possible, while making no assumptions on the true function generating the labels. The main algorithm for this problem is disagreement-based active learning, which has a high label requirement. Thus a major challenge is to find an algorithm which achieves better label complexity, is consistent in an agnostic setting, and applies to general classification problems. In this paper, we provide such an algorithm. Our solution is based on two novel contributions: first, a reduction from consistent active learning to confidence-rated prediction with guaranteed error, and second, a novel confidence-rated predictor.

1 Introduction

In this paper, we study active learning of classifiers in an agnostic setting, where no assumptions are made on the true function that generates the labels. The learner has access to a large pool of unlabelled examples, and can interactively request labels for a small subset of these; the goal is to learn an accurate classifier in a pre-specified class with as few label queries as possible. Specifically, we are given a hypothesis class H and a target ε, and our aim is to find a binary classifier in H whose error is at most ε more than that of the best classifier in H, while minimizing the number of requested labels.

There has been a large body of previous work on active learning; see the surveys by [10, 28] for overviews. The main challenge in active learning is ensuring consistency in the agnostic setting while still maintaining low label complexity. In particular, a very natural approach to active learning is to view it as a generalization of binary search [17, 9, 27]. While this strategy has been extended to several different noise models [23, 27, 26], it is generally inconsistent in the agnostic case [11].

The primary algorithm for agnostic active learning is called disagreement-based active learning. The main idea is as follows. A set V of possible risk minimizers is maintained over time, and the label of an example x is queried if there exist two hypotheses h1, h2 ∈ V such that h1(x) ≠ h2(x). This algorithm is consistent in the agnostic setting [7, 2, 12, 18, 5, 19, 6, 24]; however, due to the conservative label query policy, its label requirement is high. A line of work due to [3, 4, 1] has provided algorithms that achieve better label complexity for linear classification on the uniform distribution over the unit sphere as well as log-concave distributions; however, their algorithms are limited to these specific cases, and it is unclear how to apply them more generally.

Thus, a major challenge in the agnostic active learning literature has been to find a general active learning strategy that applies to any hypothesis class and data distribution, is consistent in the agnostic case, and has a better label requirement than disagreement-based active learning. This has been mentioned as an open problem by several works, such as [2, 10, 4].

In this paper, we provide such an algorithm. Our solution is based on two key contributions, which may be of independent interest. The first is a general connection between confidence-rated predictors and active learning. A confidence-rated predictor is one that is allowed to abstain from prediction on occasion, and as a result, can guarantee a target prediction error. Given a confidence-rated predictor with guaranteed error, we show how to construct an active label query algorithm consistent in the agnostic setting. Our second key contribution is a novel confidence-rated predictor with guaranteed error that applies to any general classification problem. We show that our predictor is optimal in the realizable case, in the sense that it has the lowest abstention rate out of all predictors guaranteeing a certain error. Moreover, we show how to extend our predictor to the agnostic setting.

Combining the label query algorithm with our novel confidence-rated predictor, we get a general active learning algorithm consistent in the agnostic setting. We provide a characterization of the label complexity of our algorithm, and show that it is better than the bounds known for disagreement-based active learning in general. Finally, we show that for linear classification with respect to the uniform distribution and log-concave distributions, our bounds reduce to those of [3, 4].

2 Algorithm

2.1 The Setting

We study active learning for binary classification. Examples belong to an instance space X, and their labels lie in a label space Y = {−1, +1}; labelled examples are drawn from an underlying data distribution D on X × Y. We use D_X to denote the marginal of D on X, and D_{Y|X=x} to denote the conditional distribution on Y induced by D given X = x. Our algorithm has access to examples through two oracles: an example oracle U which returns an unlabelled example x ∈ X drawn from D_X, and a labelling oracle O which, on input x, returns a label y drawn from D_{Y|X=x}.

Given a hypothesis class H of VC dimension d, the error of any h ∈ H with respect to a data distribution Π over X × Y is defined as err_Π(h) = P_{(x,y)∼Π}(h(x) ≠ y). We define h*(Π) = argmin_{h∈H} err_Π(h) and ν*(Π) = err_Π(h*(Π)). For a set S, we abuse notation and use S to also denote the uniform distribution over the elements of S; we define P_S(·) := P_{(x,y)∼S}(·) and E_S(·) := E_{(x,y)∼S}(·). Given access to examples from a data distribution D through an example oracle U and a labelling oracle O, we aim to provide a classifier ĥ ∈ H such that, with probability ≥ 1 − δ, err_D(ĥ) ≤ ν*(D) + ε, for some target values of ε and δ; this is achieved in an adaptive manner by making as few queries to the labelling oracle O as possible. When ν*(D) = 0, we are said to be in the realizable case; in the more general agnostic case, we make no assumptions on the labels, and thus ν*(D) can be positive.

Previous approaches to agnostic active learning have frequently used the notion of disagreements. The disagreement between two hypotheses h1 and h2 with respect to a data distribution Π is the fraction of examples according to Π on which h1 and h2 assign different labels; formally, ρ_Π(h1, h2) = P_{(x,y)∼Π}(h1(x) ≠ h2(x)). Observe that a data distribution Π induces a pseudo-metric ρ_Π on the elements of H; this is called the disagreement metric. For any r and any h ∈ H, define B_Π(h, r) to be the disagreement ball of radius r around h with respect to Π. Formally, B_Π(h, r) = {h' ∈ H : ρ_Π(h, h') ≤ r}.

For notational simplicity, we assume that the hypothesis space is "dense" with respect to the data distribution D, in the sense that for all r > 0, sup_{h'∈B_D(h*(D), r)} ρ_D(h*(D), h') = r. Our analysis will still apply without the denseness assumption, but will be significantly messier. Finally, given a set of hypotheses V ⊆ H, the disagreement region of V is the set of all examples x such that there exist two hypotheses h1, h2 ∈ V for which h1(x) ≠ h2(x).

This paper establishes a connection between active learning and confidence-rated predictors with guaranteed error. A confidence-rated predictor is a prediction algorithm that is occasionally allowed to abstain from classification. We will consider such predictors in the transductive setting. Given a set V of candidate hypotheses, an error guarantee η, and a set U of unlabelled examples, a confidence-rated predictor P either assigns a label or abstains from prediction on each unlabelled x ∈ U. The labels are assigned with the guarantee that the expected disagreement between the label assigned by P and any h ∈ V is at most η. Specifically, for all h ∈ V,

P_{x∼U}(h(x) ≠ P(x), P(x) ≠ 0) ≤ η    (1)

This ensures that if some h* ∈ V is the true risk minimizer, then the labels predicted by P on U do not differ very much from those predicted by h*. The performance of a confidence-rated predictor which has a guarantee such as in Equation (1) is measured by its coverage, or the probability of prediction P_{x∼U}(P(x) ≠ 0); higher coverage implies better performance.

2.2 Main Algorithm

Our active learning algorithm proceeds in epochs, where the goal of epoch k is to achieve excess generalization error ε_{k+1} = ε·2^{k0−k}, by querying a fresh batch of labels. The algorithm maintains a candidate set V_k that is guaranteed to contain the true risk minimizer. The critical decision at each epoch is how to select a subset of unlabelled examples whose labels should be queried. We make this decision using a confidence-rated predictor P. At epoch k, we run P with candidate hypothesis set V = V_k and error guarantee η_k = ε_k/256. Whenever P abstains, we query the label of the example. The number of labels m_k queried is adjusted so that it is enough to achieve excess generalization error ε_{k+1}. An outline is described in Algorithm 1; we next discuss each individual component in detail.
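As a concrete illustration of the definitions in Section 2.1 (this sketch is ours, not part of the paper), the empirical disagreement metric and the disagreement region of a finite hypothesis set on a finite sample can be computed directly; the one-dimensional threshold class below is a hypothetical example:

```python
def disagreement(h1, h2, xs):
    """Empirical disagreement rho_S(h1, h2): the fraction of sample
    points on which h1 and h2 assign different labels."""
    return sum(h1(x) != h2(x) for x in xs) / len(xs)

def disagreement_region(V, xs):
    """DIS(V): the sample points on which some pair of hypotheses
    in V disagrees."""
    return [x for x in xs if len({h(x) for h in V}) > 1]

# Hypothetical hypothesis class: 1-d thresholds h_t(x) = +1 iff x > t.
def threshold(t):
    return lambda x: 1 if x > t else -1

V = [threshold(0.3), threshold(0.7)]
xs = [0.1, 0.5, 0.9]
# The two thresholds agree on 0.1 and 0.9 and disagree only on 0.5,
# so the empirical disagreement is 1/3 and DIS(V) = [0.5].
```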
Algorithm 1 Active Learning Algorithm: Outline

Input: Example oracle U, labelling oracle O, hypothesis class H of VC dimension d, confidence-rated predictor P, target excess error ε and target confidence δ.
Set k0 = ⌈log(1/ε)⌉. Initialize the candidate set V1 = H.
for k = 1, 2, ..., k0 do
  Set ε_k = ε·2^{k0−k+1} and δ_k = δ/(k(k+1)).
  Call U to generate a fresh unlabelled sample U_k = {z_{k,1}, ..., z_{k,n_k}} of size n_k = (512/ε_k)(d ln(512/ε_k) + ln(288/δ_k)).
  Run the confidence-rated predictor P with inputs V = V_k, U = U_k and error guarantee η_k = ε_k/256 to get abstention probabilities γ_{k,1}, ..., γ_{k,n_k} on the examples in U_k. These probabilities induce a distribution Γ_k over U_k. Let φ_k = P_{z∼U_k}(P(z) = 0) = (1/n_k) Σ_i γ_{k,i}.
  if in the realizable case then
    Let m_k = (1536 φ_k/ε_k)(d ln(1536 φ_k/ε_k) + ln(1/δ_k)). Draw m_k i.i.d. examples from Γ_k and query their labels to get a labelled dataset S_k. Update the candidate set: V_{k+1} = {h ∈ V_k : h(x) = y for all (x, y) ∈ S_k}.
  else
    In the non-realizable case, use Algorithm 2 with inputs hypothesis set V_k, example distribution Γ_k, target excess error proportional to ε_k/φ_k, target confidence δ_k, and the labelling oracle O to get a new hypothesis set V_{k+1}.
end for
return an arbitrary ĥ ∈ V_{k0+1}.

Candidate Sets. At epoch k, we maintain a set V_k of candidate hypotheses guaranteed to contain the true risk minimizer h*(D) (w.h.p.). In the realizable case, we use a version space as our candidate set. The version space with respect to a set S of labelled examples is the set of all h ∈ H such that h(x) = y for all (x, y) ∈ S.

Lemma 1. Suppose we run Algorithm 1 in the realizable case with inputs example oracle U, labelling oracle O, hypothesis class H, confidence-rated predictor P, target excess error ε and target confidence δ. Then, with probability 1 − δ, h*(D) ∈ V_k for all k = 1, 2, ..., k0 + 1.

In the non-realizable case, the version space is usually empty; we use instead a (1 − δ)-confidence set for the true risk minimizer. Given a set S of n labelled examples, let C(S) ⊆ H be a function of S; C(S) is said to be a (1 − δ)-confidence set for the true risk minimizer if for all data distributions Δ over X × Y, P_S(h*(Δ) ∈ C(S)) ≥ 1 − δ. Recall that h*(D) = argmin_{h∈H} err_D(h). In the non-realizable case, our candidate sets are (1 − δ_k)-confidence sets for h*(D). The precise setting of V_k is explained in Algorithm 2.

Lemma 2. Suppose we run Algorithm 1 in the non-realizable case with inputs example oracle U, labelling oracle O, hypothesis class H, confidence-rated predictor P, target excess error ε and target confidence δ. Then, with probability 1 − δ, h*(D) ∈ V_k for all k = 1, 2, ..., k0 + 1.

Label Query. We next discuss our label query procedure: which examples should we query labels for, and how many labels should we query at each epoch?

Which Labels to Query? Our goal is to query the labels of the most informative examples. To choose these examples while still maintaining consistency, we use a confidence-rated predictor P with guaranteed error. The inputs to the predictor are our candidate hypothesis set V_k, which contains (w.h.p.) the true risk minimizer, a fresh set U_k of unlabelled examples, and an error guarantee η_k = ε_k/256. For notational simplicity, assume the elements in U_k are distinct. The output is a sequence of abstention probabilities {γ_{k,1}, γ_{k,2}, ..., γ_{k,n_k}}, one for each example in U_k. It induces a distribution Γ_k over U_k, from which we independently draw examples for label queries.

How Many Labels to Query? The goal of epoch k is to achieve excess generalization error ε_{k+1}. To achieve this, passive learning requires Õ(d/ε_k) labelled examples in the realizable case, and Õ(d(ν*(D) + ε_k)/ε_k²) examples in the agnostic case. A key observation in this paper is that in order to achieve excess generalization error ε_k on D, it suffices to achieve a much larger excess generalization error O(ε_k/φ_k) on the data distribution induced by Γ_k and D_{Y|X}, where φ_k is the fraction of examples on which the confidence-rated predictor abstains.

In the realizable case, we achieve this by sampling m_k = (1536 φ_k/ε_k)(d ln(1536 φ_k/ε_k) + ln(1/δ_k)) i.i.d. examples from Γ_k and querying their labels to get a labelled dataset S_k. Observe that as φ_k is the abstention probability of P with guaranteed error η_k = ε_k/256, it is generally smaller than the measure of the disagreement region of the version space; this key fact results in improved label complexity over disagreement-based active learning. This sampling procedure has the following property:

Lemma 3. Suppose we run Algorithm 1 in the realizable case with inputs example oracle U, labelling oracle O, hypothesis class H, confidence-rated predictor P, target excess error ε and target confidence δ. Then, with probability 1 − δ, for all k = 1, 2, ..., k0 + 1 and for all h ∈ V_k, err_D(h) ≤ ε_k. In particular, the ĥ returned at the end of the algorithm satisfies err_D(ĥ) ≤ ε.

The agnostic case has an added complication: in practice, the value of ν* is not known ahead of time. Inspired by [24], we use a doubling procedure (stated in Algorithm 2) which adaptively finds the number m_k of labelled examples to be queried and queries them. The following two lemmas illustrate its properties: that it is consistent, and that it does not use too many label queries.

Lemma 4. Suppose we run Algorithm 2 with inputs hypothesis set V, example distribution Γ, labelling oracle O, target excess error ε̃ and target confidence δ̃. Let Δ be the joint distribution on X × Y induced by Γ and D_{Y|X}. Then there exists an event E with P(E) ≥ 1 − δ̃ such that on E, (1) Algorithm 2 halts and (2) the output set V' has the following properties: (2.1) if for h ∈ H, err_Δ(h) − err_Δ(h*(Δ)) ≤ ε̃/2, then h ∈ V'; (2.2) on the other hand, if h ∈ V', then err_Δ(h) − err_Δ(h*(Δ)) ≤ ε̃. When event E happens, we say Algorithm 2 succeeds.

Lemma 5. Suppose we run Algorithm 2 with inputs hypothesis set V, example distribution Γ, labelling oracle O, target excess error ε̃ and target confidence δ̃. There exists some absolute constant c > 0 such that, on the event that Algorithm 2 succeeds, the number of labels queried at the final iteration is at most c (d ln(1/ε̃) + ln(1/δ̃))(ν*(Δ) + ε̃)/ε̃². As the number of labels doubles at each iteration, the total number of labels queried is at most twice this quantity, i.e. Õ((d ln(1/ε̃) + ln(1/δ̃))(ν*(Δ) + ε̃)/ε̃²), where Õ hides logarithmic factors.
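To make the epoch structure concrete, the following is a minimal Python sketch (ours, not the authors' code) of Algorithm 1 in the realizable case. It uses the zero-error special case of the predictor, which abstains exactly on the disagreement region of the current version space; the threshold hypothesis class, the grid, and the fixed batch size are illustrative assumptions, not the paper's sample sizes:

```python
import random

def predict(t, x):
    """Hypothetical 1-d threshold classifier h_t(x) = +1 iff x > t."""
    return 1 if x > t else -1

def run_epochs(thresholds, sample_unlabelled, query_label, epochs, batch):
    """Skeleton of Algorithm 1, realizable case: each epoch draws a fresh
    unlabelled batch, queries labels only where the current version space
    V disagrees (the eta = 0 predictor abstains exactly there), and prunes
    V to the hypotheses consistent with the queried labels."""
    V = list(thresholds)
    for _ in range(epochs):
        U = [sample_unlabelled() for _ in range(batch)]
        # abstention region of the zero-error predictor = disagreement region
        queries = [x for x in U if len({predict(t, x) for t in V}) > 1]
        S = [(x, query_label(x)) for x in queries]
        V = [t for t in V if all(predict(t, x) == y for x, y in S)]
    return V

# Illustrative run: the true labeller is the threshold at 0.5.
random.seed(0)
grid = [i / 20 for i in range(21)]
V_final = run_epochs(grid, random.random, lambda x: predict(0.5, x),
                     epochs=3, batch=30)
# The true threshold 0.5 always survives, and the version space shrinks.
```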
A naive approach (see Algorithm 4 in the Appendix), which uses an additive VC bound, gives a sample complexity of O((d ln(1/ε̃) + ln(1/δ̃))/ε̃²); Algorithm 2 gives a better sample complexity. The following lemma is a consequence of our label query procedure in the non-realizable case.

Lemma 6. Suppose we run Algorithm 1 in the non-realizable case with inputs example oracle U, labelling oracle O, hypothesis class H, confidence-rated predictor P, target excess error ε and target confidence δ. Then, with probability 1 − δ, for all k = 1, 2, ..., k0 + 1 and for all h ∈ V_k, err_D(h) ≤ err_D(h*(D)) + ε_k. In particular, the ĥ returned at the end of the algorithm satisfies err_D(ĥ) ≤ err_D(h*(D)) + ε.

Algorithm 2 An Adaptive Algorithm for Label Query Given Target Excess Error

Input: Hypothesis set V of VC dimension d, example distribution Γ, labelling oracle O, target excess error ε̃, target confidence δ̃.
for j = 1, 2, ... do
  Draw n_j = 2^j i.i.d. examples from Γ; query their labels from O to get a labelled dataset S_j. Denote δ̃_j = δ̃/(j(j+1)).
  Train an ERM classifier ĥ_j ∈ V over S_j.
  Define the set V_j as follows: V_j = {h ∈ V : err(h, S_j) ≤ err(ĥ_j, S_j) + ε̃/2 + σ(n_j, δ̃_j) + √(σ(n_j, δ̃_j) ρ_{S_j}(h, ĥ_j))}, where σ(n, δ) := (c/n)(d ln n + ln(1/δ)) for an absolute constant c > 0.
  if sup_{h∈V_j} (σ(n_j, δ̃_j) + √(σ(n_j, δ̃_j) ρ_{S_j}(h, ĥ_j))) ≤ c'ε̃ (for a suitable absolute constant c' < 1/2) then break
end for
return V_j

2.3 Confidence-Rated Predictor

Our active learning algorithm uses a confidence-rated predictor with guaranteed error to make its label query decisions. In this section, we provide a novel confidence-rated predictor with guaranteed error. This predictor has optimal coverage in the realizable case, and may be of independent interest.

The predictor P receives as input a set V ⊆ H of hypotheses (which is likely to contain the true risk minimizer), an error guarantee η, and a set U of unlabelled examples. We consider a soft prediction algorithm; for each example in U, the predictor P outputs three probabilities that add up to 1: the probability of predicting +1, −1 and 0 (abstain). This output is subject to the constraint that the expected disagreement between the ±1 labels assigned by P and those assigned by any h ∈ V is at most η, and the goal is to maximize the coverage, or the expected fraction of non-abstentions.

Our key insight is that this problem can be written as a linear program, which is described in Algorithm 3. There are three variables, ξ_i, ζ_i and γ_i, for each unlabelled z_i ∈ U; these are the probabilities with which we predict +1, −1 and 0 on z_i respectively. Constraint (2) ensures that the expected disagreement between the label predicted and any h ∈ V is no more than η, while the LP objective maximizes the coverage under these constraints. Observe that the LP is always feasible. Although the LP may have infinitely many constraints (one for each h ∈ V), the number of distinct constraints in Equation (2), as distinguished by the labelling of U, is at most (em/d)^d, where d is the VC dimension of the hypothesis class H.

The performance of a confidence-rated predictor is measured by its error and coverage. The error of a confidence-rated predictor is the probability with which it predicts the wrong label on an example, while the coverage is its probability of non-abstention. We can show the following guarantee on the performance of the predictor in Algorithm 3.

Theorem 1. In the realizable case, if the hypothesis set V is the version space with respect to a training set, then P_{x∼U}(P(x) ≠ h*(x), P(x) ≠ 0) ≤ η. In the non-realizable case, if the hypothesis set V is a (1 − δ)-confidence set for the true risk minimizer h*, then, w.p. ≥ 1 − δ, P_{(x,y)∼U}(P(x) ≠ y, P(x) ≠ 0) ≤ P_{(x,y)∼U}(h*(x) ≠ y) + η, where the expectation is taken over the random choices made by P.

Algorithm 3 Confidence-rated Predictor

Input: hypothesis set V, unlabelled data U = {z_1, ..., z_m}, error bound η.
Solve the linear program:
  minimize Σ_i γ_i
  subject to: for all i, ξ_i + ζ_i + γ_i = 1
              for all h ∈ V, Σ_i [ξ_i 1(h(z_i) = −1) + ζ_i 1(h(z_i) = +1)] ≤ ηm    (2)
              for all i, ξ_i, ζ_i, γ_i ≥ 0
For each z_i ∈ U, output probabilities for predicting +1, −1 and 0: ξ_i, ζ_i and γ_i.
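The LP in Algorithm 3 is small enough to solve with an off-the-shelf solver. The sketch below (ours, assuming SciPy is available; the two threshold hypotheses are illustrative) sets up the variables (ξ_i, ζ_i, γ_i) and constraint (2), and the comments note how the solution can be checked against the Equation (1) guarantee:

```python
import numpy as np
from scipy.optimize import linprog

def confidence_rated_predictor(labels, eta):
    """Solve the LP of Algorithm 3.
    labels: |V| x m matrix with entries in {-1, +1}; labels[j][i] = h_j(z_i).
    Returns (xi, zeta, gamma): per-example probabilities of predicting
    +1, -1, and abstaining."""
    labels = np.asarray(labels)
    k, m = labels.shape
    # variable vector: [xi_1..xi_m, zeta_1..zeta_m, gamma_1..gamma_m]
    c = np.concatenate([np.zeros(2 * m), np.ones(m)])  # minimize sum gamma_i
    # constraint (2): for each h, sum_i xi_i*1(h=-1) + zeta_i*1(h=+1) <= eta*m
    A_ub = np.hstack([(labels == -1).astype(float),
                      (labels == +1).astype(float),
                      np.zeros((k, m))])
    b_ub = np.full(k, eta * m)
    # xi_i + zeta_i + gamma_i = 1 for each example i
    A_eq = np.hstack([np.eye(m), np.eye(m), np.eye(m)])
    b_eq = np.ones(m)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, 1), method="highs")
    return res.x[:m], res.x[m:2 * m], res.x[2 * m:]

# Two threshold hypotheses on U = [0.1, 0.5, 0.9]: they agree on 0.1 and
# 0.9 and disagree on 0.5, so with eta = 0 the LP must abstain exactly on
# 0.5 and predict the agreed labels elsewhere (coverage 2/3).
labels = [[-1, +1, +1],   # hypothesis with threshold 0.3
          [-1, -1, +1]]   # hypothesis with threshold 0.7
xi, zeta, gamma = confidence_rated_predictor(labels, eta=0.0)
```

With a larger η, the LP trades abstention for coverage, which is exactly the relaxation Algorithm 1 exploits.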
In the realizable case, we can also show that our confidence-rated predictor has optimal coverage. Observe that we cannot directly show optimality in the non-realizable case, as the performance depends on the exact choice of the (1 − δ)-confidence set.

Theorem 2. In the realizable case, suppose that the hypothesis set V is the version space with respect to a training set. If P' is any confidence-rated predictor with error guarantee η, and if P is the predictor in Algorithm 3, then the coverage of P is at least as much as the coverage of P'.

3 Performance Guarantees

An essential property of any active learning algorithm is consistency: it should converge to the true risk minimizer given enough labelled examples. We observe that our algorithm is consistent provided we use any confidence-rated predictor P with guaranteed error as a subroutine. The consistency of our algorithm is a consequence of Lemmas 3 and 6, and is shown in Theorem 3.

Theorem 3 (Consistency). Suppose we run Algorithm 1 with inputs example oracle U, labelling oracle O, hypothesis class H, confidence-rated predictor P, target excess error ε and target confidence δ. Then, with probability 1 − δ, the classifier ĥ returned by Algorithm 1 satisfies err_D(ĥ) − err_D(h*(D)) ≤ ε.

We now establish a label complexity bound for our algorithm; however, this label complexity bound applies only if we use the predictor described in Algorithm 3 as a subroutine. For any hypothesis set V, data distribution D, and η, define Φ_D(V, η) to be the minimum abstention probability of a confidence-rated predictor which guarantees that the disagreement between its predicted labels and any h ∈ V under D_X is at most η. Formally,

Φ_D(V, η) = min{ E_{x∼D_X}[γ(x)] : E_{x∼D_X}[1(h(x) = +1)ζ(x) + 1(h(x) = −1)ξ(x)] ≤ η for all h ∈ V; ξ(x) + ζ(x) + γ(x) = 1 for all x; ξ(x), ζ(x), γ(x) ≥ 0 for all x }.

Define Φ(r, η) := Φ_D(B_D(h*, r), η). The label complexity of our active learning algorithm can be stated as follows.

Theorem 4 (Label Complexity). Suppose we run Algorithm 1 with inputs example oracle U, labelling oracle O, hypothesis class H, confidence-rated predictor P of Algorithm 3, target excess error ε and target confidence δ. Then there exist constants c1, c2 > 0 such that with probability 1 − δ:

(1) In the realizable case, the total number of labels queried by Algorithm 1 is at most

c1 Σ_{k=1}^{k0} (Φ(ε_k, ε_k/256)/ε_k) (d ln(Φ(ε_k, ε_k/256)/ε_k) + ln((log(1/ε) − k + 1)/δ)).

(2) In the agnostic case, the total number of labels queried by Algorithm 1 is at most

c2 Σ_{k=1}^{k0} (Φ(2ν*(D) + ε_k, ε_k/256)/ε_k) ((ν*(D) + ε_k)/ε_k) (d ln(Φ(2ν*(D) + ε_k, ε_k/256)/ε_k) + ln((log(1/ε) − k + 1)/δ)).

The label complexity of disagreement-based active learning is characterized in terms of the disagreement coefficient. Given a radius r, the disagreement coefficient θ(r) is defined as θ(r) = sup_{r'≥r} P_D(DIS(B_D(h*, r')))/r', where for any V ⊆ H, DIS(V) is the disagreement region of V. As P_D(DIS(B_D(h*, r))) = Φ_D(B_D(h*, r), 0) [13], in our notation, θ(r) = sup_{r'≥r} Φ(r', 0)/r'.

In the realizable case, the best known bound for the label complexity of disagreement-based active learning is Õ(θ(ε)·ln(1/ε)·(d ln θ(ε) + ln ln(1/ε))) [20]. Our label complexity bound may be simplified to

Õ( sup_{k≤log(1/ε)} (Φ(ε_k, ε_k/256)/ε_k) · ln(1/ε) · (d ln( sup_{k≤log(1/ε)} Φ(ε_k, ε_k/256)/ε_k ) + ln ln(1/ε)) ),

which is essentially the bound of [20] with θ(ε) replaced by sup_{k≤log(1/ε)} Φ(ε_k, ε_k/256)/ε_k. As enforcing a lower error guarantee requires more abstention, Φ(r, η) is a decreasing function of η; as a result, sup_{k≤log(1/ε)} Φ(ε_k, ε_k/256)/ε_k ≤ θ(ε), and our label complexity bound is better.

In the agnostic case, [12] provides a label complexity bound of Õ(θ(2ν* + ε)·(d(ν*)²/ε² + d ln(1/ε))·(ln(1/δ) + ln ln(1/ε))) for disagreement-based active learning. In contrast, by Proposition 1 our label complexity is at most

Õ( sup_{k≤log(1/ε)} (Φ(2ν* + ε_k, ε_k/256)/(2ν* + ε_k)) · (d(ν*)²/ε² + d ln(1/ε)) · (ln(1/δ) + ln ln(1/ε)) ).

Again, this is essentially the bound of [12] with θ(2ν* + ε) replaced by the smaller quantity sup_{k≤log(1/ε)} Φ(2ν* + ε_k, ε_k/256)/(2ν* + ε_k). [20] has provided a more refined analysis of disagreement-based active learning that gives a label complexity of Õ(θ(ν* + ε)·((ν*)²/ε² + ln(1/ε))·(d ln θ(ν* + ε) + ln ln(1/ε))); observe that their dependence is still on θ(ν* + ε). We leave a more refined label complexity analysis of our algorithm for future work.

An important sub-case of learning from noisy data is learning under the Tsybakov noise conditions [30]. We defer this discussion to the Appendix.

3.1 Case Study: Linear Classification under Log-concave Distributions

We now consider learning linear classifiers with respect to a log-concave data distribution on R^d. In this case, for any r, the disagreement coefficient satisfies θ(r) ≤ O(√d ln(1/r)) [4]; however, for any η > 0, Φ(r, η)/r ≤ O(ln(r/η)) (see Lemma 14 in the Appendix), which is much smaller so long as η/r is not too small. This leads to the following label complexity bounds.

Corollary 1. Suppose D_X is isotropic and log-concave on R^d, and H is the set of homogeneous linear classifiers on R^d. Then Algorithm 1 with inputs example oracle U, labelling oracle O, hypothesis class H, confidence-rated predictor P of Algorithm 3, target excess error ε and target confidence δ satisfies the following properties. With probability 1 − δ:

(1) In the realizable case, there exists some absolute constant c1 > 0 such that the total number of labels queried is at most c1 ln(1/ε)(d + ln ln(1/ε) + ln(1/δ)). (Here the Õ(·) notation hides factors logarithmic in 1/ε.)

(2) In the agnostic case, there exists some absolute constant c2 > 0 such that the total number of labels queried is at most Õ(((ν*/ε)² + ln(1/ε))(d + ln(1/δ) + ln ln(1/ε))).

(3) If the (C0, κ)-Tsybakov noise condition holds for D with respect to H, then there exists some c3 > 0 (that depends on C0 and κ) such that the total number of labels queried is at most c3 ε^{2/κ−2} ln(1/ε)(d + ln(1/δ) + ln ln(1/ε)).

In the realizable case, our bound matches [4]. For disagreement-based algorithms, the bound is Õ(d^{3/2} ln²(1/ε)(ln d + ln ln(1/ε))), which is worse by a factor of O(√d ln(1/ε)). [4] does not address the fully agnostic case directly; however, if ν*(D) is known a priori, then their algorithm can achieve roughly the same label complexity as ours.

For the Tsybakov noise condition with κ > 1, [3, 4] provides a label complexity bound of Õ(ε^{2/κ−2} ln²(1/ε)(d + ln ln(1/ε))) with an algorithm that has a priori knowledge of C0 and κ. We get a slightly better bound. On the other hand, a disagreement-based algorithm [20] gives a label complexity of Õ(ε^{2/κ−2} d^{3/2} ln²(1/ε)(ln d + ln ln(1/ε))). Again our bound is better by a factor of Õ(√d) over disagreement-based algorithms. For κ = 1, we can tighten our label complexity to get a bound of Õ(ln(1/ε)(d + ln ln(1/ε) + ln(1/δ))), which again matches [4], and is better than the one provided by disagreement-based algorithms, Õ(d^{3/2} ln²(1/ε)(ln d + ln ln(1/ε))) [20].
4 Related Work

Active learning has seen a lot of progress over the past two decades, motivated by vast amounts of unlabelled data and the high cost of annotation [28, 10, 20]. According to [10], the two main threads of research are exploitation of cluster structure [31, 11] and efficient search in hypothesis space, which is the setting of our work. We are given a hypothesis class H, and the goal is to find an h ∈ H that achieves a target excess generalization error, while minimizing the number of label queries.

Three main approaches have been studied in this setting. The first and most natural one is generalized binary search [17, 8, 9, 27], which was analyzed in the realizable case by [9] and in various limited noise settings by [23, 27, 26]. While this approach has the advantage of low label complexity, it is generally inconsistent in the fully agnostic setting [11]. The second approach, disagreement-based active learning, is consistent in the agnostic PAC model. [7] provides the first disagreement-based algorithm for the realizable case. [2] provides an agnostic disagreement-based algorithm, which is analyzed in [18] using the notion of the disagreement coefficient. [12] reduces disagreement-based active learning to passive learning; [5] and [6] further extend this work to provide practical and efficient implementations. [19, 24] give algorithms that are adaptive to the Tsybakov noise condition. The third line of work [3, 4, 1] achieves a better label complexity than disagreement-based active learning for linear classifiers on the uniform distribution over the unit sphere and log-concave distributions. However, a limitation is that their algorithm applies only to these specific settings, and it is not apparent how to apply it generally.

Research on confidence-rated prediction has been mostly focused on empirical work, with relatively less theoretical development. Theoretical work on this topic includes KWIK learning [25], conformal prediction [29] and the weighted majority algorithm of [16]. The closest to our work is the recent learning-theoretic treatment by [13, 14]. [13] addresses confidence-rated prediction with guaranteed error in the realizable case, and provides a predictor that abstains in the disagreement region of the version space. This predictor achieves zero error, and coverage equal to the measure of the agreement region. [14] shows how to extend this algorithm to the non-realizable case and obtain zero error with respect to the best hypothesis in H. Note that the predictors in [13, 14] generally achieve less coverage than ours for the same error guarantee; in fact, if we plug them into our Algorithm 1, then we recover the label complexity bounds of disagreement-based algorithms [12, 19, 24]. A formal connection between disagreement-based active learning in the realizable case and perfect confidence-rated prediction (with a zero error guarantee) was established by [15]. Our work can be seen as a step towards bridging these two areas, by demonstrating that active learning can be further reduced to imperfect confidence-rated prediction, with potentially higher label savings.

Acknowledgements. We thank NSF under IIS-1162581 for research support. We thank Sanjoy Dasgupta and Yoav Freund for helpful discussions. CZ would like to thank Liwei Wang for introducing the problem of selective classification to him.

References

[1] P. Awasthi, M.-F. Balcan, and P. M. Long. The power of localization for efficiently learning linear separators with noise. In STOC, 2014.
[2] M.-F. Balcan, A. Beygelzimer, and J. Langford. Agnostic active learning. J. Comput. Syst. Sci., 75(1):78–89, 2009.
[3] M.-F. Balcan, A. Z. Broder, and T. Zhang. Margin based active learning. In COLT, 2007.
[4] M.-F. Balcan and P. M. Long. Active and passive learning of linear separators under log-concave distributions. In COLT, 2013.
[5] A. Beygelzimer, S. Dasgupta, and J. Langford. Importance weighted active learning. In ICML, 2009.
[6] A. Beygelzimer, D. Hsu, J. Langford, and T. Zhang. Agnostic active learning without constraints. In NIPS, 2010.
[7] D. A. Cohn, L. E. Atlas, and R. E. Ladner. Improving generalization with active learning. Machine Learning, 15(2), 1994.
[8] S. Dasgupta. Analysis of a greedy active learning strategy. In NIPS, 2004.
[9] S. Dasgupta. Coarse sample complexity bounds for active learning. In NIPS, 2005.
[10] S. Dasgupta. Two faces of active learning. Theor. Comput. Sci., 412(19), 2011.
[11] S. Dasgupta and D. Hsu. Hierarchical sampling for active learning. In ICML, 2008.
[12] S. Dasgupta, D. Hsu, and C. Monteleoni. A general agnostic active learning algorithm. In NIPS, 2007.
[13] R. El-Yaniv and Y. Wiener. On the foundations of noise-free selective classification. JMLR, 2010.
[14] R. El-Yaniv and Y. Wiener. Agnostic selective classification. In NIPS, 2011.
[15] R. El-Yaniv and Y. Wiener. Active learning via perfect selective classification. JMLR, 2012.
[16] Y. Freund, Y. Mansour, and R. E. Schapire. Generalization bounds for averaged classifiers. The Ann. of Stat., 32, 2004.
[17] Y. Freund, H. S. Seung, E. Shamir, and N. Tishby. Selective sampling using the query by committee algorithm. Machine Learning, 28(2-3):133–168, 1997.
[18] S. Hanneke. A bound on the label complexity of agnostic active learning. In ICML, 2007.
[19] S. Hanneke. Adaptive rates of convergence in active learning. In COLT, 2009.
[20] S. Hanneke. A statistical theory of active learning. Manuscript, 2013.
[21] S. Hanneke and L. Yang. Surrogate losses in passive and active learning. CoRR, abs/1207.3772, 2012.
[22] D. Hsu. Algorithms for Active Learning. PhD thesis, UC San Diego, 2010.
[23] M. Kääriäinen. Active learning in the non-realizable case. In ALT, 2006.
[24] V. Koltchinskii. Rademacher complexities and bounding the excess risk in active learning. JMLR, 2010.
[25] L. Li, M. L. Littman, and T. J. Walsh. Knows what it knows: a framework for self-aware learning. In ICML, 2008.
[26] M. Naghshvar, T. Javidi, and K. Chaudhuri. Noisy bayesian active learning. In Allerton, 2013.
[27] R. D. Nowak. The geometry of generalized binary search. IEEE Transactions on Information Theory, 57(12):7893–7906, 2011.
[28] B. Settles. Active learning literature survey. Technical report, University of Wisconsin-Madison, 2010.
[29] G. Shafer and V. Vovk. A tutorial on conformal prediction. JMLR, 2008.
[30] A. B. Tsybakov. Optimal aggregation of classifiers in statistical learning. Annals of Statistics, 32:135–166, 2004.
[31] R. Urner, S. Wulff, and S. Ben-David. PLAL: Cluster-based active learning. In COLT, 2013.