Journal of Machine Learning Research 10 (2009) 163-182    Submitted 9/08; Published 2/09

On the Power of Membership Queries in Agnostic Learning∗

Vitaly Feldman†    VITALY@POST.HARVARD.EDU
IBM Almaden Research Center
650 Harry Rd.
San Jose, CA 95120

Editor: Rocco Servedio

Abstract

We study the properties of the agnostic learning framework of Haussler (1992) and Kearns, Schapire, and Sellie (1994). In particular, we address the question: is there any situation in which membership queries are useful in agnostic learning? Our results show that the answer is negative for distribution-independent agnostic learning and positive for agnostic learning with respect to a specific marginal distribution. Namely, we give a simple proof that any concept class learnable agnostically by a distribution-independent algorithm with access to membership queries is also learnable agnostically without membership queries. This resolves an open problem posed by Kearns et al. (1994). For agnostic learning with respect to the uniform distribution over {0,1}^n we show a concept class that is learnable with membership queries but computationally hard to learn from random examples alone (assuming that one-way functions exist).

Keywords: agnostic learning, membership query, separation, PAC learning

1. Introduction

The agnostic framework (Haussler, 1992; Kearns et al., 1994) is a natural generalization of Valiant's PAC learning model (Valiant, 1984). In this model no assumptions are made on the labels of the examples given to the learning algorithm; in other words, the learning algorithm has no prior beliefs about the target concept (and hence the name of the model). The goal of the agnostic learning algorithm for a concept class C is to produce a hypothesis h whose error on the target concept is close to the best possible by a concept from C. This model reflects a common empirical approach to learning, where few or no assumptions are made on the process that generates the examples and a limited space of candidate hypothesis functions is searched in an attempt to find the best approximation to the given data.

Designing algorithms that learn efficiently in this model is notoriously hard and very few positive results are known (Kearns et al., 1994; Lee et al., 1995; Goldman et al., 2001; Gopalan et al., 2008; Kalai et al., 2008a,b). Furthermore, strong computational hardness results are known for agnostic learning of even the simplest classes of functions such as parities, monomials, and halfspaces (Håstad, 2001; Feldman, 2006; Feldman et al., 2006; Guruswami and Raghavendra, 2006), albeit only for proper learning. Reductions from long-standing open problems for PAC learning to

∗. Parts of this work are published in the Proceedings of the 18th Annual Conference on Learning Theory, 2008.
†. Part of the work done while the author was at Harvard University, supported by grants from the National Science Foundation NSF-CCF-04-32037 and NSF-CCF-04-27129.

©2009 Vitaly Feldman.
agnostic learning of simple classes of functions provide another indication of the hardness of agnostic learning (Kearns et al., 1994; Kalai et al., 2008a; Feldman et al., 2006).

A membership oracle allows a learning algorithm to obtain the value of the unknown target function f on any point in the domain. It can be thought of as modeling access to an expert or the ability to conduct experiments. Learning with membership queries in both the PAC and Angluin's exact models (Angluin, 1988) was studied in numerous works. For example, monotone DNF formulas, finite automata and decision trees are only known to be learnable with membership queries (Valiant, 1984; Angluin, 1988; Bshouty, 1995). It is well-known and easy to prove that the PAC model with membership queries is strictly stronger than the PAC model without membership queries (if one-way functions exist).

Membership queries are also used in several agnostic learning algorithms. The first one is the famous algorithm of Goldreich and Levin (1989), introduced in a cryptographic context (even before the definition of the agnostic learning model). Their algorithm learns parities agnostically with respect to the uniform distribution using membership queries. Kushilevitz and Mansour (1993) used this algorithm to PAC learn decision trees and it has since found numerous other significant applications. More efficient versions of this algorithm were also given by Levin (1993), Bshouty, Jackson, and Tamon (2004) and Feldman (2007). Recently, Gopalan, Kalai, and Klivans (2008) gave an elegant algorithm that learns decision trees agnostically over the uniform distribution and uses membership queries.

1.1 Our Contribution

In this work we study the power of membership queries in the agnostic learning model. The question of whether or not membership queries can aid in agnostic learning was first asked by Kearns et al. (1994), who conjectured that the answer is no. To the best of our knowledge, the question has not been addressed prior to our work. We present two results on this question. In the first result we prove that every concept class learnable agnostically with membership queries is also learnable agnostically without membership queries (see Th. 6 for a formal statement). This proves the conjecture of Kearns et al. (1994). The reduction we give modifies the distribution of examples and therefore is only valid for distribution-independent learning, that is, when a single learning algorithm is used for every distribution over the examples. The simple proof of this result explains why the known distribution-independent agnostic learning algorithms do not use membership queries (Kearns et al., 1994; Kalai et al., 2008a,b). The proof of this result also shows the equivalence of two standard agnostic models: the one in which examples are labeled by an unrestricted function and the one in which examples come from a joint distribution over the domain and the labels.

Our second result is a proof that there exists a concept class that is agnostically learnable with membership queries over the uniform distribution on {0,1}^n but hard to learn in the same setting without membership queries (see Th. 8 for a formal statement). This result is based on the most basic cryptographic assumption, namely the existence of one-way functions. Note that an unconditional separation of these two models would imply NP ≠ P. Cryptographic assumptions are essential for numerous other hardness results in learning theory (cf. Kearns and Valiant, 1994; Kharitonov, 1995). Our construction is based on the use of pseudorandom function families, list-decodable codes, and a variant of an idea from the work of Elbaz, Lee, Servedio, and Wan (2007). Sections 4.1 and 4.2 describe the technique and its relation to prior work in more detail.
This result is, perhaps, unsurprising since agnostic learning of parities with respect to the uniform distribution from random examples only is commonly considered hard and is known to be equivalent to learning of parities with random noise (Feldman et al., 2006), which in turn is equivalent to decoding of random linear codes, a long-standing open problem in coding theory. The best known algorithm for this problem runs in time O(2^{n/log n}) (Blum et al., 2003; Feldman et al., 2006). If one assumes that learning of parities with noise is intractable then it immediately follows that membership queries are provably helpful in agnostic learning over the uniform distribution on {0,1}^n. The goal of our result is to replace this assumption by a possibly weaker and more general cryptographic assumption. It is known that if learning of parities with noise is hard then one-way functions exist (Blum et al., 1993) but, for all we know, it is possible that the converse is not true. The proof of our result is, however, substantially less straightforward than one might expect (and than the analogous separation for PAC learning). Here the main obstacle is the same as in proving positive results for agnostic learning: the requirements of the model impose severe limits on the concept classes for which the agnostic guarantees can be provably satisfied.

1.2 Organization

Following the preliminaries, our first result is described in Section 3. The second result appears in Section 4.

2. Preliminaries

Let X denote the domain or the input space of a learning problem. The domain of the problems that we study is {0,1}^n, the n-dimensional Boolean hypercube. A concept over X is a {−1,1}-valued function over the domain and a concept class C is a set of concepts over X. The unknown function f ∈ C that a learning algorithm is trying to learn is referred to as the target concept. A parity function is a function equal to the XOR of some subset of variables. For a Boolean vector a ∈ {0,1}^n we define the parity function c_a(x) = (−1)^{a·x} = (−1)^{Σ_{i≤n} a_i x_i}. We denote the concept class of parity functions {c_a | a ∈ {0,1}^n} by PAR. A k-junta is a function that depends only on k variables.

A representation class is a concept class defined by providing a specific way to represent each function in the concept class. In fact, all the classes of functions that we discuss are representation classes. We often refer to a representation class simply as a concept class when the representation is implicit in the description of the class. For a representation class F, we say that an algorithm outputs f ∈ F if the algorithm outputs f in the representation associated with F.

2.1 PAC Learning Model

The learning models discussed in this work are based on Valiant's well-known PAC model (Valiant, 1984). In this model, for a concept f and distribution D over X, an example oracle EX(D, f) is the oracle that, upon request, returns an example ⟨x, f(x)⟩ where x is chosen randomly with respect to D. For ε ≥ 0 we say that a function g ε-approximates a function f with respect to distribution D if Pr_D[f(x) = g(x)] ≥ 1 − ε. In the PAC learning model the learner is given access to EX(D, f) where f is assumed to belong to a fixed concept class C.
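To make these basic objects concrete, here is a minimal Python sketch (our own, not from the paper) of a parity concept c_a and of the example oracle EX(D, f); the names parity and example_oracle are ours.

```python
import random

def parity(a, x):
    """The parity concept c_a(x) = (-1)^{a.x}, with values in {-1,1}."""
    return -1 if sum(ai & xi for ai, xi in zip(a, x)) % 2 else 1

def example_oracle(f, n, draw=None):
    """EX(D, f): each call returns an example <x, f(x)> with x drawn from D.

    `draw` samples a point from D; by default D is uniform over {0,1}^n."""
    draw = draw or (lambda: [random.randint(0, 1) for _ in range(n)])
    def ex():
        x = draw()
        return x, f(x)
    return ex

# A target parity on 8 variables and a few labeled examples from EX(U, c_a).
a = [1, 0, 1, 1, 0, 0, 1, 0]
ex = example_oracle(lambda x: parity(a, x), n=8)
sample = [ex() for _ in range(5)]
```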
Definition 1 For a representation class C, we say that an algorithm Alg PAC learns C if for every ε > 0, δ > 0, f ∈ C, and distribution D over X, Alg, given access to EX(D, f), outputs, with probability at least 1 − δ, a hypothesis h that ε-approximates f. The learning algorithm is efficient if its running time and the time to evaluate h are polynomial in 1/ε, 1/δ and the size s of the learning problem.

Here by the size we refer to the maximum description length of an element in X (e.g., n when X = {0,1}^n) plus the maximum description length of an element in C in the representation associated with C. An algorithm is said to weakly learn C if it produces a hypothesis h that (1/2 − 1/p(s))-approximates f for some polynomial p(·).

2.2 Agnostic Learning Model

The agnostic learning model was introduced by Haussler (1992) and Kearns et al. (1994) in order to model situations in which the assumption that examples are labeled by some f ∈ C does not hold. In its least restricted version the examples are generated from some unknown distribution A over X × {−1,1}. The goal of an agnostic learning algorithm for a concept class C is to produce a hypothesis whose error on examples generated from A is close to the best possible by a concept from C. The class C is referred to as the touchstone class in this setting. More generally, the model allows specification of the assumptions made by a learning algorithm by describing a set 𝒜 of distributions over X × {−1,1} that restricts the distributions over X × {−1,1} seen by a learning algorithm. Such an 𝒜 is referred to as the assumption class.

Any distribution A over X × {−1,1} can be described uniquely by its marginal distribution D over X and the expectation of b given x. That is, we refer to a distribution A over X × {−1,1} by a pair (D_A, f_A) where

D_A(z) = Pr_{⟨x,b⟩∼A}[x = z]  and  f_A(z) = E_{⟨x,b⟩∼A}[b | x = z].

Formally, for a Boolean function h and a distribution A = (D, f) over X × {−1,1}, we define

Δ(A, h) = Pr_{⟨x,b⟩∼A}[h(x) ≠ b] = E_D[|f(x) − h(x)|/2].

Similarly, for a concept class C, define Δ(A, C) = inf_{h∈C} Δ(A, h). Kearns et al. (1994) define agnostic learning as follows.

Definition 2 An algorithm Alg agnostically learns a representation class C by a representation class H assuming 𝒜 if for every ε > 0, δ > 0, A ∈ 𝒜, Alg, given access to examples drawn randomly from A, outputs, with probability at least 1 − δ, a hypothesis h ∈ H such that Δ(A, h) ≤ Δ(A, C) + ε. The learning algorithm is efficient if it runs in time polynomial in 1/ε, log(1/δ) and s (the size of the learning problem).

If H = C then, by analogy with the PAC model, the learning is referred to as proper. We drop the reference to H to indicate that C is learnable by some representation class.

A number of special cases of the above definition are commonly considered (and often referred to as the agnostic learning model). In fully agnostic learning, 𝒜 is the set of all distributions over X × {−1,1}. Another version assumes that examples are labeled by an unrestricted function. That is, the set 𝒜 contains the distribution A = (D, f) for every Boolean function f and distribution D.
Note that access to random examples from A = (D, f) is equivalent to access to EX(D, f). Following Kearns et al. (1994), we refer to this version as agnostic PAC learning. Theorem 6 implies that these versions are essentially equivalent. In distribution-specific versions of this model, for every (D, f) ∈ 𝒜, D equals some fixed distribution known in advance.

We also note that the agnostic PAC learning model can be thought of as a model of adversarial classification noise. By definition, a Boolean function g differs from some function f ∈ C on a Δ(g, C) fraction of the domain. Therefore g can be thought of as f corrupted by noise of rate Δ(g, C). Unlike in the random classification noise model, the points on which a concept can be corrupted are unrestricted, and therefore the noise is referred to as adversarial.

2.2.1 Uniform Convergence

A natural approach to agnostic learning is to first draw a sample of fixed size and then choose a hypothesis that best fits the observed labels. The conditions in which this approach is successful were studied in the works of Dudley (1978), Pollard (1984), Haussler (1992), Vapnik (1998) and others. They give a number of conditions on the hypothesis class H that guarantee uniform convergence of the empirical error to the true error, that is, the existence of a function m_H(ε, δ) such that for every distribution A over examples, every h ∈ H, ε > 0, δ > 0, the empirical error of h on a sample of m_H(ε, δ) examples randomly chosen from A is, with probability at least 1 − δ, within ε of Δ(A, h). We denote the empirical error of h on a sample S by Δ(S, h). In the Boolean case, the following result of Vapnik and Chervonenkis (1971) will be sufficient for our purposes.

Theorem 3 Let H be a concept class over X of VC dimension d. Then for every distribution A over X × {−1,1}, every h ∈ H, ε > 0, δ > 0, and sample S of size m = O((d·log(d/ε) + log(1/δ))/ε²) randomly drawn with respect to A,

Pr[|Δ(A, h) − Δ(S, h)| ≥ ε] ≤ δ.

In fact, a simple uniform convergence result based on the cardinality of the function class follows easily from Chernoff bounds (Haussler, 1992). That is, Theorem 3 holds for m = O(log|H|/ε² · log(1/δ)). This result would also be sufficient for our purposes but might give somewhat weaker bounds.

2.3 Membership Queries

A membership oracle for a function f is the oracle that, given any point z ∈ {0,1}^n, returns the value f(z) (Valiant, 1984). We denote it by MEM(f). We refer to agnostic PAC learning with access to MEM(f), where f is the unknown function that labels the examples, as agnostic PAC+MQ learning.

Similarly, one can extend the definition of a membership oracle to fully agnostic learning. For a distribution A over X × {−1,1}, let MEM(A) be the oracle that, upon query z, returns b ∈ {−1,1} with probability Pr_A[(x, b) | x = z]. We say that MEM(A) is persistent if, given the same query, the oracle responds with the same label. When learning with persistent membership queries the learning algorithm is allowed to fail with some negligible probability over the answers of MEM(A). This is necessary to account for the probability that the answers of MEM(A) might not be "representative" of A (a more formal argument can be found, for example, in the work of Goldman et al., 2001).
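To illustrate the difference between MEM(A) and its persistent variant, the following sketch (ours, not from the paper) draws a fresh label according to f_A(z) = E[b | x = z] on every query, while the persistent wrapper caches the first answer for each queried point.

```python
import random

def mem_oracle(f_A):
    """MEM(A): on query z return b in {-1,1} with Pr[b = 1] = (1 + f_A(z))/2,
    where f_A(z) = E[b | x = z] is the conditional expectation of the label."""
    def query(z):
        return 1 if random.random() < (1 + f_A(z)) / 2 else -1
    return query

def persistent(mem):
    """Persistent MEM(A): cache the first answer so that repeating a query
    always returns the same label."""
    cache = {}
    def query(z):
        if z not in cache:
            cache[z] = mem(z)
        return cache[z]
    return query

# Labels agree with the sign of the first bit 90% of the time: f_A = +-0.8.
mem = persistent(mem_oracle(lambda z: 0.8 if z[0] == 1 else -0.8))
z = (1, 0, 1, 1)
assert mem(z) == mem(z)  # persistence: the same query, the same label
```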
2.4 List-Decodable Codes

As we have mentioned earlier, agnostic learning can be seen as recovery of an unknown concept from possibly malicious errors. Therefore, encoding of information that allows recovery from errors, or error-correcting codes, can be useful in the design of agnostic learning algorithms. In our construction we will use binary list-decodable error-correcting codes. A list-decodable code is a code that allows recovery from errors when the number of errors is larger than the distance of the code, and hence there is more than one valid way to decode the corrupted encoding, each giving a different message (see, for example, the book of van Lint, 1998). List-decoding of the code gives the list of all the messages corresponding to the valid ways to decode the corrupt encoding.

Formally, let C : {0,1}^u → {0,1}^v be a binary code of message length u and block length v. Our construction requires efficient encoding and efficient list-decoding from a 1/2 − γ fraction of errors for a γ > 0 that we will define later. Specifically:

• Efficient encoding: for any z ∈ {0,1}^u and j ≤ v, C(z)_j (the j-th bit of C(z)) is computable in time polynomial in u and log v.
• Efficient list-decoding from (1/2 − γ')·v errors in time polynomial in u and 1/γ' for any γ' ≥ γ. That is, an algorithm that, given oracle access to the bits of a string y ∈ {0,1}^v, produces the list of all messages z such that Pr_{j∈[v]}[C(z)_j ≠ y_j] ≤ 1/2 − γ' (in time polynomial in u and 1/γ').

Our main result is achieved using the Reed-Solomon code concatenated with the Hadamard code, for which a list-decoding algorithm was given by Guruswami and Sudan (2000). Their code has the desired properties for v = O(u²/γ⁴). In the description of our construction, for simplicity, we use the more familiar but exponentially longer Hadamard code.

2.4.1 Hadamard Code

The Hadamard code encodes a vector a ∈ {0,1}^n as the values of the parity function c_a on all the points in {0,1}^n (that is, the length of the encoding is 2^n). It is convenient to describe list-decoding of the Hadamard code using Fourier analysis over {0,1}^n, which is commonly used in the context of learning with respect to the uniform distribution (Linial, Mansour, and Nisan, 1993). We now briefly review a number of simple facts on the Fourier representation of functions over {0,1}^n and refer the reader to a survey by Mansour (1994) for more details. In the discussion below all probabilities and expectations are taken with respect to the uniform distribution U unless specifically stated otherwise.

Define an inner product of two real-valued functions over {0,1}^n to be ⟨f, g⟩ = E_x[f(x)·g(x)]. The technique is based on the fact that the set of all parity functions {c_a(x)}_{a∈{0,1}^n} forms an orthonormal basis of the linear space of real-valued functions over {0,1}^n with the above inner product. This fact implies that any real-valued function f over {0,1}^n can be uniquely represented as a linear combination of parities, that is, f(x) = Σ_{a∈{0,1}^n} f̂(a)·c_a(x). The coefficient f̂(a) is called the Fourier coefficient of f on a and equals E_x[f(x)·c_a(x)]; a is called the index of f̂(a). We say that a Fourier coefficient f̂(a) is θ-heavy if |f̂(a)| ≥ θ. Let L_2(f) = E_x[(f(x))²]^{1/2}. Parseval's identity states that

(L_2(f))² = E_x[(f(x))²] = Σ_a f̂²(a).

Let A = (U, f) be a distribution over {0,1}^n × {−1,1} with uniform marginal distribution over {0,1}^n. The Fourier coefficient f̂(a) can be easily related to the error of c_a(x) on A. That is,

Pr_{⟨x,b⟩∼A}[b ≠ c_a(x)] = (1 − f̂(a))/2.  (1)
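As a concrete check of these facts, the brute-force sketch below (ours; exponential in n and for illustration only) computes the Fourier coefficients f̂(a) = E_x[f(x)·c_a(x)] of a ±1-valued function, verifies Parseval's identity, and verifies relation (1) for the noiseless case where the label b is f(x) itself.

```python
from itertools import product

def parity(a, x):
    return -1 if sum(ai & xi for ai, xi in zip(a, x)) % 2 else 1

def fourier_coefficient(f, a, n):
    """f_hat(a) = E_x[f(x) c_a(x)] under the uniform distribution."""
    return sum(f(x) * parity(a, x) for x in product((0, 1), repeat=n)) / 2**n

n = 4
f = lambda x: parity((1, 1, 0, 0), x)  # f = c_{1100}, so f_hat(1100) = 1

coeffs = {a: fourier_coefficient(f, a, n) for a in product((0, 1), repeat=n)}
assert abs(sum(c**2 for c in coeffs.values()) - 1) < 1e-9  # Parseval, +-1 f

# Relation (1): Pr[f(x) != c_a(x)] = (1 - f_hat(a)) / 2 for every index a.
for a in coeffs:
    err = sum(f(x) != parity(a, x) for x in product((0, 1), repeat=n)) / 2**n
    assert abs(err - (1 - coeffs[a]) / 2) < 1e-9
```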
Therefore, both list-decoding of the Hadamard code and agnostic learning of parities amount to finding the largest (within 2ε) Fourier coefficient of f(x). The first algorithm for this task was given by Goldreich and Levin (1989). Given access to a membership oracle, for every ε > 0 their algorithm can efficiently find all ε-heavy Fourier coefficients.

Theorem 4 (Goldreich and Levin, 1989) There exists an algorithm GL such that for every distribution A = (U, f) and every ε, δ > 0, given access to MEM(A), GL(ε, δ) returns, with probability at least 1 − δ, a set of indices T ⊆ {0,1}^n that contains all a such that |f̂(a)| ≥ ε, and such that for all a ∈ T, |f̂(a)| ≥ ε/2. Furthermore, the algorithm runs in time polynomial in n, 1/ε and log(1/δ).

Note that by Parseval's identity, the condition |f̂(a)| ≥ ε/2 implies that there are at most 4/ε² elements in T.

2.5 Pseudorandom Function Families

A key part of our construction in Section 4 will be based on the use of pseudorandom function families defined by Goldreich, Goldwasser, and Micali (1986).

Definition 5 A function family F = {F_n}_{n=1}^∞ where F_n = {p_z}_{z∈{0,1}^n} is a pseudorandom function family of Boolean functions over {0,1}^n if:

• There exists a polynomial-time algorithm that for every n, given z ∈ {0,1}^n and x ∈ {0,1}^n, computes p_z(x).
• Any adversary M whose resources are bounded by a polynomial in n can distinguish between a function p_z (where z ∈ {0,1}^n is chosen randomly and kept secret) and a truly random function from {0,1}^n to {−1,1} only with negligible probability. That is, for every probabilistic polynomial-time M with oracle access to a function from {0,1}^n to {−1,1} there exists a negligible function ν(n) such that

|Pr[M^{p_z}(1^n) = 1] − Pr[M^{r}(1^n) = 1]| ≤ ν(n),

where p_z is a function randomly and uniformly chosen from F_n and r is a randomly chosen function from {0,1}^n to {−1,1}. The probability is taken over the random choice of p_z or r and the coin flips of M.

Results of Håstad et al. (1999) and Goldreich et al. (1986) give a construction of pseudorandom function families based on the existence of one-way functions.

3. Distribution-Independent Agnostic Learning

In this section we show that in distribution-independent agnostic learning membership queries do not help. In addition, we prove that fully agnostic learning is equivalent to agnostic PAC learning. Our proof is based on two simple observations about agnostic learning via empirical error minimization. Values of the unknown function on points outside of the sample can be set to any value without changing the best fit by a function from the touchstone class; therefore membership queries do not make empirical error minimization easier. In addition, points with contradicting labels do not influence the complexity of empirical error minimization, since any function has the same error on pairs of contradicting labels. We will now provide the formal statement of this result.

Theorem 6 Let Alg be an algorithm that agnostically PAC+MQ learns a concept class C in time T(s, ε, δ) and outputs a hypothesis in a representation class H(s, ε). Then C is (fully) agnostically learnable by H(s, ε/2) in time T(s, ε/2, δ/2) + O((d·log(d/ε) + log(1/δ))/ε²), where d is the VC dimension of H(s, ε/2) ∪ C.

Proof Let A = (D, f) be a distribution over X × {−1,1}. Our reduction works as follows. Start by drawing m examples from A, for m to be defined later. Denote this sample by S. Let S' be S with all contradicting pairs of examples removed, that is, for each example ⟨x, 1⟩ we remove it together with one example ⟨x, −1⟩. Every function has the same error rate of 1/2 on the examples in S \ S'. Therefore for every function h,

Δ(S, h) = (Δ(S', h)·|S'| + |S \ S'|/2)/|S| = Δ(S', h)·|S'|/m + (m − |S'|)/(2m)  (2)

and hence

Δ(S, C) = Δ(S', C)·|S'|/m + (m − |S'|)/(2m).  (3)

Let f̄(x) denote the function equal to b if ⟨x, b⟩ ∈ S' and equal to 1 otherwise. Let U_{S'} denote the uniform distribution over S'. Given the sample S' we can easily simulate the example oracle EX(U_{S'}, f̄) and MEM(f̄). We run Alg(ε/2, δ/2) with these oracles and denote its output by h. Note that this simulates Alg in the agnostic PAC+MQ setting over the distribution (U_{S'}, f̄). By the definition of U_{S'}, for any Boolean function g(x),

Pr_{U_{S'}}[f̄(x) ≠ g(x)] = (1/|S'|)·|{x ∈ S' | f̄(x) ≠ g(x)}| = Δ(S', g).
That is, the error of any function g on U_{S'} is exactly the empirical error of g on the sample S'. Thus Δ((U_{S'}, f̄), h) = Δ(S', h) and Δ((U_{S'}, f̄), C) = Δ(S', C). By the correctness of Alg, with probability at least 1 − δ/2, Δ(S', h) ≤ Δ(S', C) + ε/2. By Equations (2) and (3) we thus obtain that

Δ(S, h) = Δ(S', h)·|S'|/m + (m − |S'|)/(2m) ≤ (Δ(S', C) + ε/2)·|S'|/m + (m − |S'|)/(2m) = Δ(S, C) + (ε/2)·|S'|/m.

Therefore Δ(S, h) ≤ Δ(S, C) + ε/2. We can apply the VC-dimension-based uniform convergence results for H(s, ε/2) ∪ C (Theorem 3) to conclude that for

m(ε/4, δ/4) = O((d·log(d/ε) + log(1/δ))/ε²),

with probability at least 1 − δ/2, Δ(A, h) ≤ Δ(S, h) + ε/4 and Δ(S, C) ≤ Δ(A, C) + ε/4. Finally, we obtain that with probability at least 1 − δ,

Δ(A, h) ≤ Δ(S, h) + ε/4 ≤ Δ(S, C) + 3ε/4 ≤ Δ(A, C) + ε.

It is easy to verify that the running time and hypothesis space of this algorithm are as claimed. Note that if Alg is efficient then d(s, ε/2) is polynomial in s and 1/ε and, in particular, the obtained algorithm is efficient. In addition, in place of the VC dimension one can use the uniform convergence result based on the cardinality of the hypothesis space. The description length of a hypothesis output by Alg is polynomial in s and 1/ε, and hence in this case a polynomial number of samples will be required to simulate Alg.

Remark 7 We note that while this proof is given for the strongest version of agnostic learning, in which the error of an agnostic algorithm is bounded by Δ(A, C) + ε, it can be easily extended to weaker forms of agnostic learning, such as algorithms that only guarantee error bounded by α·Δ(A, C) + β + ε for some α ≥ 1 and β ≥ 0. This is true since the reduction adds at most ε/2 to the error of the original algorithm.
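The reduction behind Theorem 6 is simple enough to state as code. The sketch below is our own rendering: `mq_agnostic_learner` is a placeholder for Alg, given the two simulated oracles EX(U_{S'}, f̄) and MEM(f̄).

```python
import random
from collections import Counter

def simulate_without_mq(sample, mq_agnostic_learner):
    """The reduction behind Theorem 6 (a sketch; `mq_agnostic_learner` is a
    placeholder for Alg).  `sample` is a list of (x, b) pairs drawn from
    A = (D, f), with hashable x and b in {-1,1}."""
    counts = Counter(sample)
    s_prime = []  # S': the sample with contradicting pairs removed
    for x in {x for x, _ in sample}:
        surplus = counts[(x, 1)] - counts[(x, -1)]
        if surplus:
            s_prime += [(x, 1 if surplus > 0 else -1)] * abs(surplus)
    if not s_prime:
        return lambda x: 1  # every label contradicted; any hypothesis ties

    labels = dict(s_prime)
    f_bar = lambda z: labels.get(z, 1)   # MEM(f-bar): b on S', 1 elsewhere
    ex = lambda: random.choice(s_prime)  # EX(U_{S'}, f-bar)
    return mq_agnostic_learner(ex, f_bar)
```

Note how the two observations of the proof appear directly: points outside S' are answered with the fixed value 1, and contradicting pairs are discarded before the learner ever runs.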
4. Learning with Respect to the Uniform Distribution

In this section we show that when learning with respect to the uniform distribution over {0,1}^n, membership queries are helpful. Specifically, we show that if one-way functions exist, then there exists a concept class C that is not agnostically PAC learnable (even weakly) with respect to the uniform distribution but is agnostically learnable over the uniform distribution given membership queries. Our agnostic learning algorithm is successful only when ε ≥ 1/p(n) for a polynomial p fixed in advance (the definition of C depends on p). While this is slightly weaker than required by the definition of the model, it still exhibits the gap between agnostic learning with and without membership queries. We remark that a number of known PAC and agnostic learning algorithms are efficient only for restricted values of ε (O'Donnell and Servedio, 2006; Gopalan et al., 2008; Kalai et al., 2008a).

Theorem 8 For every polynomial p(·), there exists a concept class C^p_n over {0,1}^n such that:

1. there exists no efficient algorithm that weakly PAC learns C^p_n with respect to the uniform distribution over {0,1}^n;
2. there exists a randomized algorithm AgnLearn that for every distribution A = (U, f) over {0,1}^n × {−1,1} and every ε ≥ 1/p(n), δ > 0, given access to MEM(A), with probability at least 1 − δ, finds h such that Δ(A, h) ≤ Δ(A, C^p_n) + ε. The probability is taken over the coin flips of MEM(A) and AgnLearn. AgnLearn runs in time polynomial in n and log(1/δ).

4.1 Background

We first show why some of the known separation results will not work in the agnostic setting. It is well-known that the PAC model with membership queries is strictly stronger than the PAC model without membership queries (under the same cryptographic assumption). The separation result is obtained by using a concept class C that is not PAC learnable and augmenting each concept f ∈ C with the encoding of f in a fixed part of the domain. This encoding is readable using membership queries and therefore an MQ algorithm can "learn" the augmented C by querying the points that contain the encoding. On the other hand, with overwhelming probability this encoding will not be observed in random examples and therefore does not help learning from random examples.

This simple approach would fail in the agnostic setting. The unknown function might be random on the part of the domain that contains the encoding and equal to a concept from C elsewhere. The agreement of the unknown function with a concept from C is almost 1, but membership queries on the points of the encoding will not yield any useful information. A similar problem arises with the encoding schemes used in the separation results of Elbaz et al. (2007) and Feldman and Shah (2009). There, too, the secret encoding can be rendered unusable by a function that agrees with a concept in C on a significant fraction of the domain.

4.2 Outline

We start by presenting some of the intuition behind our construction. As in most other separation results, our goal is to create a concept class that is not learnable from uniform examples but includes an encoding of the unknown function that is readable using membership queries. We first note that in order for this approach to work in the agnostic setting the secret encoding has to be "spread" over at least a 1 − 2ε fraction of {0,1}^n. To see this, let f be a concept and let S ⊆ {0,1}^n be the subset of the domain where the encoding of f is contained. Assume, for simplicity, that without the encoding the learning algorithm cannot predict f on S̄ = {0,1}^n \ S with any significant advantage over random guessing. Let f' be a function equal to f on S̄ and truly random on S. Then

Pr[f = f'] = (|S̄| + |S|/2)/2^n = 1/2 + |S̄|/2^{n+1}.

On the other hand, f' does not contain any information about the encoding of f and therefore, by our assumption, no efficient algorithm can produce a hypothesis with agreement significantly higher than 1/2 on both S and S̄. This means that the error of any efficient algorithm will be higher by at least |S̄|/2^{n+1} than the best possible. To ensure that |S̄|/2^{n+1} ≤ ε, we need |S| ≥ (1 − 2ε)·2^n.

Another requirement that the construction has to satisfy is that the encoding of the secret has to be resilient to almost any amount of noise. In particular, since the encoding is a part of the function, we also need to be able to reconstruct an encoding that is close to the best possible. An encoding with this property is in essence a list-decodable binary code. In order to achieve the strongest separation result we will use the code of Guruswami and Sudan (2000), which is the concatenation of the Reed-Solomon code with the binary Hadamard code. However, to simplify the presentation, we will use the more familiar binary Hadamard code in our construction. In Section 4.6 we provide the details on the use of the Guruswami-Sudan code in place of the Hadamard code.

The Hadamard code is equivalent to encoding a vector a ∈ {0,1}^n as the values of the parity function c_a on all points in {0,1}^n. That is, the n-bit vector a is encoded into 2^n bits given by c_a(x) for every x ∈ {0,1}^n. This might appear quite inefficient, since a learning algorithm will not be able to read all the bits of the encoding. However, the Goldreich-Levin algorithm provides an efficient way to recover the indices of all the parities that agree with a given function with probability significantly higher than 1/2 (Goldreich and Levin, 1989). Therefore the Hadamard code can be decoded by reading the code in only a polynomial number of (appropriately chosen) locations.

The next problem that arises is that the encoding should not be readable from random examples. As we have observed earlier, we cannot simply "hide" it on a negligible fraction of the domain. Specifically, we need to make sure that our Hadamard encoding is not recoverable from random examples. Our solution to this problem is to use a pseudorandom function to make values on random examples indistinguishable from random coin flips in the following manner. Let a ∈ {0,1}^n be the vector we want to encode and let p_d : {0,1}^n → {−1,1} be a pseudorandom function from some pseudorandom function family F = {p_b}_{b∈{0,1}^n}. We define a function g : {0,1}^n × {0,1}^n → {−1,1} as

g(z, x) = p_d(z) ⊕ c_a(x)

(⊕ is simply the product in {−1,1}). The label of a random point (z, x) ∈ {0,1}^{2n} is a XOR of a pseudorandom bit with an independent bit and therefore is pseudorandom. Values of a pseudorandom function on any polynomial set of distinct points are pseudorandom, and therefore random points will have pseudorandom labels as long as their z parts are distinct.
In a sample of size polynomial in n of random and uniform points from {0,1}^{2n} this happens with overwhelming probability, and therefore g(z, x) is not learnable from random examples. On the other hand, for a fixed z, p_d(z)·c_a(x) gives a Hadamard encoding of a or its negation. Hence it is possible to find a using membership queries with the same prefix. A construction based on a similar idea was used by Elbaz et al. (2007) in their separation result.

Finally, the problem with the construction we have so far is that while a membership query learning algorithm can find the secret a, it cannot predict g(z, x) without knowing d. This means that we also need to provide d to the learning algorithm. It is tempting to use the Hadamard code to encode d together with a. However, a bit of the encoding of d is no longer independent of p_d, and therefore the previous argument does not hold. We are unaware of any constructions of pseudorandom functions that would remain pseudorandom when the value of the function is "mixed" with the description of the function (see the work of Halevi and Krawczyk (2007) for a discussion of this problem). An identical problem also arises in the construction of Elbaz et al. (2007). They used another pseudorandom function p_{d_1} to "hide" the encoding of d, then used another pseudorandom function p_{d_2} to "hide" the encoding of d_1, and so on. The fraction of the domain used up for the encoding of d_i becomes progressively smaller as i grows. In their construction a PAC learning algorithm can recover as many of the encodings as is required to reach accuracy ε.

This method would not be effective in our case. First, in the agnostic setting all the encodings but the one using the largest fraction of the domain can be "corrupted". This makes the largest encoding unrecoverable and implies that the best ε achievable is at most half of the fraction of the domain used by the largest encoding. In addition, in the agnostic setting the encoding of d_i for every odd i can be completely "corrupted", making all the other encodings unrecoverable. To solve these problems, in our construction we split the domain into p equal parts, and on part i we use a pseudorandom function p_{d_i} to "hide" the encoding of d_j for all j < i. In Figure 1 we provide a schematic view of a concept that we construct (for p = 4).

[Figure 1: Structure of a concept in C^p_n for p = 4. An arrow from part i to part j indicates that the secret key to part j is encoded, using the Hadamard code, in part i.]

The crucial property of this construction is that the unknown concept can be "recovered" on all but one part of the domain. Specifically, the only part where the unknown concept cannot be "recovered" agnostically is the part i such that for all j > i, the agreement of the target function with every g_d ∈ C^p_n on part j is close to 1/2 and hence d_j cannot be recovered. Therefore, by making the number of parts p larger than 1/ε, we can make sure that there exists an efficient algorithm that finds a hypothesis with error within ε of the optimum.
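To make the hiding step concrete, here is a small sketch (ours) of the two-part function g(z, x) = p_d(z) ⊕ c_a(x) discussed above. HMAC-SHA256 keyed by d stands in for the pseudorandom function p_d; this stand-in is an assumption for illustration only, since the construction merely requires some family as in Definition 5.

```python
import hashlib, hmac

def parity(a, x):
    """c_a(x) = (-1)^{a.x} with values in {-1,1}."""
    return -1 if sum(ai & xi for ai, xi in zip(a, x)) % 2 else 1

def prf(d, z):
    """Stand-in for the pseudorandom function p_d(z) in {-1,1}.  HMAC-SHA256
    keyed by d is our assumption for illustration; any family satisfying
    Definition 5 would do."""
    mac = hmac.new(bytes(d), bytes(z), hashlib.sha256).digest()
    return 1 if mac[0] & 1 else -1

def g(d, a, z, x):
    """g(z, x) = p_d(z) XOR c_a(x); XOR is the product in {-1,1}.  For a
    fixed z the map x -> g(z, x) is c_a or its negation, i.e., a Hadamard
    codeword for a, while a random (z, x) carries a pseudorandom label."""
    return prf(d, z) * parity(a, x)

# Membership queries sharing the prefix z read the Hadamard encoding of a.
d, a, z = [1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]
column = [g(d, a, z, x) for x in ([0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0])]
```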
4.3 The Construction

We will now describe the construction formally and give a proof of its correctness. Let p = p(n) be a polynomial, let ℓ = log p(n) (we assume for simplicity that p(n) is a power of 2) and let m = ℓ + np. We refer to an element of {0,1}^m by a triple (k, z, x) where k ∈ [p], z ∈ {0,1}^n, and x = (x_1, x_2, ..., x_{p−1}) ∈ {0,1}^{n(p−1)}. Here k indexes the encodings, z is the input to the k-th pseudorandom function and x is the input to a parity function on n(p−1) variables that encodes the secret keys for all pseudorandom functions used for encodings 1 through k−1.

Formally, let d = (d_1, d_2, ..., d_{p−1}) be a vector in {0,1}^{n(p−1)} (where each d_i ∈ {0,1}^n) and for k ∈ [p] let

d^{(k)} = (d_1, d_2, ..., d_{k−1}, 0^n, ..., 0^n).

Let F = {p_y}_{y∈{0,1}^n} be a pseudorandom function family (Definition 5). We define g_d : {0,1}^m → {−1,1} as follows:

g_d(k, z, x) = p_{d_k}(z)·c_{d^{(k)}}(x).

Finally, we define C^p_n = {g_d | d ∈ {0,1}^{n(p−1)}}.

4.4 Hardness of Learning C^p_n From Random Examples

We start by showing that C^p_n is not agnostically learnable from random and uniform examples only. In fact, we will show that it is not even weakly PAC learnable. Our proof is similar to the proof by Elbaz et al. (2007), who show that the same holds for the concept class they define.

Theorem 9 There exists no efficient algorithm that weakly PAC learns C^p_n with respect to the uniform distribution over {0,1}^m.

Proof In order to prove the claim we show that a weak PAC learning algorithm for C^p_n can be used to distinguish a pseudorandom function family from a truly random function. A weak learning algorithm for C^p_n implies that every function in C^p_n can be distinguished from a truly random function on {0,1}^m. If, on the other hand, in the computation of g_d(k, z, x) we used a truly random function in place of each p_{d_k}(z), then the resulting labels would be truly random and, in particular, unpredictable.

Formally, let Alg be a weak learning algorithm for C^p_n that, with probability at least 1 − δ, produces a hypothesis with error of at most 1/2 − 1/q(m) and runs in time t(m, 1/δ) for some polynomials t(·,·) and q(·). Our concept class C^p_n uses numerous pseudorandom functions from F_n and therefore we use a so-called "hybrid" argument to show that one can replace a single p_{d_k}(z) with a truly random function to cause Alg to fail.

For 0 ≤ i ≤ p, let O(i) denote an oracle randomly chosen according to the following procedure. First choose randomly and uniformly p_{d_1}, p_{d_2}, ..., p_{d_i} ∈ F_n and then choose randomly and uniformly r_{i+1}, r_{i+2}, ..., r_p from the set of all Boolean functions over {0,1}^n. Upon request such an oracle returns an example ⟨(k, z, x), b⟩ where (k, z, x) is chosen randomly and uniformly from {0,1}^m and

b = p_{d_k}(z)·c_{d^{(k)}}(x) if 0 < k ≤ i;  b = r_k(z) if i < k ≤ p.
We note that in order to simulate such an oracle it is not necessary to explicitly choose r_{i+1}, r_{i+2}, ..., r_p (and, indeed, that would not be possible in polynomial time). Instead, their values can be generated upon request by flipping a fair coin. This means that for every i, O(i) can be chosen and then simulated in time polynomial in m and the number of examples requested.

Let M(i) denote the algorithm that performs the following steps.

• Choose O(i) randomly according to the above procedure.
• Simulate Alg with random examples from O(i) and δ = 1/2. Let h be the output of Alg.
• Produce an estimate ẽ_h of the error of h on the distribution defined by O(i) that, with probability at least 7/8, is within 1/(3q(m)) of the true error. Chernoff bounds imply that this can be done using an empirical estimate on O(q²(m)) random samples.
• Output 1 if ẽ_h ≤ 1/2 − 2/(3q(m)) and 0 otherwise.

We denote by d_i the probability that M(i) outputs 1. The probability is taken over all the random choices made by M(i): the random choice and simulation of O(i), the coin flips of Alg, and the estimation of the error of h.

Claim 10 d_p − d_0 ≥ 1/4.

Proof To see this, we first observe that O(0) is defined using p truly random functions and therefore the probability that there exists a hypothesis of size at most t(m, 2) that has error less than 1/2 − 1/(3q(m)) is some negligible function ν(n). In particular, the error of the hypothesis produced by Alg is at least 1/2 − 1/(3q(m)) (with probability at least 1 − ν(n)). This means that ẽ_h ≤ 1/2 − 2/(3q(m)) only if the estimation fails. By the definition of our error estimation procedure, this happens with probability at most 1/8 and therefore d_0 ≤ 1/8 + ν(n). On the other hand, O(p) is equivalent to EX(U, g_d) for some randomly chosen d. This implies that with probability at least 1/2, Alg outputs a hypothesis with error of at most 1/2 − 1/q(m). With probability at least 7/8, ẽ_h ≤ 1/2 − 2/(3q(m)), and hence d_p ≥ 7/16. This implies our claim.
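The lazy simulation of O(i) used above can be sketched as follows (our illustration): the truly random functions r_k are never written down in full; each fresh query point gets a fair coin flip that is cached so that repeated points stay consistent.

```python
import random

def lazy_random_function():
    """A truly random Boolean function over {0,1}^n, sampled lazily: flip a
    fair coin on each new input and cache it so repeated inputs agree."""
    table = {}
    def r(z):
        if z not in table:
            table[z] = random.choice((-1, 1))
        return table[z]
    return r

def hybrid_oracle(i, p, n, g_part):
    """O(i): parts k <= i are labeled by the pseudorandom construction
    (`g_part(k, z, x)` is a placeholder for p_{d_k}(z) c_{d^{(k)}}(x));
    parts k > i are labeled by lazily sampled random functions r_k."""
    rand = {k: lazy_random_function() for k in range(i + 1, p + 1)}
    def example():
        k = random.randint(1, p)
        z = tuple(random.randint(0, 1) for _ in range(n))
        x = tuple(random.randint(0, 1) for _ in range(n * (p - 1)))
        b = g_part(k, z, x) if k <= i else rand[k](z)
        return (k, z, x), b
    return example
```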
We now describe our distinguisher M^π, where π denotes the function given to M as an oracle. Let O^π(i) denote the example oracle generated by using π in place of p_{d_i} in the definition of O(i). That is, first choose randomly and uniformly p_{d_1}, p_{d_2}, ..., p_{d_{i−1}} ∈ F_n and then choose randomly and uniformly r_{i+1}, r_{i+2}, ..., r_p from the set of all Boolean functions over {0,1}^n. Upon request, O^π(i) returns an example ⟨(k, z, x), b⟩ where (k, z, x) is chosen randomly and uniformly from {0,1}^m and

b = p_{d_k}(z)·c_{d^{(k)}}(x) if k < i;  b = π(z)·c_{d^{(k)}}(x) if k = i;  b = r_k(z) if k > i.

Similarly, we denote by M^π(i) the algorithm that is the same as M(i) but chooses a random O^π(i) in place of O(i). The distinguishing test M^π chooses a random i ∈ [p] and runs M^π(i). We first observe that if π is chosen randomly from F_n then choosing and simulating a random O^π(i) is equivalent to choosing and simulating a random O(i). Therefore for every i ∈ [p], M^π(i) is equivalent to M(i). This implies that in this case M^π will output 1 with probability (1/p)·Σ_{i∈[p]} d_i.

On the other hand, if π is chosen randomly from the set of all Boolean functions over {0,1}^n then O^π(i) is equivalent to O(i−1). Therefore in this case M^π will output 1 with probability (1/p)·Σ_{i∈[p]} d_{i−1}. Therefore, by Claim 10, the difference in the probability that M outputs 1 in these two cases is

(1/p)·Σ_{i∈[p]} d_i − (1/p)·Σ_{i∈[p]} d_{i−1} = (1/p)·(d_p − d_0) ≥ 1/(4p),

that is, non-negligible. The efficiency of M follows readily from the efficiency of Alg and the efficiency of the steps we described. This gives us the contradiction to the pseudorandomness property of the function family F.

4.5 Agnostic Learning of C^p_n with Membership Queries

We now describe a (fully) agnostic learning algorithm for C^p_n that uses membership queries and is successful for any ε ≥ 1/p(n).

Theorem 11 There exists a randomized algorithm AgnLearn that for every distribution A = (U, f) over {0,1}^m × {−1,1} and every ε ≥ 1/p(n), δ > 0, given access to MEM(A), with probability at least 1 − δ, finds h such that Δ(A, h) ≤ Δ(A, C^p_n) + ε. The probability is taken over the coin flips of MEM(A) and AgnLearn. AgnLearn runs in time polynomial in m and log(1/δ).

Proof Let g_e, for e = (e_1, e_2, ..., e_{p−1}) ∈ {0,1}^{(p−1)n}, be the function for which Δ(A, g_e) = Δ(A, C^p_n). The goal of our algorithm is to find the largest j such that on random examples from the j-th part of the domain (i.e., for k = j), A agrees with the encoding of e^{(j)} = (e_1, e_2, ..., e_{j−1}, 0^n, ..., 0^n) with probability at least 1/2 + ε/4. Using the Goldreich-Levin algorithm, such a j can be used to recover e^{(j)}, which in turn allows us to reconstruct g_e on all points (k, z, x) with k < j. For points with k ≥ j, our hypothesis will be either constant 1 or constant −1, whichever has the higher agreement with A. This guarantees that the error on this part is at most 1/2. By the definition of j, g_e has error of at least 1/2 − ε/4 − 1/(2p) ≥ 1/2 − ε on this part of the domain, and therefore our hypothesis has error close to that of g_e.

We now describe AgnLearn formally. For every i ∈ [p] and y ∈ {0,1}^n, let A_{i,y} be A restricted to points in {0,1}^m with prefix (i, y). That is, A_{i,y} = (U_{(p−1)n}, f_{i,y}) where f_{i,y}(x) ≡ f(i, y, x) and U_{(p−1)n} is the uniform distribution over {0,1}^{(p−1)n}. Note that MEM(A_{i,y}) can be simulated using MEM(A): when queried on a point x ∈ {0,1}^{(p−1)n}, MEM(A_{i,y}) returns the answer of MEM(A) on the point (i, y, x). Further, for each vector d ∈ {0,1}^{(p−1)n} and b ∈ {−1,1}, let h_{d,i,b} be defined as

h_{d,i,b}(k, z, x) = p_{d_k}(z)·c_{d^{(k)}}(x) if k < i;  h_{d,i,b}(k, z, x) = b if k ≥ i.  (4)

(Here p_{d_k} is an element of the pseudorandom function family F_n used in the construction.) AgnLearn performs the following steps.
1. Initializes H = {h_1, h_{−1}}, where h_1 ≡ 1 and h_{−1} ≡ −1.
2. For each 2 ≤ i ≤ p:
   (a) Chooses r independent random and uniform points in {0,1}^n, for r to be defined later. Denote the obtained set of points by Y_i.
   (b) For each y ∈ Y_i:
       i. Runs GL(ε/4, 1/2) over {0,1}^{(p−1)n} using MEM(A_{i,y}). Let T denote the set of indices of heavy Fourier coefficients returned by GL.
       ii. For each vector d ∈ T and b ∈ {−1,1}, adds h_{d,i,b} to the set of hypotheses H.
3. For each h ∈ H, estimates Δ(A, h) to within accuracy ε/8 and with overall confidence 1 − δ/2 using the empirical error on random samples from A. Chernoff bounds imply that this can be done using samples of size O(log(|H|/δ)/ε²). Denote the estimate obtained for a hypothesis h_{d,i,b} ∈ H by Δ̃_{d,i,b}.
4. Returns h ∈ H with the lowest empirical error.

Claim 12 For r = O(log(1/δ)/ε), with probability at least 1 − δ, AgnLearn returns h such that Δ(A, h) ≤ Δ(A, C^p_n) + ε.

Proof We show that in the set H of hypotheses considered by AgnLearn there will be a hypothesis h' such that Δ(A, h') ≤ Δ(A, g_e) + 3ε/4 (with sufficiently high probability). The estimates of the error of each hypothesis are within ε/8 of the true error, and therefore the hypothesis h with the smallest empirical error will satisfy

Δ(A, h) ≤ Δ(A, h') + ε/4 ≤ Δ(A, g_e) + ε.

For i ∈ [p], denote Δ_i = Pr_A[b ≠ g_e(k, z, x) | k = i] (here and below, by probability with respect to A we mean that a labeled example ⟨(k, z, x), b⟩ is chosen randomly according to A). By the definition,

(1/p)·Σ_{i∈[p]} Δ_i = Δ(A, g_e).  (5)

Let j be the largest i ∈ [p] that satisfies Δ_i ≤ 1/2 − ε/4. If j is undefined (when no i satisfies the condition) then by Equation (5), Δ(A, g_e) > 1/2 − ε/4. Either h_1 or h_{−1} has error of at most 1/2 on A, and therefore there exists h' ∈ H such that Δ(A, h') ≤ Δ(A, g_e) + 3ε/4. We can now assume that j is well-defined. For i ∈ [p] and y ∈ {0,1}^n, denote

Δ_{i,y} = Pr_A[b ≠ g_e(k, z, x) | k = i, z = y] = Pr_{⟨x,b⟩∼A_{i,y}}[b ≠ g_e(i, y, x)].

The function g_e(j, y, x) equals p_{e_j}(y)·c_{e^{(j)}}(x). If p_{e_j}(y) = 1 then by Equation (1) and the definition of A_{j,y}, Δ_{j,y} = (1 − f̂_{j,y}(e^{(j)}))/2, and therefore f̂_{j,y}(e^{(j)}) = 1 − 2Δ_{j,y}. If p_{e_j}(y) = −1 then Δ_{j,y} = 1 − (1 − f̂_{j,y}(e^{(j)}))/2 = (1 + f̂_{j,y}(e^{(j)}))/2, and thus f̂_{j,y}(e^{(j)}) = −(1 − 2Δ_{j,y}). In either case,

|f̂_{j,y}(e^{(j)})| ≥ 1 − 2Δ_{j,y}.  (6)

By the definition, E_{y∈{0,1}^n}[Δ_{i,y}] = Δ_i. This implies that for a randomly and uniformly chosen y, with probability at least ε/4, Δ_{j,y} ≤ 1/2 − ε/8. This is true since otherwise Δ_j ≥ (1 − ε/4)·(1/2 − ε/8) > 1/2 − ε/4, contradicting the choice of j. Together with Equation (6) we obtain that for a randomly chosen y, with probability at least ε/4, |f̂_{j,y}(e^{(j)})| ≥ ε/4. In this case, by Theorem 4, GL(ε/4, 1/2) with access to MEM(A_{j,y}) will return e^{(j)} with probability at least 1/2 (possibly among other vectors). This means that e^{(j)} will be found with probability at least ε/8. By taking r = 8·ln(2/δ)/ε we ensure that AgnLearn finds e^{(j)} with probability at least 1 − δ/2.

Now let b_j be the constant with the lowest error on examples from A for which k ≥ j, that is, b_j = sign(E_A[b | k ≥ j]). Clearly, the error of b_j on A when k ≥ j is at most 1/2. By the definition of h_{e^{(j)},j,b_j} (Equation 4), h_{e^{(j)},j,b_j} equals g_e on points for which k < j and equals b_j on the rest of the domain. Therefore

Δ(A, h_{e^{(j)},j,b_j}) = ((j−1)/p)·Pr_A[b ≠ g_e(k, z, x) | k < j] + ((p−j+1)/p)·Pr_A[b ≠ b_j | k ≥ j] ≤ (1/p)·(Σ_{i<j} Δ_i + (p−j+1)/2).

On the other hand, by the properties of j, for all i > j, Δ_i ≥ 1/2 − ε/4 and thus

Δ(A, g_e) = (1/p)·Σ_{i∈[p]} Δ_i ≥ (1/p)·(Σ_{i<j} Δ_i + (p−j)·(1/2 − ε/4)).

By combining these equations we obtain that

Δ(A, h_{e^{(j)},j,b_j}) − Δ(A, g_e) ≤ 1/(2p) + ε/4 ≤ 3ε/4.

As noted before, this implies the claim.
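The steps of AgnLearn above can be summarized in the following skeleton (ours). The callables are placeholders for the components defined in the text; in particular, gl_heavy stands for the Goldreich-Levin procedure GL(ε/4, 1/2) of Theorem 4.

```python
import math, random

def agn_learn(p, n, eps, delta, gl_heavy, restrict_mem, make_h, est_error):
    """Skeleton of AgnLearn; the callables are placeholders:
      gl_heavy(mem)      -- GL(eps/4, 1/2) with access to MEM(A_{i,y});
                            returns candidate heavy indices T (Theorem 4)
      restrict_mem(i, y) -- simulates MEM(A_{i,y}) from MEM(A)
      make_h(d, i, b)    -- the hypothesis h_{d,i,b} of Equation (4)
      est_error(h)       -- empirical error of h on fresh samples from A
    """
    H = [make_h(None, 1, 1), make_h(None, 1, -1)]  # constants h_1, h_{-1}
    r = math.ceil(8 * math.log(2 / delta) / eps)   # r = 8 ln(2/delta)/eps
    for i in range(2, p + 1):
        for _ in range(r):
            y = tuple(random.randint(0, 1) for _ in range(n))
            for d in gl_heavy(restrict_mem(i, y)):
                H += [make_h(d, i, 1), make_h(d, i, -1)]
    return min(H, key=est_error)  # hypothesis with lowest empirical error
```

Note that the constants h_1 and h_{−1} are just h_{d,1,b} with i = 1, which is why the skeleton builds them with the same constructor.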
Given Claim 12, we only need to check that the running time of AgnLearn is polynomial in m and log(1/δ). By Parseval's identity there are O(1/ε²) elements in each set of vectors returned by GL, and r = 8·ln(2/δ)/ε. Therefore the efficiency of GL implies that H is found in polynomial time and the size of H is O(p·log(1/δ)/ε³). This implies that the error estimation step of AgnLearn takes polynomial time.

Remark 13 In Theorem 11 we assumed that MEM(A) is not persistent. If MEM(A) is persistent then executions of GL for different y's are not completely independent and GL might fail with some negligible probability. A simple and standard modification of the analysis (as in the work of Bshouty et al. (2004), for example) can be used to show that the probability of failure of AgnLearn in this case is negligible. This implies that AgnLearn agnostically learns C^p_n from persistent membership queries.

4.6 Bounds on ε

In Theorem 11, C^p_n is defined over {0,1}^m for m = np(n) + log p(n) and is learnable agnostically for any ε ≥ 1/p(n). This means that this construction cannot achieve dependence on ε beyond 1/m. To improve this dependence we use a more efficient list-decodable code in place of the Hadamard code. Specifically, we need a list-decodable code C : {0,1}^u → {0,1}^v that can be list-decoded from (1/2 − γ')·v errors in time polynomial in u and 1/γ' for any γ' ≥ ε/8. Guruswami and Sudan (2000) gave a list-decoding algorithm for the Reed-Solomon code concatenated with the Hadamard code that has the desired properties for v = O(u²/ε⁴). Note that this is exponentially more efficient than the Hadamard code, for which v = 2^u. In fact, for this code we can afford to read the whole corrupted message in polynomial time. This means that we can assume that the output of the list-decoding algorithm is exact (and not approximate, as in the case of list-decoding of the Hadamard code using the Goldreich-Levin algorithm).

In our construction, u = n(p(n) − 1). To apply the above code we index a position in the code using log v = O(log(n/ε)) bits. Further, we can use pseudorandom functions over {0,1}^{n/2} instead of {0,1}^n in the definition of C^p_n. We would then obtain that the dimension of C^p_n is m = n/2 + log v + log p(n) ≤ n for any polynomial p(n) and ε ≥ 1/p(n). This implies that our learning algorithm is successful for every ε ≥ 1/p(n) ≥ 1/p(m). It is easy to verify that Theorems 9 and 11 still hold for this variant of the construction and imply Theorem 8.

5. Discussion

Our results clarify the role of membership queries in agnostic learning. They imply that in order to extract any meaningful information from membership queries the learner needs to have significant prior knowledge about the distribution of examples. Specifically, either the set of possible classification functions has to be restricted (as in the PAC model) or the set of possible marginal distributions (as in distribution-specific agnostic learning). An interesting result in this direction would be a demonstration that membership queries are useful for distribution-specific agnostic learning of a natural concept class such as halfspaces. Finally, we would be interested to see a proof that membership queries are useful in distribution-specific agnostic learning that places no restriction on ε.
Acknowledgments

I thank Parikshit Gopalan, Salil Vadhan and David Woodruff for valuable discussions and comments on this research. I also thank the anonymous reviewers of COLT 2008 and JMLR for their numerous helpful comments.

References

D. Angluin. Queries and concept learning. Machine Learning, 2:319–342, 1988.

A. Blum, M. Furst, M. Kearns, and R. J. Lipton. Cryptographic primitives based on hard learning problems. In Proceedings of the International Cryptology Conference on Advances in Cryptology (CRYPTO), pages 278–291, 1993.

A. Blum, A. Kalai, and H. Wasserman. Noise-tolerant learning, the parity problem, and the statistical query model. Journal of the ACM, 50(4):506–519, 2003.

N. Bshouty. Exact learning via the monotone theory. Information and Computation, 123(1):146–153, 1995.

N. Bshouty, J. Jackson, and C. Tamon. More efficient PAC learning of DNF with membership queries under the uniform distribution. In Proceedings of COLT, pages 286–295, 1999.

N. Bshouty, J. Jackson, and C. Tamon. More efficient PAC-learning of DNF with membership queries under the uniform distribution. Journal of Computer and System Sciences, 68(1):205–234, 2004.

R. Dudley. Central limit theorems for empirical measures. Annals of Probability, 6(6):899–929, 1978.

A. Elbaz, H. Lee, R. Servedio, and A. Wan. Separating models of learning from correlated and uncorrelated data. Journal of Machine Learning Research, 8:277–290, 2007.

V. Feldman. Optimal hardness results for maximizing agreements with monomials. In Proceedings of the Conference on Computational Complexity (CCC), pages 226–236, 2006.

V. Feldman. Attribute efficient and non-adaptive learning of parities and DNF expressions. Journal of Machine Learning Research, 8:1431–1460, 2007.

V. Feldman, P. Gopalan, S. Khot, and A. Ponnuswami. New results for learning noisy parities and halfspaces. In Proceedings of FOCS, pages 563–574, 2006.

V. Feldman and S. Shah. Separating models of learning with faulty teachers. Theoretical Computer Science, doi:10.1016/j.tcs.2009.01.017, 2009.

S. A. Goldman, S. Kwek, and S. D. Scott. Agnostic learning of geometric patterns. Journal of Computer and System Sciences, 62(1):123–151, 2001.

O. Goldreich and L. Levin. A hard-core predicate for all one-way functions. In Proceedings of STOC, pages 25–32, 1989.

O. Goldreich, S. Goldwasser, and S. Micali. How to construct random functions. Journal of the ACM, 33(4):792–807, 1986.

P. Gopalan, A. Kalai, and A. Klivans. Agnostically learning decision trees. In Proceedings of STOC, pages 527–536, 2008.

V. Guruswami and M. Sudan. List decoding algorithms for certain concatenated codes. In Proceedings of STOC, pages 181–190, 2000.

V. Guruswami and P. Raghavendra. Hardness of learning halfspaces with noise. In Proceedings of FOCS, pages 543–552, 2006.

S. Halevi and H. Krawczyk. Security under key-dependent inputs. In CCS '07: Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 466–475, 2007.

J. Håstad. Some optimal inapproximability results. Journal of the ACM, 48(4):798–859, 2001.
J. Håstad, R. Impagliazzo, L. Levin, and M. Luby. A pseudorandom generator from any one-way function. SIAM Journal on Computing, 28(4):1364–1396, 1999.

D. Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1):78–150, 1992.

A. Kalai, A. Klivans, Y. Mansour, and R. Servedio. Agnostically learning halfspaces. SIAM Journal on Computing, 37(6):1777–1805, 2008a.

A. Kalai, Y. Mansour, and E. Verbin. Agnostic boosting and parity learning. In Proceedings of STOC, pages 629–638, 2008b.

M. Kearns and L. Valiant. Cryptographic limitations on learning Boolean formulae and finite automata. Journal of the ACM, 41(1):67–95, 1994.

M. Kearns, R. Schapire, and L. Sellie. Toward efficient agnostic learning. Machine Learning, 17(2-3):115–141, 1994.

M. Kharitonov. Cryptographic lower bounds for learnability of Boolean functions on the uniform distribution. Journal of Computer and System Sciences, 50:600–610, 1995.

E. Kushilevitz and Y. Mansour. Learning decision trees using the Fourier spectrum. SIAM Journal on Computing, 22(6):1331–1348, 1993.

W. S. Lee, P. L. Bartlett, and R. C. Williamson. On efficient agnostic learning of linear combinations of basis functions. In Proceedings of COLT, pages 369–376, 1995.

L. Levin. Randomness and non-determinism. Journal of Symbolic Logic, 58(3):1102–1103, 1993.

N. Linial, Y. Mansour, and N. Nisan. Constant depth circuits, Fourier transform and learnability. Journal of the ACM, 40(3):607–620, 1993.

Y. Mansour. Learning Boolean functions via the Fourier transform. In V. P. Roychowdhury, K. Y. Siu, and A. Orlitsky, editors, Theoretical Advances in Neural Computation and Learning, pages 391–424. Kluwer, 1994.

R. O'Donnell and R. Servedio. Learning monotone decision trees in polynomial time. In Proceedings of the IEEE Conference on Computational Complexity, pages 213–225, 2006.

D. Pollard. Convergence of Stochastic Processes. Springer-Verlag, 1984.

L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.

J. H. van Lint. Introduction to Coding Theory. Springer, Berlin, 1998.

V. Vapnik. Statistical Learning Theory. Wiley-Interscience, New York, 1998.

V. Vapnik and A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264–280, 1971.