
Spectral Methods Meet EM: A Provably Optimal Algorithm for Crowdsourcing

Yuchen Zhang†  Xi Chen♯  Dengyong Zhou‡  Michael I. Jordan†
†University of California, Berkeley, Berkeley, CA 94720  {yuczhang,jordan}@berkeley.edu
♯New York University, New York, NY 10012  xichen@nyu.edu
‡Microsoft Research, 1 Microsoft Way, Redmond, WA 98052  dengyong.zhou@microsoft.com

Abstract

The Dawid-Skene estimator has been widely used for inferring the true labels from the noisy labels provided by non-expert crowdsourcing workers. However, since the estimator maximizes a non-convex log-likelihood function, it is hard to theoretically justify its performance. In this paper, we propose a two-stage efficient algorithm for multi-class crowd labeling problems. The first stage uses the spectral method to obtain an initial estimate of parameters. Then the second stage refines the estimation by optimizing the objective function of the Dawid-Skene estimator via the EM algorithm. We show that our algorithm achieves the optimal convergence rate up to a logarithmic factor. We conduct extensive experiments on synthetic and real datasets. Experimental results demonstrate that the proposed algorithm is comparable to the most accurate empirical approach, while outperforming several other recently proposed methods.

1 Introduction

With the advent of online crowdsourcing services such as Amazon Mechanical Turk, crowdsourcing has become an appealing way to collect labels for large-scale data. Although this approach has virtues in terms of scalability and immediate availability, labels collected from the crowd can be of low quality since crowdsourcing workers are often non-experts and can be unreliable. As a remedy, most crowdsourcing services resort to labeling redundancy, collecting multiple labels from different workers for each item. Such a strategy raises a fundamental problem in crowdsourcing: how to infer true labels from noisy but redundant worker labels?

For labeling tasks with $k$ different categories, Dawid and Skene [8] propose a maximum likelihood approach based on the Expectation-Maximization (EM) algorithm. They assume that each worker is associated with a $k \times k$ confusion matrix, where the $(l, c)$-th entry represents the probability that a randomly chosen item in class $l$ is labeled as class $c$ by the worker. The true labels and worker confusion matrices are jointly estimated by maximizing the likelihood of the observed worker labels, where the unobserved true labels are treated as latent variables.

Although this EM-based approach has had empirical success [21, 20, 19, 26, 6, 25], there is as yet no theoretical guarantee for its performance. A recent theoretical study [10] shows that the global optimal solutions of the Dawid-Skene estimator can achieve minimax rates of convergence in a simplified scenario, where the labeling task is binary and each worker has a single parameter to represent her labeling accuracy (referred to as a "one-coin model" in what follows). However, since the likelihood function is non-convex, this guarantee is not operational because the EM algorithm may get trapped in a local optimum. Several alternative approaches have been developed that aim to circumvent the theoretical deficiencies of the EM algorithm, still in the context of the one-coin model [14, 15, 11, 7].
Unfortunately, they either fail to achieve the optimal rates or depend on restrictive assumptions which are hard to justify in practice.

We propose a computationally efficient and provably optimal algorithm to simultaneously estimate true labels and worker confusion matrices for multi-class labeling problems. Our approach is a two-stage procedure, in which we first compute an initial estimate of worker confusion matrices using the spectral method, and then in the second stage we turn to the EM algorithm. Under some mild conditions, we show that this two-stage procedure achieves minimax rates of convergence up to a logarithmic factor, even after only one iteration of EM. In particular, given any $\delta \in (0, 1)$, we provide bounds on the number of workers and the number of items so that our method can correctly estimate labels for all items with probability at least $1 - \delta$. We also establish a lower bound to demonstrate the optimality of this approach. Further, we provide both upper and lower bounds for estimating the confusion matrix of each worker and show that our algorithm achieves the optimal accuracy.

This work not only provides an optimal algorithm for crowdsourcing but sheds light on understanding the general method of moments. Empirical studies show that when the spectral method is used as an initialization for the EM algorithm, it outperforms EM with random initialization [18, 5]. This work provides a concrete way to theoretically justify such observations. It is also known that starting from a root-$n$ consistent estimator obtained by the spectral method, one Newton-Raphson step leads to an asymptotically optimal estimator [17]. However, obtaining a root-$n$ consistent estimator and performing a Newton-Raphson step can be demanding computationally. In contrast, our initialization doesn't need to be root-$n$ consistent, so a small portion of the data suffices for initialization. Moreover, performing one iteration of EM is computationally more attractive and numerically more robust than a Newton-Raphson step, especially for high-dimensional problems.

2 Related Work

Many methods have been proposed to address the problem of estimating true labels in crowdsourcing [23, 20, 22, 11, 19, 26, 7, 15, 14, 25]. The methods in [20, 11, 15, 19, 14, 7] are based on the generative model proposed by Dawid and Skene [8]. In particular, Ghosh et al. [11] propose a method based on Singular Value Decomposition (SVD) which addresses binary labeling problems under the one-coin model. The analysis in [11] assumes that the labeling matrix is full, that is, each worker labels all items. To relax this assumption, Dalvi et al. [7] propose another SVD-based algorithm which explicitly considers the sparsity of the labeling matrix in both algorithm design and theoretical analysis. Karger et al. propose an iterative algorithm for binary labeling problems under the one-coin model [15] and extend it to multi-class labeling tasks by converting a $k$-class problem into $k - 1$ binary problems [14]. This line of work assumes that tasks are assigned to workers according to a random regular graph, thus imposing specific constraints on the number of workers and the number of items. In Section 5, we compare our theoretical results with those of existing approaches [11, 7, 15, 14]. The methods in [20, 19, 6] incorporate Bayesian inference into the Dawid-Skene estimator by assuming a prior over confusion matrices. Zhou et al. [26, 25] propose a minimax entropy principle for crowdsourcing which leads to an exponential family model parameterized with worker ability and item difficulty. When all items have zero difficulty, the exponential family model reduces to the generative model suggested by Dawid and Skene [8].

Our method for initializing the EM algorithm in crowdsourcing is inspired by recent work using spectral methods to estimate latent variable models [3, 1, 4, 2, 5, 27, 12, 13]. The basic idea in this line of work is to compute third-order empirical moments from the data and then to estimate parameters by computing a certain orthogonal decomposition of a tensor derived from the moments. Given the special symmetric structure of the moments, the tensor factorization can be computed efficiently using the robust tensor power method [3].
A problem with this approach is that the estimation error can have a poor dependence on the condition number of the second-order moment matrix, and thus empirically it sometimes performs worse than EM with multiple random initializations. Our method, by contrast, requires only a rough initialization from the method of moments; we show that the estimation error does not depend on the condition number (see Theorem 2(b)).

3 Problem Setup

Throughout this paper, $[a]$ denotes the integer set $\{1, 2, \ldots, a\}$ and $\sigma_b(A)$ denotes the $b$-th largest singular value of the matrix $A$. Suppose that there are $m$ workers, $n$ items and $k$ classes. The true label $y_j$ of item $j \in [n]$ is assumed to be sampled from a probability distribution $\mathbb{P}[y_j = l] = w_l$, where $\{w_l : l \in [k]\}$ are positive values satisfying $\sum_{l=1}^k w_l = 1$. Denote by a vector $z_{ij} \in \mathbb{R}^k$ the label that worker $i$ assigns to item $j$. When the assigned label is $c$, we write $z_{ij} = e_c$, where $e_c$ represents the $c$-th canonical basis vector in $\mathbb{R}^k$, in which the $c$-th entry is 1 and all other entries are 0. A worker may not label every item. Let $\pi_i$ indicate the probability that worker $i$ labels a randomly chosen item. If item $j$ is not labeled by worker $i$, we write $z_{ij} = 0$. Our goal is to estimate the true labels $\{y_j : j \in [n]\}$ from the observed labels $\{z_{ij} : i \in [m], j \in [n]\}$.

In order to obtain an estimator, we need to make assumptions on the process of generating observed labels. Following the work of Dawid and Skene [8], we assume that the probability that worker $i$ labels an item in class $l$ as class $c$ is independent of any particular chosen item, that is, it is a constant over $j \in [n]$. Let us denote the constant probability by $\mu_{ilc}$, and let $\mu_{il} = [\mu_{il1}\ \mu_{il2}\ \cdots\ \mu_{ilk}]^T$. The matrix $C_i = [\mu_{i1}\ \mu_{i2}\ \ldots\ \mu_{ik}] \in \mathbb{R}^{k \times k}$ is called the confusion matrix of worker $i$. Besides estimating the true labels, we also want to estimate the confusion matrix for each worker.

4 Our Algorithm

In this section, we present an algorithm to estimate confusion matrices and true labels. Our algorithm consists of two stages. In the first stage, we compute an initial estimate of confusion matrices via the method of moments. In the second stage, we perform the standard EM algorithm by taking the result of Stage 1 as an initialization.

4.1 Stage 1: Estimating Confusion Matrices

Partitioning the workers into three disjoint and non-empty groups $G_1$, $G_2$ and $G_3$, the outline of this stage is as follows: we use the spectral method to estimate the averaged confusion matrices for the three groups, then utilize this intermediate estimate to obtain the confusion matrix of each individual worker. In particular, for $g \in \{1, 2, 3\}$ and $j \in [n]$, we calculate the averaged labeling within each group by

$Z_{gj} := \frac{1}{|G_g|} \sum_{i \in G_g} z_{ij}. \qquad (1)$

Algorithm 1: Estimating confusion matrices
Input: integer $k$, observed labels $z_{ij} \in \mathbb{R}^k$ for $i \in [m]$ and $j \in [n]$.
Output: confusion matrix estimates $\hat{C}_i \in \mathbb{R}^{k \times k}$ for $i \in [m]$.
(1) Partition the workers into three disjoint and non-empty groups $G_1$, $G_2$ and $G_3$. Compute the group-aggregated labels $Z_{gj}$ by Eq. (1).
(2) For $(a, b, c) \in \{(2,3,1), (3,1,2), (1,2,3)\}$, compute the second- and third-order moments $\hat{M}_2 \in \mathbb{R}^{k \times k}$, $\hat{M}_3 \in \mathbb{R}^{k \times k \times k}$ by Eqs. (2a)-(2d), then compute $\hat{C}_c \in \mathbb{R}^{k \times k}$ and $\hat{W} \in \mathbb{R}^{k \times k}$ by tensor decomposition:
  (a) Compute the whitening matrix $\hat{Q} \in \mathbb{R}^{k \times k}$ (such that $\hat{Q}^T \hat{M}_2 \hat{Q} = I$) using SVD.
  (b) Compute eigenvalue-eigenvector pairs $\{(\hat{\alpha}_h, \hat{v}_h)\}_{h=1}^k$ of the whitened tensor $\hat{M}_3(\hat{Q}, \hat{Q}, \hat{Q})$ by using the robust tensor power method [3]. Then compute $\hat{w}_h = \hat{\alpha}_h^{-2}$ and $\hat{\mu}_h = (\hat{Q}^T)^{-1} (\hat{\alpha}_h \hat{v}_h)$.
  (c) For $l = 1, \ldots, k$, set the $l$-th column of $\hat{C}_c$ to some $\hat{\mu}_h$ whose $l$-th coordinate has the greatest component, then set the $l$-th diagonal entry of $\hat{W}$ to $\hat{w}_h$.
(3) Compute $\hat{C}_i$ by Eq. (3).

Denoting the aggregated confusion matrix columns by $\bar{\mu}_{gl} := E(Z_{gj} \mid y_j = l) = \frac{1}{|G_g|} \sum_{i \in G_g} \pi_i \mu_{il}$, our first step is to estimate $\bar{C}_g := [\bar{\mu}_{g1}, \bar{\mu}_{g2}, \ldots, \bar{\mu}_{gk}]$ and to estimate the distribution of true labels $W := \mathrm{diag}(w_1, w_2, \ldots, w_k)$. The following proposition shows that we can solve for $\bar{C}_g$ and $W$ from the moments of $\{Z_{gj}\}$.

Proposition 1 (Anandkumar et al. [3]). Assume that the vectors $\{\bar{\mu}_{g1}, \bar{\mu}_{g2}, \ldots, \bar{\mu}_{gk}\}$ are linearly independent for each $g \in \{1, 2, 3\}$. Let $(a, b, c)$ be a permutation of $\{1, 2, 3\}$. Define

$Z'_{aj} := E[Z_{cj} \otimes Z_{bj}] \big(E[Z_{aj} \otimes Z_{bj}]\big)^{-1} Z_{aj},$
$Z'_{bj} := E[Z_{cj} \otimes Z_{aj}] \big(E[Z_{bj} \otimes Z_{aj}]\big)^{-1} Z_{bj},$
$M_2 := E[Z'_{aj} \otimes Z'_{bj}]$ and $M_3 := E[Z'_{aj} \otimes Z'_{bj} \otimes Z_{cj}];$

then we have $M_2 = \sum_{l=1}^k w_l \bar{\mu}_{cl} \otimes \bar{\mu}_{cl}$ and $M_3 = \sum_{l=1}^k w_l \bar{\mu}_{cl} \otimes \bar{\mu}_{cl} \otimes \bar{\mu}_{cl}$.
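As a concrete illustration of Eq. (1), the group aggregation is just an average of one-hot label vectors over each group. Below is a minimal numpy sketch, assuming the observed labels are stored as a dense array z of shape (m, n, k) in which z[i, j] is worker i's one-hot label for item j (the all-zeros vector if worker i skipped item j); the names z, groups and group_aggregate are illustrative, not from the paper.

```python
import numpy as np

def group_aggregate(z, groups):
    """Eq. (1): Z_gj = (1/|G_g|) * sum_{i in G_g} z_ij.

    z:      (m, n, k) array of one-hot labels (all zeros for missing labels).
    groups: three disjoint, non-empty index sets G_1, G_2, G_3 over workers.
    Returns an array Zg of shape (3, n, k) with Zg[g, j] = Z_gj.
    """
    return np.stack([z[list(G)].mean(axis=0) for G in groups])

# Example: nine workers split into three groups of three.
# groups = [range(0, 3), range(3, 6), range(6, 9)]
# Zg = group_aggregate(z, groups)
```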
Since we only have finite samples, the expectations in Proposition 1 have to be approximated by empirical moments. In particular, they are computed by averaging over the indices $j = 1, 2, \ldots, n$. For each permutation $(a, b, c) \in \{(2,3,1), (3,1,2), (1,2,3)\}$, we compute

$\hat{Z}'_{aj} := \Big(\frac{1}{n}\sum_{j=1}^n Z_{cj} \otimes Z_{bj}\Big) \Big(\frac{1}{n}\sum_{j=1}^n Z_{aj} \otimes Z_{bj}\Big)^{-1} Z_{aj}, \qquad (2a)$
$\hat{Z}'_{bj} := \Big(\frac{1}{n}\sum_{j=1}^n Z_{cj} \otimes Z_{aj}\Big) \Big(\frac{1}{n}\sum_{j=1}^n Z_{bj} \otimes Z_{aj}\Big)^{-1} Z_{bj}, \qquad (2b)$
$\hat{M}_2 := \frac{1}{n}\sum_{j=1}^n \hat{Z}'_{aj} \otimes \hat{Z}'_{bj}, \qquad (2c)$
$\hat{M}_3 := \frac{1}{n}\sum_{j=1}^n \hat{Z}'_{aj} \otimes \hat{Z}'_{bj} \otimes Z_{cj}. \qquad (2d)$

The statement of Proposition 1 suggests that we can recover the columns of $\bar{C}_c$ and the diagonal entries of $W$ by operating on the moments $\hat{M}_2$ and $\hat{M}_3$. This is implemented by the tensor factorization method in Algorithm 1. In particular, the tensor factorization algorithm returns a set of vectors $\{(\hat{\mu}_h, \hat{w}_h) : h = 1, \ldots, k\}$, where each $(\hat{\mu}_h, \hat{w}_h)$ estimates a particular column of $\bar{C}_c$ (for some $\bar{\mu}_{cl}$) and a particular diagonal entry of $W$ (for some $w_l$). It is important to note that the tensor factorization algorithm doesn't provide a one-to-one correspondence between the recovered columns and the true columns of $\bar{C}_c$. Thus, $\hat{\mu}_1, \ldots, \hat{\mu}_k$ represent an arbitrary permutation of the true columns. To discover the index correspondence, we take each $\hat{\mu}_h$ and examine its greatest component. We assume that within each group, the probability of assigning a correct label is always greater than the probability of assigning any specific incorrect label. This assumption will be made precise in the next section. As a consequence, if $\hat{\mu}_h$ corresponds to the $l$-th column of $\bar{C}_c$, then its $l$-th coordinate is expected to be greater than the other coordinates. Thus, we set the $l$-th column of $\hat{C}_c$ to some vector $\hat{\mu}_h$ whose $l$-th coordinate has the greatest component (if there are multiple such vectors, we randomly select one of them; if there is no such vector, we randomly select a $\hat{\mu}_h$). Then, we set the $l$-th diagonal entry of $\hat{W}$ to the scalar $\hat{w}_h$ associated with $\hat{\mu}_h$. Note that by iterating over $(a, b, c) \in \{(2,3,1), (3,1,2), (1,2,3)\}$, we obtain $\hat{C}_c$ for $c = 1, 2, 3$ respectively. There will be three copies of $\hat{W}$ estimating the same matrix $W$; we average them for the best accuracy.

In the second step, we estimate each individual confusion matrix $C_i$. The following proposition shows that we can recover $C_i$ from the moments of $\{z_{ij}\}$. See [24] for the proof.

Proposition 2. For any $g \in \{1, 2, 3\}$ and any $i \in G_g$, let $a \in \{1, 2, 3\} \setminus \{g\}$ be one of the remaining group indices. Then $\pi_i C_i W (\bar{C}_a)^T = E[z_{ij} Z_{aj}^T]$.

Proposition 2 suggests a plug-in estimator for $C_i$. We compute $\hat{C}_i$ using the empirical approximation of $E[z_{ij} Z_{aj}^T]$ and using the matrices $\hat{C}_a$, $\hat{C}_b$, $\hat{W}$ obtained in the first step. Concretely, we calculate

$\hat{C}_i := \mathrm{normalize}\Big\{\Big(\frac{1}{n}\sum_{j=1}^n z_{ij} Z_{aj}^T\Big) \big(\hat{W} (\hat{C}_a)^T\big)^{-1}\Big\}, \qquad (3)$

where the normalization operator rescales the matrix columns, making sure that each column sums to one. The overall procedure for Stage 1 is summarized in Algorithm 1.
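To make the empirical moments concrete, here is a minimal numpy sketch of Eqs. (2a)-(2d) for one permutation (a, b, c), assuming the group-aggregated labels are available as (n, k) arrays. The subsequent whitening and robust tensor power method of Algorithm 1, step (2), are omitted; all names are illustrative.

```python
import numpy as np

def empirical_moments(Za, Zb, Zc):
    """Eqs. (2a)-(2d): second- and third-order moments for one permutation.

    Za, Zb, Zc: (n, k) arrays whose j-th rows are Z_aj, Z_bj, Z_cj.
    Returns (M2, M3) with shapes (k, k) and (k, k, k).
    """
    n = Za.shape[0]
    # Empirical cross-moments such as (1/n) * sum_j Z_cj (x) Z_bj.
    Ecb = Zc.T @ Zb / n
    Eab = Za.T @ Zb / n
    Eca = Zc.T @ Za / n
    Eba = Zb.T @ Za / n
    # Eqs. (2a)-(2b): symmetrize the two views (the k x k averages are
    # assumed invertible, as in Proposition 1).
    Za_sym = Za @ (Ecb @ np.linalg.inv(Eab)).T
    Zb_sym = Zb @ (Eca @ np.linalg.inv(Eba)).T
    # Eq. (2c): M2-hat = (1/n) * sum_j Z'_aj (x) Z'_bj.
    M2 = Za_sym.T @ Zb_sym / n
    # Eq. (2d): M3-hat = (1/n) * sum_j Z'_aj (x) Z'_bj (x) Z_cj.
    M3 = np.einsum('ja,jb,jc->abc', Za_sym, Zb_sym, Zc) / n
    return M2, M3
```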
4.2 Stage 2: EM Algorithm

The second stage is devoted to refining the initial estimate provided by Stage 1. The joint likelihood of the true labels $y$ and the observed labels $z$, as a function of the confusion matrices $\mu$, can be written as

$L(\mu; y, z) := \prod_{j=1}^n \prod_{i=1}^m \prod_{c=1}^k (\mu_{i y_j c})^{I(z_{ij} = e_c)}.$

By assuming a uniform prior over $y$, we maximize the marginal log-likelihood function $\ell(\mu) := \log\big(\sum_{y \in [k]^n} L(\mu; y, z)\big)$. We refine the initial estimate of Stage 1 by maximizing this objective function, which is implemented by the Expectation-Maximization (EM) algorithm. The EM algorithm takes the values $\{\hat{\mu}_{ilc}\}$ provided as output by Stage 1 as initialization, then executes the following E-step and M-step for at least one round.

E-step: Calculate the expected value of the log-likelihood function with respect to the conditional distribution of $y$ given $z$ under the current estimate of $\mu$:

$Q(\mu) := E_{y \mid z, \hat{\mu}}[\log(L(\mu; y, z))] = \sum_{j=1}^n \sum_{l=1}^k \hat{q}_{jl} \log\Big(\prod_{i=1}^m \prod_{c=1}^k (\mu_{ilc})^{I(z_{ij} = e_c)}\Big),$

where

$\hat{q}_{jl} \leftarrow \frac{\exp\big(\sum_{i=1}^m \sum_{c=1}^k I(z_{ij} = e_c) \log(\hat{\mu}_{ilc})\big)}{\sum_{l'=1}^k \exp\big(\sum_{i=1}^m \sum_{c=1}^k I(z_{ij} = e_c) \log(\hat{\mu}_{il'c})\big)} \quad \text{for } j \in [n],\ l \in [k]. \qquad (4)$

M-step: Find the estimate $\hat{\mu}$ that maximizes the function $Q(\mu)$:

$\hat{\mu}_{ilc} \leftarrow \frac{\sum_{j=1}^n \hat{q}_{jl} I(z_{ij} = e_c)}{\sum_{c'=1}^k \sum_{j=1}^n \hat{q}_{jl} I(z_{ij} = e_{c'})} \quad \text{for } i \in [m],\ l \in [k],\ c \in [k]. \qquad (5)$

In practice, we alternately execute the updates (4) and (5), for one iteration or until convergence. Each update increases the objective function $\ell(\mu)$. Since $\ell(\mu)$ is not concave, the EM update doesn't guarantee convergence to the global maximum; it may converge to distinct local stationary points for different initializations. Nevertheless, as we prove in the next section, the EM algorithm is guaranteed to output statistically optimal estimates of the true labels and worker confusion matrices if it is initialized by Algorithm 1.
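Both updates vectorize naturally. The following is a minimal numpy sketch of one EM round under illustrative conventions: z is an (m, n, k) array of one-hot labels as before, and C[i, l, c] holds the current estimate of mu_ilc. The clipping constant and all names are ours, not the paper's.

```python
import numpy as np

def em_round(z, C, eps=1e-10):
    """One round of the EM updates (4)-(5) for the Dawid-Skene model.

    z: (m, n, k) one-hot labels (all zeros where a worker skipped an item).
    C: (m, k, k) current confusion matrix estimates, C[i, l, c] ~ mu_ilc.
    A uniform prior over true labels is assumed, as in Section 4.2.
    """
    logC = np.log(np.clip(C, eps, None))       # guard against log(0)
    # E-step, Eq. (4): per-item log-scores for each candidate label l.
    S = np.einsum('ijc,ilc->jl', z, logC)      # (n, k)
    S -= S.max(axis=1, keepdims=True)          # numerically stable softmax
    q = np.exp(S)
    q /= q.sum(axis=1, keepdims=True)          # posterior q_jl
    # M-step, Eq. (5): posterior-weighted label counts, normalized over c.
    N = np.einsum('jl,ijc->ilc', q, z)
    C_new = N / np.clip(N.sum(axis=2, keepdims=True), eps, None)
    return q, C_new

# Label predictions after the E-step: y_hat = q.argmax(axis=1).
```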
5 Convergence Analysis

To state our main theoretical results, we first need to introduce some notation and assumptions. Let $w_{\min} := \min\{w_l\}_{l=1}^k$ and $\pi_{\min} := \min\{\pi_i\}_{i=1}^m$ be the smallest portion of true labels and the most extreme sparsity level of workers. Our first assumption is that both $w_{\min}$ and $\pi_{\min}$ are strictly positive, that is, every class and every worker contributes to the dataset.

Our second assumption is that the confusion matrices for each of the three groups, namely $\bar{C}_1$, $\bar{C}_2$ and $\bar{C}_3$, are nonsingular. As a consequence, if we define matrices $S_{ab}$ and tensors $T_{abc}$ for any $a, b, c \in \{1, 2, 3\}$ as

$S_{ab} := \sum_{l=1}^k w_l\, \bar{\mu}_{al} \otimes \bar{\mu}_{bl} = \bar{C}_a W (\bar{C}_b)^T \quad \text{and} \quad T_{abc} := \sum_{l=1}^k w_l\, \bar{\mu}_{al} \otimes \bar{\mu}_{bl} \otimes \bar{\mu}_{cl},$

then there will be a positive scalar $\sigma_L$ such that $\sigma_k(S_{ab}) \geq \sigma_L > 0$.

Our third assumption is that within each group, the average probability of assigning a correct label is always higher than the average probability of assigning any incorrect label. To make this statement rigorous, we define a quantity

$\kappa := \min_{g \in \{1,2,3\}} \min_{l \in [k]} \min_{c \in [k] \setminus \{l\}} \{\bar{\mu}_{gll} - \bar{\mu}_{glc}\}$

indicating the smallest gap between diagonal entries and non-diagonal entries in the same confusion matrix column. The assumption requires $\kappa$ to be strictly positive. Note that this assumption is group-based, and thus does not assume the accuracy of any individual worker.

Finally, we introduce a quantity that measures the average ability of workers in identifying distinct labels. For two discrete distributions $P$ and $Q$, let $D_{KL}(P, Q) := \sum_i P(i) \log(P(i)/Q(i))$ represent the KL-divergence between $P$ and $Q$. Since each column of the confusion matrix represents a discrete distribution, we can define the following quantity:

$\bar{D} = \min_{l \neq l'} \frac{1}{m} \sum_{i=1}^m \pi_i D_{KL}(\mu_{il}, \mu_{il'}). \qquad (6)$

The quantity $\bar{D}$ lower bounds the averaged KL-divergence between two columns. If $\bar{D}$ is strictly positive, it means that every pair of labels can be distinguished by at least one subset of workers. As the last assumption, we assume that $\bar{D}$ is strictly positive.

The following two theorems characterize the performance of our algorithm. We split the convergence analysis into two parts. Theorem 1 characterizes the performance of Algorithm 1, providing sufficient conditions for achieving an arbitrarily accurate initialization. We provide the proof of Theorem 1 in the long version of this paper [24].

Theorem 1. For any scalar $\delta > 0$ and any scalar $\epsilon$ satisfying $\epsilon \leq \min\{\frac{\pi_{\min} w_{\min} \sigma_L}{36k}, \frac{\kappa}{2}\}$, if the number of items $n$ satisfies

$n = \Omega\Big(\frac{k^5 \log((k+m)/\delta)}{\epsilon^2 \pi_{\min}^2 w_{\min}^2 \sigma_L^{13}}\Big),$

then the confusion matrices returned by Algorithm 1 are bounded as $\|\hat{C}_i - C_i\|_\infty \leq \epsilon$ for all $i \in [m]$, with probability at least $1 - \delta$. Here, $\|\cdot\|_\infty$ denotes the element-wise $\ell_\infty$-norm of a matrix.

Theorem 2 characterizes the error rate in Stage 2. It states that when a sufficiently accurate initialization is taken, the updates (4) and (5) refine the estimates $\hat{\mu}$ and $\hat{y}$ to the optimal accuracy. See the long version of this paper [24] for the proof.

Theorem 2. Assume that there is a positive scalar $\rho$ such that $\mu_{ilc} \geq \rho$ for all $(i, l, c) \in [m] \times [k]^2$. For any scalar $\delta > 0$, if the confusion matrices $\hat{C}_i$ are initialized in a manner such that

$\|\hat{C}_i - C_i\|_\infty \leq \alpha := \min\Big\{\frac{\rho}{2}, \frac{\rho \bar{D}}{16}\Big\} \quad \text{for all } i \in [m], \qquad (7)$

and the number of workers $m$ and the number of items $n$ satisfy

$m = \Omega\Big(\frac{\log(1/\rho)\log(kn/\delta) + \log(mn)}{\bar{D}}\Big) \quad \text{and} \quad n = \Omega\Big(\frac{\log(mk/\delta)}{\pi_{\min} w_{\min} \alpha^2}\Big),$

then, for $\hat{\mu}$ and $\hat{q}$ obtained by iterating (4) and (5) (for at least one round), with probability at least $1 - \delta$:

(a) Letting $\hat{y}_j = \arg\max_{l \in [k]} \hat{q}_{jl}$, we have that $\hat{y}_j = y_j$ holds for all $j \in [n]$.
(b) $\|\hat{\mu}_{il} - \mu_{il}\|_2^2 \leq \frac{48 \log(2mk/\delta)}{\pi_i w_l n}$ holds for all $(i, l) \in [m] \times [k]$.

In Theorem 2, the assumption that all confusion matrix entries are lower bounded by $\rho > 0$ is somewhat restrictive. For datasets violating this assumption, we enforce positive confusion matrix entries by adding random noise: given any observed label $z_{ij}$, we replace it by a random label in $\{1, \ldots, k\}$ with probability $k\rho$. In this modified model, every entry of the confusion matrix is lower bounded by $\rho$, so that Theorem 2 holds. The random noise makes the constant $\bar{D}$ smaller than its original value, but the change is minor for small $\rho$.

To see the consequence of the convergence analysis, we take the error rate $\epsilon$ in Theorem 1 equal to the constant $\alpha$ defined in Theorem 2, and then combine the statements of the two theorems. This shows that if we choose the number of workers $m$ and the number of items $n$ such that

$m = \widetilde{\Omega}\Big(\frac{1}{\bar{D}}\Big) \quad \text{and} \quad n = \widetilde{\Omega}\Big(\frac{k^5}{\pi_{\min}^2 w_{\min}^2 \sigma_L^{13} \min\{\kappa^2, (\rho \bar{D})^2\}}\Big), \qquad (8)$

that is, if both $m$ and $n$ are lower bounded by a problem-specific constant and logarithmic terms, then with high probability the predictor $\hat{y}$ will be perfectly accurate, and the estimator $\hat{\mu}$ will be bounded as $\|\hat{\mu}_{il} - \mu_{il}\|_2^2 \leq \widetilde{O}(1/(\pi_i w_l n))$.

To show the optimality of this convergence rate, we present the following minimax lower bounds. Again, see [24] for the proof.

Theorem 3. There are universal constants $c_1 > 0$ and $c_2 > 0$ such that:

(a) For any $\{\mu_{ilc}\}$, $\{\pi_i\}$ and any number of items $n$, if the number of workers $m \leq 1/(4\bar{D})$, then

$\inf_{\hat{y}} \sup_{v \in [k]^n} E\Big[\sum_{j=1}^n I(\hat{y}_j \neq y_j) \;\Big|\; \{\mu_{ilc}\}, \{\pi_i\}, y = v\Big] \geq c_1 n.$

(b) For any $\{w_l\}$, $\{\pi_i\}$, any worker-item pair $(m, n)$ and any pair of indices $(i, l) \in [m] \times [k]$, we have

$\inf_{\hat{\mu}} \sup_{\mu \in \mathbb{R}^{m \times k \times k}} E\Big[\|\hat{\mu}_{il} - \mu_{il}\|_2^2 \;\Big|\; \{w_l\}, \{\pi_i\}\Big] \geq c_2 \min\Big\{1, \frac{1}{\pi_i w_l n}\Big\}.$

In part (a) of Theorem 3, we see that the number of workers should be at least $1/(4\bar{D})$, otherwise any predictor will make many mistakes. This lower bound matches our sufficient condition on the number of workers $m$ (see Eq. (8)). In part (b), we see that the best possible estimate for $\mu_{il}$ has $\Omega(1/(\pi_i w_l n))$ mean-squared error. This verifies the optimality of our estimator $\hat{\mu}_{il}$. It is worth noting that the constraint on the number of items $n$ (see Eq. (8)) might be improvable. In real datasets we usually have $n \gg m$, so that optimality in $m$ is more important than in $n$.

It is worth contrasting our convergence rate with those of existing algorithms. Ghosh et al. [11] and Dalvi et al. [7] proposed consistent estimators for the binary one-coin model. To attain an error rate $\delta$, their algorithms require $m$ and $n$ scaling with $1/\delta^2$, while our algorithm only requires $m$ and $n$ scaling with $\log(1/\delta)$. Karger et al. [15, 14] proposed algorithms for both binary and multi-class problems. Their algorithm assumes that workers are assigned by a random regular graph. Moreover, their analysis assumes that the number of items goes to infinity, or that the number of workers is many times the number of items. Our algorithm no longer requires these assumptions.

We also compare our algorithm with the majority voting estimator, where the true label is simply estimated by a majority vote among workers. Gao and Zhou [10] showed that if there are many spammers and few experts, the majority voting estimator gives almost a random guess. In contrast, our algorithm only requires $m \bar{D} = \widetilde{\Omega}(1)$ to guarantee good performance. Since $m \bar{D}$ is the aggregated KL-divergence, a small number of experts are sufficient to ensure that it is large enough.
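Since the sample-size requirements above are stated in terms of the quantity D-bar from Eq. (6), it can be instructive to evaluate it for a given set of (estimated) confusion matrices. A minimal sketch, assuming the same (m, k, k) convention C[i, l, c] = mu_ilc as in the EM sketch, strictly positive entries, and pi stored as a length-m array of labeling probabilities:

```python
import numpy as np

def average_kl_gap(C, pi):
    """The quantity D-bar of Eq. (6): the minimum over label pairs l != l'
    of (1/m) * sum_i pi_i * KL(mu_il, mu_il')."""
    m, k, _ = C.shape
    best = np.inf
    for l in range(k):
        for lp in range(k):
            if l == lp:
                continue
            # Per-worker KL divergence between columns mu_il and mu_il'.
            kl = np.sum(C[:, l, :] * np.log(C[:, l, :] / C[:, lp, :]), axis=1)
            best = min(best, float(np.mean(pi * kl)))
    return best
```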
6 Experiments

In this section, we report the results of empirical studies comparing the algorithm we propose in Section 4 (referred to as Opt-D&S) with a variety of existing methods which are also based on the generative model of Dawid and Skene. Specifically, we compare to the Dawid & Skene estimator initialized by majority voting (referred to as MV-D&S), the pure majority voting estimator, the multi-class labeling algorithm proposed by Karger et al. [14] (referred to as KOS), the SVD-based algorithm proposed by Ghosh et al. [11] (referred to as Ghosh-SVD) and the "Eigenvalues of Ratio" algorithm proposed by Dalvi et al. [7] (referred to as EigenRatio). The evaluation is made on five real datasets.

We compare the crowdsourcing algorithms on three binary tasks and two multi-class tasks. Binary tasks include labeling bird species [22] (Bird dataset), recognizing textual entailment [21] (RTE dataset) and assessing the quality of documents in the TREC 2011 crowdsourcing track [16] (TREC dataset). Multi-class tasks include labeling the breed of dogs from ImageNet [9] (Dog dataset) and judging the relevance of web search results [26] (Web dataset). The statistics for the five datasets are summarized in Table 1. Since the Ghosh-SVD algorithm and the EigenRatio algorithm work on binary tasks, they are evaluated only on the Bird, RTE and TREC datasets. For the MV-D&S and the Opt-D&S methods, we iterate their EM steps until convergence.

Dataset name   #classes   #items   #workers   #worker labels
Bird           2          108      39         4,212
RTE            2          800      164        8,000
TREC           2          19,033   762        88,385
Dog            4          807      52         7,354
Web            5          2,665    177        15,567

Table 1: Summary of datasets used in the real data experiment.

Since entries of the confusion matrix are positive, we find it helpful to incorporate this prior knowledge into the initialization stage of the Opt-D&S algorithm. In particular, when estimating the confusion matrix entries by Eq. (3), we add an extra checking step before the normalization, examining whether the matrix components are greater than or equal to a small threshold $\Delta$. Components that are smaller than $\Delta$ are reset to $\Delta$. The default choice of the thresholding parameter is $\Delta = 10^{-6}$. Later, we compare the Opt-D&S algorithm with respect to different choices of $\Delta$. It is important to note that this modification doesn't change our theoretical result, since the thresholding is not needed in the case that the initialization error is bounded by Theorem 1. A minimal code sketch of this step is given at the end of this section.

Table 2 summarizes the performance of each method. The MV-D&S and the Opt-D&S algorithms consistently outperform the other methods in predicting the true label of items. The KOS algorithm, the Ghosh-SVD algorithm and the EigenRatio algorithm yield poorer performance, presumably due to the fact that they rely on idealized assumptions that are not met by the real data.

        Opt-D&S   MV-D&S   Majority Voting   KOS     Ghosh-SVD   EigenRatio
Bird    10.09     11.11    24.07             11.11   27.78       27.78
RTE     7.12      7.12     10.31             39.75   49.13       9.00
TREC    29.80     30.02    34.86             51.96   42.99       43.96
Dog     16.89     16.66    19.58             31.72   -           -
Web     15.86     15.74    26.93             42.93   -           -

Table 2: Error rate (%) in predicting true labels on real data.

In Figure 1, we compare the Opt-D&S algorithm with respect to different thresholding parameters $\Delta \in \{10^{-i}\}_{i=1}^6$. We plot results for three datasets (RTE, Dog, Web), where the performance of MV-D&S is equal to or slightly better than that of Opt-D&S. The plot shows that the performance of the Opt-D&S algorithm is stable after convergence, but at the first EM iterate the error rates are more sensitive to the choice of $\Delta$. A proper choice of $\Delta$ makes Opt-D&S outperform MV-D&S. The result suggests that a proper initialization combined with one EM iterate is good enough for the purposes of prediction. In practice, the best choice of $\Delta$ can be obtained by cross validation.

[Figure 1: Comparing MV-D&S and Opt-D&S with different thresholding parameter $\Delta$, in three panels: (a) RTE, (b) Dog, (c) Web. Each panel plots the label prediction error against the threshold $\Delta \in [10^{-6}, 10^{-1}]$, with curves for Opt-D&S and MV-D&S after the 1st and the 50th EM iteration.]
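The thresholding modification described above amounts to a one-line change before the normalization in Eq. (3). A minimal sketch, assuming a single unnormalized k x k estimate whose columns correspond to the columns of a worker's confusion matrix; the function name is illustrative and the default value of the threshold follows the text:

```python
import numpy as np

def threshold_then_normalize(C_raw, delta=1e-6):
    """Reset entries below the threshold delta to delta, then rescale each
    column to sum to one, as in the Opt-D&S initialization of Section 6."""
    C = np.maximum(C_raw, delta)
    return C / C.sum(axis=0, keepdims=True)
```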
References

[1] A. Anandkumar, D. P. Foster, D. Hsu, S. M. Kakade, and Y.-K. Liu. A spectral algorithm for latent Dirichlet allocation. arXiv preprint arXiv:1204.6703, 2012.
[2] A. Anandkumar, R. Ge, D. Hsu, and S. M. Kakade. A tensor spectral approach to learning mixed membership community models. In Annual Conference on Learning Theory, 2013.
[3] A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and M. Telgarsky. Tensor decompositions for learning latent variable models. arXiv preprint arXiv:1210.7559, 2012.
[4] A. Anandkumar, D. Hsu, and S. M. Kakade. A method of moments for mixture models and hidden Markov models. In Annual Conference on Learning Theory, 2012.
[5] A. T. Chaganty and P. Liang. Spectral experts for estimating mixtures of linear regressions. arXiv preprint arXiv:1306.3729, 2013.
[6] X. Chen, Q. Lin, and D. Zhou. Optimistic knowledge gradient policy for optimal budget allocation in crowdsourcing. In Proceedings of ICML, 2013.
[7] N. Dalvi, A. Dasgupta, R. Kumar, and V. Rastogi. Aggregating crowdsourced binary ratings. In Proceedings of the World Wide Web Conference, 2013.
[8] A. P. Dawid and A. M. Skene. Maximum likelihood estimation of observer error-rates using the EM algorithm. Journal of the Royal Statistical Society, Series C, pages 20–28, 1979.
[9] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE CVPR, 2009.
[10] C. Gao and D. Zhou. Minimax optimal convergence rates for estimating ground truth from crowdsourced labels. arXiv preprint arXiv:1310.5764, 2014.
[11] A. Ghosh, S. Kale, and P. McAfee. Who moderates the moderators? Crowdsourcing abuse detection in user-generated content. In Proceedings of the ACM Conference on Electronic Commerce, 2011.
[12] D. Hsu, S. M. Kakade, and T. Zhang. A spectral algorithm for learning hidden Markov models. Journal of Computer and System Sciences, 78(5):1460–1480, 2012.
[13] P. Jain and S. Oh. Learning mixtures of discrete product distributions using spectral decompositions. arXiv preprint arXiv:1311.2972, 2013.
[14] D. R. Karger, S. Oh, and D. Shah. Efficient crowdsourcing for multi-class labeling. In ACM SIGMETRICS, 2013.
[15] D. R. Karger, S. Oh, and D. Shah. Budget-optimal task allocation for reliable crowdsourcing systems. Operations Research, 62(1):1–24, 2014.
[16] M. Lease and G. Kazai. Overview of the TREC 2011 crowdsourcing track. In Proceedings of TREC 2011, 2011.
[17] E. Lehmann and G. Casella. Theory of Point Estimation. Springer, 2nd edition, 2003.
[18] P. Liang. Partial information from spectral methods. NIPS Spectral Learning Workshop, 2013.
[19] Q. Liu, J. Peng, and A. T. Ihler. Variational inference for crowdsourcing. In NIPS, 2012.
[20] V. C. Raykar, S. Yu, L. H. Zhao, G. H. Valadez, C. Florin, L. Bogoni, and L. Moy. Learning from crowds. Journal of Machine Learning Research, 11:1297–1322, 2010.
[21] R. Snow, B. O'Connor, D. Jurafsky, and A. Y. Ng. Cheap and fast—but is it good? Evaluating non-expert annotations for natural language tasks. In Proceedings of EMNLP, 2008.
[22] P. Welinder, S. Branson, S. Belongie, and P. Perona. The multidimensional wisdom of crowds. In NIPS, 2010.
[23] J. Whitehill, P. Ruvolo, T. Wu, J. Bergsma, and J. R. Movellan. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In NIPS, 2009.
[24] Y. Zhang, X. Chen, D. Zhou, and M. I. Jordan. Spectral methods meet EM: A provably optimal algorithm for crowdsourcing. arXiv preprint arXiv:1406.3824, 2014.
[25] D. Zhou, Q. Liu, J. C. Platt, and C. Meek. Aggregating ordinal labels from crowds by minimax conditional entropy. In Proceedings of ICML, 2014.
[26] D. Zhou, J. C. Platt, S. Basu, and Y. Mao. Learning from the wisdom of crowds by minimax entropy. In NIPS, 2012.
[27] J. Zou, D. Hsu, D. Parkes, and R. Adams. Contrastive learning using spectral methods. In NIPS, 2013.
