The Multidimensional Wisdom of Crowds

Peter Welinder¹, Steve Branson², Serge Belongie², Pietro Perona¹
¹California Institute of Technology, ²University of California, San Diego
{welinder,perona}@caltech.edu, {sbranson,sjb}@cs.ucsd.edu

Abstract

Distributing labeling tasks among hundreds or thousands of annotators is an increasingly important method for annotating large datasets. We present a method for estimating the underlying value (e.g. the class) of each image from (noisy) annotations provided by multiple annotators. Our method is based on a model of the image formation and annotation process. Each image has different characteristics that are represented in an abstract Euclidean space. Each annotator is modeled as a multidimensional entity with variables representing competence, expertise and bias. This allows the model to discover and represent groups of annotators that have different sets of skills and knowledge, as well as groups of images that differ qualitatively. We find that our model predicts ground truth labels on both synthetic and real data more accurately than state of the art methods. Experiments also show that our model, starting from a set of binary labels, may discover rich information, such as different "schools of thought" amongst the annotators, and can group together images belonging to separate categories.

1 Introduction

Producing large-scale training, validation and test sets is vital for many applications. Most often this job has to be carried out "by hand" and thus it is delicate, expensive, and tedious. Services such as Amazon Mechanical Turk (MTurk) have made it easy to distribute simple labeling tasks to hundreds of workers. Such "crowdsourcing" is increasingly popular and has been used to annotate large datasets in, for example, Computer Vision [8] and Natural Language Processing [7]. As some annotators are unreliable, the common wisdom is to collect multiple labels per exemplar and rely on "majority voting" to determine the correct label. We propose a model for the annotation process with the goal of obtaining more reliable labels with as few annotators as possible.

It has been observed that some annotators are more skilled and consistent in their labels than others. We postulate that the ability of annotators is multidimensional; that is, an annotator may be good at some aspects of a task but worse at others. Annotators may also attach different costs to different kinds of errors, resulting in different biases for the annotations. Furthermore, different pieces of data may be easier or more difficult to label. All of these factors contribute to a "noisy" annotation process resulting in inconsistent labels. Although approaches for modeling certain aspects of the annotation process have been proposed in the past [1, 5, 6, 9, 13, 4, 12], no attempt has been made to blend all characteristics of the process into a single unified model.

This paper has two main contributions: (1) we improve on current state-of-the-art methods for crowdsourcing by introducing a more comprehensive and accurate model of the human annotation process, and (2) we provide insight into the human annotation process by learning a richer representation that distinguishes amongst the different sources of annotator error. Understanding the annotation process can be important toward quantifying the extent to which datasets constructed from human data are "ground truth".

We propose a generative Bayesian model for the annotation process. We describe an inference algorithm to estimate the properties of the data being labeled and the annotators labeling them. We show on synthetic and real data that the model can be used to estimate data difficulty and annotator biases, while identifying annotators' different "areas of strength". While many of our results are valid for general labels and tasks, we focus on the binary labeling of images.

Figure 1: (a) Sample MTurk task where annotators were asked to click on images of Indigo Bunting (described in Section 5.2). (b) The image formation process. The class variable z_i models if the object (Indigo Bunting) will be present (z_i = 1) or absent (z_i = 0) in the image, while a number of "nuisance factors" influence the appearance of the image. The image is then transformed into a low-dimensional representation x_i which captures the main attributes that are considered by annotators in labeling the image. (c) Probabilistic graphical model of the entire annotation process where image formation is summarized by the nodes z_i and x_i. The observed variables, indicated by shaded circles, are the index i of the image, index j of the annotators, and value l_ij of the label provided by annotator j for image i. The annotation process is repeated for all i and for multiple j, thus obtaining multiple labels per image with each annotator labeling multiple images (see Section 3).

2 Related Work

The advantages and drawbacks of using crowdsourcing services for labeling large datasets have been explored by various authors [2, 7, 8]. In general, it has been found that many labels are of high quality [8], but a few sloppy annotators do low quality work [7, 12]; hence the need for efficient algorithms for integrating the labels from many annotators [5, 12]. A related topic is that of using paired games for obtaining annotations, which can be seen as a form of crowdsourcing [10, 11].

Methods for combining the labels from many different annotators have been studied before. Dawid and Skene [1] presented a model for multi-valued annotations where the biases and skills of the annotators were modeled by a confusion matrix. This model was generalized and extended to other annotation types by Welinder and Perona [12]. Similarly, the model presented by Raykar et al. [4] considered annotator bias in the context of training binary classifiers with noisy labels. Building on these works, our model goes a step further in modeling each annotator as a multidimensional classifier in an abstract feature space. We also draw inspiration from Whitehill et al. [13], who modeled both annotator competence and image difficulty, but did not consider annotator bias. Our model generalizes [13] by introducing a high-dimensional concept of image difficulty and combining it with a broader definition of annotator competence. Other approaches have been proposed for non-binary annotations [9, 6, 12]. By modeling annotator competence and image difficulty as multidimensional quantities, our approach achieves better performance on real data than previous methods and provides a richer output space for separating groups of annotators and images.

3 The Annotation Process

An annotator, indexed by j, looks at image I_i and assigns it a label l_ij. Competent annotators provide accurate and precise labels, while unskilled annotators provide inconsistent labels. There is also the possibility of adversarial annotators assigning labels that are opposite to those assigned by competent annotators. Annotators may have different areas of strength, or expertise, and thus provide more reliable labels on different subsets of images. For example, when asked to label images containing ducks, some annotators may be more aware of the distinction between ducks and geese while others may be more aware of the distinction between ducks, grebes, and cormorants (visually similar bird species). Furthermore, different annotators may weigh errors differently; one annotator may be intolerant of false positives, while another is more optimistic and accepts the cost of a few false positives in order to get a higher detection rate. Lastly, the difficulty of the image may also matter. A difficult or ambiguous image may be labeled inconsistently even by competent annotators, while an easy image is labeled consistently even by sloppy annotators. In modeling the annotation process, all of these factors should be considered.
We model the annotation process in a sequence of steps. N images are produced by some image capture/collection process. First, a variable z_i decides which set of "objects" contribute to producing an image I_i. For example, z_i ∈ {0, 1} may denote the presence/absence of a particular bird species. A number of "nuisance factors," such as viewpoint and pose, determine the image (see Figure 1).

Each image is transformed by a deterministic "visual transformation" converting pixels into a vector of task-specific measurements x_i, representing measurements that are available to the visual system of an ideal annotator. For example, x_i could be the firing rates of task-relevant neurons in the brain of the best human annotator. Another way to think about x_i is that it is a vector of visual attributes (beak shape, plumage color, tail length, etc.) that the annotator will consider when deciding on a label. The process of transforming z_i to the "signal" x_i is stochastic and it is parameterized by θ_z, which accounts for the variability in image formation due to the nuisance factors.

There are M annotators in total, and the set of annotators that label image i is denoted by J_i. An annotator j ∈ J_i, selected to label image I_i, does not have direct access to x_i, but rather to y_ij = x_i + n_ij, a version of the signal corrupted by annotator-specific and image-specific "noise" n_ij. The noise process models differences between the measurements that are ultimately available to individual annotators. These differences may be due to visual acuity, attention, direction of gaze, etc. The statistics of this noise are different from annotator to annotator and are parametrized by σ_j. Most significantly, the variance of the noise will be lower for competent annotators, as they are more likely to have access to a clearer and more consistent representation of the image than confused or unskilled annotators.

The vector y_ij can be understood as a perceptual encoding that encompasses all major components that affect an annotator's judgment on an annotation task. Each annotator is parameterized by a unit vector ŵ_j, which models the annotator's individual weighting on each of these components. In this way, ŵ_j encodes the training or expertise of the annotator in a multidimensional space. The scalar projection ⟨y_ij, ŵ_j⟩ is compared to a threshold τ̂_j. If the signal is above the threshold, the annotator assigns a label l_ij = 1, and l_ij = 0 otherwise.

4 Model and Inference

Putting together the assumptions of the previous section, we obtain the graphical model shown in Figure 1. We will assume a Bayesian treatment, with priors on all parameters. The joint probability distribution, excluding hyper-parameters for brevity, can be written as

$$p(\mathcal{L}, z, x, y, \sigma, \hat{w}, \hat{\tau}) = \prod_{j=1}^{M} p(\sigma_j)\, p(\hat{\tau}_j)\, p(\hat{w}_j) \prod_{i=1}^{N} \Big( p(z_i)\, p(x_i \mid z_i) \prod_{j \in \mathcal{J}_i} p(y_{ij} \mid x_i, \sigma_j)\, p(l_{ij} \mid \hat{w}_j, \hat{\tau}_j, y_{ij}) \Big), \qquad (1)$$

where we denote z, x, y, σ, τ̂, ŵ, and L to mean the sets of all the corresponding subscripted variables. This section describes further assumptions on the probability distributions. These assumptions are not necessary; however, in practice they simplify inference without compromising the quality of the parameter estimates.

Although both z_i and l_ij may be continuous or multivalued discrete in a more general treatment of the model [12], we henceforth assume that they are binary, i.e. z_i, l_ij ∈ {0, 1}. We assume a Bernoulli prior on z_i with p(z_i = 1) = β, and that x_i is normally distributed¹ with variance θ_z²,

$$p(x_i \mid z_i) = \mathcal{N}(x_i;\, \mu_z, \theta_z^2), \qquad (2)$$

where μ_z = −1 if z_i = 0 and μ_z = 1 if z_i = 1 (see Figure 2a). If x_i and y_ij are multi-dimensional, then σ_j is a covariance matrix. These assumptions are equivalent to using a mixture of Gaussians prior on x_i.

The noisy version of the signal x_i that annotator j sees, denoted by y_ij, is assumed to be generated by a Gaussian with variance σ_j² centered at x_i, that is p(y_ij | x_i, σ_j) = N(y_ij; x_i, σ_j²) (see Figure 2b). We assume that each annotator assigns the label l_ij according to a linear classifier. The classifier is parameterized by a direction ŵ_j of a decision plane and a bias τ̂_j. The label l_ij is deterministically chosen, i.e. l_ij = I(⟨ŵ_j, y_ij⟩ ≥ τ̂_j), where I(·) is the indicator function.

¹ We used the parameters β = 0.5 and θ_z = 0.8.
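The generative process above can be simulated directly. The following is a minimal sketch in Python (standard library only) of the one-dimensional case with ŵ_j = +1. It samples z_i ~ Bernoulli(β), draws x_i from Equation 2, adds annotator noise n_ij ~ N(0, σ_j²), and thresholds y_ij at τ̂_j. It then computes each simulated annotator's empirical sensitivity d' = Φ⁻¹(h) − Φ⁻¹(f) and threshold λ = −½(Φ⁻¹(h) + Φ⁻¹(f)) from its hit rate h and false-alarm rate f. These are compared with the values the model implies, 2/√(θ_z² + σ_j²) and τ̂_j/s, which are the signal-detection relations given with Equation 7. The function names and the three annotator settings are illustrative assumptions, not the authors' code.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal: PHI.cdf is Phi, PHI.inv_cdf is Phi^{-1}


def simulate_labels(n_images, annotators, beta=0.5, theta_z=0.8, seed=0):
    """Sample z_i, x_i and one binary label per (image, annotator), 1-D case.

    annotators: list of (w_hat, tau_hat, sigma) tuples, where w_hat is +1 or -1
    (a unit "vector" in one dimension) and sigma is the noise std sigma_j.
    Returns the true classes z and labels[j][i] = l_ij.
    """
    rng = random.Random(seed)
    z, labels = [], [[] for _ in annotators]
    for _ in range(n_images):
        z_i = 1 if rng.random() < beta else 0      # z_i ~ Bernoulli(beta)
        mu_z = 1.0 if z_i == 1 else -1.0
        x_i = rng.gauss(mu_z, theta_z)             # Eq. (2): x_i ~ N(mu_z, theta_z^2)
        z.append(z_i)
        for j, (w_hat, tau_hat, sigma) in enumerate(annotators):
            y_ij = x_i + rng.gauss(0.0, sigma)     # y_ij = x_i + n_ij, shared x_i per image
            labels[j].append(1 if w_hat * y_ij >= tau_hat else 0)  # l_ij = I(<w_hat, y_ij> >= tau_hat)
    return z, labels


def empirical_sdt(z, labels_j):
    """d' and lambda from one annotator's hit rate h and false-alarm rate f."""
    on_pos = [l for zi, l in zip(z, labels_j) if zi == 1]  # labels on class-1 images
    on_neg = [l for zi, l in zip(z, labels_j) if zi == 0]  # labels on class-0 images
    zh = PHI.inv_cdf(sum(on_pos) / len(on_pos))            # Phi^{-1}(h)
    zf = PHI.inv_cdf(sum(on_neg) / len(on_neg))            # Phi^{-1}(f)
    return zh - zf, -0.5 * (zh + zf)


def model_sdt(tau_hat, sigma, theta_z=0.8):
    """d' and lambda implied by the model for w_hat = +1: Eq. (7) and lambda = tau_hat / s."""
    s = math.sqrt(theta_z ** 2 + sigma ** 2)
    return 2.0 / s, tau_hat / s


if __name__ == "__main__":
    annotators = [(1.0, 0.0, 0.3), (1.0, 0.5, 0.8), (1.0, -0.5, 1.5)]
    z, labels = simulate_labels(20000, annotators)
    for (_, tau_hat, sigma), labels_j in zip(annotators, labels):
        d_emp, lam_emp = empirical_sdt(z, labels_j)
        d_mod, lam_mod = model_sdt(tau_hat, sigma)
        print(f"sigma_j={sigma}, tau_hat={tau_hat}: empirical d'={d_emp:.2f}, "
              f"lambda={lam_emp:.2f}; model d'={d_mod:.2f}, lambda={lam_mod:.2f}")
```

With 20,000 simulated images, the empirical and model-implied values should agree to within sampling error. The comparison in `model_sdt` assumes ŵ_j = +1; for an adversarial annotator with ŵ_j = −1, the empirical d' comes out negative.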
It is possible to integrate out y_ij and put l_ij in direct dependence on x_i,

$$p(l_{ij} = 1 \mid x_i, \sigma_j, \hat{\tau}_j) = \Phi\left(\frac{\langle \hat{w}_j, x_i \rangle - \hat{\tau}_j}{\sigma_j}\right), \qquad (3)$$

where Φ(·) is the cumulative standardized normal distribution, a sigmoidal-shaped function. In order to remove the constraint on ŵ_j being a direction, i.e. ‖ŵ_j‖₂ = 1, we reparameterize the problem with w_j = ŵ_j/σ_j and τ_j = τ̂_j/σ_j. Furthermore, to regularize w_j and τ_j during inference, we give them Gaussian priors parameterized by α and γ respectively. The prior on τ_j is centered at the origin and is very broad (standard deviation 3). For the prior on w_j, we kept the center close to the origin to be initially pessimistic of the annotator competence, and to allow for adversarial annotators (mean 1, std 3). All of the hyperparameters were chosen somewhat arbitrarily to define a scale for the parameter space, and in our experiments we found that results (such as error rates in Figure 3) were quite insensitive to variations in the hyperparameters. The modified Equation 1 becomes

$$p(\mathcal{L}, x, w, \tau) = \prod_{j=1}^{M} p(\tau_j \mid \gamma)\, p(w_j \mid \alpha) \prod_{i=1}^{N} \Big( p(x_i \mid \theta_z, \beta) \prod_{j \in \mathcal{J}_i} p(l_{ij} \mid x_i, w_j, \tau_j) \Big). \qquad (4)$$

Figure 2: Assumptions of the model. (a) Labeling is modeled in a signal detection theory framework, where the signal y_ij that annotator j sees for image I_i is produced by one of two Gaussian distributions. Depending on y_ij and annotator parameters w_j and τ_j, the annotator labels 1 or 0. (b) The image representation x_i is assumed to be generated by a Gaussian mixture model where z_i selects the component. The figure shows 8 different realizations x_i (x_1, ..., x_8), generated from the mixture model. Depending on the annotator j, noise n_ij is added to x_i. The three lower plots show the noise distributions for three different annotators (A, B, C), with increasing "incompetence" σ_j. The biases τ_j of the annotators are shown with the red bars. Image no. 4, represented by x_4, is the most ambiguous image, as it is very close to the optimal decision plane at x_i = 0. (c) An example of 2-dimensional x_i. The red line shows the decision plane for one annotator.

The only observed variables in the model are the labels L = {l_ij}, from which the other parameters have to be inferred. Since we have priors on the parameters, we proceed by MAP estimation, where we find the optimal parameters (x*, w*, τ*) by maximizing the posterior on the parameters,

$$(x^\star, w^\star, \tau^\star) = \arg\max_{x, w, \tau}\, p(x, w, \tau \mid \mathcal{L}) = \arg\max_{x, w, \tau}\, m(x, w, \tau), \qquad (5)$$

where we have defined m(x, w, τ) = log p(L, x, w, τ) from Equation 4. Thus, to do inference, we need to optimize

$$m(x, w, \tau) = \sum_{i=1}^{N} \log p(x_i \mid \theta_z, \beta) + \sum_{j=1}^{M} \log p(w_j \mid \alpha) + \sum_{j=1}^{M} \log p(\tau_j \mid \gamma) + \sum_{i=1}^{N} \sum_{j \in \mathcal{J}_i} \Big[ l_{ij} \log \Phi(\langle w_j, x_i \rangle - \tau_j) + (1 - l_{ij}) \log\big(1 - \Phi(\langle w_j, x_i \rangle - \tau_j)\big) \Big]. \qquad (6)$$

To maximize (6) we carry out alternating optimization using gradient ascent. We begin by fixing the x parameters and optimizing Equation 6 for (w, τ) using gradient ascent. Then we fix (w, τ) and optimize for x using gradient ascent, iterating between fixing the image parameters and annotator parameters back and forth. Empirically, we have observed that this optimization scheme usually converges within 20 iterations.

Figure 3: (a) and (b) show the correlation between the ground truth and estimated parameters as the number of annotators increases on synthetic data for 1-d and 2-d x_i and w_j. (c) Performance of our model in predicting z_i on the data from (a), compared to majority voting, the model of [1], and GLAD [13]. (d) Performance on real labels collected from MTurk. See Section 5.1 for details on (a-c) and Section 5.2 for details on (d).

In the derivation of the model above, there is no restriction on the dimensionality of x_i and w_j; they may be one-dimensional scalars or higher-dimensional vectors. In the former case, assuming ŵ_j = 1, the model is equivalent to a standard signal detection theoretic model [3] where a signal y_ij is generated by one of two Normal distributions p(y_ij | z_i) = N(y_ij | μ_z, s²) with variance s² = θ_z² + σ_j², centered on μ_0 = −1 and μ_1 = 1 for z_i = 0 and z_i = 1 respectively (see Figure 2a). In signal detection theory, the sensitivity index, conventionally denoted d', is a measure of how well the annotator can discriminate the two values of z_i [14]. It is defined as the Mahalanobis distance between μ_0 and μ_1 normalized by s,

$$d' = \frac{\mu_1 - \mu_0}{s} = \frac{2}{\sqrt{\theta_z^2 + \sigma_j^2}}. \qquad (7)$$

Thus, the lower σ_j, the better the annotator can distinguish between classes of z_i, and the more "competent" the annotator is. The sensitivity index can also be computed directly from the false alarm rate f and hit rate h using d' = Φ⁻¹(h) − Φ⁻¹(f), where Φ⁻¹(·) is the inverse of the cumulative normal distribution [14]. Similarly, the "threshold", which is a measure of annotator bias, can be computed by λ = −½(Φ⁻¹(h) + Φ⁻¹(f)). A large positive λ means that the annotator attributes a high cost to false positives, while a large negative λ means the annotator avoids false negative mistakes. Under the assumptions of our model, λ is related to τ_j by the relation λ = τ̂_j/s.

In the case of higher dimensional x_i and w_j, each component of the x_i vector can be thought of as an attribute or a high level feature. For example, the task may be to label only images with a particular bird species, say "duck", with label 1, and all other images with 0. Some images contain no birds at all, while other images contain birds similar to ducks, such as geese or grebes. Some annotators may be more aware of the distinction between ducks and geese and others may be more aware of the distinction between ducks, grebes and cormorants. In this case, x_i can be considered to be 2-dimensional. One dimension represents image attributes that are useful in the distinction between ducks and geese, and the other dimension models parameters that are useful in the distinction between ducks and grebes (see Figure 2c). Presumably all annotators see the same attributes, signified by x_i, but they use them differently. The model can distinguish between annotators with preferences for different attributes, as shown in Section 5.2.

Image difficulty is represented in the model by the value of x_i (see Figure 2b). If there is a particular ground truth decision plane, (w_0, τ_0), images I_i with x_i close to the plane will be more difficult for annotators to label. This is because the annotators see a noise-corrupted version, y_ij, of x_i. How well the annotators can label a particular image depends on both the closeness of x_i to the ground truth decision plane and the annotator's "noise" level, σ_j. Of course, if the annotator bias τ_j is far from the ground truth decision plane, the labels for images near the ground truth decision plane will be consistent for that annotator, but not necessarily correct.

5 Experiments

5.1 Synthetic Data

To explore whether the inference procedure estimates image and annotator parameters accurately, we tested our model on synthetic data generated according to the model's assumptions. Similar to the experimental setup in [13], we generated 500 synthetic image parameters and simulated between 4 and 20 annotators labeling each image. The procedure was repeated 40 times to reduce the noise in the results.

We generated the annotator parameters by randomly sampling σ_j from a Gamma distribution (shape 1.5 and scale 0.3) and biases τ_j from a Normal distribution centered at 0 with standard deviation 0.5. The direction of the decision plane w_j was +1 with probability 0.99 and −1 with probability 0.01. The image parameters x_i were generated by a two-dimensional Gaussian mixture model with two components of standard deviation 0.8 centered at −1 and +1. The image ground truth label z_i, and thus the mixture component from which x_i was generated, was sampled from a Bernoulli distribution with p(z_i = 1) = 0.5.

Figure 4: Ellipse dataset. (a) The images to be labeled were fuzzy ellipses (oriented uniformly from 0 to π) enclosed in dark circles. The task was to select ellipses that were more vertical than horizontal (the former are marked with green circles in the figure). (b-d) The image difficulty parameters x_i, annotator competence 2/s, and bias τ̂_j/s learned by our model are compared to the ground truth equivalents. The closer x_i is to 0, the more ambiguous/difficult the discrimination task, corresponding to ellipses that have close to 45° orientation.

For each trial, we measured the correlation between the ground truth values of each parameter and the values estimated by the model. We averaged Spearman's rank correlation coefficient for each parameter over all trials. The result of the simulated labeling process is shown in Figure 3a. As can be seen from the figure, the model estimates the parameters accurately, with the accuracy increasing as the number of annotators labeling each image increases. We repeated a similar experiment with 2-dimensional x_i and w_j (see Figure 3b). As one would expect, estimating higher dimensional x_i and w_j requires more data.
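The inference procedure of Section 4 can be tried on data drawn as in the synthetic experiment above. The following is a minimal, standard-library-only Python sketch of the 1-D case. It is not the authors' implementation.

The sketch draws one dataset with the Section 5.1 annotator and image settings, using only 200 images, 10 annotators (each labeling every image) and a single trial; these sizes are assumptions of the sketch. It then maximizes m(x, w, τ) of Equation 6 by alternating gradient ascent: steps on (w, τ) with x fixed, then steps on x with (w, τ) fixed.

The following settings come from the text:
- β = 0.5 and θ_z = 0.8.
- A prior on w_j with mean 1 and std 3.
- A prior on τ_j centered at 0 with std 3.

The following choices are our own assumptions:
- Initializing x_i from the vote means.
- Scaling each step by the parameter's label count.
- Clipping each step.
- The learning rate and the iteration counts.

Images are assigned to class 1 when the estimated x_i > 0, which is the posterior decision under the mixture prior when β = 0.5.

```python
import math
import random

SQRT2PI = math.sqrt(2.0 * math.pi)

def mills(u):
    """phi(u) / Phi(u), i.e. d/du log Phi(u), computed with erfc for stability."""
    if u < -30.0:
        return -u  # asymptotic value of the ratio for very negative u
    return (math.exp(-0.5 * u * u) / SQRT2PI) / (0.5 * math.erfc(-u / math.sqrt(2.0)))

def sigmoid(a):
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def clip(v, limit=0.5):
    # safeguard (our assumption): cap each update so a bad step cannot diverge
    return max(-limit, min(limit, v))

def make_data(n_images=200, n_annot=10, seed=0):
    """1-D synthetic labels with the Section 5.1 settings; every annotator labels every image."""
    rng = random.Random(seed)
    z = [1 if rng.random() < 0.5 else 0 for _ in range(n_images)]
    x = [rng.gauss(1.0 if zi else -1.0, 0.8) for zi in z]
    labels = []  # (i, j, l_ij) triples
    for j in range(n_annot):
        sigma = rng.gammavariate(1.5, 0.3)  # shape 1.5, scale 0.3
        tau_hat = rng.gauss(0.0, 0.5)
        w_hat = 1.0 if rng.random() < 0.99 else -1.0
        for i in range(n_images):
            y = x[i] + rng.gauss(0.0, sigma)
            labels.append((i, j, 1 if w_hat * y >= tau_hat else 0))
    return z, labels

def majority_vote(labels, n_images):
    tally = [0] * n_images
    for i, _, l in labels:
        tally[i] += 2 * l - 1
    return [1 if t > 0 else 0 for t in tally]  # ties go to class 0

def map_estimate(labels, n_images, n_annot, beta=0.5, theta_z=0.8,
                 w_prior=(1.0, 3.0), tau_prior=(0.0, 3.0), outer=20, inner=10, lr=0.1):
    """Alternating gradient ascent on m(x, w, tau) of Eq. (6), 1-D case."""
    n_i, n_j, vote_sum = [0] * n_images, [0] * n_annot, [0] * n_images
    for i, j, l in labels:
        n_i[i] += 1
        n_j[j] += 1
        vote_sum[i] += 2 * l - 1
    x = [vote_sum[i] / max(n_i[i], 1) for i in range(n_images)]  # start from vote means
    w, tau = [w_prior[0]] * n_annot, [tau_prior[0]] * n_annot
    log_odds = math.log(beta / (1.0 - beta))

    def label_grads():
        # d/du [l log Phi(u) + (1 - l) log(1 - Phi(u))] at u = w_j x_i - tau_j,
        # which equals s * mills(s * u) with s = 2 l - 1.
        for i, j, l in labels:
            s = 2 * l - 1
            yield i, j, s * mills(s * (w[j] * x[i] - tau[j]))

    for _ in range(outer):
        for _ in range(inner):  # ascend in (w, tau) with x fixed
            gw = [-(w[j] - w_prior[0]) / w_prior[1] ** 2 for j in range(n_annot)]
            gt = [-(tau[j] - tau_prior[0]) / tau_prior[1] ** 2 for j in range(n_annot)]
            for i, j, g in label_grads():
                gw[j] += g * x[i]
                gt[j] -= g
            for j in range(n_annot):  # step scaled by the annotator's label count
                w[j] += clip(lr * gw[j] / (n_j[j] + 1))
                tau[j] += clip(lr * gt[j] / (n_j[j] + 1))
        for _ in range(inner):  # ascend in x with (w, tau) fixed
            # gradient of log p(x_i | theta_z, beta), the two-component mixture prior
            gx = [(2.0 * sigmoid(2.0 * xi / theta_z ** 2 + log_odds) - 1.0 - xi) / theta_z ** 2
                  for xi in x]
            for i, j, g in label_grads():
                gx[i] += g * w[j]
            for i in range(n_images):
                x[i] += clip(lr * gx[i] / (n_i[i] + 1))
    return x, w, tau

def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

if __name__ == "__main__":
    z, labels = make_data()
    x, w, tau = map_estimate(labels, n_images=len(z), n_annot=10)
    z_map = [1 if xi > 0 else 0 for xi in x]  # equals p(z_i = 1 | x_i) > 0.5 when beta = 0.5
    print(f"MAP: {accuracy(z_map, z):.3f}, majority vote: {accuracy(majority_vote(labels, len(z)), z):.3f}")
```

Run as a script, it prints the accuracy of the MAP estimate and of majority voting on the same simulated labels. The exact numbers depend on the seed and the step-size choices above, and are not meant to reproduce Figure 3c.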
We also examined how well our model estimated the binary class values, z_i. For comparison, we also tried three other methods on the same data: a simple majority voting rule for each image, the bias-competence model of [1], and the GLAD algorithm from [13]², which models 1-d image difficulty and annotator competence, but not bias. As can be seen from Figure 3c, our method presents a small but consistent improvement. In a separate experiment (not shown), we generated synthetic annotators with increasing bias parameters τ_j. We found that GLAD performs worse than majority voting when the variance in the bias between different annotators is high (≳ 0.8); this was expected, as GLAD does not model annotator bias. Similarly, increasing the proportion of difficult images degrades the performance of the model from [1]. The performance of our model points to the benefits of modeling all aspects of the annotation process.

² We used the implementation of GLAD available on the first author's website (http://mplab.ucsd.edu/~jake/). We varied the prior in their code between 1 and 10 to achieve best performance.

5.2 Human Data

We next conducted experiments on annotation results from real MTurk annotators. To compare the performance of the different models on a real discrimination task, we prepared a dataset of 200 images of birds (100 with Indigo Bunting, and 100 with Blue Grosbeak), and asked 40 annotators per image if it contained at least one Indigo Bunting; this is a challenging task (see Figure 1). The annotators were given a description and example photos of the two bird species. Figure 3d shows how the performance varies as the number of annotators per image is increased. We sampled a subset of the annotators for each image. Our model also outperformed the other approaches on this dataset.

To demonstrate that annotator competence, annotator bias, image difficulty, and multi-dimensional decision surfaces are important real-life phenomena affecting the annotation process, and to quantify our model's ability to adapt to each of them, we tested our model on three different image datasets: one based on pictures of rotated ellipses, another based on synthetically generated "greebles", and a third dataset with images of waterbirds.

Ellipse Dataset: Annotators were given the simple task of selecting ellipses which they believed to be more vertical than horizontal. This dataset was chosen to make the model's predictions quantifiable, because ground truth class labels and ellipse angle parameters are known to us for each test image (but hidden from the inference algorithm). By definition, ellipses at an angle of 45° are impossible to classify, and we expect that images gradually become easier to classify as the angle moves away from 45°. We used a total of 180 ellipse images, with rotation angle varying from 1° to 180°, and collected labels from 20 MTurk annotators for each image. In this dataset, the estimated image parameters x_i and annotator parameters w_j are 1-dimensional, where the magnitudes encode image difficulty and annotator competence respectively. Since we had ground truth labels, we could compute the false alarm and hit rates for each annotator, and thus compute λ and d' for comparison with τ̂_j/s and 2/s (see Equation 7 and following text). The results in Figure 4b-d show that annotator competence and bias vary among annotators. Moreover, the figure shows that our model accurately estimates image difficulty, annotator competence, and annotator bias on data from real MTurk annotators.

Greeble Dataset: In the second experiment, annotators were shown pictures of "greebles" (see Figure 5) and were told that the greebles belonged to one of two classes. Some annotators were told that the two greeble classes could be discriminated by height, while others were told they could be discriminated by color (yellowish vs. green). This was done to explore the scenario in which annotators have different types of prior knowledge or abilities. We used a total of 200 images with 20 annotators labeling each image. The height and color parameters for the two types of greebles were randomly generated according to Gaussian distributions with centers (1, 1) and (−1, −1), and standard deviations of 0.8. The results in Figure 5 show that the model successfully learned two clusters of annotator decision surfaces, one (green) of which responds mostly to the first dimension of x_i (color) and another (red) responding mostly to the second dimension of x_i (height). These two clusters coincide with the sets of annotators primed with the two different attributes. Additionally, for the second attribute, we observed a few "adversarial" annotators whose labels tended to be inverted from their true values. This was because the instructions to our color annotation task were ambiguously worded, so that some annotators had become confused and had inverted their labels. Our model robustly handles these adversarial labels by inverting the sign of the ŵ vector.

Figure 5: Estimated image parameters (symbols) and annotator decision planes (lines) for the greeble experiment. Our model learns two image parameter dimensions x_i^1 and x_i^2 which roughly correspond to color and height, and identifies two clusters of annotator decision planes, which correctly correspond to annotators primed with color information (green lines) and height information (red lines). On the left are example images of class 1, which are shorter and more yellow (red and blue dots are uncorrelated with class), and on the right are images of class 2, which are taller and more green. C and F are easy for all annotators, A and H are difficult for annotators that prefer height but easy for annotators that prefer color, D and E are difficult for annotators that prefer color but easy for annotators that prefer height, and B and G are difficult for all annotators.

Waterbird Dataset: The greeble experiment shows that our model is able to segregate annotators looking for different attributes in images. To see whether the same phenomenon could be observed in a task involving images of real objects, we constructed an image dataset of waterbirds. We collected 50 photographs each of the bird species Mallard, American Black Duck, Canada Goose and Red-necked Grebe. In addition to the 200 images of waterbirds, we also selected 40 images without any birds at all (such as photos of various nature scenes and objects) or where birds were too small to be seen clearly, making 240 images in total. For each image, we asked 40 annotators on MTurk if they could see a duck in the image (only Mallards and American Black Ducks are ducks). The hypothesis
was that some annotators would be able to discriminate ducks from the two other bird species, while others would confuse ducks with geese and/or grebes. Results from the experiment, shown in Figure 6, suggest that there are at least three different groups of annotators, those who separate: (1) ducks from everything else, (2) ducks and grebes from everything else, and (3) ducks, grebes, and geese from everything else; see numbered circles in Figure 6. Interestingly, the first group of annotators was better at separating out Canada geese than Red-necked grebes. This may be because Canada geese are quite distinctive with their long, black necks, while the grebes have shorter necks and look more duck-like in most poses. There were also a few outlier annotators that did not provide answers consistent with any other annotators. This is a common phenomenon on MTurk, where a small percentage of the annotators will provide bad quality labels in the hope of still getting paid [7]. We also compared the labels predicted by the different models to the ground truth. Majority voting performed at 68.3% correct labels, GLAD at 60.4%, and our model at 75.4%.

Figure 6: Estimated image and annotator parameters on the Waterbirds dataset. The annotators were asked to select images containing at least one "duck". The estimated x_i parameters for each image are marked with symbols that are specific to the class the image belongs to. The arrows show the x_i coordinates of some example images. The gray lines are the decision planes of the annotators. The darkness of the lines is an indicator of ‖w_j‖: darker gray means the model estimated the annotator to be more competent. Notice how the annotators' decision planes fall roughly into three clusters, marked by the blue circles and discussed in Section 5.2.

6 Conclusions

We have proposed a Bayesian generative probabilistic model for the annotation process. Given only binary labels of images from many different annotators, it is possible to infer not only the underlying class (or value) of the image, but also parameters such as image difficulty and annotator competence and bias. Furthermore, the model represents both the images and the annotators as multidimensional entities, with different high level attributes and strengths respectively. Experiments with images annotated by MTurk workers show that different annotators indeed have variable competence levels and widely different biases, and that the annotators' classification criterion is best modeled in multidimensional space. Ultimately, our model can accurately estimate the ground truth labels by integrating the labels provided by several annotators with different skills, and it does so better than the current state of the art methods.

Besides estimating ground truth classes from binary labels, our model provides information that is valuable for defining loss functions and for training classifiers. For example, the image parameters estimated by our model could be taken into account for weighing different training examples, or, more generally, they could be used for a softer definition of ground truth. Furthermore, our findings suggest that annotators fall into different groups depending on their expertise and on how they perceive the task. This could be used to select annotators that are experts on certain tasks and to discover different schools of thought on how to carry out a given task.

Acknowledgements

P.P. and P.W. were supported by ONR MURI Grant #N00014-06-1-0734 and EVOLUT.ONR2. S.B. was supported by NSF CAREER Grant #0448615, NSF Grant AGS-0941760, ONR MURI Grant #N00014-08-1-0638, and a Google Research Award.
References

[1] A. P. Dawid and A. M. Skene. Maximum likelihood estimation of observer error-rates using the EM algorithm. J. Roy. Statistical Society, Series C, 28(1):20–28, 1979.
[2] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR, 2009.
[3] D. M. Green and J. M. Swets. Signal detection theory and psychophysics. John Wiley and Sons Inc, New York, 1966.
[4] V. C. Raykar, S. Yu, L. H. Zhao, A. Jerebko, C. Florin, G. H. Valadez, L. Bogoni, and L. Moy. Supervised Learning from Multiple Experts: Whom to trust when everyone lies a bit. In ICML, 2009.
[5] V. S. Sheng, F. Provost, and P. G. Ipeirotis. Get another label? Improving data quality and data mining using multiple, noisy labelers. In KDD, 2008.
[6] P. Smyth, U. Fayyad, M. Burl, P. Perona, and P. Baldi. Inferring ground truth from subjective labelling of Venus images. In NIPS, 1995.
[7] R. Snow, B. O'Connor, D. Jurafsky, and A. Y. Ng. Cheap and Fast - But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks. In EMNLP, 2008.
[8] A. Sorokin and D. Forsyth. Utility data annotation with Amazon Mechanical Turk. In First IEEE Workshop on Internet Vision at CVPR '08, 2008.
[9] M. Spain and P. Perona. Some objects are more equal than others: measuring and predicting importance. In ECCV, 2008.
[10] L. von Ahn and L. Dabbish. Labeling images with a computer game. In SIGCHI Conference on Human Factors in Computing Systems, pages 319–326, 2004.
[11] L. von Ahn, B. Maurer, C. McMillen, D. Abraham, and M. Blum. reCAPTCHA: Human-based character recognition via web security measures. Science, 321(5895):1465–1468, 2008.
[12] P. Welinder and P. Perona. Online crowdsourcing: rating annotators and obtaining cost-effective labels. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (ACVHL), 2010.
[13] J. Whitehill, P. Ruvolo, T. Wu, J. Bergsma, and J. Movellan. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In NIPS, 2009.
[14] T. D. Wickens. Elementary signal detection theory. Oxford University Press, United States, 2002.