
Facing Imbalanced Data: Recommendations for the Use of Performance Metrics

László A. Jeni (1), Jeffrey F. Cohn (1, 2), and Fernando De La Torre (1)
(1) Carnegie Mellon University, Pittsburgh, PA, laszlo.jeni@ieee.org, ftorre@cs.cmu.edu
(2) University of Pittsburgh, Pittsburgh, PA, jeffcohn@cs.cmu.edu

Abstract: Recognizing facial action units (AUs) is important for situation analysis and automated video annotation. Previous work has emphasized face tracking and registration and the choice of features and classifiers. Relatively neglected is the effect of imbalanced data on action unit detection. While the machine learning community has become aware of the problem of skewed data for training classifiers, little attention has been paid to how skew may bias performance metrics. To address this question, we conducted experiments using both simulated classifiers and three major databases that differ in size, type of FACS coding, and degree of skew. We evaluated the influence of skew on both threshold metrics (Accuracy, F-score, Cohen's kappa, and Krippendorff's alpha) and rank metrics (area under the receiver operating characteristic (ROC) curve and area under the precision-recall curve). With the exception of the area under the ROC curve, all were attenuated by skewed distributions, in many cases dramatically so. While ROC was unaffected by skew, precision-recall curves suggest that ROC may mask poor performance. Our findings suggest that skew is a critical factor in evaluating performance metrics. To avoid or minimize skew-biased estimates of performance, we recommend reporting skew-normalized scores along with the obtained ones.

I. INTRODUCTION

Our everyday communication is highly influenced by the emotional information available to us from other people. Recognizing facial expression is important for situation analysis and automated video annotation. In the last decade many approaches have been proposed for automatic facial expression recognition [7], [29]. Although previous work has emphasized face tracking and registration and the choice of features and classifiers, the effect of imbalanced data when evaluating action unit detection has been relatively neglected.

In the case of facial expression data, the samples can be annotated using either emotion-specified labels (e.g., happy or sad) or action units, as defined by the Facial Action Coding System (FACS) [10]. Action units are anatomically defined facial actions that, singly or in combination, can describe nearly all possible facial expressions or movements. Action unit (AU) detection, as well as expression detection, of which AU detection is a subset, is a typical binary classification problem in which the vast majority of examples are from one class, but the practitioner is typically interested in the minority (positive) class.

The problem of learning from imbalanced datasets is twofold. First, from the perspective of classifier training, imbalance in the training data distribution often causes learning algorithms to perform poorly on the minority class. This issue has been well addressed in the machine learning literature [4], [15], [27], [26], [8]. A common solution is to sample the data prior to training to re-balance the class distribution [2], [27]. An alternative to sampling is cost-sensitive learning. This approach targets the problem of skew by applying different cost matrices that describe the costs of misclassifying any particular data point [26], [8]. For a more detailed survey of the problem, see [16] and the references therein.

Second, relatively little attention has been paid to how skew may spoil performance metrics. Facial expression data is typically highly skewed, and imbalance in the test data distribution can produce misleading conclusions with certain metrics. Percentage agreement, referred to as accuracy, is especially vulnerable to bias from skew. When the base rate is low, high accuracy can result even when alternative methods rarely if ever agree [12], [14]. Agreement in that case is about the very large number of negative cases rather than the very few positive ones. Alternative metrics have been proposed to address this issue [24], [15]. Ferri et al. studied the relationship between different performance metrics and addressed the problem of rank correlations between them [12].
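To make the base-rate argument concrete, the following minimal Python sketch (an illustration, not code from this paper) scores a hypothetical detector that labels every frame negative on a test set in which negatives outnumber positives 30 to 1, a ratio assumed here only for illustration. The helper names `confusion_always_negative`, `accuracy`, and `f1_score` are hypothetical.

```python
# Sketch: accuracy of a degenerate "always negative" detector on a skewed test set.
# The 30:1 class ratio is an illustrative assumption, not a value reported in the paper.

def confusion_always_negative(n_pos: int, n_neg: int) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for a detector that labels every example negative."""
    return 0, 0, n_neg, n_pos

def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    # Fraction of all examples whose predicted label matches the true label.
    return (tp + tn) / (tp + fp + tn + fn)

def f1_score(tp: int, fp: int, tn: int, fn: int) -> float:
    # F1 is taken as 0 when there are no true positives (precision/recall are 0 or undefined).
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    n_pos = 100
    n_neg = 30 * n_pos  # negatives outnumber positives 30 to 1
    cm = confusion_always_negative(n_pos, n_neg)
    print(f"accuracy = {accuracy(*cm):.3f}")  # accuracy = 0.968
    print(f"F1       = {f1_score(*cm):.3f}")  # F1       = 0.000
```

The detector never finds a single positive example, yet its accuracy is close to 0.97 because agreement is dominated by the negative class; the F1-score exposes the failure.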
How does skewed data influence performance metrics for action unit detection? To address this question, we conducted experiments using both simulated classifiers and three major databases that include both posed and spontaneous facial expression and differ in database size, type of FACS coding [9], [10], and degree of skew. The databases were Cohn-Kanade [21], RU-FACS [13], and the UNBC-McMaster Pain Archive [22]. We included a broad range of metrics, covering both threshold metrics (Accuracy, F1-score, Cohen's kappa, and Krippendorff's alpha) and rank metrics (area under the ROC curve [11] and area under the precision-recall curve). With the exception of the area under the ROC curve, all were attenuated by skewed distributions; in many cases, dramatically so. Alpha and kappa were affected by skew in either direction, whereas F1-score was affected by skew in only one direction. While ROC was unaffected by skew, precision-recall curves can reveal differences between classifiers because the two curves represent performance differently: very different precision-recall curves can be associated with the same ROC curve.

Our findings suggest that skew is a critical factor in evaluating performance metrics. Metrics of classifier performance may reveal more about skew than they do about actual performance. Databases that are otherwise identical with respect to intensity of action units, head pose, and so on may give rise to very different metric values depending only on differences in skew. This finding has implications for testing classifiers so as to avoid or minimize confounds, and for meta-analyses of classifier performance. The sensitivity of the threshold metrics to skewed distributions could be reduced by balancing the distribution of the datasets.

The paper is organized as follows. The datasets and their properties are reviewed in Section II. The methods are described in Section III. Experimental results on the effect of imbalanced data on performance metrics and AU classification are detailed in Section IV. A discussion and summary conclude the paper (Section V).

II. DATASETS

First, we describe the datasets (Sections II.A to II.C). We then report findings with respect to skew for each AU (Section II.D). In our experiments we used three major databases that include both posed and spontaneous facial expression and differ in database size, type of FACS coding, and degree of skew. The databases were Cohn-Kanade, RU-FACS, and the UNBC-McMaster Pain Archive.

A. Cohn-Kanade Extended

The Cohn-Kanade Extended Facial Expression (CK+) Database [21] is an extension of the original Cohn-Kanade Database [18]. Cohn-Kanade has been widely used to compare the performance of different methods of automated facial expression analysis. CK+ includes 593 frontal image sequences of directed facial action tasks (i.e., posed AUs and AU combinations) performed by 123 different participants. Facial landmarks (68-point mesh) were tracked using person-specific active appearance models [28]. Twenty-seven action units were manually coded for presence or absence by certified FACS coders. For a subset of 118 sequences, the seven universal emotion expressions (anger, contempt, disgust, fear, happy, sad, and surprise) plus neutral were labeled. We used all 593 sequences in the current study.

B. McMaster Pain Archive

The Pain Archive [22] consists of facial expressions of 129 participants who were suffering from shoulder pain. The participants performed different active and passive motion tests with their affected and unaffected limbs on two separate occasions. The distributed version contains 200 video sequences with 48,398 frames from 25 participants. All of the frames were FACS coded for 12 AUs by certified FACS coders and have frame-level pain scores, sequence-level self-report, and observer measures. Facial landmarks (66-point mesh) were tracked using person-specific active appearance models [28].

C. RU-FACS Database

The version of RU-FACS available to us consisted of unscripted (i.e., spontaneous) facial behavior from 34 participants. Participants had been randomly assigned to either lie or tell the truth about an issue about which they had strong feelings. The scenario involves natural interaction with another person. AUs were manually coded for each video frame. Video from five participants had to be excluded due to excessive noise in the digitized video; thus, video from 29 participants was used. Facial landmarks were tracked with a 68-point mesh using the same AAM implementation [3].

D. Imbalance in the Datasets

Action unit classification is a typical two-class problem. The positive class is the given action unit that we want to detect, and the negative class contains all of the other examples. Unless databases have been contrived to minimize skew, skew is quite common. Most facial actions have relatively low rates of occurrence. Smile controls, actions that counteract the upward pull of a smile (e.g., AU 14 or AU 15), occur less than 3% of the time even in a highly social context [25]. Thus, for action unit detection, the number of positive training examples will often be small, which can result in a large imbalance between the positive and negative examples. While skew in training sets can be adjusted by under-sampling negative cases, skew in test sets remains. The imbalance of this type of data can be defined by the skew ratio between the classes:

$$\mathrm{Skew} = \frac{\#\,\text{negative examples}}{\#\,\text{positive examples}} \qquad (1)$$

Table I shows the skew ratios of action units from the three datasets. In the small, posed CK+ dataset, the average skew ratio is around 30. In the larger, spontaneous datasets the skew ratio is even more extreme: about 60 in the Pain Archive and over 80 in RU-FACS.

[Table I: Database statistics. For more details, see text.]

III. METHODS

We tuned high-precision, shape-based AU classifiers on each dataset. Details of the methods are presented in Section III.A. To evaluate the effect of skew on the classifiers, we used a broad range of both threshold and rank metrics; these are described in Section III.B. In Section III.C we describe random sampling methods to balance the distribution of the testing partition of the datasets.

A. Training AU Classifiers

Our method contains two main steps. First, we estimate 3D landmark positions on face images using a 2D/3D AAM method [23]; we describe the details of this technique in Section III-A1. Second, we remove the rigid transformation from the acquired 3D shape and perform SVM-based binary classification on it, using the different AUs as the class labels. We describe this step in Sections III-A2 and III-A3.
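Equation (1) in Section II.D can be evaluated per action unit directly from frame-level annotations. The sketch below assumes, purely for illustration, that each AU's labels are stored as a list of 0/1 values (1 = AU present); the function names `skew_ratio` and `skew_per_au` and the toy data are hypothetical and are not taken from the authors' code.

```python
# Sketch: per-AU skew ratio (Eq. 1) from frame-level binary labels.
# Assumed layout: {AU name: [0/1 label per frame]}; the toy data is made up.
from typing import Dict, List

def skew_ratio(labels: List[int]) -> float:
    """Skew = #negative examples / #positive examples (Eq. 1)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        return float("inf")  # the AU never occurs, so the ratio is unbounded
    return n_neg / n_pos

def skew_per_au(frame_labels: Dict[str, List[int]]) -> Dict[str, float]:
    """Apply Eq. (1) separately to every action unit."""
    return {au: skew_ratio(ys) for au, ys in frame_labels.items()}

if __name__ == "__main__":
    toy = {
        "AU12": [1] * 20 + [0] * 80,  # 20 positives in 100 frames -> skew 4.0
        "AU14": [1] * 3 + [0] * 97,   # rare smile control -> skew ~32.3
    }
    for au, s in skew_per_au(toy).items():
        print(au, round(s, 1))
```

Returning infinity for an AU with no positive frames keeps the function total; a caller may prefer to drop such AUs from the evaluation instead.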
1) Active Appearance Models: As noted above, each of the datasets had been tracked using person-specific AAMs. AAMs are generative parametric models for face alignment. A 3D shape model is defined by a 3D mesh and, in particular, by the 3D vertex locations of the mesh, called landmark points. Consider the 3D shape as the coordinates of the 3D vertices that make up the mesh:

$$\mathbf{x} = (x_1, y_1, z_1, \ldots, x_M, y_M, z_M)^T \qquad (2)$$

or $\mathbf{x} = (\mathbf{x}_1, \ldots, \mathbf{x}_M)^T$, where $\mathbf{x}_i = (x_i, y_i, z_i)^T$. We have $T$ samples $\{\mathbf{x}^{(t)}\}_{t=1}^{T}$. We assume that, apart from scale, rotation, and translation, all samples $\{\mathbf{x}^{(t)}\}_{t=1}^{T}$ can be approximated by means of linear principal component analysis (PCA). The interested reader is referred to [23] for the details of the 2D/3D AAM algorithm.

2) Extracted Features: To register face images, 3D structure from motion was first estimated using the method of Xiao et al. [28]. We then extracted the normalized 3D shape parameters by removing the rigid transformation. Next, we performed personal mean shape normalization [17]: we calculated an average shape for each subject (the so-called personal mean shape) and computed the differences between the features of the actual shape and the features of the personal mean shape. This step removes person-specific differences in facial shape (between-person variation), leaving the expression-related deformations.

3) Support Vector Machine for AU Detection: After extracting the normalized 3D shape, we performed SVM-based binary classification using each AU in turn as the positive class label. Negative labels were all other AUs.

Support Vector Machines (SVMs) are powerful for binary and multi-class classification as well as for regression problems, and they are robust against outliers [1]. For two-class separation, an SVM estimates the optimal separating hyper-plane between the two classes by maximizing the margin between the hyper-plane and the closest points of the classes. These closest points are called support vectors. They determine the optimal separating hyper-plane, which lies halfway between them. We are given sample and label pairs $(\mathbf{x}^{(k)}, y^{(k)})$ with $\mathbf{x}^{(k)} \in \mathbb{R}^m$, $y^{(k)} \in \{-1, 1\}$, and $k = 1, \ldots, K$. Here, for class 1 (class 2), $y^{(k)} = 1$ ($y^{(k)} = -1$). Assume further that we have a feature map $\boldsymbol{\phi} = [\phi_1, \ldots, \phi_M]: \mathbb{R}^m \to \mathbb{R}^M$, where $M$ might be infinite. Support vector classification seeks to minimize the cost function

$$\min_{\mathbf{w}, b, \boldsymbol{\xi}} \; \frac{1}{2}\mathbf{w}^T\mathbf{w} + C \sum_{k=1}^{K} \xi_k \qquad (3)$$

subject to

$$y^{(k)}\left(\mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}^{(k)}) + b\right) \geq 1 - \xi_k, \qquad \xi_k \geq 0. \qquad (4)$$

We used binary classification for each AU, where the positive class contains all shapes labelled with the given AU and the negative class contains every other shape. In all cases, we used only linear classifiers and varied the regularization parameter $C$ by factors of ten from $10^{-4}$ to $10^{2}$.

B. Performance Metrics

In a binary classification problem the labels are either positive or negative. The decisions made by the classifier can be represented as a 2×2 confusion matrix with four categories. True positives (TP) are positive examples correctly labeled as positive. False positives (FP) are negative examples incorrectly labeled as positive. True negatives (TN) are negative examples correctly labeled as negative, and false negatives (FN) are positive examples incorrectly labeled as negative. From these categories we can derive two performance metrics of the classifier: precision, $P = \frac{TP}{TP + FP}$, and recall, $R = \frac{TP}{TP + FN}$. Precision is the fraction of recognized instances that are relevant, while recall is the fraction of relevant instances that are retrieved.

For the comparison we used both threshold metrics (Accuracy, F1-score, Cohen's kappa, and Krippendorff's alpha) and rank metrics (area under the ROC curve and area under the precision-recall curve).

1) Threshold Metrics: The threshold metrics used in this paper are Accuracy, F1-score, Cohen's kappa, and Krippendorff's alpha. These metrics rely on a threshold: examples above the threshold are predicted as positive and the rest as negative. For these metrics, it does not matter how close a prediction is to the threshold, only whether it is above or below it.

Accuracy is the percentage of correctly classified positive and negative examples:

$$\mathrm{Acc} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (5)$$

Accuracy is a widely used metric for measuring the performance of a classifier; however, when the prior probabilities of the classes are very different, it can be misleading. A better choice is the F1-score, the harmonic mean of precision and recall:

$$F_1 = \frac{2PR}{P + R} \qquad (6)$$

Cohen's kappa is a coefficient developed to measure agreement among observers [6]. It expresses the observed agreement corrected for the agreement expected by chance:

$$\kappa = \frac{P_{\mathrm{Obs}} - P_{\mathrm{Chance}}}{1 - P_{\mathrm{Chance}}} \qquad (7)$$
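The threshold metrics of Eqs. (5) to (7) can be computed directly from confusion-matrix counts. In this minimal sketch, the helper `simulated_confusion` is a hypothetical stand-in for a classifier that mislabels a fixed fraction of each class, in the spirit of the simulated classifiers described in Section IV.A; the function names and the 5% error rate are assumptions, and Krippendorff's alpha is omitted to keep the example short.

```python
# Sketch: threshold metrics (Eqs. 5-7) from a 2x2 confusion matrix.
# simulated_confusion() is a hypothetical helper that mislabels a fixed
# fraction of each class, mimicking the simulated classifiers of Section IV.A.

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)                      # Eq. (5)

def f1(tp, fp, tn, fn):
    # Algebraically equal to 2PR / (P + R) (Eq. 6); 0 when there are no true positives.
    return 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0

def cohen_kappa(tp, fp, tn, fn):
    n = tp + fp + tn + fn
    p_obs = (tp + tn) / n
    # Chance agreement: sum over classes of (predicted rate) x (true rate).
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return (p_obs - p_chance) / (1 - p_chance)                   # Eq. (7)

def simulated_confusion(n_pos, skew, error_rate):
    """(TP, FP, TN, FN) when `error_rate` of each class is misclassified."""
    n_neg = int(round(n_pos * skew))
    fn = round(n_pos * error_rate)
    fp = round(n_neg * error_rate)
    return n_pos - fn, fp, n_neg - fp, fn

if __name__ == "__main__":
    for skew in (1, 10, 50):
        cm = simulated_confusion(n_pos=1000, skew=skew, error_rate=0.05)
        print(f"skew={skew:>2}  acc={accuracy(*cm):.3f}  "
              f"F1={f1(*cm):.3f}  kappa={cohen_kappa(*cm):.3f}")
```

With the same 5% error rate on each class, accuracy stays at 0.95 for every skew in this setup, while F1 falls from 0.95 at Skew = 1 to about 0.43 at Skew = 50 and kappa falls from 0.90 to about 0.41.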
Krippendorff's α-reliability measures the observed disagreement normalized to the disagreement expected by chance [19], [20]:

$$\alpha = 1 - \frac{D_{\mathrm{Obs}}}{D_{\mathrm{Chance}}} \qquad (8)$$

2) Rank Metrics: Rank metrics depend only on the ordering of the cases, not on the actual predicted values. As long as the ordering is preserved, it makes no difference which intervals the predicted values fall into. These metrics measure how well the positive cases are ordered before the negative cases and can be viewed as a summary of model performance across all possible thresholds. The rank metrics we use are the area under the ROC curve (AUC-ROC) and the area under the precision-recall curve (AUC-PR). The ROC curve depicts the true positive rate (TPR) as a function of the false positive rate, while the precision-recall curve shows precision as a function of recall. Recall is the same as TPR, whereas precision measures the fraction of examples classified as positive that are truly positive.

C. Skew Normalization using Random Sampling

Different forms of re-sampling, such as random over- and under-sampling, can be used to balance the skewed distribution of the test partitions of the dataset before calculating the performance metrics.

Random under-sampling tries to balance the class distribution through the random elimination of majority class examples. The major drawback of random under-sampling is that it can discard examples that could be important for the performance metric.

In this paper we used random under-sampling with averaging: first, we under-sample the majority class, then we calculate the performance metrics. We repeat this process, with the number of repetitions depending on the skew present in the data, and average the resulting scores.

IV. EXPERIMENTS

We performed a number of evaluations to judge the influence of skewed distributions on the performance metrics. The studies concern (i) simulated classifiers with given relative misclassification rates and (ii) the effect of skewed distributions on performance scores using different databases for AU classification.

A. Experiments on Simulated Classifiers

In this experiment we simulated binary classifiers with different properties to better understand the effect of skew on the performance metrics. The classifiers differed in their relative misclassification rate: a fixed percentage of the positive (and negative) examples was misclassified, in proportion to the number of positive (and negative) examples. For example, in the "5% case", 5% of the positive examples were labelled as false negatives (FN), and 5% of the negative examples were labelled as false positives (FP). For the threshold metrics, the scores were calculated from confusion matrices, while the rank metrics were calculated by drawing random samples from Gaussian distributions representing the decision values of the classifiers.

Fig. 1 depicts the different metric scores as a function of the skew ratio. Skew = 1 represents a fully balanced dataset, Skew > 1 indicates that the negative samples are the majority, and Skew < 1 indicates that positive samples dominate the distribution. With the exception of the area under the ROC curve, all metrics are attenuated by skewed distributions. Alpha and kappa are affected by skew in either direction, whereas F1-score is affected by skew in one direction only. Random performance corresponds to a value of 0 for alpha and kappa, but for F1 it changes as a function of skew: in the balanced case (Skew = 1) it corresponds to a score of 0.5, and it drops sharply as skew increases. It is important to note that even the best (1% error rate) classifier's performance drops significantly in the high-skew part of the graph (Skew = 50). This level of imbalance is equal to or even below the skew ratio present in spontaneous facial behaviour datasets (see Table I).

[Figure 1: The behaviour of different metrics using simulated classifiers. The horizontal axis depicts the skew ratio (Skew = Negative examples / Positive examples, from 0.02 to 50), while the vertical axis shows the given metric score (0 to 1). The metrics are (a) Accuracy, (b) Cohen's kappa, (c) Area under ROC curve, (d) F1 score, (e) Krippendorff's alpha, (f) Area under PR curve. The different lines show the relative misclassification rates of the simulated classifiers (1%, 5%, 10%, 20%).]

B. Experiments on Real Data

In this experiment we studied the effect of skewed AU distributions on CK+, the McMaster Pain Archive, and RU-FACS. For the CK+ dataset we used leave-one-subject-out cross-validation to make maximal use of the data available in the database. For the RU-FACS and Pain datasets, for each AU we divided the data into a training and a testing set such that the skew ratios of the two sets were similar. We calculated the F1-score, kappa, and alpha measures and the areas under the ROC and PR curves. Tables II and III show these measures in the columns labelled 'original'. We then repeated the same procedure, but this time we balanced the distribution of the classes in the testing set using random under-sampling and averaging. The resulting performance scores are shown in the 'normalized' columns of Tables II and III.

From the results, we can draw several observations. First, examining the scores in the imbalanced case of the CK+ dataset, we found that the performance is similar to that of other shape-based methods in the literature [5], [17]. Second, comparing the skew-normalized results to the imbalanced ones, we noticed that all scores except the area under the ROC curve improved. The average F1-score increased from 0.45 to 0.77 for CK+, from 0.23 to 0.68 for RU-FACS, and from 0.17 to 0.65 for the Pain data. The difference between the scores is smallest for the CK+ data, because it is the smallest dataset with the smallest skew ratio (around 20) among the three. The improvement is smaller for kappa and alpha: these measures are somewhat stricter and somewhat tolerant of the prior distributions of the classes. The differences in the area under the PR curve are comparable to the F1-score improvements. Third, while ROC was unaffected by skew, the precision-recall curves suggest that ROC may mask poor performance in some cases.

V. DISCUSSION AND SUMMARY

In the present work, we addressed the question of how imbalanced datasets influence performance metrics. We conducted studies using three major databases that include both posed and spontaneous facial expression and differ in database size, type of FACS coding, and degree of imbalance. The databases were Cohn-Kanade, RU-FACS, and the McMaster Pain Archive. We included metrics used in
facial behaviour analysis plus some others, namely both threshold metrics (Accuracy, F1-score, Cohen's kappa, and Krippendorff's alpha) and rank metrics (area under the ROC curve and area under the precision-recall curve). We used a variety of evaluations to study the influence of imbalanced distributions on performance metrics, using simulated classifiers as well as binary SVMs trained on expert-annotated datasets. We found that, with the exception of the area under the ROC curve, all performance metrics were attenuated by imbalanced distributions; in many cases, dramatically so. Alpha and kappa were affected by skew in either direction, whereas F1-score was affected by skew in only one direction. While ROC was unaffected by skew, precision-recall curves suggest that ROC may mask poor performance. Metrics of classifier performance may reveal more about skew than they do about actual performance. Databases that are otherwise identical with respect to intensity of action units, head pose, and so on may give rise to very different metric values depending only on differences in skew. To avoid or minimize biased estimates of performance, we recommend that investigators report both the obtained performance metrics and skew-normalized scores. Alternatively, they can report both the obtained scores and the degree of skew in the databases.¹ In these ways, classifiers can be compared across databases free of confounds introduced by skew.

¹ Code to compute skew-normalized scores for all of the metrics considered above, as well as visualizations, is available from http://www.pitt.edu/jeffcohn/skew/

[Table II: Performance scores for the original and the Skew = 1 normalized versions of the UNBC-McMaster Pain Archive and RU-FACS.]

[Table III: Performance scores on Cohn-Kanade Extended.]

VI. ACKNOWLEDGMENTS

Research reported in this publication was supported in part by the National Institute of Mental Health of the National Institutes of Health under Award Number MH096951. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

REFERENCES

[1] S. Abe. Support vector machines for pattern classification. Springer, (2010).
[2] R. Akbani, S. Kwek, N. Japkowicz. Applying support vector machines to imbalanced datasets. Machine Learning: ECML 2004, Springer Berlin Heidelberg, pp. 39–50, (2004).
[3] A. B. Ashraf, S. Lucey, J. F. Cohn, T. Chen, Z. Ambadar, K. M. Prkachin, P. E. Solomon. The painful face: Pain expression recognition using active appearance models. Image and Vision Computing 27(12), pp. 1788–1796, (2009).
[4] N. V. Chawla, N. Japkowicz, A. Kotcz. Editorial: special issue on learning from imbalanced data sets. SIGKDD Explorations Newsletter 6(1), pp. 1–6, June (2004).
[5] S. W. Chew, P. J. Lucey, S. Lucey, J. Saragih, J. F. Cohn, S. Sridharan. Person-independent facial expression detection using constrained local models. Proceedings of FG 2011 Facial Expression Recognition and Analysis Challenge, Santa Barbara, CA, (2011).
[6] J. Cohen. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20(1), pp. 37–46, (1960).
[7] J. F. Cohn, F. De la Torre. Automated face analysis for affective computing. In Handbook of Affective Computing, R. A. Calvo, S. K. D'Mello, J. Gratch, and A. Kappas, Eds. New York, NY: Oxford, in press.
[8] T. Eitrich, B. Lang. Efficient optimization of support vector machine learning parameters for unbalanced datasets. Journal of Computational and Applied Mathematics 196(2), pp. 425–436, (2006).
[9] P. Ekman, W. Friesen. Facial action coding system: A technique for the measurement of facial movement. Consulting Psychologists Press, Palo Alto, (1978).
[10] P. Ekman, W. Friesen, J. Hager. Facial action coding system: Research nexus. Network Research Information, Salt Lake City, UT, (2002).
[11] T. Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, Elsevier Science Inc., pp. 861–874, (2006).
[12] C. Ferri, J. Hernández-Orallo, R. Modroiu. An experimental comparison of performance measures for classification. Pattern Recognition Letters 30(1), pp. 27–38, (2009).
[13] M. Frank, J. Movellan, M. Bartlett, G. Littleworth. RU-FACS-1 database. Machine Perception Laboratory, U.C. San Diego.
[14] V. Garcia, R. A. Mollineda, J. S. Sanchez. Index of balanced accuracy: A performance measure for skewed class distributions. Pattern Recognition and Image Analysis, Springer Berlin Heidelberg, pp. 441–448, (2009).
[15] V. Garcia, R. A. Mollineda, J. S. Sanchez. Theoretical analysis of a performance measure for imbalanced data. 20th International Conference on Pattern Recognition (ICPR), IEEE, (2010).
[16] H. He, E. A. Garcia. Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering 21(9), pp. 1263–1284, (2009).
[17] L. A. Jeni, A. Lorincz, T. Nagy, Zs. Palotai, J. Sebok, Z. Szabo, D. Takacs. 3D shape estimation in video sequences provides high precision evaluation of facial expressions. Image and Vision Computing 30(10), pp. 785–795, 21 February (2012).
[18] T. Kanade, J. F. Cohn, Y. Tian. Comprehensive database for facial expression analysis. Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition (FG'00), Grenoble, France, pp. 46–53, (2000).
[19] K. Krippendorff. Estimating the reliability, systematic error and random error of interval data. Educational and Psychological Measurement 30, pp. 61–70, (1970).
[20] K. Krippendorff. Content analysis: An introduction to its methodology (2nd ed.). Thousand Oaks, CA: Sage, (2004).
[21] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, I. Matthews. The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression. 3rd IEEE Workshop on CVPR for Human Communicative Behavior Analysis, (2010).
[22] P. Lucey, J. F. Cohn, K. M. Prkachin, P. E. Solomon, I. Matthews. Painful data: The UNBC-McMaster shoulder pain expression archive database. IEEE International Conference on Automatic Face and Gesture Recognition and Workshops (FG 2011), pp. 57–64, 21–25 March (2011).
[23] I. Matthews, S. Baker. Active appearance models revisited. International Journal of Computer Vision 60(2), pp. 135–164, (2004).
[24] R. Ranawana, V. Palade. Optimized Precision: A new measure for classifier performance evaluation. IEEE Congress on Evolutionary Computation (CEC 2006), IEEE, (2006).
[25] M. A. Sayette, K. G. Creswell, J. D. Dimoff, C. E. Fairbairn, J. F. Cohn, B. W. Heckman, T. R. Kirchner, J. M. Levine, R. L. Moreland. Alcohol and group formation: A multimodal investigation of the effects of alcohol on emotion and social bonding. Psychological Science 23(8), pp. 869–878, (2012).
[26] Y. Tang, Y.-Q. Zhang, N. V. Chawla, S. Krasser. SVMs modeling for highly imbalanced classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 39(1), pp. 281–288, (2009).
[27] J. Van Hulse, T. M. Khoshgoftaar, A. Napolitano. Experimental perspectives on learning from imbalanced data. Proceedings of the 24th International Conference on Machine Learning, ACM, (2007).
[28] J. Xiao, S. Baker, I. Matthews, T. Kanade. Real-time combined 2D+3D active appearance models. Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, D.C., USA, pp. 535–542, (2004).
[29] Z. Zeng, M. Pantic, G. I. Roisman, T. S. Huang. A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence 31(1), pp. 39–58, (2009).
