Support Vector Machines Under Adversarial Label Noise
JMLR Workshop and Conference Proceedings

Battista Biggio (biggio@diee.unica.it)
Dept. of Electrical and Electronic Engineering, University of Cagliari, Piazza d'Armi, 09123 Cagliari, Italy

Blaine Nelson (blaine.nelson@wsii.uni-tuebingen.de)
Dept. of Mathematics and Natural Sciences, Eberhard-Karls-Universität Tübingen, Sand, Tübingen, Germany
effect of outliers in training data. Similarly to the previous case, this method is also based on a different definition of the loss function, which yields a non-convex optimization problem, approximately solved through a convex relaxation.

From a theoretical standpoint, robustness was studied in the context of both classical statistics and machine learning. The robust statistics approach (Huber, 1981; Hampel et al., 1986; Maronna et al., 2006) has studied general properties of statistical estimators under the change of the underlying distributions. A well-known instrument of such analysis is the so-called influence function. The robustness issues of margin-based learning methods have been studied by Christmann and Steinwart (2004). In particular, they studied the behavior of SVM-like algorithms under small perturbations of the training data and proved that, under some conditions, the influence function of SVMs can be bounded.

3. Label Noise Robust SVMs

In this section we introduce our approach, Label Noise robust SVMs (LN-robust SVMs), to improve SVMs' robustness to label noise in training data. We point out that, with respect to previous works, this approach does not affect the computational complexity of the standard SVM learning algorithm, as it only yields a simple kernel matrix correction.

Label noise can be explicitly modelled by assuming that the labels in the training set {x_i, y_i}_{i=1}^n ∈ X × {−1, +1} can be flipped. To this end, we first introduce a set of random variables ε_i ∈ {0, 1}, i = 1, ..., n, which represent whether the corresponding label y_i is flipped (ε_i = 1) or not (ε_i = 0). Accordingly, we then replace y_i with y'_i = y_i(1 − 2ε_i), such that y'_i = −y_i if ε_i = 1 (label flip), while y'_i = y_i otherwise. In the dual SVM problem (Problem 3) the class labels solely affect the matrix Q = K ∘ yy^T. In particular, taking label noise into account, we can write its elements as

    Q_ij = y_i y_j K(x_i, x_j)(1 − 2ε_i)(1 − 2ε_j).    (4)

Note that, in the absence of noise, ε_i = 0 for i = 1, ..., n, and thus the elements of Q are simply Q_ij = y_i y_j K(x_i, x_j), as in the standard SVM formulation. If we assume that every label is independently
flipped with the same probability μ, then the ε_i, i = 1, ..., n, are n i.i.d. Boolean random variables, whose mean is simply μ, the probability of ε_i = 1, and whose variance is σ² = μ(1 − μ). Under this assumption, we can compute the expected value of Q from Eq. 4, which is given by

    E_ε[Q_ij] = { y_i y_j K(x_i, x_j)(1 − 4σ²),  if i ≠ j;
                { y_i y_j K(x_i, x_j),           otherwise.    (5)

Now, we can use the expected value of Q (which is still a positive semi-definite kernel matrix) to solve the SVM problem. This should reasonably improve the robustness of the learned SVM to label flip noise. The proposed method only yields a kernel matrix correction (Eq. 5), and does not modify the standard SVM problem. However, it is a heuristic method and is thus not guaranteed to fulfill any optimality criterion (e.g., being optimal under the considered noise model). The solution is symmetric with respect to μ = 0.5, i.e., the values obtained for μ and 1 − μ are the same, and are exactly the same as the standard SVM solution when μ is either 0 or 1 (as the corresponding kernel correction is zero). Moreover, the equations of w and b obtained by solving the standard SVM problem have to be multiplied by 1 − 2μ (as we take their expectations over label noise). Thus, when μ > 0.5, w and b are multiplied by a negative factor. This reflects the fact that more than half of the training points are assumed to have a wrong label, and, thus, the decision regions are inverted. For instance, when μ = 1, the solution is exactly given by −w and −b (w and b being the standard SVM solution): we are in fact assuming that all samples are wrongly labelled in the training set and, consequently, the hyperplane obtained by the standard SVM is rotated by 180°. Note lastly that, if μ = 0.5, then w = 0. This is a degenerate case in which the labels in the training data are assumed to be completely random, so the SVM is not able to determine, on average, any meaningful decision hyperplane.
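In code, the correction of Eq. 5 amounts to scaling the off-diagonal entries of Q by 1 − 4σ² while leaving the diagonal untouched. A minimal sketch, assuming NumPy; the function name `ln_robust_kernel` is ours, not from the paper:

```python
import numpy as np

def ln_robust_kernel(K, y, mu):
    """Expected value of Q = K * y y^T under i.i.d. label flips with
    probability mu (Eq. 5): off-diagonal entries are scaled by
    1 - 4*sigma^2, with sigma^2 = mu*(1 - mu); the diagonal (the
    i = j case) is left untouched."""
    Q = K * np.outer(y, y)
    S = 4.0 * mu * (1.0 - mu)            # S = 4 * sigma^2
    EQ = (1.0 - S) * Q                   # scale every entry ...
    np.fill_diagonal(EQ, np.diag(Q))     # ... then restore the diagonal
    return EQ                            # = (1 - S) * Q + S * diag(Q)
```

Since E_ε[Q] = (1 − S)Q + S·diag(Q) with S ∈ [0, 1], it is a sum of two positive semi-definite matrices, which is why the correction preserves positive semi-definiteness; the symmetry in μ ↔ 1 − μ is also immediate, as S depends on μ only through σ².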
3.1. Dual problem and α equalization

We are now in a position to better analyze the change induced by the kernel correction of Eq. 5 in the SVM dual problem. Indeed, the dual problem can be re-written as

    min_α  (1/2) α^T (Q ∘ M) α − 1^T α
    s.t.   0 ≤ α_i ≤ C, i = 1, ..., n;  Σ_{i=1}^n α_i y_i = 0,    (6)

where the elements M_ij of M are given by

    M_ij = { 1,      if i = j;
           { 1 − S,  otherwise,    (7)

where we use S = 4σ² to simplify notation. The matrix M can be further decomposed as M = (1 − S) 1_{n×n} + S I_{n×n}, where 1_{n×n} is an n × n matrix whose elements are all ones, and I_{n×n} is the n × n identity matrix. Substituting this decomposition of M into Problem 6, adding and subtracting S Σ_{i=1}^n α_i from the Lagrangian, and dividing it by 1 − S, yields the following (equivalent) dual problem:

    min_α  (1/2) α^T Q α − 1^T α + S/(1 − S) [ (1/2) α^T (Q ∘ I_{n×n}) α − 1^T α ]
    s.t.   0 ≤ α_i ≤ C, i = 1, ..., n;  Σ_{i=1}^n α_i y_i = 0,    (8)

where the only difference with the standard SVM formulation is given by an additional term weighted by S/(1 − S). This reveals some interesting insights about the effect of the proposed kernel correction. First, note that as μ increases from 0 to 0.5, S/(1 − S) approaches infinity; namely, the α values become determined only by minimizing the latter term in Problem 8. Second, this term does not depend on the class labels, as it only involves α and the diagonal of Q (which is indeed equal to the diagonal of K). The above observations highlight that, as [...]

[...] and thus we resort to a heuristic approach which has shown to be quite effective in our set of experiments (see Sect. 6). The idea behind the adversarial label flip attack is first to flip the labels of samples with non-uniform probabilities, depending on how well they are classified by the SVM learned on the untainted training set; and, second, to repeat this process a number of times, eventually retaining the label flips which maximally decreased performance. In particular, we increase the probability of flipping the labels of samples which are classified with very high confidence (i.e., non-support vectors), and decrease the probability of flipping the labels of support vectors and error vectors (inversely proportional to their α value). The reason is that the former (mainly, the non-support vectors) are more likely to become support vectors or error vectors when the SVM is learned on the tainted training set and, consequently, the decision hyperplane will be closer to them. This will reflect a considerable change in the SVM solution and, potentially, in its classification accuracy. Furthermore, the labels of samples in different classes can be
flipped in a correlated way, to force the hyperplane to rotate as much as possible. To this end, one can draw a random hyperplane (w_rnd, b_rnd) in feature space, and further increase the probability of flipping the label of a positive sample x+ (respectively, a negative one x−) if w_rnd^T x+ + b_rnd < 0 (respectively, w_rnd^T x− + b_rnd ≥ 0). We implemented the above described attack as Algorithm 1, using two weighting parameters β1 and β2, both set to 0.1 (based on some preliminary experimental observations, we found that these values achieved good results). A simple example of application of this attack strategy is reported in the next section.

5. Toy example

We present here a simple toy example to demonstrate the adversarial label-flipping attack, and how the kernel correction proposed in Sect. 3 can effectively counteract both random and adversarial label flips. We generate a two-dimensional dataset of 100 samples, where samples of class z ∈ {−1, +1} are drawn from a Normal distribution with mean [z, 0]^T and (diagonal) covariance matrix equal to (1/2) I. An SVM with linear kernel is learned on this (untainted) training set, as depicted in Fig. 1 (first plot from left in the top row). Then, we flip the labels of 10 samples using the adversarial label flip attack described in the previous section. Note from Fig. 1 (second to fourth plot from left in the top row; label flips are highlighted with green circles) that: (1) the adversarial label flips mainly affect samples which are farther away from the untainted SVM decision boundary, and (2) the correlation imposed between label flips of samples of different classes induces a substantial change in the tainted SVM decision boundary (second plot from left in the top row).
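The toy dataset above can be reproduced in a few lines; a minimal sketch, assuming scikit-learn for the linear SVM (plotting omitted, and the random seed is an arbitrary choice of ours):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# 100 samples in 2D: class z in {-1, +1} has mean [z, 0] and covariance (1/2) I
y = np.repeat([-1, 1], 50)
X = rng.normal(0.0, np.sqrt(0.5), size=(100, 2))
X[:, 0] += y                                # shift the first feature by the class label z

# standard SVM with linear kernel on the untainted training set
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```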
Besides this, note how the SVMs learned using μ = 0.1 and μ = 0.5 (third and fourth plot in the top row) are able to compensate for the adversarial label flips, although not completely. To better understand this behavior and confirm the correctness of the observations in Sect. 3.1, we also plot the α values of the standard SVMs and of the proposed LN-robust SVMs against the scores (i.e., the distances from the hyperplane) assigned by each SVM to each training sample (see Fig. 1, bottom row). The mean and variance of the α values for each SVM are also reported. As expected, the variance of the α values of the LN-robust SVMs decreases with respect to the standard SVMs, and decreases more as μ approaches 0.5.¹ This confirms that the solution of the LN-robust SVM is expected to be less sparse than that of the standard SVM and, thus, less sensitive to outliers in training data.

1. From now on, with μ = 0.5 we will implicitly assume μ = 0.5 − ε, with ε > 0 but small enough (e.g., 0.001), so that w does not degenerate to 0.

Algorithm 1: Adversarial label flip attack.
Input: the untainted training data D = {x_i, y_i}_{i=1}^n, the regularization parameter C (and the kernel's parameters, if any), the number of label flips L, the number of repetitions R, and the weighting parameters β1 and β2.
Output: the tainted labels y'_1, ..., y'_n.
1: (α, b) ← train an SVM on D
2: for i = 1, ..., n, do s_i ← y_i [Σ_{j=1}^n y_j α_j K(x_i, x_j) + b], end for
3: normalize the scores (s_1, ..., s_n) in [0, 1], dividing by max(s_1, ..., s_n)
4: (α_rnd, b_rnd) ← generate a random SVM (draw n + 1 numbers from a uniform distribution)
5: for i = 1, ..., n, do q_i ← −y_i [Σ_{j=1}^n y_j α_rnd,j K(x_i, x_j) + b_rnd], end for
6: normalize the scores (q_1, ..., q_n) in [0, 1], dividing by max(q_1, ..., q_n)
7: for i = 1, ..., n, do v_i ← α_i / C − β1 s_i − β2 q_i, end for
8: (k_1, ..., k_n) ← sort (v_1, ..., v_n) in ascending order, and return the corresponding indexes
9: (y'_1, ..., y'_n) ← (y_1, ..., y_n)
10: for i = 1, ..., L, do y'_{k_i} ← −y_{k_i}, end for
11: train an SVM on {x_i, y'_i}_{i=1}^n
12: estimate its training error on D
13: repeat R times from step 4, and retain the set of labels y'_1, ..., y'_n which yielded the maximum training error
14: return y'_1, ..., y'_n
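A runnable sketch of the adversarial label flip attack, assuming scikit-learn and a linear kernel. The sign conventions and normalization in the scoring steps are our reconstruction from the garbled listing and may differ from the original; the random hyperplane is drawn directly in input space rather than as random dual coefficients:

```python
import numpy as np
from sklearn.svm import SVC

def adversarial_label_flips(X, y, C=1.0, L=10, R=10, beta1=0.1, beta2=0.1, seed=0):
    """Adversarial label flip attack (sketch), linear-kernel case."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # steps 1-3: SVM on untainted data; normalized confidence scores s_i
    clf = SVC(kernel="linear", C=C).fit(X, y)
    s = y * clf.decision_function(X)
    s = s / np.max(np.abs(s))                 # max |.| normalization as a safeguard
    alpha = np.zeros(n)                       # alpha_i = 0 for non-support vectors
    alpha[clf.support_] = np.abs(clf.dual_coef_[0])
    best_labels, best_err = y.copy(), -1.0
    for _ in range(R):
        # steps 4-6: random hyperplane; q_i is large when x_i falls on its "wrong" side
        w_rnd = rng.uniform(-1.0, 1.0, X.shape[1])
        b_rnd = rng.uniform(-1.0, 1.0)
        q = -y * (X @ w_rnd + b_rnd)
        q = q / np.max(np.abs(q))
        # steps 7-10: flip the L samples with the smallest v_i
        v = alpha / C - beta1 * s - beta2 * q
        flip = np.argsort(v)[:L]
        y_tainted = y.copy()
        y_tainted[flip] = -y_tainted[flip]
        # steps 11-13: keep the flips maximizing training error w.r.t. the true labels
        err = 1.0 - SVC(kernel="linear", C=C).fit(X, y_tainted).score(X, y)
        if err > best_err:
            best_err, best_labels = err, y_tainted
    return best_labels
```

Each candidate flips exactly L labels; the outer loop simply keeps the candidate whose retrained SVM misclassifies the most untainted samples.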
Before concluding this section, we show that the proposed LN-robust SVM can be effective against random label flips as well. To this aim, we conduct a simple artificial experiment similar to the previous case. We consider SVMs with linear kernel, and each class to be normally distributed with mean [z, 0, ..., 0]^T and (diagonal) covariance matrix equal to (1/2) I. However, this time we consider 300 features and 400 training samples, since we need a higher features-to-samples ratio for the random label flip attack to be effective. Note that the optimal (Bayes) classifier in this case is simply given by w = [1, 0, ..., 0]^T and b = 0. We vary the percentage of random label flips in the training set up to 40%, and plot the corresponding testing accuracy (evaluated on a separate untainted test set of 1,000 samples). The results are averaged over 5 repetitions, and reported in Fig. 2 for four different values of the regularization parameter C: 0.1, 1, 10, 100. Note how, in this case (and for all values of C), the LN-robust SVM is able to significantly outperform the standard SVM (in particular when μ = 0.5), and that, surprisingly, this does not cause a decrease of the classification accuracy attained on the untainted dataset (i.e., when the percentage of flipped labels is zero). Notably, this highlights that there need not necessarily be a trade-off between accuracy on untainted data and robustness to attacks.

Figure 1: (top row) Standard SVM trained on untainted and tainted data (first and second plot, respectively), and robust SVM with μ = 0.1 and μ = 0.5 trained on the tainted data (third and fourth plot, respectively); (bottom row) α value of each training sample versus its distance to the hyperplane g(x), corresponding to the SVMs and data shown in the above plots. The mean and variance of the α values are also reported. Data is tainted by performing 10 adversarial label flips, highlighted with green circles. The support vectors of each SVM are circled in black.
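The random-flip experiment described above can be sketched end to end. This is our reconstruction, assuming scikit-learn: the correction of Eq. 5 is applied to K itself (off-diagonal scaling by 1 − 4σ², which yields the same Q' = K' ∘ yy^T) via a precomputed kernel, C = 10 is one of the four values used in the paper, and the 20% flip rate is one point on the reported curves:

```python
import numpy as np
from sklearn.svm import SVC

def corrected_kernel(K, mu):
    # Equivalent form of Eq. 5: scale off-diagonal entries of K by 1 - 4*sigma^2
    S = 4.0 * mu * (1.0 - mu)
    Kc = (1.0 - S) * K
    np.fill_diagonal(Kc, np.diag(K))
    return Kc

rng = np.random.default_rng(0)
d, n_tr, n_ts = 300, 400, 1000           # features, training and test samples

def sample(n):
    # class z in {-1, +1}: mean [z, 0, ..., 0], covariance (1/2) I
    y = np.where(rng.random(n) < 0.5, -1, 1)
    X = rng.normal(0.0, np.sqrt(0.5), (n, d))
    X[:, 0] += y
    return X, y

X_tr, y_tr = sample(n_tr)
X_ts, y_ts = sample(n_ts)

# flip 20% of the training labels at random
y_noisy = y_tr.copy()
flip = rng.choice(n_tr, int(0.2 * n_tr), replace=False)
y_noisy[flip] = -y_noisy[flip]

K_tr = X_tr @ X_tr.T                     # linear kernel
K_ts = X_ts @ X_tr.T

acc_std = SVC(kernel="linear", C=10.0).fit(X_tr, y_noisy).score(X_ts, y_ts)
robust = SVC(kernel="precomputed", C=10.0).fit(corrected_kernel(K_tr, 0.5 - 1e-3), y_noisy)
acc_rob = robust.score(K_ts, y_ts)
print("standard SVM:", acc_std, " LN-robust SVM:", acc_rob)
```

At test time the uncorrected cross-kernel is used: the correction only affects training, and the resulting uniform (1 − 2μ) scaling of w does not change the sign of the decision function for μ < 0.5.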
Figure 2: Classification accuracy on artificial normal data (untainted) for the SVM and the LN-robust SVMs with linear kernel. LN-robust SVMs were trained with μ = 0.05, μ = 0.1 and μ = 0.5. Results are shown for different percentages of random label flips in training data, and different values of the regularization parameter C.

6. Experiments

We report experimental results to empirically validate the soundness of the proposed approach. We consider a number of real datasets, and compare the LN-robust SVM to the standard SVM learning algorithm, with either linear or radial basis function (RBF) kernels, under random and adversarial label flips. We report the classification accuracy attained by each classifier on untainted test data, as a function of the percentage of label flips in the training data; the more gracefully the performance decreases, the more robust the classifier is.

Datasets. We downloaded 7 two-class datasets from the LibSVM and UCI repositories, with feature values already scaled in [−1, +1]² (Chang and Lin, 2001; Asuncion and Newman, 2007). Their characteristics are summarized in Table 1. Within these experiments, every dataset was randomly split 5 times into different training (TR) and testing (TS) set pairs, with 60% and 40% of the samples, respectively. The results were then averaged over these 5 trials.
Table 1: Main characteristics of the datasets.

    Name            #samples   #features
    Breast-cancer      683        10
    Australian         690        14
    Diabetes           768         8
    Fourclass          862         2
    Heart              270        13
    Ionosphere         351        34
    Sonar              208        60

Setup. We considered four different values of the regularization parameter C, namely C = 0.1, 1, 10, 100, for both the linear and RBF kernels. Moreover, when the RBF kernel was used, for any fixed C value, the parameter γ was selected among the values {0.01, 0.1, 1, 10, 100} by performing a 5-fold cross-validation on the training data. In each plot, we report the performance of the standard SVM and of three LN-robust SVMs trained with μ = 0.05, 0.1, and 0.5, respectively. When the adversarial label flip attack is considered, the labels to be flipped are determined using the standard SVM solution. However, we also noted that there was not any relevant difference in the results when the same attack was computed using the solutions of the LN-robust SVMs.

Results. Results for SVMs with the linear kernel against adversarial label flips and random label flips are reported in Fig. 3 and Fig. 4, respectively. Due to lack of space, we omit results for the RBF kernel, which exhibit similar behavior and lead to similar conclusions. First, note that, as expected, adversarial label flips generally decrease the performance with fewer flips than random flipping. As the standard SVM is naturally somewhat robust to random label noise (see Fig. 4), the resilience of the LN-robust SVM is most pronounced for adversarial label flips, although it also generally outperforms the standard SVM under random flips. Second, for low values of C (i.e., 0.1 and 1), the LN-robust SVM does not generally improve the performance over the standard SVM; indeed, sometimes it is even less robust to label flips (see, e.g., the "diabetes" dataset under adversarial label flips, Fig. 3). On the other hand, the LN-robust SVM can significantly improve the robustness when C is relatively high (see, e.g., "australian", "breast-cancer", "fourclass", "heart" and "ionosphere"). The reason for this is that, when the regularization parameter C is high, the SVM tends to find a hard-margin solution, which is clearly more sensitive to label noise, and, in this case, [...]

2. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html
Figure 4: Random label flip attack against the SVM and the LN-robust SVMs (with μ = 0.05, 0.1, 0.5) with linear kernel, for different values of C and different percentages of noise.

8. Conclusions and future works

Throughout this paper, we have investigated the robustness of SVMs under adversarial label noise and proposed a method to improve it based on a simple kernel matrix correction. We showed the effectiveness of the proposed approach on several artificial and real datasets. We empirically observed that our method leads to an equalization of the α values in SVMs, which intuitively hedges the influence of individual points and leads to a more robust estimator. Our experimental results support the common observation that robustness exhibits a trade-off with classification accuracy.

A current limitation of our method is the need to agree a priori on a potential degree of label contamination. While some ad hoc heuristics are conceivable for setting the corresponding parameter μ in practice, the investigation of theoretically sound methods for selecting an optimal "robustness level" would be an interesting issue for future work, as well as considering our method in real adversarial problems like spam filtering and intrusion detection, and comparing it with other SVM implementations which are meant to be robust against label noise (e.g., Stempfel and Ralaivola, 2009).

Acknowledgments

The authors wish to acknowledge the Alexander von Humboldt Foundation and the Heisenberg Fellowship of the Deutsche Forschungsgemeinschaft (DFG) for providing financial support to carry out this research. This work was also partly supported by a grant awarded to B. Biggio by Regione Autonoma della Sardegna, PO Sardegna FSE 2007-2013, L.R. 7/2007 "Promotion of the scientific research and technological innovation in Sardinia". The opinions expressed in this paper are solely those of the authors and do not necessarily reflect the opinions of any sponsor.

References

A. Asuncion and D. J. Newman. UCI Machine Learning Repository, 2007. http://www.ics.uci.edu/~mlearn/MLRepository.html.

M. Barreno, B. Nelson, A. Joseph, and J. Tygar. The security of machine learning. Machine Learning, 81:121–148, 2010.

J. Bi and T. Zhang. Support vector classification with input data uncertainty. In Advances in Neural Information Processing Systems 17, 2004.

C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, 1st ed., 2007.

C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:121–167, 1998.

C.-C. Chang and C.-J. Lin. LibSVM: a library for support vector machines, 2001. http://www.csie.ntu.edu.tw/~cjlin/libsvm/.