Understanding LUPI (Learning using Privileged Information)

Ahmadreza Momeni, Kedar Tatwawadi
Stanford University, Stanford, US
{amomenis, kedart}@stanford.edu

I. INTRODUCTION

The idea of using privileged information was first suggested by V. Vapnik and A. Vashist in [1], in which they tried to capture the essence of teacher-student based learning, which is very effective in the case of human learning. More specifically, when a human is learning a novel notion, they exploit their teacher's comments, explanations, and examples to facilitate the learning procedure. Vapnik proposed the following framework: assume that we want to build a decision rule for determining some labels $y$ based on some features $X$, but in the training stage, in addition to $X$, we are also provided with some additional information, denoted as the privileged information $x^*$, which is not present in the testing stage. In such a scenario, how can we utilize $X^*$ to improve the learning?

In this project report, we try to understand the framework of LUPI using a variety of experiments. We also propose a new algorithm based on privileged information for neural networks, building on the intuition obtained from the experiments.

A. LUPI Framework

We first briefly describe the mathematical framework of LUPI. In the classical binary classification problem we are given $m$ pairs $(x_i, y_i)$, $i = 1, \ldots, m$, where $x_i \in X$, $y_i \in \{-1, +1\}$, and each pair is independently generated by some underlying distribution $P_{XY}$, which is unknown. The goal here is to find a function $f : X \to \{-1, +1\}$ in the function class $F$ that assigns the labels with the lowest error possible, averaged over the unknown distribution $P_{XY}$.

In the LUPI framework, the model is slightly different, as we are provided with triplets $(x_i, x_i^*, y_i)$, $i = 1, \ldots, m$, where $x_i \in X$, $x_i^* \in X^*$, $y_i \in \{-1, +1\}$, with each triplet independently generated by some underlying distribution $P_{XX^*Y}$, which is again unknown. However, the goal is the same as before: we still aim to find a function $f : X \to \{-1, +1\}$ in the function class $F$ that assigns the labels with the lowest error possible. The important question which Vapnik asks is: can the generalization performance be improved using the privileged information?
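Vapnik also showed that this is true in the case of SVM. We will next briefly describe the SVM and the SVM+ LUPI-based framework proposed by Vapnik.

B. SVM and SVM+

We briefly describe the SVM and SVM+ methods that we solve for classification, which in this case means finding some $\omega \in X$ and $b \in \mathbb{R}$ to build the following predictor:

$$f(x) = \mathrm{sgn}\left[\langle \omega, x \rangle + b\right].$$

1) SVM: The SVM learning method (non-separable SVM) to find $\omega$ and $b$ is equivalent to solving the following optimization problem:

$$
\begin{aligned}
\min \quad & \frac{1}{2}\langle \omega, \omega \rangle + C \sum_{i=1}^{m} \xi_i \\
\text{s.t.} \quad & y_i\left[\langle \omega, x_i \rangle + b\right] \ge 1 - \xi_i, \quad i = 1, \ldots, m, \\
& \xi_i \ge 0, \quad i = 1, \ldots, m.
\end{aligned}
$$

As a short remark, we should mention that $C$ is a parameter that needs tuning. In addition, if the slacks $\xi_i$ are all equal to zero then we call the set of given examples separable; otherwise they are non-separable.

2) SVM+: In order to take into account the privileged information $X^*$, Vapnik modified the SVM formulation as follows:

$$
\begin{aligned}
\min \quad & \frac{1}{2}\left[\langle \omega, \omega \rangle + \gamma \langle \omega^*, \omega^* \rangle\right] + C \sum_{i=1}^{m} \left[\langle \omega^*, x_i^* \rangle + b^*\right] \\
\text{s.t.} \quad & y_i\left[\langle \omega, x_i \rangle + b\right] \ge 1 - \left[\langle \omega^*, x_i^* \rangle + b^*\right], \quad i = 1, \ldots, m, \\
& \left[\langle \omega^*, x_i^* \rangle + b^*\right] \ge 0, \quad i = 1, \ldots, m,
\end{aligned}
$$

where $\omega^* \in X^*$ and $b^* \in \mathbb{R}$. In this problem $C$ and $\gamma$ are hyperparameters to be tuned. Intuitively, we can think of the terms $\left[\langle \omega^*, x_i^* \rangle + b^*\right]$ as estimators for the slacks $\xi_i$ in the previous optimization problem. The reduced freedom and the better prediction of the slacks using the privileged information are what improve the learning. Another intuition here is that, in some sense, the margins $\left[\langle \omega^*, x_i^* \rangle + b^*\right]$ capture the difficulty of the training examples in the privileged space. This difficulty information is then used to relax or tighten the SVM constraints to improve the learning. We next describe some methodologies which capture this intuition about the difficulty of examples to construct LUPI-based frameworks.

C. Weighted SVM and Margin Transfer SVMs

One way in which privileged information influences learning is by differentiating easy examples from the really difficult ones. This understanding was later formalized in [2], where the authors argue that if the weights are chosen appropriately then Weighted-SVM can always outperform SVM+. In weighted SVMs the example weights themselves tell the difficulty/importance of the examples.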
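As a rough illustration of how per-example weights can enter an SVM, the sketch below minimises the unconstrained weighted hinge-loss objective $\frac{1}{2}\langle \omega, \omega \rangle + C \sum_i c_i \max(0, 1 - y_i[\langle \omega, x_i \rangle + b])$ by plain full-batch subgradient descent. The function names, the toy data, the step size, and the choice of solver are assumptions made for illustration only; they are not taken from [2] or from the experiments in this report.

```python
def decision_value(w, b, x):
    """Linear score <w, x> + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_weighted_svm(X, y, weights, C=1.0, lr=0.01, epochs=200):
    """Full-batch subgradient descent on
    0.5 * <w, w> + C * sum_i weights[i] * max(0, 1 - y_i * (<w, x_i> + b)).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the regulariser 0.5 * <w, w>
        grad_b = 0.0
        for x_i, y_i, c_i in zip(X, y, weights):
            if y_i * decision_value(w, b, x_i) < 1.0:  # hinge term is active
                for j, x_ij in enumerate(x_i):
                    grad_w[j] -= C * c_i * y_i * x_ij
                grad_b -= C * c_i * y_i
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy data (hypothetical): four clean points plus one hard point whose label
# conflicts with its neighbours.
X = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-3.0, -1.0), (-2.5, 0.0)]
y = [1, 1, -1, -1, 1]
weights = [1.0, 1.0, 1.0, 1.0, 0.0]  # a zero weight removes the hard example

w, b = train_weighted_svm(X, y, weights)
w_clean, b_clean = train_weighted_svm(X[:4], y[:4], [1.0] * 4)
predictions = [1 if decision_value(w, b, x) >= 0 else -1 for x in X[:4]]
```

In this sketch an example with weight zero contributes nothing to the subgradient, so training with the weight vector above gives the same `w` and `b` as training on the four clean points alone. Intermediate weights scale the penalty on an example's hinge loss rather than dropping it.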
Although [2] proved that weighted SVMs are better than SVM+, the difficulty arises from the fact that the weights are unknown. In some cases though, there are heuristics to guess the weights which work quite well, and subject knowledge can often be utilized for this purpose. We next describe a heuristic proposed in [3] to find weights and solve a WSVM problem.

1) Margin Transfer SVM: One way to exploit privileged information is proposed in [3], where the authors suggest solving a classification problem using only the privileged information $x^*$ to obtain a classifier $f^*$ (note that there is no requirement for $f^*$ to be of the form $\langle \omega^*, x^* \rangle + b^*$). Now, we store the margins $\rho_i := y_i f^*(x_i^*)$. For our purpose, we put some threshold $\epsilon$ on the margins and define $\hat{\rho}_i := \max\{\rho_i, \epsilon\}$. Now we are equipped to solve the following optimization problem:

$$
\begin{aligned}
\min \quad & \frac{1}{2}\langle \omega, \omega \rangle + C \sum_{i=1}^{m} \hat{\rho}_i \xi_i \\
\text{s.t.} \quad & y_i\left[\langle \omega, x_i \rangle + b\right] \ge 1 - \xi_i, \quad i = 1, \ldots, m, \\
& \xi_i \ge 0, \quad i = 1, \ldots, m.
\end{aligned}
$$

Intuitively, the margins $\rho_i$ determine how difficult an example is. In extreme cases, if an example is too difficult ($\rho_i \le 0$) then, with $\epsilon = 0$, its weight $\hat{\rho}_i$ is equal to zero, which means that we are eliminating that example in the training stage. This is similar to the human learning procedure, where if an example is too hard then the teacher does not use it, because it makes the student diverge from learning the main subject and waste time on less useful points. We next describe various experiments which we conducted to understand LUPI.

II. EXPERIMENTS

A. SVM+ vs. SVM

The first experiment that we conducted was to compare the performance of SVM+ and SVM. We used the following datasets:

TABLE I
DATASETS

Data          Test set size    d (a)    d* (b)
Ionosphere    201              7        6
Ring          7250             10       10
Wine age      108              4        5

(a) The number of normal features. (b) The number of privileged features.

In each of the above datasets, we chose some features as normal ones and some of them as privileged, and then trained the classifiers and computed the error on the corresponding test set. For all of the datasets, we used a linear kernel. The resulting graphs are as follows:

Fig. 1. SVM+ vs. SVM: Ionosphere data
Fig. 2. SVM+ vs. SVM: Ring data
Fig. 3. SVM+ vs. SVM: Wine Age data
As a brief remark, we note that not only does SVM+ converge faster, but surprisingly in some cases it converges to a better answer, which is observed very distinctly in the Ring experiment. We also observed that SVM+ needs a different solver than SVM and is quite sensitive to the hyper-parameters, which makes it very difficult to get it working for complex datasets.

B. Manually Weighted SVM vs. SVM+

The second experiment that we conducted aimed to evaluate the performance of Manually Weighted SVM. The aim of the experiment was to test the intuition that knowing the difficulty of examples helps in improving the learning. Thus, we considered the ease/difficulty of the training examples themselves as the privileged information. We used the following datasets:

TABLE II
DATASETS

Data        Test set size    d (a)
Abalone     3178             7
Wine age    118              7

(a) The number of normal features.

The difficulty levels are determined as follows:

Abalone Dataset: In this experiment an abalone is assigned label +1 if its age is above some threshold; otherwise the label is -1. We considered the examples whose age is equal to the threshold to be difficult.

Wine age Dataset: In this experiment a label is +1 if the age of the wine is above some threshold; otherwise it is -1. We considered the examples whose age is within 0.25 times the standard deviation (of the age over the whole dataset) of the threshold to be difficult.

For both datasets, we used a linear kernel. The resulting graphs are as follows:

Fig. 4. Manually Weighted SVM vs. SVM+: Abalone data
Fig. 5. Manually Weighted SVM vs. SVM+: Wine age data

Overall, we observed that the privileged information related to difficulty does indeed help the learning in a lot of scenarios, although for larger data sizes the improvement is not significant. This in some sense confirmed the intuition stated earlier.

C. LUPI-FNN

In this experiment, we applied the intuition obtained from the Weighted SVM and the Margin-Transfer methods to the case of neural networks. The basic idea, which is applicable to more general learning frameworks, is that weights can be used to modify the learning rate per example while applying training procedures based on gradient descent (e.g., SGD, momentum update, RMSProp, etc.).
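The per-example learning-rate idea can be sketched for a linear classifier followed by a softmax, trained with plain SGD on the cross-entropy loss. This is only a minimal illustration under stated assumptions: the function names, the toy data, the base learning rate, and the given weight vector are all hypothetical, and the code is not taken from the repository accompanying this report.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(W, b, x):
    """Linear classifier followed by softmax: one weight row and bias per class."""
    return softmax([sum(wj * xj for wj, xj in zip(W[k], x)) + b[k]
                    for k in range(len(W))])


def weighted_sgd_epoch(W, b, X, y, weights, lr=0.5):
    """One SGD pass; example i is updated with step size lr * weights[i]."""
    for x_i, y_i, c_i in zip(X, y, weights):
        probs = class_probabilities(W, b, x_i)
        step = lr * c_i  # per-example learning rate
        for k in range(len(W)):
            # Derivative of the cross-entropy loss w.r.t. the score of class k.
            grad_score = probs[k] - (1.0 if k == y_i else 0.0)
            for j, x_ij in enumerate(x_i):
                W[k][j] -= step * grad_score * x_ij
            b[k] -= step * grad_score


# Toy 3-class data (hypothetical). The last point has a label that conflicts
# with its neighbour, so it is given weight 0 and never moves the parameters.
X = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0), (1.0, 0.1)]
y = [0, 1, 2, 2]
weights = [1.0, 1.0, 1.0, 0.0]

W = [[0.0, 0.0] for _ in range(3)]
b = [0.0, 0.0, 0.0]
for _ in range(100):
    weighted_sgd_epoch(W, b, X, y, weights)

predicted = [max(range(3), key=lambda k: class_probabilities(W, b, x)[k])
             for x in X[:3]]
```

Scaling the step size by the example weight is equivalent to scaling that example's loss term, so the same weighting carries over to other gradient-based update rules by multiplying the per-example loss before averaging.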
Fig. 6. Input Data (X, y)

In this specific example, we have a spiral dataset containing 3 classes (each denoted by a different color). The aim is to use neural networks to perform classification. As we observe, although the input dataset itself is complex, the privileged information, which is captured by the polar coordinates (unwarped) representation of the input dataset, is much easier to classify.

Fig. 7. Privileged Information (X*, y)

Our first step is to fit a 0-layer FCNN in the privileged space $(X^*, Y)$. The FCNN consists of a linear classifier followed by a softmax layer to determine the class probabilities. In the experiment, we determined the example weights based on the softmax probability of the correct class. Thus, the lower the probability, the harder the example, and vice versa.

Fig. 8. Learning Weights

The weights obtained from the privileged information were used to train a 1-layer neural network (Linear-ReLU-Linear-Softmax) [Fig. 9] for the problem. Compared with the baseline neural network [Fig. 10], we observed an improvement in generalization performance of 3% on average.

Fig. 9. Weighted NN training
Fig. 10. Reference training without privileged information

III. CONCLUSION

From the experiments, we gained a lot of intuition into how to use LUPI in practical scenarios. We were also able to formulate a LUPI algorithm for neural networks. However, more experiments with real-life data are necessary to confirm the performance of the heuristics applied.

IV. CODE

All the source code, including interactive MATLAB and IPython notebooks, is available at: https://github.com/kedartatwawadi/LUPI. We plan to update the GitHub repo with more experiments/tutorials on LUPI.

REFERENCES

[1] V. Vapnik and A. Vashist, "A new learning paradigm: Learning using privileged information," Neural Networks, vol. 22, no. 5, pp. 544-557, 2009.
[2] M. Lapin, M. Hein, and B. Schiele, "Learning using privileged information: SVM+ and weighted SVM," Neural Networks, vol. 53, pp. 95-108, 2014.
[3] V. Sharmanska, N. Quadrianto, and C. H. Lampert, "Learning to transfer privileged information," CoRR, vol. abs/1410.0389, 2