Discriminative Training of Kalman Filters

Pieter Abbeel, Adam Coates, Michael Montemerlo, Andrew Y. Ng and Sebastian Thrun
Department of Computer Science, Stanford University, Stanford, CA 94305

Abstract: Kalman filters are a workhorse of robotics and are routinely used in state-estimation problems. However, their performance critically depends on a large number of modeling parameters which can be very difficult to obtain, and are often set via significant manual tweaking and at a great cost of engineering time. In this paper, we propose a method for automatically learning the noise parameters of a Kalman filter. We also demonstrate on a commercial wheeled rover that our Kalman filter's learned noise covariance parameters, obtained quickly and fully automatically, significantly outperform an earlier, carefully and laboriously hand-designed one.

I. INTRODUCTION

Over the past few decades, Kalman filters (KFs) [5] and extended Kalman filters (EKFs) [3] have found widespread applications throughout all branches of engineering. EKFs take as input sequences of measurements and controls, and output an estimate of the state of a dynamic system. They require a model of the system, comprised of a next state function, a measurement function, and the associated noise terms. EKFs are arguably one of the most influential Bayesian techniques in all of engineering and science.

This paper addresses a fundamental problem with the EKF: that of arriving at models suitable for accurate state estimation. The next state function and the measurement function are sometimes relatively easy to model, since they describe the underlying physics of the system. But even in applications where the next-state function and the measurement function are accurate, the noise terms are often difficult to estimate. The noise terms capture what the deterministic model fails to: the effects of unmodeled perturbations on the system. The noise is usually the result of a number of different effects:

- Mis-modeled system and measurement dynamics.
- The existence of hidden state in the environment not modeled by the EKF.
- The discretization of time, which introduces additional error.
- The algorithmic approximations of the EKF itself, such as the Taylor approximation commonly used for linearization.

All these effects cause perturbations in the state transitions and measurements. In EKFs, they are commonly characterized as noise.
Further, the noise is assumed to be independent over time, whereas the phenomena described above cause highly correlated noise. The magnitude of the noise in an EKF is therefore extremely difficult to estimate. It is therefore surprising that the issue of learning noise terms remains largely unexplored in the literature. A notable exception is the filter tuning literature. (See, e.g., [7], [8] for an overview.) Although some of their ideas are fairly similar and could be automated, they focus mostly on a formal analysis of (optimally) reducing the order of the filter (for linear systems), and how to use the resulting insights for tuning the filter.

To further motivate the importance of optimizing the Kalman filter parameters (either by learning or tuning), consider the practical problem of estimating the variance parameter for a GPS unit that is being used to estimate the position x of a robot. A standard Kalman filter model would model the GPS readings x_measured as the true position plus noise:

x_measured = x_true + ε,

where ε is a noise term with zero mean and variance σ². The GPS' manufacturer specifications will sometimes explicitly give σ² for the unit; otherwise, one can also straightforwardly estimate σ² by placing the vehicle at a fixed, known position, and measuring the variability of the GPS readings. However, in practice either of these choices for σ² will work very poorly if it is the parameter used in the Kalman filter. This is because GPS errors are often correlated over time, whereas the straightforward implementation of the Kalman filter assumes that the errors are independent. Thus, if the vehicle is stationary and we average n GPS readings, the filter assumes that the variance of the resulting estimate is σ²/n. However, if the errors are correlated over time, then the true variance of the resulting position state estimate can be significantly larger than σ²/n. The extreme of this case would be full correlation: If the errors were perfectly correlated so that all n readings are identical, then the variance of the average would be σ² instead of σ²/n.¹ Thus, if σ² was the parameter used in the filter, it will tend to underestimate the long time-scale variance of the GPS readings, and perhaps as a result trust the GPS too much relative to other sensors (or relative to the dynamic model), and therefore give poor estimates of the state.
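The inflation from correlated errors is easy to check numerically. The following small sketch (ours, not part of the paper) applies the variance identity for an average of n readings with equal variances σ² and a common pairwise correlation ρ:

```python
def variance_of_average(sigma2, n, rho):
    """Variance of the average of n readings, each with variance sigma2
    and pairwise correlation rho:
    Var(mean) = (1/n^2) * (n*sigma2 + n*(n-1)*rho*sigma2)."""
    return (n * sigma2 + n * (n - 1) * rho * sigma2) / n ** 2

sigma2 = 9.0   # e.g. a GPS unit with a 3 m standard deviation
n = 100
print(variance_of_average(sigma2, n, rho=0.0))  # sigma2/n = 0.09
print(variance_of_average(sigma2, n, rho=1.0))  # sigma2   = 9.0
```

With ρ = 0 the filter's assumption σ²/n holds; with ρ = 1 the averaging buys nothing, matching the full-correlation extreme described above.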
In practice, a number of authors have observed this effect on several robots, including an autonomous helicopter platform using GPS for its state estimates [10], and ground rover platforms using a SICK LADAR to estimate their position [13]. In each of these cases, significant human time was expended to try to tweak the variance parameter to what they guessed were more appropriate values, so that the Kalman filter gives more accurate state estimates. For instance, in the GPS example above, by artificially choosing a larger σ² than that suggested by the manufacturer specifications, one would be able to reduce the KF's overconfidence in the GPS, and thereby obtain more accurate state estimates.

¹More generally, we have that Var((1/n) Σ_{i=1}^{n} x_i) = (1/n²) ( Σ_{i=1}^{n} Var(x_i) + Σ_{i=1}^{n} Σ_{j=1, j≠i}^{n} Cov(x_i, x_j) ).

Fig. 1. Ground rover platform instrumented with inexpensive IMU and GPS. This robot is commercially provided to contractors in the DARPA LAGR program.

In this paper, we propose several machine learning algorithms for automatically choosing the noise parameters of the Kalman filter or extended Kalman filter (EKF). Our work is based on the fundamental assumption that during the EKF development, it is possible to instrument the system to measure additional variables, which provide highly accurate estimates for the state variables. These measurements are only available in the initial tuning phase; later they are no longer available. An example of this setup is the robot shown in Figure 1. This figure shows a commercial robot equipped with a low-cost IMU (inertial measurement unit) and a low-end GPS receiver (global positioning system). Both are used to estimate the robot's geo-referenced location when the robot is in operation. The vendor of this unit supplies an EKF for pose estimation which has been optimized manually to achieve the best performance. A clear option which the vendor did not exercise would have been to attach a highly accurate differential GPS receiver to the unit to tune the EKF. With such a unit, it becomes possible to receive ground truth information on the actual coordinates of the robot.

This paper presents a family of learning algorithms that utilizes such information for learning the covariance parameters of an EKF. The idea is relatively straightforward: Train an EKF so that it maximizes its predictive accuracy, where accuracy is evaluated through the reference data obtained during training.
The different algorithms provide different criteria for assessing the prediction accuracy of an EKF. Some simply measure the mean square error of the EKF; others measure the likelihood of the high-accuracy measurements. No matter what criterion is being used for training, however, the trained filters consistently outperform the EKF carefully tuned by hand. In fact, in our experiments we achieve results that are significantly more accurate than those provided by a commercial robot vendor. Thus, our approach promises to relieve EKF developers of the tedious task of tuning noise parameters by hand.

II. THE EXTENDED KALMAN FILTER

We will begin with a brief review of the EKF, defining the basic notation and terminology in this paper. Throughout this paper we use z ∼ P(z) to denote that the random variable z has distribution P. We use N(z; μ, Σ) to denote the density of a Gaussian with mean μ and covariance Σ, evaluated at z.

The EKF addresses the problem of state estimation in non-linear continuous dynamical systems. Here, we will formulate the EKF for the discrete time setting. At each point in time, the true state of the system being monitored will be denoted by x_t, where t is the time index. The EKF assumes that state transitions are governed by a discrete-time continuous-state non-linear law of the form

x_t = f(x_{t-1}, u_t) + ε_t.

Here u_t is a control, f is a nonlinear function that maps the control and the state one time step earlier into the state at time t, and ε_t is the process noise. The process noise ε_t is Gaussian with zero mean and covariance R. Hence we have that x_t ∼ N(x_t; f(x_{t-1}, u_t), R). Measurements z_t are formed through a non-linear law of the form

z_t = g(x_t) + δ_t.

Here z_t is a measurement, g is a nonlinear function of the state x_t, and δ_t is the measurement noise. The measurement noise is Gaussian with zero mean and covariance Q. Hence we have that z_t ∼ N(z_t; g(x_t), Q).

The EKF then provides us with an estimate μ_t of the state at time t, along with an expected error of this estimate, expressed through a covariance Σ_t. Put differently, given a Gaussian estimate of the state specified by mean and covariance ⟨μ_{t-1}, Σ_{t-1}⟩ at time t-1, the EKF update rule provides us with an estimate of both quantities at time t. In detail, the update requires us to linearize the nonlinear functions f and g through Taylor approximation. This is usually written as follows:

f(x_{t-1}, u_t) ≈ f(μ_{t-1}, u_t) + F_t (x_{t-1} - μ_{t-1}),
g(x_t) ≈ g(μ̄_t) + G_t (x_t - μ̄_t).

Here F_t and G_t are the Jacobian matrices of f and g, respectively, taken at the filter estimate. The resulting state transition and measurement functions are now linear in x. For linear systems, the Kalman filter produces an exact update, by manipulating the various Gaussians involved. The update is then usually factored into two separate steps, a prediction step and a measurement update step. The prediction step starts with the estimate μ_{t-1} and its covariance Σ_{t-1} at time t-1, and produces an estimate for time t:

μ̄_t = f(μ_{t-1}, u_t),
Σ̄_t = F_t Σ_{t-1} F_t^T + R.

The bar in μ̄_t and Σ̄_t indicates that these estimates are pure predictions, before taking the measurement z_t into account. This happens in the measurement update step, in which the EKF integrates the measurement z_t by first calculating the Kalman gain:

K_t = Σ̄_t G_t^T (G_t Σ̄_t G_t^T + Q)^{-1}.

This expression specifies the amount by which the estimate will be adjusted in accordance with the measurement prediction error z_t - g(μ̄_t). This leads to the update of the mean and covariance:

μ_t = μ̄_t + K_t (z_t - g(μ̄_t)),
Σ_t = (I - K_t G_t) Σ̄_t.

To implement an EKF, the designer needs to determine two sets of things: the nonlinear functions f and g, and the noise covariances R and Q. While f and g can sometimes be obtained through basic physics considerations,² the covariances R and Q are difficult to estimate. It is common practice to manually tune those matrices until the EKF exhibits the desired performance. In the presence of ground truth data, one could try to tune the parameters such that the filter estimates are as accurate as possible in estimating the ground truth data. Manual tuning with such an objective is effectively manual discriminative training of the Kalman filter parameters. In the next section we present learning procedures that automate such a tuning process.

III. LEARNING THE FILTER PARAMETERS

We now describe our learning techniques for obtaining the noise parameters of the Kalman filter automatically. For simplicity, our discussion will focus on learning R and Q, though all the presented methods also apply more generally. All but one of our approaches requires that one is given a highly accurate instrument for measuring either all or a subset of the variables in the state x_t. Put differently, in the EKF learning phase, we are given additional values y_1, y_2, ..., where each y_t is governed by a projective equation of the type

y_t = h(x_t) + γ_t.
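As a brief aside, the prediction and measurement update equations of Section II translate directly into code. A minimal numpy sketch (ours, not the authors' implementation; the model functions f, g and their Jacobian callbacks are placeholders to be supplied for a concrete system):

```python
import numpy as np

def ekf_step(mu, Sigma, u, z, f, g, F_jac, G_jac, R, Q):
    """One EKF cycle: prediction step, then measurement update step."""
    # Prediction: mu_bar = f(mu, u),  Sigma_bar = F Sigma F^T + R
    F = F_jac(mu, u)
    mu_bar = f(mu, u)
    Sigma_bar = F @ Sigma @ F.T + R
    # Kalman gain: K = Sigma_bar G^T (G Sigma_bar G^T + Q)^-1
    G = G_jac(mu_bar)
    K = Sigma_bar @ G.T @ np.linalg.inv(G @ Sigma_bar @ G.T + Q)
    # Measurement update of the mean and covariance
    mu_new = mu_bar + K @ (z - g(mu_bar))
    Sigma_new = (np.eye(len(mu)) - K @ G) @ Sigma_bar
    return mu_new, Sigma_new
```

For linear f and g this reduces to the exact Kalman filter update; the point relevant to this paper is that R and Q enter every step, so the filter's estimates μ_t depend on them throughout.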
Here h is a function, and γ_t is noise with covariance P. In our example below, y_t are the readings from a high-end GPS receiver. The function h will be a projection which extracts the subset of the variables in x_t that correspond to the Cartesian coordinates of the robot.

²One common exception are the "damping" terms in the state dynamics. For example, if we estimate the gyros of an IMU, or indeed any other sensor, to have a slowly varying bias (as is commonly done in practice), the bias is usually modeled as x_t = α x_{t-1} + ε_t, where 0 ≤ α ≤ 1 governs the rate at which the bias x_t tends toward zero. The parameters α and Var(ε_t) jointly govern the dynamics of the bias, and α is an example of a parameter in the state update equation that is difficult to estimate and is, in practice, usually tuned by hand.

Let x_{0:T} denote the entire state sequence (x_0, x_1, ..., x_T), and similarly let u_{1:T}, y_{0:T} and z_{0:T} denote the corresponding observation sequences. Assuming that we have a prior p(x_0) on the initial state at time 0, the state update equation and the observation equation (together with the known controls u_{1:T}) define a joint probability distribution on x_{0:T}, y_{0:T}, and z_{0:T}. Specifically,

p(x_{0:T}, y_{0:T}, z_{0:T} | u_{1:T}) = p(x_0) ∏_{t=1}^{T} p(x_t | x_{t-1}, u_t) ∏_{t=0}^{T} p(y_t | x_t) p(z_t | x_t),  (1)

where

p(x_t | x_{t-1}, u_t) = N(x_t; f(x_{t-1}, u_t), R),  (2)
p(y_t | x_t) = N(y_t; h(x_t), P),  (3)
p(z_t | x_t) = N(z_t; g(x_t), Q).  (4)

Using the linearization approximations to f and g (h, being a projection operation, is assumed to be linear), the joint distribution p(x_{0:T}, y_{0:T}, z_{0:T}) defined by the EKF model is actually a joint linear-Gaussian model [12]. Since the joint distribution is well-defined, so are other quantities such as the marginal distributions (such as p(y_{0:T}, z_{0:T}) = ∫_{x_{0:T}} p(x_{0:T}, y_{0:T}, z_{0:T}) dx_{0:T}) and the conditional distributions (such as p(z_{0:T} | y_{0:T})) over these same random variables.

A. Generative Approach: Maximizing The Joint Likelihood

We will first discuss a naive approach, which requires access to the full state vector. Put differently, this approach requires that h is the identity function, and that the noise in γ is so small that it can safely be neglected. While this approach is generally inapplicable, simply because it is often difficult to measure all state variables, it will help us in setting up the other approaches. Generative learning proceeds by maximizing the likelihood of all the data. Since in this section we assume the full state vector is observed (i.e., for all t we have y_t = x_t), the covariance matrices ⟨R_joint, Q_joint⟩ are estimated as follows:

⟨R_joint, Q_joint⟩ = argmax_{R,Q} log p(x_{0:T}, z_{0:T} | u_{1:T}).  (5)

Now by substituting Eqns. (1), (2) and (4) into Eqn. (5) we get that the optimization decomposes and we can estimate R_joint and Q_joint independently as:

R_joint = argmax_R [ -T log|2πR| - Σ_{t=1}^{T} (x_t - f(x_{t-1}, u_t))^T R^{-1} (x_t - f(x_{t-1}, u_t)) ],
Q_joint = argmax_Q [ -(T+1) log|2πQ| - Σ_{t=0}^{T} (z_t - g(x_t))^T Q^{-1} (z_t - g(x_t)) ].

An interesting observation here is that both the objective for R_joint and the objective for Q_joint decompose into two terms: a term from the normalizer, whose objective is to deflate the determinant of R (Q), and one that seeks to minimize a quadratic function in which the inverse of R (Q) is a factor, and which therefore seeks to inflate R (Q). The optimal R_joint and Q_joint can actually be computed in closed form and are given by

R_joint = (1/T) Σ_{t=1}^{T} (x_t - f(x_{t-1}, u_t)) (x_t - f(x_{t-1}, u_t))^T,
Q_joint = (1/(T+1)) Σ_{t=0}^{T} (z_t - g(x_t)) (z_t - g(x_t))^T.
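These closed forms are simply empirical outer-product averages of the one-step residuals, and are straightforward to compute from a fully observed trajectory. A sketch (ours; the sequences and model functions are assumed given):

```python
import numpy as np

def joint_ml_covariances(xs, us, zs, f, g):
    """Closed-form ML estimates of R and Q from a fully observed
    state sequence xs = [x_0, ..., x_T] (the generative approach)."""
    T = len(xs) - 1
    # R_joint = (1/T) sum_{t=1}^T (x_t - f(x_{t-1}, u_t)) (...)^T
    res_R = [xs[t] - f(xs[t - 1], us[t]) for t in range(1, T + 1)]
    R = sum(np.outer(r, r) for r in res_R) / T
    # Q_joint = (1/(T+1)) sum_{t=0}^T (z_t - g(x_t)) (...)^T
    res_Q = [zs[t] - g(xs[t]) for t in range(T + 1)]
    Q = sum(np.outer(q, q) for q in res_Q) / (T + 1)
    return R, Q
```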
Note the naive approach never actually executes the filter for training. It simply trains the elements of the filter. It therefore implicitly assumes that training the elements individually is as good as training the EKF as a whole.

B. Minimizing The Residual Prediction Error

The technique of maximizing the joint likelihood, as stated above, is only applicable when the full state is available during training. This is usually not the case. Often, h is a function that projects the full state into a lower-dimensional projection of the state. For example, for the inertial navigation system described below, the full state involves bias terms of a gyroscope that cannot be directly measured. Further, the technique of maximizing the joint likelihood never actually runs the filter! This is a problem if the noise is actually correlated, as explained in the introduction of this paper.

A better approach, thus, would involve training an EKF that minimizes the prediction error for the values of y_t. More specifically, consider the EKF's prediction of y_t:

E[y_t | u_{1:t}, z_{0:t}] = h(μ_t).  (6)

Here, μ_t is the result of running the EKF algorithm (with some variance parameters R and Q for the filter), and taking its estimate for the state x_t after the EKF has seen the observations z_{0:t} (and the controls u_{1:t}). Therefore μ_t depends implicitly on R and Q. The prediction error minimization technique simply seeks the parameters R and Q that minimize the quadratic deviation of y_t and the expectation above, weighted by the inverse covariance P:

⟨R_res, Q_res⟩ = argmin_{R,Q} Σ_{t=0}^{T} (y_t - h(μ_t))^T P^{-1} (y_t - h(μ_t)).

If P is any multiple of the identity matrix, this simplifies to

⟨R_res, Q_res⟩ = argmin_{R,Q} Σ_{t=0}^{T} ||y_t - h(μ_t)||².  (7)

Thus, we are simply choosing the parameters R and Q that cause the filter to output the state estimates that minimize the squared differences to the measured values y_t.

This optimization is more difficult than maximizing the joint likelihood. The error function is not a simple function of the covariances R and Q. Instead, it is mediated through the mean estimates μ_t, which depend on the covariances R and Q in a complicated way. The mean estimates μ_t are the result of running an EKF over the data. Hence, this learning criterion evaluates the actual performance of the EKF, instead of its individual components. Computing the gradients for optimizing the residual prediction error is more involved than in the previous case. However, an optimization that does not require explicit gradient computations, such as the Nelder-Mead simplex algorithm [11], can also be applied.

C. Maximizing The Prediction Likelihood

The objective in Eqn. (7) measures the quality of the state estimates μ_t output by the EKF, but does not measure the EKF's estimates of the uncertainty of its output. At each time step, the EKF estimates both μ_t and a covariance Σ_t for its error. In applications where we require that the EKF gives accurate estimates of its uncertainty [15], we choose instead the prediction likelihood objective

⟨R_pred, Q_pred⟩ = argmax_{R,Q} Σ_{t=0}^{T} log p(y_t | z_{0:t}, u_{1:t}).  (8)

Here the y_t's are treated as measurements. This training regime trains the EKF so as to maximize the probability of these measurements. The probability p(y_t | z_{0:t}, u_{1:t}) can be decomposed into variables known from the filter:

p(y_t | z_{0:t}, u_{1:t}) = ∫ p(y_t | x_t) p(x_t | z_{0:t}, u_{1:t}) dx_t, where p(x_t | z_{0:t}, u_{1:t}) = N(x_t; μ_t, Σ_t).

Under the Taylor expansion, this resolves to

p(y_t | z_{0:t}, u_{1:t}) = N(y_t; h(μ_t), H_t Σ_t H_t^T + P).  (9)

Here H_t is the Jacobian of the function h. The resulting maximization of the log likelihood gives us

⟨R_pred, Q_pred⟩ = argmax_{R,Q} Σ_{t=0}^{T} [ -log|2πΩ_t| - (y_t - h(μ_t))^T Ω_t^{-1} (y_t - h(μ_t)) ].
Here we abbreviated Ω_t = H_t Σ_t H_t^T + P. Once again, this optimization involves the estimate μ_t, through which the effects of R and Q are mediated. It also involves the covariance Σ_t. We note that when the covariance P is small, we can omit it in this expression. This objective should also be contrasted with Eqn. (7). The difference is that here the filter is additionally required to give confidence-rated predictions, by choosing covariances Σ_t that reflect the true variability of its state estimates μ_t.

D. Maximizing The Measurement Likelihood

We now apply the idea in the previous step to the measurement data z_{0:T}. It differs in the basic assumption: Here we do not have additional data y_{1:T}, but instead have to tune the EKF simply based on the measurements z_{0:T} and the controls u_{1:T}. Recalling that the EKF model, for fixed u_{1:t}, gives a well-defined definition for the joint p(x_{0:t}, z_{0:t} | u_{1:t}), the marginal distribution p(z_{0:t} | u_{1:t}) is also well defined. Thus, our approach is simply to choose the parameters that maximize the likelihood of the observations in the training data:

⟨R_meas, Q_meas⟩ = argmax_{R,Q} log p(z_{0:T} | u_{1:T}).

The value of the objective is easily computed by noting that, by the chain rule of probability,

p(z_{0:T} | u_{1:T}) = ∏_{t=0}^{T} p(z_t | z_{0:t-1}, u_{1:T}).

Moreover, each of the terms in the product is given by

p(z_t | z_{0:t-1}, u_{1:T}) = ∫ p(z_t | x_t, z_{0:t-1}, u_{1:T}) p(x_t | z_{0:t-1}, u_{1:T}) dx_t = ∫ p(z_t | x_t) p(x_t | z_{0:t-1}, u_{1:t}) dx_t.

The term p(z_t | x_t) is given by Eqn. (4), and p(x_t | z_{0:t-1}, u_{1:t}) = N(x_t; μ̄_t, Σ̄_t), where μ̄_t and Σ̄_t are quantities computed by the EKF. Thus this approach also runs the EKF to evaluate its performance criterion. However, since no ground truth data is used here, the performance criterion is not predictive performance for the state sequence (which is what we ultimately care about), but merely predictive performance on the observations z_{0:T}.

E. Optimizing The Performance After Smoothing

The two discriminative criteria of Sections III-B and III-C evaluate the performance of the covariance matrices ⟨R, Q⟩ as used in the EKF. These criteria can easily be extended to the smoothing setting. (See, e.g., [8] for details on smoothing.) In particular, let μ̃_t be the state estimates as obtained from the smoother; then the smoother equivalent of Eqn. (7) is:

⟨R_res-sm, Q_res-sm⟩ = argmin_{R,Q} Σ_{t=0}^{T} ||y_t - h(μ̃_t)||².

The smoother likelihood objective is given by conditioning on all observations (instead of only up to time t as in the filter case). So the smoother equivalent of Eqn. (8) is:

⟨R_pred-sm, Q_pred-sm⟩ = argmax_{R,Q} Σ_{t=0}^{T} log p(y_t | z_{0:T}, u_{1:T}).

The smoother likelihood objective is closely related to the training criteria used for conditional random fields, which are widely used in machine learning to predict a sequence of labels (states) from all observations. (See, e.g., [6] and [4] for details.) The two criteria proposed in this section optimize the covariance matrices ⟨R, Q⟩ for smoother performance, not filter performance. So we expect the resulting covariance matrices ⟨R, Q⟩ (although good for smoothing) not to be optimal for use in the filter. This is confirmed in our experiments.

F. Training

The previous text established a number of criteria for training covariance matrices; in fact, the criteria make it possible to also tune the functions f and g, but we found this to be of lesser importance in our work. The training algorithm used in all our experiments is a coordinate ascent algorithm: Given initial estimates of R and Q, the algorithm repeatedly cycles through each of the entries of R and Q. For each entry, the objective is evaluated when decreasing and increasing the entry by p percent. If the change results in a better objective, the change is accepted and the step size p is increased by ten percent; otherwise p is decreased by fifty percent. Initially we have p = 10. We find empirically that this algorithm converges reliably within 20-50 iterations.

IV. EXPERIMENTS

We carried out experiments on the robot shown in Figure 1. This is a differential drive robot designed for off-road navigation. For state estimation, it is instrumented with a low cost GPS unit; a low cost inertial measurement unit (IMU) consisting of 3 accelerometers for measuring linear accelerations and 3 gyroscopes for measuring rotational velocities; a magnetometer (magnetic compass); and optical wheel encoders (to measure forward velocity, assuming rigid contact with the ground). The GPS is WAAS enabled, and returns position estimates at 1 Hz with a typical position accuracy of about 3 meters. These vehicles were built by Carnegie Mellon University for a competition in which each team obtains an identical copy of the vehicle, which can be used for software development.
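(Returning briefly to the training procedure of Section III-F: the coordinate-ascent loop can be sketched as follows. This is our illustrative sketch, shown maximizing a toy objective rather than an actual EKF criterion; the parameter names are hypothetical, and the multiplicative steps assume positive entries, as variances are.)

```python
def coordinate_ascent(objective, params, p0=10.0, n_iters=50):
    """Section III-F sketch: cycle through the entries, perturbing each
    by +/- p percent; accepted changes grow p by 10%, rejections halve it."""
    params = dict(params)
    step = {k: p0 for k in params}       # per-entry step size, in percent
    best = objective(params)
    for _ in range(n_iters):
        for k in params:
            for factor in (1 + step[k] / 100.0, 1 - step[k] / 100.0):
                trial = dict(params, **{k: params[k] * factor})
                value = objective(trial)
                if value > best:          # ascent: larger objective is better
                    params, best = trial, value
                    step[k] *= 1.10
                    break
            else:                         # neither direction helped
                step[k] *= 0.50
    return params, best

# Toy usage: recover the maximizer of a concave objective.
obj = lambda p: -(p["r"] - 3.0) ** 2 - (p["q"] - 5.0) ** 2
learned, _ = coordinate_ascent(obj, {"r": 1.0, "q": 1.0})
```

In the paper's setting, `objective` would run the EKF over the training data with the candidate R and Q and return one of the criteria of Sections III-A through III-E.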
The software developed by each team will then be tested on a separate (but identical) vehicle at a Carnegie Mellon site. Since we have our own vehicle, we are able to install an accurate GPS unit onto it to get additional, more accurate state estimates during development time. Specifically, we mounted onto our vehicle a Novatel RT2 differential GPS unit, which gives position estimates y_t at 20 Hz to about 2 cm accuracy. While we could use the accurate GPS unit for development, the hardware on which our algorithms will be evaluated will not have the more accurate GPS.

The vehicle also comes with a carefully hand-tuned EKF. Since this pre-existing EKF was built by a highly experienced team of roboticists at Carnegie Mellon (not affiliated with the authors), we believe that it represents an approximate upper bound on the performance that can reasonably be expected in a system built by hand-tweaking parameters (without using ground truth data). We therefore evaluate our learning algorithms against this hand-designed EKF.

The state of the vehicle is represented as a five-dimensional vector, including its map coordinates x_t and y_t, orientation θ_t, forward velocity v_t and heading gyro bias b_t [2]. The measurement error of a gyroscope is commonly characterized as having a Gaussian random component, and an additive bias term that varies slowly over time. Neglecting to model the bias of the gyroscope will lead to correlated error in the robot's heading over time, which will result in poor estimation performance. More formally, our EKF's state update equations are given by:

x_t = x_{t-1} + Δt v_{t-1} cos θ_{t-1} + ε_for cos θ_{t-1} - ε_lat sin θ_{t-1},
y_t = y_{t-1} + Δt v_{t-1} sin θ_{t-1} + ε_for sin θ_{t-1} + ε_lat cos θ_{t-1},
θ_t = θ_{t-1} + Δt (r_t + b_t) + ε_θ,
v_t = v_{t-1} + Δt a_t + ε_v,
b_t = b_{t-1} + ε_b.

Here ε_for and ε_lat are the position noise in the forward and lateral directions with respect to the vehicle. The control is u_t = (r_t, a_t), where r_t is the rotational velocity command, and a_t is the forward acceleration. The observation equations are given by

x̃_t = x_t + δ_{x,t},  ỹ_t = y_t + δ_{y,t},  θ̃_t = θ_t + δ_{θ,t},  ṽ_t = v_t + δ_{v,t}.

In our model, ε_t is a zero mean Gaussian noise variable with covariance diag(σ_for, σ_lat, σ_θ, σ_v, σ_b). Similarly, δ_t is a zero mean Gaussian noise variable with covariance diag(γ_x, γ_y, γ_θ, γ_v). In our experiments, the nine parameters σ_for, σ_lat, σ_θ, σ_v, σ_b, γ_x, γ_y, γ_θ, γ_v were fit using the learning algorithms. Furthermore, our model assumed that γ_x = γ_y.

Our experimental protocol was as follows. We collected two sets of data (100 s each) of driving the vehicle around a grass field, and used one for training, the other for testing. Because the observations y_t do not contain the complete state (but only position coordinates), the naive approach of maximizing the joint likelihood is not directly applicable. However, the highly accurate position estimates allow us to extract reasonably accurate estimates of the other state variables.³ Using these state estimates as a substitute for the real states, we estimate the covariances ⟨R_joint, Q_joint⟩ using the joint likelihood criterion. The estimates ⟨R_joint, Q_joint⟩ are used for initialization when using the other criteria (which do not have closed form solutions).

³More specifically, we ran an extended Kalman smoother to obtain estimates for θ_t, v_t, b_t. This smoother used very high variances for the measured θ̃_t, ṽ_t, and very small variances for the position measurements x̃_t, ỹ_t. The smoother also assumed very high process noise variances, except for the gyro bias term. This choice of variances ensures the smoother extracts state estimates that are consistent with the highly accurately measured position coordinates. The results from the smoother were not very sensitive to the exact choice of the variances. In the reported experiments, we used diag(1, 1, 1, 1, .001²) for the process noise and diag(.02², .02², 10², 10²) for the measurement noise.

We evaluate our algorithms on test data using two error metrics. The first is the RMS error in the estimate of the vehicle's position (cf. Eqn. 7):

( (1/T) Σ_{t=1}^{T} ||h(μ_t) - y_t||² )^{1/2}.

Above, μ_t is the EKF estimate of the full state at time t, and h(μ_t) is the EKF estimate of the 2D coordinates of the vehicle at time t. The second error metric is the prediction log-loss

-(1/T) Σ_{t=1}^{T} log p(y_t | z_{0:t}, u_{1:t}).

Following the discussion in Section III-C, the main difference between these two metrics is in whether it demands that the EKF gives accurate covariance estimates.

The highly accurate GPS outputs position measurements at 20 Hz. This is also the frequency at which the built-in hand-tuned filter outputs its state estimates. We use the corresponding time discretization Δt = .05 s for our filter. Each of our learning algorithms took about 20-30 minutes to converge. Our results are as follows (smaller values are better):⁴

Learning Algorithm | RMS error | log-loss
Joint              | 0.2866    | 23.5834
Res                | 0.2704    | 1.0647
Pred               | 0.2940    | -0.1671
Meas               | 0.2943    | 60.2660
Res-sm             | 0.3229    | 2.9895
Pred-sm            | 0.5831    | 0.4793
CMU hand-tuned     | 0.3901    | 0.7500

In this table, Res stands for the algorithm minimizing the residual prediction error (⟨R_res, Q_res⟩), etc. As expected, the filters learned using the smoother criteria of Section III-E (Res-sm, Pred-sm) are outperformed by the filters learned using the corresponding filter criteria (Res, Pred). So from here on, we will not consider the filters learned using the smoother criteria.

We see that the hand-tuned EKF had an RMS error of about 40 cm in estimating the position of the vehicle, and that all of our learned filters obtain significantly better performance. Using the parameters learned by maximizing the prediction likelihood (⟨R_pred, Q_pred⟩), we also obtain better log-loss (negative log likelihood). Minimizing the residual prediction error on the training data results in the smallest residual error on the test data. Similarly, minimizing the log-loss (or, equivalently, maximizing the prediction likelihood) on the training data results in the smallest log-loss on the test data. Thus, discriminative training allows us to successfully optimize for the criteria we care about. We also notice that, although the filters trained by joint likelihood maximization and measurement likelihood maximization have small RMS error, they perform poorly on the log-loss criterion. This can be explained by correlated noise.

⁴All results reported are averaged over two trials, in which half of the data is used for training, and the other half for testing.
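(The two evaluation metrics above can be computed as follows; a small sketch of ours, assuming the filter's position estimates h(μ_t), their output covariances Ω_t, and the ground truth positions y_t are given as arrays:)

```python
import numpy as np

def rms_error(h_mus, ys):
    """RMS position error: sqrt((1/T) sum_t ||h(mu_t) - y_t||^2)."""
    d = np.asarray(h_mus) - np.asarray(ys)
    return np.sqrt(np.mean(np.sum(d * d, axis=1)))

def prediction_log_loss(h_mus, Omegas, ys):
    """Prediction log-loss: -(1/T) sum_t log N(y_t; h(mu_t), Omega_t)."""
    total = 0.0
    for mu, Om, y in zip(h_mus, Omegas, ys):
        d = y - mu
        total += 0.5 * (np.log(np.linalg.det(2 * np.pi * np.asarray(Om)))
                        + d @ np.linalg.solve(Om, d))
    return total / len(ys)
```

Unlike the RMS error, the log-loss penalizes a filter whose reported covariances Ω_t understate its actual errors, which is exactly the failure mode discussed next.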
Fig. 2. Typical state estimation results. Plot shows ground truth trajectory (black solid line); on-board (inexpensive) GPS measurements (black triangles); estimated state using the filter learned by minimizing residual prediction error (blue dash-dotted line); estimated state using the filter learned by maximizing the prediction likelihood (green dashed line); and estimated state using the CMU hand-tuned filter (red dotted line). (Colors where available.)

Fig. 3. Close-up of part of Figure 2. (Same legend as previous figure.)

More specifically, correlated noise causes the model trained by maximizing the joint likelihood to be overconfident about its state estimates, which results in the larger log-loss. The effect of correlated noise on the model trained by maximizing the measurement likelihood is even more significant. The model learns very small measurement variances, which allow it to track the measurements more closely. Unfortunately, in the case of correlated noise, tracking the measurements more closely does not mean tracking the state of the system more closely. The small measurement variances result in significant overconfidence in the state estimates, which causes the log-loss to be significantly higher.

Figure 2 shows a typical trajectory taken by the vehicle, as well as the estimates from two of the learned filters and the CMU hand-tuned filter. It is visually fairly clear from the figure that the learned filters are more accurately tracking the ground truth trajectory than the hand-tuned filter. Figure 3 shows a close-up of part of Figure 2. To reduce clutter, we have plotted only the output of two of the six learned filters here; however, all learned filters have outputs that look visually very similar. One exception is that, as explained in the previous paragraph, filters learned by maximizing the measurement likelihood tend to follow the (often noisy) GPS measurements more closely.

V. CONCLUSION

We have presented a highly related family of algorithms for training the noise parameters of an EKF. All algorithms follow the same idea: Adjust the covariances of the EKF in a way that maximizes their predictive accuracy. Experimental results show that this training routine has two major advantages over previous best practice. First, it eliminates the need for a hand-tuning phase, thereby making it easier to develop working EKFs.
Second, we find that the learned EKF is more accurate than even a well-tuned EKF constructed by hand. Among the learned filters, the best results were obtained by using discriminative training, which evaluates candidate covariances by evaluating the predictive performance of the EKF when using these covariances.

In our experiments we compare to a commercial EKF, provided as part of a robot developed for a major DARPA program. Clearly, it is difficult to assess how much tuning went into this EKF, and whether it is actually as good as can be done through manual tweaking. However, the EKF is central to the application of this robotic system, and we expect the development team spent at least a few days developing this EKF. Our approach outperforms this EKF by a large margin, based on a few minutes of data and a few minutes of learning. This suggests that our approach may yield better results with less development time.

We note that our training approach is also applicable to broader problems of EKF training. In particular, we chose not to learn the physical model as expressed in the state transition and the measurement functions. However, given an appropriate parameterization of these functions, it appears to be feasible to tune those functions as well. It remains an open question to which extent overfitting poses a problem when doing so in practice.

The holistic training algorithms presented in this paper are highly related to an ongoing debate in the field of machine learning on using discriminative vs. generative algorithms for supervised learning. There, the consensus (assuming there is ample training data) seems to be that it is usually better to directly minimize the loss with respect to the ultimate performance measure, rather than an intermediate loss function such as the likelihood of the training data; see, e.g., [14], [9], [1]. This is because the model, no matter how complicated, is almost always not completely correct for the problem data. By analogy, when choosing the noise parameters for an EKF, we are interested in choosing parameters that lead to the EKF outputting accurate state estimates, rather than necessarily choosing the noise parameters that most correctly reflect each measurement's true variance (such as would be obtained from the maximum likelihood estimate or from most manufacturer specs, as discussed above).

ACKNOWLEDGMENTS

We give warm thanks to Andrew Lookingbill and David Lieb for collecting the data from the LAGR robot platform. This work was supported by the DARPA LAGR program under contract number FA8650-04-C-7134.

REFERENCES

[1] Pieter Abbeel and Andrew Y. Ng. Learning first order Markov models for control. In NIPS 17, 2005.
[2] J. A. Farrell and M. Barth. The Global Positioning System and Inertial Navigation. McGraw Hill, 1998.
[3] Arthur Gelb, editor. Applied Optimal Estimation. MIT Press, 1974.
[4] Sham Kakade, Yee Whye Teh, and Sam Roweis. An alternative objective function for Markovian fields. In Proc. ICML, 2002.
[5] Rudolph E. Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering, 82(Series D):35-45, 1960.
[6] John Lafferty, Andrew McCallum, and Fernando Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proc. ICML, 2001.
[7] Peter S. Maybeck. Stochastic Models, Estimation, and Control, volume 1. Academic Press, 1982.
[8] Peter S. Maybeck. Stochastic Models, Estimation, and Control, volume 2. Academic Press, 1982.
[9] A. Y. Ng and M. I. Jordan. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In NIPS 14, 2002.
[10] A. Y. Ng, A. Coates, M. Diel, V. Ganapathi, J. Schulte, B. Tse, E. Berger, and E. Liang. Autonomous inverted helicopter flight via reinforcement learning. In Proceedings of the International Symposium on Experimental Robotics (ISER), Singapore, 2004. Springer Tracts in Advanced Robotics (STAR).
[11] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian Flannery. Numerical Recipes in C. Cambridge University Press, 1992.
[12] Sam Roweis and Zoubin Ghahramani. A unifying review of linear Gaussian models. Technical report, University of Toronto, Toronto, Canada, 1997.
[13] S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics. MIT Press, Cambridge, MA, 2005.
[14] V. N. Vapnik. Statistical Learning Theory. John Wiley & Sons, 1998.
[15] B. Zadrozny and C. Elkan. Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In Proc. ICML, 2001.