
Discriminative Training of Kalman Filters

Pieter Abbeel, Adam Coates, Michael Montemerlo, Andrew Y. Ng and Sebastian Thrun
Department of Computer Science, Stanford University, Stanford, CA 94305

Abstract: Kalman filters are a workhorse of robotics and are routinely used in state-estimation problems. However, their performance critically depends on a large number of modeling parameters which can be very difficult to obtain, and which are often set via significant manual tweaking and at a great cost of engineering time. In this paper, we propose a method for automatically learning the noise parameters of a Kalman filter. We also demonstrate on a commercial wheeled rover that our Kalman filter's learned noise covariance parameters, obtained quickly and fully automatically, significantly outperform an earlier, carefully and laboriously hand-designed one.

I. INTRODUCTION

Over the past few decades, Kalman filters (KFs) [5] and extended Kalman filters (EKFs) [3] have found widespread applications throughout all branches of engineering. EKFs take as input sequences of measurements and controls, and output an estimate of the state of a dynamic system. They require a model of the system, comprising a next-state function, a measurement function, and the associated noise terms. EKFs are arguably one of the most influential Bayesian techniques in all of engineering and science.

This paper addresses a fundamental problem with the EKF: that of arriving at models suitable for accurate state estimation. The next-state function and the measurement function are sometimes relatively easy to model, since they describe the underlying physics of the system. But even in applications where the next-state function and the measurement function are accurate, the noise terms are often difficult to estimate. The noise terms capture what the deterministic model fails to: the effects of unmodeled perturbations on the system. The noise is usually the result of a number of different effects:

- Mis-modeled system and measurement dynamics.
- The existence of hidden state in the environment not modeled by the EKF.
- The discretization of time, which introduces additional error.
- The algorithmic approximations of the EKF itself, such as the Taylor approximation commonly used for linearization.

All these effects cause perturbations in the state transitions and measurements. In EKFs, they are commonly characterized as "noise." Further, the noise is assumed to be independent over time, whereas the phenomena described above cause highly correlated noise. The magnitude of the noise in an EKF is therefore extremely difficult to estimate.

It is therefore surprising that the issue of learning noise terms remains largely unexplored in the literature. A notable exception is the filter tuning literature. (See, e.g., [7], [8] for an overview.) Although some of their ideas are fairly similar and could be automated, they focus mostly on a formal analysis of (optimally) reducing the order of the filter (for linear systems), and on how to use the resulting insights for tuning the filter.

To further motivate the importance of optimizing the Kalman filter parameters (either by learning or tuning), consider the practical problem of estimating the variance parameter for a GPS unit that is being used to estimate the position $x$ of a robot. A standard Kalman filter model would model the GPS readings $x_{\text{measured}}$ as the true position plus noise:

$$x_{\text{measured}} = x_{\text{true}} + \varepsilon,$$

where $\varepsilon$ is a noise term with zero mean and variance $\sigma^2$. The GPS's manufacturer specifications will sometimes explicitly give $\sigma^2$ for the unit; otherwise, one can also straightforwardly estimate $\sigma^2$ by placing the vehicle at a fixed, known position and measuring the variability of the GPS readings. However, in practice either of these choices for $\sigma^2$ will work very poorly if it is the parameter used in the Kalman filter. This is because GPS errors are often correlated over time, whereas the straightforward implementation of the Kalman filter assumes that the errors are independent. Thus, if the vehicle is stationary and we average $n$ GPS readings, the filter assumes that the variance of the resulting estimate is $\sigma^2/n$. However, if the errors are correlated over time, then the true variance of the resulting position state estimate can be significantly larger than $\sigma^2/n$. The extreme of this case would be full correlation: if the errors were perfectly correlated so that all $n$ readings are identical, then the variance of the average would be $\sigma^2$ instead of $\sigma^2/n$.¹ Thus, if $\sigma^2$ was the parameter used in the filter, it will tend to underestimate the long time-scale variance of the GPS readings, and perhaps as a result "trust" the GPS too much relative to other sensors (or relative to the dynamic model), and therefore give poor estimates of the state.

¹More generally, we have that $\mathrm{Var}\big(\frac{1}{n}\sum_{i=1}^{n} x_i\big) = \frac{1}{n^2}\big(\sum_{i=1}^{n}\mathrm{Var}(x_i) + \sum_{i=1}^{n}\sum_{j=1, j\neq i}^{n}\mathrm{Cov}(x_i, x_j)\big)$.
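Footnote 1 is easy to verify numerically. The following is a small simulation sketch (ours, not from the paper) comparing the empirical standard deviation of an $n$-reading average under independent versus correlated GPS noise; the AR(1) noise process and its parameter rho are assumptions chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, sigma, rho = 100, 2000, 3.0, 0.95  # rho: assumed AR(1) correlation

def std_of_mean(correlated):
    """Empirical std. dev. of the mean of n noisy readings of a fixed position."""
    means = []
    for _ in range(trials):
        if correlated:
            e = np.empty(n)
            e[0] = rng.normal(0.0, sigma)
            for t in range(1, n):  # AR(1) noise: errors persist over time
                e[t] = rho * e[t - 1] + np.sqrt(1 - rho**2) * rng.normal(0.0, sigma)
        else:
            e = rng.normal(0.0, sigma, n)  # independent noise
        means.append(e.mean())
    return float(np.std(means))

print("independent noise:", std_of_mean(False))  # close to sigma/sqrt(n) = 0.3
print("correlated noise: ", std_of_mean(True))   # several times larger
```

With these settings the independent case comes out near $\sigma/\sqrt{n} = 0.3$ m, while the correlated case is several times larger: exactly the overconfidence a filter inherits when it assumes independent errors.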
In practice, a number of authors have observed this effect on several robots, including an autonomous helicopter platform using GPS for its state estimates [10], and ground rover platforms using a SICK LADAR to estimate their position [13]. In each of these cases, significant human time was expended to try to "tweak" the variance parameters to what they guessed were more "appropriate" values, so that the Kalman filter gives more accurate state estimates. For instance, in the GPS example above, by artificially choosing a larger $\sigma^2$ than that suggested by the manufacturer specifications, one would be able to reduce the KF's overconfidence in the GPS, and thereby obtain more accurate state estimates.

Fig. 1. Ground rover platform instrumented with inexpensive IMU and GPS. This robot is commercially provided to contractors in the DARPA LAGR program.

In this paper, we propose several machine learning algorithms for automatically choosing the noise parameters of the Kalman filter or extended Kalman filter (EKF). Our work is based on the fundamental assumption that during the EKF development, it is possible to instrument the system to measure additional variables, which provide highly accurate estimates for the state variables. These measurements are only available in the initial tuning phase; later they are no longer available. An example of this setup is the robot shown in Figure 1. This figure shows a commercial robot equipped with a low-cost IMU (inertial measurement unit) and a low-end GPS receiver (global positioning system). Both are used to estimate the robot's geo-referenced location when the robot is in operation. The vendor of this unit supplies an EKF for pose estimation which has been optimized manually to achieve the best performance. A clear option, which the vendor did not exercise, would have been to attach a highly accurate differential GPS receiver to the unit to "tune" the EKF. With such a unit, it becomes possible to receive "ground truth" information on the actual coordinates of the robot.

This paper presents a family of learning algorithms that utilize such information for learning the covariance parameters of an EKF. The idea is relatively straightforward: train an EKF so that it maximizes its predictive accuracy, where "accuracy" is evaluated through the reference data obtained during training. The different algorithms provide different criteria for assessing the prediction accuracy of an EKF. Some simply measure the mean square error of the EKF; others measure the likelihood of the high-accuracy measurements. No matter what criterion is being used for training, however, the trained filters consistently outperform the EKF carefully tuned by hand. In fact, in our experiments we achieve results that are significantly more accurate than those provided by a commercial robot vendor. Thus, our approach promises to relieve EKF developers of the tedious task of tuning noise parameters by hand.

II. THE EXTENDED KALMAN FILTER

We will begin with a brief review of the EKF, defining the basic notation and terminology in this paper. Throughout this paper we use $z \sim P(z)$ to denote that the random variable $z$ has distribution $P$. We use $N(z; \mu, \Sigma)$ to denote the formula for the density of a Gaussian with mean $\mu$ and covariance $\Sigma$, evaluated at $z$.

The EKF addresses the problem of state estimation in non-linear continuous dynamical systems. Here, we will formulate the EKF for the discrete time setting. At each point in time, the true state of the system being monitored will be denoted by $x_t$, where $t$ is the time index. The EKF assumes that state transitions are governed by a discrete-time continuous-state non-linear law of the form

$$x_t = f(x_{t-1}, u_t) + \varepsilon_t.$$

Here $u_t$ is a control, $f$ is a nonlinear function that maps the control and the state one time step earlier into the state at time $t$, and $\varepsilon_t$ is the process noise. The process noise $\varepsilon_t$ is Gaussian with zero mean and covariance $R$. Hence we have that $x_t \sim N(x_t; f(x_{t-1}, u_t), R)$. Measurements $z_t$ are formed through a non-linear law of the form

$$z_t = g(x_t) + \delta_t.$$

Here $z_t$ is a measurement, $g$ is a nonlinear function of the state $x_t$, and $\delta_t$ is the measurement noise. The measurement noise is Gaussian with zero mean and covariance $Q$. Hence we have that $z_t \sim N(z_t; g(x_t), Q)$.

The EKF then provides us with an estimate $\mu_t$ of the state at time $t$, along with an expected error of this estimate, expressed through a covariance $\Sigma_t$. Put differently, given a Gaussian estimate of the state specified by mean and covariance $\langle \mu_{t-1}, \Sigma_{t-1} \rangle$ at time $t-1$, the EKF update rule provides us with an estimate of both quantities at time $t$. In detail, the update requires us to linearize the nonlinear functions $f$ and $g$ through Taylor approximation. This is usually written as follows:

$$f(x_{t-1}, u_t) \approx f(\mu_{t-1}, u_t) + F_t (x_{t-1} - \mu_{t-1}),$$
$$g(x_t) \approx g(\bar\mu_t) + G_t (x_t - \bar\mu_t).$$

Here $F_t$ and $G_t$ are Jacobian matrices of $f$ and $g$, respectively, taken at the filter estimate. The resulting state transition and measurement functions are now linear in $x$. For linear systems, the Kalman filter produces an exact update, by manipulating the various Gaussians involved. The update is then usually factored into two separate steps: a prediction step and a measurement update step. The prediction step starts with the estimate $\mu_{t-1}$ and its covariance $\Sigma_{t-1}$ at time $t-1$, and produces an estimate for time $t$:

$$\bar\mu_t = f(\mu_{t-1}, u_t),$$
$$\bar\Sigma_t = F_t \Sigma_{t-1} F_t^\top + R.$$

The bar in $\bar\mu_t$ and $\bar\Sigma_t$ indicates that these estimates are pure predictions, before taking the measurement $z_t$ into account. This happens in the measurement update step, in which the EKF integrates the measurement $z_t$ by first calculating the Kalman gain:

$$K_t = \bar\Sigma_t G_t^\top (G_t \bar\Sigma_t G_t^\top + Q)^{-1}.$$

This expression specifies the amount by which the estimate will be adjusted in accordance with the measurement prediction error $z_t - g(\bar\mu_t)$. This leads to the update of the mean and variance:

$$\mu_t = \bar\mu_t + K_t (z_t - g(\bar\mu_t)),$$
$$\Sigma_t = (I - K_t G_t) \bar\Sigma_t.$$
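For reference, the predict/update cycle above translates directly into code. A minimal NumPy sketch (generic, not the paper's implementation), where the model functions and their Jacobians are passed in as callables:

```python
import numpy as np

def ekf_step(mu, Sigma, u, z, f, g, F_jac, G_jac, R, Q):
    """One EKF cycle: prediction step, then measurement update.

    f and g are the next-state and measurement functions; F_jac and G_jac
    return their Jacobians at the current estimate; R and Q are the process
    and measurement noise covariances that this paper learns.
    """
    # Prediction step: mu_bar = f(mu, u), Sigma_bar = F Sigma F^T + R.
    F = F_jac(mu, u)
    mu_bar = f(mu, u)
    Sigma_bar = F @ Sigma @ F.T + R

    # Measurement update: Kalman gain, then correct by the innovation z - g(mu_bar).
    G = G_jac(mu_bar)
    K = Sigma_bar @ G.T @ np.linalg.inv(G @ Sigma_bar @ G.T + Q)
    mu = mu_bar + K @ (z - g(mu_bar))
    Sigma = (np.eye(len(mu)) - K @ G) @ Sigma_bar
    return mu, Sigma
```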
To implement an EKF, the designer needs to determine two sets of things: the nonlinear functions $f$ and $g$, and the noise covariances $R$ and $Q$. While $f$ and $g$ can sometimes be obtained through basic physics considerations,² the covariances $R$ and $Q$ are difficult to estimate. It is common practice to manually tune those matrices until the EKF exhibits the desired performance. In the presence of ground truth data, one could try to tune the parameters such that the filter estimates are as accurate as possible in estimating the ground truth data. Manual tuning with such an objective is effectively manual discriminative training of the Kalman filter parameters. In the next section we present learning procedures that automate such a tuning process.

²One common exception is the "damping" term in the state dynamics. For example, if we estimate the gyros of an IMU, or indeed any other sensor, to have a slowly varying bias (as is commonly done in practice), the bias is usually modeled as $x_t = \alpha x_{t-1} + \varepsilon_t$, where $0 \le \alpha \le 1$ governs the rate at which the bias $x_t$ tends toward zero. The parameters $\alpha$ and $\mathrm{Var}(\varepsilon_t)$ jointly govern the dynamics of the bias, and $\alpha$ is an example of a parameter in the state update equation that is difficult to estimate and is, in practice, usually tuned by hand.

III. LEARNING THE FILTER PARAMETERS

We now describe our learning techniques for obtaining the noise parameters of the Kalman filter automatically. For simplicity, our discussion will focus on learning $R$ and $Q$, though all the presented methods also apply more generally. All but one of our approaches require that one is given a highly accurate instrument for measuring either all or a subset of the variables in the state $x_t$. Put differently, in the EKF learning phase, we are given additional values $y_1, y_2, \ldots$, where each $y_t$ is governed by a projective equation of the type

$$y_t = h(x_t) + \gamma_t.$$

Here $h$ is a function, and $\gamma_t$ is the noise with covariance $P$. In our example below, the $y_t$ are the readings from a high-end GPS receiver. The function $h$ will be a projection which extracts the subset of the variables in $x_t$ that correspond to the Cartesian coordinates of the robot.
Let $x_{0:T}$ denote the entire state sequence $(x_0, x_1, \ldots, x_T)$, and similarly let $u_{1:T}$, $y_{0:T}$ and $z_{0:T}$ denote the corresponding observation sequences. Assuming that we have a prior $p(x_0)$ on the initial state at time 0, the state update equation and the observation equations (together with the known controls $u_{1:T}$) define a joint probability distribution on $x_{0:T}$, $y_{0:T}$, and $z_{0:T}$. Specifically,

$$p(x_{0:T}, y_{0:T}, z_{0:T} \mid u_{1:T}) = p(x_0) \prod_{t=1}^{T} p(x_t \mid x_{t-1}, u_t) \prod_{t=0}^{T} p(y_t \mid x_t)\, p(z_t \mid x_t), \quad (1)$$

where

$$p(x_t \mid x_{t-1}, u_t) = N(x_t; f(x_{t-1}, u_t), R), \quad (2)$$
$$p(y_t \mid x_t) = N(y_t; h(x_t), P), \quad (3)$$
$$p(z_t \mid x_t) = N(z_t; g(x_t), Q). \quad (4)$$

Using the linearization approximations to $f$ and $g$ ($h$, being a projection operation, is assumed to be linear), the joint distribution $p(x_{0:T}, y_{0:T}, z_{0:T})$ defined by the EKF model is actually a joint linear-Gaussian model [12]. Since the joint distribution is well-defined, so are other quantities such as the marginal distributions (such as $p(y_{0:T}, z_{0:T}) = \int_{x_{0:T}} p(x_{0:T}, y_{0:T}, z_{0:T})\, dx_{0:T}$) and the conditional distributions (such as $p(z_{0:T} \mid y_{0:T})$) over these same random variables.

A. Generative Approach: Maximizing The Joint Likelihood

We will first discuss a naive approach, which requires access to the full state vector. Put differently, this approach requires that $h$ is the identity function, and that the noise in $\gamma$ is so small that it can safely be neglected. While this approach is generally inapplicable simply because it is often difficult to measure all state variables, it will help us in setting up the other approaches.

Generative learning proceeds by maximizing the likelihood of all the data. Since in this section we assume the full state vector is observed (i.e., for all $t$ we have $y_t = x_t$), the covariance matrices $\langle R_{\text{joint}}, Q_{\text{joint}} \rangle$ are estimated as follows:

$$\langle R_{\text{joint}}, Q_{\text{joint}} \rangle = \arg\max_{R,Q}\, \log p(x_{0:T}, z_{0:T} \mid u_{1:T}). \quad (5)$$

Now by substituting Eqns. (1), (2) and (4) into Eqn. (5) we get that the optimization decomposes and we can estimate $R_{\text{joint}}$ and $Q_{\text{joint}}$ independently as:

$$R_{\text{joint}} = \arg\max_{R}\; -T \log |2\pi R| - \sum_{t=1}^{T} (x_t - f(x_{t-1}, u_t))^\top R^{-1} (x_t - f(x_{t-1}, u_t)),$$

$$Q_{\text{joint}} = \arg\max_{Q}\; -(T+1) \log |2\pi Q| - \sum_{t=0}^{T} (z_t - g(x_t))^\top Q^{-1} (z_t - g(x_t)).$$

An interesting observation here is that both the objective for $R_{\text{joint}}$ and the objective for $Q_{\text{joint}}$ decompose into two terms: a term from the normalizer whose objective is to deflate the determinant of $R$ ($Q$), and one that seeks to minimize a quadratic function in which the inverse of $R$ ($Q$) is a factor, and which therefore seeks to inflate $R$ ($Q$). The optimal $R_{\text{joint}}$ and $Q_{\text{joint}}$ can actually be computed in closed form and are given by

$$R_{\text{joint}} = \frac{1}{T} \sum_{t=1}^{T} (x_t - f(x_{t-1}, u_t))(x_t - f(x_{t-1}, u_t))^\top,$$

$$Q_{\text{joint}} = \frac{1}{T+1} \sum_{t=0}^{T} (z_t - g(x_t))(z_t - g(x_t))^\top.$$

Note the naive approach never actually executes the filter for training. It simply trains the elements of the filter. It therefore implicitly assumes that training the elements individually is as good as training the EKF as a whole.
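When the full state sequence is observed, these closed-form estimates are just empirical covariances of the model residuals. A minimal sketch, assuming $f$ and $g$ are Python callables and the sequences are stacked as arrays:

```python
import numpy as np

def joint_ml_covariances(x, u, z, f, g):
    """Closed-form R_joint, Q_joint when the full state sequence is observed.

    x: (T+1, dx) states x_0..x_T; u: (T+1, du) controls, with u[t] driving
    the transition x[t-1] -> x[t]; z: (T+1, dz) measurements z_0..z_T.
    """
    T = len(x) - 1
    # Process residuals x_t - f(x_{t-1}, u_t) for t = 1..T.
    r = np.stack([x[t] - f(x[t - 1], u[t]) for t in range(1, T + 1)])
    # Measurement residuals z_t - g(x_t) for t = 0..T.
    s = np.stack([z[t] - g(x[t]) for t in range(T + 1)])
    R_joint = r.T @ r / T          # (1/T) sum of outer products
    Q_joint = s.T @ s / (T + 1)    # (1/(T+1)) sum of outer products
    return R_joint, Q_joint
```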
B. Minimizing The Residual Prediction Error

The technique of maximizing the joint likelihood, as stated above, is only applicable when the full state is available during training. This is usually not the case. Often, $h$ is a function that projects the full state into a lower-dimensional projection of the state. For example, for the inertial navigation system described below, the full state involves bias terms of a gyroscope that cannot be directly measured. Further, the technique of maximizing the joint likelihood never actually runs the filter! This is a problem if noise is actually correlated, as explained in the introduction of this paper.

A better approach, thus, would involve training an EKF that minimizes the prediction error for the values of $y_t$. More specifically, consider the EKF's prediction of $y_t$:

$$E[y_t \mid u_{1:t}, z_{0:t}] = h(\mu_t). \quad (6)$$

Here, $\mu_t$ is the result of running the EKF algorithm (with some variance parameters $R$ and $Q$ for the filter), and taking its estimate for the state $x_t$ after the EKF has seen the observations $z_{0:t}$ (and the controls $u_{1:t}$). Therefore $\mu_t$ depends implicitly on $R$ and $Q$. The prediction error minimization technique simply seeks the parameters $R$ and $Q$ that minimize the quadratic deviation of $y_t$ and the expectation above, weighted by the inverse covariance $P$:

$$\langle R_{\text{res}}, Q_{\text{res}} \rangle = \arg\min_{R,Q} \sum_{t=0}^{T} (y_t - h(\mu_t))^\top P^{-1} (y_t - h(\mu_t)).$$

If $P$ is any multiple of the identity matrix, this simplifies to

$$\langle R_{\text{res}}, Q_{\text{res}} \rangle = \arg\min_{R,Q} \sum_{t=0}^{T} \| y_t - h(\mu_t) \|^2. \quad (7)$$

Thus, we are simply choosing the parameters $R$ and $Q$ that cause the filter to output the state estimates that minimize the squared differences to the measured values $y_t$.

This optimization is more difficult than maximizing the joint likelihood. The error function is not a simple function of the covariances $R$ and $Q$. Instead, it is mediated through the mean estimates $\mu_t$, which depend on the covariances $R$ and $Q$ in a complicated way. The mean estimates $\mu_t$ are the result of running an EKF over the data. Hence, this learning criterion evaluates the actual performance of the EKF, instead of its individual components. Computing the gradients for optimizing the residual prediction error is more involved than in the previous case. However, an optimization that does not require explicit gradient computations, such as the Nelder-Mead simplex algorithm [11], can also be applied.
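To illustrate how Eqn. (7) can be optimized gradient-free, the following sketch wraps a hypothetical `run_ekf` helper (which would run the filter with candidate $R$, $Q$ and return the means $\mu_t$) in an objective suitable for SciPy's Nelder-Mead method. The log-variance parameterization of diagonal $R$ and $Q$ is our assumption, used to keep candidates positive definite; none of these names come from the paper:

```python
import numpy as np
from scipy.optimize import minimize

def residual_objective(log_diag, u, z, y, h, run_ekf, dx):
    """Eqn. (7): sum_t || y_t - h(mu_t) ||^2 as a function of R and Q."""
    # Diagonal R, Q parameterized by log-variances so they stay positive.
    R = np.diag(np.exp(log_diag[:dx]))
    Q = np.diag(np.exp(log_diag[dx:]))
    mus = run_ekf(u, z, R, Q)  # hypothetical helper: filter means mu_0..mu_T
    return sum(np.sum((y[t] - h(mus[t])) ** 2) for t in range(len(y)))

# Usage sketch (dx + dz log-variances packed into one vector):
# result = minimize(residual_objective, x0=np.zeros(dx + dz),
#                   args=(u, z, y, h, run_ekf, dx), method="Nelder-Mead")
```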
C. Maximizing The Prediction Likelihood

The objective in Eqn. (7) measures the quality of the state estimates $\mu_t$ output by the EKF, but does not measure the EKF's estimates of the uncertainty of its output. At each time step, the EKF estimates both $\mu_t$ and a covariance $\Sigma_t$ for its error. In applications where we require that the EKF gives accurate estimates of its uncertainty [15], we choose instead the prediction likelihood objective

$$\langle R_{\text{pred}}, Q_{\text{pred}} \rangle = \arg\max_{R,Q} \sum_{t=0}^{T} \log p(y_t \mid z_{0:t}, u_{1:t}). \quad (8)$$

Here the $y_t$'s are treated as measurements. This training regime trains the EKF so as to maximize the probability of these measurements. The probability $p(y_t \mid z_{0:t}, u_{1:t})$ can be decomposed into variables known from the filter:

$$p(y_t \mid z_{0:t}, u_{1:t}) = \int p(y_t \mid x_t)\, p(x_t \mid z_{0:t}, u_{1:t})\, dx_t, \quad \text{where } p(x_t \mid z_{0:t}, u_{1:t}) = N(x_t; \mu_t, \Sigma_t).$$

Under the Taylor expansion, this resolves to

$$p(y_t \mid z_{0:t}, u_{1:t}) = N(y_t;\, h(\mu_t),\, H_t \Sigma_t H_t^\top + P). \quad (9)$$

Here $H_t$ is the Jacobian of the function $h$. The resulting maximization of the log likelihood gives us

$$\langle R_{\text{pred}}, Q_{\text{pred}} \rangle = \arg\max_{R,Q} \sum_{t=0}^{T} -\log |2\pi \Omega_t| - (y_t - h(\mu_t))^\top \Omega_t^{-1} (y_t - h(\mu_t)),$$

where we abbreviated $\Omega_t = H_t \Sigma_t H_t^\top + P$. Once again, this optimization involves the estimate $\mu_t$, through which the effects of $R$ and $Q$ are mediated. It also involves the covariance $\Sigma_t$. We note that when the covariance $P$ is small, we can omit it in this expression. This objective should also be contrasted with Eqn. (7). The difference is that here the filter is additionally required to give "confidence rated" predictions by choosing covariances $\Sigma_t$ that reflect the true variability of its state estimates $\mu_t$.
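Given a filter run, the objective of Eqn. (8) is computable directly from $\mu_t$ and $\Sigma_t$ via Eqn. (9). A minimal sketch, assuming $h$ is linear with constant Jacobian $H$ (as for the projection used in this paper):

```python
import numpy as np

def prediction_log_likelihood(y, mus, Sigmas, H, P):
    """Eqns. (8)-(9): sum_t log N(y_t; h(mu_t), H Sigma_t H^T + P)."""
    total = 0.0
    for t in range(len(y)):
        Omega = H @ Sigmas[t] @ H.T + P   # innovation covariance Omega_t
        resid = y[t] - H @ mus[t]         # h is a projection, so h(mu) = H mu
        _, logdet = np.linalg.slogdet(2 * np.pi * Omega)
        total += -0.5 * (logdet + resid @ np.linalg.solve(Omega, resid))
    return total
```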
D. Maximizing The Measurement Likelihood

We now apply the idea of the previous step to the measurement data $z_{0:T}$. It differs in the basic assumption: here we do not have additional data $y_{1:T}$, but instead have to tune the EKF simply based on the measurements $z_{0:T}$ and the controls $u_{1:T}$. Recalling that the EKF model, for fixed $u_{1:t}$, gives a well-defined definition for the joint $p(x_{0:t}, z_{0:t} \mid u_{1:t})$, the marginal distribution $p(z_{0:t} \mid u_{1:t})$ is also well defined. Thus, our approach is simply to choose the parameters that maximize the likelihood of the observations in the training data:

$$\langle R_{\text{meas}}, Q_{\text{meas}} \rangle = \arg\max_{R,Q}\, \log p(z_{0:T} \mid u_{1:T}).$$

The value of the objective is easily computed by noting that, by the chain rule of probability,

$$p(z_{0:T} \mid u_{1:T}) = \prod_{t=0}^{T} p(z_t \mid z_{0:t-1}, u_{1:T}).$$

Moreover, each of the terms in the product is given by

$$p(z_t \mid z_{0:t-1}, u_{1:T}) = \int_{x_t} p(z_t \mid x_t, z_{0:t-1}, u_{1:T})\, p(x_t \mid z_{0:t-1}, u_{1:T})\, dx_t = \int_{x_t} p(z_t \mid x_t)\, p(x_t \mid z_{0:t-1}, u_{1:t})\, dx_t.$$

The term $p(z_t \mid x_t)$ is given by Eqn. (4), and $p(x_t \mid z_{0:t-1}, u_{1:t}) = N(x_t; \bar\mu_t, \bar\Sigma_t)$, where $\bar\mu_t$ and $\bar\Sigma_t$ are quantities computed by the EKF. Thus this approach also runs the EKF to evaluate its performance criterion. However, since no ground truth data is used here, the performance criterion is not predictive performance for the state sequence (which is what we ultimately care about), but merely predictive performance on the observations $z_{0:T}$.

E. Optimizing The Performance After Smoothing

The two discriminative criteria of Sections III-B and III-C evaluate the performance of the covariance matrices $\langle R, Q \rangle$ as used in the EKF. These criteria can easily be extended to the smoothing setting. (See, e.g., [8] for details on smoothing.) In particular, let $\tilde\mu_t$ be the state estimates as obtained from the smoother; then the smoother equivalent of Eqn. (7) is:

$$\langle R_{\text{res-sm}}, Q_{\text{res-sm}} \rangle = \arg\min_{R,Q} \sum_{t=0}^{T} \| y_t - h(\tilde\mu_t) \|^2.$$

The smoother likelihood objective is given by conditioning on all observations (instead of only up to time $t$, as in the filter case). So the smoother equivalent of Eqn. (8) is:

$$\langle R_{\text{pred-sm}}, Q_{\text{pred-sm}} \rangle = \arg\max_{R,Q} \sum_{t=0}^{T} \log p(y_t \mid z_{0:T}, u_{1:T}).$$

The smoother likelihood objective is closely related to the training criteria used for conditional random fields, which are widely used in machine learning to predict a sequence of labels (states) from all observations. (See, e.g., [6] and [4] for details.)

The two criteria proposed in this section optimize the covariance matrices $\langle R, Q \rangle$ for smoother performance, not filter performance. So we expect the resulting covariance matrices $\langle R, Q \rangle$ (although good for smoothing) not to be optimal for use in the filter. This is confirmed in our experiments.

F. Training

The previous text established a number of criteria for training covariance matrices; in fact, the criteria make it possible to also tune the functions $f$ and $g$, but we found this to be of lesser importance in our work. The training algorithm used in all our experiments is a coordinate ascent algorithm: given initial estimates of $R$ and $Q$, the algorithm repeatedly cycles through each of the entries of $R$ and $Q$. For each entry $p$, the objective is evaluated when decreasing and increasing the entry by $\Delta_p$ percent. If the change results in a better objective, the change is accepted and the parameter $\Delta_p$ is increased by ten percent; otherwise $\Delta_p$ is decreased by fifty percent. Initially we have $\Delta_p = 10$. We find empirically that this algorithm converges reliably within 20-50 iterations.
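This coordinate ascent procedure is straightforward to write down. A minimal sketch, assuming `objective(params)` evaluates one of the training criteria above for a vector of diagonal $R$ and $Q$ entries (phrased as maximization, so larger objective values are better):

```python
import numpy as np

def coordinate_ascent(objective, params, n_iters=50, init_step=10.0):
    """Section III-F: cycle over entries, scaling each by +/- Delta_p percent.

    An accepted change grows that entry's Delta_p by ten percent; a rejected
    one shrinks it by fifty percent.
    """
    params = np.asarray(params, dtype=float)
    steps = np.full(len(params), init_step)   # Delta_p, in percent
    best = objective(params)
    for _ in range(n_iters):
        for p in range(len(params)):
            improved = False
            for factor in (1.0 + steps[p] / 100.0, 1.0 - steps[p] / 100.0):
                trial = params.copy()
                trial[p] *= factor
                value = objective(trial)
                if value > best:              # keep the improving change
                    params, best = trial, value
                    improved = True
                    break
            steps[p] *= 1.10 if improved else 0.50
    return params
```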
IV. EXPERIMENTS

We carried out experiments on the robot shown in Figure 1. This is a differential drive robot designed for off-road navigation. For state estimation, it is instrumented with a low-cost GPS unit; a low-cost inertial measurement unit (IMU) consisting of 3 accelerometers for measuring linear accelerations and 3 gyroscopes for measuring rotational velocities; a magnetometer (magnetic compass); and optical wheel encoders (to measure forward velocity, assuming rigid contact with the ground). The GPS is WAAS enabled, and returns position estimates at 1 Hz with a typical position accuracy of about 3 meters.

These vehicles were built by Carnegie Mellon University for a competition in which each team obtains an identical copy of the vehicle, which can be used for software development. The software developed by each team will then be tested on a separate (but identical) vehicle at a Carnegie Mellon site. Since we have our own vehicle, we are able to install an accurate GPS unit onto it to get additional, more accurate, state estimates during development time. Specifically, we mounted onto our vehicle a Novatel RT2 differential GPS unit, which gives position estimates $y_t$ at 20 Hz to about 2 cm accuracy. While we could use the accurate GPS unit for development, the hardware on which our algorithms will be evaluated will not have the more accurate GPS.

The vehicle also comes with a carefully hand-tuned EKF. Since this pre-existing EKF was built by a highly experienced team of roboticists at Carnegie Mellon (not affiliated with the authors), we believe that it represents an approximate upper bound on the performance that can reasonably be expected in a system built by hand-tweaking parameters (without using ground truth data). We therefore evaluate our learning algorithms against this hand-designed EKF.

The state of the vehicle is represented as a five-dimensional vector, including its map coordinates $x_t$ and $y_t$, orientation $\theta_t$, forward velocity $v_t$ and heading gyro bias $b_t$ [2]. The measurement error of a gyroscope is commonly characterized as having a Gaussian random component, and an additive bias term that varies slowly over time. Neglecting to model the bias of the gyroscope will lead to correlated error in the robot's heading over time, which will result in poor estimation performance. More formally, our EKF's state update equations are given by:

$$x_t = x_{t-1} + \Delta t\, v_{t-1} \cos\theta_{t-1} + \varepsilon^{\mathrm{for}}_t \cos\theta_{t-1} - \varepsilon^{\mathrm{lat}}_t \sin\theta_{t-1},$$
$$y_t = y_{t-1} + \Delta t\, v_{t-1} \sin\theta_{t-1} + \varepsilon^{\mathrm{for}}_t \sin\theta_{t-1} + \varepsilon^{\mathrm{lat}}_t \cos\theta_{t-1},$$
$$\theta_t = \theta_{t-1} + \Delta t\, (r_t + b_t) + \varepsilon^{\theta}_t,$$
$$v_t = v_{t-1} + \Delta t\, a_t + \varepsilon^{v}_t,$$
$$b_t = b_{t-1} + \varepsilon^{b}_t.$$

Here $\varepsilon^{\mathrm{for}}_t$ and $\varepsilon^{\mathrm{lat}}_t$ are the position noise in the forward and lateral directions with respect to the vehicle. The control is $u_t = (r_t\ a_t)^\top$, where $r_t$ is the rotational velocity command and $a_t$ is the forward acceleration. The observation equations are given by:

$$\tilde{x}_t = x_t + \gamma^{x}_t, \quad \tilde{y}_t = y_t + \gamma^{y}_t, \quad \tilde{\theta}_t = \theta_t + \gamma^{\theta}_t, \quad \tilde{v}_t = v_t + \gamma^{v}_t.$$

In our model, $\varepsilon_t$ is a zero mean Gaussian noise variable with covariance $\mathrm{diag}(\sigma_{\mathrm{for}}, \sigma_{\mathrm{lat}}, \sigma_{\theta}, \sigma_{v}, \sigma_{b})$. Similarly, $\gamma_t$ is a zero mean Gaussian noise variable with covariance $\mathrm{diag}(\gamma_x, \gamma_y, \gamma_{\theta}, \gamma_v)$. In our experiments, the nine parameters $\sigma_{\mathrm{for}}, \sigma_{\mathrm{lat}}, \sigma_{\theta}, \sigma_{v}, \sigma_{b}, \gamma_x, \gamma_y, \gamma_{\theta}, \gamma_v$ were fit using the learning algorithms. Furthermore, our model assumed that $\gamma_x = \gamma_y$.
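For concreteness, here is the deterministic part of this state update as code: a sketch of the next-state function $f$ with the noise terms dropped, using the $\Delta t = 0.05$ s discretization reported for the experiments below.

```python
import numpy as np

def f(state, control, dt=0.05):
    """Noiseless next-state for the five-dimensional rover model above.

    state = (x, y, theta, v, b): position, heading, forward velocity, and
    heading-gyro bias; control = (r, a): rotational velocity command and
    forward acceleration. Noise terms epsilon are omitted.
    """
    x, y, theta, v, b = state
    r, a = control
    return np.array([
        x + dt * v * np.cos(theta),
        y + dt * v * np.sin(theta),
        theta + dt * (r + b),
        v + dt * a,
        b,
    ])
```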
Our experimental protocol was as follows. We collected two sets of data (100 s each) of driving the vehicle around a grass field, and used one for training and the other for testing. Because the observations $y_t$ do not contain the complete state (but only position coordinates), the "naive approach" of maximizing the joint likelihood is not directly applicable. However, the highly accurate position estimates allow us to extract reasonably accurate estimates of the other state variables.³ Using these state estimates as a substitute for the real states, we estimate the covariances $\langle R_{\text{joint}}, Q_{\text{joint}} \rangle$ using the joint likelihood criterion. The estimates $\langle R_{\text{joint}}, Q_{\text{joint}} \rangle$ are used for initialization when using the other criteria (which do not have closed form solutions).

³More specifically, we ran an extended Kalman smoother to obtain estimates for $\theta_t, v_t, b_t$. This smoother used very high variances for the measured $\tilde\theta_t, \tilde v_t$, and very small variances for the position measurements $\tilde x_t, \tilde y_t$. The smoother also assumed very high process noise variances, except for the gyro bias term. This choice of variances ensures the smoother extracts state estimates that are consistent with the highly accurately measured position coordinates. The results from the smoother were not very sensitive to the exact choice of the variances. In the reported experiments, we used $\mathrm{diag}(1, 1, 1, 1, .001^2)$ for the process noise and $\mathrm{diag}(.02^2, .02^2, 10^2, 10^2)$ for the measurement noise.

We evaluate our algorithms on test data using two error metrics. The first is the RMS error in the estimate of the vehicle's position (cf. Eqn. (7)):

$$\left( \frac{1}{T} \sum_{t=1}^{T} \| h(\mu_t) - y_t \|^2 \right)^{1/2}.$$

Above, $\mu_t$ is the EKF estimate of the full state at time $t$, and $h(\mu_t)$ is the EKF estimate of the 2D coordinates of the vehicle at time $t$. The second error metric is the prediction log-loss

$$-\frac{1}{T} \sum_{t=1}^{T} \log p(y_t \mid z_{0:t}, u_{1:t}).$$

Following the discussion in Section III-C, the main difference between these two metrics is in whether it demands that the EKF gives accurate covariance estimates.
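Both test metrics are one-liners given the filter outputs. A minimal sketch, assuming `mus` are the filter means on test data, `H` is the projection onto the position coordinates, and the per-step log-likelihoods come from a routine like the one sketched after Section III-C:

```python
import numpy as np

def rms_position_error(mus, y, H):
    """First metric: ( (1/T) sum_t || h(mu_t) - y_t ||^2 )^{1/2}."""
    errs = [np.sum((H @ mu - yt) ** 2) for mu, yt in zip(mus, y)]
    return float(np.sqrt(np.mean(errs)))

def prediction_log_loss(per_step_log_liks):
    """Second metric: -(1/T) sum_t log p(y_t | z_{0:t}, u_{1:t}); smaller is better."""
    return -float(np.mean(per_step_log_liks))
```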
The highly accurate GPS outputs position measurements at 20 Hz. This is also the frequency at which the built-in hand-tuned filter outputs its state estimates. We use the corresponding time discretization $\Delta t = .05$ s for our filter. Each of our learning algorithms took about 20-30 minutes to converge. Our results are as follows (smaller values are better):⁴

Learning Algorithm   RMS error   log-loss
Joint                0.2866      23.5834
Res                  0.2704       1.0647
Pred                 0.2940      -0.1671
Meas                 0.2943      60.2660
Res-sm               0.3229       2.9895
Pred-sm              0.5831       0.4793
CMU hand-tuned       0.3901       0.7500

⁴All results reported are averaged over two trials, in which half of the data is used for training, and the other half for testing.

In this table, "Res" stands for the algorithm minimizing the residual prediction error ($\langle R_{\text{res}}, Q_{\text{res}} \rangle$), etc. As expected, the filters learned using the smoother criteria of Section III-E (Res-sm, Pred-sm) are outperformed by the filters learned using the corresponding filter criteria (Res, Pred). So from here on, we will not consider the filters learned using the smoother criteria.

We see that the hand-tuned EKF had an RMS error of about 40 cm in estimating the position of the vehicle, and that all of our learned filters obtain significantly better performance. Using the parameters learned by maximizing the prediction likelihood ($\langle R_{\text{pred}}, Q_{\text{pred}} \rangle$), we also obtain better log-loss (negative log likelihood). Minimizing the residual prediction error on the training data results in the smallest residual error on the test data. Similarly, minimizing the log-loss (or, equivalently, maximizing the prediction likelihood) on the training data results in the smallest log-loss on the test data. Thus, discriminative training allows us to successfully optimize for the criteria we care about.

We also notice that, although the filters trained by joint likelihood maximization and measurement likelihood maximization have small RMS error, they perform poorly on the log-loss criterion. This can be explained by correlated noise. More specifically, correlated noise causes the model trained by maximizing the joint likelihood to be overconfident about its state estimates, which results in the larger log-loss. The effect of correlated noise on the model trained by maximizing the measurement likelihood is even more significant. The model learns very small measurement variances, which allow it to track the measurements more closely. Unfortunately, in the case of correlated noise, tracking the measurements more closely does not mean tracking the state of the system more closely. The small measurement variances result in significant overconfidence in the state estimates, which causes the log-loss to be significantly higher.

Figure 2 shows a typical trajectory taken by the vehicle, as well as the estimates from two of the learned filters and the CMU hand-tuned filter. It is visually fairly clear from the figure that the learned filters are more accurately tracking the ground truth trajectory than the hand-tuned filter. Figure 3 shows a close-up of part of Figure 2. To reduce clutter, we have plotted only the output of two of the six learned filters here; however, all learned filters have outputs that look visually very similar. One exception is that, as explained in the previous paragraph, filters learned by maximizing the measurement likelihood tend to follow the (often noisy) GPS measurements more closely.

Fig. 2. Typical state estimation results. Plot shows ground truth trajectory (black solid line); on-board (inexpensive) GPS measurements (black triangles); estimated state using the filter learned by minimizing residual prediction error (blue dash-dotted line); estimated state using the filter learned by maximizing the prediction likelihood (green dashed line); and estimated state using the CMU hand-tuned filter (red dotted line). (Colors where available.)

Fig. 3. Close-up of part of Figure 2. (Same legend as previous figure.)

V. CONCLUSION

We have presented a highly related family of algorithms for training the noise parameters of an EKF. All algorithms follow the same idea: adjust the covariances of the EKF in a way that maximizes their predictive accuracy. Experimental results show that this training routine has two major advantages over previous best practice. First, it eliminates the need for a hand-tuning phase, thereby making it easier to develop working EKFs. Second, we find that the learned EKF is more accurate than even a well-tuned EKF constructed by hand. Among the learned filters, the best results were obtained by using discriminative training, which evaluates candidate covariances by evaluating the predictive performance of the EKF when using these covariances.

In our experiments we compare to a commercial EKF, provided as part of a robot developed for a major DARPA program. Clearly, it is difficult to assess how much tuning went into this EKF, and whether it is actually as good as can be done through manual tweaking. However, the EKF is central to the application of this robotic system, and we expect the development team spent at least a few days developing it. Our approach outperforms this EKF by a large margin, based on a few minutes of data and a few minutes of learning. This suggests that our approach may yield better results with less development time.

We note that our training approach is also applicable to broader problems of EKF training. In particular, we chose not to learn the physical model as expressed in the state transition and measurement functions. However, given an appropriate parameterization of these functions, it appears to be feasible to tune those functions as well. It remains an open question to which extent over-fitting poses a problem when doing so in practice.

The holistic training algorithms presented in this paper are highly related to an ongoing debate in the field of machine learning on using discriminative vs. generative algorithms for supervised learning. There, the consensus (assuming there is ample training data) seems to be that it is usually better to directly minimize the loss with respect to the ultimate performance measure, rather than an intermediate loss function such as the likelihood of the training data; see, e.g., [14], [9], [1]. This is because the model, no matter how complicated, is almost always not completely "correct" for the problem data. By analogy, when choosing the noise parameters for an EKF, we are interested in choosing parameters that lead to the EKF outputting accurate state estimates, rather than necessarily choosing the noise parameters that most correctly reflect each measurement's true variance (such as would be obtained from the maximum likelihood estimate or from most manufacturer specs, as discussed above).
ACKNOWLEDGMENTS

We give warm thanks to Andrew Lookingbill and David Lieb for collecting the data from the LAGR robot platform. This work was supported by the DARPA LAGR program under contract number FA8650-04-C-7134.

REFERENCES

[1] Pieter Abbeel and Andrew Y. Ng. Learning first order Markov models for control. In NIPS 17, 2005.
[2] J. A. Farrell and M. Barth. The Global Positioning System and Inertial Navigation. McGraw Hill, 1998.
[3] Arthur Gelb, editor. Applied Optimal Estimation. MIT Press, 1974.
[4] Sham Kakade, Yee Whye Teh, and Sam Roweis. An alternative objective function for Markovian fields. In Proc. ICML, 2002.
[5] Rudolph E. Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering, 82(Series D):35-45, 1960.
[6] John Lafferty, Andrew McCallum, and Fernando Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proc. ICML, 2001.
[7] Peter S. Maybeck. Stochastic Models, Estimation, and Control, volume 1. Academic Press, 1982.
[8] Peter S. Maybeck. Stochastic Models, Estimation, and Control, volume 2. Academic Press, 1982.
[9] A. Y. Ng and M. I. Jordan. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In NIPS 14, 2002.
[10] A. Y. Ng, A. Coates, M. Diel, V. Ganapathi, J. Schulte, B. Tse, E. Berger, and E. Liang. Autonomous inverted helicopter flight via reinforcement learning. In Proceedings of the International Symposium on Experimental Robotics (ISER), Singapore, 2004. Springer Tracts in Advanced Robotics (STAR).
[11] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian Flannery. Numerical Recipes in C. Cambridge University Press, 1992.
[12] Sam Roweis and Zoubin Ghahramani. A unifying review of linear Gaussian models. Technical report, University of Toronto, 6 King's College Road, Toronto M5S 3H5, Canada, 1997.
[13] S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics. MIT Press, Cambridge, MA, 2005.
[14] V. N. Vapnik. Statistical Learning Theory. John Wiley & Sons, 1998.
[15] B. Zadrozny and C. Elkan. Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In Proc. ICML, 2001.