Journal of Machine Learning Research 10 (2009) 1755-1758
Submitted 10/08; Revised 4/09; Published 7/09

Dlib-ml: A Machine Learning Toolkit

Davis E. King (DAVISKING@USERS.SOURCEFORGE.NET)
Northrop Grumman ES, ATR and Image Exploitation Group
Baltimore, Maryland, USA

Editor: Soeren Sonnenburg

Abstract

There are many excellent toolkits which provide support for developing machine learning software in Python, R, Matlab, and similar environments. Dlib-ml is an open source library, targeted at both engineers and research scientists, which aims to provide a similarly rich environment for developing machine learning software in the C++ language. Towards this end, dlib-ml contains an extensible linear algebra toolkit with built in BLAS support. It also houses implementations of algorithms for performing inference in Bayesian networks and kernel-based methods for classification, regression, clustering, anomaly detection, and feature ranking. To enable easy use of these tools, the entire library has been developed with contract programming, which provides complete and precise documentation as well as powerful debugging tools.

Keywords: kernel-methods, svm, rvm, kernel clustering, C++, Bayesian networks

1. Introduction

Dlib-ml is a cross platform open source software library written in the C++ programming language. Its design is heavily influenced by ideas from design by contract and component-based software engineering. This means it is first and foremost a collection of independent software components, each accompanied by extensive documentation and thorough debugging modes. Moreover, the library is intended to be useful in both research and real world commercial projects and has been carefully designed to make it easy to integrate into a user's C++ application.

There are a number of well known machine learning libraries. However, many of these libraries focus on providing a good environment for doing research using languages other than C++. Two examples of this kind of project are the Shogun (Sonnenburg et al., 2006) and Torch (Collobert and Bengio, 2001) toolkits which, while they are implemented in C++, are not focused on providing support for developing machine learning software in that language. Instead they are primarily intended to be used with languages like R, Python, Matlab, or Lua. Then there are toolkits such as Shark (Igel et al., 2008) and dlib-ml which are explicitly targeted at users who wish to develop software in C++. Given these considerations, dlib-ml attempts to help fill some of the gaps in tool support not already filled by libraries such as Shark. It is hoped that these efforts will prove useful for researchers and engineers who wish to develop machine learning software in this language.

© 2009 Davis E. King.

Figure 1: Elements of dlib-ml. Arrows show dependencies between components.

2. Elements of the Library

The library is composed of the four distinct components shown in Figure 1. The linear algebra component provides a set of core functionality while the other three implement various useful tools. This paper addresses the two main components, linear algebra and machine learning tools.

2.1 Linear Algebra

The design of the linear algebra component of the library is based on the template expression techniques popularized by Veldhuizen and Ponnambalam (1996) in the Blitz++ numerical software. This technique allows an author to write simple Matlab-like expressions that, when compiled, execute with speed comparable to hand-optimized C code. The dlib-ml implementation extends this original design in a number of ways. Most notably, the library can use the BLAS when available, meaning that the performance of code developed using dlib-ml can gain the speed of highly optimized libraries such as ATLAS or the Intel MKL while still using a very simple syntax.

Consider the following example involving matrix multiplies, transposes, and scalar multiplications:

(1) result = 3*trans(A*B + trans(A)*2*B);
(2) result = 3*trans(B)*trans(A) + 6*trans(B)*A;

The result of expression (1) could be computed using only two calls to the matrix multiply routine in BLAS but first it is necessary to reorder the terms into form (2) to fit the form expected by the BLAS routines. Performing these transformations by hand is tedious and error prone. Dlib-ml automatically performs these transformations on all expressions and invokes the appropriate BLAS calls. This enables the user to write equations in the form most intuitive to them and leave these details of software optimization to the library. This is a feature not found in the supporting tools of other C++ machine learning libraries.

2.2 Machine Learning Tools

A major design goal of this portion
of the library is to provide a highly modular and simple architecture for dealing with kernel algorithms. In particular, each algorithm is parameterized to allow a user to supply either one of the predefined dlib-ml kernels, or a new user-defined kernel. Moreover, the implementations of the algorithms are totally separated from the data on which they operate. This makes the dlib-ml implementation generic enough to operate on any kind of data, be it column vectors, images, or some other form of structured data. All that is necessary is an appropriate kernel. This is a feature unique to dlib-ml. Many libraries allow arbitrary precomputed kernels and some even allow user-defined kernels but have interfaces which restrict them to operating on column vectors. However, none allow the flexibility to operate directly on arbitrary objects, making it much easier to apply custom kernels in the case where the kernels operate on objects other than fixed length vectors.

The library provides implementations of popular algorithms such as RBF networks and support vector machines for classification. It also includes algorithms not present in other major ML toolkits such as relevance vector machines for classification and regression (Tipping and Faul, 2003). All of these algorithms are implemented as generic trainer objects with a standard interface. This design allows trainer objects to be used by a number of generic meta-algorithms that do tasks such as performing cross validation, reducing the number of output support vectors (Suttorp and Igel, 2007), or fitting a sigmoid to the output decision function to make decisions interpretable in probabilistic terms (Platt, 1999). This generic trainer interface, along with the contract programming approach, makes the library easily extensible by other developers.

Another good example of a generic kernel algorithm provided by the library is the kernel RLS technique introduced by Engel et al. (2004). It is a kernelized version of the famous recursive least squares filter, and functions as an excellent online regression method. With it, Engel introduced a simple but very effective technique for producing sparse outputs from kernel learning algorithms. Engel's sparsification technique is also used by one of dlib-ml's most versatile tools, the kcentroid object. It is a general utility for representing a weighted sum of sample points in a kernel induced feature space. It can be used to easily kernelize any algorithm that requires only the ability to perform vector addition, subtraction, scalar multiplication, and inner products.

The kcentroid object enables the library to provide a number of useful kernel-based machine learning algorithms. The most straightforward of which is online anomaly detection, which simply marks data samples as novel if their distance from the centroid of a previously observed body of data is large (e.g., 3 standard deviations from the mean distance). A similarly simple but still powerful application is in feature ranking, where features are considered good if their inclusion results in a large distance between the centroids of different classes of data.

Another straightforward application of this technique is in kernelized cluster analysis. Using the kcentroid it is easy to create sparse kernel clustering algorithms. To demonstrate this, the library comes with a sparse kernel k-means algorithm.

Finally, dlib-ml contains two SVM solvers. One is essentially a reimplementation of LIBSVM (Chang and Lin, 2001) but with the generic parameterized kernel approach used in the rest of the library. This solver has roughly the same CPU and memory utilization characteristics as LIBSVM. The other SVM solver is a kernelized version of the Pegasos algorithm introduced by Shalev-Shwartz et al. (2007). It is built using the kcentroid and thus produces sparse outputs.

3. Availability and Requirements

The library is released under the Boost Software License, allowing it to be incorporated into both open-source and commercial software. It requires no additional libraries, does not need to be configured or installed, and is frequently tested on MS Windows, Linux and Mac OS X but should work with any ISO C++ compliant compiler.
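
The kernel centroid and anomaly detection ideas from Section 2.2 need nothing beyond standard C++, so they can be illustrated with a short standalone sketch. The code below is not dlib-ml's kcentroid and does not use the dlib API: the names simple_kernel_centroid, linear_kernel, and length_rbf_kernel are hypothetical, every sample is stored (Engel's sparsification is omitted), and the centroid is taken to be the unweighted mean of the samples in feature space. It only shows how a class parameterized on both the sample type and the kernel can measure distances for arbitrary objects, here strings as well as numbers.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical illustration, NOT dlib's kcentroid: no sparsification, and the
// centroid is the plain mean of phi(x_i) in the kernel-induced feature space.
template <typename Sample, typename Kernel>
class simple_kernel_centroid {
public:
    explicit simple_kernel_centroid(Kernel k) : kernel_(k) {}

    // Add a sample and update sum_{i,j} k(x_i, x_j) incrementally.
    void train(const Sample& x) {
        double cross = 0.0;
        for (const Sample& s : samples_) cross += kernel_(s, x);
        pair_sum_ += 2.0 * cross + kernel_(x, x);
        samples_.push_back(x);
    }

    // ||phi(x) - c||, using
    // ||phi(x) - c||^2 = k(x,x) - (2/n) sum_i k(x,x_i) + (1/n^2) sum_{i,j} k(x_i,x_j).
    double distance(const Sample& x) const {
        if (samples_.empty()) return std::sqrt(kernel_(x, x));
        const double n = static_cast<double>(samples_.size());
        double cross = 0.0;
        for (const Sample& s : samples_) cross += kernel_(x, s);
        const double d2 = kernel_(x, x) - 2.0 * cross / n + pair_sum_ / (n * n);
        return std::sqrt(d2 > 0.0 ? d2 : 0.0);  // clamp rounding noise
    }

    // Flags x as novel when its distance exceeds the mean distance of the
    // reference samples by more than num_std standard deviations.
    bool is_novel(const std::vector<Sample>& reference, const Sample& x,
                  double num_std = 3.0) const {
        assert(!reference.empty());  // precondition, in the spirit of contracts
        double sum = 0.0, sum_sq = 0.0;
        for (const Sample& r : reference) {
            const double d = distance(r);
            sum += d;
            sum_sq += d * d;
        }
        const double n = static_cast<double>(reference.size());
        const double mean = sum / n;
        const double var = sum_sq / n - mean * mean;
        const double stddev = std::sqrt(var > 0.0 ? var : 0.0);
        return distance(x) > mean + num_std * stddev;
    }

private:
    Kernel kernel_;
    std::vector<Sample> samples_;
    double pair_sum_ = 0.0;
};

// Linear kernel on scalars: the feature space is the real line itself.
struct linear_kernel {
    double operator()(double a, double b) const { return a * b; }
};

// User-defined kernel on strings: an RBF kernel applied to string length.
struct length_rbf_kernel {
    double gamma;
    double operator()(const std::string& a, const std::string& b) const {
        const double d = static_cast<double>(a.size()) -
                         static_cast<double>(b.size());
        return std::exp(-gamma * d * d);
    }
};
```

With the linear kernel the feature-space centroid coincides with the ordinary mean, so after training on 1, 2, and 3 the distance to 5 is |5 - 2| = 3, which makes the class easy to sanity-check. Swapping in length_rbf_kernel changes the sample type to std::string without touching the centroid code, which is the separation of algorithm and data that the section describes.
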

Note that dlib-ml is a subset of a larger project named dlib hosted at http://dclib.sourceforge.net. Dlib is a general purpose software development library containing a graphical application for creating Bayesian networks as well as tools for handling threads, network I/O, and numerous other tasks. Dlib-ml is available from the dlib project's download page on SourceForge.

References

Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A Library for Support Vector Machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Ronan Collobert and Samy Bengio. Svmtorch: support vector machines for large-scale regression problems. J. Mach. Learn. Res., 1:143-160, 2001. ISSN 1533-7928.

Yaakov Engel, Shie Mannor, and Ron Meir. Kernel recursive least squares. IEEE Transactions on Signal Processing, 52:2275-2285, 2004.

Christian Igel, Tobias Glasmachers, and Verena Heidrich-Meisner. Shark. Journal of Machine Learning Research, 9:993-996, 2008.

John C. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances in Large Margin Classifiers, pages 61-74. MIT Press, 1999.

Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal estimated sub-gradient solver for svm. In ICML '07, pages 807-814, New York, NY, USA, 2007. ACM.

Sören Sonnenburg, Gunnar Rätsch, Christin Schäfer, and Bernhard Schölkopf. Large scale multiple kernel learning. J. Mach. Learn. Res., 7:1531-1565, 2006. ISSN 1533-7928.

Thorsten Suttorp and Christian Igel. Resilient approximation of kernel classifiers. Volume 4668 of Lecture Notes in Computer Science, pages 139-148. Springer, 2007.

Michael E. Tipping and Anita C. Faul. Fast marginal likelihood maximisation for sparse Bayesian models. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, pages 3-6, 2003.

Todd Veldhuizen and Kumaraswamy Ponnambalam. Linear algebra with C++ template metaprograms. Dr. Dobb's Journal of Software Tools, 21(8):38-44, 1996.