Journal of Machine Learning Research 10 (2009) 1755-1758    Submitted 10/08; Revised 4/09; Published 7/09

Dlib-ml: A Machine Learning Toolkit

Davis E. King    DAVISKING@USERS.SOURCEFORGE.NET
Northrop Grumman ES, ATR and Image Exploitation Group
Baltimore, Maryland, USA

Editor: Soeren Sonnenburg

Abstract

There are many excellent toolkits which provide support for developing machine learning software in Python, R, Matlab, and similar environments. Dlib-ml is an open source library, targeted at both engineers and research scientists, which aims to provide a similarly rich environment for developing machine learning software in the C++ language. Towards this end, dlib-ml contains an extensible linear algebra toolkit with built-in BLAS support. It also houses implementations of algorithms for performing inference in Bayesian networks and kernel-based methods for classification, regression, clustering, anomaly detection, and feature ranking. To enable easy use of these tools, the entire library has been developed with contract programming, which provides complete and precise documentation as well as powerful debugging tools.

Keywords: kernel-methods, svm, rvm, kernel clustering, C++, Bayesian networks

1. Introduction

Dlib-ml is a cross-platform open source software library written in the C++ programming language. Its design is heavily influenced by ideas from design by contract and component-based software engineering. This means it is first and foremost a collection of independent software components, each accompanied by extensive documentation and thorough debugging modes. Moreover, the library is intended to be useful in both research and real-world commercial projects and has been carefully designed to make it easy to integrate into a user's C++ application.

There are a number of well-known machine learning libraries. However, many of these libraries focus on providing a good environment for doing research using languages other than C++. Two examples of this kind of project are the Shogun (Sonnenburg et al., 2006) and Torch (Collobert and Bengio, 2001) toolkits which, while they are implemented in C++, are not focused on providing support for developing machine learning software in that language. Instead they are primarily intended to be used with languages like R, Python, Matlab, or Lua. Then there are toolkits such as Shark (Igel et al., 2008) and dlib-ml which are explicitly targeted at users who wish to develop software in C++. Given these considerations, dlib-ml attempts to help fill some of the gaps in tool support not already filled by libraries such as Shark. It is hoped that these efforts will prove useful for researchers and engineers who wish to develop machine learning software in this language.

© 2009 Davis E. King.

[Figure 1: Elements of dlib-ml. Arrows show dependencies between components.]

2. Elements of the Library

The library is composed of the four distinct components shown in Figure 1. The linear algebra component provides a set of core functionality while the other three implement various useful tools. This paper addresses the two main components, linear algebra and machine learning tools.

2.1 Linear Algebra

The design of the linear algebra component of the library is based on the expression template techniques popularized by Veldhuizen and Ponnambalam (1996) in the Blitz++ numerical software. This technique allows an author to write simple Matlab-like expressions that, when compiled, execute with speed comparable to hand-optimized C code. The dlib-ml implementation extends this original design in a number of ways. Most notably, the library can use the BLAS when available, meaning that code developed using dlib-ml can gain the speed of highly optimized libraries such as ATLAS or the Intel MKL while still using a very simple syntax. Consider the following example involving matrix multiplies, transposes, and scalar multiplications:

    (1) result = 3*trans(A*B + trans(A)*2*B);
    (2) result = 3*trans(B)*trans(A) + 6*trans(B)*A;

The result of expression (1) could be computed using only two calls to the matrix multiply routine in BLAS, but first it is necessary to reorder the terms into form (2) to fit the form expected by the BLAS routines. Performing these transformations by hand is tedious and error-prone. Dlib-ml automatically performs these transformations on all expressions and invokes the appropriate BLAS calls. This enables the user to write equations in the form most intuitive to them and leave these details of software optimization to the library. This is a feature not found in the supporting tools of other C++ machine learning libraries.

2.2 Machine Learning Tools

A major design goal of this portion of the library is to provide a highly modular and simple architecture for dealing with kernel algorithms. In particular, each algorithm is parameterized to allow a user to supply either one of the predefined dlib-ml kernels or a new user-defined kernel. Moreover, the implementations of the algorithms are totally separated from the data on which they operate. This makes the dlib-ml implementation generic enough to operate on any kind of data, be it column vectors, images, or some other form of structured data. All that is necessary is an appropriate kernel.

This is a feature unique to dlib-ml. Many libraries allow arbitrary precomputed kernels, and some even allow user-defined kernels, but their interfaces restrict them to operating on column vectors. None allows the flexibility to operate directly on arbitrary objects, which makes it much easier to apply custom kernels in the case where the kernels operate on objects other than fixed-length vectors.

The library provides implementations of popular algorithms such as RBF networks and support vector machines for classification. It also includes algorithms not present in other major ML toolkits, such as relevance vector machines for classification and regression (Tipping and Faul, 2003). All of these algorithms are implemented as generic trainer objects with a standard interface. This design allows trainer objects to be used by a number of generic meta-algorithms that do tasks such as performing cross validation, reducing the number of output support vectors (Suttorp and Igel, 2007), or fitting a sigmoid to the output decision function to make decisions interpretable in probabilistic terms (Platt, 1999). This generic trainer interface, along with the contract programming approach, makes the library easily extensible by other developers.

Another good example of a generic kernel algorithm provided by the library is the kernel RLS technique introduced by Engel et al. (2004). It is a kernelized version of the famous recursive least squares filter, and functions as an excellent online regression method. With it, Engel et al. introduced a simple but very effective technique for producing sparse outputs from kernel learning algorithms.

This sparsification technique is also used by one of dlib-ml's most versatile tools, the kcentroid object. It is a general utility for representing a weighted sum of sample points in a kernel-induced feature space. It can be used to easily kernelize any algorithm that requires only the ability to perform vector addition, subtraction, scalar multiplication, and inner products.

The kcentroid object enables the library to provide a number of useful kernel-based machine learning algorithms. The most straightforward of these is online anomaly detection, which simply marks data samples as novel if their distance from the centroid of a previously observed body of data is large (e.g., more than 3 standard deviations above the mean distance). A similarly simple but still powerful application is feature ranking, where features are considered good if their inclusion results in a large distance between the centroids of different classes of data. Another straightforward application of this technique is kernelized cluster analysis. Using the kcentroid it is easy to create sparse kernel clustering algorithms. To demonstrate this, the library comes with a sparse kernel k-means algorithm.

Finally, dlib-ml contains two SVM solvers. One is essentially a reimplementation of LIBSVM (Chang and Lin, 2001) but with the generic parameterized kernel approach used in the rest of the library. This solver has roughly the same CPU and memory utilization characteristics as LIBSVM. The other SVM solver is a kernelized version of the Pegasos algorithm introduced by Shalev-Shwartz et al. (2007). It is built using the kcentroid and thus produces sparse outputs.

3. Availability and Requirements

The library is released under the Boost Software License, allowing it to be incorporated into both open-source and commercial software. It requires no additional libraries, does not need to be configured or installed, and is frequently tested on MS Windows, Linux, and Mac OS X, but should work with any ISO C++ compliant compiler.
Note that dlib-ml is a subset of a larger project named dlib hosted at http://dclib.sourceforge.net. Dlib is a general purpose software development library containing a graphical application for creating Bayesian networks as well as tools for handling threads, network I/O, and numerous other tasks. Dlib-ml is available from the dlib project's download page on SourceForge.

References

Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A Library for Support Vector Machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Ronan Collobert and Samy Bengio. SVMTorch: Support vector machines for large-scale regression problems. J. Mach. Learn. Res., 1:143–160, 2001. ISSN 1533-7928.

Yaakov Engel, Shie Mannor, and Ron Meir. Kernel recursive least squares. IEEE Transactions on Signal Processing, 52:2275–2285, 2004.

Christian Igel, Tobias Glasmachers, and Verena Heidrich-Meisner. Shark. Journal of Machine Learning Research, 9:993–996, 2008.

John C. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances in Large Margin Classifiers, pages 61–74. MIT Press, 1999.

Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In ICML '07, pages 807–814, New York, NY, USA, 2007. ACM.

Sören Sonnenburg, Gunnar Rätsch, Christin Schäfer, and Bernhard Schölkopf. Large scale multiple kernel learning. J. Mach. Learn. Res., 7:1531–1565, 2006. ISSN 1533-7928.

Thorsten Suttorp and Christian Igel. Resilient approximation of kernel classifiers. Volume 4668 of Lecture Notes in Computer Science, pages 139–148. Springer, 2007.

Michael E. Tipping and Anita C. Faul. Fast marginal likelihood maximisation for sparse Bayesian models. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, pages 3–6, 2003.

Todd Veldhuizen and Kumaraswamy Ponnambalam. Linear algebra with C++ template metaprograms. Dr. Dobb's Journal of Software Tools, 21(8):38–44, 1996.