Friedrich Leisch (Friedrich.Leisch@R-project.org), September 14, 2009

This is a reprint of an article that has appeared in: Paula Brito (editor), Compstat 2008: Proceedings in Computational Statistics. Physica Verlag, Heidelberg, Germany, 2008.

Abstract: This tutorial gives a practical introduc…
… last but not least, the packaging system has tools for software validation which check that documentation exists and is technically in sync with the code, spot common errors, and check that examples actually run.

Before we start creating R packages we have to clarify some terms which sometimes get confused:

Package: An extension of the R base system with code, data and documentation in standardized format.
Library: A directory containing installed packages.
Repository: A website providing packages for installation.
Source: The original version of a package with human-readable text and code.
Binary: A compiled version of a package with computer-readable text and code; it may work only on a specific platform.
Base packages: Part of the R source tree, maintained by R Core.
Recommended packages: Part of every R installation, but not necessarily maintained by R Core.
Contributed packages: All the rest. This does not mean that these packages are necessarily of lesser quality than the above, e.g., many contributed packages on CRAN are written and maintained by R Core members. We simply try to keep the base distribution as lean as possible.

See R Development Core Team (2008a) for the full manual on R installation issues, and Ligges (2003) for a short introduction.

The remainder of this tutorial is organized as follows: As an example we implement a function for linear regression models. We first define classes and methods for the statistical model, including a formula interface for model specification. We then structure the pieces into the form of an R package, show how to document the code properly, and finally discuss the tools for package validation and distribution. This tutorial is meant as a starting point on how to create an R package; see R Development Core Team (2008b) for the full reference manual. Note that R implements a dialect of the S programming language (Becker et al 1988). In the following we will primarily use the name "S" when we speak of the language, and "R" when the complete R software environment is meant, or extensions of the S language which can only be found in R.

2 R code for linear regression

As a running example in this tutorial we will develop R code for the standard linear regression model

$$y = x'\beta + \epsilon, \qquad \epsilon \sim N(0, \sigma^2)$$

Our goal is not to implement all the bells and whistles of the standard R function lm() for the problem, but to write a simple function which computes the OLS estimate and has a "professional look and feel" in the sense that the interface is similar to the interface of lm().

If we are given a design matrix X and response vector y then the OLS estimate is of course

$$\hat\beta = (X'X)^{-1} X'y$$

with covariance matrix

$$\operatorname{var}(\hat\beta) = \sigma^2 (X'X)^{-1}$$

For numerical reasons it is not advisable to compute $\hat\beta$ using the above formula; it is better to use, e.g., a QR decomposition or any other numerically good way to solve a linear system of equations (e.g., Gentle 1998). Hence, a minimal R function for linear regression is …

The numerical estimates are exactly the same, but our code lacks a convenient user interface:

1. Prettier formatting of results.
2. Add utilities for the fitted model, like a summary() function to test for significance of parameters.
3. Handle categorical predictors.
4. Use formulas for model specification.

Object oriented programming helps us with issues 1 and 2, formulas with 3 and 4.

3 Object oriented programming in R

3.1 S3 and S4

Our function linmodEst returns a list with four named elements: the parameter estimates and their covariance matrix, and the standard deviation and degrees of freedom of the residuals. From the context it is clear to us that this is a linear model fit; however, nobody has told the computer so far. For the computer this is simply a list containing a vector, a matrix and two scalar values. Many programming languages, including S, use so-called

1. classes to define what objects of a certain type look like, and
2. methods to define special functions operating on objects of a certain class.

A class defines how an object is represented in the program, while an object is an instance of the class that exists at run time. In our case we will shortly define a class for linear model fits. The class is the abstract definition, while every time we actually use it to store the results for a given data set, we create an object of the class.

Once the classes are defined we probably want to perform some computations on objects. In most cases we do not care how the object is stored internally; the computer should decide how to perform the tasks. The S way of reaching this goal is to use generic functions and method dispatch: the same function performs different computations depending on the classes of its arguments.

S is rare because it is both interactive and has a system for object orientation. Designing classes clearly is programming, yet to make S useful as an interactive
data analysis environment, it makes sense that it is a functional language. In "real" object-oriented programming (OOP) languages like C++ or Java, class and method definitions are tightly bound together: methods are part of classes (and hence objects). We want incremental and interactive additions like user-defined methods for pre-defined classes. These additions can be made at any point in time, even on the fly at the command line prompt while we analyze a data set. S tries to make a compromise between object orientation and interactive use, and although compromises are never optimal with respect to all goals they try to reach, they often work surprisingly well in practice.

The S language has two object systems, known informally as S3 and S4.

S3 objects, classes and methods have been available in R from the beginning; they are informal, yet "very interactive". S3 was first described in the "White Book" (Chambers & Hastie 1992).

S4 objects, classes and methods are much more formal and rigorous, hence "less interactive". S4 was first described in the "Green Book" (Chambers 1998). In R it is available through the methods package, attached by default since version 1.7.0.

S4 provides formal object oriented programming within an interactive environment. It can help a lot to write clean and consistent code, checks automatically whether objects conform to class definitions, and has many more features than S3. S3, in turn, is more a set of naming conventions than a true OOP system, but it is sufficient for most purposes (take almost all of R as proof). Because S3 is much easier to learn than S4 and sufficient for our purposes, we will use it throughout this tutorial.

… e.g., summary.factor() or print.myvector(). If no bar method is found, S3 searches for foo.default(). Inheritance can be emulated by using a class vector.

Let us return to our new class called "myvector". To define a new print() method for our class, all we have to do is define a function called print.myvector():

    print.myvector <- function(x, ...){
      cat("This is my vector:\n")
      cat(paste(x[1:5]), "...\n")
    }

If we now have a look at x and myx, they are printed differently:

    R> x
     [1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    R> myx
    This is my vector:
    0 0 0 0 0 ...

So we see that S3 is highly interactive: one can create new classes and change method definitions on the fly, and it is easy to learn. Because everything is just solved by naming conventions (which are not checked by R at runtime¹), it is easy to break the rules. A simple example: Function lm() returns objects of class "lm". Methods for that class of course expect a certain format for the objects, e.g., that they contain regression coefficients etc., as the following simple example shows:

    R> nolm <- "This is not a linear model!"
    R> class(nolm) <- "lm"
    R> nolm

    Call:
    NULL

    No coefficients
    Warning messages:
    1: In x$call : $ operator is invalid for atomic vectors, returning NULL
    2: In object$coefficients : $ operator is invalid for atomic vectors, returning NULL

The computer should detect the real problem: a character string is not an "lm" object. S4 provides facilities for solving this problem, however at the expense of a steeper learning curve.

In addition to the simple naming convention on how to find objects, there are several more conventions which make it easier to use methods in practice:

- A method must have all the arguments of the generic, including ... if the generic does.
- A method must have arguments in exactly the same order as the generic.
- If the generic specifies defaults, all methods should use the same defaults.

The reason for the above rules is that they make it less necessary to read the documentation for all methods corresponding to the same generic. The user can rely on common rules: all methods operate "as similarly" as possible. Unfortunately there are no rules without exceptions, as we will see in one of the next sections...
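The pieces from Section 2 and Section 3.1 can be combined in a short sketch. The tutorial's own listing of linmodEst is not preserved in this copy, so the following is a hypothetical reconstruction under stated assumptions: the estimate is computed from a QR decomposition rather than the explicit inverse, the result has the four elements described above (coefficients, vcov, sigma, df), and a constructor named linmodFit() (a name chosen only for illustration) attaches class "linmod" so that print() dispatches to print.linmod(). Simulated data replace the MASS cats data so the block needs no extra packages.

```r
## Hypothetical sketch, not the tutorial's code: QR-based OLS estimate
## returning the four list elements described in Section 3.1.
linmodEst <- function(x, y)
{
  qx <- qr(x)                        # QR decomposition of the design matrix X
  coef <- solve.qr(qx, y)            # least-squares solution of X beta = y
  df <- nrow(x) - ncol(x)            # residual degrees of freedom
  sigma2 <- sum((y - x %*% coef)^2) / df
  ## sigma^2 (X'X)^{-1}: X'X = R'R, and chol2inv() inverts R'R given R
  vcov <- sigma2 * chol2inv(qx$qr)
  colnames(vcov) <- rownames(vcov) <- colnames(x)
  list(coefficients = coef, vcov = vcov, sigma = sqrt(sigma2), df = df)
}

## Hypothetical constructor: fit the model and tag the result with class "linmod"
linmodFit <- function(x, y)
{
  est <- linmodEst(x, y)
  est$fitted.values <- as.vector(x %*% est$coefficients)
  est$residuals <- y - est$fitted.values
  est$call <- match.call()
  class(est) <- "linmod"
  est
}

## S3 method: named generic.class, with the same arguments (x, ...) as print()
print.linmod <- function(x, ...)
{
  cat("Call:\n")
  print(x$call)
  cat("\nCoefficients:\n")
  print(x$coefficients)
  invisible(x)
}

## Usage with simulated data (an assumption; the tutorial uses MASS::cats)
set.seed(1)
X <- cbind(Const = 1, slope = runif(50))
y <- as.vector(X %*% c(2, 3)) + rnorm(50, sd = 0.1)
fit <- linmodFit(X, y)
print(fit)   # dispatches to print.linmod()
coef(fit)    # the default coef() method simply extracts fit$coefficients
```

Because the list elements use the standard names, generic functions such as coef() work without writing any further methods, while print() picks up the class-specific method through the generic.class naming convention.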
¹ There is checking done as part of package quality control; see the section on R CMD check later in this tutorial.

… Note that we have used the standard names "coefficients", "fitted.values" and "residuals" for the elements of our class "linmod". As a bonus on the side we get methods for several standard generic functions for free, because their default methods work for our class:

    R> coef(mod1)
         Const        Bwt
    -0.3566624  4.0340627
    R> fitted(mod1)
    [1] 7.711463 7.711463 7.711463 8.114869 8.114869 8.114869 ...
    R> resid(mod1)
    [1] -0.7114630 -0.3114630  1.7885370 -0.9148692 -0.8148692 ...

The notion of functions returning an object of a certain class is used extensively by the modelling functions of S. In many statistical packages you have to specify a lot of options controlling what type of output you want or need. In S you first fit the model and then have a set of methods to investigate the results (summary tables, plots, ...).

The parameter estimates of a statistical model are typically summarized using a matrix with 4 columns: estimate, standard deviation, z (or t or ...) score and p-value. The summary method computes this matrix:

    summary.linmod <- function(object, ...)
    {
      se <- sqrt(diag(object$vcov))
      tval <- coef(object) / se

      TAB <- cbind(Estimate = coef(object),
                   StdErr = se,
                   t.value = tval,
                   p.value = 2*pt(-abs(tval), df=object$df))

      res <- list(call=object$call,
                  coefficients=TAB)

      class(res) <- "summary.linmod"
      res
    }

The utility function printCoefmat() can be used to print the matrix with appropriate rounding and some decoration:

    print.summary.linmod <- function(x, ...)
    {
      cat("Call:\n")
      print(x$call)
      cat("\n")

      printCoefmat(x$coefficients, P.value=TRUE, has.Pvalue=TRUE)
    }

The result is

    R> summary(mod1)
    Call:
    linmod.default(x = x, y = y)

             Estimate   StdErr t.value   p.value
    Const    -0.35666  0.69228 -0.5152 0.60728
    Bwt       2.63641  0.77590  3.3979 0.0008846 ***
    SexM     -4.16540  2.06176 -2.0203 0.0452578 *
    Bwt:SexM  1.67626  0.83733  2.0019 0.0472246 *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The last missing methods that most statistical models in S have are a plot() and a predict() method. For the latter a simple solution could be

    predict.linmod <- function(object, newdata=NULL, ...)
    {
      if(is.null(newdata))
        y <- fitted(object)
      else{
        if(!is.null(object$formula)){
          ## model has been fitted using formula interface;
          ## newdata is assumed to supply the variables used in the formula
          x <- model.matrix(object$formula, newdata)
        }
        else{
          x <- newdata
        }
        y <- as.vector(x %*% coef(object))
      }
      y
    }

which works for models fitted with either the default method (in which case newdata is assumed to be a matrix with the same columns as the original x matrix), or for models fitted using the formula method (in which case newdata will be a data frame). Note that model.matrix() can also be used directly on a formula and a data frame rather than first creating a model.frame.

The formula handling in our small example is rather minimalistic; production code usually handles many more cases. We did not bother to think about treatment of missing values, weights, offsets, subsetting etc. To get an idea of more elaborate code for handling formulas, one can look at the beginning of function lm() in R.

5 R packages

Now that we have code that does useful things and has a nice user interface, we may want to share our code with other people, or simply make it easier to use ourselves. There are two popular ways of starting a new package:

1. Load all functions and data sets you want in the package into a clean R session, and run package.skeleton(). The objects are sorted into data and functions, skeleton help files are created for them using prompt(), and a DESCRIPTION file is created. The function then prints out a list of things for you to do next.
2. Create it manually, which is usually faster for experienced developers.

5.1 Structure of a package

The extracted sources of an R package are simply a directory³ somewhere on your hard drive. The directory has the same name as the package and the following contents (all of which are described in more detail below): …

³ Directories are sometimes also called folders, especially on Windows.

Figure 1: Directory listing of package linmod after running function package.skeleton().
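The package.skeleton() route from Section 5 can be tried in a throwaway directory. This is a minimal sketch under assumptions: the two function definitions are hypothetical placeholders standing in for the tutorial's linmod code, and the skeleton is written below tempdir() so the working directory is not touched.

```r
## Placeholder objects (hypothetical); in practice these would be the
## tutorial's linmod functions loaded into a clean R session.
linmodEst <- function(x, y) qr.coef(qr(x), y)
linmod <- function(x, ...) UseMethod("linmod")

## Create the skeleton under tempdir(); force = TRUE overwrites a
## skeleton left over from an earlier run in the same session.
pkgpath <- tempdir()
package.skeleton(name = "linmod", list = c("linmod", "linmodEst"),
                 environment = environment(), path = pkgpath, force = TRUE)

## Inspect the generated layout: DESCRIPTION, R/ with the code,
## man/ with skeleton help files, and a few helper files
pkgdir <- file.path(pkgpath, "linmod")
print(list.files(pkgdir, recursive = TRUE))
```

The exact set of generated files depends on the R version (newer versions also write a NAMESPACE file), so the listing is best treated as a starting point to edit rather than a finished package.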
5.3 The package DESCRIPTION file

An appropriate DESCRIPTION for our package is

    Package: linmod
    Title: Linear Regression
    Version: 1.0
    Date: 2008-05-13
    Author: Friedrich Leisch
    Maintainer: Friedrich Leisch <Friedrich.Leisch@R-project.org>
    Description: This is a demo package for the tutorial "Creating R Packages" to Compstat 2008 in Porto.
    Suggests: MASS
    License: GPL-2

The file is in the so-called Debian-control-file format, which was invented by the Debian Linux distribution (http://www.debian.org) to describe their packages. Entries are of the form

    Keyword: Value

with the keyword always starting in the first column; continuation lines start with one or more space characters. The Package, Version, License, Description, Title, Author, and Maintainer fields are mandatory; the remaining fields (Date, Suggests, ...) are optional.

The Package and Version fields give the name and the version of the package, respectively. The name should consist of letters, numbers, and the dot character and start with a letter. The version is a sequence of at least two (and usually three) non-negative integers separated by single dots or dashes. The Title should be no more than 65 characters, because it will be used in various package listings with one line per package. The Author field can contain any number of authors in free text format; the Maintainer field should contain only one name plus a valid email address (similar to the "corresponding author" of a paper). The Description field can be of arbitrary length. The Suggests field in our example means that some code in our package uses functionality from package MASS; in our case we will use the cats data in the example section of help pages. A stronger form of dependency can be specified in the optional Depends field, listing packages which are necessary to run our code.

The License can be free text. If you submit the package to CRAN or Bioconductor and use a standard license, we strongly prefer that you use a standardized abbreviation like GPL-2, which stands for "GNU General Public License Version 2". A list of license abbreviations R understands is given in the manual "Writing R Extensions" (R Development Core Team 2008b). The manual also contains the full documentation for all possible fields in package DESCRIPTION files. The above is only a minimal example; much more meta-information about a package as well as technical details for package installation can be stated.

5.4 Rd documentation files

The sources of R help files are in R documentation format and have the extension .Rd. The format is similar to LaTeX, but it is processed by R and hence not all LaTeX commands are available; in fact there is only a very small subset. The documentation files can be converted into HTML, plain text, GNU info format, and LaTeX. They can also be converted into the old nroff-based S help format.

A joint help page for all our functions is shown in Figure 2. First comes the name of the help page, then aliases for all topics the page documents. The title should again be only one line because it is used for the window title in HTML browsers. The description should be only 1–2 paragraphs; if more text is needed it should go into the optional details section, not shown in the example. Regular R users will immediately recognize most sections of Rd files from reading other help pages.

The usage section should be plain R code with additional markup for methods:

- For regular functions it is the full header with all arguments and their default values: copy & paste from the code and remove "function".
- For S3 methods, use the special markup \method{generic}{class}(arguments), which will print as generic(arguments) but makes the true name and purpose available for checking.
- For data sets it is typically simply data(name).

The examples section should contain executable R code, and automatically running the code is part of checking a package. There are two special markup commands for the examples:

dontrun: Everything inside \dontrun{} is not executed by the tests or example(). This is useful, e.g., for interactive functions, functions accessing the Internet etc. Do not misuse it to make life easier for you by giving examples which cannot be executed.
dontshow: Everything inside \dontshow{} is executed by the tests, but not shown to the user in the help page. This is meant for additional tests of the functions documented.

There are other possible sections, and ways of specifying equations, URLs, links to other R documentation, and more. The manual "Writing R Extensions" has the full list of all Rd commands. The packaging system can check that all objects are documented, that the usage corresponds to the actual definition of the function, and that the examples will run. This enforces a minimal level of accuracy on the documentation. There is an Emacs mode for editing R documentation (Rossini et al 2004), and a function prompt() to help produce it.

There are two "special" help files:

pkgname-package: It should be a short overview, to give a reader unfamiliar with the package enough information to get started. More extensive documentation is better placed into a package vignette (and referenced from this page), or into individual man pages for the functions, data sets, or classes. This file can be used to override the default contents of help(package="pkgname").
pkgname-internal: Popular name for a help file collecting functions which are not part of the package application programming interface (API), should not be used directly by the user, and hence are not documented. It exists only to make R CMD check happy; you really should use a namespace instead.

For our simple package it makes no sense to create linmod-package.Rd, because there is only one major function anyway. With linmodEst we do have one internal function in our code, which is not intended to be used at the prompt. One way to document this fact is to create a file linmod-internal.Rd, include an alias for linmodEst, and say that this function is for internal usage only.

5.5 Data in packages

Using example data from recommended packages like MASS is no problem, because recommended packages are part of any R installation anyway. In case you want to use your own data, simply create a subdirectory data in your package, write the data to disk using function save(), and copy the resulting files (with extension .rda or .RData) to the data subdirectory. Typing data("foo") in R will look for files with name foo.rda or foo.RData in all attached packages and load() the first it finds. To get a help file skeleton for a data set, simply type prompt("foo") when foo is a data object present in your current R session. Data in packages can be in other formats (text, csv, S code, ...); see again "Writing R Extensions" for details.
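Both the DESCRIPTION format of Section 5.3 and the data mechanism of Section 5.5 can be exercised from plain R without installing anything. The sketch below is an illustration under assumptions: it writes the DESCRIPTION shown above (with the Description split over a continuation line to show that rule) to a temporary file and parses it with read.dcf(), then saves a small made-up data frame into a hypothetical package's data/ directory and reads it back with load(), which is what data() does for an installed package.

```r
## The DESCRIPTION from Section 5.3 in a temporary file; the indented
## line is a continuation of the Description field.
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: linmod",
  "Title: Linear Regression",
  "Version: 1.0",
  "Date: 2008-05-13",
  "Author: Friedrich Leisch",
  "Maintainer: Friedrich Leisch <Friedrich.Leisch@R-project.org>",
  "Description: This is a demo package for the tutorial \"Creating R",
  "  Packages\" to Compstat 2008 in Porto.",
  "Suggests: MASS",
  "License: GPL-2"), desc_file)

## read.dcf() parses the Debian-control-file format into a character
## matrix with one row per record and one column per field
desc <- read.dcf(desc_file)
mandatory <- c("Package", "Version", "License", "Description",
               "Title", "Author", "Maintainer")
missing_fields <- setdiff(mandatory, colnames(desc))
print(missing_fields)                       # character(0) when all are present
ver <- package_version(desc[1, "Version"])  # "1.0" parses as a valid version

## Data in packages: save() an object as data/<name>.rda inside a
## hypothetical package source directory
datadir <- file.path(tempdir(), "linmodData", "data")
dir.create(datadir, recursive = TRUE, showWarnings = FALSE)
foo <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1))  # made-up example data
save(foo, file = file.path(datadir, "foo.rda"))

## load() restores the object under its saved name and returns that name
e <- new.env()
loaded <- load(file.path(datadir, "foo.rda"), envir = e)
```

Loading into a fresh environment rather than the global workspace keeps the check side-effect free; in a real package the file name (foo.rda) and the object name (foo) should match so that data("foo") finds what users expect.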
    shell$ R CMD check linmod
    * checking for working pdflatex ... OK
    * using log directory 'compstat-2008-tutorial/package/linmod.Rcheck'
    * using R version 2.7.0 (2008-04-22)
    * using session charset: UTF-8
    * checking for file 'linmod/DESCRIPTION' ... OK
    * checking extension type ... Package
    * this is package 'linmod' version '1.0'
    * checking package namespace information ... OK
    * checking package dependencies ... OK
    * checking if this is a source package ... OK
    * checking whether package 'linmod' can be installed ... OK
    * checking package directory ... OK
    * checking for portable file names ... OK
    * checking for sufficient/correct file permissions ... OK
    * checking DESCRIPTION meta-information ... OK
    * checking top-level files ... OK
    * checking index information ... OK
    * checking package subdirectories ... OK
    * checking R files for non-ASCII characters ... OK
    * checking R files for syntax errors ... OK
    * checking whether the package can be loaded ... OK
    * checking whether the package can be loaded with stated dependencies ... OK
    * checking whether the namespace can be loaded with stated dependencies ... OK
    * checking for unstated dependencies in R code ... OK
    * checking S3 generic/method consistency ... OK
    * checking replacement functions ... OK
    * checking foreign function calls ... OK
    * checking R code for possible problems ... OK
    * checking Rd files ... OK
    * checking Rd cross-references ... OK
    * checking for missing documentation entries ... WARNING
    Undocumented code objects:
      linmodEst
    All user-level objects in a package should have documentation entries.
    See the chapter 'Writing R documentation files' in manual 'Writing R
    Extensions'.
    * checking for code/documentation mismatches ... OK
    * checking Rd \usage sections ... OK
    * creating linmod-Ex.R ... OK
    * checking examples ... OK
    * creating linmod-manual.tex ... OK
    * checking linmod-manual.tex using pdflatex ... OK

Figure 3: Running R CMD check on package linmod.
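The WARNING in Figure 3 comes from comparing the package's code objects with the topics its Rd files document. The toy sketch below re-creates only that idea and is not how R CMD check is implemented (the tools package ships a function undoc() for reporting undocumented objects). Assumptions: the package code is three hypothetical stand-in functions evaluated into their own environment, and the documentation is a single minimal Rd file; aliases are read with tools::parse_Rd(), which tags each top-level section with an "Rd_tag" attribute.

```r
## Hypothetical package code, evaluated into its own environment
pkg_env <- new.env()
local({
  linmod <- function(x, ...) UseMethod("linmod")
  linmod.default <- function(x, y, ...) NULL
  linmodEst <- function(x, y) NULL
}, envir = pkg_env)

## A minimal Rd file documenting only the user-level functions;
## backslashes are doubled because these are R strings.
rd_file <- tempfile(fileext = ".Rd")
writeLines(c(
  "\\name{linmod}",
  "\\alias{linmod}",
  "\\alias{linmod.default}",
  "\\title{Linear Regression}",
  "\\description{Fit a linear regression model.}",
  "\\usage{",
  "linmod(x, ...)",
  "\\method{linmod}{default}(x, y, ...)",
  "}",
  "\\arguments{",
  "  \\item{x}{a numeric design matrix.}",
  "  \\item{y}{a numeric response vector.}",
  "  \\item{\\dots}{further arguments.}",
  "}",
  "\\examples{",
  "\\dontrun{linmod(x, y)}",
  "}"), rd_file)

## Parse the Rd file and collect the topics listed in \alias{} entries
rd <- unclass(tools::parse_Rd(rd_file))
tag_of <- function(el) {
  tag <- attr(el, "Rd_tag")
  if (is.null(tag)) "" else tag
}
tags <- vapply(rd, tag_of, character(1))
aliases <- vapply(rd[tags == "\\alias"],
                  function(el) trimws(paste(unlist(el), collapse = "")),
                  character(1))

## Every code object without a matching alias counts as undocumented
undocumented <- setdiff(ls(pkg_env), aliases)
cat("Undocumented code objects:", undocumented, sep = "\n  ")
cat("\n")
```

Adding \alias{linmodEst} to a linmod-internal.Rd file, as suggested in Section 5.4, would make undocumented empty in this sketch, mirroring how that file silences the warning in a real check.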
References

BECKER, R.A., CHAMBERS, J.M. and WILKS, A.R. (1988): The New S Language. Chapman & Hall, London, UK.
CHAMBERS, J.M. (1998): Programming with Data: A Guide to the S Language. Springer Verlag, Berlin, Germany.
CHAMBERS, J.M. and HASTIE, T.J. (1992): Statistical Models in S. Chapman & Hall, London, UK.
GENTLE, J.E. (1998): Numerical Linear Algebra for Applications in Statistics. Springer Verlag, New York, USA.
HORNIK, K. (2004): R: The next generation. In: J. Antoch (Ed.): Compstat 2004: Proceedings in Computational Statistics. Physica Verlag, Heidelberg, 235–249. ISBN 3-7908-1554-3.
LEISCH, F. (2002): Sweave: Dynamic generation of statistical reports using literate data analysis. In: W. Härdle and B. Rönz (Eds.): Compstat 2002: Proceedings in Computational Statistics. Physica Verlag, Heidelberg, 575–580. ISBN 3-7908-1517-9.
LEISCH, F. (2003): Sweave, part II: Package vignettes. R News, 3(2), 21–24.
LIGGES, U. (2003): R help desk: Package management. R News, 3(3), 37–39.
R Development Core Team (2008): R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
R Development Core Team (2008a): R Installation and Administration. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-09-7.
R Development Core Team (2008b): Writing R Extensions. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-11-9.
ROSSINI, A.J., HEIBERGER, R.M., SPARAPANI, R., MÄCHLER, M. and HORNIK, K. (2004): Emacs speaks statistics: A multiplatform, multipackage development environment for statistical analysis. Journal of Computational and Graphical Statistics, 13(1), 247–261.
TIERNEY, L. (2003): Namespace management for R. R News, 3(1), 2–6.
VENABLES, W.N. and RIPLEY, B.D. (2002): Modern Applied Statistics with S. Fourth Edition. Springer. ISBN 0-387-95457-0.