Package 'Cubist'
January 10, 2020

Type Package
Title Rule- And Instance-Based Regression Modeling
Version 0.2.3
Maintainer Max Kuhn <mxkuhn@gmail.com>
Description Regression modeling using rules with added instance-based corrections.
Depends lattice
Imports reshape2, utils
Suggests mlbench, caret, knitr, modeldata, dplyr (>= 0.7.4), rlang, tidyrules
URL https://topepo.github.io/Cubist
BugReports https://github.com/topepo/Cubist/issues
License GPL-3
LazyLoad yes
RoxygenNote 7.0.2.9000
VignetteBuilder knitr
Encoding UTF-8
NeedsCompilation yes
Author Max Kuhn [aut, cre], Steve Weston [ctb], Chris Keefer [ctb], Nathan Coulter [ctb], Ross Quinlan [aut] (Author of imported C code), Rulequest Research Pty Ltd. [cph] (Copyright holder of imported C code)
Repository CRAN
Date/Publication 2020-01-10 17:50:23 UTC

R topics documented:

    cubist.default
    cubistControl
    dotplot.cubist
    exportCubistFiles
    predict.cubist
    summary.cubist


cubist.default          Fit a Cubist model

Description

This function fits the rule-based model described in Quinlan (1992) (aka M5) with additional corrections based on nearest neighbors in the training set, as described in Quinlan (1993a).

Usage

    ## Default S3 method:
    cubist(x, y, committees = 1, control = cubistControl(), weights = NULL, ...)

Arguments

    x           a matrix or data frame of predictor variables. Missing data are allowed but (at this time) only numeric, character and factor values are allowed.
    y           a numeric vector of outcomes
    committees  an integer: how many committee models (i.e., boosting iterations) should be used?
    control     options that control details of the cubist algorithm. See cubistControl()
    weights     an optional vector of case weights (the same length as y) for how much each instance should contribute to the model fit. From the RuleQuest website: "The relative weight assigned to each case is its value of this attribute divided by the average value; if the value is undefined, not applicable, or is less than or equal to zero, the case's relative weight is set to 1."
    ...         optional arguments to pass (not currently used)

Details

Cubist is a prediction-oriented regression model that combines the ideas in Quinlan (1992) and Quinlan (1993a).

Although it initially creates a tree structure, it collapses each path through the tree into a rule. A regression model is fit for each rule based on the data subset defined by the rules. The set of rules is pruned or possibly combined, and the candidate variables for the linear regression models are the predictors that were used in the parts of the rule that were pruned away. This part of the algorithm is consistent with the "M5" or Model Tree approach.

Cubist generalizes this model to add boosting (when committees > 1) and instance-based corrections (see predict.cubist()). The number of instances is set at prediction time by the user and is not needed for model building.

This function links R to the GPL version of the C code given on the RuleQuest website.

The RuleQuest code differentiates missing values from values that are not applicable. Currently, this package does not make such a distinction (all values are treated as missing). This can produce results that differ slightly from those of the RuleQuest code.

To tune the cubist model over the number of committees and neighbors, the caret::train() function in the caret package has bindings to find appropriate settings of these parameters.

Value

an object of class cubist with elements:

    data, names, model                 character strings that correspond to their counterparts for the command-line program available from RuleQuest
    output                             basic cubist output captured from the C code, including the rules, their terminal models and variable usage statistics
    control                            a list of control parameters passed in by the user
    composite, neighbors, committees   mirrors of the values to these arguments that were passed in by the user
    dims                               the output of dim(x)
    splits                             information about the variables and values used in the rule conditions
    call                               the function call
    coefs                              a data frame of regression coefficients for each rule within each committee
    vars                               a list with elements all and used listing the predictors passed into the function and used by any rule or model
    fitted.values                      a numeric vector of predictions on the training set
    usage                              a data frame with the percent of models where each variable was used. See summary.cubist() for a discussion.

Author(s)

R code by Max Kuhn, original C sources by R Quinlan and modifications by Steve Weston

References

Quinlan. Learning with continuous classes. Proceedings of the 5th Australian Joint Conference On Artificial Intelligence (1992) pp. 343-348

Quinlan. Combining instance-based and model-based learning. Proceedings of the Tenth International Conference on Machine Learning (1993a) pp. 236-243

Quinlan. C4.5: Programs For Machine Learning (1993b) Morgan Kaufmann Publishers Inc. San Francisco, CA

Wang and Witten. Inducing model trees for continuous classes. Proceedings of the Ninth European Conference on Machine Learning (1997) pp. 128-137

http://rulequest.com/cubist-info.html

See Also

cubistControl(), predict.cubist(), summary.cubist(), dotplot.cubist(), caret::train()

Examples

    library(mlbench)
    data(BostonHousing)

    ## 1 committee, so just an M5 fit:
    mod1 <- cubist(x = BostonHousing[, -14], y = BostonHousing$medv)
    mod1

    ## Now with 10 committees
    mod2 <- cubist(x = BostonHousing[, -14], y = BostonHousing$medv, committees = 10)
    mod2


cubistControl           Various parameters that control aspects of the Cubist fit.

Description

Most of these values are discussed at length in http://rulequest.com/cubist-unix.html

Usage

    cubistControl(
      unbiased = FALSE,
      rules = 100,
      extrapolation = 100,
      sample = 0,
      seed = sample.int(4096, size = 1) - 1L,
      label = "outcome"
    )

Arguments

    unbiased       a logical: should unbiased rules be used?
    rules          an integer (or NA): define an explicit limit to the number of rules used (NA lets Cubist decide).
    extrapolation  a number between 0 and 100: since Cubist uses linear models, predictions can be outside of the range seen in the training set. This parameter controls how much rule predictions are adjusted to be consistent with the training set.
    sample         a number between 0 and 99.9: this is the percentage of the data set to be randomly selected for model building (not for out-of-bag type evaluation).
    seed           an integer for the random seed (in the C code)
    label          a label for the outcome (when printing rules)

Value

A list containing the options.

Author(s)

Max Kuhn

References

Quinlan. Learning with continuous classes. Proceedings of the 5th Australian Joint Conference On Artificial Intelligence (1992) pp. 343-348

Quinlan. Combining instance-based and model-based learning. Proceedings of the Tenth International Conference on Machine Learning (1993) pp. 236-243

Quinlan. C4.5: Programs For Machine Learning (1993) Morgan Kaufmann Publishers Inc. San Francisco, CA

http://rulequest.com/cubist-info.html

See Also

cubist(), predict.cubist(), summary.cubist(), dotplot.cubist()

Examples

    cubistControl()

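The list returned by cubistControl() is handed to cubist() through its control argument. The following is a minimal sketch of that hand-off, assuming the Cubist and mlbench packages are installed. The particular values (rules = 5, extrapolation = 50, seed = 1) and the object names ctrl and mod_ctrl are illustrative assumptions, not recommended settings.

```r
library(Cubist)
library(mlbench)
data(BostonHousing)

# Illustrative settings only: cap the number of rules, limit how far
# predictions may extrapolate, and fix the seed passed to the C code.
ctrl <- cubistControl(rules = 5, extrapolation = 50, seed = 1)

mod_ctrl <- cubist(
  x = BostonHousing[, -14],
  y = BostonHousing$medv,
  control = ctrl
)

# The fitted object keeps the control parameters that were passed in
str(mod_ctrl$control)
```

Building the control list separately keeps the settings reusable across several fits, for example when comparing different values of committees.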

dotplot.cubist          Visualization of Cubist Rules and Equations

Description

Lattice dotplots of the rule conditions or the linear model coefficients produced by cubist() objects

Usage

    ## S3 method for class 'cubist'
    dotplot(x, data = NULL, what = "splits", committee = NULL, rule = NULL, ...)

Arguments

    x          a cubist() object
    data       not currently used (here for lattice compatibility)
    what       either "splits" or "coefs"
    committee  which committees to plot
    rule       which rules to plot
    ...        options to pass to lattice::dotplot()

Details

For the splits, a panel is created for each predictor. The x-axis is the range of the predictor scaled to be between zero and one and the y-axis has a line for each rule (within each committee). Areas are colored based on their region. For example, if one rule has var1 < 10, the line for this rule would be colored over that region. If another rule had the complementary region of var1 >= 10, it would be on another line and shaded a different color.

For the coefficient plot, another dotplot is made. The layout is the same except that the x-axis is in the original units and has a dot if the rule used that variable in a linear model.

Value

a lattice::dotplot() object

Author(s)

R code by Max Kuhn, original C sources by R Quinlan and modifications by Steve Weston

References

Quinlan. Learning with continuous classes. Proceedings of the 5th Australian Joint Conference On Artificial Intelligence (1992) pp. 343-348

Quinlan. Combining instance-based and model-based learning. Proceedings of the Tenth International Conference on Machine Learning (1993) pp. 236-243

Quinlan. C4.5: Programs For Machine Learning (1993) Morgan Kaufmann Publishers Inc. San Francisco, CA

http://rulequest.com/cubist-info.html

See Also

cubist(), cubistControl(), predict.cubist(), summary.cubist(), lattice::dotplot()

Examples

    library(mlbench)
    data(BostonHousing)

    ## 1 committee and no instance-based correction, so just an M5 fit:
    mod1 <- cubist(x = BostonHousing[, -14], y = BostonHousing$medv)
    dotplot(mod1, what = "splits")
    dotplot(mod1, what = "coefs")

    ## Now with 10 committees
    mod2 <- cubist(x = BostonHousing[, -14], y = BostonHousing$medv, committees = 10)
    dotplot(mod2, scales = list(y = list(cex = .25)))
    dotplot(mod2, what = "coefs",
            between = list(x = 1, y = 1),
            scales = list(x = list(relation = "free"),
                          y = list(cex = .25)))

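The two plot types correspond to information that the fitted object also exposes as its splits and coefs elements (see the Value section of cubist()). The sketch below, which assumes Cubist and mlbench are installed, looks at those elements directly and then stores the plot object instead of printing it immediately. The names mod_bh and p are placeholders.

```r
library(Cubist)
library(mlbench)
data(BostonHousing)

mod_bh <- cubist(x = BostonHousing[, -14], y = BostonHousing$medv)

# Variables and values used in the rule conditions
head(mod_bh$splits)

# Regression coefficients for each rule within each committee
mod_bh$coefs

# The method returns a lattice object, so it can be stored and printed later
p <- dotplot(mod_bh, what = "coefs")
print(p)
```

Looking at the underlying data first can help when a plot with many committees becomes too dense to read.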

exportCubistFiles       Export Cubist Information To the File System

Description

For a fitted cubist object, text files consistent with the RuleQuest command-line version can be exported.

Usage

    exportCubistFiles(x, neighbors = 0, path = getwd(), prefix = NULL)

Arguments

    x          a cubist() object
    neighbors  how many, if any, neighbors should be used to correct the model predictions
    path       the path to put the files
    prefix     a prefix (or "filestem") for creating files

Details

Using the RuleQuest specifications, model, names and data files are created for use with the command-line version of the program.

Value

No value is returned. Three files are written out.

Author(s)

Max Kuhn

References

Quinlan. Learning with continuous classes. Proceedings of the 5th Australian Joint Conference On Artificial Intelligence (1992) pp. 343-348

Quinlan. Combining instance-based and model-based learning. Proceedings of the Tenth International Conference on Machine Learning (1993) pp. 236-243

Quinlan. C4.5: Programs For Machine Learning (1993) Morgan Kaufmann Publishers Inc. San Francisco, CA

http://rulequest.com/cubist-info.html

See Also

cubist(), predict.cubist(), summary.cubist()

Examples

    library(mlbench)
    data(BostonHousing)

    mod1 <- cubist(x = BostonHousing[, -14], y = BostonHousing$medv)
    exportCubistFiles(mod1, neighbors = 8, path = tempdir(), prefix = "BostonHousing")


predict.cubist          Predict method for cubist fits

Description

Predictions using the parametric model are calculated using the method of Quinlan (1992). If neighbors is greater than zero, these predictions are adjusted by nearby training set instances using the approach of Quinlan (1993).

Usage

    ## S3 method for class 'cubist'
    predict(object, newdata = NULL, neighbors = 0, ...)

Arguments

    object     an object of class cubist
    newdata    a data frame of predictors (in the same order as the original training data)
    neighbors  an integer from 0 to 9: how many instances to use to correct the rule-based prediction?
    ...        other options to pass through the function (not currently used)

Details

Note that the predictions can fail for various reasons. For example, as shown in the examples, if the model uses a qualitative predictor and the prediction data has a new level of that predictor, the function will throw an error.

Value

a numeric vector is returned

Author(s)

R code by Max Kuhn, original C sources by R Quinlan and modifications by Steve Weston

References

Quinlan. Learning with continuous classes. Proceedings of the 5th Australian Joint Conference On Artificial Intelligence (1992) pp. 343-348

Quinlan. Combining instance-based and model-based learning. Proceedings of the Tenth International Conference on Machine Learning (1993) pp. 236-243

Quinlan. C4.5: Programs For Machine Learning (1993) Morgan Kaufmann Publishers Inc. San Francisco, CA

http://rulequest.com/cubist-info.html

See Also

cubist(), cubistControl(), summary.cubist(), dotplot.cubist()

Examples

    library(mlbench)
    data(BostonHousing)

    ## 1 committee and no instance-based correction, so just an M5 fit:
    mod1 <- cubist(x = BostonHousing[, -14], y = BostonHousing$medv)
    predict(mod1, BostonHousing[1:4, -14])

    ## now add instances
    predict(mod1, BostonHousing[1:4, -14], neighbors = 5)

    # Example error
    iris_test <- iris
    iris_test$Species <- as.character(iris_test$Species)

    mod <- cubist(x = iris_test[1:99, 2:5], y = iris_test$Sepal.Length[1:99])

    # predict(mod, iris_test[100:150, 2:5])
    # Error:
    # *** line 2 of `undefined.cases':
    # bad value of 'virginica' for attribute 'Species'


summary.cubist          Summarizing Cubist Fits

Description

This function echoes the output of the RuleQuest C code, including the rules, the resulting linear models as well as the variable usage summaries.

Usage

    ## S3 method for class 'cubist'
    summary(object, ...)

Arguments

    object  a cubist() object
    ...     other options (not currently used)

Details

The Cubist output contains variable usage statistics. It gives the percentage of times where each variable was used in a condition and/or a linear model. Note that these percentages will probably be inconsistent with the rules shown earlier in the same output. At each split of the tree, Cubist saves a linear model (after feature selection) that is allowed to have terms for each variable used in the current split or any split above it. Quinlan (1992) discusses a smoothing algorithm where each model prediction is a linear combination of the parent and child model along the tree. As such, the final prediction is a function of all the linear models from the initial node to the terminal node. The percentages shown in the Cubist output reflect all the models involved in prediction (as opposed to the terminal models shown in the output).

Value

an object of class summary.cubist with elements

    output  a text string of the output
    call    the original call to cubist()

Author(s)

R code by Max Kuhn, original C sources by R Quinlan and modifications by Steve Weston

References

Quinlan. Learning with continuous classes. Proceedings of the 5th Australian Joint Conference On Artificial Intelligence (1992) pp. 343-348

Quinlan. Combining instance-based and model-based learning. Proceedings of the Tenth International Conference on Machine Learning (1993) pp. 236-243

Quinlan. C4.5: Programs For Machine Learning (1993) Morgan Kaufmann Publishers Inc. San Francisco, CA

http://rulequest.com/cubist-info.html

See Also

cubist(), cubistControl(), predict.cubist(), dotplot.cubist()

Examples

    library(mlbench)
    data(BostonHousing)

    ## 1 committee and no instance-based correction, so just an M5 fit:
    mod1 <- cubist(x = BostonHousing[, -14], y = BostonHousing$medv)
    summary(mod1)

    ## example output:

    ## Cubist [Release 2.07 GPL Edition]  Sun Apr 10 17:36:56 2011
    ## ---------------------------------
    ##
    ##     Target attribute `outcome'
    ##
    ## Read 506 cases (14 attributes) from undefined.data
    ##
    ## Model:
    ##
    ##   Rule 1: [101 cases, mean 13.84, range 5 to 27.5, est err 1.98]
    ##
    ##     if
    ##         nox > 0.668
    ##     then
    ##         outcome = -1.11 + 2.93 dis + 21.4 nox - 0.33 lstat + 0.008 b
    ##                   - 0.13 ptratio - 0.02 crim - 0.003 age + 0.1 rm
    ##
    ##   Rule 2: [203 cases, mean 19.42, range 7 to 31, est err 2.10]
    ##
    ##     if
    ##         nox <= 0.668
    ##         lstat > 9.59
    ##     then
    ##         outcome = 23.57 + 3.1 rm - 0.81 dis - 0.71 ptratio - 0.048 age
    ##                   - 0.15 lstat + 0.01 b - 0.0041 tax - 5.2 nox + 0.05 crim
    ##                   + 0.02 rad
    ##
    ##   Rule 3: [43 cases, mean 24.00, range 11.9 to 50, est err 2.56]
    ##
    ##     if
    ##         rm <= 6.226
    ##         lstat <= 9.59
    ##     then
    ##         outcome = 1.18 + 3.83 crim + 4.3 rm - 0.06 age - 0.11 lstat - 0.003 tax
    ##                   - 0.09 dis - 0.08 ptratio
    ##
    ##   Rule 4: [163 cases, mean 31.46, range 16.5 to 50, est err 2.78]
    ##
    ##     if
    ##         rm > 6.226
    ##         lstat <= 9.59
    ##     then
    ##         outcome = -4.71 + 2.22 crim + 9.2 rm - 0.83 lstat - 0.0182 tax
    ##                   - 0.72 ptratio - 0.71 dis - 0.04 age + 0.03 rad - 1.7 nox
    ##                   + 0.008 zn
    ##
    ##
    ## Evaluation on training data (506 cases):
    ##
    ##     Average  |error|               2.07
    ##     Relative |error|               0.31
    ##     Correlation coefficient        0.94
    ##
    ##
    ##     Attribute usage:
    ##       Conds  Model
    ##
    ##        80%   100%    lstat
    ##        60%    92%    nox
    ##        40%   100%    rm
    ##              100%    crim
    ##              100%    age
    ##              100%    dis
    ##              100%    ptratio
    ##               80%    tax
    ##               72%    rad
    ##               60%    b
    ##               32%    zn
    ##
    ##
    ## Time: 0.0 secs
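
The summary can also be handled as data, not only printed. The sketch below assumes Cubist and mlbench are installed and relies only on elements named in this manual: the output element of the summary object and the usage element of the fitted model. The names mod_bh and s are placeholders.

```r
library(Cubist)
library(mlbench)
data(BostonHousing)

mod_bh <- cubist(x = BostonHousing[, -14], y = BostonHousing$medv)
s <- summary(mod_bh)

# The captured C output is stored as text, so it can be printed or saved
cat(s$output)

# Variable usage percentages as a data frame
mod_bh$usage
```

Reading the percentages from the usage data frame avoids parsing the "Attribute usage" block out of the printed text.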