

Presentation Transcript

Slide 1

Peter Fox and Greg Hughes
Data Analytics – ITWS-4600/ITWS-6600
Group 3 Module 11, April 27, 2017

Weak Models: Bagging, Boosting, Bootstrap Aggregation

Slide 2
Bootstrap aggregation (bagging)
Improves the stability and accuracy of machine learning algorithms used in statistical classification and regression.
Also reduces variance and helps to avoid overfitting.
Usually applied to decision tree methods, but can be used with any type of method.
Bagging is a special case of the model averaging approach.
Harder to interpret – why?
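
A minimal bagging-by-hand sketch (not from the slides; the deck's own examples use the ipred/adabag packages later): grow rpart trees on bootstrap resamples and average their predictions.

# Illustrative only: bagging a regression tree by hand on the built-in cars data
library(rpart)
data(cars)
set.seed(1)
B <- 25                                   # number of bootstrap samples
n <- nrow(cars)
newdat <- data.frame(speed = 4:25)
preds <- sapply(1:B, function(b) {
  boot <- cars[sample(n, n, replace = TRUE), ]   # bootstrap resample of the rows
  fit  <- rpart(dist ~ speed, data = boot)       # one tree per resample
  predict(fit, newdata = newdat)
})
bagged <- rowMeans(preds)                 # the bagged prediction is the average over the trees
head(cbind(newdat, bagged))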

Slide 3
Cf. Random Forest
“Averages” over the trees… i.e. a different form of model averaging
But the trees are “dimension-reduced” and provide immediate “prescriptive” capability
Local partitioning – but applied in a different way than in bagging
Let’s see how…
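
For contrast, a hedged sketch of a random forest fit (the randomForest package is an assumption here; it is not loaded anywhere in the deck):

# Like bagging, each tree sees a bootstrap resample of the rows, but at every
# split only a random subset of predictors (mtry) is considered, which
# decorrelates the trees before their votes are combined.
library(randomForest)
data(iris)
set.seed(1)
iris.rf <- randomForest(Species ~ ., data = iris, ntree = 200)
print(iris.rf)        # OOB error estimate and confusion matrix
importance(iris.rf)   # per-predictor mean decrease in Gini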

Slide 4
Ozone

library(ipred)
# note: mfinal= is an argument of adabag::bagging (loaded on a later slide), not ipred::bagging
data(Ozone, package = "mlbench")
l <- length(Ozone[, 1])
sub <- sample(1:l, 2*l/3)
OZ.bagging <- bagging(V4 ~ ., data = Ozone[,-1], mfinal = 30,
                      control = rpart.control(maxdepth = 5))
OZ.bagging.pred <- predict(OZ.bagging, newdata = Ozone[-sub, -4])

Slide 5
Ozone
[Figure: 10 of 100 bootstrap samples, and their average]
What other local models? Splines. + more in the next few modules

Slide 6
Example reading…
http://amunategui.github.io/bagging-in-R/ – note the comment about “competitions”
https://www.r-bloggers.com/improve-predictive-performance-in-r-with-bagging/

Slide 7
Shows improvements for unstable procedures (Breiman, 1996): e.g. neural nets, classification and regression trees, and subset selection in linear regression
… can mildly degrade the performance of stable methods such as K-nearest neighbors
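
A hedged sketch of the stability point, comparing a single tree with a bagged ensemble on a held-out split (ipred's bagging() and its nbagg argument are assumed here; the adabag version appears on the next slide):

library(mlbench); library(rpart); library(ipred)
data(BreastCancer)
bc  <- na.omit(BreastCancer[, -1])               # drop Id and incomplete rows
set.seed(1)
idx   <- sample(nrow(bc), floor(2 * nrow(bc) / 3))
train <- bc[idx, ]; test <- bc[-idx, ]
single <- rpart(Class ~ ., data = train)                 # one (unstable) tree
bagged <- bagging(Class ~ ., data = train, nbagg = 50)   # ipred::bagging
err.single <- mean(predict(single, test, type = "class") != test$Class)
err.bagged <- mean(predict(bagged, test, type = "class") != test$Class)
c(single = err.single, bagged = err.bagged)      # bagging is typically at or below the single tree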

Slide 8
Bagging (bootstrap aggregation)*

library(mlbench)
library(adabag)   # requires a number of other packages
data(BreastCancer)
l <- length(BreastCancer[, 1])
sub <- sample(1:l, 2*l/3)
BC.bagging <- bagging(Class ~ ., data = BreastCancer[,-1], mfinal = 20,
                      control = rpart.control(maxdepth = 3))   # rpart
BC.bagging.pred <- predict.bagging(BC.bagging, newdata = BreastCancer[-sub, -1])

BC.bagging.pred$confusion
               Observed Class
Predicted Class benign malignant
      benign       142         2
      malignant      8        81
BC.bagging.pred$error
[1] 0.04291845

Slide 9
A “little later” – randomized (a new random train/test split gives different results)

> data(BreastCancer)
> l <- length(BreastCancer[,1])
> sub <- sample(1:l, 2*l/3)
> BC.bagging <- bagging(Class ~ ., data=BreastCancer[,-1], mfinal=20,
+                       control=rpart.control(maxdepth=3))
> BC.bagging.pred <- predict.bagging(BC.bagging, newdata=BreastCancer[-sub,-1])
> BC.bagging.pred$confusion
               Observed Class
Predicted Class benign malignant
      benign       147         1
      malignant      7        78
> BC.bagging.pred$error
[1] 0.03433476

For comparison, the earlier run:
BC.bagging.pred$error
[1] 0.04291845
               Observed Class
Predicted Class benign malignant
      benign       142         2
      malignant      8        81

Slide 10
Bagging (Vehicle)

> data(Vehicle)
> l <- length(Vehicle[,1])
> sub <- sample(1:l, 2*l/3)
> Vehicle.bagging <- bagging(Class ~ ., data=Vehicle[sub, ], mfinal=40,
+                            control=rpart.control(maxdepth=5))
> Vehicle.bagging.pred <- predict.bagging(Vehicle.bagging, newdata=Vehicle[-sub, ])
> Vehicle.bagging.pred$confusion
               Observed Class
Predicted Class bus opel saab van
           bus   63   10    8   0
           opel   1   42   27   0
           saab   0   18   30   0
           van    5    7    9  62
> Vehicle.bagging.pred$error
[1] 0.3014184

Slide 11
Up to now
Strong models
Direct use of variables (independent) – some or all
Averaging to reduce overfitting*
Guided by statistical significance (R², p-value, other measures, error rate)
Strong models + “weaker” models
PCA – identifying dominant dimensions
Factor analysis – cross correlations down to r = .3 and combining variables into factors
Aimed at explaining variance

Slide 12
Weak models …
A weak learner: a classifier which is only slightly correlated with the true classification (it can label examples better than random guessing).
A strong learner: a classifier that is arbitrarily well-correlated with the true classification.
Can a set of weak learners create a single strong learner? (Not called “latent”, but the same idea.)
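
A minimal sketch of a weak learner: a single depth-1 tree (a decision "stump"), the canonical base classifier in boosting (this example is not in the slides):

library(mlbench); library(rpart)
data(BreastCancer)
bc <- na.omit(BreastCancer[, -1])
stump <- rpart(Class ~ ., data = bc,
               control = rpart.control(maxdepth = 1, cp = 0))  # one split only
mean(predict(stump, type = "class") == bc$Class)   # accuracy of the stump
mean(bc$Class == "benign")                         # majority-class baseline for comparison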

Slide 13
Boosting
… reducing bias in supervised learning.
Most boosting algorithms consist of iteratively learning weak classifiers with respect to a distribution and adding them to a final strong classifier.
The added learners are typically weighted in a way that is usually related to their accuracy.
After a weak learner is added, the data is reweighted: examples that are misclassified gain weight and examples that are classified correctly lose weight.
Thus, future weak learners focus more on the examples that previous weak learners misclassified.
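
A bare-bones, hand-rolled sketch of this reweighting loop (discrete AdaBoost-style, written out for illustration rather than taken from the course's adabag workflow):

library(mlbench); library(rpart)
data(BreastCancer)
bc <- na.omit(BreastCancer[, -1])
y  <- ifelse(bc$Class == "malignant", 1, -1)   # code the classes as +/-1
n  <- nrow(bc)
w  <- rep(1 / n, n)                            # start with uniform example weights
M  <- 10                                       # number of weak learners
score <- rep(0, n)                             # the additive "strong" classifier
for (m in 1:M) {
  stump <- rpart(Class ~ ., data = bc, weights = w,
                 control = rpart.control(maxdepth = 1, cp = 0))
  pred  <- ifelse(predict(stump, type = "class") == "malignant", 1, -1)
  err   <- sum(w * (pred != y)) / sum(w)       # weighted error of this weak learner
  alpha <- 0.5 * log((1 - err) / err)          # its vote: larger when err is small
  w     <- w * exp(alpha * (pred != y))        # misclassified examples gain weight
  w     <- w / sum(w)                          # correct ones lose weight after renormalizing
  score <- score + alpha * pred                # add the weighted weak learner
}
mean(sign(score) == y)                         # training accuracy of the ensemble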

Slide 14
Diamonds (lab this week)
Compare the identification of this variable under boosting versus using strong learners.
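
The Expensive indicator used on the next two slides is created in the lab; it is not part of ggplot2's diamonds data. A hypothetical construction (the actual cutoff used in the course materials may differ):

library(ggplot2)   # provides the diamonds data
library(mboost)    # provides glmboost()
data(diamonds)
# Hypothetical definition: flag diamonds above the median price
# (the lab may also drop price from the predictor set, since Expensive is derived from it)
diamonds$Expensive <- as.integer(diamonds$price > median(diamonds$price))
table(diamonds$Expensive)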

Slide 15
Using diamonds… boost (glm)

> mglmboost <- glmboost(as.factor(Expensive) ~ ., data = diamonds,
+                       family = Binomial(link = "logit"))
> summary(mglmboost)

	 Generalized Linear Models Fitted via Gradient Boosting

Call:
glmboost.formula(formula = as.factor(Expensive) ~ ., data = diamonds, family = Binomial(link = "logit"))

	 Negative Binomial Likelihood

Loss function: {
    f <- pmin(abs(f), 36) * sign(f)
    p <- exp(f)/(exp(f) + exp(-f))
    y <- (y + 1)/2
    -y * log(p) - (1 - y) * log(1 - p)
}

Slide 16
Using diamonds… boost (glm)

> summary(mglmboost)  # continued

Number of boosting iterations: mstop = 100
Step size: 0.1
Offset: -1.339537

Coefficients:
NOTE: Coefficients from a Binomial model are half the size of coefficients
from a model fitted via glm(..., family = 'binomial').
See Warning section in ?coef.mboost

(Intercept)       carat   clarity.L
 -1.5156330   1.5388715   0.1823241
attr(,"offset")
[1] -1.339537

Selection frequencies:
      carat (Intercept)   clarity.L
       0.50        0.42        0.08
# selection frequencies add up to 1.0

Slide 17
Cluster boosting
Assessment of the clusterwise stability of a clustering of data, which can be cases x variables or dissimilarity data.
The data is resampled using several schemes (bootstrap, subsetting, jittering, replacement of points by noise) and the Jaccard similarities of the original clusters to the most similar clusters in the resampled data are computed.
The mean over these similarities is used as an index of the stability of a cluster (other statistics can be computed as well).

Slide 18
Cluster boosting
Quite general clustering methods are possible, i.e. methods estimating or fixing the number of clusters, methods producing overlapping clusters or not assigning all cases to clusters (but declaring them as "noise").
In R, clustermethod = X is used to select the method, e.g. kmeans (see the sketch below).
Lab this week … (iris, etc.)
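
A hedged sketch of this clusterwise-stability assessment with the fpc package's clusterboot() and its k-means interface function (the argument names follow fpc's documented interface; treat this as a sketch):

library(fpc)
data(iris)
set.seed(1)
cb <- clusterboot(iris[, 1:4], B = 100,
                  bootmethod = "boot",         # resampling scheme
                  clustermethod = kmeansCBI,   # the "clustermethod = X" of this slide
                  krange = 3)                  # number of k-means clusters
cb$bootmean   # mean Jaccard similarity per cluster; values near 1 indicate stable clusters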

Slide 19
Example – bodyfat
The response variable is the body fat measured by DXA (DEXfat), which can be seen as the gold standard for measuring body fat. However, DXA measurements are too expensive and complicated for broad use.
Anthropometric measurements such as waist or hip circumference are, in comparison, very easy to obtain in a standard screening. A prediction formula based only on these measures could therefore be a valuable alternative with high clinical relevance for daily use.
Tutorial (lab): https://cran.r-project.org/web/packages/mboost/vignettes/mboost_tutorial.pdf
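
A minimal loading sketch for this data set, assuming the packaging used by recent versions of the mboost tutorial (bodyfat ships with the TH.data package):

library(mboost)
data("bodyfat", package = "TH.data")   # DEXfat plus the anthropometric measurements
str(bodyfat)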

Slide 20
[Figure]

Slide 21
bodyfat

## regular linear model using three variables
lm1 <- lm(DEXfat ~ hipcirc + kneebreadth + anthro3a, data = bodyfat)
## Estimate the same model by glmboost
glm1 <- glmboost(DEXfat ~ hipcirc + kneebreadth + anthro3a, data = bodyfat)
# We consider all available variables as potential predictors.
glm2 <- glmboost(DEXfat ~ ., data = bodyfat)
# or one could essentially call:
preds <- names(bodyfat[, names(bodyfat) != "DEXfat"])  ## names of predictors
fm <- as.formula(paste("DEXfat ~", paste(preds, collapse = "+")))  ## build formula

Slide 22
Compare linear models

> coef(lm1)
(Intercept)     hipcirc kneebreadth    anthro3a
-75.2347840   0.5115264   1.9019904   8.9096375
> coef(glm1, off2int = TRUE)   ## off2int adds the offset to the intercept
(Intercept)     hipcirc kneebreadth    anthro3a
-75.2073365   0.5114861   1.9005386   8.9071301

Conclusion?

Slide 23
> fm
DEXfat ~ age + waistcirc + hipcirc + elbowbreadth + kneebreadth +
    anthro3a + anthro3b + anthro3c + anthro4
> coef(glm2, which = "")   ## select all
 (Intercept)          age    waistcirc      hipcirc elbowbreadth  kneebreadth     anthro3a     anthro3b     anthro3c
 -98.8166077    0.0136017    0.1897156    0.3516258   -0.3841399    1.7365888    3.3268603    3.6565240    0.5953626
     anthro4
   0.0000000
attr(,"offset")
[1] 30.78282

Slide 24
plot(glm2, off2int = TRUE)
[Figure]

Slide 25
plot(glm2, ylim = range(coef(glm2, which = preds)))
[Figure]

Slide 26
[Figure]

Slide 27
Adaboost
Preparation for lab: http://math.mit.edu/~rothvoss/18.304.3PM/Presentations/1-Eric-Boosting304FinalRpdf.pdf
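
As a warm-up for that lab, a hedged sketch using adabag's boosting() (AdaBoost.M1), mirroring the bagging() calls earlier in the deck; the argument names are adabag's:

library(mlbench); library(adabag)
data(BreastCancer)
l   <- length(BreastCancer[, 1])
sub <- sample(1:l, 2*l/3)
BC.boost <- boosting(Class ~ ., data = BreastCancer[sub, -1], mfinal = 20,
                     control = rpart.control(maxdepth = 3))
BC.boost.pred <- predict.boosting(BC.boost, newdata = BreastCancer[-sub, -1])
BC.boost.pred$confusion   # compare with the bagging confusion matrices above
BC.boost.pred$error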

Slide 28
Other forms of boosting
Gamboost = Generalized Additive Model – gradient boosting for optimizing arbitrary loss functions, where component-wise smoothing procedures are utilized as (univariate) base-learners.

Slide 29
> gam1 <- gamboost(DEXfat ~ bbs(hipcirc) + bbs(kneebreadth) + bbs(anthro3a),
+                  data = bodyfat)
> # Using plot() on a gamboost object automatically delivers the partial effects
> # of the different base-learners:
> par(mfrow = c(1, 3))   ## 3 plots in one frame
> plot(gam1)             ## get the partial effects
# available base-learners include bbs (splines), bols (linear), btree (trees), …

Slide 30
[Figure]

Slide 31
Compare to rpart

> fattree <- rpart(DEXfat ~ ., data = bodyfat)
> plot(fattree)
> text(fattree)
> labels(fattree)
[1] "root"            "waistcirc< 88.4" "anthro3c< 3.42"  "anthro3c>=3.42"  "hipcirc< 101.3"  "hipcirc>=101.3"
[7] "waistcirc>=88.4" "hipcirc< 109.9"  "hipcirc>=109.9"

Slide 32
[Figure]

Slide 33
Variants on boosting – loss function

cars.gb <- blackboost(dist ~ speed, data = cars,
                      control = boost_control(mstop = 50))
### plot fit
plot(dist ~ speed, data = cars)
lines(cars$speed, predict(cars.gb), col = "red")

Slide 34
Blackboosting (cf. brown)
Gradient boosting for optimizing arbitrary loss functions where regression trees are utilized as base-learners.

> cars.gb

	 Model-based Boosting

Call:
blackboost(formula = dist ~ speed, data = cars, control = boost_control(mstop = 50))

	 Squared Error (Regression)

Loss function: (y - f)^2

Number of boosting iterations: mstop = 50
Step size: 0.1
Offset: 42.98
Number of baselearners: 1

Slide 35
Cars – gamboost
[Figure]
“Localized” – note the characteristics of this model, cf. blackboosting

Slide 36
iris
[Figure]

Slide 37
cars
[Figure]

Slide 38
library(mboost)
[Figure]

Slide 39
Cars
[Figure]

Slide 40
Sparse matrix example

> coef(mod, which = which(beta > 0))
     V306     V1052     V1090     V3501     V4808     V5473     V7929     V8333     V8799     V9191
2.1657532 0.0000000 4.8756163 4.7068006 0.4429911 5.4029763 3.6435648 0.0000000 3.7843504 0.4038770
attr(,"offset")
[1] 2.90198

Slide 41
[Figure]

Slide 42
Aside: Boosting and SVM…
Remember “margins” from the SVM? Partitioning the “linear” or transformed space?
In boosting we are effectively (not explicitly) attempting to maximize the minimum margin of any training example.

Slide 43
Assignment 7
E.g. https://rpubs.com/chengjiun/52658
A7 will be up in LMS in the next day or two.
Lab this week (group3/lab4…)
Group 4 next week – cross-validation ++ in relation to ~ all other methods