/
Regression Trees Modeling of Body Fat Regression Trees Modeling of Body Fat

Regression Trees Modeling of Body Fat - PowerPoint Presentation

candy
candy . @candy
Follow
0 views
Uploaded On 2024-02-16

Regression Trees Modeling of Body Fat - PPT Presentation

Pablo Aldama Kristina Vatcheva PhD School of Mathematical amp Statistical Sciences University of Texas Rio Grande Val ley Data mining methods such as decision trees have become essential in healthcare for detecting fraud and abuse physicians finding effective treatments for their pati ID: 1046397

fat body data trees body fat trees data circumference variables boosting regression methods random bagging cart variable method decision

Share:

Link:

Embed:

Download Presentation from below link

Download Presentation The PPT/PDF document "Regression Trees Modeling of Body Fat" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

1. Regression Trees Modeling of Body FatPablo Aldama, Kristina Vatcheva, PhDSchool of Mathematical & Statistical Sciences, University of Texas Rio Grande ValleyData mining methods, such as decision trees, have become essential in healthcare for detecting fraud and abuse, physicians finding effective treatments for their patients, and patients receiving more affordable healthcare services (Koh et al., 2005)Classification and Regression Trees (CARTs) are type of decision trees used to build predictive models, variable selection, and alternative methods to view data differently (Breiman, et al., 1984)There are other methods that are used to improve CART models, such as bagging, random forests, and boosting.The purpose of this study is to show and compare the results of the CART, bagging, random forests, and boosting machine learning methods in predicting the DXA measured body fat using standard anthropometric measures. The body fat dataset used in this study have the estimations of the body fat percentage measured by underwater weighing and various body circumference measurements of 71 men.Data were divided into two parts: training data (70%) and testing data (30%). The training data were used to build the model, and the testing data were used as a dummy protection environment to test predictions and check the model’s accuracy (Das, 2017) We used 5 predictor variables: age, waist circumference, hip circumference, knee breadth, and elbow breadth, to reduce the model’s variance with bagging.With random forests, we measured variable importance by the mean decrease of accuracy in predictions on the out of bag sample.In boosting, we created a relative influence plot to determine which variables are the most significant.Partial dependence plot was created to tell us the relationship and dependence of the variables. All models we fit using R programming language.Figure 1: Fitted regression tree for the DXA body fat outcome variableFigure 2: Relation and bagged relation of the mean body fat percentages between the predicted and observed valuesFigure 3: Variable importance measuring, and out-of-bag error estimation vs. number of variables usedFigure 4: Age and waist circumference relationships to body fat percentage According to the CART model, the prediction is within 5.23% of the true mean body fat. The 15 values show a slight positive correlation between the predicted and observed values.Using the bagging algorithm to make our predictions showed us that the predictions were within 2.83% of the true mean body fat. There was a stronger positive correlation between the predicted and observed values on the bagged plot compared to the unbagged plot.Based on random forests algorithm with 500 trees, it was determined that the prediction was within 2.76% of the true mean body fat. We then concluded that the waist and hip circumference deliver high importance and used them to find out that the optimal number of variables used in each tree along with the out-of-bag (OOB) estimation errors for each value was 6.With the boosting method, it was determined that age and waist circumference are the most important variables, and it was calculated that the mean predicted body fat was within 9.15% of the true body fat percentage.According to our study, it was determined that the random forests method resulted with the most accurate prediction compared to CART, bagging, and boosting. The boosting method turned out to be the least accurate method between all the aforementioned methods.Regression trees are considered to be the best decision tree method known because it is more likely to select the independent variable that is most different with respect to the target variable (Steinberg et al., 1995). Koh HC, Tan G. Data mining applications in healthcare. J Healthc Inf Manag. 2005 Spring;19(2):64-72. PMID: 15869215.Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth and Brooks, Monterey, Ca.Das, S. (2017, November 30). Decision Trees and Pruning in R. Retrieved from DZone: https://dzone.com/articles/decision-trees-and-pruning-in-rSteinberg D, Colla P: CART: Tree-structured non-parametric data analysis. San Diego, CA: Salford Systems, 1995IntroductionObjectivesMethodsResultsResults (cont.)Discussion/ConclusionsReferences