
Regularization. Jia-Bin Huang, Virginia Tech, Spring 2019, ECE-5424G / CS-5824

Administrative: Women in Data Science Blacksburg
Location: Holtzman Alumni Center
Welcome, 3:30 - 3:40, Assembly hall
Keynote Speaker: Milinda Lakkam, "Detecting automation on LinkedIn's platform," 3:40 - 4:05, Assembly hall
Career Panel, 4:05 - 5:00, Assembly hall
Break, 5:00 - 5:20, Grand hall
Keynote Speaker: Sally Morton, "Bias," 5:20 - 5:45, Assembly hall
Dinner with breakout discussion groups, 5:45 - 7:00, Museum
Introductory track tutorial: Jennifer Van Mullekom, "Data Visualization," 7:00 - 8:15, Assembly hall
Advanced track tutorial: Cheryl Danner, "Focal-loss-based Deep Learning for Object Detection," 7:00 - 8:15, 2nd floor board room

k-NN (Classification/Regression). Model: none (non-parametric; the stored training set is the model). Cost function: none. Learning: do nothing (just store the training data). Inference: $\hat{y} = y^{(k^*)}$, where $k^* = \arg\min_k \lVert x^{(k)} - x \rVert$ (or a majority vote / average over the $k$ nearest neighbors).
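
A minimal NumPy sketch of k-NN inference for classification, under the usual assumptions (Euclidean distance, majority vote); the function and variable names are illustrative, not from the lecture:

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote over its k nearest training points."""
    # Euclidean distances from the query to every stored training example
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest neighbors ("learning" was just storing X_train, y_train)
    nearest = np.argsort(dists)[:k]
    # Majority vote over the neighbors' labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Example usage
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [3.8, 4.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([4.1, 3.9]), k=3))  # -> 1
```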

Linear regression (Regression). Model: $h_\theta(x) = \theta^\top x$. Cost function: $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2$. Learning: 1) Gradient descent: Repeat { $\theta_j \leftarrow \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)}$ }; 2) Solving the normal equation: $\theta = (X^\top X)^{-1} X^\top y$. Inference: $\hat{y} = h_\theta(x) = \theta^\top x$.
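
Both learning routes, sketched in NumPy (illustrative names; the gradient-descent hyperparameters are arbitrary choices, not the lecture's):

```python
import numpy as np

def linreg_gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent on J(theta) = (1/2m) * sum((X theta - y)^2)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m   # (1/m) * sum (h(x) - y) * x
        theta -= alpha * grad
    return theta

def linreg_normal_equation(X, y):
    """Closed-form solution theta = (X^T X)^(-1) X^T y (via a linear solve)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Example usage: fit y = 1 + 2x with an intercept column
X = np.c_[np.ones(5), np.arange(5.0)]
y = 1.0 + 2.0 * np.arange(5.0)
print(linreg_gradient_descent(X, y))   # approx [1, 2]
print(linreg_normal_equation(X, y))    # exactly [1, 2]
```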

Naïve Bayes (Classification). Model: $P(Y \mid X_1, \ldots, X_n) \propto P(Y)\prod_j P(X_j \mid Y)$. Cost function: Maximum likelihood estimation: $\hat{\theta} = \arg\max_\theta \log P(\text{data} \mid \theta)$; Maximum a posteriori estimation: $\hat{\theta} = \arg\max_\theta \big[\log P(\text{data} \mid \theta) + \log P(\theta)\big]$. Learning: (Discrete $X_j$) estimate $P(X_j \mid Y)$ from counts; (Continuous $X_j$) estimate the per-class mean $\mu_{jy}$ and variance $\sigma_{jy}^2$ of each feature. Inference: $\hat{y} = \arg\max_y P(Y = y)\prod_j P(X_j \mid Y = y)$.

Logistic regression (Classification). Model: $h_\theta(x) = \frac{1}{1 + e^{-\theta^\top x}}$. Cost function: $J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\big[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log\big(1 - h_\theta(x^{(i)})\big)\big]$. Learning: Gradient descent: Repeat { $\theta_j \leftarrow \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)}$ }. Inference: predict $y = 1$ if $h_\theta(x) \geq 0.5$, else $y = 0$.

Logistic Regression: Hypothesis representation; Cost function; Logistic regression with gradient descent; Regularization; Multi-class classification.

How about MAP? Maximum conditional likelihood estimate (MCLE): $\hat{\theta} = \arg\max_\theta \prod_{i=1}^{m} P\big(y^{(i)} \mid x^{(i)}, \theta\big)$. Maximum conditional a posteriori estimate (MCAP): $\hat{\theta} = \arg\max_\theta P(\theta)\prod_{i=1}^{m} P\big(y^{(i)} \mid x^{(i)}, \theta\big)$.

Prior $P(\theta)$. Common choice of $P(\theta)$: Normal distribution, zero mean, identity covariance. "Pushes" parameters towards zero. Corresponds to $L_2$ regularization. Helps avoid very large weights and overfitting. Slide credit: Tom Mitchell

MLE vs. MAP. Maximum conditional likelihood estimate (MCLE): $\theta_j \leftarrow \theta_j + \alpha \sum_{i}\big(y^{(i)} - h_\theta(x^{(i)})\big)\, x_j^{(i)}$. Maximum conditional a posteriori estimate (MCAP): $\theta_j \leftarrow \theta_j - \alpha\lambda\theta_j + \alpha \sum_{i}\big(y^{(i)} - h_\theta(x^{(i)})\big)\, x_j^{(i)}$.
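
Written out as objectives (a sketch assuming the zero-mean Gaussian prior from the previous slide, $P(\theta) \propto e^{-\frac{\lambda}{2}\lVert\theta\rVert_2^2}$, and dropping constants that do not depend on $\theta$), MCAP is just MCLE plus an $L_2$ penalty:

```latex
\theta_{\mathrm{MCLE}} = \arg\max_{\theta}\; \sum_{i=1}^{m} \log P\big(y^{(i)} \mid x^{(i)}, \theta\big)

\theta_{\mathrm{MCAP}} = \arg\max_{\theta}\; \Big[\sum_{i=1}^{m} \log P\big(y^{(i)} \mid x^{(i)}, \theta\big) + \log P(\theta)\Big]
                       = \arg\max_{\theta}\; \Big[\sum_{i=1}^{m} \log P\big(y^{(i)} \mid x^{(i)}, \theta\big) - \frac{\lambda}{2}\lVert\theta\rVert_2^2\Big]
```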

Logistic Regression: Hypothesis representation; Cost function; Logistic regression with gradient descent; Regularization; Multi-class classification.

Multi-class classification. Email foldering/tagging: Work, Friends, Family, Hobby. Medical diagrams: Not ill, Cold, Flu. Weather: Sunny, Cloudy, Rain, Snow. Slide credit: Andrew Ng

Binary classification vs. multiclass classification. [Scatter plots: two-class data vs. three-class data.]

One-vs-all (one-vs-rest). Turn the three-class problem into three binary problems, Class 1 vs. rest, Class 2 vs. rest, and Class 3 vs. rest, each with its own classifier $h_\theta^{(i)}(x)$. Slide credit: Andrew Ng

One-vs-all. Train a logistic regression classifier $h_\theta^{(i)}(x)$ for each class $i$ to predict the probability that $y = i$. Given a new input $x$, pick the class $i$ that maximizes $h_\theta^{(i)}(x)$. Slide credit: Andrew Ng
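
A small NumPy sketch of the one-vs-all scheme, training each binary classifier with plain gradient descent (all names, hyperparameters, and the toy data are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, alpha=0.1, iters=2000):
    """Binary logistic regression by batch gradient descent (y in {0, 1})."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / m
        theta -= alpha * grad
    return theta

def one_vs_all_train(X, y, num_classes):
    """Train one classifier per class i to predict P(y = i | x)."""
    return np.array([train_logistic(X, (y == i).astype(float))
                     for i in range(num_classes)])

def one_vs_all_predict(Theta, x):
    """Pick the class whose classifier assigns the highest probability."""
    return int(np.argmax(sigmoid(Theta @ x)))

# Example usage: three well-separated 2-D clusters (intercept column added)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(20, 2)) for c in ([0, 0], [3, 0], [0, 3])])
y = np.repeat([0, 1, 2], 20)
Xb = np.c_[np.ones(len(X)), X]
Theta = one_vs_all_train(Xb, y, 3)
print(one_vs_all_predict(Theta, np.array([1.0, 3.1, 0.1])))  # likely class 1
```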

Generative approach. Ex: Naïve Bayes. Estimate $P(y)$ and $P(x \mid y)$. Prediction: $\hat{y} = \arg\max_y P(y)\, P(x \mid y)$. Discriminative approach. Ex: Logistic regression. Estimate $P(y \mid x)$ directly (or a discriminant function: e.g., SVM). Prediction: $\hat{y} = \arg\max_y P(y \mid x)$.

Further readings. Tom M. Mitchell, "Generative and discriminative classifiers: Naïve Bayes and Logistic Regression," http://www.cs.cmu.edu/~tom/mlbook/NBayesLogReg.pdf. Andrew Ng and Michael Jordan, "On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes," http://papers.nips.cc/paper/2020-on-discriminative-vs-generative-classifiers-a-comparison-of-logistic-regression-and-naive-bayes.pdf

Regularization: Overfitting; Cost function; Regularized linear regression; Regularized logistic regression.

Regularization: Overfitting; Cost function; Regularized linear regression; Regularized logistic regression.

Example: Linear regression. [Three fits of price ($) in 1000's vs. size in feet^2, labeled underfitting, overfitting, and just right.] Slide credit: Andrew Ng

Overfitting. If we have too many features (i.e., a complex model), the learned hypothesis may fit the training set very well but fail to generalize to new examples (e.g., fail to predict prices on new examples). Slide credit: Andrew Ng

Example: Linear regression. [The same three fits of price ($) in 1000's vs. size in feet^2: underfitting (high bias), overfitting (high variance), and just right.] Slide credit: Andrew Ng

Bias-Variance Tradeoff. Bias: difference between what you expect to learn and the truth. Measures how well you expect to represent the true solution. Decreases with a more complex model. Variance: difference between what you expect to learn and what you learn from a particular dataset. Measures how sensitive the learner is to a specific dataset. Increases with a more complex model.

[Dartboard illustration: the four combinations of low/high bias with low/high variance.]

Bias–variance decomposition. Training set $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, with $y = f(x) + \varepsilon$, $\mathbb{E}[\varepsilon] = 0$, $\operatorname{Var}(\varepsilon) = \sigma^2$. We want $\hat{f}(x; D)$ that minimizes $\mathbb{E}\big[(y - \hat{f}(x; D))^2\big]$. https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff
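
Expanding that expectation (following the notation of the linked Wikipedia article; expectations are over draws of the training set $D$ and the noise $\varepsilon$):

```latex
\mathbb{E}\big[(y - \hat{f}(x; D))^2\big]
  = \underbrace{\big(f(x) - \mathbb{E}_D[\hat{f}(x; D)]\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}_D\big[(\hat{f}(x; D) - \mathbb{E}_D[\hat{f}(x; D)])^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```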

Overfitting. [Three logistic regression decision boundaries on tumor size vs. age data, ranging from underfitting to overfitting.] Slide credit: Andrew Ng

Addressing overfitting. Features: size of house, no. of bedrooms, no. of floors, age of house, average income in neighborhood, kitchen size. [Plot: price ($) in 1000's vs. size in feet^2 with an overfit curve.] Slide credit: Andrew Ng

Addressing overfitting. 1. Reduce the number of features: manually select which features to keep; model selection algorithm (later in course). 2. Regularization: keep all the features, but reduce the magnitude/values of the parameters $\theta_j$; works well when we have a lot of features, each of which contributes a bit to predicting $y$. Slide credit: Andrew Ng

Overfitting Thriller https://www.youtube.com/watch?v=DQWI1kvmwRg

Regularization: Overfitting; Cost function; Regularized linear regression; Regularized logistic regression.

Intuition. Suppose we penalize $\theta_3$ and $\theta_4$ and make them really small, e.g., minimize $\frac{1}{2m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2$. [Plots of price ($) in 1000's vs. size in feet^2: the quartic fit vs. the near-quadratic fit obtained once $\theta_3, \theta_4 \approx 0$.] Slide credit: Andrew Ng

Regularization. Small values for parameters $\theta_0, \theta_1, \ldots, \theta_n$: "simpler" hypothesis; less prone to overfitting. Housing: Features: $x_1, x_2, \ldots, x_{100}$; Parameters: $\theta_0, \theta_1, \ldots, \theta_{100}$. Slide credit: Andrew Ng

Regularization. $J(\theta) = \frac{1}{2m}\Big[\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\Big]$. $\lambda$: Regularization parameter. [Plot: price ($) in 1000's vs. size in feet^2.] Slide credit: Andrew Ng
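
A direct NumPy transcription of this cost (a sketch; the function name is illustrative, and $\theta_0$ is excluded from the penalty as in the formula above):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) = (1/2m) [ sum (X theta - y)^2 + lambda * sum_{j>=1} theta_j^2 ]."""
    m = len(y)
    residuals = X @ theta - y
    penalty = lam * np.sum(theta[1:] ** 2)   # theta_0 (intercept) is not penalized
    return (residuals @ residuals + penalty) / (2 * m)
```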

Question: What if $\lambda$ is set to an extremely large value (say $\lambda = 10^{10}$)? 1) Algorithm works fine; setting $\lambda$ to be very large can't hurt it. 2) Algorithm fails to eliminate overfitting. 3) Algorithm results in underfitting (fails to fit even the training data well). 4) Gradient descent will fail to converge. Slide credit: Andrew Ng

Question: What if $\lambda$ is set to an extremely large value (say $\lambda = 10^{10}$)? The penalty drives $\theta_1, \ldots, \theta_n \approx 0$, leaving $h_\theta(x) \approx \theta_0$: a flat line that underfits. [Plot: price ($) in 1000's vs. size in feet^2 with a horizontal fit.] Slide credit: Andrew Ng

Regularization: Overfitting; Cost function; Regularized linear regression; Regularized logistic regression.

Regularized linear regression. $J(\theta) = \frac{1}{2m}\Big[\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\Big]$. $n$: Number of features; $\theta_0$ is not penalized. Slide credit: Andrew Ng

Gradient descent (Previously). Repeat { $\theta_j \leftarrow \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)}$ (for $j = 0, 1, \ldots, n$) }. Slide credit: Andrew Ng

Gradient descent (Regularized). Repeat { $\theta_0 \leftarrow \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_0^{(i)}$; $\theta_j \leftarrow \theta_j - \alpha\Big[\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)} + \frac{\lambda}{m}\theta_j\Big]$ (for $j = 1, \ldots, n$) }. Slide credit: Andrew Ng
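
The same regularized update in NumPy (a sketch; names and hyperparameters are illustrative). The intercept $\theta_0$ is updated without the penalty, and the other coordinates are first shrunk by $(1 - \alpha\lambda/m)$:

```python
import numpy as np

def ridge_gradient_descent(X, y, lam, alpha=0.1, iters=1000):
    """Gradient descent for regularized linear regression (theta_0 unpenalized)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m          # unregularized gradient
        theta[0] -= alpha * grad[0]               # intercept: no shrinkage
        theta[1:] = theta[1:] * (1 - alpha * lam / m) - alpha * grad[1:]
    return theta
```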

Comparison. Regularized linear regression: $\theta_j \leftarrow \theta_j\big(1 - \alpha\frac{\lambda}{m}\big) - \alpha\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)}$. Un-regularized linear regression: $\theta_j \leftarrow \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)}$. $\big(1 - \alpha\frac{\lambda}{m}\big)$: Weight decay.

Normal equation. $\theta = \Big(X^\top X + \lambda \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}\Big)^{-1} X^\top y$ (the top-left 0 leaves $\theta_0$ unpenalized). Slide credit: Andrew Ng
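
The closed-form solution in NumPy (a sketch; the function name is illustrative). Here L is the identity with its (0, 0) entry zeroed so that the intercept is not penalized:

```python
import numpy as np

def ridge_normal_equation(X, y, lam):
    """theta = (X^T X + lambda * L)^(-1) X^T y, with L = identity, L[0, 0] = 0."""
    n = X.shape[1]
    L = np.eye(n)
    L[0, 0] = 0.0            # do not penalize the intercept term theta_0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```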

Regularization: Overfitting; Cost function; Regularized linear regression; Regularized logistic regression.

Regularized logistic regression. Cost function: $J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\big[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log\big(1 - h_\theta(x^{(i)})\big)\big] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$. [Plot: decision boundary on tumor size vs. age data.] Slide credit: Andrew Ng

Gradient descent (Regularized). Repeat { $\theta_0 \leftarrow \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_0^{(i)}$; $\theta_j \leftarrow \theta_j - \alpha\Big[\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)} + \frac{\lambda}{m}\theta_j\Big]$ (for $j = 1, \ldots, n$) }, now with $h_\theta(x) = \frac{1}{1 + e^{-\theta^\top x}}$. Slide credit: Andrew Ng
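
Putting the pieces together, a NumPy sketch of regularized logistic regression trained with the update above (illustrative names and hyperparameters):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_logistic_regression(X, y, lam, alpha=0.1, iters=2000):
    """Gradient descent on the regularized cross-entropy cost (theta_0 unpenalized)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / m
        grad[1:] += (lam / m) * theta[1:]     # add (lambda/m) * theta_j for j >= 1
        theta -= alpha * grad
    return theta
```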

$L_1$: Lasso regularization. $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 + \lambda\sum_{j=1}^{n}\lvert\theta_j\rvert$. LASSO: Least Absolute Shrinkage and Selection Operator.

Single predictor: Soft Thresholding. With a single (standardized) predictor, the lasso solution is obtained by applying the soft-thresholding operator to the least-squares estimate: $\hat{\theta} = S_\lambda\big(\hat{\theta}^{\text{LS}}\big)$, where $S_\lambda(x) = \operatorname{sign}(x)\max(\lvert x\rvert - \lambda,\, 0)$.
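
The operator itself is one line of NumPy (a sketch; the name is illustrative):

```python
import numpy as np

def soft_threshold(x, lam):
    """S_lambda(x) = sign(x) * max(|x| - lambda, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Example: values inside [-lam, lam] are set exactly to zero
print(soft_threshold(np.array([-2.0, -0.3, 0.1, 1.5]), 0.5))  # [-1.5  0.   0.   1. ]
```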

Multiple predictors: Cyclic Coordinate Descent. For each $j$, update $\theta_j \leftarrow \frac{1}{z_j} S_\lambda(\rho_j)$, where $\rho_j = \frac{1}{m}\, x_j^\top\big(y - \sum_{k \neq j} x_k \theta_k\big)$ is computed from the partial residual that leaves out predictor $j$, and $z_j = \frac{1}{m}\, x_j^\top x_j$ (equal to 1 for standardized predictors).
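
A minimal sketch of that loop in NumPy, for the objective $\frac{1}{2m}\lVert y - X\theta\rVert^2 + \lambda\lVert\theta\rVert_1$ (no intercept handling; predictors need not be standardized since each update divides by $z_j$; names are illustrative):

```python
import numpy as np

def soft_threshold(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, iters=100):
    """Cyclic coordinate descent for (1/2m)||y - X theta||^2 + lam * ||theta||_1."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        for j in range(n):
            # Partial residual: remove every predictor's contribution except x_j
            r_j = y - X @ theta + X[:, j] * theta[j]
            rho = X[:, j] @ r_j / m            # correlation of x_j with the residual
            z = X[:, j] @ X[:, j] / m          # squared norm of the j-th column
            theta[j] = soft_threshold(rho, lam) / z
    return theta
```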

L1 and L2 balls Image credit: https://web.stanford.edu/~hastie/StatLearnSparsity_files/SLS.pdf

Terminology.
Regularization function | Name | Solver
$\lVert\theta\rVert_2^2$ (Tikhonov regularization) | Ridge regression | Closed form
$\lVert\theta\rVert_1$ | LASSO regression | Proximal gradient descent, least angle regression
$\lambda_1\lVert\theta\rVert_1 + \lambda_2\lVert\theta\rVert_2^2$ | Elastic net regularization | Proximal gradient descent

Things to remember: Overfitting; Cost function; Regularized linear regression; Regularized logistic regression.