LDA (Linear Discriminant Analysis)
ShaLi
Limitation of PCA
The direction of maximum variance is not always good for classification.
Limitation of PCA
There are better directions that support classification tasks.
LDA tries to find the best direction to separate the classes, as the toy sketch below illustrates.
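To make this concrete, here is a small NumPy sketch (synthetic data, purely illustrative): two elongated class clouds whose direction of maximum variance is orthogonal to the direction that separates them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two elongated, parallel class clouds: the direction of maximum
# variance runs along the clouds, while the classes are separated
# across them.
cov = [[9.0, 0.0], [0.0, 0.25]]                      # wide in x, thin in y
X1 = rng.multivariate_normal([0.0, -1.0], cov, size=200)
X2 = rng.multivariate_normal([0.0, +1.0], cov, size=200)
X = np.vstack([X1, X2])

# PCA direction: top eigenvector of the total covariance (~ [1, 0]).
w_pca = np.linalg.eigh(np.cov(X.T))[1][:, -1]

# A discriminative direction: the line joining the class means (~ [0, 1]).
w_sep = X2.mean(axis=0) - X1.mean(axis=0)
w_sep /= np.linalg.norm(w_sep)

print("PCA direction:             ", np.round(w_pca, 2))
print("Class-separating direction:", np.round(w_sep, 2))
```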
Idea of LDA
Find the w that maximizes the separation between the projected class means and minimizes the scatter within each projected class.
Limitations of LDA
If the class distributions are significantly non-Gaussian, the LDA projection may not preserve the complex structure in the data needed for classification.
LDA will also fail if the discriminatory information lies not in the means but in the variances of the data.
LDA for two classes and k=1
Compute the class means: $m_i = \frac{1}{n_i} \sum_{x \in C_i} x$ for $i = 1, 2$.
Projected class means: $\tilde m_i = w^T m_i$.
Difference between the projected class means: $\tilde m_2 - \tilde m_1 = w^T (m_2 - m_1)$.
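A minimal NumPy sketch of these quantities, using hypothetical arrays X1 and X2 for the two classes and an arbitrary candidate direction w:

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(0.0, 1.0, size=(50, 3))    # hypothetical class-1 samples, n1 x d
X2 = rng.normal(2.0, 1.0, size=(60, 3))    # hypothetical class-2 samples, n2 x d
w = rng.normal(size=3)                     # some candidate projection direction

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)  # class means in R^d
mt1, mt2 = w @ m1, w @ m2                  # projected class means (scalars)
diff = mt2 - mt1                           # equals w^T (m2 - m1)
```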
LDA for two classes and k=1
Scatter of the projected data in the 1-dimensional space: $\tilde s_i^2 = \sum_{x \in C_i} (w^T x - \tilde m_i)^2$.
Objective function
Find the w that maximizes the distance between the projected class means and minimizes the scatter of each projected class. LDA does this by maximizing the Fisher criterion:
$r(w) = \dfrac{(\tilde m_2 - \tilde m_1)^2}{\tilde s_1^2 + \tilde s_2^2}$
Objective function: numerator
We can rewrite the numerator as
$(\tilde m_2 - \tilde m_1)^2 = \big(w^T (m_2 - m_1)\big)^2 = w^T (m_2 - m_1)(m_2 - m_1)^T w = w^T S_B w$
where $S_B = (m_2 - m_1)(m_2 - m_1)^T$ is the between-class scatter matrix.
Objective function: denominator
We can rewrite the denominator as
$\tilde s_1^2 + \tilde s_2^2 = w^T S_W w$
where $S_W = \sum_{i=1,2} \sum_{x \in C_i} (x - m_i)(x - m_i)^T$ is the within-class scatter matrix.
Objective function
Putting it all together:
$r(w) = \dfrac{(\tilde m_2 - \tilde m_1)^2}{\tilde s_1^2 + \tilde s_2^2} = \dfrac{w^T S_B w}{w^T S_W w}$
where $S_B$ is the between-class scatter matrix and $S_W$ is the within-class scatter matrix.
Maximize $r(w)$ by setting its first derivative with respect to $w$ to zero; this yields $S_B w = r(w)\, S_W w$, and for two classes the solution is $w \propto S_W^{-1}(m_2 - m_1)$.
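A NumPy sketch of this closed-form two-class solution; the data are assumed to be given as arrays X1 and X2, and the function names are ours, not from the slides:

```python
import numpy as np

def lda_two_class(X1, X2):
    """Fisher direction for two classes: w ~ S_W^{-1} (m2 - m1)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: summed outer products of centered samples.
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    # Solve S_W w = (m2 - m1) rather than inverting S_W explicitly.
    w = np.linalg.solve(S_W, m2 - m1)
    return w / np.linalg.norm(w)

def fisher_ratio(w, X1, X2):
    """r(w) = (w^T S_B w) / (w^T S_W w)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    d = (m2 - m1)[:, None]
    S_B = d @ d.T                                   # between-class scatter
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    return (w @ S_B @ w) / (w @ S_W @ w)
```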
Extension to K>1
For K=1, the solution is the single direction $w \propto S_W^{-1}(m_2 - m_1)$. For K>1, the projection directions are the top-k eigenvectors of $S_W^{-1} S_B$ (for C classes, at most C-1 of the eigenvalues are nonzero).
Transform the data onto the new subspace: $Y = XW$, where $Y$ is $n \times k$, $X$ is $n \times d$, and $W$ is $d \times k$.
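A NumPy sketch of the multi-class case via the eigenvectors of $S_W^{-1} S_B$ (the helper name lda_subspace is hypothetical):

```python
import numpy as np

def lda_subspace(X, y, k):
    """Project X (n x d) with labels y onto the top-k LDA directions."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_W += (Xc - mc).T @ (Xc - mc)          # within-class scatter
        dm = (mc - mean_all)[:, None]
        S_B += len(Xc) * (dm @ dm.T)            # between-class scatter
    # Generalized eigenproblem S_B w = lambda S_W w, via S_W^{-1} S_B.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs[:, order[:k]].real              # d x k projection matrix
    return X @ W                                # Y = X W, n x k
```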
Prediction as a classifier
Classification rule: assign $x$ to Class 2 if $y(x) > 0$, else to Class 1, where $y(x) = w^T x + w_0$; one common choice places the threshold at the midpoint of the projected class means, $w_0 = -\tfrac{1}{2}\, w^T (m_1 + m_2)$.
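A NumPy sketch of this rule; the midpoint threshold w0 below is one common choice and an assumption on our part, since the slide's exact bias term is not shown:

```python
import numpy as np

def predict(X, w, m1, m2):
    """Class 2 if y(x) > 0, else Class 1."""
    # Hypothetical bias: threshold at the midpoint of the projected means.
    w0 = -0.5 * (w @ (m1 + m2))
    y = X @ w + w0
    return np.where(y > 0, 2, 1)
```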
Comparison of PCA and LDA
PCA: performs dimensionality reduction while preserving as much of the variance of the high-dimensional space as possible.
LDA: performs dimensionality reduction while preserving as much of the class-discriminatory information as possible.
PCA is the standard choice for unsupervised problems (no labels).
LDA exploits class labels to find a subspace that separates the classes as well as possible.
PCA and LDA example
Data: Springleaf customer information, 2 classes. Original dimension: d=1934; reduced dimension: k=1.
In the projection, the class variances Var1 and Var2 are large, the two classes seriously overlap, and the projected means $m_1$ and $m_2$ are close.
PCA and LDA example
Data: Iris, 3 classes. Original dimension: d=4; reduced dimension: k=2.
(Figure: the PCA projection vs. the LDA projection of the data.)
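This comparison is easy to reproduce with scikit-learn, which bundles the Iris data (a minimal sketch):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)   # 3 classes, d = 4

# PCA is unsupervised: it never sees the labels.
X_pca = PCA(n_components=2).fit_transform(X)
# LDA uses the labels y to pick its k = 2 directions.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)     # (150, 2) (150, 2)
```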
PCA and LDA example
Data: coffee bean recognition, 5 classes. Original dimension: d=60; reduced dimension: k=3.
Questions?