Slide 1: Machine Learning Overview
Tamara Berg, CS 590-133 Artificial Intelligence
Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore, Percy Liang, Luke Zettlemoyer, Rob Pless, Kilian Weinberger, Deva Ramanan
Slide 2: Announcements
HW4 is due April 3
Reminder: Midterm 2 is next Thursday
Next Tuesday's lecture topics will not be included (but the material will be on the final, so attend!)
Midterm review: Monday, 5pm in FB009
Slide 3: Midterm Topic List
Be able to define the following terms and answer basic questions about them:
Reinforcement learning
- Passive vs. active RL
- Model-based vs. model-free approaches
- Direct utility estimation
- TD learning and TD Q-learning
- Exploration vs. exploitation
- Policy search
- Applications to backgammon/Aibos/helicopters (at a high level)
Probability
- Random variables
- Axioms of probability
- Joint, marginal, conditional probability distributions
- Independence and conditional independence
- Product rule, chain rule, Bayes rule
Slide 4: Midterm Topic List
Bayesian Networks
- General structure and parameters
- Calculating joint and conditional probabilities
- Independence in Bayes nets (Bayes Ball)
Bayesian Inference
- Exact inference (inference by enumeration, variable elimination)
- Approximate inference (forward sampling, rejection sampling, likelihood weighting)
- Networks for which efficient inference is possible
Naïve Bayes
- Parameter learning, including Laplace smoothing
- Likelihood, prior, posterior
- Maximum likelihood (ML), maximum a posteriori (MAP) inference
- Application to spam/ham classification
- Application to image classification (at a high level)
Slide 5: Midterm Topic List
HMMs
- Markov property
- Markov chains
- Hidden Markov model (initial distribution, transitions, emissions)
- Filtering (forward algorithm)
Machine Learning
- Unsupervised/supervised/semi-supervised learning
- K-means clustering
- Training, tuning, testing, generalization
Slide 6: Machine learning
Image source: https://www.coursera.org/course/ml
Slide 7: Machine learning
Definition:
- Getting a computer to do well on a task without explicitly programming it
- Improving performance on a task based on experience
Slide 8: Big Data!
Slide 9: What is machine learning?
Computer programs that can learn from data
Two key components:
- Representation: how should we represent the data?
- Generalization: the system should generalize from its past experience (observed data items) to perform well on unseen data items.
Slide 10: Types of ML algorithms
- Unsupervised: algorithms operate on unlabeled examples
- Supervised: algorithms operate on labeled examples
- Semi/partially-supervised: algorithms combine both labeled and unlabeled examples
Slide 12: Clustering
The assignment of objects into groups (aka clusters) so that objects
in the same cluster are more similar to each other than objects in different clusters. Clustering is a common technique for statistical data analysis, used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics.
Slide 13: Euclidean distance, angle between data vectors, etc.
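For concreteness, here is a minimal NumPy sketch of these two similarity measures (the example vectors are made up):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # two example data vectors
y = np.array([2.0, 0.0, 4.0])

# Euclidean distance: sqrt of the sum of squared coordinate differences
euclidean = np.linalg.norm(x - y)

# Angle between vectors via cosine similarity: cos(theta) = x.y / (|x| |y|)
cos_sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
angle = np.arccos(np.clip(cos_sim, -1.0, 1.0))

print(euclidean, cos_sim, angle)
```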
Slide 15: K-means clustering
Want to minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k:
D = Σ_k Σ_{i in cluster k} ||x_i − m_k||²
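A minimal NumPy sketch of the standard Lloyd's-algorithm iteration for this objective (the function and variable names are my own, not from the slides):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal k-means (Lloyd's algorithm): alternate between assigning
    points to the nearest center and recomputing centers as cluster means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(n_iters):
        # Assignment step: nearest center under squared Euclidean distance
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):   # converged
            break
        centers = new_centers
    return centers, labels
```

Each iteration can only decrease (or keep equal) the objective D, which is why the procedure converges to a local minimum.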
Slides 16–29: (figures illustrating the k-means iterations) Source: Hinrich Schutze
Slide 30: Hierarchical clustering strategies
Agglomerative clustering (a minimal code sketch follows below)
- Start with each data point in a separate cluster
- At each iteration, merge two of the "closest" clusters
Divisive clustering
- Start with all data points grouped into a single cluster
- At each iteration, split the "largest" cluster
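A small SciPy sketch of agglomerative clustering, assuming SciPy is available (the toy data and parameter choices are mine):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(20, 2)        # toy 2-D data points

# Agglomerative clustering: 'single' linkage merges, at each step, the two
# clusters whose closest members are nearest (one notion of "closest")
Z = linkage(X, method='single')

# Cut the resulting hierarchy to obtain, e.g., 3 flat clusters
labels = fcluster(Z, t=3, criterion='maxclust')
```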
Slide31P
Produces a hierarchy of
clusterings
P
P
P
Slide32P
Slide 33: Divisive Clustering
Top-down (instead of bottom-up, as in agglomerative clustering)
- Start with all data points in one big cluster
- Then recursively split clusters
- Eventually each data point forms a cluster on its own
Slide 34: Flat or hierarchical clustering?
- For high efficiency, use flat clustering (e.g., k-means)
- For deterministic results: hierarchical clustering
- When a hierarchical structure is desired: hierarchical algorithm
- Hierarchical clustering can also be applied if K cannot be predetermined (can start without knowing K)
Source: Hinrich Schutze
Slide 35: Clustering in Action – an example from computer vision
Slide 36: Recall: Bag of Words Representation
Represent a document as a "bag of words"
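For instance, a bag-of-words representation just counts word occurrences and discards word order; a minimal Python sketch (the example sentence is made up):

```python
from collections import Counter

doc = "the quick brown fox jumps over the lazy dog the fox"
bag = Counter(doc.split())   # word -> count, word order discarded
print(bag)                   # e.g. Counter({'the': 3, 'fox': 2, ...})
```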
Slide 37: Bag-of-features models
Slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba
Slide 38: Bags of features for image classification
Extract features
Slide 39: Bags of features for image classification
Extract features
Learn "visual vocabulary"
Slide 40: Bags of features for image classification
Extract features
Learn "visual vocabulary"
Represent images by frequencies of "visual words"
Slide 41: 1. Feature extraction (figure)
Slide 42: 2. Learning the visual vocabulary
Slide 43: 2. Learning the visual vocabulary (clustering)
Slide 44: 2. Learning the visual vocabulary (clustering; the cluster centers form the visual vocabulary)
Slide 45: Example visual vocabulary (Fei-Fei et al. 2005)
Slide 46: 3. Image representation (histogram: frequency vs. visual words)
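Putting steps 2 and 3 together, here is a hedged end-to-end sketch, with random arrays standing in for real local descriptors (all names and parameter values below are illustrative assumptions, not the method from the slides):

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq

# Assume each image yields a set of local feature descriptors
# (e.g., 128-D SIFT vectors); here we fake them with random data.
descriptors_per_image = [np.random.rand(200, 128) for _ in range(10)]

# Step 2: learn the visual vocabulary by clustering all descriptors
all_desc = np.vstack(descriptors_per_image)
vocab, _ = kmeans2(all_desc, k=50, minit='points')   # 50 "visual words"

# Step 3: represent each image as a histogram of visual word frequencies
def bof_histogram(desc, vocab):
    words, _ = vq(desc, vocab)   # nearest visual word for each descriptor
    hist, _ = np.histogram(words, bins=np.arange(len(vocab) + 1))
    return hist / hist.sum()     # normalized frequencies

histograms = [bof_histogram(d, vocab) for d in descriptors_per_image]
```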
Slide 47: Types of ML algorithms
- Unsupervised: algorithms operate on unlabeled examples
- Supervised: algorithms operate on labeled examples
- Semi/partially-supervised: algorithms combine both labeled and unlabeled examples
Slide 52: Example: Sentiment analysis
http://gigaom.com/2013/10/03/stanford-researchers-to-open-source-model-they-say-has-nailed-sentiment-analysis/
http://nlp.stanford.edu:8080/sentiment/rntnDemo.html
Slide 53: Example: Image classification
(figure: example input images with their desired output labels: apple, pear, tomato, cow, dog, horse)
Slide 54: http://yann.lecun.com/exdb/mnist/index.html
Slide 55: Example: Seismic data
(figure: body wave magnitude vs. surface wave magnitude for nuclear explosions and earthquakes)
Slide 57: The basic classification framework
y = f(x)    (input x; classification function f; output y)
Learning: given a training set of labeled examples {(x_1, y_1), …, (x_N, y_N)}, estimate the parameters of the prediction function f
Inference: apply f to a never-before-seen test example x and output the predicted value y = f(x)
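As a concrete illustration of learning and inference, a minimal scikit-learn sketch on toy data (the library, classifier choice, and data are mine, not from the slides):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Toy labeled examples {(x_1, y_1), ..., (x_N, y_N)}
X = np.random.rand(100, 2)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

f = GaussianNB()
f.fit(X_train, y_train)        # learning: estimate the parameters of f
y_pred = f.predict(X_test)     # inference: y = f(x) on unseen examples
print("test accuracy:", (y_pred == y_test).mean())
```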
Slide 58: Naïve Bayes classifier
P(y | x) ∝ P(y) Π_j P(x_j | y), where x_j is a single dimension or attribute of x
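A minimal count-based sketch of Naïve Bayes training with Laplace smoothing and MAP prediction, tying back to the midterm topics (all function names and the toy data here are my own):

```python
import numpy as np
from collections import defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Estimate P(y) and P(word | y) with Laplace (add-alpha) smoothing."""
    vocab = {w for d in docs for w in d}
    classes = set(labels)
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: defaultdict(int) for c in classes}   # word counts per class
    totals = {c: 0 for c in classes}                  # total words per class
    for d, c in zip(docs, labels):
        for w in d:
            counts[c][w] += 1
            totals[c] += 1
    def cond(w, c):   # P(w | c) with add-alpha smoothing
        return (counts[c][w] + alpha) / (totals[c] + alpha * len(vocab))
    return prior, cond, vocab

def predict(doc, prior, cond, vocab):
    """MAP prediction: argmax_c  log P(c) + sum_j log P(x_j | c)."""
    scores = {c: np.log(p) + sum(np.log(cond(w, c)) for w in doc if w in vocab)
              for c, p in prior.items()}
    return max(scores, key=scores.get)

docs = [["win", "money", "now"], ["meeting", "at", "noon"]]
labels = ["spam", "ham"]
prior, cond, vocab = train_nb(docs, labels)
print(predict(["win", "noon"], prior, cond, vocab))
```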
Slide 59: Example: Image classification
Input: image representation
Classifier (e.g., Naïve Bayes, neural net, etc.)
Output: predicted label (e.g., "car")
Slide 60: Example: Training and testing
Key challenge: generalization to unseen examples
Training set (labels known)
Test set (labels unknown)
Slide 62: Some classification methods
10^6 examples
Nearest neighbor
Shakhnarovich, Viola, Darrell 2003
Berg, Berg, Malik 2005
…
Neural networks
LeCun, Bottou, Bengio, Haffner 1998
Rowley, Baluja, Kanade 1998
…
Support Vector Machines and Kernels
Guyon, Vapnik
Heisele, Serre, Poggio 2001
…
Conditional Random Fields
McCallum, Freitag, Pereira 2000
Kumar, Hebert 2003
…
Slide 63: Classification … more soon
Slide 64: Types of ML algorithms
- Unsupervised: algorithms operate on unlabeled examples
- Supervised: algorithms operate on labeled examples
- Semi/partially-supervised: algorithms combine both labeled and unlabeled examples
Slide 65: Supervised learning has many successes
- recognize speech
- steer a car
- classify documents
- classify proteins
- recognize faces and objects in images
- ...
Slide credit: Avrim Blum
Slide 66: However, for many problems, labeled data can be rare or expensive: you need to pay someone to label it, and it may require special testing, … Unlabeled data is much cheaper.
Slides 67–69: Examples where unlabeled data is plentiful: speech, images, medical outcomes, customer modeling, protein sequences, web pages. [From Jerry Zhu]
Can we make use of cheap unlabeled data?
Slide credit: Avrim Blum
Slide 70: Semi-Supervised Learning
Can we use unlabeled data to augment a small labeled sample to improve learning?
But unlabeled data is missing the most important info!
But maybe it still has useful regularities that we can use…
Slide credit: Avrim Blum
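One standard way to exploit those regularities is self-training; here is a hedged scikit-learn sketch (this is just one common semi-supervised approach, not necessarily the one the slides go on to cover, and all names here are my own):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_training(X_lab, y_lab, X_unlab, threshold=0.95, max_rounds=10):
    """Repeatedly train on the labeled pool, then move confidently
    predicted unlabeled points into it with their predicted labels."""
    X_lab, y_lab, X_unlab = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(max_rounds):
        clf = LogisticRegression().fit(X_lab, y_lab)
        if len(X_unlab) == 0:
            break
        probs = clf.predict_proba(X_unlab)
        confident = probs.max(axis=1) >= threshold   # high-confidence points
        if not confident.any():
            break
        # Add confident points to the labeled pool with predicted labels
        y_new = clf.classes_[probs[confident].argmax(axis=1)]
        X_lab = np.vstack([X_lab, X_unlab[confident]])
        y_lab = np.concatenate([y_lab, y_new])
        X_unlab = X_unlab[~confident]
    return clf
```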