Presentation Transcript

Slide1

Announcements

CS Ice Cream Social

9/5, 3:30-4:30, ECCR 265

includes poster session, student group presentations

Slide2

Concept Learning

Examples

Word meanings

Edible foods

Abstract structures (e.g., irony)

[Figure: four example objects, labeled "glorch," "glorch," "not glorch," "not glorch"]

Slide3

Supervised Approach To Concept Learning

Both positive and negative examples provided

Typical models (both in ML and Cog Sci) circa 2000 required both positive and negative examples

Slide4

Contrast With Human Learning Abilities

Learning from positive examples only

Learning from a small number of examples

E.g., word meanings

E.g., learning appropriate social behavior

E.g., instruction on some skill

What would it mean to learn from a small number of positive examples?

[Figure: three positive examples, each marked "+"]

Slide5

Tenenbaum (1999)

Two-dimensional continuous feature space

Concepts defined by axis-parallel rectangles

e.g., feature dimensions: cholesterol level, insulin level

e.g., concept: healthy

Slide6

Learning Problem

Given a set of n examples, X = {x1, x2, x3, …, xn}, which are instances of the concept, will some unknown example y also be an instance of the concept?

The problem of generalization

Slide7

Hypothesis (Model) Space

H: all rectangles on the plane, parameterized by (l1, l2, s1, s2)

h: one particular hypothesis

Note: |H| = ∞

Consider all hypotheses in parallel, in contrast to the non-Bayesian approach of maintaining only the best hypothesis at any point in time.

Slide8

Prediction Via Model Averaging

Will some unknown input y be in the concept, given examples X = {x1, x2, x3, …, xn}?

Q: y is a positive example of the concept (T, F)

P(Q | X) = ∫ P(Q & h | X) dh   [marginalization]

P(Q & h | X) = P(Q | h, X) p(h | X)   [chain rule]

P(Q | h, X) = P(Q | h) = 1 if y is in h   [conditional independence and deterministic concepts]

p(h | X) ∝ P(X | h) p(h)   [Bayes rule: likelihood × prior]
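A minimal numerical sketch of this model averaging, assuming a hypothetical grid discretization of the rectangle space and a uniform prior (the slide's integral ranges over all rectangles):

```python
import itertools
import numpy as np

# Discretized hypothesis space: axis-parallel rectangles whose edges lie on a
# grid (hypothetical discretization; the true integral is over all rectangles).
GRID = np.linspace(0, 10, 21)
X = [(4.0, 4.5), (5.0, 6.0), (4.5, 5.5)]   # made-up positive examples

def contains(h, pt):
    """True if point pt lies inside rectangle h = (x1, x2, y1, y2)."""
    x1, x2, y1, y2 = h
    return x1 <= pt[0] <= x2 and y1 <= pt[1] <= y2

def likelihood(h, data):
    """Size principle: p(X | h) = 1/|h|^n if every example falls in h, else 0."""
    x1, x2, y1, y2 = h
    if not all(contains(h, x) for x in data):
        return 0.0
    return ((x2 - x1) * (y2 - y1)) ** (-len(data))

def p_in_concept(y, data):
    """P(Q | X): sum over hypotheses of P(Q | h) p(h | X), with uniform prior."""
    hyps = [(x1, x2, y1, y2)
            for x1, x2 in itertools.combinations(GRID, 2)
            for y1, y2 in itertools.combinations(GRID, 2)]
    weights = np.array([likelihood(h, data) for h in hyps])  # uniform p(h)
    weights /= weights.sum()                                 # posterior p(h | X)
    return sum(w for h, w in zip(hyps, weights) if contains(h, y))

print(p_in_concept((4.8, 5.2), X))   # near the examples: probability near 1
print(p_in_concept((9.5, 0.5), X))   # far from the examples: probability near 0
```

No single "best" rectangle is ever selected; every consistent hypothesis contributes to the prediction in proportion to its posterior.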

Slide9

Priors and Likelihood Functions

Priors, p(h)

Location-invariant, uninformative prior (depends only on the area of the rectangle)

Expected-size prior

Likelihood function, p(X | h)

X = set of n examples

Size principle
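Stated as a formula (the slide showed this as an image; this is the standard size principle from Tenenbaum, 1999):

```latex
p(X \mid h) =
\begin{cases}
1 / |h|^{n} & \text{if } x_1, \dots, x_n \in h \\
0 & \text{otherwise}
\end{cases}
```

where |h| is the area of rectangle h: among consistent hypotheses, smaller ones receive exponentially greater likelihood as n grows.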

Slide10

Expected size prior
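The slide's body was a figure; as a reference point, the expected-size prior in Tenenbaum (1999) takes an Erlang form along each dimension (recalled from the paper, not from the slide; σi is the expected size along dimension i):

```latex
p(h) \propto s_1 e^{-s_1/\sigma_1} \, s_2 e^{-s_2/\sigma_2}
```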

Slide11

Generalization Gradients

MIN: smallest hypothesis consistent with the data

Weak Bayes: instead of using the size principle, assumes examples are produced by a process independent of the true class

Dark line = 50% probability of generalization

Slide12

Experimental Design

Subjects shown n dots on screen that are "randomly chosen examples from some rectangle of healthy levels"

n drawn from {2, 3, 4, 6, 10, 50}

Dots varied in horizontal and vertical range r, drawn from {.25, .5, 1, 2, 4, 8} units in a 24-unit window

Task: draw the 'true' rectangle around the dots

Slide13

Experimental Results

Slide14

Number Game

Experimenter picks integer arithmetic concept C

E.g., prime number

E.g., number between 10 and 20

E.g., multiple of 5

Experimenter presents positive examples drawn at random from C, say, in range [1, 100]

Participant asked whether some new test case belongs in C

Slide15

Empirical Predictive Distributions

Slide16

Hypothesis Space

Even numbers

Odd numbers

Squares

Multiples of n

Ends in n

Powers of n

All numbers

Intervals [n, m] for n > 0, m < 101

Powers of 2, plus 37

Powers of 2, except 32

Slide17

Observation = 16

Likelihood function: size principle

Prior: intuition

Slide18

Observations = 16, 8, 2, 64

Likelihood function: size principle

Prior: intuition
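A minimal sketch of the number game under the size principle, using a hypothetical subset of the hypothesis space above and an assumed uniform prior (the slides weight priors by intuition instead):

```python
# Number game: size-principle posteriors over a small hypothesis space.
# Hypotheses are sets of integers in [1, 100].
RANGE = range(1, 101)

hypotheses = {
    "even":             {n for n in RANGE if n % 2 == 0},
    "odd":              {n for n in RANGE if n % 2 == 1},
    "squares":          {n * n for n in range(1, 11)},
    "powers of 2":      {2 ** k for k in range(1, 7)},
    "multiples of 4":   {n for n in RANGE if n % 4 == 0},
    "all numbers":      set(RANGE),
    "powers of 2 + 37": {2 ** k for k in range(1, 7)} | {37},
}

def posterior(data):
    """p(h | X) ∝ p(X | h) p(h), with p(X | h) = 1/|h|^n (size principle)."""
    scores = {}
    for name, h in hypotheses.items():
        consistent = all(x in h for x in data)
        scores[name] = len(h) ** (-len(data)) if consistent else 0.0
    z = sum(scores.values())
    return {name: s / z for name, s in scores.items()}

for data in ([16], [16, 8, 2, 64]):
    print(data)
    for name, p in sorted(posterior(data).items(), key=lambda kv: -kv[1]):
        if p > 0:
            print(f"  {name:18s} {p:.3f}")
```

With 16 alone, several hypotheses remain plausible; after 16, 8, 2, 64, the size principle concentrates nearly all posterior mass on powers of 2.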

Slide19

Posterior Distribution After Observing 16

Slide20

Model Vs. Human Data

[Figure: MODEL predictions vs. HUMAN DATA]

Slide21

Summary of Tenenbaum (1999)

Method

Pick prior distribution (includes hypothesis space)

Pick likelihood function (size principle)

This leads to predictions for generalization as a function of r (range) and n (number of examples)

Claims

People generalize optimally given assumptions about priors and likelihood

The Bayesian approach provides the best description of how people generalize on the rectangle task

Explains how people can learn from a small number of examples, and from only positive examples

Slide22

Important Ideas in Bayesian Models

Generative models

Likelihood function

Consideration of multiple models in parallel

Potentially infinite model space

Inference

prediction via model averaging

role of priors diminishes with amount of evidence

Learning

trade-off between model simplicity and fit to data → Bayesian Occam's Razor

Slide23

Ockham's Razor

If two hypotheses are equally consistent with the data, prefer the simpler one.

Simplicity

can accommodate fewer observations

smoother

fewer parameters

restricts predictions more ("sharper" predictions)

Examples

1st vs. 4th order polynomial

small rectangle vs. large rectangle in the Tenenbaum model

[Images: Ockham, medieval philosopher and monk; a razor, tool for cutting (metaphorical)]

Slide24

Motivating Ockham's Razor

Aesthetic considerations

A theory with mathematical beauty is more likely to be right (or believed) than an ugly one, given that both fit the same data.

Past empirical success of the principle

Coherent inference, as embodied by Bayesian reasoning, automatically incorporates Ockham's razor

Two theories H1 and H2 can be compared via their PRIORS and via their LIKELIHOODS

Slide25

Ockham's Razor with Priors

Jeffreys (1939) probability text: more complex hypotheses should have lower priors

Requires a numerical rule for assessing complexity

e.g., number of free parameters

e.g., Vapnik-Chervonenkis (VC) dimension

Slide26

Subjective vs. Objective Priors

subjective or informative prior

specific, definite information about a random variable

objective or uninformative prior

vague, general information

Philosophical arguments for certain priors as uninformative

Maximum entropy / least commitment

e.g., interval [a, b]: uniform

e.g., interval [0, ∞) with mean 1/λ: exponential distribution

e.g., mean μ and std deviation σ: Gaussian

Independence of measurement scale

e.g., Jeffreys prior 1/(θ(1−θ)) for θ in [0, 1] expresses the same belief whether we talk about θ or log θ
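One way to make the invariance concrete (a sketch; under the log-odds reparameterization shown here, this prior becomes exactly uniform):

```latex
\varphi = \log\frac{\theta}{1-\theta}, \qquad
\frac{d\varphi}{d\theta} = \frac{1}{\theta(1-\theta)}, \qquad
p(\varphi) = p(\theta)\left|\frac{d\theta}{d\varphi}\right|
           \propto \frac{1}{\theta(1-\theta)} \cdot \theta(1-\theta) = 1
```

The Jacobian cancels the density, so the prior carries the same information in either parameterization.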

Slide27

Ockham’s Razor Via Likelihoods

Coin flipping example

H1: coin has two heads

H2: coin has a head and a tail

Consider 5 flips producing HHHHH

H1 could produce only this sequence

H2 could produce HHHHH, but also HHHHT, HHHTH, …, TTTTT

P(HHHHH | H1) = 1, P(HHHHH | H2) = 1/32

H2 pays the price of a lower likelihood via the fact that it can accommodate a greater range of observations

H1 is more readily rejected by observations
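A minimal sketch of this likelihood bookkeeping, enumerating H2's 2^5 equally likely sequences:

```python
from itertools import product

flips = "HHHHH"                       # the observed 5 flips

# H1 (two-headed coin) can produce only HHHHH.
p_h1 = 1.0 if flips == "H" * 5 else 0.0

# H2 (head and tail): all 2^5 sequences are possible and equally likely.
seqs = ["".join(s) for s in product("HT", repeat=5)]
p_h2 = seqs.count(flips) / len(seqs)

print(p_h1, p_h2, p_h1 / p_h2)        # 1.0 0.03125 32.0
```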

Slide28

Simple and Complex Hypotheses

[Figure: likelihood functions for a complex hypothesis H2 and a simple hypothesis H1]

Slide29

Bayes Factor

BIC is an approximation to the Bayes factor

A.k.a. likelihood ratio
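The formula itself appeared as an image; the standard definition, for reference:

```latex
\text{Bayes factor} = \frac{P(D \mid H_1)}{P(D \mid H_2)},
\qquad
P(D \mid H_i) = \int P(D \mid \theta, H_i)\, p(\theta \mid H_i)\, d\theta
```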

Slide30

Hypothesis Classes Varying in Complexity

E.g., 1st, 2nd, and 3rd order polynomials

Hypothesis class is parameterized by weight vector w
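A minimal sketch of comparing such hypothesis classes with BIC, the Bayes-factor approximation from the previous slide (data and noise level are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
y = 1.5 * x - 0.5 + rng.normal(0, 0.1, x.size)   # truth: a 1st-order polynomial

for order in (1, 2, 3):
    w = np.polyfit(x, y, order)                   # maximum-likelihood weights
    resid = y - np.polyval(w, x)
    n, k = x.size, order + 1                      # k free parameters
    sigma2 = resid @ resid / n                    # ML noise variance
    bic = n * np.log(sigma2) + k * np.log(n)      # up to a model-independent constant
    print(order, round(bic, 2))
```

Lower BIC is preferred: the k log n term penalizes the higher-order classes, a quantitative Occam's razor.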

Slide31

Rissanen (1976): Minimum Description Length

Prefer models that can communicate the data in the smallest number of bits.

The preferred hypothesis H for explaining data D minimizes:

(1) L(H): the length of the description of the hypothesis

(2) L(D | H): the length of the description of the data with the help of the chosen theory

Slide32

MDL & Bayes

L: some measure of length (complexity)

MDL: prefer the hypothesis that minimizes L(H) + L(D | H)

Bayes rule implies the MDL principle:

P(H | D) = P(D | H) P(H) / P(D)

−log P(H | D) = −log P(D | H) − log P(H) + log P(D)

= L(D | H) + L(H) + const

Slide33

Slide34

Relativity Example

Explain the deviation in Mercury's orbit at perihelion with respect to the prevailing theory

E: Einstein's theory

F: fudged Newtonian theory

α = true deviation; a = observed deviation

Slide35

Relativity Example (Continued)

Subjective Ockham's razor: the result depends on one's belief about P(α | F)

Objective Ockham's razor: for the Mercury example, the RHS is 15.04

Applies to the generic situation