Presentation Transcript

Slide1

Perception, interaction, and optimality

Slide2

DALMATIAN

Slide3

Slide4

Slide5

Slide6

Slide7

Slide8

The policeman ate the spaghetti with a nice sauce.

The policeman ate the spaghetti with a nice fork.

Slide9

Slide10

Slide11

Context matters

Slide12

Today

An influential early account of how/why context matters in perception/recognition that depends on interactive processing.

A critique of the interactive approach that depends on the proposal that perception/recognition is a process of optimal statistical inference.

A conflict between “neural-like” processing accounts and “functional” accounts of perceptual and cognitive phenomena.

A demonstration that an interactive/distributed network can carry out optimal statistical inference.

Slide13

Key task: word superiority effect

People are faster to recognize (perceive?) a letter when it appears within a word or a word-like letter string than (1) when it appears alone or (2) when it appears within a non-word-like letter string.

So: Is the final letter an E or F?

SAVE, MAVE … faster than …. KLVE or … E

Important because perception/recognition of the *feature/letter* is influenced by information about the *object/word* in which it appears…

…SO the process can’t be: first recognize features, then recognize the object.

Slide14

Slide15

Slide16

Slide17

At the word level, about 1100 4-letter English words…

At the letter level, 26 units (one for each letter) in each of 4 possible locations (104 units)…

At the feature level, “present” and “absent” units for each of 14 possible line segments at each of 4 locations (112 units)
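As a tiny sketch, the unit counts above can be spelled out in code (the word count is approximate, per the slide):

```python
# Unit counts for the three levels described above (word total is approximate).
N_WORD_UNITS = 1100            # ~1100 four-letter English words
N_LETTER_UNITS = 26 * 4        # 26 letters in each of 4 positions = 104
N_FEATURE_UNITS = 14 * 2 * 4   # 14 segments x {present, absent} x 4 positions = 112
```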

Slide18

Weight values:

Inhibitory among incompatible items, excitatory among mutually compatible items…

BUT there was no principled way of setting them; they were set by hand to get something that worked okay. Though a bit ad hoc, the model was highly successful:

Accounted for the basic word-superiority effect on letter perception

Explained ambiguity resolution

Explained aspects of the time-course of processing

Predicted word-superiority effects for well-formed nonwords, even if these were unpronounceable.

Slide19

Slide20

What’s the problem?

For recognition/perception problems like this, it is possible to compute the “right” answer—the exactly correct probability for the ambiguous or missing information—using Bayesian inference.

Human behavior seems to accord with this “correct” inference behavior…

…but the IAC model does *not*---it behaves similarly to people, but does not compute exactly correct probabilities.

This difference was thought to arise because the model is interactive.

So, an important difference: building in interactivity as a “neurally motivated” processing mechanism seemed to produce a model that is wrong in important ways.

Slide21

Bayes rule

You work for a Madison Avenue advertising agency in the early 1960s.

There are 80 men and 20 women at the firm.

72 men have short hair.

82 employees have short hair

You spot someone with short hair at the end of the hallway. What is the probability that the person is a man?

Slide22

p(m | s) = p(m) x p(s | m) / p(s)

Reading each term off the counts:

p(m | s) = # males with short hair / # people with short hair

p(m) = # males / (# males + # females)

p(s | m) = # males with short hair / # males

p(s) = # people with short hair / (# males + # females)

Slide23

Writing m = # males, f = # females, ms = # males with short hair, fs = # females with short hair:

p(m | s) = p(m) x p(s | m) / p(s)

p(m) x p(s | m) = [m / (m + f)] x [ms / m] = ms / (m + f)

p(s) = (ms + fs) / (m + f)

So p(m | s) = [ms / (m + f)] / [(ms + fs) / (m + f)] = ms / (ms + fs)

Slide24

P(m | s) = p(m) x p(s | m) / p(s)

p(m) = 80/100 = 0.8

p(s | m) = 72/80 = 0.9

p(s) = 82/100 = 0.82

P(m | s) = (0.8 x 0.9) / 0.82 ≈ 0.88
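The same arithmetic as a minimal Python check, using the counts from the example (the result also equals 72/82, the fraction of short-haired people who are men):

```python
# Counts from the example above.
n_men, n_women = 80, 20
n_men_short, n_short = 72, 82

p_m = n_men / (n_men + n_women)        # p(m)   = 0.8
p_s_given_m = n_men_short / n_men      # p(s|m) = 0.9
p_s = n_short / (n_men + n_women)      # p(s)   = 0.82

p_m_given_s = p_m * p_s_given_m / p_s  # ≈ 0.878
print(p_m_given_s, n_men_short / n_short)  # the two numbers agree
```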

Slide25

Slide26

But what about word recognition?

If you can specify the relevant priors, likelihoods, and unconditional probabilities, you can compute exact probabilities for letter identities, given some input features and a lexicon.

To specify those probabilities you need a “generative model”, a kind of hypothesis about how the distribution of words in the environment is generated that allows the needed probabilities to be computed.

Slide27

A probabilistic generative model of 4-letter words

Select a word from among all candidate words with a probability proportional to its log frequency…

For each location, select a letter based on the conditional probability of the letters in that location given the word, p(l | w).

For each feature in each location, select “present” or “absent” depending upon the conditional probability of these values given the letter selected in the same location, p(f | l).

So the generative model specifies the prior probabilities of words, the conditional probabilities of letters in the 4 locations given the word, and the conditional probabilities of feature status in each location given the letter.
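A minimal sketch of this three-step sampler, with everything concrete made up: the toy lexicon and frequencies, the noise rates, and the random letter-to-feature template are all stand-ins (the real model uses roughly 1100 words and a fixed letter-to-feature font):

```python
import numpy as np

rng = np.random.default_rng(0)

# All of the following are stand-ins: a toy 4-word lexicon with made-up
# frequencies, assumed noise rates, and a random letter-to-feature template.
LEXICON = {"save": 8000.0, "cave": 3000.0, "gave": 9000.0, "wave": 7000.0}
LETTERS = "abcdefghijklmnopqrstuvwxyz"
N_POS, N_SEG = 4, 14                      # 4 letter positions, 14 segments each
EPS_LETTER, EPS_FEATURE = 0.01, 0.05      # assumed noise rates
FEATURE_TEMPLATE = rng.integers(0, 2, size=(26, N_SEG))  # stand-in font

def sample_word():
    """Step 1: pick a word with probability proportional to its log frequency."""
    words = list(LEXICON)
    w = np.log([LEXICON[x] for x in words])
    return rng.choice(words, p=w / w.sum())

def sample_letters(word):
    """Step 2: per position, sample a letter from p(l | w), with mass 1 - eps on the word's letter."""
    out = []
    for pos in range(N_POS):
        p = np.full(26, EPS_LETTER / 25)
        p[LETTERS.index(word[pos])] = 1 - EPS_LETTER
        out.append(rng.choice(26, p=p))
    return out

def sample_features(letters):
    """Step 3: per position and segment, sample present/absent from p(f | l)."""
    feats = np.zeros((N_POS, N_SEG), dtype=int)
    for pos, l in enumerate(letters):
        flip = rng.random(N_SEG) < EPS_FEATURE
        feats[pos] = np.where(flip, 1 - FEATURE_TEMPLATE[l], FEATURE_TEMPLATE[l])
    return feats

word = sample_word()
features = sample_features(sample_letters(word))
```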

Slide28

For each feature in location 2:

Compute p(l | f) = p(f | l) * p(l) / p(f)

The product of all of these is p(l | all features)

Compute this for all letters in all locations

BUT this does not take context into account!
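Before context enters the picture, the bottom-up letter computation for one location might look like this minimal sketch, written in the naive-Bayes form p(l | all features) ∝ p(l) x product over features of p(f | l); the probability table and the observed features below are random stand-ins:

```python
import numpy as np

# Stand-ins: a random p(segment present | letter) table, a uniform letter prior,
# and a random 14-segment observation for one location.
rng = np.random.default_rng(1)
p_seg_given_letter = rng.uniform(0.05, 0.95, size=(26, 14))
p_letter = np.full(26, 1 / 26)
observed = rng.integers(0, 2, size=14)      # 1 = segment present, 0 = absent

# Likelihood of the observed features for each letter, assuming independence.
likelihood = np.prod(np.where(observed == 1,
                              p_seg_given_letter,
                              1 - p_seg_given_letter), axis=1)
posterior = p_letter * likelihood
posterior /= posterior.sum()                # p(l | all 14 features) for this location
```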

Slide29

At the word level, for every word, compute p(w | l) = p(l | w) p(w) / p(l) …for each letter in each position.

The product of these over letter positions is the probability of the word.
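A minimal sketch of this word-level computation, with a made-up 3-word lexicon, made-up priors p(w), and p(l | w) giving mass 1 - eps to the word's own letter in each position:

```python
import numpy as np

# Stand-ins: an observed (noisy) letter string and a tiny lexicon with priors p(w).
EPS = 0.01
observed = "gafe"                                    # third letter misperceived
priors = {"save": 0.3, "gave": 0.6, "cave": 0.1}     # stand-in word priors p(w)

def p_letter_given_word(letter, word, pos):
    return 1 - EPS if word[pos] == letter else EPS / 25

# p(w | letters) ∝ p(w) * product over positions of p(l_pos | w)
scores = {w: p * np.prod([p_letter_given_word(observed[i], w, i) for i in range(4)])
          for w, p in priors.items()}
total = sum(scores.values())
posterior = {w: s / total for w, s in scores.items()}   # "gave" dominates here
```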

Slide30

Slide31

So….

For a generative model that specifies the required probabilities, you can exactly compute what the “correct” recognition behavior is!

And, you can do this in a feed-forward pass---no need for interaction.

AND, interaction seems wrong, because first-pass probabilities over letters & words are incorrect.

Conclusion: interactive model probably wrong…

…and IAC model behavior shown to be non-optimal where human behavior is optimal…

Slide32

Multinomial model

Slide33

Slide34

Bi-directional weights set to the natural log of the actual conditional probabilities stipulated by the generative model…

Bias weights on word units set to the natural log of the prior probability of words (i.e., subjective frequency)…

Activation function: stochastic winner-take-all within competing pools (instead of direct inhibition)

Softmax activation function: p_i = exp(net_i) / Σ_j exp(net_j)
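A minimal sketch of the stochastic winner-take-all choice within one pool; the net inputs below are stand-ins, but in the model they would be sums of log priors and log conditional probabilities from the currently active units above and below:

```python
import numpy as np

def softmax_choice(net_inputs, rng):
    """Stochastic winner-take-all: pick one unit in a pool with
    p_i = exp(net_i) / sum_j exp(net_j)."""
    z = np.asarray(net_inputs, dtype=float)
    z = z - z.max()                     # subtract the max for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return rng.choice(len(p), p=p)

# Example: one pool of 26 letter units with stand-in net inputs. Because the net
# inputs are sums of log-probability terms, the softmax turns them back into
# normalized probabilities, so the pool samples a letter instead of letting
# units inhibit each other directly.
rng = np.random.default_rng(3)
net = np.log(np.full(26, 1 / 26)) + rng.normal(0.0, 1.0, 26)
winner = softmax_choice(net, rng)
```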

Slide35

Slide36

Slide37

Letter probabilities initially determined by bottom-up input

After 2 steps, determined partly by word activity…

…but still INCORRECT! Need to let it continue to run for several cycles.

Eventually…
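One way to picture “letting it run”: a toy alternating-sampling loop (Gibbs-sampler style) between a word pool and four letter pools. The 3-word lexicon, priors, noise rate, and input below are all made up; the 4th input letter is unreadable, so its probabilities are driven entirely by the word level:

```python
import numpy as np

rng = np.random.default_rng(4)
LETTERS = "abcdefghijklmnopqrstuvwxyz"
WORDS = ["save", "gave", "cave"]                 # stand-in lexicon
log_prior = np.log(np.array([0.4, 0.4, 0.2]))    # stand-in word priors
EPS = 0.01
observed = "sav?"            # '?' = no bottom-up evidence at that position

def log_p_letter_given_word(letter, word, pos):
    return np.log(1 - EPS) if word[pos] == letter else np.log(EPS / 25)

def log_p_obs_given_letter(obs, letter):
    if obs == "?":
        return 0.0           # unreadable: every letter equally consistent
    return np.log(1 - EPS) if obs == letter else np.log(EPS / 25)

def sample_from_log(scores):
    p = np.exp(scores - scores.max())
    return rng.choice(len(p), p=p / p.sum())

letters = [rng.integers(26) for _ in range(4)]   # arbitrary starting letter states
samples = []
for step in range(2000):
    # Sample a word given the current letter states: p(w) * prod_pos p(l_pos | w).
    w_scores = np.array([log_prior[i]
                         + sum(log_p_letter_given_word(LETTERS[letters[pos]], w, pos)
                               for pos in range(4))
                         for i, w in enumerate(WORDS)])
    word = WORDS[sample_from_log(w_scores)]
    # Sample each letter given the current word and the input: p(l | w) * p(obs | l).
    for pos in range(4):
        l_scores = np.array([log_p_letter_given_word(LETTERS[l], word, pos)
                             + log_p_obs_given_letter(observed[pos], LETTERS[l])
                             for l in range(26)])
        letters[pos] = sample_from_log(l_scores)
    if step >= 200:                              # discard early (still incorrect) cycles
        samples.append(LETTERS[letters[3]])

# After enough cycles, the empirical distribution of the 4th letter matches the
# posterior implied by the lexicon, the priors, and the visible letters.
```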

Slide38

Slide39

So the interactive model can sample from the correct posterior distribution…

Why prefer one model over another?

Slide40

Slide41

Other reasons? For discussion…

Slide42

Slide43