Slide 1
Perception, interaction, and optimality
Slide 2
DALMATIAN
Slides 3–8
The policeman ate the spaghetti with a nice sauce.
The policeman ate the spaghetti with a nice fork.
Slides 9–11
Context matters
Slide 12
Today
An influential early account of how/why context matters in perception/recognition that depends on interactive processing
A critique of the interactive approach that depends on the proposal that perception/recognition is a process of optimal statistical inference
Conflict between “neural-like” processing accounts and “functional” accounts of perceptual and cognitive phenomena
A demonstration that an interactive/distributed network can carry out optimal statistical inference.
Slide 13
Key task: the word-superiority effect
People are faster to recognize (perceive?) a letter when it appears within a word or a word-like letter string than (1) when it appears alone or (2) when it appears within a non-word-like letter string.
So: Is the final letter an E or F?
SAVE, MAVE … faster than … KLVE or … E alone
Important because perception/recognition of the *feature/letter* is influenced by information about the *object/word* in which it appears…
…SO the process can’t be: first recognize features, then recognize the object
Slides 14–17
At the word level, about 1100 4-letter English words…
At the letter level, 26 units (one for each letter) in each of 4 possible locations (104 units)…
At the feature level, “present” and “absent” units for each of 14 possible line segments at each of 4 locations (112 units)
Slide 18
Weight values: inhibitory among incompatible items, excitatory among mutually compatible items…
BUT no principled way of setting them; they were set by hand to get something that worked okay (a minimal sketch of the update step those weights feed into follows the list below)…
Though a bit ad hoc, the model was highly successful:
Accounted for basic word-superiority effect on letter perception
Explained ambiguity resolution
Explained aspects of the time-course of processing
Predicted word-superiority effects for well-formed nonwords, even if these were unpronounceable.
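The slides don’t give the update equation, but for concreteness here is a minimal sketch, in Python, of the kind of interactive-activation step that hand-set weights like these feed into; the parameter values and the rule’s exact form are illustrative assumptions, not the model’s published settings.

```python
import numpy as np

def iac_step(a, W, ext, a_max=1.0, a_min=-0.2, rest=-0.1, decay=0.1):
    """One interactive-activation update step (illustrative parameter values).

    a   -- current activations of all units (feature, letter, and word units)
    W   -- hand-set weights: positive between mutually compatible units,
           negative between incompatible ones
    ext -- external (stimulus) input to each unit
    """
    net = W @ np.clip(a, 0.0, None) + ext  # only active units send signals
    # Positive net input pushes a unit toward its maximum, negative net input
    # toward its minimum; decay pulls every unit back to its resting level.
    change = np.where(net > 0, net * (a_max - a), net * (a - a_min))
    return np.clip(a + change - decay * (a - rest), a_min, a_max)
```

Iterating this step sends activation bottom-up (features → letters → words) and top-down (words → letters) at the same time, which is what produces the word-superiority effect in the model.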
Slides 19–20
What’s the problem?
For recognition/perception problems like this, it is possible to compute the “right” answer—the exactly correct probability for the ambiguous or missing information—using Bayesian inference.
Human behavior seems to accord with this “correct” inference…
…but the IAC model does *not*: it behaves similarly to people, but does not compute exactly correct probabilities.
This difference was thought to arise because the model is interactive.
So, an important difference: building in interactivity as a “neurally motivated” processing mechanism seemed to produce a model that is wrong in important ways.
Slide 21
Bayes’ rule
You work for a Madison Avenue advertising agency in the early 1960s.
There are 80 men and 20 women at the firm. 72 men have short hair.
82 employees have short hair
You spot someone with short hair at the end of the hallway. What is the probability that the person is a man?
Slide22p
(m | s) =
p (m) x p (s|m)
p(s)
# males
# people with short hair
# males
# males + # females
# males with short hair
# males
# people with short hair
# males + # females
Slide 23
Writing m = # males, f = # females, ms = # males with short hair, fs = # females with short hair:

p(m | s) = p(m) × p(s | m) / p(s)
         = [m / (m + f)] × [ms / m] / [(ms + fs) / (m + f)]
         = ms / (ms + fs)
Slide 24
p(m | s) = p(m) × p(s | m) / p(s)
         = (80/100) × (72/80) / (82/100)
         = 0.8 × 0.9 / 0.82
         ≈ 0.88
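The same computation, as a quick check in Python:

```python
p_m = 80 / 100         # p(m): 80 of the 100 employees are men
p_s_given_m = 72 / 80  # p(s | m): 72 of the 80 men have short hair
p_s = 82 / 100         # p(s): 82 of the 100 employees have short hair

print(p_m * p_s_given_m / p_s)  # 0.8780..., i.e. roughly 0.88
```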
Slides 25–26
But what about word recognition?
If you can specify the relevant priors, likelihoods, and unconditional probabilities, you can compute exact probabilities for letter identities, given some input features and lexicon.
To specify those probabilities you need a “generative model”—a kind of hypothesis about how the distribution of words in the environment is generated, one that allows computation of the needed probabilities.
Slide 27
A probabilistic generative model of 4-letter words
Select a word from among all candidate words with a probability proportional to its log frequency…
For each location, select a letter based on the conditional probability of the letters in that location given the word, p(l | w).
For each feature in each location, select “present” or “absent” depending upon the conditional probability of these values given the letter selected in the same location, p(f | l).
So the generative model specifies the prior probabilities of words, the conditional probabilities of letters in the 4 locations given the word, and the conditional probabilities of feature status in each location given the letter.
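Here is a minimal runnable sketch of such a generative model, assuming a toy four-word lexicon and random feature probabilities; every name and number below is illustrative (the real model would derive p(f | l) from each letter’s actual line segments).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the quantities the slides call for (all illustrative):
words = ["SAVE", "CAVE", "SALE", "MOVE"]     # candidate 4-letter words
freq = np.array([120.0, 45.0, 80.0, 60.0])   # raw word frequencies

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
N_FEATURES = 14                              # line segments per location

# p(f = present | l): one probability per (letter, feature) pair. Random here;
# the real model would derive these from each letter's actual line segments.
p_f_given_l = rng.uniform(0.05, 0.95, size=(26, N_FEATURES))

def generate_percept():
    """Sample one (word, letters, features) triple from the generative model."""
    # 1. Pick a word with probability proportional to its log frequency.
    p_w = np.log(freq) / np.log(freq).sum()
    w = rng.choice(len(words), p=p_w)
    # 2. Pick a letter for each location given the word; here p(l | w) simply
    #    puts all its mass on the word's own letter in each location.
    letters = [ALPHABET.index(ch) for ch in words[w]]
    # 3. Sample present/absent for every feature given its location's letter.
    features = np.array([rng.random(N_FEATURES) < p_f_given_l[l] for l in letters])
    return words[w], letters, features
```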
Slide 28
For each feature in location 2:
Compute p(l | f) = p(f | l) × p(l) / p(f)
The product of these terms over all the location’s features, renormalized, gives p(l | all features)
Compute this for all letters in all locations
BUT this does not take context into account!
Slide29At word level, for every word, compute p(w | l)
= p (l | w) p(w) / p(l)
…for each letter in each position
Product of these over letter positions is probability of word
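Continuing the toy sketch above, the feed-forward computation these two slides describe might look like the following; note that the letter-level posteriors are computed per location, with no word context.

```python
def letter_posteriors(features):
    """p(l | features at loc), per location: prior times feature likelihoods."""
    p_l = np.full(26, 1 / 26)  # flat letter prior, for simplicity
    posts = []
    for loc in range(4):
        # Likelihood of the observed present/absent pattern under each letter.
        like = np.where(features[loc], p_f_given_l, 1 - p_f_given_l).prod(axis=1)
        post = p_l * like
        posts.append(post / post.sum())  # normalizing plays the role of / p(f)
    return np.array(posts)               # note: no word context used here!

def word_posteriors(features):
    """p(w | all features): word prior times likelihoods over all locations."""
    p_w = np.log(freq) / np.log(freq).sum()  # prior from the generative model
    like = np.ones(len(words))
    for i, w in enumerate(words):
        for loc, ch in enumerate(w):
            pf = p_f_given_l[ALPHABET.index(ch)]
            like[i] *= np.where(features[loc], pf, 1 - pf).prod()
    post = p_w * like
    return post / post.sum()

word, letters, feats = generate_percept()
print(word, word_posteriors(feats))  # the true word should usually dominate
```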
Slides 30–31
So…
For a generative model that specifies the required probabilities, you can exactly compute what the “correct” recognition behavior is!
And you can do this in a feed-forward pass: no need for interaction
AND interaction seems wrong, because first-pass probabilities over letters & words are incorrect.
Conclusion: the interactive model is probably wrong…
…and the IAC model’s behavior was shown to be non-optimal where human behavior is optimal…
Slide 32
Multinomial model
Slides 33–34
Bidirectional weights set to the natural log of the actual conditional probabilities stipulated by the generative model…
Bias weights on word units set to the natural log of the prior probabilities of words (i.e., subjective frequency)…
Activation function: stochastic winner-take-all within competing pools (instead of direct inhibition), using the softmax function:
p_i = exp(net_i) / Σ_j exp(net_j)
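A minimal sketch of that stochastic winner-take-all step, reusing the toy quantities from the sketches above (all of them illustrative assumptions): because each weight is the log of a probability, a unit’s net input is a log posterior up to an additive constant, so softmax sampling picks a winner with exactly the posterior probability.

```python
def softmax_sample(net):
    """Stochastic winner-take-all within one competing pool.

    net -- net inputs to the pool's units; since every weight is the log of
    a probability, net is a log posterior up to a constant, and softmax
    recovers the posterior itself.
    """
    p = np.exp(net - net.max())  # subtract max for numerical stability
    p /= p.sum()
    return rng.choice(len(net), p=p)

# E.g., one stochastic update of the word pool given the feature evidence:
# net input = bias weight (log prior) + log feature likelihood per location.
net_w = np.log(np.log(freq) / np.log(freq).sum()) + np.array([
    sum(np.log(np.where(feats[loc],
                        p_f_given_l[ALPHABET.index(w[loc])],
                        1 - p_f_given_l[ALPHABET.index(w[loc])])).sum()
        for loc in range(4))
    for w in words])
print(words[softmax_sample(net_w)])  # a sample from p(w | features)
```

Resampling each pool in turn, given the current winners in the other pools, is what lets the network settle, over cycles, into samples from the correct posterior.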
Slides 35–37
Letter probabilities are initially determined by bottom-up input
After 2 steps, they are determined partly by word activity…
…but still INCORRECT! The network needs to run for several more cycles.
Eventually…
Slides 38–39
So the interactive model can sample from the correct posterior distribution…
Why prefer one model over another?
Slides 40–41
Other reasons? For discussion…