Principled Probabilistic Inference and Interactive Activation
Psych209
January 25, 2013
A Problem for the Interactive Activation Model
Data from many experiments give rise to a pattern corresponding to ‘logistic additivity’, and we expect such a pattern from a Bayesian point of view.
Unfortunately, the original interactive activation model does not exhibit this pattern.
Does this mean that the interactive activation model is fundamentally wrong, i.e. that processing is strictly feedforward (as Massaro believed)?
If not, is there a principled basis for understanding interactive activation as principled probabilistic inference?
Joint Effect of Context and Stimulus Information in Phoneme Identification (/l/ or /r/)
From Massaro & Cohen (1991)
Massaro’s Model
Joint effects of context and stimulus obey the fuzzy logical model of perception:
p_ij = t_i c_j / (t_i c_j + (1 - t_i)(1 - c_j))
where t_i is the stimulus support for r given input i and c_j is the contextual support for r given context j.
Massaro sees this model as having a strictly feed-forward organization:
Evaluate stimulus
Evaluate context
Integration
Decision
Massaro’s model implies ‘logistic additivity’:
log(p_ij / (1 - p_ij)) = log(t_i / (1 - t_i)) + log(c_j / (1 - c_j))
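This additivity follows directly from the FLMP integration rule. A quick numerical check in Python (the `flmp` and `logit` helpers are illustrative, not code from the original work):

```python
import math

def flmp(t, c):
    # FLMP integration: multiplicative combination of stimulus support t
    # and context support c, normalized over the two responses.
    return (t * c) / (t * c + (1 - t) * (1 - c))

def logit(p):
    return math.log(p / (1 - p))

# logit(p_ij) = logit(t_i) + logit(c_j) for any supports in (0, 1)
for t in (0.2, 0.5, 0.8):
    for c in (0.3, 0.6, 0.9):
        assert abs(logit(flmp(t, c)) - (logit(t) + logit(c))) < 1e-9
```

The check holds for any supports strictly between 0 and 1, because logit(flmp(t, c)) = log(t c / ((1 - t)(1 - c))).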
[Figure: logit(p_ij) plotted against stimulus value, from L-like to R-like. The p_ij on this graph corresponds to the p(r|S_ij) on the preceding slide. Different lines refer to different context conditions: r means ‘favors r’, l means ‘favors l’, n means ‘neutral’.]
Ideal logistic-additive pattern (upper right) vs. mini-IA simulation results (lower right).
Massaro’s argument against the IA model
In the IA model, feature information gets used twice: once on the way up, and then again on the way back down.
Feeding the activation back in this way, he suggested, distorts the process of correctly identifying the target phoneme.
Should we agree and give up on interactivity?
Perception of each letter is influenced by the amount of information about every other letter.
So, it would be desirable to have a way for each letter to facilitate perception of others while it itself is being facilitated.
In speech, there are both ‘left’ and ‘right’ context effects.
Examples of ‘right’ context effects:
‘?ift’ vs. ‘?iss’
‘the ?eel of the {shoe/wagon/orange/leather}’
As we discussed before, there are knock-on effects of context that appear to penetrate the perceptual system, as well as support from neurophysiology.
What was wrong with the Interactive Activation model?
The original interactive activation model ‘tacked the variability on at the end’, but neural activity is intrinsically stochastic.
McClelland (1991) incorporated intrinsic variability into the computation of the net input to the IA model, adding zero-mean noise to each unit’s net input on every update cycle.
Rather than choosing probabilistically based on relative activations, we simply choose the alternative with the highest activation after settling.
Logistic additivity is then observed.
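As a rough illustration of the idea, here is a schematic settling loop (hypothetical parameters and dynamics, not the published 1991 simulation) in which noise perturbs each net input on every cycle and the response is simply the most active alternative at the end:

```python
import random

def settle(net_inputs, sigma=0.1, n_steps=50, rate=0.1):
    """Schematic settling for one pool of mutually exclusive units:
    on every cycle, Gaussian noise perturbs each unit's net input
    (intrinsic variability) and activations drift toward the noisy
    inputs. The response is the most active alternative after
    settling: no probabilistic choice rule is tacked on at the end."""
    acts = [0.0] * len(net_inputs)
    for _ in range(n_steps):
        for i, net in enumerate(net_inputs):
            noisy = net + random.gauss(0.0, sigma)
            acts[i] += rate * (noisy - acts[i])
    return max(range(len(acts)), key=lambda i: acts[i])

# With clearly stronger support for alternative 1, it should win on
# the large majority of trials despite the intrinsic noise.
random.seed(0)
wins = sum(settle([0.0, 1.0]) == 1 for _ in range(200))
```

With noise large relative to the difference in net inputs, the choice proportions become graded rather than deterministic, which is what produces the logistic-additive pattern in the stochastic model.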
Can we Relate IA to Principled Probabilistic Inference?
We begin with a probabilistic generative model.
We then show how a variant of the IA model samples from the correct posterior of the generative model.
The Generative Model
Select a word with probability p(w_i)
Generate letters with probability p(l_jp | w_i)
Generate feature values with probability p(f_vdp | l_jp)
Note that features are specified as ‘present’ or ‘absent’.
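A toy sampler for this kind of generative model might look like the following; the lexicon, letter tables, and feature probabilities are made-up illustrative numbers, with only 3 letter positions and 3 features per letter rather than the model's 4 and 14:

```python
import random

# Toy instance of the generative model (invented numbers).
p_word = {"HEX": 0.5, "FOX": 0.5}                         # p(w_i)
p_letter = {                                              # p(l_jp | w_i)
    "HEX": [{"H": 0.9, "F": 0.1}, {"E": 1.0}, {"X": 1.0}],
    "FOX": [{"F": 0.9, "H": 0.1}, {"O": 1.0}, {"X": 1.0}],
}
p_present = {"H": 0.8, "F": 0.7, "E": 0.6, "O": 0.5, "X": 0.9}  # p(f present | l)

def draw(dist):
    # sample a key from a {value: probability} dict
    r, cum = random.random(), 0.0
    for value, p in dist.items():
        cum += p
        if r < cum:
            return value
    return value

def generate(n_features=3):
    word = draw(p_word)                                   # select a word
    letters = [draw(d) for d in p_letter[word]]           # generate letters
    # each feature value is 'present' (1) or 'absent' (0) given its letter
    features = [[int(random.random() < p_present[l]) for _ in range(n_features)]
                for l in letters]
    return word, letters, features
```

Each call to `generate()` traces one complete path word → letters → feature values, which is exactly the kind of path the network must later infer from the clamped features.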
The Neural Network Model
The network is viewed as consisting of several multinomial variables, each represented by a pool of units corresponding to mutually exclusive alternatives. There are:
4*14 feature-level variables, each with two alternative possible values (not well depicted in the figure)
4 letter-level variables, each with 26 possible values
1 word-level variable, with 1129 possible values
Connection weights are bi-directional, and their values are the logs of the top-down probabilities given in the generative model.
There are biases only at the word level, corresponding to the logs of the p(w_i).
The Neural Network Model
An input, assumed to have been produced by the generative model, is clamped on the units at the feature level.
The letter- and word-level variables are initialized to 0.
Then we alternate updating the letter and word variables:
Letters can be updated in parallel or sequentially
The word is updated after all of the letters
Updates occur by calculating each unit’s net input based on the active units that have connections to it (and the bias at the word level), then setting the activations using the softmax function.
A state of the model consists of one active word, four active letters, and 4*14 active features. The hidden state consists of one active word and four active letters. We can view each state as a composite hypothesis about what underlying path might have produced the feature values clamped on the input units.
After a ‘burn-in’ period, the network visits hidden states with probability proportional to the posterior probability that the partial path corresponding to the hidden state generated the observed features.
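These alternating softmax updates amount to Gibbs sampling over the hidden variables. A minimal sketch with a hypothetical two-word, one-position, two-feature model (all probability tables are invented for illustration):

```python
import math, random

# Toy instance (invented numbers): 2 words, 1 letter position,
# 2 letters, 2 binary features. Bidirectional weights are the logs
# of the generative model's top-down probabilities.
log_p_w = [math.log(0.5), math.log(0.5)]      # word-level biases, log p(w)
p_l_given_w = [[0.9, 0.1], [0.1, 0.9]]        # p(letter | word)
p_f_given_l = [[0.8, 0.3], [0.2, 0.7]]        # p(feature d present | letter)

def softmax_sample(scores):
    # sample an index with probability proportional to exp(score)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    r = random.random() * sum(exps)
    for i, e in enumerate(exps):
        r -= e
        if r < 0:
            return i
    return len(exps) - 1

def gibbs_step(word, features):
    # Letter update: bottom-up evidence from the clamped features plus
    # top-down support from the currently active word unit.
    letter_scores = []
    for l in (0, 1):
        s = math.log(p_l_given_w[word][l])
        for d, f in enumerate(features):
            p = p_f_given_l[d][l]
            s += math.log(p if f else 1 - p)
        letter_scores.append(s)
    letter = softmax_sample(letter_scores)
    # Word update: bias plus support from the currently active letter.
    word_scores = [log_p_w[w] + math.log(p_l_given_w[w][letter])
                   for w in (0, 1)]
    return softmax_sample(word_scores), letter

random.seed(0)
features = [1, 0]      # clamped input: feature 0 present, feature 1 absent
word, counts = 0, [0, 0]
for step in range(2000):
    word, letter = gibbs_step(word, features)
    if step >= 200:    # discard burn-in samples
        counts[word] += 1
```

After burn-in, the visit frequencies in `counts` approximate the exact posterior over words given the clamped features, which for these tables strongly favors word 0.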
Sampled and Calculated Probabilities for a Specific Display (? = a random set of feature values)
From Mirman et al., Figure 14.
Alternatives to the MIAM Approach
For the effect of context in a specific position:
Calculate p(w_i | other letters) for all words
Use this to calculate p(l_jp | context)
Pearl’s procedure:
Calculate p(w_i | all letters)
Divide the contribution of position p back out when calculating p(l_jp | context) for each position
This produces the correct marginals for each multinomial variable but doesn’t specify their joint distribution (see next slide).
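Pearl's division step can be sketched as follows, using a hypothetical four-word lexicon and made-up evidence likelihoods `L[p][letter]`:

```python
# Hypothetical 4-word lexicon with uniform priors p(w_i).
words = {"HOW": 0.25, "HEX": 0.25, "FEW": 0.25, "FOX": 0.25}
# Made-up evidence likelihoods for the letters at each position.
L = [{"H": 0.6, "F": 0.4}, {"E": 0.5, "O": 0.5}, {"W": 0.7, "X": 0.3}]

def word_scores():
    # p(w | all letters), up to normalization: prior times the
    # evidence at every position.
    return {w: pw * L[0][w[0]] * L[1][w[1]] * L[2][w[2]]
            for w, pw in words.items()}

def context_support(p):
    # p(l_p | context): divide position p's own evidence back out of
    # each word's full score, then sum the support each word lends to
    # the letter it contains at position p.
    support = {}
    for w, s in word_scores().items():
        support[w[p]] = support.get(w[p], 0.0) + s / L[p][w[p]]
    z = sum(support.values())
    return {letter: v / z for letter, v in support.items()}
```

Dividing by `L[p][w[p]]` removes the double-counting that would otherwise occur when position p's own evidence is fed back to it through the word level; this yields correct per-position marginals only.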
Joint vs. marginal posterior probabilities
Can you make sense of the given features?
In the Rumelhart font, considering each position separately, the likely letters are:
{H,F}, {E,O}, {X,W}
The known words are HOW, HEX, FEW, FOX.
There are constraints between the word and letter possibilities not captured by just listing the marginal probabilities.
These constraints are captured in samples from the joint posterior.
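A small enumeration makes the point concrete, using the word set from this slide:

```python
# The four known words and the per-position likely letters from the display.
words = ["HOW", "HEX", "FEW", "FOX"]
candidates = [{"H", "F"}, {"E", "O"}, {"X", "W"}]

# Position-wise marginals alone license every combination of likely letters...
combos = [a + b + c for a in candidates[0]
          for b in candidates[1] for c in candidates[2]]

# ...but only half of those combinations are words: the joint posterior
# assigns zero probability to non-words such as "HEW" and "FOW".
consistent = [s for s in combos if s in words]
```

Eight letter combinations are consistent with the marginals, but only the four words survive in the joint posterior; for example, H in position 1 together with E in position 2 forces X in position 3.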
Some Key Concepts
A generative model as the basis for principled probabilistic inference
Perception as a probabilistic sampling process
A sample from the joint posterior as a compound hypothesis
Joint vs. marginal posteriors
Interactive neural networks as mechanisms that implement principled probabilistic sampling