Principled Probabilistic Inference and Interactive Activation
Presentation Transcript

Slide1

Principled Probabilistic Inference and Interactive Activation

Psych209

January 25, 2013

Slide2

A Problem For the Interactive Activation Model

Data from many experiments give rise to a pattern corresponding to 'logistic additivity', and we expect such a pattern from a Bayesian point of view.

Unfortunately, the original interactive activation model does not exhibit this pattern.

Does this mean that the interactive activation model is fundamentally wrong, i.e., that processing is strictly feedforward (as Massaro believed)?

If not, is there a principled basis for understanding interactive activation as principled probabilistic inference?

Slide3

Joint Effect of Context and Stimulus Information in Phoneme Identification (/l/ or /r/)

From Massaro & Cohen (1991)

Slide4

Massaro's Model

Joint effects of context and stimulus obey the fuzzy logical model of perception:

p(r|S_ij) = t_i c_j / (t_i c_j + (1 - t_i)(1 - c_j))

where t_i is the stimulus support for r given input i, and c_j is the contextual support for r given context j.

Massaro sees this model as having a strictly feed-forward organization:

Evaluate stimulus
Evaluate context
Integration
Decision

Slide5

Massaro's model implies 'logistic additivity':

logit(p_ij) = log(p_ij/(1-p_ij)) = log(t_i/(1-t_i)) + log(c_j/(1-c_j))

The p_ij on this graph corresponds to the p(r|S_ij) on the preceding slide. The horizontal axis runs from L-like to R-like stimuli. Different lines refer to different context conditions: 'r' means 'favors r', 'l' means 'favors l', 'n' means 'neutral'.
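This is exactly what the fuzzy logical model on the preceding slide predicts. A short derivation (not shown on the slides, but it follows directly from the two formulas) makes the connection explicit:

```latex
\begin{align*}
p_{ij} &= \frac{t_i\, c_j}{t_i\, c_j + (1 - t_i)(1 - c_j)}
  && \text{(FLMP combination rule, preceding slide)} \\
\frac{p_{ij}}{1 - p_{ij}} &= \frac{t_i\, c_j}{(1 - t_i)(1 - c_j)}
   = \frac{t_i}{1 - t_i} \cdot \frac{c_j}{1 - c_j}
  && \text{(the odds factor into a stimulus term and a context term)} \\
\mathrm{logit}(p_{ij}) &= \mathrm{logit}(t_i) + \mathrm{logit}(c_j)
  && \text{(take logs: logistic additivity)}
\end{align*}
```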

Slide6

Ideal logistic-additive pattern (upper right) vs. mini-IA simulation results (lower right).

Slide7

Massaro’s argument against the IA model

In the IA model, feature information gets used twice: once on the way up, and then again on the way back down.

Feeding the activation back in this way, he suggested, distorts the process of correctly identifying the target phoneme.

Slide8

Should we agree and give up on interactivity?

Perception of each letter is influenced by the amount of information about every other letter

So, it would be desirable to have a way for each letter to facilitate perception of others while it itself is being facilitated.

In speech, there are both ‘left’ and ‘right’ context effects

Examples of 'right' context effects:

'?ift' vs. '?iss'
'the ?eel of the {shoe/wagon/orange/leather}'

As we discussed before, there are knock-on effects of context that appear to penetrate the perceptual system, as well as support from neurophysiology.

Slide9

What was wrong with the Interactive Activation model?

The original interactive activation model ‘tacked the variability on at the end’ but neural activity is intrinsically stochastic.

McClelland (1991) incorporated intrinsic variability in the computation of the net input to the IA model.

Rather than choosing probabilistically based on relative activations, we simply choose the alternative with the highest activation after settling.

Logistic additivity is observed.

[Figure: Intrinsic Variability]
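A minimal illustrative sketch of these two ingredients in code (this is not McClelland's 1991 simulation code; the leaky-integration update, function name, and parameter values are assumptions): zero-mean Gaussian noise is added to each unit's net input on every cycle, and the response is simply the most active alternative after settling.

```python
import numpy as np

rng = np.random.default_rng(0)

def settle_with_intrinsic_noise(weights, input_acts, noise_sd=0.5, rate=0.1, n_cycles=100):
    """Settle one pool of mutually exclusive units whose net inputs are intrinsically noisy.

    weights:    (n_units, n_inputs) connection weights into the pool
    input_acts: (n_inputs,) activations of the units feeding the pool
    noise_sd:   SD of the zero-mean Gaussian noise added to every net input on every cycle
                (the 'intrinsic variability')
    """
    act = np.zeros(weights.shape[0])
    for _ in range(n_cycles):
        net = weights @ input_acts + rng.normal(0.0, noise_sd, size=act.shape)
        act += rate * (net - act)   # leaky integration toward the (noisy) net input
    # Deterministic response rule: choose the alternative with the highest activation after
    # settling; response variability comes from the intrinsic noise, not the decision stage.
    return int(np.argmax(act))
```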

Slide10

Can we Relate IA to Principled Probabilistic Inference?

We begin with a probabilistic generative model.

We then show how a variant of the IA model samples from the correct posterior of the generative model.

Slide11

The Generative Model

Select a word with probability p(w_i).

Generate letters with probability p(l_jp | w_i).

Generate feature values with probability p(f_vdp | l_jp).

Note that features are specified as 'present' or 'absent'.
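A minimal sketch of this generative process in code; the sizes and probability tables below are made up for illustration (only the factorization p(w_i) p(l_jp | w_i) p(f_vdp | l_jp) comes from the slide), and the function name is mine.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sizes and made-up probability tables; only the factorization matches the slide.
# (The real model uses 1129 words, 4 positions, 26 letters, and 14 binary features per position.)
N_WORDS, N_POS, N_LETTERS, N_FEAT = 10, 4, 26, 14
p_word = rng.dirichlet(np.ones(N_WORDS))                               # p(w_i)
p_letter = rng.dirichlet(np.ones(N_LETTERS), size=(N_WORDS, N_POS))    # p(l_jp | w_i)
p_feat_on = rng.uniform(size=(N_LETTERS, N_FEAT))                      # p(f_vdp = present | l_jp)

def generate_display():
    """One run of the generative model: word -> letters -> binary (present/absent) features."""
    word = rng.choice(N_WORDS, p=p_word)
    letters = [rng.choice(N_LETTERS, p=p_letter[word, pos]) for pos in range(N_POS)]
    features = (rng.random((N_POS, N_FEAT)) < p_feat_on[letters]).astype(int)
    return word, letters, features
```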

Slide12

The Neural Network Model

Network is viewed as consisting of several multinomial variables each represented by a pool of units corresponding to mutually exclusive alternatives.

There are:

4*14 feature level variables, each with two alternative possible values (not well depicted in figure)

4 letter level variables, each with 26 possible values.

1 word level variable, with 1129 possible values.

Connection weights are bi-directional, but their values are the logs of the top-down probabilities given in the generative model.

There are biases only at the word level, corresponding to the logs of the p(w_i).

Slide13

The Neural Network Model

An input, assumed to have been produced by the generative model, is clamped on the units at the feature level.

The letter and word level variables are initialized to 0.

Then, we alternate updating the letter and word variables

Letters can be updated in parallel or sequentially

The word is updated after all of the letters.

Updates occur by calculating each unit's net input based on active units that have connections to it (and the bias at the word level), then setting the activations using the softmax function.

A state of the model consists of one active word, four active letters, and 4*14 active features. The hidden state consists of one active word and four active letters. We can view each state as a composite hypothesis about what underlying path might have produced the feature values clamped on the input units.

After a 'burn-in' period, the network visits hidden states with probability proportional to the posterior probability that the partial path corresponding to the hidden state generated the observed features.
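Below is a compact, self-contained sketch of this procedure under a toy version of the generative model from the earlier slides (all sizes, table values, and names are illustrative; the slides specify the scheme, not this code). Weights are the logs of the top-down probabilities, the only biases are the log word priors, the features are clamped, and the letter and word pools are alternately resampled using the softmax of their net inputs.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy generative model (same factorization as on the earlier slides; all numbers are made up).
N_WORDS, N_POS, N_LETTERS, N_FEAT = 10, 4, 26, 14
p_word = rng.dirichlet(np.ones(N_WORDS))                                  # p(w_i)
p_letter = rng.dirichlet(np.ones(N_LETTERS), size=(N_WORDS, N_POS))       # p(l_jp | w_i)
p_feat_on = rng.uniform(size=(N_LETTERS, N_FEAT))                         # p(f = present | l)

# Network parameters: bidirectional weights are logs of the top-down probabilities;
# the only biases are the log word priors.
eps = 1e-12
word_bias = np.log(p_word + eps)
w_word_letter = np.log(p_letter + eps)                                    # (word, pos, letter)
w_letter_feat = np.stack([np.log(1 - p_feat_on + eps),                    # feature value 0 = 'absent'
                          np.log(p_feat_on + eps)])                       # feature value 1 = 'present'

def sample_softmax(net):
    """Pick one alternative in a pool with probability given by the softmax of the net inputs."""
    p = np.exp(net - net.max())
    return rng.choice(len(p), p=p / p.sum())

def run_miam(features, n_iters=2000, burn_in=500):
    """features: (N_POS, N_FEAT) array of 0/1 values clamped on the feature units.

    Alternately resample the letter variables and then the word variable; after the burn-in
    period, the (word, letters) hidden states visited are samples whose frequencies are
    proportional to their posterior probability under the generative model above."""
    word = 0                               # arbitrary starting state
    letters = np.zeros(N_POS, dtype=int)
    samples = []
    for it in range(n_iters):
        # Each letter's net input: top-down from the currently active word unit
        # plus bottom-up from the clamped feature units in its position.
        for pos in range(N_POS):
            bottom_up = w_letter_feat[features[pos], :, np.arange(N_FEAT)].sum(axis=0)
            letters[pos] = sample_softmax(w_word_letter[word, pos] + bottom_up)
        # The word's net input: its bias plus input from the currently active letter units.
        net = word_bias + w_word_letter[np.arange(N_WORDS)[:, None],
                                        np.arange(N_POS), letters].sum(axis=1)
        word = sample_softmax(net)
        if it >= burn_in:
            samples.append((word, tuple(letters)))
    return samples

# Example: clamp a display generated from the model itself and collect posterior samples.
true_word = rng.choice(N_WORDS, p=p_word)
true_letters = [rng.choice(N_LETTERS, p=p_letter[true_word, pos]) for pos in range(N_POS)]
display = (rng.random((N_POS, N_FEAT)) < p_feat_on[true_letters]).astype(int)
posterior_samples = run_miam(display)
```

Tallying how often each (word, letters) state occurs in the returned samples gives sampled probabilities that can be compared with exactly calculated posteriors, as on the next slide.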

Slide14

Sampled and Calculated Probabilities for a Specific Display ('?' = a random set of feature values)

[Figure: Mirman et al., Figure 14]

Slide15

Alternatives to the MIAM Approach

For the effect of context in a specific position:

Calculate p(w_i | other letters) for all words.

Use this to calculate p(l_jp | context).

Pearl's procedure:

Calculate p(w_i | all letters).

Divide the contribution of position p back out when calculating p(l_jp | context) for each position.

This produces the correct marginals for each multinomial variable but doesn't specify their joint distribution (see next slide).
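A sketch of this calculation with made-up probability tables (array names are mine): the word posterior given all of the letter-level evidence is computed once, and each position's own contribution is then divided back out before computing that position's contextual support, which is equivalent to using p(w_i | other letters) for that position.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy setup (all tables are made up): a word prior, p(letter | word, position), and a
# bottom-up likelihood p(observed features in position p | letter) summarizing the input.
N_WORDS, N_POS, N_LETTERS = 8, 4, 26
p_word = rng.dirichlet(np.ones(N_WORDS))
p_letter_given_word = rng.dirichlet(np.ones(N_LETTERS), size=(N_WORDS, N_POS))
letter_likelihood = rng.uniform(size=(N_POS, N_LETTERS))           # p(features_p | l_p)

# Likelihood message from each position to the word: lam[w, p] = sum_l p(f_p | l) p(l | w).
lam = np.einsum("pl,wpl->wp", letter_likelihood, p_letter_given_word)

# p(w | all letters): computed once, using the evidence from every position.
post_word = p_word * lam.prod(axis=1)
post_word /= post_word.sum()

# Pearl's procedure: divide position p's own contribution back out of the word posterior
# before computing that position's contextual support, so each letter's context excludes
# its own bottom-up evidence.
for pos in range(N_POS):
    context_word = post_word / lam[:, pos]       # proportional to p(w | letters at other positions)
    context_word /= context_word.sum()
    p_letter_context = context_word @ p_letter_given_word[:, pos, :]   # p(l_jp | context)
    print(f"position {pos}: contextual support sums to {p_letter_context.sum():.3f}")
```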

Slide16

Joint vs. marginal posterior probabilities

Can you make sense of the given features?

In the Rumelhart font, considering each position separately, the likely letters are:

{H,F}, {E,O}, {X,W}

Known words are HOW, HEX, FEW, FOX.

There are constraints between the word and letter possibilities not captured by just listing the marginal probabilities.

These constraints are captured in samples from the joint posterior.
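A tiny worked example of this point (the evidence values of 0.5 are made up; the words and ambiguous letter sets come from the slide): with maximally ambiguous evidence, every candidate letter has a marginal probability of 0.5, yet the joint posterior puts all of its mass on the four known words and none on combinations like 'HEW'.

```python
# Toy numbers: equally ambiguous evidence for {H,F}, {E,O}, {W,X} in the three positions,
# and a uniform prior over the four known words.
words = ["HOW", "HEX", "FEW", "FOX"]
letter_evidence = [{"H": 0.5, "F": 0.5}, {"E": 0.5, "O": 0.5}, {"W": 0.5, "X": 0.5}]

# Joint posterior over words: prior (uniform) times the evidence for each of the word's letters.
unnorm = {}
for w in words:
    p = 1.0
    for pos, letter in enumerate(w):
        p *= letter_evidence[pos].get(letter, 0.0)
    unnorm[w] = p
z = sum(unnorm.values())
joint = {w: p / z for w, p in unnorm.items()}          # every word ends up at 0.25

# Position-wise marginals implied by the joint: every candidate letter comes out at 0.5 ...
marginals = [dict() for _ in words[0]]
for w, p in joint.items():
    for pos, letter in enumerate(w):
        marginals[pos][letter] = marginals[pos].get(letter, 0.0) + p

# ... so the marginals alone would assign 0.5**3 = 0.125 to a non-word like 'HEW', whereas
# the joint posterior gives it zero: the word-letter constraints live only in the joint.
print(joint)      # {'HOW': 0.25, 'HEX': 0.25, 'FEW': 0.25, 'FOX': 0.25}
print(marginals)  # [{'H': 0.5, 'F': 0.5}, {'O': 0.5, 'E': 0.5}, {'W': 0.5, 'X': 0.5}]
```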

Slide17

Some Key Concepts

A generative model as the basis for principled probabilistic inference

Perception as a probabilistic sampling process

A sample from the joint posterior as a compound hypothesis

Joint vs. marginal posteriors

Interactive neural networks as mechanisms that implement principled probabilistic sampling