
Neural Networks and Language Understanding: Do we need to rely on predetermined structured representations to understand and use language? Psychology 209 – February 21, 2019

The Fodor / Chomsky Vision
We can understand and process sentences we've never heard before because we use a system of structure-sensitive rules that processes sentences according to their structure and composes meaning as an assemblage of parts whose meanings are already known.
Fodor's example:
The man loves the woman
The woman loves the man
He claims, among other things, that the meaning of the word 'loves' contributes the same thing to the meaning of the overall sentence in both cases.

Some sentences that pose problems for this view
John loves Mary
Mary loves John
John loves ice cream
She felt the baby kick
John poured coffee into the container
Jill put apples into the container
I like going to the movies with friends
I like eating spaghetti with meatballs
I like eating Chinese food with chopsticks
I saw the sheep grazing in the field
I saw the Grand Canyon flying to New York

An alternative perspective
The intention of a speaker is to convey information about a situation or event.
Words, and the order in which they come, are clues to aspects of meaning, but clues to any aspect of meaning can come from anywhere in the sentence.
A commitment to structure just gets in the way – a learned distributed representation that discovers how to capture sentence meaning is the best solution to the problem.

Two Models
The Sentence Gestalt model: St. John & McClelland (1990); Rabovsky, Hansen & McClelland (2016)
The Google Neural Machine Translation system: Wu et al. (2016); Gideon Lewis-Kraus, "The Great AI Awakening"

The Sentence Gestalt Model – theory (based on McClelland, St. John, & Taraban, 1989; St. John & McClelland, 1990)
Words act as "cues to meaning" (Rumelhart, 1979) that change the representation of sentence meaning (corresponding to a pattern of neural activity, modeled in an artificial neural network).
The activation state implicitly represents subjective probability distributions over the semantic features of the event described by a sentence.
No assumption of a specific format for the internal representation of sentences: the representation is not directly trained, but instead is used as the basis for responding to probes (e.g., answering questions concerning the described event). Feedback is given only on responses to probes.
Word-by-word update of a probabilistic representation of meaning, with the goal of maximizing the agreement between the true probability of each possible answer to each possible question and the estimates the network makes given the words seen so far.

Simplified implemented model
Uses a simple generative model of events and sentences that describe them.
Uses a simplified set of queries to constrain learning in the model (thematic roles).
Other versions of the query model are possible, including:
- accounting for information derived from other sources about an event or situation
- questions we might be responsible for answering based on the expectations of others.
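To make the idea of a generative model of events and sentences concrete, here is a minimal sketch of what such an environment could look like. The role inventory, vocabulary, and single active-voice sentence frame are invented for illustration; this is not the corpus actually used by St. John & McClelland.

```python
import random

# Hypothetical miniature event/sentence generator in the spirit of the
# simplified training environment: sample a role-filler event, then render
# a sentence that describes it. Vocabulary and roles are illustrative only.
AGENTS  = ["man", "woman", "girl", "boy"]
ACTIONS = {"plays": ["chess", "piano"], "eats": ["toast", "soup"]}

def sample_event():
    agent   = random.choice(AGENTS)
    action  = random.choice(list(ACTIONS))
    patient = random.choice(ACTIONS[action])
    return {"agent": agent, "action": action, "patient": patient}

def render_sentence(event):
    # Simple active-voice rendering; a richer generator could also produce
    # passives or omit constituents that are strongly implied.
    return f"The {event['agent']} {event['action']} {event['patient']} ."

if __name__ == "__main__":
    ev = sample_event()
    print(ev)                   # e.g. {'agent': 'man', 'action': 'plays', 'patient': 'chess'}
    print(render_sentence(ev))  # e.g. "The man plays chess ."
```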

Model architecture (diagram): the update network passes the current word (Input, 74 units) together with the previous Sentence Gestalt through Hidden 1 (100 units) to produce the updated Sentence Gestalt (100 units); the query network combines the Sentence Gestalt with a Probe (176 units), passes it through Hidden 2 (100 units), and produces the Output (176 units), which is compared against a Target (176 units).
Sentences: "The man plays chess."
Events (role-filler pairs): agent: man, action: play, patient: chess
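As a rough illustration of how the two sub-networks fit together, here is a minimal PyTorch sketch. The layer sizes come from the slide, but the recurrence details, activation functions, and word/probe codings are assumptions for illustration, not the published implementation.

```python
import torch
import torch.nn as nn

class SentenceGestaltNet(nn.Module):
    """Sketch of the two sub-networks with the layer sizes from the slide.

    Update network: current word (74) + previous Sentence Gestalt (100)
                    -> Hidden 1 (100) -> new Sentence Gestalt (100)
    Query network:  Sentence Gestalt (100) + probe (176)
                    -> Hidden 2 (100) -> output feature activations (176)
    """
    def __init__(self, n_input=74, n_sg=100, n_hidden=100, n_probe=176, n_output=176):
        super().__init__()
        self.hidden1 = nn.Linear(n_input + n_sg, n_hidden)
        self.to_sg   = nn.Linear(n_hidden, n_sg)
        self.hidden2 = nn.Linear(n_sg + n_probe, n_hidden)
        self.output  = nn.Linear(n_hidden, n_output)

    def update(self, word, sg):
        """Revise the Sentence Gestalt given the next word."""
        h1 = torch.sigmoid(self.hidden1(torch.cat([word, sg], dim=-1)))
        return torch.sigmoid(self.to_sg(h1))

    def query(self, sg, probe):
        """Answer a probe (e.g. 'Agent?') from the current Sentence Gestalt."""
        h2 = torch.sigmoid(self.hidden2(torch.cat([sg, probe], dim=-1)))
        return torch.sigmoid(self.output(h2))

# Example: process "The man plays chess ." one (stand-in one-hot) word at a time.
net = SentenceGestaltNet()
sg = torch.zeros(1, 100)                       # initial Sentence Gestalt
for t in range(4):                             # four word slots in the example
    word = torch.zeros(1, 74); word[0, t] = 1  # placeholder word codings
    sg = net.update(word, sg)
probe = torch.zeros(1, 176); probe[0, 0] = 1   # placeholder probe, e.g. Agent?
answer = net.query(sg, probe)                  # 176 semantic-feature activations
```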

Learning
The model is probed for all aspects of the meaning of the event after every word: the language learner observes an event and hears a sentence about it, and learning is based on a comparison of the comprehension output with the event – an anticipation of sentence meaning.
Minimum of the cross-entropy error: the activation of each feature unit comes to correspond to the conditional probability of that feature in that situation (Rumelhart et al., 1995).
In an ideally trained model, the change in activation at the SG layer induced by each incoming word would support an accurate update of the probabilities of the semantic features 'cued' by that word.
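A small numerical illustration of the cross-entropy claim (not code from the model itself): the expected cross-entropy for a single output feature unit is minimized when its activation equals the feature's conditional probability. The 0.7/0.3 probabilities below are made-up numbers.

```python
import numpy as np

def cross_entropy(activation, target):
    # Per-unit cross-entropy between a target feature value (0 or 1)
    # and the unit's activation, with a small epsilon for stability.
    eps = 1e-9
    return -(target * np.log(activation + eps)
             + (1 - target) * np.log(1 - activation + eps))

# Suppose, given the words seen so far, the probed feature is present with
# probability 0.7 and absent with probability 0.3 (illustrative numbers).
# The expected loss is minimized when the unit's activation equals 0.7.
p_feature = 0.7
acts = np.linspace(0.01, 0.99, 99)
expected_loss = (p_feature * cross_entropy(acts, 1.0)
                 + (1 - p_feature) * cross_entropy(acts, 0.0))
print(acts[np.argmin(expected_loss)])   # ~0.70
```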

Processing example (animation in the original slides): the sentence "The man plays chess." is presented one word at a time. After each input ("The man", "plays", "chess"), the update network (Input 74 – Hidden 1 100 – Sentence Gestalt 100) revises the Sentence Gestalt, and the query network (Sentence Gestalt + Probe 176 – Hidden 2 100 – Output 176) is probed with Agent?, Action?, and Patient? to read out the model's current estimates of the role fillers.

St. John & McClelland corpus and results
Sentences could be active or passive; constituents can be vaguely identified or may be left out if strongly implied.
The model could use word order and meaning as well as syntactic markers, capturing event constraints and using context to disambiguate.
The assessment of each participant depended on all words in the sentence (next slide).

Changing interpretations of role fillers as a sentence unfolds

Limitations and Alternatives
The query language appears to build in a commitment to structure.
I see this as a limitation – queries of all kinds, posed in many different kinds of ways, are likely to be the source of teaching information for real human learners.
Machine translation might seem to offer one solution to this problem, but may not really require sufficient attention to meaning.
Other approaches are definitely being explored, including various kinds of question-answering systems.
What approaches do you think might be interesting?

Google’s Neural Machine Translation System

Which is the original, and which is the result of E –> J –> E translation?
Kilimanjaro is a mountain of 19,710 feet covered with snow and is said to be the highest mountain in Africa. The summit of the west is called "Ngaje Ngai" in Masai, the house of God. Near the top of the west there is a dry and frozen dead body of a leopard. No one has ever explained what the leopard wanted at that altitude.
Kilimanjaro is a snow-covered mountain 19,710 feet high, and is said to be the highest mountain in Africa. Its western summit is called the Masai "Ngaje Ngai," the House of God. Close to the western summit there is the dried and frozen carcass of a leopard. No one has explained what the leopard was seeking at that altitude.
I added two articles to the translation.

Ideas in the GNMT system
Sequence-to-sequence model: blissfully uncommitted to any structure whatsoever.
Attention and bi-directionality: is some structure sneaking in?
Words seem to have a special status, but in written text, words do appear to have external reality. They still pose problems, however.

Sequence-to-Sequence Model of Sutskever, Vinyals & Le
Some details: four stacked LSTMs; different LSTMs on the encoding and decoding side.
A Sentence Gestalt-like representation.
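A toy sketch of the sequence-to-sequence idea, assuming PyTorch and placeholder vocabulary sizes and dimensions; it is not the Sutskever et al. implementation, but it shows the separate stacked encoder and decoder LSTMs and the fixed-size encoding that plays a Sentence Gestalt-like role.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Toy encoder-decoder: separate stacked LSTMs on the encoding and decoding
    side; the encoder's final hidden state is a fixed-size summary of the
    source sentence that conditions the decoder."""
    def __init__(self, src_vocab=1000, tgt_vocab=1000, d=256, layers=4):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d)
        self.tgt_emb = nn.Embedding(tgt_vocab, d)
        self.encoder = nn.LSTM(d, d, num_layers=layers, batch_first=True)
        self.decoder = nn.LSTM(d, d, num_layers=layers, batch_first=True)
        self.readout = nn.Linear(d, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        _, state = self.encoder(self.src_emb(src_ids))   # whole source -> (h, c)
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.readout(dec_out)                     # next-word logits

model = Seq2Seq()
src = torch.randint(0, 1000, (1, 7))   # placeholder source token ids
tgt = torch.randint(0, 1000, (1, 5))   # placeholder target prefix
logits = model(src, tgt)               # shape: (1, 5, 1000)
```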

The Wu et al. GNMT model: attention, bi-directionality, and skip connections
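To illustrate the attention idea named on this slide – letting the decoder consult all encoder states rather than a single summary vector – here is a minimal dot-product attention sketch. Wu et al.'s system actually uses a separate (additive) attention network, so this is illustrative only; the bidirectional encoder states are stand-in random tensors.

```python
import torch
import torch.nn.functional as F

def dot_product_attention(decoder_state, encoder_states):
    """decoder_state: (batch, d); encoder_states: (batch, time, d).
    Returns a context vector that mixes encoder states by relevance, so the
    decoder is not forced to rely on one fixed-size sentence summary."""
    scores = torch.bmm(encoder_states, decoder_state.unsqueeze(-1)).squeeze(-1)  # (batch, time)
    weights = F.softmax(scores, dim=-1)                                          # attention weights
    context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)         # (batch, d)
    return context, weights

# Example with hypothetical bidirectional encoder states (forward and backward
# passes assumed already combined into d-dimensional vectors):
enc = torch.randn(1, 7, 256)     # 7 source positions
dec = torch.randn(1, 256)        # current decoder state
context, weights = dot_product_attention(dec, enc)
```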

A Success and a Failure
Does well with this kind of case:
The hen that chased the dog was too shaggy
La gallina que perseguia al perro era demasiado peluda
But not this kind:
The trombone did not fit in the suitcase because it was too small
El trombon no cabia en la maleta porque era demasiado pequeno
This seems to indicate it is not fully processing the semantics – meaning is necessary to determine the correct referent here (but more work is needed…).
The best system may ultimately have to be responsible for understanding the meaning of the sentence, rather than just producing translations.

The N400 component of the ERP
What is the N400? "A temporally delimited electrical snapshot of the intersection of a feedforward flow of stimulus-driven activity with a state of the distributed, dynamically active neural landscape that is semantic memory"
Tanya asks: What does this mean???

The N400 component of the ERP
Our account: change of a representation of meaning that implicitly and probabilistically represents all aspects of the meaning of the event described by a sentence.

The N400 component of the ERP
Modulating variables: semantic violations, contextual fit, frequency, ...
Meaning processing – functional basis?
- Lexical access? (Lau et al., 2008)
- Semantic inhibition? (Debruille, 2007)
- Semantic integration? (Baggio & Hagoort, 2011)

N400 correlate (figure): high-probability, low-probability, and semantic-violation conditions

Model environment (results based on 10 runs, each trained on 800,000 sentences)

Experimental manipulation | N400 (empirical) | Semantic update (model)
Semantic congruency | incongruent > congruent | ✓
Cloze probability | low > high | ✓
Position in sentence | early > late | ✓
Categorical relation of incongruent completion | incongruent unrelated > incongruent related | ✓
Cloze probability of indefinite articles | low > high | ✓
Semantic illusion | congruent = illusion < incongruent | ✓
Repetition | first presentation > repetition | ✓
Associative priming | unrelated > related | ✓
Semantic priming | unrelated > related | ✓
Lexical frequency | high < low | ✓
Priming during chance performance | unrelated > related | ✓
Development | very young < young > old | ✓
Semantic congruency x repetition interaction | – | ✓

Semantic congruity
N400 data: incongruent > congruent (Kutas & Hillyard, 1980)
Simulation (with 10 sentences/condition), e.g.:
"The girl eats the toast..."
"The girl eats the email..."

Semantic congruity: incongruent > congruent

Lexical frequency
N400 data: low frequency > high frequency (Van Petten & Kutas, 1990)
Simulation: 10 higher-frequency words, 10 lower-frequency words
Before: blank stimulus → default activation

Lexical frequency: Low > high

Some of your ideas
Noah: How does all of this relate to Lieder et al.'s work on Bayesian models of mismatch negativity?
Rafael: Maybe the N400 reflects gating into an LSTM-like representation
Sonja: Isn't there a difference between surprising words and unknown words?

Semantic priming
N400 data: related < unrelated (Bentin et al., 1985)
Simulation (10 pairs/condition), e.g.:
Semantically related: pine – oak
Semantically unrelated: chess – oak

Semantic priming: related < unrelated

Semantic congruity X repetition
N400 data: incongruent > congruent sentence completions (Kutas & Hillyard, 1980); 1st presentation > (delayed) repetition; incongruent (1st – repeated) > congruent (1st – repeated) (Besson et al., 1992)
Simulation: all congruent and incongruent sentences repeated after the first round of presentations.

Semantic congruity X repetition
Repetition effects as consequences of connection-weight adaptations (McClelland & Rumelhart, 1985) → learning is operative during the first presentation.
BUT: training assumes adaptation based on observed events – not true in the experiment, and people can learn based on hearing or reading alone! → Learning signal?
[Diagram fragment: update network – Input (74), Hidden 1 (100), Sentence Gestalt (100)]

Learning signal?
Frequent assumption: learning based on prediction error (Friston, 2005; McClelland, 1994; Schultz et al., 1997)
Lukas: How does all this relate to Friston's ideas?
SG activation: an implicit prediction of the semantic features of the event.
The change of activation induced by the next word corresponds to a prediction error (as far as revealed by that word).
Temporal difference learning approach: the error signal for word n is SG(n+1) – SG(n).
The summed magnitude of the error signal corresponds to the N400 correlate!
Riley: I wonder about the learning dynamics of the proposed rule
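A small sketch, using random placeholder vectors in place of a trained model's Sentence Gestalt states, of the two quantities just defined: the temporal-difference-style error signal SG(n+1) – SG(n) and the modeled N400 correlate as the summed magnitude of the word-induced change in SG activation.

```python
import numpy as np

# Stand-in Sentence Gestalt activation vectors after each word of a sentence;
# a trained model would supply these.
rng = np.random.default_rng(0)
sg_states = [rng.random(100) for _ in range(4)]        # SG(1) ... SG(4)

# Temporal-difference-style error signal for word n: SG(n+1) - SG(n),
# i.e. the next state serves as the training target for the current one.
td_errors = [sg_states[n + 1] - sg_states[n] for n in range(len(sg_states) - 1)]

# Modeled N400 correlate per word: summed magnitude of the change in SG
# activation that the word induces.
n400_per_word = [float(np.sum(np.abs(err))) for err in td_errors]
print(n400_per_word)
```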

Simulation results

"Semantic illusions" / reversal anomalies
N400 data: "Every morning at breakfast, the eggs would eat…" ≤ "Every morning at breakfast, the boys would eat…" < "Every morning at breakfast, the boys would plant…" (Kuperberg et al., 2003)
→ N400: word meaning, not sentence meaning? (Brouwer, Fitz, & Hoeks, 2012; Brouwer et al., in press)
Simulation (8 sentences/condition):
"At breakfast, the egg eats…"
"At breakfast, the boy eats…"
"At breakfast, the boy plants…"
10% passive sentences during training

"Semantic illusions": incongruent > reversal anomaly ≥ congruent
Hillary: What's causing this?

"At breakfast, the egg eats…"
Language processing can be shallow (Ferreira et al., 2002; Sanford & Sturt, 2002)
P600 increase: error attributed to sentence structure? Re-analysis of the sentence? P600 as an instance of P3 (surprise & update in working memory)?
The N400 in reversal anomalies is consistent with the N400 as an update of the representation of sentence meaning.

More challenging reversal anomaly case
'De vos op de stroper joeg' ('The fox on the poacher hunted')
'De zieke in de chirurg sneede…' ('The patient into the surgeon cut')

Development
N400 data: increase with comprehension skills in babies (Friedrich et al., 2009); later, decrease with age from childhood through adulthood (Atchley et al., 2006; Kutas & Iragui, 1998)
Simulation: influences of semantic congruity at different points in training (with 10 sentences/condition), e.g.:
"The girl eats the toast."
"The girl eats the email."

Development: very young < young > old
More efficient connections → small changes at the SG layer are sufficient to produce big changes in output activation (~ decreased activation with increased practice).
The N400 does not directly reflect a change in explicit estimates of feature probabilities, but rather the change of an internal representation that implicitly represents these probabilities such that they can be made explicit when queried.

Some further thoughts
Rafael: Maybe the N400 reflects gating into an LSTM-like representation
Lucas: How does all this relate to Karl Friston's ideas?
Noah: