Neural Networks and Language Understanding: Do we need to rely on predetermined structured representations to understand and use language? Psychology 209 – 2019 February 21, 2019
The Fodor / Chomsky Vision
We can understand and process sentences we've never heard before because:
- We use a system of structure-sensitive rules
- That processes sentences according to their structure and
- Composes meaning as an assemblage of parts whose meanings are already known
Fodor's example:
- The man loves the woman
- The woman loves the man
He claims, among other things, that the meaning of the word 'loves' contributes the same thing to the meaning of the overall sentence in both cases.
Some sentences that pose problems for this view
- John loves Mary / Mary loves John
- John loves ice cream
- She felt the baby kick
- John poured coffee into the container
- Jill put apples into the container
- I like going to the movies with friends
- I like eating spaghetti with meatballs
- I like eating Chinese food with chopsticks
- I saw the sheep grazing in the field
- I saw the Grand Canyon flying to New York
An alternative perspective
- The intention of a speaker is to convey information about a situation or event
- Words, and the order in which they come, are clues to aspects of meaning, but clues to any aspect of meaning can come from anywhere in the sentence
- A commitment to structure just gets in the way – a learned distributed representation that discovers how to capture sentence meaning is the best solution to the problem
Two Models
- The Sentence Gestalt model (St. John & McClelland, 1990; Rabovsky, Hansen, & McClelland, 2016)
- The Google Neural Machine Translation system (Wu et al., 2016; Gideon Lewis-Kraus, "The Great AI Awakening")
The Sentence Gestalt Model – theory (based on McClelland, St. John, & Taraban, 1989; St. John & McClelland, 1990)
- Words act as "cues to meaning" (Rumelhart, 1979) that change the representation of sentence meaning (corresponding to a pattern of neural activity, modeled in an artificial neural network)
- The activation state implicitly represents subjective probability distributions over the semantic features of the event described by a sentence
- No assumption of a specific format for the internal representation of sentences: the representation is not directly trained, but instead is used as the basis for responding to probes (e.g., answering questions about the described event). Feedback comes only on responses to probes
- Word-by-word update of a probabilistic representation of meaning, with the goal of maximizing the agreement between the true probability of each possible answer to each possible question and the estimates the network makes given the words seen so far
Simplified implemented model
- Uses a simple generative model of events and sentences that describe them
- Uses a simplified set of queries (thematic roles) to constrain learning in the model
- Other versions of the query model are possible, including:
  - Accounting for information derived from other sources about an event or situation
  - Questions we might be responsible for answering based on the expectations of others
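To make the generative environment idea concrete, here is a minimal sketch of what such an event-and-sentence generator might look like. Everything here is hypothetical: the role inventory, the tiny vocabulary, and the single active-voice template are stand-ins, not the corpus St. John & McClelland actually used.

```python
import random

# Hypothetical miniature event world: each event is a set of role-filler
# pairs, and a simple template turns the event into a word sequence.
AGENTS = ["man", "woman", "boy", "girl"]
ACTIONS = {"play": ["chess", "piano"], "eat": ["toast", "soup"]}

def sample_event(rng):
    """Sample role-filler pairs for one event."""
    action = rng.choice(list(ACTIONS))
    return {"agent": rng.choice(AGENTS),
            "action": action,
            "patient": rng.choice(ACTIONS[action])}

def render_sentence(event):
    """Describe the event with a simple active-voice template."""
    return ["the", event["agent"], event["action"] + "s", event["patient"]]

rng = random.Random(0)
event = sample_event(rng)
sentence = render_sentence(event)
```

Training pairs for the model are then (sentence, event): the network hears the word sequence and is probed against the role-filler pairs.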
The implemented architecture:
- update network: Input (74) → Hidden 1 (100) → Sentence Gestalt (100)
- query network: Sentence Gestalt (100) + Probe (176) → Hidden 2 (100) → Output (176), compared to Target (176)
Sentence: "The man plays chess."
Event (role-filler pairs): agent: man, action: play, patient: chess
Learning
- The model is probed for all aspects of the meaning of the event after every word
- The language learner observes an event and hears a sentence about it; learning is based on comparing the comprehension output (the anticipation of sentence meaning) with the event
- At the minimum of the cross-entropy error, the activation of each feature unit corresponds to the conditional probability of that feature in that situation (Rumelhart et al., 1995)
- In an ideally trained model, the change in activation at the SG layer induced by each incoming word would support accurate updating of the probabilities of the semantic features 'cued' by that word
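The two-network forward pass and the cross-entropy objective can be sketched in numpy. This is a sketch only: the layer sizes follow the slide, but the weights are random stand-ins for learned ones, and the word, probe, and target vectors are placeholder random codes rather than the model's actual localist/feature encodings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the slide: input 74, hidden 100, SG 100,
# probe 176, hidden 2 100, output 176.
N_IN, N_H1, N_SG, N_PROBE, N_H2, N_OUT = 74, 100, 100, 176, 100, 176

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Random weights stand in for learned ones (sketch only).
W_in  = rng.normal(0, 0.1, (N_IN + N_SG, N_H1))    # word + previous SG -> hidden 1
W_sg  = rng.normal(0, 0.1, (N_H1, N_SG))           # hidden 1 -> new SG
W_q   = rng.normal(0, 0.1, (N_SG + N_PROBE, N_H2)) # SG + probe -> hidden 2
W_out = rng.normal(0, 0.1, (N_H2, N_OUT))          # hidden 2 -> feature probabilities

def update(sg, word_vec):
    """Update network: fold the next word into the Sentence Gestalt."""
    h1 = sigmoid(np.concatenate([word_vec, sg]) @ W_in)
    return sigmoid(h1 @ W_sg)

def query(sg, probe_vec):
    """Query network: read out feature probabilities for one probe."""
    h2 = sigmoid(np.concatenate([sg, probe_vec]) @ W_q)
    return sigmoid(h2 @ W_out)

def cross_entropy(p_hat, target):
    """Summed cross-entropy between output activations and true features."""
    return -np.sum(target * np.log(p_hat) + (1 - target) * np.log(1 - p_hat))

sg = np.zeros(N_SG)
for _ in range(3):                       # e.g. "the", "man", "plays"
    sg = update(sg, rng.random(N_IN))    # placeholder word vectors
p_hat = query(sg, rng.random(N_PROBE))   # placeholder probe vector
loss = cross_entropy(p_hat, (rng.random(N_OUT) > 0.5).astype(float))
```

Note that only the query network's output receives a target; the SG layer itself is never supervised directly, which is the sense in which no representational format is assumed.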
(Animation frames: the same architecture – Input (74) → Hidden 1 (100) → Sentence Gestalt (100) in the update network; Sentence Gestalt + Probe (176) → Hidden 2 (100) → Output (176) in the query network – is shown as each word of "The man plays chess." is presented and probed.)
After "The man": probes Agent?, Action?, Patient?
After "plays": probes Action?, Patient?
After "chess": probe Patient?
St. John & McClelland Corpus and Results
- Sentences could be active or passive; constituents could be vaguely identified or left out if strongly implied
- The model could use word order and meaning as well as syntactic markers, capturing event constraints and using context to disambiguate
- The assessment of each participant depended on all the words in the sentence (next slide)
Changing interpretations of role fillers as a sentence unfolds
Limitations and Alternatives
- The query language appears to build in a commitment to structure
- I see this as a limitation – queries of all kinds, posed in many different ways, are likely to be the source of teaching information for real human learners
- Machine translation might seem to offer one solution to this problem, but may not really require sufficient attention to meaning
- Other approaches are definitely being explored, including various kinds of question-answering systems
- What approaches do you think might be interesting?
Google’s Neural Machine Translation System
Which is the original, and which is the result of E → J → E translation?
(1) Kilimanjaro is a mountain of 19,710 feet covered with snow and is said to be the highest mountain in Africa. The summit of the west is called "Ngaje Ngai" in Masai, the house of God. Near the top of the west there is a dry and frozen dead body of a leopard. No one has ever explained what the leopard wanted at that altitude.
(2) Kilimanjaro is a snow-covered mountain 19,710 feet high, and is said to be the highest mountain in Africa. Its western summit is called the Masai "Ngaje Ngai," the House of God. Close to the western summit there is the dried and frozen carcass of a leopard. No one has explained what the leopard was seeking at that altitude.
I added two articles to the translation.
Ideas in the GNMT system
- Sequence-to-sequence model: blissfully uncommitted to any structure whatsoever
- Attention and bi-directionality
- Is some structure sneaking in? Words seem to have a special status, but in written text, words do appear to have external reality. They still pose problems, however.
Sequence-to-Sequence Model of Sutskever, Vinyals & Le
- Some details: four stacked LSTMs; different LSTMs on the encoding and decoding sides
- A Sentence Gestalt-like representation
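The core idea – compress the whole source sentence into one fixed vector, then generate the target from it – can be sketched in a few lines. Here a plain tanh RNN stands in for the paper's stacked LSTMs, and all sizes, weights, and token IDs are hypothetical; this shows the information flow, not the trained system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes; a single plain RNN stands in for the stacked LSTMs (sketch only).
VOCAB, EMB, HID = 12, 8, 16

E      = rng.normal(0, 0.1, (VOCAB, EMB))      # shared embedding table
W_enc  = rng.normal(0, 0.1, (EMB + HID, HID))  # encoder recurrence
W_dec  = rng.normal(0, 0.1, (EMB + HID, HID))  # decoder recurrence
W_read = rng.normal(0, 0.1, (HID, VOCAB))      # decoder readout over vocabulary

def encode(tokens):
    """Compress the source sentence into one fixed vector (the 'gestalt')."""
    h = np.zeros(HID)
    for t in tokens:
        h = np.tanh(np.concatenate([E[t], h]) @ W_enc)
    return h

def decode(h, max_len, bos=0, eos=1):
    """Greedy decoding seeded by the encoder's final state."""
    out, tok = [], bos
    for _ in range(max_len):
        h = np.tanh(np.concatenate([E[tok], h]) @ W_dec)
        tok = int(np.argmax(h @ W_read))
        if tok == eos:
            break
        out.append(tok)
    return out

translation = decode(encode([3, 5, 7, 2]), max_len=6)
```

The single encoder state is exactly the bottleneck that attention (next slide) relaxes: instead of one vector, the decoder gets to look back at every encoder state.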
The Wu et al. GNMT model: Attention, Bi-directionality, and Skip-connections
A Success and a Failure
- Does well with this kind of case:
  The hen that chased the dog was too shaggy
  La gallina que perseguía al perro era demasiado peluda
- But not this kind:
  The trombone did not fit in the suitcase because it was too small
  El trombón no cabía en la maleta porque era demasiado pequeño
- This seems to indicate it is not fully processing the semantics: meaning is necessary to determine the correct referent here (but more work is needed…)
- The best system may ultimately have to be responsible for understanding the meaning of the sentence, rather than just producing translations.
The N400 component of the ERP
What is the N400? "A temporally delimited electrical snapshot of the intersection of a feedforward flow of stimulus-driven activity with a state of the distributed, dynamically active neural landscape that is semantic memory"
Tanya asks: What does this mean???
Our account: the change of a representation of meaning that implicitly and probabilistically represents all aspects of the meaning of the event described by a sentence
Modulating variables: semantic violations, contextual fit, frequency, ...
What is its functional basis in meaning processing?
- Lexical access? (Lau et al., 2008)
- Semantic inhibition? (Debruille, 2007)
- Semantic integration? (Baggio & Hagoort, 2011)
(Figure: N400 correlate for high-probability, low-probability, and semantic-violation completions)
Model environment (results based on 10 runs, each trained on 800,000 sentences)
Experimental manipulation → empirical N400 pattern (matched by the model's semantic update):
- Semantic congruency: incongruent > congruent
- Cloze probability: low > high
- Position in sentence: early > late
- Categorical relation of incongruent completion: incongruent unrelated > incongruent related
- Cloze probability of indefinite articles: low > high
- Semantic illusion: congruent = illusion < incongruent
- Repetition: first presentation > repetition
- Associative priming: unrelated > related
- Semantic priming: unrelated > related
- Lexical frequency: high < low
- Priming during chance performance: unrelated > related
- Development: very young < young > old
- Semantic congruency × repetition interaction
Semantic congruity
- N400 data: incongruent > congruent (Kutas & Hillyard, 1980)
- Simulation (with 10 sentences/condition), e.g.:
  "The girl eats the toast..."
  "The girl eats the email..."
Semantic congruity: incongruent > congruent
Lexical frequency
- N400 data: low frequency > high frequency (Van Petten & Kutas, 1990)
- Simulation: 10 higher-frequency words, 10 lower-frequency words
- Before: blank stimulus (default activation)
Lexical frequency: Low > high
Some of your ideas
- Noah: How does all of this relate to Lieder et al.'s work on Bayesian models of mismatch negativity?
- Rafael: Maybe the N400 reflects gating into an LSTM-like representation
- Sonja: Isn't there a difference between surprising words and unknown words?
Semantic priming
- N400 data: related < unrelated (Bentin et al., 1985)
- Simulation (10 pairs/condition), e.g.:
  Semantically related: pine – oak
  Semantically unrelated: chess – oak
Semantic priming: related < unrelated
Semantic congruity × repetition
- N400 data: incongruent > congruent sentence completions (Kutas & Hillyard, 1980)
- 1st presentation > (delayed) repetition
- Incongruent (1st – repeated) > congruent (1st – repeated) (Besson et al., 1992)
- All congruent and incongruent sentences were repeated after the first round of presentations.
Semantic congruity × repetition
- Repetition effects as consequences of connection-weight adaptation (McClelland & Rumelhart, 1985)
- Learning is operative during the first presentation
- BUT: training assumes adaptation based on observed events – not true in the experiment, and people can learn from hearing or reading alone! What is the learning signal?
(Diagram: update network – Input (74) → Hidden 1 (100) → Sentence Gestalt (100))
Learning signal?
- Frequent assumption: learning based on prediction error (Friston, 2005; McClelland, 1994; Schultz et al., 1997)
- Lukas: How does all this relate to Friston's ideas?
- SG activation: an implicit prediction of the semantic features of the event
- The change in activation induced by the next word corresponds to the prediction error (as far as revealed by that word)
- Temporal-difference learning approach: the error signal for word n is SG_{n+1} – SG_n
- The summed magnitude of the error signal corresponds to the N400 correlate!
- Riley: I wonder about the learning dynamics of the proposed rule
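The proposed N400 correlate is easy to state in code: it is the summed magnitude of the change in the SG activation vector induced by one word. The SG trajectories below are hypothetical random stand-ins (an expected word revises the state a little, an incongruent word a lot), not output from the trained model.

```python
import numpy as np

rng = np.random.default_rng(1)
N_SG = 100  # size of the Sentence Gestalt layer

def sg_change(sg_prev, sg_next):
    """Summed magnitude of the SG update induced by one word:
    the model's N400 correlate, and also its TD-style error signal."""
    return np.sum(np.abs(sg_next - sg_prev))

# Hypothetical SG states before and after the critical word:
# a congruent word moves the state a little, an incongruent word a lot.
sg = rng.random(N_SG)
sg_congruent   = sg + rng.normal(0, 0.01, N_SG)   # small revision
sg_incongruent = sg + rng.normal(0, 0.30, N_SG)   # large revision

n400_cong   = sg_change(sg, sg_congruent)
n400_incong = sg_change(sg, sg_incongruent)
```

Using the same quantity as the learning signal means the N400-sized update at word n+1 is exactly the error that trains the update network's response to word n.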
Simulation results
"Semantic illusions" / reversal anomalies
- N400 data: "Every morning at breakfast, the eggs would eat …" ≤ "Every morning at breakfast, the boys would eat…" < "Every morning at breakfast, the boys would plant…" (Kuperberg et al., 2003)
- N400: word meaning, not sentence meaning? (Brouwer, Fitz, & Hoeks, 2012; Brouwer et al., in press)
- Simulation (8 sentences/condition): "At breakfast, the egg eats…" / "At breakfast, the boy eats…" / "At breakfast, the boy plants…"
- 10% passive sentences during training
"Semantic illusions": incongruent > reversal anomaly ≥ congruent
Hillary: What's causing this?
"At breakfast, the egg eats…"
- Language processing can be shallow (Ferreira et al., 2002; Sanford & Sturt, 2002)
- P600 increase: error attributed to sentence structure? Re-analysis of the sentence? P600 as an instance of the P3 (surprise and updating in working memory)?
- The N400 in reversal anomalies is consistent with the N400 as the update of a representation of sentence meaning
A more challenging reversal-anomaly case
- 'De vos op de stroper joeg' ('The fox on the poacher hunted')
- 'De zieke in de chirurg sneed…' ('The patient into the surgeon cut…')
Development
- N400 data: increase with comprehension skills in babies (Friedrich et al., 2009); later, decrease with age from childhood through adulthood (Atchley et al., 2006; Kutas & Iragui, 1998)
- Simulation: influences of semantic congruity at different points in training (with 10 sentences/condition), e.g. "The girl eats the toast." / "The girl eats the email."
Development: very young < young > old
- More efficient connections: small changes at the SG layer become sufficient to produce big changes in output activation (~ decreased activation with increased practice)
- The N400 does not directly reflect the change in explicit estimates of feature probabilities, but rather the change of an internal representation that implicitly represents these probabilities such that they can be made explicit when queried.
Some further thoughts
- Rafael: Maybe the N400 reflects gating into an LSTM-like representation
- Lucas: How does all this relate to Karl Friston's ideas?
- Noah: