Better Word Representations with Recursive Neural Networks


Uploaded On 2017-03-22




Presentation Transcript

Slide 1

Better Word Representations with Recursive Neural Networks for Morphology

Thang Luong
Joint work with Richard Socher and Christopher D. Manning

Slide 2

Word frequencies in Wikipedia documents (986 million tokens)

Slide 3

And more … indistinctly, non-distinct, indistinctive, non-distinctive, indistinctness, distincted, semi-distinct, contra-distinction, indistinction, contradistinctive

Problem: these words are independent entities!

Word frequencies in Wikipedia documents (986 million tokens)

Slide 4

Vector-space word representations

Very successful in recent years for NLP tasks

Nearest neighbors:

distinct
  (Collobert & Weston, 2010): different distinctive broader narrower
  (Huang et al., 2012): unique broad distinctive separate

distinctness
  (Collobert & Weston, 2010): morphologies pesawat clefts pathologies
  (Huang et al., 2012): companion roskam hitoshi enjoyed

affect
  (Collobert & Weston, 2010): exacerbate impacts characterize outweigh
  (Huang et al., 2012): allow prevent involve enable

unaffected
  (Collobert & Weston, 2010): unnoticed dwarfed mitigated overwhelmed
  (Huang et al., 2012): monti sheaths krystal south-southeast

Slide 5

Vector-space word representations

Very successful in recent years for NLP tasks

Problem: poorly estimate rare and complex words

Slide 6

Vector-space word representations

Very successful in recent years for NLP tasks

Problem: poorly estimate rare and complex words

Goal: capture both syntactics (word structure) and semantics

distinct
  (Collobert & Weston, 2010): different distinctive broader narrower
  (Huang et al., 2012): unique broad distinctive separate
  This work: divergent diverse distinctive homogeneous

distinctness
  (Collobert & Weston, 2010): morphologies pesawat clefts pathologies
  (Huang et al., 2012): companion roskam hitoshi enjoyed
  This work: distinctiveness smallness largeness exactness

affect
  (Collobert & Weston, 2010): exacerbate impacts characterize outweigh
  (Huang et al., 2012): allow prevent involve enable
  This work: decrease arise complicate extend

unaffected
  (Collobert & Weston, 2010): unnoticed dwarfed mitigated overwhelmed
  (Huang et al., 2012): monti sheaths krystal south-southeast
  This work: disaffected undisputed unopposed unrestricted

Slide 7

Our approach

Neural Language Model + Morphology Model: compute vector representations for complex words on the fly.

[Figure: the sentence "unfortunately the bank was closed"; the Neural Language Model captures semantics from context, the Morphology Model captures syntactics from word structure.]

Slide 8

Our approach – network structure

[Figure: "unfortunately the bank was closed", with morphemes composed recursively: un + fortunate → unfortunate; unfortunate + ly → unfortunately; close + d → closed.]

Neural Language Model: simple feed-forward network (Huang et al., 2012) with ranking-type cost (Collobert et al., 2011).

Morphology Model: recursive neural network (Socher et al., 2011).
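The recursive composition the morphology model performs can be sketched in a few lines. This is a minimal illustration, not the trained model: the dimensionality, the random toy embeddings, and the single shared weight matrix are stand-in assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # toy embedding dimension; real word vectors are much larger

# Toy morpheme embeddings; in the actual model these are learned.
emb = {m: rng.normal(size=d) for m in ["un", "fortunate", "ly"]}

# Shared composition parameters, applied at every merge.
W_m = 0.1 * rng.normal(size=(d, 2 * d))
b_m = np.zeros(d)

def compose(child, affix):
    """One recursive step: parent = tanh(W_m [child; affix] + b_m)."""
    return np.tanh(W_m @ np.concatenate([child, affix]) + b_m)

# un + fortunate -> unfortunate, then unfortunate + ly -> unfortunately
v_unfortunate = compose(emb["fortunate"], emb["un"])
v_unfortunately = compose(v_unfortunate, emb["ly"])
print(v_unfortunately.shape)  # (4,)
```

Because the same (W_m, b_m) pair is reused at every merge, the model can build a vector for any morpheme sequence, including words never seen in training.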

Slide 9

Unsupervised Morphological Structures

Our morphoRNN assumes words of the form pre* stm suf*:
- Morphological segmentations (pre* stm suf*)+ provided by Morfessor (Creutz & Lagus, '07).
- Post-process words, including hyphenated ones.
- Learn meanings of unconventional morphemes:
  - the prefix al in Arabic names: al-jazeera, al-salem
  - the suffix related in compound adjectives: health-related, government-related

Slide 10

Experiments: Word Similarity Task

Word similarity datasets rate word pairs, e.g. king – queen: 8.58, king – cabbage: 0.23.

Datasets:
- WS-353 (Finkelstein et al., '02)
- MC (Miller & Charles, '91)
- RG (Rubenstein & Goodenough, '65)
- SCWS* (Huang et al., '12)
- A new rare word (RW) dataset:

  word1          word2
  untracked      inaccessible
  unflagging     constant
  unprecedented  new
  apocalyptical  prophetic
  organismal     system
  diagonal       line
  obtainment     acquiring
  discernment    knowing
  confinement    restraint

Metric: correlation between similarity scores given by our models and those assigned by human raters.
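Word-similarity benchmarks conventionally report Spearman's rank correlation between model scores and human ratings. A dependency-free sketch of that metric; the similarity numbers below are made up for illustration:

```python
def ranks(xs):
    """1-based ranks, averaging ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1  # average rank of the tie block
        i = j + 1
    return out

def spearman(a, b):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)

human = [8.58, 0.23, 7.10, 3.20]   # e.g. king-queen 8.58, king-cabbage 0.23
model = [0.81, 0.05, 0.77, 0.30]   # cosine similarities from a model (toy)
print(round(spearman(human, model), 6))  # 1.0: model ranks pairs like humans
```

Rank correlation only cares about the ordering of the pairs, which is why model scores on an arbitrary scale can be compared with human ratings on a 0-10 scale.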

Slide 11

Results

Start training from either:
- HSMN embedding (Huang et al., 2012)
- C&W embedding (Collobert et al., 2011)

Slide 12

Results

Slide 13

Conclusions

Learned syntactic-semantic word vectors by combining:
- RNN: models morphological structures of words.
- NLM: learns semantics from the surrounding contexts.

Introduced a new dataset of rare words.

Future: apply to morphologically complex languages or to other English domains such as bio-NLP.

Nearest neighbors:

commenting
  (Collobert et al., 2011): insisting insisted focusing hinted
  This work: commented interviewing comments

unaffected
  (Collobert et al., 2011): unnoticed dwarfed mitigated
  This work: disaffected undisputed unopposed

heartlessness
  (Collobert et al., 2011): corruptive inhumanity ineffectual overawed

saudi-owned
  (Collobert et al., 2011): avatar mohajir kripalani
  This work: saudi-based syrian-controlled syrian-backed

Slide 14

Outline

- More details (context-sensitive model)
- Context-insensitive model (the first thing we tried)
- Rare word dataset
- More results & analysis

Slide 15

Context-sensitive model

- Feed-forward network
- Recursive neural network (RNN): parameter sharing
- Objective: ranking-type cost, e.g. s(cat chills on a mat) > s(cat chills on a Sofia)
- Learning: back-propagation

[Figure: "unfortunately the bank was closed", with morphemes composed recursively: un + fortunate → unfortunate; unfortunate + ly → unfortunately; close + d → closed.]
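The ranking-type cost is a margin hinge (following Collobert et al., 2011): the score of an observed n-gram must beat the score of a corrupted one by at least a margin. A minimal sketch, with hand-picked scores for illustration:

```python
def hinge_rank_loss(score_true, score_corrupt, margin=1.0):
    """Ranking cost: zero once s(true) > s(corrupt) + margin."""
    return max(0.0, margin - score_true + score_corrupt)

# s(cat chills on a mat) should beat s(cat chills on a Sofia)
print(hinge_rank_loss(2.5, 0.3))  # 0.0: already separated by the margin
print(hinge_rank_loss(0.4, 0.3))  # ~0.9: loss pushes the two scores apart
```

During training the corrupted n-gram is typically produced by swapping the target word for a random vocabulary word, and back-propagation only fires when the hinge is active.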

Slide 16

Context-insensitive model

- Objective: squared Euclidean distance between the newly constructed and the reference representations
- Problem: strongly biased towards syntactic agreement
- Similar to compositional distributional semantic models (Lazaridou et al., ACL'13)
- No recursive composition: only an affix and a stem

[Figure: unfortunately built from un + fortunate + ly.]
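The context-insensitive objective reduces to a squared Euclidean distance between the vector the morphology model constructs and the word's reference embedding. A minimal sketch with made-up 3-dimensional vectors:

```python
import numpy as np

def insensitive_loss(constructed, reference):
    """Squared Euclidean distance ||p_w - x_w||^2 between the
    morphologically constructed vector and the reference embedding."""
    diff = constructed - reference
    return float(diff @ diff)

built = np.array([0.2, -0.1, 0.4])   # vector built from morphemes (toy)
ref   = np.array([0.0,  0.1, 0.5])   # reference embedding of the same word
print(round(insensitive_loss(built, ref), 6))  # 0.09
```

Because the target is a fixed per-word embedding rather than a context score, the training signal rewards matching word structure, which is the syntactic bias the slide describes.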

Slide 17

Rare Word Datasets

Datasets:
- WS-353 (Finkelstein et al., '02)
- MC (Miller & Charles, '91)
- RG (Rubenstein & Goodenough, '65)
- SCWS* (Huang et al., '12)
- A new rare word (RW) dataset

Slide 18

Rare Word Dataset Construction

Select rare words: grouped by affixes and frequencies. Each word has a non-zero number of synsets in WordNet.

Affix   [6, 10]                               [11, 100]                              [101, 1000]
un-     untracked unrolls undissolved         unrehearsed unflagging unfavourable    unprecedented unmarried uncomfortable
-al     apocalyptical traversals bestowals    acoustical extensional organismal      directional diagonal spherical
-ment   obtainment acquirement retrenchments  discernment revetment rearrangements   confinement establishment management

Form pairs: for each rare word, first select a synset, then select hypernyms, hyponyms, meronyms, and attributes.

Collect human judgments: Amazon Mechanical Turk; 3145 pairs rated by 10 people (US native speakers), 2034 pairs accepted.

word1          word2
untracked      inaccessible
unflagging     constant
unprecedented  new
apocalyptical  prophetic
organismal     system
diagonal       line
obtainment     acquiring
discernment    knowing
confinement    restraint
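The selection step above can be sketched as grouping candidate words by affix and corpus-frequency bin. The affix matcher and the frequency counts below are hypothetical stand-ins for the real pipeline; only the bin boundaries come from the slide:

```python
# Frequency bins from the slide; affix rules are illustrative assumptions.
BINS = [(6, 10), (11, 100), (101, 1000)]

def affix_of(word):
    if word.startswith("un"):
        return "un-"
    if word.endswith("ment"):
        return "-ment"
    if word.endswith("al"):
        return "-al"
    return None

def bin_of(freq):
    for lo, hi in BINS:
        if lo <= freq <= hi:
            return (lo, hi)
    return None

# Toy corpus frequencies for a few candidate words.
freqs = {"untracked": 7, "unflagging": 40, "obtainment": 9, "diagonal": 500}

groups = {}
for w, f in freqs.items():
    a, b = affix_of(w), bin_of(f)
    if a and b:
        groups.setdefault((a, b), []).append(w)

print(groups[("un-", (6, 10))])  # ['untracked']
```

Grouping by both affix and frequency bin is what lets the dataset stress-test morphologically complex words at several levels of rarity at once.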

Slide 19

Results

Slide 20

Analysis

Context-insensitive model: enforces structural agreement well, but ignores semantics.

Context-sensitive model: blends syntactic (word structure) and semantic information well.

Nearest neighbors:

commenting
  C&W: insisting insisted focusing hinted
  C&W + context-insensitive: republishing accounting expounding
  C&W + context-sensitive: commented interviewing comments

affected
  C&W: caused plagued impacted damaged
  C&W + context-insensitive: disaffected unaffected mitigated disturbed
  C&W + context-sensitive: affecting extended extending constrained

unaffected
  C&W: unnoticed dwarfed mitigated
  C&W + context-insensitive: disaffected unconstrained uninhibited
  C&W + context-sensitive: disaffected undisputed unopposed

heartlessness
  C&W: corruptive inhumanity ineffectual overawed
  C&W + context-insensitive: fearlessness vindictiveness restlessness

saudi-owned
  C&W: avatar mohajir kripalani
  C&W + context-insensitive: saudi-based somaliland al-jaber
  C&W + context-sensitive: saudi-based syrian-controlled syrian-backed

Slide 21

Possible extensions

- Bio-NLP domain: complicated but logical taxonomy
  - alpha-adrenergic, alpha-mpt, alpha-mpt-treated
  - dihydroxyphenylaline, dihydroxyphenylserine
- Extend the model from pre* stm suf* to (pre* stm suf*)+.
- Jointly learn morpheme vectors and segmentations:
  - Bad segmentations: disc+over, under+stand
  - Perhaps a fast version of the split-merge heuristic in grammatical induction?

Thank you!

Slide 22

Bilingual Word Representations

Objective: for each alignment, sum
- LM_English: score(unfortunately the bank was closed)
- LM_French: score(malheureusement la banque a été fermée)

Alignment: weigh alignment constraints by alignment probabilities. Assume alignments are perfect for now?

[Figure: word-aligned sentence pair "unfortunately the bank was closed" / "malheureusement la banque a été fermée".]
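The slide leaves the combined objective unspecified. One plausible reading, an assumption rather than the authors' formula, sums the two monolingual LM scores and subtracts alignment penalties, here the squared distance between aligned word vectors weighted by the alignment probability:

```python
import numpy as np

def bilingual_objective(score_en, score_fr, alignments):
    """Sum of the two monolingual LM scores minus alignment penalties.
    Each penalty is the squared distance between aligned word vectors,
    weighted by its alignment probability (penalty form is an assumption)."""
    penalty = sum(p * float((e - f) @ (e - f)) for p, e, f in alignments)
    return score_en + score_fr - penalty

# One alignment: "bank" <-> "banque" with probability 0.9 (toy numbers).
e_bank = np.array([0.3, 0.1])
f_banque = np.array([0.2, 0.2])
print(round(bilingual_objective(1.5, 1.2, [(0.9, e_bank, f_banque)]), 3))  # 2.682
```

Weighting by alignment probability softens the constraint exactly where the aligner is unsure, which is why "assume alignments are perfect" is flagged as a temporary simplification.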

Slide 23

Related Work

Factored NLM (Alexandrescu & Kirchhoff, HLT'06): each word is a vector of features (factors) such as stems, morphological tags, and cases. Concatenates the factor vectors; no composition.

Compositional distributional semantic models (Lazaridou et al., ACL'13): similar to our context-insensitive model. No recursive composition: only an affix and a stem.

Slide 24

Unsupervised Morphological Structures

Utilize a morphological segmentation toolkit: Morfessor (Creutz & Lagus, 2007), which outputs (pre* stm suf*)+ by recursively splitting words under an MDL-inspired objective.

We want input of the form pre* stm suf*:
(1) Restrict segmentations to pre* stm{1,2} suf*.
(2) Split hyphenated words A-B as A_stm B_stm.
(3) Decide the main stem in pre* A_stm B_stm suf*.
(4) Reject a segmentation if it contains an affix or an unknown stem (not a word by itself) whose type count is below a predefined threshold.

Discover more interesting morphemes:
- the prefix al in Arabic names: al-jazeera, al-salem
- the suffix related in compound adjectives: health-related, government-related
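Rules (1) and (4) can be sketched as a filter over tagged segmentations. The tag names, type counts, threshold value, and toy vocabulary below are assumptions for illustration, not the paper's actual settings:

```python
# Hypothetical post-processing filter for Morfessor-style segmentations.
MIN_TYPE_COUNT = 3  # "predefined threshold" from the slide (value assumed)

type_count = {"un": 100, "fortunate": 50, "ly": 200, "disc": 1, "over": 80}
vocab = {"fortunate", "over"}  # stems must be words by themselves

def accept(segmentation):
    """segmentation: list of (morpheme, tag), tag in {'pre', 'stm', 'suf'}."""
    stems = [m for m, t in segmentation if t == "stm"]
    if not 1 <= len(stems) <= 2:                      # rule (1): pre* stm{1,2} suf*
        return False
    for m, t in segmentation:
        if type_count.get(m, 0) < MIN_TYPE_COUNT:     # rule (4): rare morpheme
            return False
        if t == "stm" and m not in vocab:             # rule (4): unknown stem
            return False
    return True

print(accept([("un", "pre"), ("fortunate", "stm"), ("ly", "suf")]))  # True
print(accept([("disc", "stm"), ("over", "stm")]))  # False: 'disc' is rare/unknown
```

A rejected word simply keeps its whole-word embedding, so the filter only prunes segmentations like disc+over rather than discarding the word itself.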