The Galactic Dependencies

Uploaded On 2020-01-16


Presentation Transcript

Slide 1: The Galactic Dependencies Treebanks: Getting More Data by Synthesizing New Languages
Dingquan Wang and Jason Eisner

Slide 2: We created … The Galactic Dependencies Treebanks!
More than 50,000 synthetic languages!
Resemble real languages, but not found on Earth
Each has a corpus of dependency parses
  In the Universal Dependencies format
  Vertices are words labeled with POS tags
  Edges are labeled syntactic relationships
Provide train/dev/test splits, alignments, tools
Why???

Slide 3: Puzzle
Basic word order: SVO or SOV?
Full syntactic grammar and morphology?
How about this one?
jin ave sekke verven anni m'orvikoon

Slide 4: Let’s cheat (for now)
Basic word order: SVO or SOV?
Full syntactic grammar?
How about this one?
jin ave sekke verven anni m'orvikoon
POS tags shown on the slide: AUX PRON VERB ADP PRON PROPN DET PRON PROPN PRON ADP AUX VERB

Slide 5: Help!
Yer/PRON amos/AUX yjja/VERB Ajjx/PROPN aat/ADP orrr/PRON ./PUNCT
Per/NOUN anni/VERB inn/ADP se/NOUN in/PART hahh/CASE wee/VERB ./PUNCT
Con/VERB per/NOUN aat/ADP Ajjx/PROPN “/PUNCT tat/PRON “/PUNCT yue/ADP han/NOUN ./PUNCT
…
?
Grammar rules: S → NP VP, VP → VP PP, … (probabilities shown: 0.9, 0.9, …)
Word order: SVO, …

Slide 6: Grammar Induction
Input: the same POS-tagged corpus (Yer/PRON amos/AUX yjja/VERB Ajjx/PROPN aat/ADP orrr/PRON ./PUNCT …)
Output: grammar rules S → NP VP, VP → VP PP, … (probabilities shown: 0.9, 0.2, …)

Slide 7: Grammar Induction
Unsupervised method (like EM)
Input: the same POS-tagged corpus (Yer/PRON amos/AUX yjja/VERB Ajjx/PROPN aat/ADP orrr/PRON ./PUNCT …)
Output: grammar rules S → NP VP, VP → VP PP, … (probabilities shown: 0.9, 0.2, …)

Slide 8: Grammar Induction
Unsupervised method (like EM)
  Locally optimal
  Hard to harness linguistic knowledge
  Might use the latent variables in the “wrong” way
    Won't follow syntactic conventions used by linguists
    Might not even model syntax, but other things like topic
How about a supervised method?

Slide 9: Grammar Induction: Supervised method
Training examples are (corpus, grammar) pairs, for example:
( You/PRON can/AUX … Keep/VERB Google/PROPN … In/ADP my/PRON office/NOUN … , S → NP VP, VP → VP PP, … with probabilities 0.9, 0.2, … )
A second pair on the slide shows a corpus with only the tags visible (/PRON /AUX … /VERB /PROPN … /ADP /PRON /NOUN …) and the same rules with probabilities 0.1, 0.01, …

Slide 10: Grammar Induction
Unsupervised method (like EM)
  Locally optimal
  Hard to harness linguistic knowledge
  Might use the latent variables in the “wrong” way
    Won't follow syntactic conventions used by linguists
    Might not even model syntax, but other things like topic
How about a supervised method?
  Globally optimal (if objective is convex)
  Allows feature-rich discriminative model
  Imitates what it sees in supervised training data

Slide 11: What’s wrong?
Each supervised training example is a (language, structure) pair.
There are only about 7,000 languages on Earth.
Only about 60 languages on Earth are labeled (have treebanks).
Why Earth?

Slide 12: Luckily
We are not alone

Slide 13: Luckily
Not alone, we are

Slide 14: The Galactic Dependencies Treebanks v1.0
50,000+ synthetic languages
In Universal Dependencies (UD) format

Slide 15: Back to Reality
The languages should be diverse enough.
The languages should be in the galaxy (in-domain)!

Slide 16: Synthetic data elsewhere
Computer Vision: generating more data by rotating, enlarging, …
[Figure: a real image labeled 6, followed by synthetic variants that keep the label 6]

Slide 17: Synthetic data elsewhere
Computer Vision: generating more data by rotating, enlarging, …
Speech: Vocal Tract Length Perturbation (Jaitly and Hinton, 2013)
NLP: bAbI (Weston et al., 2016); The 30M Factoid Question-Answer Corpus (Serban et al., 2016)

Slide 18: One Way to Synthesize Data
Suppose we have a real dataset:
We can diversify by “mix-and-match”:
Now do the same for languages!

Slide 19: Substrate & Superstrates
(terms come from the linguistics of creole languages)
English: substrate
Hindi: superstrate for verb order
Japanese: superstrate for noun order

Slide 20: You must feel the Force around you

Slide 21: An English parse
Example: You must feel the Force around you
Language: English
Words and POS tags: You/PRON must/AUX feel/VERB the/DET Force/PROPN around/ADP you/PRON
Dependency labels on the arcs: nsubj, aux, dobj, det, nmod, case

Slide 22: Permute the children of verbs
SVO (English) → SOV (Hindi)
Language: English
[Figure: dependency parse of “You must feel the Force around you”]

Slide 24: Permute the children of verbs
SVO (English) → SOV (Hindi)
New language: English[Hindi/V]
[Figure: dependency parse of “You must feel the Force around you”, with the children of the verb reordered]

Slide 25: Permute the children of nouns
Prepositions (English) → Postpositions (Japanese)
New language: English[Hindi/V]
[Figure: dependency parse of “You must feel the Force around you”]

Slide 27: Permute the children of nouns
Prepositions (English) → Postpositions (Japanese)
New language: English[Hindi/V, Japanese/N]
[Figure: the same parse, with the adposition “around” (ADP, case) being moved]

Slide 28: Permute the children of nouns
Prepositions (English) → Postpositions (Japanese)
New language: English[Hindi/V, Japanese/N]
[Figure: the same parse, with “around” (ADP, case) in its new position]
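
The reordering idea on these slides can be sketched in a few lines of Python. This is only an illustration under several assumptions that are not in the slides: "around you" is attached to the verb as nmod, PRON and PROPN count as nominals, and the fixed templates `VERB_ORDER` and `NOUN_ORDER` are hypothetical stand-ins for orders that the authors sample from a model trained on the superstrate language.

```python
from dataclasses import dataclass

@dataclass
class Token:
    idx: int      # 1-based position in the original sentence
    form: str
    upos: str
    head: int     # index of the head token, 0 for the root
    deprel: str

# "You must feel the Force around you"; the nmod attachment to the verb is an assumption.
SENT = [
    Token(1, "You", "PRON", 3, "nsubj"),
    Token(2, "must", "AUX", 3, "aux"),
    Token(3, "feel", "VERB", 0, "root"),
    Token(4, "the", "DET", 5, "det"),
    Token(5, "Force", "PROPN", 3, "dobj"),
    Token(6, "around", "ADP", 7, "case"),
    Token(7, "you", "PRON", 3, "nmod"),
]

# Hypothetical fixed templates; "HEAD" marks where the head word itself goes.
VERB_ORDER = ["nsubj", "dobj", "nmod", "aux", "HEAD"]   # verb-final, Hindi-like
NOUN_ORDER = ["det", "HEAD", "case"]                    # postpositional, Japanese-like
NOMINAL = {"NOUN", "PROPN", "PRON"}                     # assumption: what counts as a "noun"

def linearize(tokens, node, verb_order=None, noun_order=None):
    """Return the words of node's subtree, reordering children of verbs and nominals."""
    items = sorted([t for t in tokens if t.head == node.idx] + [node], key=lambda t: t.idx)
    template = None
    if node.upos == "VERB":
        template = verb_order
    elif node.upos in NOMINAL:
        template = noun_order
    if template is not None:
        def rank(t):
            label = "HEAD" if t is node else t.deprel
            # Labels missing from the template go last, keeping their original order.
            return template.index(label) if label in template else len(template)
        items.sort(key=rank)  # stable sort
    words = []
    for t in items:
        if t is node:
            words.append(t.form)
        else:
            words.extend(linearize(tokens, t, verb_order, noun_order))
    return words

root = next(t for t in SENT if t.head == 0)
english = " ".join(linearize(SENT, root))
english_hindi_v = " ".join(linearize(SENT, root, verb_order=VERB_ORDER))
english_hindi_v_japanese_n = " ".join(linearize(SENT, root, VERB_ORDER, NOUN_ORDER))
print(english)
print(english_hindi_v)
print(english_hindi_v_japanese_n)
```

Because each subtree is emitted as a contiguous block, this sketch can only produce projective orders.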

Slide 29: What do we get?
New language: English[Hindi/V, Japanese/N]
Start from 37 earthly languages
  Thanks to the Universal Dependencies project
Mix and match: Lang1[Lang3/V, Lang2/N]
Yields about 37³ ≈ 50,000 extraterrestrial languages
  “Galactic Dependencies” treebanks
  Still in Universal Dependencies format
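
As a quick check of the count, the mix-and-match naming scheme can be enumerated directly. The language codes below are placeholders (hypothetical names, not the real UD language list), and the sketch assumes every combination of substrate and two superstrates is counted, as 37³ suggests.

```python
from itertools import product

# Placeholder codes standing in for the 37 real UD languages (hypothetical names).
LANGS = [f"L{i:02d}" for i in range(1, 38)]

def galactic_names(langs):
    """Yield names of the form substrate[verb_superstrate/V,noun_superstrate/N]."""
    for sub, v_sup, n_sup in product(langs, repeat=3):
        yield f"{sub}[{v_sup}/V,{n_sup}/N]"

names = list(galactic_names(LANGS))
print(len(names))  # 37 ** 3 = 50653
```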

Slide 30: SVO (English) → SOV (Hindi)
[Figure: simplified parse with nodes You/PRON, must/AUX, feel/VERB, Force/PROPN, you/PRON and arcs labeled nsubj, aux, dobj, nmod]

Slide 32: Sampling from P([ordered tree] | [unordered tree], Hindi)
[Figure: the verb node and its four dependents with words removed: PRON (nsubj), AUX (aux), VERB, PROPN (dobj), PRON (nmod)]
How many possible orders? 5! = 120

Slide 33: p(· | [unordered tree], Hindi)
[Table: one row for each order of S, V and O; columns: Order, Prob., BOS–S adj., S<V, O<V, S–O adj.; entries are 0 or 1]
Legend: S = nsubj, O = dobj, V = VERB, BOS = beginning of sentence

Slide 34: p(· | [unordered tree], Hindi)
Each order has features as shown
Train a log-linear model on the Hindi treebank
Sample from it to reorder English trees
[Table: rows SOV, SVO, OSV, OVS, VSO, VOS; columns: Order, Prob., BOS–S adj., S<V, O<V, S–O adj.]
Feature values as extracted (cell order not preserved): 1 1 1 1 0 0 1 1 1 1 1 0 0 0 1 0 0 1 0 1 0 0 0 1
Legend: S = nsubj, O = dobj, V = VERB, BOS = beginning of sentence

Slide 35: p(· | [unordered tree], Hindi)
[Table: same orders and features as on the previous slide, with the Prob. column filled in: 0.8, 0.1, 0.03, 0.03, 0.02, 0.02]
Legend: S = nsubj, O = dobj, V = VERB, BOS = beginning of sentence
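
A log-linear model over the six orders of S, V and O can be sketched as follows. The feature definitions are my reading of the column headers (BOS–S adjacent, S<V, O<V, S–O adjacent), and the weights are made-up values for illustration. In the slides the weights are trained on the Hindi treebank, which this sketch does not do.

```python
import itertools
import math
import random

FEATURE_NAMES = ("BOS_S_adj", "S<V", "O<V", "SO_adj")

def features(order):
    """Binary features of an order string such as "SOV" (assumed definitions)."""
    s, o, v = order.index("S"), order.index("O"), order.index("V")
    return (
        int(s == 0),           # subject directly follows the beginning of sentence
        int(s < v),            # subject precedes verb
        int(o < v),            # object precedes verb
        int(abs(s - o) == 1),  # subject and object are adjacent
    )

# Hypothetical weights, one per feature; a trained model would learn these.
WEIGHTS = (1.0, 1.0, 2.0, 0.5)

def order_distribution(weights):
    """p(order) proportional to exp(weights . features(order))."""
    orders = ["".join(p) for p in itertools.permutations("SOV")]
    scores = [sum(w * f for w, f in zip(weights, features(o))) for o in orders]
    z = sum(math.exp(s) for s in scores)
    return {o: math.exp(s) / z for o, s in zip(orders, scores)}

dist = order_distribution(WEIGHTS)
rng = random.Random(0)
sampled = rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
for order, p in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(order, round(p, 3))
```

With weights that reward S<V and O<V, most of the probability mass falls on SOV, which is the qualitative pattern the slide shows for Hindi.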

Slide 36: How “good” are the synthetic languages?

Slide 37: Evaluation: Parsability
Is this language functional enough to survive during human/alien evolution?
Less parsable → worse for communication
Train a parser on some trees of the language
Evaluate UAS (unlabeled attachment score) on held-out trees of the same language
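
For reference, UAS is the fraction of tokens whose predicted head matches the gold head. A minimal sketch, assuming each sentence is given as a list of head indices (0 for the root) and that every token is scored; whether punctuation is excluded is not stated in the slides:

```python
def uas(gold, predicted):
    """Unlabeled attachment score, micro-averaged over all tokens."""
    correct = total = 0
    for gold_heads, pred_heads in zip(gold, predicted):
        if len(gold_heads) != len(pred_heads):
            raise ValueError("sentence length mismatch")
        correct += sum(g == p for g, p in zip(gold_heads, pred_heads))
        total += len(gold_heads)
    return correct / total if total else 0.0

# One 7-token sentence; the hypothetical prediction gets the last attachment wrong.
gold = [[3, 3, 0, 5, 3, 7, 3]]
pred = [[3, 3, 0, 5, 3, 7, 5]]
score = uas(gold, pred)
print(score)
```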

Slide 38: Evaluation: Parsability

Slide 39: Evaluation: Diversity
“Distance” between languages
uas(Y | X) = how well can we parse POS sequences from language Y using a parser trained on language X?
dissimilarity(X, Y) = uas(Y | Y) - uas(Y | X)
Let’s visualize the pairwise distances via a 2-D embedding (multidimensional scaling)
Are synthetic languages too boring? Too weird?
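
The dissimilarity definition translates directly into code. The UAS numbers below are invented purely to show the computation, and the language codes are placeholders.

```python
# (target Y, source X) -> uas(Y | X); hypothetical numbers for illustration only.
UAS = {
    ("en", "en"): 0.80, ("en", "hi"): 0.45,
    ("hi", "hi"): 0.85, ("hi", "en"): 0.40,
}

def dissimilarity(x, y, uas=UAS):
    """dissimilarity(X, Y) = uas(Y | Y) - uas(Y | X)."""
    return uas[(y, y)] - uas[(y, x)]

d_en_hi = dissimilarity("en", "hi")  # 0.85 - 0.40
d_hi_en = dissimilarity("hi", "en")  # 0.80 - 0.45
print(d_en_hi, d_hi_en)
```

Note that the measure as defined need not be symmetric, so any 2-D embedding has to decide how to treat the two directions; the slides do not say how this is done.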

Slide 40: Evaluation: Diversity

Slide 41: Are the synthetic languages useful?
(proof of principle)
(lots more fun we can all have in future!)

Slide 42: Single source transfer
1-nearest-neighbor
[Figure; the unknown language is marked “?”]

Slide 45: Selection methods
(Weakly) supervised
Ask whether a (source) language provides a good parsing model for 100 labeled trees of the target language
Use POS sequence only (delexicalized)
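
The weakly supervised selection step amounts to scoring each candidate source parser on the small labeled target sample and keeping the best one. The sketch below uses two toy "parsers" (attach-to-previous and attach-to-next) as hypothetical stand-ins for delexicalized parsers trained on real or synthetic source languages, and a two-tree sample instead of 100 trees.

```python
def uas(gold, predicted):
    """Micro-averaged unlabeled attachment score over lists of head indices."""
    correct = sum(g == p for gs, ps in zip(gold, predicted) for g, p in zip(gs, ps))
    total = sum(len(gs) for gs in gold)
    return correct / total if total else 0.0

def attach_left(pos_tags):
    """Toy parser: first word is the root, every other word attaches to the previous word."""
    return [0] + list(range(1, len(pos_tags)))

def attach_right(pos_tags):
    """Toy parser: last word is the root, every other word attaches to the next word."""
    n = len(pos_tags)
    return list(range(2, n + 1)) + [0]

# Hypothetical candidates; real ones would be parsers trained on source treebanks.
SOURCE_PARSERS = {"head-initial source": attach_left, "head-final source": attach_right}

# Tiny labeled target sample: (POS sequence, gold heads), verb-final toy data.
TARGET_DEV = [
    (["PRON", "NOUN", "VERB"], [3, 3, 0]),
    (["NOUN", "VERB"], [2, 0]),
]

def select_source(parsers, dev_trees):
    """Return the source whose parser scores highest on the target sample, plus all scores."""
    gold = [heads for _, heads in dev_trees]
    scores = {
        name: uas(gold, [parse(pos) for pos, _ in dev_trees])
        for name, parse in parsers.items()
    }
    return max(scores, key=scores.get), scores

best, scores = select_source(SOURCE_PARSERS, TARGET_DEV)
print(best, scores)
```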

Slide 46: Extend the learning curve
Supervised: 71.38

Slide 47: Results on English

Slide 48: Future Work

Slide 49: Limitations of version 1.0
[Figure labels: Ideal, V1.0]

Slide 50: More adventurously
Generate non-projective trees

Slide 51: More adventurously
Generate non-projective trees
Beyond mix-and-match of real languages

Slide 52: More adventurously
Generate non-projective trees
Beyond mix-and-match of real languages
Change the substrate's vocabulary
  You must feel the Force around you
  Rea natr quum xdu Qelvu lzeabp rea
  a -> i, e -> u, i -> o, o -> e, u -> a, b -> d, g -> k, p -> q, …

Slide 53: More adventurously
Generate non-projective trees
Beyond mix-and-match of real languages
Change the substrate's vocabulary
Beyond reordering: add/remove/relink words

Slide 54: More conservatively
Correctly handle punctuation
  Jane loves her dog, Lexie.
  her dog, Lexie Jane loves.
  Her dog, Lexie ∧ Jane loves. (∧ marks a position annotated with “,” on the slide)

Slide 55: More conservatively
Correctly handle punctuation
Follow Greenberg’s typological universals
  "With overwhelmingly more than chance frequency, languages with dominant order VSO have the adjective after the noun."
Ensure appropriate parsability and perplexity
Finer-grained modeling of superstrate’s order (features consider tree depth, dependency length, etc.)
  I read a book which was a lot of fun by her.
  I read a book by her which was a lot of fun.
  [arc labels shown: nmod, nmod]

Slide 56: Conclusion
New dataset: More than 50,000 high-resource languages!
Use them for something!
How do we put linguistics back into NLP? (inductive bias for training on small data)
  Traditional idea: Put it into ML objective
  Our idea: Put it into the extra (synthetic) training data!

Slide 57: Acknowledgement
Contributors to the Universal Dependencies project
Reviewers and editors of TACL
Conference organizers and audiences
Master Yoda

Slide 58: With you the Force may be!