Linguistic Networks Applications in NLP and CL - PowerPoint Presentation

Uploaded On 2020-08-27



Presentation Transcript

Slide1

Linguistic Networks: Applications in NLP and CL

Monojit Choudhury
Microsoft Research India
monojitc@microsoft.com

[Figure: a small network over the words light, color, red, blue, blood, sky, heavy, weight, 100, 20 and 1]

Slide2

NLP vs. Computational Linguistics

Computational Linguistics is the study of language using computers and of language-using computers.
NLP is an engineering discipline that seeks to improve human-human, human-machine and machine-machine(?) communication by developing appropriate systems.

Slide3

Charting the World of NLP

[Figure: a chart of NLP labeled with anaphora resolution, parsing, spell-checking, machine translation, graph theory, data mining, supervised learning and unsupervised learning]

Slide4

Outline of the Talk

A broader picture of research in the meeting ground of language and computation
Complex network theory (CNT)
Application of CNT in linguistics and NLP
Two case studies

Slide5

Linguistic System

I speak, therefore I am.

Production

Perception

Learning

Representation and Processing

Change & Evolution

Slide6

Linguistic System

I speak, therefore I am.

Production, Perception, Learning, Representation and Processing, Change & Evolution

Psycholinguistics

Neurolinguistics

Theo. Linguistics

Data Modeling

Socio/Dia. Linguistics

Games/Simulations

Slide7

Language is a Complex Adaptive System

Complex: the parts cannot explain the whole (reductionism fails).
It emerges from the interactions of a huge number of entities.
Adaptive: it is dynamic in nature (it evolves).
The evolution is a response to environmental changes (paralinguistic and extra-linguistic factors).

Slide8

Layers of Complexity

Linguistic organization: phonology, morphology, syntax, semantics, …
Biological organization: neurons, areas, faculty of language, brain
Social organization: individual, family, community, region, world
Temporal organization: acquisition, change, evolution

Slide9

Layers of Complexity

(The same four layers as Slide8.)

Linguists

Neuroscientists

Psychologists

Physicists

Social scientists

Computer scientists

Slide10

Complex System View of Language

Language emerges through the interactions of entities:
Microscopic view: individuals' utterances
Mesoscopic view: linguistic entities (words, phones)
Macroscopic view: language as a whole (grammar and vocabulary)

Slide11

Complex Network Models

Nodes: social entities (people, organizations, etc.)
Edges: interactions/relationships between entities (friendship, collaboration)

Courtesy: http://blogs.clickz.com

Slide12

Linguistic Networks

[Figure: a network over the words light, color, red, blue, blood, sky, heavy, weight, 100, 20 and 1]

Slide13

Complex Network Theory

A handy toolbox for modeling complex systems
A marriage of graph theory and statistics
Complex because the topology is non-trivial, the network is difficult to specify completely, and it is usually large (in terms of nodes and edges)
Provides insight into the nature and evolution of the system being modeled

Slide14

Internet


Slide15

9-11 Terrorist Network

Social Network Analysis is a mathematical methodology for "connecting the dots": using science to fight terrorism. Connecting multiple pairs of dots soon reveals an emergent network of organization.

Slide16

What Questions Can Be Asked?

Do these networks display some symmetry?

Are these networks the creation of intelligent agents (by design), or have they emerged (self-organized)?

How have these networks emerged? What are the underlying simple rules leading to their complex structure?

Slide17

Bi-directional Approach

Analysis of real-world networks:
global topological properties
community structure
node-level properties
Synthesis of the network by means of some simple rules:
small-world models, …, preferential attachment models

Slide18

Application of CNT in Linguistics - I

Quantitative & corpus linguistics:
invariance and typology
properties of NL corpora
Natural language processing:
unsupervised methods for text labeling (POS tagging, NER, WSD, etc.)
textual similarity (automatic evaluation, document clustering)
evolutionary models (NER, multi-document summarization)

Slide19

Application of CNT in Linguistics - II

Language evolution:
how did sound systems evolve?
development of syntax
Language change:
innovation diffusion over social networks
language as an evolving network
Language acquisition:
phonological acquisition
evolution of the mental lexicon of the child

Slide20

Linguistic Networks

Name | Nodes | Edges | Why?
PhoNet | Phonemes | Co-occurrence likelihood in languages | Evolution of sound systems
WordNet | Words | Ontological relations | Host of NLP applications
Syntactic network | Words | Similarity between syntactic contexts | POS tagging
Semantic network | Words, names | Semantic relations | IR, parsing, NER, WSD
Mental lexicon | Words | Phonetic similarity and semantic relations | Cognitive modeling, spell checking
Tree-banks | Words | Syntactic dependency links | Evolution of syntax
Word co-occurrence | Words | Co-occurrence | IR, WSD, LSA, …

Slide21

Case Study I: Word Co-occurrence Networks

Slide22

Word Co-occurrence Network

[Figure: a word co-occurrence network built from a short text, with nodes such as word, language, human, evolving, interacting, web, sentences, structure, complex and network; Proc of the Royal Society of London B, 268, 2603-2606, 2001]

Words are nodes. Two words are connected by an edge if they are adjacent in a sentence (the edges are directed and weighted).
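The construction rule above can be sketched in a few lines of Python. This is a minimal illustration, not the cited paper's pipeline: it assumes lowercasing and whitespace tokenization, and the function name `word_cooccurrence_network` and the two toy sentences are hypothetical.

```python
from collections import Counter

def word_cooccurrence_network(sentences):
    """Directed, weighted WCN: weight of (u, v) = number of times v directly follows u."""
    edges = Counter()
    for sentence in sentences:
        words = sentence.lower().split()  # naive tokenization (assumption)
        # Each pair of adjacent words adds 1 to the weight of the directed edge u -> v.
        for u, v in zip(words, words[1:]):
            edges[(u, v)] += 1
    return edges

wcn = word_cooccurrence_network([
    "Language is a complex network",
    "a complex network can evolve",
])
for (u, v), weight in sorted(wcn.items()):
    print(f"{u} -> {v}: {weight}")
```

Edges never cross sentence boundaries in this sketch; linking the last word of one sentence to the first word of the next would be a different design choice.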

Slide23

Topological characteristics of WCN

R. Ferrer-i-Cancho and R. V. Solé. The small world of human language. Proceedings of the Royal Society of London. Series B, Biological Sciences, 268(1482):2261-2265, 2001.
R. Ferrer-i-Cancho and R. V. Solé. Two regimes in the frequency of words and the origin of complex lexicons: Zipf's law revisited. Journal of Quantitative Linguistics, 8:165-173, 2001.

WCNs for human languages are small worlds → accessing the mental lexicon is fast.
The degree distribution of a WCN follows a two-regime power law → core and peripheral lexicon.

Slide24

Degree Distribution (DD)

Let $p_k$ be the fraction of vertices in the network that have degree $k$. The plot of $p_k$ versus $k$ is the degree distribution of the network.

For most real-world networks these distributions are right-skewed, with a long right tail of values far above the mean: $p_k \sim k^{-\alpha}$.

In practice, the cumulative degree distribution is plotted.
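A minimal sketch of the definition above, assuming an undirected graph stored as a dict that maps each vertex to its set of neighbors; the helper names and the toy star graph are hypothetical. `cumulative_distribution` computes P(K ≥ k), one common form of the cumulative degree distribution.

```python
from collections import Counter

def degree_distribution(adjacency):
    """Return {k: p_k}: the fraction of vertices that have degree k."""
    n = len(adjacency)
    counts = Counter(len(neighbors) for neighbors in adjacency.values())
    return {k: c / n for k, c in sorted(counts.items())}

def cumulative_distribution(pk):
    """Return {k: P(K >= k)}, accumulated from the largest degree downwards."""
    result, tail = {}, 0.0
    for k in sorted(pk, reverse=True):
        tail += pk[k]
        result[k] = tail
    return dict(sorted(result.items()))

# Toy undirected graph (hypothetical): a star with one hub and four leaves.
star = {"hub": {"a", "b", "c", "d"},
        "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}
pk = degree_distribution(star)
print(pk)                          # {1: 0.8, 4: 0.2}
print(cumulative_distribution(pk))
```

On log-log axes a power law appears roughly as a straight line; the cumulative form is often preferred because it smooths the noisy tail.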

Slide25

Compute the degree distribution of the following network

[Figure: a small word co-occurrence network over the words word, language, in, human, treat, as, is, can, evolving, neighboring, distinct, interacting, web, sentences, such, structure, a and complex]

Slide26

A Few Examples

Power law: $p_k \sim k^{-\alpha}$

Slide27

WCN has a two-regime power-law degree distribution

High-degree words form the core lexicon.

Low-degree words form the peripheral lexicon.

Slide28

Core-periphery Structure

Core: a small, densely connected set of nodes
Periphery: a large number of nodes sparsely connected to the core nodes
Fractal networks: a recursive core-periphery structure

The mental lexicon (ML) has a core-periphery structure (perhaps recursive):
Core lexicon = function words plus generic concepts
Peripheral lexicon = jargon, specialized vocabulary

Slide29

Topological characteristics of WCN


The degree distribution of a WCN follows a two-regime power law → core and peripheral lexicon.
WCNs for human languages are small worlds → accessing the mental lexicon is fast.

Slide30

Small-World Phenomenon

A network is called small world if it has:
a high clustering coefficient, and
a small diameter (average path length).
Many real-world small-world networks also have a scale-free (power-law) degree distribution.

Slide31

Measuring Transitivity: Clustering Coefficient

The clustering coefficient of a vertex v is the ratio of the number of edges among the neighbors of v to the number of possible edges among those neighbors.
A high clustering coefficient means that my friends know each other with high probability, a typical property of social networks.

Slide32

Mathematically…

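The clustering coefficient of a vertex $i$ with $n$ neighbors is

$C_i = \dfrac{\#\text{links between the } n \text{ neighbors of } i}{n(n-1)/2}$

The clustering coefficient of the whole network (with $N$ vertices) is the average

$C = \dfrac{1}{N}\sum_{i} C_i$

Alternatively (global transitivity),

$C = \dfrac{3 \times \#\text{triangles in the network}}{\#\text{connected triples in the network}}$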
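The three formulas above can be checked on a toy graph. This sketch assumes an undirected graph stored as a dict of neighbor sets; vertices with fewer than two neighbors get $C_i = 0$ here, which is one convention among several. The function names and the example graph (a triangle with one pendant vertex) are hypothetical.

```python
from itertools import combinations

def local_clustering(adj, v):
    """C_v = (# links among the n neighbors of v) / (n(n-1)/2)."""
    nbrs = adj[v]
    n = len(nbrs)
    if n < 2:
        return 0.0  # undefined for n < 2; reported as 0 by convention here
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (n * (n - 1) / 2)

def average_clustering(adj):
    """C = (1/N) * sum of C_i over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

def transitivity(adj):
    """3 * (# triangles) / (# connected triples)."""
    closed = 0   # links among neighbors, summed over centres: counts each triangle 3 times
    triples = 0  # connected triples (paths of length 2), counted by their centre vertex
    for v, nbrs in adj.items():
        n = len(nbrs)
        triples += n * (n - 1) // 2
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / triples if triples else 0.0

adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(local_clustering(adj, 0))  # 1 link among 3 neighbors -> 1/3
print(average_clustering(adj))   # (1/3 + 1 + 1 + 0) / 4 = 7/12
print(transitivity(adj))         # 3 * 1 triangle / 5 connected triples = 0.6
```

The average of the local coefficients (7/12) and the global transitivity (0.6) differ on the same graph, so the two definitions should not be used interchangeably.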

Slide33

Diameter of a Network

The diameter of a network is the length of the longest shortest path among all pairs of vertices. A network with $N$ nodes is said to be small world if its diameter scales as $\log N$: six degrees of separation!

[Figure: a word co-occurrence network over words such as word, language, human, evolving, interacting, web, sentences, structure, complex and network]
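A sketch of the diameter and the average path length, computed by breadth-first search from every vertex. It assumes an unweighted, undirected graph stored as a dict of neighbor sets, and simply skips disconnected pairs; the helper names are hypothetical. It contrasts a path, whose diameter grows linearly with N, with a star, whose diameter stays at 2.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def diameter_and_avg_path(adj):
    """Longest shortest path, and mean shortest path over all reachable ordered pairs."""
    longest, total, pairs = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                longest = max(longest, d)
                total += d
                pairs += 1
    return longest, (total / pairs if pairs else 0.0)

def path_graph(n):
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}

def star_graph(n):
    adj = {0: set(range(1, n))}
    adj.update({i: {0} for i in range(1, n)})
    return adj

for n in (10, 100):
    print(n, diameter_and_avg_path(path_graph(n)), diameter_and_avg_path(star_graph(n)))
```

A short diameter is not the whole story: the star contains no triangles, so its clustering coefficient is zero.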

Slide34

Which of These Are Small-World Networks?

[Figure: three example networks over the same words: a path (or line graph), a tree and a star]

Slide35

WCN are small worlds!

Starting from any word, activation needs only a few steps to reach any other word in the network.
Thus, the spreading of activation is very fast.
Lesson: the mental lexicon (ML) has a topological structure that supports very fast spreading of activation and thus very fast lexical access.

Slide36

Self-organization of WCN

Dorogovtsev-Mendes (DM) Model

[Figure: a word co-occurrence network over words such as word, language, human, evolving, web, sentences, structure, complex and network; Proc of the Royal Society of London B, 268, 2603-2606, 2001]

* A new node joins the network at every time step t.
* It attaches to an existing node with probability proportional to that node's degree.
* In addition, ct new edges are added between existing nodes, each endpoint chosen with probability proportional to its degree.
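The three growth rules can be simulated directly. This is a minimal sketch under stated assumptions: the network starts from a single edge, the expected c·t new edges per step are realized by carrying fractional parts over between steps, repeated edges are allowed (they add weight), and self-loops are rejected. Degree-proportional choice is implemented by sampling uniformly from a list of all edge endpoints. The function `dm_model` and its parameters are hypothetical names.

```python
import random
from collections import Counter

def dm_model(steps, c, seed=0):
    """Grow a network following the DM rules; returns (n_nodes, edges, degree)."""
    rng = random.Random(seed)
    edges = [(0, 1)]   # assumed seed network: a single edge
    ends = [0, 1]      # every edge endpoint; a uniform pick here is degree-proportional
    n = 2
    budget = 0.0       # accumulates c*t so that fractional edge counts carry over
    for t in range(1, steps + 1):
        # Rule 1: a new node attaches to an existing node chosen proportionally to degree.
        target = rng.choice(ends)
        edges.append((n, target))
        ends.extend([n, target])
        n += 1
        # Rule 2: about c*t new edges between existing nodes, both ends degree-proportional.
        budget += c * t
        while budget >= 1.0:
            u, v = rng.choice(ends), rng.choice(ends)
            if u == v:
                continue  # reject self-loops and resample
            edges.append((u, v))
            ends.extend([u, v])
            budget -= 1.0
    return n, edges, Counter(ends)

n, edges, degree = dm_model(steps=2000, c=0.001, seed=42)
print(n, len(edges), max(degree.values()))
```

Plotting the degree distribution of such runs on log-log axes is one way to look for the two regimes; this sketch does not by itself verify the formulas for the crossover and cutoff degrees.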

Slide37

DM model leads to two-regime power-law networks

Crossover degree between the two regimes: $k_{cross} \approx \sqrt{ct}\,(2+ct)^{3/2}$

Cutoff degree: $k_{cut} \sim \sqrt{t/8}\,(ct)^{3/2}$
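The two characteristic degrees can be evaluated directly from the formulas above. The value of c and the times t in this sketch are illustrative only, not estimates from any corpus, and the function names are hypothetical.

```python
import math

def dm_crossover_degree(c, t):
    """k_cross ≈ sqrt(ct) * (2 + ct)^(3/2): where the two power-law regimes meet."""
    ct = c * t
    return math.sqrt(ct) * (2 + ct) ** 1.5

def dm_cutoff_degree(c, t):
    """k_cut ~ sqrt(t/8) * (ct)^(3/2): the finite-size cutoff of the distribution."""
    return math.sqrt(t / 8) * (c * t) ** 1.5

c = 0.001  # illustrative value only
for t in (10_000, 100_000, 1_000_000):
    print(t, round(dm_crossover_degree(c, t)), round(dm_cutoff_degree(c, t)))
```

Both degrees grow with t, so in this model the boundary between the two regimes moves as the network grows.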

Slide38

Significance of The DM Model

Topological significance: apart from the degree distribution, which other properties of WCNs can and cannot be explained by the DM model?
Linguistic and cognitive significance: what linguistic/cognitive phenomenon is being modeled here? What is the significance of the parameter c?

Slide39

Structural Equivalence (Similarity)

Two nodes are said to be exactly structurally equivalent if they have the same relationships to all other nodes.

Computation: let A be the adjacency matrix. Compute the Euclidean distance or the Pearson correlation between the pair of rows (or columns) of A that represent the neighbor profiles of two nodes, say i and j. This value shows how structurally similar i and j are.
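A sketch of this computation in plain Python on a small hypothetical adjacency matrix. It compares full rows of A, as described above; some variants exclude the entries for i and j themselves. The function names and the matrix are illustrative assumptions.

```python
import math

def euclidean_distance(A, i, j):
    """Distance between the neighbor profiles (rows) of nodes i and j."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(A[i], A[j])))

def pearson_correlation(A, i, j):
    """Pearson correlation between rows i and j; nan if either row is constant."""
    x, y = A[i], A[j]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)

# Hypothetical undirected network: nodes 0 and 1 both link to exactly 2 and 3.
A = [[0, 0, 1, 1, 0],
     [0, 0, 1, 1, 0],
     [1, 1, 0, 0, 1],
     [1, 1, 0, 0, 0],
     [0, 0, 1, 0, 0]]

print(euclidean_distance(A, 0, 1), pearson_correlation(A, 0, 1))
print(euclidean_distance(A, 2, 3), pearson_correlation(A, 2, 3))
```

Nodes 0 and 1 have identical neighbor profiles, so their distance is 0 and their correlation is 1: they are exactly structurally equivalent.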

Slide40

Probing Deeper than Degree Distribution

The co-occurrence of words is governed by their syntactic and semantic properties. Therefore, words occurring in similar contexts have similar properties (distribution).

Structural equivalence: how similar are the local neighborhoods of two nodes?

Social roles: nodes (actors) in a social network that have similar patterns of relations (ties) with other nodes.

Slide41

Structural Similarity Transform

[Figure: degree distribution of real and DM networks after taking structural similarity transforms]

Lesson: the DM model cannot take into account the distributional properties of words, and hence the networks it generates are topologically different from WCNs.

Slide42

Spectral Analysis

Spectral analysis reflects the global topology of a network through the distributions of the eigenvalues and eigenvectors of its adjacency matrix.

It shows that real networks are much more structured than those generated by the DM model.
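A sketch of the spectrum computation, assuming NumPy is available and the network is undirected and unweighted, so that the adjacency matrix is symmetric and `numpy.linalg.eigvalsh` applies. `adjacency_spectrum` is a hypothetical helper; a histogram of the eigenvalues is one simple way to look at their distribution.

```python
import numpy as np

def adjacency_spectrum(edges, n):
    """Eigenvalues (ascending) of the symmetric adjacency matrix of an undirected graph."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return np.linalg.eigvalsh(A)  # eigvalsh assumes a symmetric (Hermitian) matrix

triangle = adjacency_spectrum([(0, 1), (1, 2), (0, 2)], 3)
star = adjacency_spectrum([(0, 1), (0, 2), (0, 3)], 4)
print(triangle)  # approximately -1, -1, 2
print(star)      # approximately -sqrt(3), 0, 0, sqrt(3)

# For a large network, the shape of this histogram is what gets compared across models.
counts, bin_edges = np.histogram(star, bins=3)
print(counts, bin_edges)
```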

Slide43

Global Topology of WCN: Beyond the two-regime power law

Choudhury et al., COLING 2010

Slide44

Significance of Parameter c in DM Model

t (also the number of nodes) actually reflects the rate of seeing a new unigram, which varies with the corpus size N.
The number of edges is the number of unique bigrams.
Hence c is a function of N!
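One way to see this dependence is to track, for growing prefixes of a corpus, the number of distinct unigrams (nodes) and distinct bigrams (edges). The sketch also backs out an implied c under an assumption that is not on the slide: that a DM network with t nodes has roughly t + ct²/2 edges (t preferential edges plus about c·1 + … + c·t ≈ ct²/2 internal edges), giving c ≈ 2(E − t)/t². The function name and the toy token list are hypothetical, and the estimate is only meaningful for large corpora.

```python
def growth_statistics(tokens):
    """Rows (N, distinct unigrams t, distinct bigrams E, implied c) for each prefix length N."""
    unigrams, bigrams, rows = set(), set(), []
    for n, token in enumerate(tokens, start=1):
        unigrams.add(token)
        if n > 1:
            bigrams.add((tokens[n - 2], token))  # previous token and current token
        t, e = len(unigrams), len(bigrams)
        # Assumption: a DM network with t nodes has about t + c*t**2/2 edges.
        implied_c = 2 * (e - t) / t ** 2
        rows.append((n, t, e, implied_c))
    return rows

tokens = "the cat sat on the mat the cat ran".split()
for row in growth_statistics(tokens):
    print(row)
```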

Slide45

Things you know

Topological properties: degree distribution, small world, path lengths, structural equivalence, core-periphery structure, fractal networks, spectrum of a network
Types of networks: power-law, two-regime power-law, core-periphery, trees or hierarchical, small world, cliques, paths
Network growth models: preferential attachment, DM model

Slide46

Things to explore yourself

More node properties: clustering coefficient (friends of friends are friends); centrality (degree, betweenness, eigenvector centrality)
Types of networks: assortative, super-peer
Community analysis: definitions and algorithms
Random networks
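Of the centralities listed, eigenvector centrality is the least obvious to compute; here is a minimal power-iteration sketch. It assumes a connected, undirected graph stored as a dict of neighbor sets, and iterates with A + I instead of A (same leading eigenvector, but no oscillation on bipartite graphs such as stars). Scores are scaled so that the maximum is 1. The function name and the toy star are hypothetical.

```python
def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Power iteration with A + I on an undirected graph {node: set_of_neighbors}."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # y = (A + I) x: same leading eigenvector as A, without oscillation on bipartite graphs
        y = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        top = max(y.values())
        y = {v: value / top for v, value in y.items()}  # scale so the maximum is 1
        if max(abs(y[v] - x[v]) for v in adj) < tol:
            return y
        x = y
    return x

star = {"hub": {"a", "b", "c", "d"},
        "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}
print(eigenvector_centrality(star))
```

For this star the hub scores twice as high as each leaf, matching the leading eigenvector of its adjacency matrix.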


Slide47

Phonological Neighborhood Networks

[Figure panels: 2-4 segment words; 8-10 segment words]

Removing low-degree nodes disconnects the network, whereas removing hubs like "pastor" (degree = 112) does not.
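Such robustness claims can be tested on any network by counting connected components before and after removing a set of nodes. Below is a minimal sketch on a hypothetical toy graph (two triangles joined through a degree-2 node), not on the phonological data of the slide; the function name is an assumption.

```python
def count_components(adj, removed=()):
    """Number of connected components after deleting the nodes in `removed`."""
    removed = set(removed)
    seen, components = set(), 0
    for start in adj:
        if start in removed or start in seen:
            continue
        components += 1  # new component found; flood-fill it
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in removed and w not in seen:
                    seen.add(w)
                    stack.append(w)
    return components

edge_list = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
adj = {}
for a, b in edge_list:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

print(count_components(adj))               # intact network
print(count_components(adj, removed={3}))  # remove the low-degree bridging node
```

Here removing the bridging node 3 splits the network in two, while removing node 0, which has the same degree but sits inside a triangle, does not.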

Slide48

Case Study II: Unsupervised POS Tagging

Slide49

Labeling of Text

Lexical category (POS tags)
Syntactic category (phrases, chunks)
Semantic role (agent, theme, …)
Sense
Domain-dependent labeling (genes, proteins, …)
How to define the set of labels?
How to (learn to) predict them automatically?

Slide50

What are Parts-of-Speech (POS)?

Distributional Hypothesis: "A word is characterized by the company it keeps" (Firth, 1957).
The X is a …
You Y that, didn't you?

Part-of-speech (POS) induction: discovering natural morpho-syntactic classes and the words that belong to these classes.

Slide51

1: Acquire raw text corpus

In the context of network theory, a complex network is a network (graph) with non-trivial topological features: features that do not occur in simple networks such as lattices or random graphs. The study of complex networks is a young and active area of scientific research inspired largely by the empirical study of real-world networks such as computer networks and social networks. Most social, biological, and technological networks display substantial non-trivial topological features, with patterns of connection between their elements that are neither purely regular nor purely random.

http://www.wikipedia.org/

Bangla sample text (in English translation): In the medieval period of Bengali literature, a particular class of religious narrative poetry is known as Mangalkavya.

It is said that poetry in which a deity is worshipped and its glory is sung, poetry whose very hearing brings good fortune and whose opposite brings misfortune, poetry that is a vessel of auspiciousness, even poetry that brings good fortune to the house in which it is merely kept, is called Mangalkavya.

Historians believe that Mangalkavya were used to extol particular Hindu deities known as "nimnakoti" (lower-order) deities, because these deities went unmentioned in classical Hindu literature such as the Vedas and the Puranas.

Slide52

1: Acquire raw text corpus

[Figure: the English passage from Slide51 with the feature words highlighted]

http://en.wikipedia.org/wiki/Complex_network

Slide53

1: Acquire raw text corpus

[Figure: the English passage from Slide51 with target words and feature words highlighted]

http://en.wikipedia.org/wiki/Complex_network

Slide54

2: Construct context vectors

[Figure: the English passage from Slide51 alongside a table of context vectors; rows and columns are labeled with words such as networks, of, a, and, is, as and the, and the cells hold small integer values]
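Step 2 can be sketched as follows. The choices here are assumptions, not read off the slide: feature words are the most frequent words of the corpus, and each target word's vector counts which feature word appears at relative positions −2, −1, +1 and +2. The names and the toy sentence are hypothetical.

```python
from collections import Counter

POSITIONS = (-2, -1, 1, 2)  # assumed context window around the target

def context_vectors(tokens, n_features=2):
    """Map each word to a Counter of (relative position, feature word) -> count."""
    frequency = Counter(tokens)
    features = {w for w, _ in frequency.most_common(n_features)}
    vectors = {}
    for i, target in enumerate(tokens):
        vec = vectors.setdefault(target, Counter())
        for offset in POSITIONS:
            j = i + offset
            if 0 <= j < len(tokens) and tokens[j] in features:
                vec[(offset, tokens[j])] += 1
    return features, vectors

tokens = "the study of complex networks is a young and active area of research".split()
features, vectors = context_vectors(tokens)
print(sorted(features))
print(vectors["study"])
```

In a real setting the feature set would be much larger (a few hundred frequent words is a plausible order of magnitude), and the vectors would be sparse.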

Slide55

3: Construct network

[Figure: a network over words from the passage, such as graphs, pattern, display, lattices, graph, random, study, features, simple, complex, elements, network, networks, social and purely]

Words are nodes. The weight of the edge between words $u$ and $v$ is the cosine similarity of their context vectors:

$\mathrm{sim}(u,v) = \cos(\vec{u}, \vec{v})$
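Step 3 as a sketch: compute the cosine similarity of every pair of context vectors and keep an edge when it reaches a threshold. The threshold value, the sparse-dict vector format and the toy vectors are assumptions for illustration.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: count}."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similarity_network(vectors, threshold=0.5):
    """Undirected weighted edges {(a, b): sim} for word pairs with sim >= threshold."""
    edges = {}
    for a, b in combinations(sorted(vectors), 2):
        s = cosine(vectors[a], vectors[b])
        if s >= threshold:
            edges[(a, b)] = s
    return edges

# Hypothetical context vectors keyed by (relative position, feature word).
vectors = {
    "red":   {(-1, "the"): 3, (1, "car"): 2},
    "blue":  {(-1, "the"): 2, (1, "car"): 2, (1, "sky"): 1},
    "heavy": {(-1, "very"): 4},
}
print(similarity_network(vectors))
```

The all-pairs loop is quadratic in the vocabulary size, which is acceptable for a sketch but would need pruning for a real lexicon.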

Slide56

Experiments

Cluster the network:
hierarchical clustering
random-walk-based clustering
Study the topological properties of the networks across languages
Develop an unsupervised POS tagger

Slide57

Languages

Bangla (2M, ABP), Catalan (3M, LCC), Czech (4M, LCC), Danish (3M, LCC), Dutch (18M, LCC), English (6M, BNC), Finnish (11M, LCC), French (3M, LCC), German (40M, Wortschatz), Hindi (2M, DJ), Hungarian (18M, LCC), Icelandic (14M, LCC), Italian (9M, LCC), Norwegian (16M, LCC), Spanish (4.5M, LCC), Swedish (3M, LCC)

http://wortschatz.uni-leipzig.de/~cbiemann/software/unsupos.html

Slide58

Structural Properties: Degree Distribution

[Figure: degree distribution, $p_k$ versus $k$]

Power law with exponent -1 (Zipf distribution)

Inference: hierarchical organization of the morpho-syntactic ambiguity classes.

Slide59

Structural Properties: Clustering Coefficient

[Figure: clustering coefficient (CC) versus degree k]

Avg. CC = 0.53

High k → high CC (Pearson = 0.49)

Community structure; frequent words connect to frequent words (rich-club phenomenon); existence of a large core

Slide60

Clustering Algorithms

Crisp/hard vs. fuzzy/soft
Hierarchical vs. non-hierarchical
Divisive vs. agglomerative
Popular strategies: k-means, hierarchical agglomerative clustering, spectral clustering (Shi-Malik algorithm)

Slide61

Syntactic Network of Words

[Figure: a syntactic network over the words light, color, red, blue, blood, sky, heavy, weight, 100, 20 and 1; edges carry labels such as 1 - cos(red, blue)]

Slide62

The Chinese Whispers Algorithm

[Figure: a weighted network over light, color, red, blue, blood, sky, heavy and weight, with edge weights 0.9, 0.5, 0.9, 0.7, 0.8 and -0.5]

Slide63

The Chinese Whispers Algorithm

[Figure: the same weighted network as on Slide62]

Slide64

The Chinese Whispers Algorithm

[Figure: the same weighted network as on Slide62]
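The Chinese Whispers procedure can be summarized as follows: every node starts in its own class; then, in random order, each node adopts the class with the highest total edge weight among its neighbors, until nothing changes. A minimal sketch follows; breaking ties by class name is a simplification made here, and the toy graph (two disconnected triangles over words from the slides, with made-up weights) is hypothetical.

```python
import random

def chinese_whispers(edges, iterations=20, seed=0):
    """Cluster an undirected weighted graph {(u, v): weight}; returns {node: class}."""
    rng = random.Random(seed)
    nbrs = {}
    for (u, v), w in edges.items():
        nbrs.setdefault(u, {})[v] = w
        nbrs.setdefault(v, {})[u] = w
    label = {v: v for v in nbrs}  # every node starts in its own class
    nodes = list(nbrs)
    for _ in range(iterations):
        rng.shuffle(nodes)        # visit nodes in random order
        changed = False
        for v in nodes:
            scores = {}
            for w, weight in nbrs[v].items():
                scores[label[w]] = scores.get(label[w], 0.0) + weight
            # Adopt the class with the highest total edge weight (ties: larger class name).
            best = max(scores, key=lambda cls: (scores[cls], cls))
            if best != label[v]:
                label[v] = best
                changed = True
        if not changed:
            break
    return label

edges = {("red", "blue"): 0.9, ("blue", "sky"): 0.8, ("red", "sky"): 0.7,
         ("heavy", "weight"): 0.9, ("weight", "light"): 0.5, ("heavy", "light"): 0.6}
print(chinese_whispers(edges))
```

The number of clusters is not fixed in advance; it emerges from the graph, which is what makes the method attractive for inducing an unknown number of word classes.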

Slide65

Structural Properties: Cluster Size Distribution

MSR-I TAB Presentation 2008

[Figure: cluster size versus rank]

Power law with exponent close to -1

Inference: fractal nature of the network
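A rough way to estimate an exponent like this one is a least-squares fit of log(size) against log(rank). The sketch uses synthetic sizes, not the slide's data, and the function name is hypothetical; for serious work, maximum-likelihood fitting is generally more reliable than regression on log-log axes.

```python
import math

def rank_size_exponent(sizes):
    """Least-squares slope of log(size) versus log(rank), with sizes sorted in decreasing order."""
    ranked = sorted(sizes, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(size) for size in ranked]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Synthetic sizes following size = 1000 / rank exactly (exponent -1).
sizes = [1000 / rank for rank in range(1, 51)]
print(rank_size_exponent(sizes))
```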

Slide66

The Clusters

Finnish, quantifiers (199): kaksi, kaksi-kolme, viiteen, vajaata, 22:een, miljoona, 40-vuotiaan …

German, adjectives (590): chinesischer, Deutscher, nationalistischer, grüner, tamilischer, indianischer, amerikanischer …

Bangla, genitive nouns (352): গোলমালের, দাবির, আগুনের, ফলের, মনোভাবের, দূষণের, ব্যয়ের, মাথার, কথার, বোধের …

English, adverbs (189): defiantly, steadily, uncertainly, abruptly, thoughtfully, neatly, uniformly, freely, upwards, aloud, sidelong, savagely …

Slide67

Proper Nouns

Clusters from Finnish, German, English and Bangla:

First names (919): Eemil, J-P, Benedictus, Jarl, James, Kristian, Petra, El, Dave, Otto, Bo, Mirka …

Acronyms (2884): WIZO, IPOs, FDD, KDA, CIC, IMB, VDP, FIBT, DBAG, G7, DOG, WJC, Eucom, WWF, BfV, L-Bank, MuZ, ORH …

Surnames (102): Blair, Singh, Azad, Chowdhury, Kumar, Ganguly, Khan, Gandhi, Das, Basu, Roy, Sen, Bush, …

Places (988): Punjab, Spain, Vienna, Chicago, Antarctica, Gibraltar, Carnegie, Zambia, North-East, England, Bangladesh, India, USA, Yorks …

Slide68

Clusters in Bangla

Cluster 1: Proper nouns: buddhabAbu, saurabha, rAkesha
Cluster 2: Noun-genitive: golamAlera (of problem), dAbira (of right), phalera (of result)
Cluster 3: Quantifiers: sAtaTi (seven), anekaguli (many), 3Ti (three)
Cluster 4: Noun-locative: adhibeshane (during the session), dalei (in party), baktritAYe (in speech), bhAShaNe (in speech)
Cluster 5: Infinitives: bhAbte (to think), khete (to eat), jitate (to win)

Slide69

Dendrogram of POS in Bangla

Slide70

Lexicon Induction and Labeling

Fuzzy clusters define lexical categories.
Induction of a lexicon
Use the lexicon to train an HMM in an unsupervised manner
Evaluation: tag perplexity
Result: improves the accuracy of NER, chunking, etc. over no POS tagging, but supervised POS tagging is still better

Slide71

Word Sense Disambiguation

Véronis, J. 2004. HyperLex: lexical cartography for information retrieval. Computer Speech & Language 18(3):223-252.

Let the word to be disambiguated be "light".
Select a subcorpus of paragraphs that contain at least one occurrence of "light".
Construct the word co-occurrence graph.

Slide72

HyperLex

A beam of white light is dispersed into its component colors by its passage through a prism.

Energy efficient light fixtures including solar lights, night lights, energy star lighting, ceiling lighting, wall lighting, lamps

What enables us to see the light and experience such wonderful shades of colors during the course of our everyday lives?

[Figure: co-occurrence graph over beam, colors, prism, dispersed, white, energy, lamps, fixtures, efficient and shades]
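A sketch of the graph construction: keep only the paragraphs containing the target word, count how often words and word pairs occur in those paragraphs, and weight each edge as w(A, B) = 1 − max[p(A|B), p(B|A)] with p(A|B) = f(A,B)/f(B), following Véronis (2004), so that strongly associated words get weights near 0. Tokenization, stop-word handling and the toy paragraphs are assumptions, and HyperLex's frequency thresholds are omitted.

```python
from collections import Counter
from itertools import combinations

def hyperlex_graph(paragraphs, target, stopwords=frozenset()):
    """Co-occurrence graph {(a, b): weight} over the paragraphs that contain `target`."""
    contexts = []
    for paragraph in paragraphs:
        words = {w.strip(".,;:?!").lower() for w in paragraph.split()}
        if target in words:
            contexts.append(sorted(words - {target} - set(stopwords) - {""}))
    freq = Counter(w for context in contexts for w in context)
    pairs = Counter(ab for context in contexts for ab in combinations(context, 2))
    graph = {}
    for (a, b), f_ab in pairs.items():
        # w = 1 - max(p(a|b), p(b|a)); a low weight means a strong association
        graph[(a, b)] = 1 - max(f_ab / freq[b], f_ab / freq[a])
    return graph

paragraphs = [
    "light prism beam",
    "light prism white",
    "light beam energy",
    "light lamps energy",
]
for edge, weight in sorted(hyperlex_graph(paragraphs, "light").items()):
    print(edge, weight)
```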

Slide73

Hub Detection and MST

[Figure: the co-occurrence graph from Slide72 and a tree rooted at light, over colors, lamps, beam, prism, dispersed, white, shades, energy, fixtures and efficient]

Example: White fluorescent lights consume less energy than incandescent lamps.
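A simplified sketch of this step: repeatedly pick the highest-degree remaining word as a hub and drop its neighbors from the candidates; then join the target word to every hub with weight 0 and grow a minimum spanning tree from the target with Prim's algorithm. Each word is assigned the hub at the top of its subtree, which stands for a sense. HyperLex's additional hub thresholds are omitted, and the function names and toy weights are hypothetical.

```python
import heapq

def detect_hubs(graph, min_degree=2):
    """Greedy hub selection on an undirected weighted graph {(a, b): weight}."""
    adj = {}
    for (a, b), w in graph.items():
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    candidates, hubs = set(adj), []
    while candidates:
        hub = max(candidates, key=lambda v: (len(adj[v]), v))  # highest degree first
        if len(adj[hub]) < min_degree:
            break
        hubs.append(hub)
        candidates -= {hub} | set(adj[hub])  # a hub's neighbors cannot become hubs
    return adj, hubs

def assign_senses(graph, target, min_degree=2):
    """Prim's MST grown from `target`; returns (hubs, {word: hub of its subtree})."""
    adj, hubs = detect_hubs(graph, min_degree)
    sense, in_tree = {}, {target}
    # The target is joined to every hub with weight 0, so the hubs are seeded first.
    heap = [(0.0, h, h) for h in hubs]  # (edge weight, node, hub of its subtree)
    heapq.heapify(heap)
    while heap:
        _, v, hub = heapq.heappop(heap)
        if v in in_tree:
            continue
        in_tree.add(v)
        sense[v] = hub  # v joins the subtree of `hub`
        for u, w in adj[v].items():
            if u not in in_tree:
                heapq.heappush(heap, (w, u, hub))
    return hubs, sense

graph = {
    ("colors", "prism"): 0.2, ("beam", "colors"): 0.3, ("beam", "prism"): 0.4,
    ("colors", "shades"): 0.3, ("energy", "lamps"): 0.2, ("fixtures", "lamps"): 0.3,
    ("energy", "fixtures"): 0.5, ("efficient", "lamps"): 0.4, ("beam", "energy"): 0.9,
}
hubs, sense = assign_senses(graph, "light")
print(hubs, sense)
```

A new occurrence of "light" would then be disambiguated by looking at which hubs its context words fall under; that scoring step is not shown here.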

Slide74

Other Related Works

Solan, Z., Horn, D., Ruppin, E. and Edelman, S. 2005. Unsupervised learning of natural languages. PNAS, 102(33): 11629-11634.
Ferrer i Cancho, R. 2007. Why do syntactic links not cross? Europhysics Letters.
Also applied to: IR, summarization, sentiment detection and categorization, script evaluation, author detection, …

Slide75

One-Slide Summary

Computer science has a much bigger role to play in understanding language than the scope of NLP today.
A holistic research agenda in computational linguistics is the need of the hour.
Research in linguistic networks is an emerging area with tremendous potential.
Graphs are amazing tools for visualization, and therefore for teaching.

Slide76

Resources

Conferences: TextGraphs, Sunbelt, EvoLang, ECCS
Journals: PRE, Physica A, IJMPC, EPL, PRL, PNAS, QL, ACS, Complexity, Social Networks, Interaction Studies
Tools: Pajek, C#UNG, http://www.insna.org/INSNA/soft_inf.html
Online resources: bibliographies, courses on CNT

Slide77

Thank you
monojitc@microsoft.com