
Slide1

CS460/626: Natural Language Processing/Speech, NLP and the Web (Lecture 1 – Introduction)

Pushpak Bhattacharyya

CSE Dept., IIT Bombay

4th Jan, 2011

Slide2

Persons involved

Faculty instructors: Dr. Pushpak Bhattacharyya (www.cse.iitb.ac.in/~pb)

TAs: Joydip Datta, Debarghya Majumdar ({joydip,deb}@cse)

Course home page (to be created): www.cse.iitb.ac.in/~cs626-460-2011

Slide3

Perspectivising NLP: Areas of AI and their inter-dependencies

Search

Vision

Planning

Machine Learning

Knowledge Representation

Logic

Expert Systems

Robotics

NLP

Slide4

Books etc.

Main Text(s):

Natural Language Understanding: James Allen

Speech and NLP: Jurafsky and Martin

Foundations of Statistical NLP: Manning and Schutze

Other References:

NLP, a Paninian Perspective: Bharati, Chaitanya and Sangal

Statistical NLP: Charniak

Journals

Computational Linguistics, Natural Language Engineering, AI, AI Magazine, IEEE SMC

Conferences

ACL, EACL, COLING, MT Summit, EMNLP, IJCNLP, HLT, ICON, SIGIR, WWW, ICML, ECML

Slide5

Allied Disciplines

Philosophy

Semantics, Meaning of “meaning”, Logic (syllogism)

Linguistics

Study of Syntax, Lexicon, Lexical Semantics etc.

Probability and Statistics

Corpus Linguistics, Testing of Hypotheses, System Evaluation

Cognitive Science

Computational Models of Language Processing, Language Acquisition

Psychology

Behavioristic insights into Language Processing, Psychological Models

Brain Science

Language Processing Areas in Brain

Physics

Information Theory, Entropy, Random Fields

Computer Sc. & Engg.

Systems for NLP

Slide6

Topics proposed to be covered

Shallow Processing

Part of Speech Tagging and Chunking using HMM, MEMM, CRF, and Rule Based Systems

EM Algorithm

Language Modeling

N-grams

Probabilistic CFGs

Basic Speech Processing

Phonology and Phonetics

Statistical Approach

Automatic Speech Recognition and Speech Synthesis

Deep Parsing

Classical Approaches: Top-Down, Bottom-Up and Hybrid Methods

Chart Parsing, Earley Parsing

Statistical Approach: Probabilistic Parsing, Tree Bank Corpora

Slide7

Topics proposed to be covered (contd.)

Knowledge Representation and NLP

Predicate Calculus, Semantic Net, Frames, Conceptual Dependency, Universal Networking Language (UNL)

Lexical Semantics

Lexicons, Lexical Networks and Ontology

Word Sense Disambiguation

Applications

Machine Translation

IR

Summarization

Question Answering

Slide8

Grading

Based on: Midsem, Endsem, Assignments, Paper-reading/Seminar

Except the first two, everything else is done in groups of 4.

Weightages will be revealed soon.

Slide9

Definitions etc.

Slide10

What is NLP?

Branch of AI

2 Goals

Science Goal: Understand the way language operates

Engineering Goal: Build systems that analyse and generate language; reduce the man-machine gap

Slide11

The famous Turing Test: Language Based Interaction

Machine

Human

Test conductor

Can the test conductor find out which is the machine and which is the human?

Slide12

Inspired Eliza

http://www.manifestation.com/neurotoys/eliza.php3

Slide13

Inspired Eliza (another sample interaction)

A Sample of Interaction:

Slide14

“What is it” question: NLP is concerned with Grounding

Ground the language into perceptual, motor and cognitive capacities.

Slide15

Grounding

Chair

Computer

Slide16

Two Views of NLP and the Associated Challenges

Classical View

Statistical/Machine Learning View

Slide17

Stages of processing

Phonetics and phonology

Morphology

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Pragmatics

Discourse

Slide18

Phonetics

Processing of speech

Challenges

Homophones: bank (finance) vs. bank (river bank)

Near Homophones: maatraa vs. maatra (hin)

Word Boundary

aajaayenge (aa jaayenge (will come) or aaj aayenge (will come today))

I got [ua]plate

Phrase boundary

mtech1 students are especially exhorted to attend as such seminars are integral to one's post-graduate education

Disfluency: ah, um, ahem etc.

Slide19

Morphology

Word formation rules from root words

Nouns: Plural (boy-boys); Gender marking (czar-czarina)

Verbs: Tense (stretch-stretched); Aspect (e.g. perfective sit-had sat); Modality (e.g. request khaanaa-khaaiie)

Crucial first step in NLP

Languages rich in morphology: e.g., Dravidian, Hungarian, Turkish

Languages poor in morphology: Chinese, English

Languages with rich morphology have the advantage of easier processing at higher stages of processing

A task of interest to computer science: Finite State Machines for Word Morphology
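To make the last point concrete, here is a minimal sketch of a finite-state morphological analyser. Roots are stored as paths through a character trie (the machine's states), and each root's final state carries a continuation class that licenses the suffix arcs allowed after it. The toy lexicon, the continuation classes and the feature tags (+N+Pl, +V+Past, ...) are illustrative assumptions, not material from the lecture; irregular forms such as sit-sat would need extra arcs.

```python
from collections import defaultdict

# Hypothetical toy lexicon: root -> continuation class.
LEXICON = {"boy": "N", "czar": "N", "stretch": "V", "sit": "V"}

# Suffix arcs allowed after each continuation class: surface suffix -> features.
SUFFIXES = {
    "N": {"": "+N+Sg", "s": "+N+Pl"},
    "V": {"": "+V+Base", "ed": "+V+Past"},
}


def build_machine(lexicon):
    """Build a character trie: delta[state][char] -> next state; state 0 is the start."""
    delta = defaultdict(dict)
    finals = {}  # state where a root ends -> its continuation class
    next_state = 1
    for root, cls in lexicon.items():
        state = 0
        for ch in root:
            if ch not in delta[state]:
                delta[state][ch] = next_state
                next_state += 1
            state = delta[state][ch]
        finals[state] = cls
    return delta, finals


DELTA, FINALS = build_machine(LEXICON)


def analyse(word):
    """Return every 'root+features' analysis the machine accepts for word."""
    analyses, state = [], 0
    for i in range(len(word) + 1):
        cls = FINALS.get(state)
        # A root ends here: try to read the rest of the word as a licensed suffix.
        if cls is not None and word[i:] in SUFFIXES[cls]:
            analyses.append(word[:i] + SUFFIXES[cls][word[i:]])
        if i == len(word):
            break
        state = DELTA[state].get(word[i])
        if state is None:  # no transition: the machine rejects further input
            break
    return analyses


print(analyse("boys"))       # ['boy+N+Pl']
print(analyse("stretched"))  # ['stretch+V+Past']
```

Sharing prefixes in the trie (stretch and sit share the s-state) is what keeps such machines compact for large lexicons.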

Slide20

Lexical Analysis

Essentially refers to dictionary access and obtaining the properties of the word

e.g. dog

noun (lexical property)

take-’s’-in-plural (morph property)

animate (semantic property)

4-legged (-do-)

carnivore (-do-)

Challenge: Lexical or word sense disambiguation

Slide21

Lexical Disambiguation

First step: Part of Speech Disambiguation

Dog as a noun (animal)

Dog as a verb (to pursue)

Sense Disambiguation

Dog (as animal)

Dog (as a very detestable person)

Needs word relationships in a context

The chair emphasised the need for adult education

Very common in day-to-day communications

Satellite Channel Ad: Watch what you want, when you want (two senses of watch)

e.g., Ground breaking ceremony/research

Slide22

Technological developments bring in new terms and additional meanings/nuances for existing terms

Justify as in justify the right margin (word processing context)

Xeroxed: a new verb

Digital Trace: a new expression

Communifaking: pretending to talk on a mobile phone when you actually are not

Discomgooglation: anxiety/discomfort at not being able to access the internet

Helicopter Parenting: over-parenting

Slide23

Syntax Processing Stage

Structure Detection

[S [NP I] [VP [V like] [NP mangoes]]]

Slide24

Parsing Strategy

Driven by grammar

S -> NP VP

NP -> N | PRON

VP -> V NP | V PP

N -> Mangoes

PRON -> I

V -> like
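A grammar-driven, top-down strategy can be sketched as a small recursive-descent parser that enumerates every derivation of S over the input and prints bracketed trees. The rule VP -> V PP is left out because the slide gives no rule for PP; lower-casing the input and the function names (parse, parse_seq, parses, bracket) are choices of this sketch.

```python
# Toy grammar from the slide (VP -> V PP omitted: no PP rule is given).
GRAMMAR = {"S": [["NP", "VP"]], "NP": [["N"], ["PRON"]], "VP": [["V", "NP"]]}
LEXICON = {"N": {"mangoes"}, "PRON": {"i"}, "V": {"like"}}


def parse(symbol, tokens, pos):
    """Yield (tree, next_pos) for each way symbol derives tokens starting at pos."""
    if symbol in LEXICON:  # pre-terminal: match exactly one word
        if pos < len(tokens) and tokens[pos] in LEXICON[symbol]:
            yield (symbol, tokens[pos]), pos + 1
        return
    for rhs in GRAMMAR[symbol]:  # try each production top-down, left to right
        for children, end in parse_seq(rhs, tokens, pos):
            yield (symbol, *children), end


def parse_seq(symbols, tokens, pos):
    """Yield (list_of_trees, next_pos) for a sequence of grammar symbols."""
    if not symbols:
        yield [], pos
        return
    for first, mid in parse(symbols[0], tokens, pos):
        for rest, end in parse_seq(symbols[1:], tokens, mid):
            yield [first] + rest, end


def parses(sentence):
    """All complete parses of the sentence rooted in S."""
    tokens = sentence.lower().split()
    return [tree for tree, end in parse("S", tokens, 0) if end == len(tokens)]


def bracket(tree):
    """Render a tree as labelled brackets."""
    label, *kids = tree
    if len(kids) == 1 and isinstance(kids[0], str):
        return f"[{label} {kids[0]}]"
    return f"[{label} " + " ".join(bracket(k) for k in kids) + "]"


for tree in parses("I like mangoes"):
    print(bracket(tree))  # [S [NP [PRON i]] [VP [V like] [NP [N mangoes]]]]
```

Because it returns all complete derivations, the same skeleton exposes structural ambiguity directly: an ambiguous sentence under a richer grammar simply yields more than one tree.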

Slide25

Challenges in Syntactic Processing: Structural Ambiguity

Scope

1. The old men and women were taken to safe locations: (old men and women) vs. ((old men) and women)

2. No smoking areas will allow Hookas inside

Preposition Phrase Attachment

I saw the boy with a telescope (who has the telescope?)

I saw the mountain with a telescope (world knowledge: a mountain cannot have a telescope)

I saw the boy with the pony-tail (world knowledge: pony-tail cannot be an instrument of seeing)

Very ubiquitous: newspaper headline “20 years later, BMC pays father 20 lakhs for causing son’s death”

Slide26

Structural Ambiguity…

Overheard: I did not know my PDA had a phone for 3 months

An actual sentence in the newspaper: The camera man shot the man with the gun when he was near Tendulkar

(P.G. Wodehouse, Ring for Jeeves) Jill had rubbed ointment on Mike the Irish Terrier, taken a look at the goldfish belonging to the cook, which had caused anxiety in the kitchen by refusing its ant’s eggs…

(Times of India, 26/2/08) Aid for kins of cops killed in terrorist attacks

Slide27

Headache for Parsing: Garden Path sentences

Garden Pathing

The horse raced past the garden fell.

The old man the boat.

Twin Bomb Strike in Baghdad kill 25 (Times of India 05/09/07)

Slide28

Semantic Analysis

Representation in terms of Predicate calculus/Semantic Nets/Frames/Conceptual Dependencies and Scripts

John gave a book to Mary

Give action: Agent: John, Object: Book, Recipient: Mary

Challenge: ambiguity in semantic role labeling

(Eng) Visiting aunts can be a nuisance

(Hin) aapko mujhe mithaai khilaanii padegii (ambiguous in Marathi and Bengali too; not in Dravidian languages)

Slide29

Pragmatics

Very hard problem

Model user intention

Tourist (in a hurry, checking out of the hotel, motioning to the service boy): Boy, go upstairs and see if my sandals are under the divan. Do not be late. I just have 15 minutes to catch the train.

Boy (running upstairs and coming back panting): yes sir, they are there.

World knowledge

WHY INDIA NEEDS A SECOND OCTOBER (ToI, 2/10/07)

Slide30

Discourse

Processing of sequence of sentences

Mother to John: John go to school. It is open today. Should you bunk? Father will be very angry.

Ambiguity of open

bunk what?

Why will the father be angry?

Complex chain of reasoning and application of world knowledge

Ambiguity of father: father as parent or father as headmaster

Slide31

Complexity of Connected Text

John was returning from school dejected – today was the math test

He couldn’t control the class

Teacher shouldn’t have made him responsible

After all he is just a janitor

Slide32

A look at Textual Humour

Teacher (angrily): did you miss the class yesterday?

Student: not much

A man coming back to his parked car sees the sticker "Parking fine". He goes and thanks the policeman for appreciating his parking skill.

Son: mother, I broke the neighbour's lamp shade.

Mother: then we have to give them a new one.

Son: no need, aunty said the lamp shade is irreplaceable.

Ram: I got a Jaguar car for my unemployed youngest son.

Shyam: That's a great exchange!

Shane Warne should bowl maiden overs, instead of bowling maidens over

Slide33

Giving a flavour of what is done in NLP: Structure Disambiguation

Scope, Clause and Preposition/Postposition

Slide34

Structure Disambiguation is as critical as Sense Disambiguation

Scope (portion of text in the scope of a modifier)

Old men and women will be taken to safe locations

No smoking areas allow hookas inside

Clause

I told the child that I liked that he came to the game on time

Preposition

I saw the boy with a telescope

Slide35

Structure Disambiguation is as critical as Sense Disambiguation (contd.)

Semantic role

Visiting aunts can be a nuisance

Mujhe aapko mithaai khilaani padegii (“I have to give you sweets” or “You have to give me sweets”)

Postposition

unhone teji se bhaagte hue chor ko pakad liyaa (“he caught the thief that was running fast” or “he ran fast and caught the thief”)

All these ambiguities lead to the construction of multiple parse trees for each sentence and need semantic, pragmatic and discourse cues for disambiguation

Slide36

Higher level knowledge needed for disambiguation

Semantics

I saw the boy with a pony tail (pony tail cannot be an instrument of seeing)

Pragmatics

((old men) and women) as opposed to (old men and women) in “Old men and women were taken to safe locations”, since women, both young and old, were very likely taken to safe locations

Discourse:

No smoking areas allow hookas inside, except the one in Hotel Grand.

No smoking areas allow hookas inside, but not cigars.

Slide37

Preposition Attachment Disambiguation

Slide38

Problem definition

4-tuples of the form V N1 P N2

saw (V) boys (N1) with (P) telescopes (N2)

Attachment choice is between the matrix verb V and the object noun N1

Slide39

Lexical Association Table (Hindle and Rooth, 1991 and 1993)

From a large corpus of parsed text, first find all noun phrase heads, then record the verb (if any) that precedes the head and the preposition (if any) that follows it, as well as some other syntactic information about the sentence.

Extract attachment information from this table of co-occurrences

Slide40

Example: lexical association

A table entry is considered a definite instance of the prepositional phrase attaching to the verb if the verb definitely licenses the prepositional phrase

E.g. from PropBank, absolve frames

absolve.XX: NP-ARG0 NP-ARG2-of obj-ARG1

1 absolve.XX NP-ARG0 NP-ARG2-of obj-ARG1 On Friday , the firms filed a suit *ICH*-1 against West Virginia in New York state court asking for [ARG0 a declaratory judgment] [rel absolving] [ARG1 them] of [ARG2-of liability] .

Slide41

Core steps

Seven different procedures for deciding whether a table entry is an instance of no attachment, sure noun attach, sure verb attach, or ambiguous attach are used to extract frequency information, counting the number of times a particular verb or noun attaches with a particular preposition

Slide42

Core steps (contd.)

These frequencies serve as the training data for the statistical model used to predict correct attachment

To disambiguate a sentence, compute the likelihood of the particular preposition given the particular verb and contrast it with the likelihood of the preposition given the particular noun (the object head N1)

i.e., compare P(with|saw) with P(with|boy), as in I saw the boy with a telescope
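The comparison can be sketched as a log-likelihood ratio over head-preposition counts, in the spirit of Hindle and Rooth's lexical association score: a positive value favours verb attachment, a negative one noun attachment. The counts below are invented for illustration, and the add-alpha smoothing (with an assumed preposition vocabulary size NUM_PREPS) stands in for the paper's own smoothing scheme. The noun being compared is the object head N1 (boy), not the object of the preposition.

```python
import math

# Invented counts of PP attachments: (head, preposition) -> times the PP attached to head.
VERB_PREP = {("saw", "with"): 30, ("saw", "in"): 20}
NOUN_PREP = {("boy", "with"): 5, ("boy", "of"): 12}
VERB_TOTAL = {"saw": 200}  # occurrences of each verb
NOUN_TOTAL = {"boy": 150}  # occurrences of each noun
NUM_PREPS = 50             # assumed size of the preposition vocabulary


def p_prep_given(head, prep, pair_counts, totals, alpha=0.5):
    """Add-alpha estimate of P(prep | head) (the smoothing choice is an assumption)."""
    return (pair_counts.get((head, prep), 0) + alpha) / (totals.get(head, 0) + alpha * NUM_PREPS)


def lexical_association(verb, noun, prep):
    """log2 P(prep|verb) / P(prep|noun): positive favours verb attachment."""
    return math.log2(p_prep_given(verb, prep, VERB_PREP, VERB_TOTAL)
                     / p_prep_given(noun, prep, NOUN_PREP, NOUN_TOTAL))


def attach(verb, noun, prep):
    return "V" if lexical_association(verb, noun, prep) > 0 else "N1"


print(attach("saw", "boy", "with"))  # V
print(attach("saw", "boy", "of"))    # N1
```

Note that N2 (telescope) never enters the decision, which is one of the limitations the next slide's critique and the later approaches address.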

Slide43

Critique

Limited by the number of relationships in the training corpora

Too large a parameter space

Model acquired during training is represented in a huge table of probabilities, precluding any straightforward analysis of its workings

Slide44

Approach based on Transformation Based Error Driven Learning, Brill and Resnik, COLING 1994

Slide45

Example Transformations

Initial attachments by default are to N1 predominantly.
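The learning loop can be sketched as follows: start every example at the default N1 attachment, then greedily pick the rule that most increases training accuracy, apply it, and repeat until no rule helps. Only single-slot templates of the form "change attachment from X to Y if slot == word" are used here; Brill and Resnik's templates are richer, and the toy training tuples are hypothetical.

```python
# Hypothetical training tuples (V, N1, P, N2) with gold attachment "V" or "N1".
DATA = [
    (("saw", "boy", "with", "telescope"), "V"),
    (("saw", "man", "with", "binoculars"), "V"),
    (("saw", "boy", "with", "ponytail"), "N1"),
    (("ate", "pizza", "with", "fork"), "V"),
    (("ate", "pizza", "with", "anchovies"), "N1"),
    (("bought", "shares", "of", "company"), "N1"),
]
SLOTS = ("V", "N1", "P", "N2")


def apply_rule(rule, data, labels):
    """Rule (frm, to, slot_index, word): change frm -> to where that slot holds word."""
    frm, to, i, word = rule
    return [to if cur == frm and tup[i] == word else cur
            for (tup, _), cur in zip(data, labels)]


def num_correct(data, labels):
    return sum(cur == gold for (_, gold), cur in zip(data, labels))


def learn(data, max_rules=10):
    labels = ["N1"] * len(data)  # initial-state annotator: attach to N1 by default
    rules = []
    for _ in range(max_rules):
        base = num_correct(data, labels)
        # Candidate rules are instantiated from the currently misclassified examples.
        candidates = {(cur, gold, i, tup[i])
                      for (tup, gold), cur in zip(data, labels) if cur != gold
                      for i in range(len(SLOTS))}
        best, best_gain = None, 0
        for rule in sorted(candidates):  # sorted for deterministic tie-breaking
            gain = num_correct(data, apply_rule(rule, data, labels)) - base
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:  # no rule improves training accuracy: stop
            break
        rules.append(best)
        labels = apply_rule(best, data, labels)
    return rules


def predict(rules, tup):
    label = "N1"
    for frm, to, i, word in rules:  # apply the learned rules in order
        if label == frm and tup[i] == word:
            label = to
    return label


for frm, to, i, word in learn(DATA):
    print(f"change {frm} -> {to} if {SLOTS[i]} == {word}")
```

On this toy data the loop learns three rules (V == saw, then N2 == fork, then N2 == ponytail); the learned model is an ordered, human-readable rule list rather than a table of probabilities.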

Slide46

Transformation rules with word classes

WordNet synsets and semantic classes used

Slide47

Accuracy values of the transformation based approach: 12000 training and 500 test examples

Method                            Accuracy         # of transformation rules
Hindle and Rooth (baseline)       70.4 to 75.8%    NA
Transformations                   79.2%            418
Transformations (word classes)    81.8%            266

Slide48

Maximum Entropy Based Approach (Ratnaparkhi, Reynar, Roukos, 1994)

Use more features than the (V P) bigram and the (N1 P) bigram

Apply the Maximum Entropy Principle

Slide49

Core formulation

We denote the partially parsed verb phrase, i.e., the verb phrase without the attachment decision, as a history h, and the conditional probability of an attachment as P(d|h), where d ∈ {0, 1} corresponds to a noun or verb attachment (0 or 1, respectively).

Slide50

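Maximize the training data log likelihood, where $\tilde{p}(h,d)$ is the empirical distribution of the training data and each binary feature $f_i(h,d) \in \{0,1\}$ has weight $\alpha_i$

$L(P) = \sum_{h,d} \tilde{p}(h,d) \log P(d \mid h)$ --(1)

$P(d \mid h) = \frac{1}{Z_h} \prod_{i=1}^{k} \alpha_i^{f_i(h,d)}, \quad Z_h = \sum_{d'} \prod_{i=1}^{k} \alpha_i^{f_i(h,d')}$ --(2)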

Slide51

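Equating the model expected parameters and training data parameters ($\tilde{p}$: empirical distribution of the training data)

$E_P[f_i] = \sum_{h,d} \tilde{p}(h) P(d \mid h) f_i(h,d)$ --(3)

$E_P[f_i] = E_{\tilde{p}}[f_i] = \sum_{h,d} \tilde{p}(h,d) f_i(h,d), \quad i = 1, \dots, k$ --(4)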

Slide52

Features

Two types of binary-valued questions:

Questions about the presence of any n-gram of the four head words, e.g., a bigram may be V == “is”, P == “of”

Features comprised solely of questions on words are denoted as “word” features

Slide53

Features (contd.)

Questions that involve the class membership of a head word

Binary hierarchy of classes derived by mutual information

Slide54

Features (contd.)

Given a binary class hierarchy, we can associate a bit string with every word in the vocabulary

Then, by querying the value of certain bit positions, we can construct binary questions.

Features comprised solely of questions about class bits are denoted as “class” features, and features containing questions about both class bits and words are denoted as “mixed” features.
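A sketch of how word, class and mixed features plug into the conditional model P(d|h) ∝ ∏ α_i^{f_i(h,d)}, normalised over d ∈ {0, 1} (0 = noun, 1 = verb attachment, as above). A feature that does not fire contributes α^0 = 1, so only active features matter. The bit strings, the particular questions (prefixes of a word's bit string) and the α values are made up; in practice the α_i would be estimated from training data by an iterative scaling algorithm.

```python
import math

# Hypothetical bit strings from a binary word-class hierarchy.
CLASS_BITS = {"saw": "0110", "is": "0101", "boy": "1010", "with": "1101",
              "of": "1100", "telescope": "1011"}


def active_features(h, d):
    """Features f_i(h, d) that fire (value 1) for history h = (V, N1, P, N2) and outcome d."""
    v, n1, p, n2 = h
    return [
        ("word", "P", p, d),                                        # word unigram question
        ("word", "V,P", (v, p), d),                                 # word bigram, e.g. V == "is", P == "of"
        ("class", "N2[:2]", CLASS_BITS.get(n2, "")[:2], d),         # first two class bits of N2
        ("mixed", "P,N1[:1]", (p, CLASS_BITS.get(n1, "")[:1]), d),  # word P plus a class bit of N1
    ]


# Hypothetical weights alpha_i; features not listed have alpha = 1 (no effect).
ALPHA = {
    ("word", "P", "with", 1): 1.5,
    ("word", "V,P", ("saw", "with"), 1): 2.0,
    ("class", "N2[:2]", "10", 1): 1.8,
    ("word", "P", "of", 0): 3.0,
}


def prob(d, h):
    """P(d | h) = prod_i alpha_i^{f_i(h,d)} / Z_h, with Z_h summed over d in {0, 1}."""
    def unnormalised(dd):
        return math.prod(ALPHA.get(f, 1.0) for f in active_features(h, dd))
    return unnormalised(d) / (unnormalised(0) + unnormalised(1))


h = ("saw", "boy", "with", "telescope")
print(round(prob(1, h), 3))  # 0.844 -> verb attachment favoured (d = 1)
```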

Slide55

Word classes (Brown et al. 1992)

Slide56

Experimental data size

Slide57

Performance of ME Model on Test Events

Slide58

Examples of Features Chosen for Wall St. Journal Data

Slide59

Average Performance of Human & ME Model on 300 Events of WSJ Data

Slide60

Human and ME model performance on consensus set for WSJ

Slide61

Average Performance of Human & ME Model on 200 Events of Computer Manuals Data

Slide62

Back-off model based approach (Collins and Brooks, 1995)

NP-attach: (joined ((the board) (as a non executive director)))

VP-attach: ((joined (the board)) (as a non executive director))

Correspondingly,

NP-attach: 1 joined board as director

VP-attach: 0 joined board as director

Quintuple of (attachment A: 0/1, V, N1, P, N2): 5 random variables

Slide63

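Probabilistic formulation

$p(A = 1 \mid V = v, N1 = n_1, P = p, N2 = n_2)$

Or briefly, $p(1 \mid v, n_1, p, n_2)$

If $p(1 \mid v, n_1, p, n_2) \geq 0.5$

Then the attachment is to the noun, else to the verb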

Slide64

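Maximum Likelihood estimate

$\hat{p}(1 \mid v, n_1, p, n_2) = \frac{f(1, v, n_1, p, n_2)}{f(v, n_1, p, n_2)}$, where $f(\cdot)$ is a count in the training data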

Slide65

The Back-off estimate

Inspired by speech recognition

Prediction of the Nth word from previous (N-1) words

Data sparsity problem

$f(w_1, w_2, w_3, \dots, w_n)$ will frequently be 0 for large values of n

Slide66

Back-off estimate contd.

The cut-off frequencies ($c_1, c_2, \dots$) are thresholds determining whether to back off or not at each level: counts lower than $c_i$ at stage $i$ are deemed to be too low to give an accurate estimate, so in this case backing-off continues.

Slide67

Back off for PP attachment

Note: the back off tuples always retain the preposition

Slide68

The backoff algorithm
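The following is a minimal sketch of the Collins and Brooks back-off procedure, using the conventions above (A = 1 for noun attachment; decide noun if the estimate is at least 0.5). At each stage the counts of all tuples at that level are pooled, and every tuple keeps the preposition. The cut-off handling, the default to noun attachment when even the preposition is unseen, and the toy training quintuples are assumptions for illustration.

```python
from collections import Counter


def backoff_levels(v, n1, p, n2):
    """Tuples used at back-off stages 1-4; every tuple keeps the preposition.
    Words are tagged with their slot so that e.g. (V, P) and (N1, P) never collide."""
    V, N1, P, N2 = ("V", v), ("N1", n1), ("P", p), ("N2", n2)
    return [
        [(V, N1, P, N2)],
        [(V, N1, P), (V, P, N2), (N1, P, N2)],
        [(V, P), (N1, P), (P, N2)],
        [(P,)],
    ]


def train(quintuples):
    """quintuples: (A, v, n1, p, n2) with A = 1 for noun and 0 for verb attachment."""
    total, noun = Counter(), Counter()
    for a, v, n1, p, n2 in quintuples:
        for level in backoff_levels(v, n1, p, n2):
            for key in level:
                total[key] += 1  # f(tuple)
                noun[key] += a   # f(1, tuple)
    return total, noun


def p_noun(model, v, n1, p, n2, cutoffs=(1, 1, 1, 1)):
    """Back-off estimate of p(1 | v, n1, p, n2): use the first stage whose pooled
    count reaches its cut-off c_i; otherwise back off to the next stage."""
    total, noun = model
    for level, c in zip(backoff_levels(v, n1, p, n2), cutoffs):
        denom = sum(total[k] for k in level)
        if denom > 0 and denom >= c:
            return sum(noun[k] for k in level) / denom
    return 1.0  # nothing seen, not even the preposition: default to noun attachment


def attach(model, v, n1, p, n2):
    return "noun" if p_noun(model, v, n1, p, n2) >= 0.5 else "verb"


# Invented training quintuples.
MODEL = train([
    (0, "joined", "board", "as", "director"),
    (0, "saw", "boy", "with", "telescope"),
    (1, "saw", "boy", "with", "ponytail"),
    (1, "bought", "shares", "of", "company"),
    (1, "sold", "stake", "in", "company"),
    (0, "sold", "stake", "in", "march"),
])
print(attach(MODEL, "saw", "girl", "with", "telescope"))  # verb (stage 2: saw-with-telescope)
print(attach(MODEL, "named", "chairman", "of", "board"))  # noun (stage 4: 'of' alone)
```

Pooling the three triples (and the three pairs) in one ratio, rather than averaging three separate ratios, lets better-attested tuples dominate the estimate at that stage.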

Slide69

Lower and upper bounds on performance

Lower bound (most frequent)

Upper bound (human experts looking at 4 words only)

Slide70

Results

Slide71

Comparison with other systems

Maxent, Ratnaparkhi et al.

Transformation Learning, Brill et al.

Slide72

Flexible Unsupervised PP Attachment using WSD and Data Sparsity Reduction (Medimi Srinivas and Pushpak Bhattacharyya, IJCAI 2007)

Unsupervised approach (in some ways similar to Ratnaparkhi 1998): the training data is extracted from raw text

The unambiguous training data of the form V-P-N and N1-P-N2 TEACH the system how to resolve PP-attachment in ambiguous test data V-N1-P-N2

Refinement of extracted training data, and use of N2 in the PP-attachment resolution process.

Slide73

Flexible Unsupervised PP Attachment using WSD and Data Sparsity Reduction (Medimi Srinivas and Pushpak Bhattacharyya, IJCAI 2007)

PP-attachment is determined by the semantic properties of lexical items in the context of the preposition, using WordNet

An iterative graph-based unsupervised approach is used for Word Sense Disambiguation (similar to Mihalcea 2005)

Use of a Data Sparsity Reduction (DSR) Process (DSRP), which uses lemmatization, synset replacement and a form of inferencing. DSRP uses WordNet.

Flexible use of WSD and DSR processes for PP-Attachment

Slide74

Graph-based disambiguation: PageRank-based algorithm, Mihalcea 2005
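A sketch of weighted PageRank over a sense graph, in the spirit of Mihalcea 2005: nodes are candidate senses of the words in a context, edge weights measure how related two senses are, and each word takes its highest-ranked sense. The sense labels and edge weights are hypothetical (in practice they might come from WordNet gloss overlaps); the damping factor d = 0.85 is the usual PageRank default.

```python
def pagerank(edges, d=0.85, iters=50):
    """Weighted PageRank on an undirected graph given as {(u, v): weight}:
    PR(u) = (1 - d) + d * sum over neighbours x of w(x, u) / W(x) * PR(x),
    where W(x) is the total weight of x's edges."""
    nbrs = {}
    for (u, v), w in edges.items():
        nbrs.setdefault(u, {})[v] = w
        nbrs.setdefault(v, {})[u] = w
    out_weight = {u: sum(ws.values()) for u, ws in nbrs.items()}
    score = {u: 1.0 for u in nbrs}
    for _ in range(iters):
        score = {u: (1 - d) + d * sum(w / out_weight[x] * score[x] for x, w in nbrs[u].items())
                 for u in nbrs}
    return score


def disambiguate(senses, edges):
    """Pick, for every word, the candidate sense with the highest PageRank score."""
    score = pagerank(edges)
    return {word: max(cands, key=lambda s: score.get(s, 0.0)) for word, cands in senses.items()}


# Hypothetical senses and relatedness weights for the context "bank ... money".
SENSES = {"bank": ["bank#finance", "bank#river"], "money": ["money#currency"]}
EDGES = {("bank#finance", "money#currency"): 0.9, ("bank#river", "money#currency"): 0.1}
print(disambiguate(SENSES, EDGES))  # {'bank': 'bank#finance', 'money': 'money#currency'}
```

No sense-tagged training data is used: the ranking comes entirely from the graph, which is what makes the approach unsupervised.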

Slide75

Experimental setup

Training Data: Brown corpus (raw text). The corpus size is 6 MB; it consists of 51763 sentences, nearly 1,027,000 words.

Most frequent Prepositions in the syntactic context N1-P-N2:

of, in, for, to, with, on, at, from, by

Most frequent Prepositions in the syntactic context V-P-N:

in, to, by, with, on, for, from, at, of

Extracted unambiguous tuples: N1-P-N2: 54030 and V-P-N: 22362

Test Data: Penn Treebank Wall Street Journal (WSJ) data extracted by Ratnaparkhi. It consists of V-N1-P-N2 tuples: 20801 (training), 4039 (development) and 3097 (test)

Slide76

Experimental setup contd.

Baseline: The unsupervised approach by Ratnaparkhi, 1998 (Base-RP).

Preprocessing:

Upper case to lower case

Any four-digit number less than 2100 as a year

Any other number or % signs are converted to num

Experiments are performed using DSRP: with different stages of DSRP

Experiments are performed using GuWSD and DSRP: with different senses
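The preprocessing rules can be sketched as a token normaliser. The slide names the replacement token num; the replacement token year for four-digit years and the exact number pattern are assumptions of this sketch.

```python
import re


def normalise(token):
    """Apply the slide's preprocessing to one token (output token 'year' is assumed)."""
    t = token.lower()                                # upper case -> lower case
    if re.fullmatch(r"\d{4}", t) and int(t) < 2100:  # four-digit number < 2100
        return "year"
    if t == "%" or re.fullmatch(r"\d[\d.,]*%?", t):  # any other number or % sign
        return "num"
    return t


print([normalise(t) for t in "Profits rose 12% in 1989 to 3,500 Dollars".split()])
# ['profits', 'rose', 'num', 'in', 'year', 'to', 'num', 'dollars']
```

Collapsing numbers this way reduces sparsity: every year shares one set of counts instead of each value being a separate, rarely seen word.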

Slide77

The process of extracting training data: Data Sparsity Reduction

Tools/process: Output

Raw Text: The professional conduct of the doctors is guided by Indian Medical Association.

POS Tagger: The_DT professional_JJ conduct_NN of_IN the_DT doctors_NNS is_VBZ guided_VBN by_IN Indian_NNP Medical_NNP Association_NNP ._.

Chunker: [The_DT professional_JJ conduct_NN] of_IN [the_DT doctors_NNS] (is_VBZ guided_VBN) by_IN [Indian_NNP Medical_NNP Association_NNP].

After replacing each chunk by its head word, it results in: conduct_NN of_IN doctors_NNS guided_VBN by_IN Association_NNP

Extraction Heuristics: N1-P-N2: conduct of doctors and V-P-N: guided by Association

Morphing: N1-P-N2: conduct of doctor and V-P-N: guide by association

DSRP (Synset Replacement): N1-P-N2: {conduct, behavior} of {doctor, physician} can result in 4 combinations with the same sense, and similarly V-P-N: {guide, direct} by {association} can result in 2 combinations with the same sense.
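The extraction, morphing and synset-replacement steps can be sketched on the same example. The extraction heuristic here is deliberately simple (a preposition preceded by a noun or verb head and followed by a noun head), and the LEMMA and SYNSET tables are hypothetical stand-ins for a real lemmatiser and for WordNet synsets of the chosen senses.

```python
from itertools import product

# Hypothetical stand-ins for a lemmatiser and for WordNet synsets of the chosen senses.
LEMMA = {"doctors": "doctor", "guided": "guide"}
SYNSET = {"conduct": ["conduct", "behavior"], "doctor": ["doctor", "physician"],
          "guide": ["guide", "direct"], "association": ["association"]}


def extract(heads):
    """Collect N1-P-N2 and V-P-N tuples from a head-word sequence [(word, tag), ...]:
    a preposition (IN) preceded by a noun or verb head and followed by a noun head."""
    n1pn2, vpn = [], []
    for (left, ltag), (prep, ptag), (right, rtag) in zip(heads, heads[1:], heads[2:]):
        if ptag != "IN" or not rtag.startswith("NN"):
            continue
        if ltag.startswith("NN"):
            n1pn2.append((left, prep, right))
        elif ltag.startswith("VB"):
            vpn.append((left, prep, right))
    return n1pn2, vpn


def morph(tup):
    """Lower-case and lemmatise each word."""
    return tuple(LEMMA.get(w.lower(), w.lower()) for w in tup)


def synset_replace(tup):
    """All combinations of synonyms for the two content words; the preposition is kept."""
    first, prep, last = tup
    return [(a, prep, b) for a, b in product(SYNSET.get(first, [first]), SYNSET.get(last, [last]))]


HEADS = [("conduct", "NN"), ("of", "IN"), ("doctors", "NNS"),
         ("guided", "VBN"), ("by", "IN"), ("Association", "NNP")]
n1pn2, vpn = extract(HEADS)
print([morph(t) for t in n1pn2], [morph(t) for t in vpn])
# [('conduct', 'of', 'doctor')] [('guide', 'by', 'association')]
print(len(synset_replace(morph(n1pn2[0]))), len(synset_replace(morph(vpn[0]))))  # 4 2
```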

Slide78

Data Sparsity Reduction: Inferencing

If V1-P-N1 and V2-P-N1 exist, as also do V1-P-N2 and V2-P-N2, then if V3-P-Ni exists (i = 1, 2), we can infer the existence of V3-P-Nj (i ≠ j) with the frequency count of V3-P-Ni, which can be added to the corpus.

Slide79

Example of DSR by inferencing

V1-P-N1: play in garden and V2-P-N1: sit in garden

V1-P-N2: play in house and V2-P-N2: sit in house

V3-P-N2: jump in house exists

Infer the existence of V3-P-N1: jump in garden
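The inference rule can be sketched over a table of V-P-N counts, reproducing the play/sit/jump example. How to combine counts when the same triple is inferred from more than one source noun is not stated on the slide; keeping the largest source count is an assumption here, and the counts themselves are made up.

```python
from collections import defaultdict
from itertools import combinations


def infer(counts):
    """counts: {(verb, prep, noun): freq}. Return inferred V3-P-Nj triples with counts."""
    verbs_with = defaultdict(set)  # (prep, noun) -> verbs attested with it
    for v, p, n in counts:
        verbs_with[(p, n)].add(v)
    inferred = {}
    for p in sorted({p for _, p, _ in counts}):
        nouns = sorted(n for (q, n) in verbs_with if q == p)
        for n1, n2 in combinations(nouns, 2):
            # V1 and V2 must both be attested with N1 and with N2.
            if len(verbs_with[(p, n1)] & verbs_with[(p, n2)]) < 2:
                continue
            for ni, nj in ((n1, n2), (n2, n1)):
                # V3 is seen with Ni but not with Nj: infer V3-P-Nj with the count of V3-P-Ni.
                for v3 in verbs_with[(p, ni)] - verbs_with[(p, nj)]:
                    key = (v3, p, nj)
                    inferred[key] = max(inferred.get(key, 0), counts[(v3, p, ni)])
    return inferred


COUNTS = {("play", "in", "garden"): 3, ("sit", "in", "garden"): 2,
          ("play", "in", "house"): 4, ("sit", "in", "house"): 5,
          ("jump", "in", "house"): 2}
print(infer(COUNTS))  # {('jump', 'in', 'garden'): 2}
```

Requiring at least two shared verbs before inferring anything is what guards against spurious generalisation from a single coincidental overlap.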

Slide80

Results

Slide81

Effect of various processes on FlexPPAttach algorithm

Slide82

Precision vs. various processes