Instructor: - PowerPoint Presentation

Uploaded On 2015-11-14

Presentation Transcript

Slide1

Instructor: Smaranda Muresan
Columbia University
smara@ccls.columbia.edu

Course Introduction

Slide2

10TH DEGREE is a full service advertising agency specializing in direct and interactive marketing. Located in Irvine CA, 10TH DEGREE is looking for an Assistant Account Manager to help manage and coordinate interactive marketing initiatives for a marquee automotive account. Experience in online marketing, automotive and/or the advertising field is a plus.
Assistant Account Manager Responsibilities
- Ensures smooth implementation of programs and initiatives
- Helps manage the delivery of projects and key client deliverables
…
Compensation: $50,000-$80,000

INDUSTRY: Advertising
POSITION: Assist. Account Manag.
LOCATION: Irvine, CA
COMPANY: 10th DEGREE

Information Extraction: identifying instances of facts (names/entities, relations, and events) in semi-structured or unstructured text, and converting them into structured representations (e.g., databases)
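As a toy illustration of turning a job ad like the one above into a structured record, the sketch below uses a few hand-written regular expressions. The patterns, the field names, and the function name `extract_job_record` are assumptions made for this example and are not from the slides; real information extraction systems usually rely on learned models rather than a handful of regexes.

```python
import re

# Abridged version of the job ad shown on the slide.
AD = ("10TH DEGREE is a full service advertising agency specializing in direct "
      "and interactive marketing. Located in Irvine CA, 10TH DEGREE is looking "
      "for an Assistant Account Manager to help manage and coordinate "
      "interactive marketing initiatives.")

def extract_job_record(text):
    """Fill a small record from a job ad using hypothetical regex patterns."""
    record = {}
    # Company and industry: "<company> is a ... <industry> agency" (toy pattern).
    m = re.search(r"^(.+?) is a .*?(\w+) agency", text)
    if m:
        record["company"] = m.group(1)
        record["industry"] = m.group(2).capitalize()
    # Location: the text between "Located in" and the next comma.
    m = re.search(r"Located in ([^,]+),", text)
    if m:
        record["location"] = m.group(1)
    # Position: the text between "looking for a/an" and " to ".
    m = re.search(r"looking for an? (.+?) to ", text)
    if m:
        record["position"] = m.group(1)
    return record

print(extract_job_record(AD))
```

Patterns like these break as soon as the wording changes, which is one reason the task is usually treated as a learning problem.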

Natural Language Processing Applications

Slide3

Question Answering
IBM’s Watson
Won Jeopardy on February 16, 2011
Bram Stoker

Slide4

Watson has no discourse understanding

“Watson also tripped up on an ‘Olympic Oddities’ answer, but so imperceptibly that Alex Trebek didn’t notice at first, raising an important point of clarification. After Jennings responded incorrectly that Olympian gymnast George Eyser was ‘missing a hand’, Watson responded, ‘What is a leg?’”
http://www.wired.com/business/2011/02/watson-wrong-answer-trebek/

Slide5

This Class
The journalist William Finnegan has said about his profession (New Yorker, July 2, 2012): “You fish for facts and instead pull up boatloads of speculation, some of it well informed, much of it trailing tangled agendas. You end up reporting not so much what happened as what people think or imagine or say happened.”

[Thanks to Owen Rambow for this reference]

In this class we are interested in understanding communication through the eyes of the authors/speakers.

Slide6

http://www.washingtonpost.com/blogs/erik-wemple/post/hurricane-sandy-nyse-not-flooded/2012/10/30/37532512-223d-11e2-ac85-e669876c6a24_blog.html

Slide7

Syllabus Overview
http://www1.cs.columbia.edu/~smara/teaching/E6998/S14/

Slide8

Outline
- Instructor Introduction
  - Background, Research Interests
- Student Introductions
- Class Overview
  - Class organization
  - Website
  - Office Hours & TA
  - Topics covered in this class
  - Grading

Slide9

Instructor Intro
Researcher at the Center for Computational Learning Systems
http://www1.cs.columbia.edu/~smara

Broad research interests: computational semantics, language in social media

Slide10

Some of my current research projects relevant to the course
Detecting Contrary Meaning

Slide11

Contrary meaning
Explicit: Conflicting statements/beliefs overtly expressed in text
Implicit: Sarcasm

User: I'm so happy I'm going back to the emergency room
User: Newspaper faces court over sleazing Facebook? Facebook is so defenseless and innocent.

Slide12

Prelim work on Sarcasm Detection
Can we automatically distinguish among sarcastic, positive and negative utterances?

Can we easily build a labeled corpus of naturally occurring sarcastic, positive and negative utterances?

(Gonzalez, Muresan and Wacholder, 2011; Muresan et al., under review)

Slide13

Data collection

Slide14

How can we distinguish sarcastic, positive, and negative tweets?
Lexical Features
- Pennebaker et al. (2007) LIWC lexicon: 64 word categories grouped into four general classes:
  - Linguistic Processes (LP) (e.g., adverbs, pronouns)
  - Psychological Processes (PP) (e.g., positive and negative emotions)
  - Personal Concerns (PC) (e.g., work, achievement)
  - Spoken Categories (SC) (e.g., disfluencies)
- WordNet Affect (WNA) (Strapparava and Valitutti, 2004)
- List of interjections (e.g., ah, oh, yeah) and punctuation (e.g., !, ?)
We merged all of the lists into a single dictionary. The token overlap between the words in the combined dictionary and the words in the tweets was 85%.

Slide15

How can we distinguish sarcastic, positive, and negative tweets?
Pragmatic features
- Emoticons (, )
- ToUser (@john)

Slide16
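The two pragmatic features (emoticons and ToUser) can be sketched with simple token and pattern matching. The emoticon inventories below are assumptions for illustration only, since the slide's own emoticon examples did not survive extraction; the function name `pragmatic_features` is likewise hypothetical.

```python
import re

# Hypothetical emoticon inventories (not taken from the slides).
POSITIVE_EMOTICONS = {":)", ":-)", ":D", ";)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}

# An @ followed by word characters, as in "@john".
MENTION = re.compile(r"@\w+")

def pragmatic_features(tweet):
    """Return binary pragmatic features for one tweet."""
    tokens = tweet.split()
    return {
        "pos_emoticon": any(t in POSITIVE_EMOTICONS for t in tokens),
        "neg_emoticon": any(t in NEGATIVE_EMOTICONS for t in tokens),
        # ToUser: the tweet contains an @mention of another user.
        "to_user": bool(MENTION.search(tweet)),
    }

print(pragmatic_features("@john great job :)"))
```

Whitespace tokenization is the simplest choice here; emoticons glued to a word (for example "job:)") would be missed and would need a tweet-aware tokenizer.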

Classification experiments
Several settings
- S-N-P (900 examples each; balanced datasets)
- S-NS (NS contains 450 negative and 450 positive)
- S-N (900 examples each)
- S-P (900 examples each)
2 classifiers
- support vector machines (SVM)
- logistic regression (LogR)
Features used:
1) unigrams
2) presence of the dictionary-based lexical factors and pragmatic factors (LIWC+_P)
3) frequency of the dictionary-based lexical factors and pragmatic factors (LIWC+_F)
4) combination of unigrams and presence features

Slide17
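The difference between the presence (LIWC+_P) and frequency (LIWC+_F) feature sets can be sketched as follows. The mini-dictionary is a made-up stand-in for the merged LIWC/WordNet-Affect/interjection dictionary, and treating "frequency" as a raw count is an assumption: the slides do not say whether counts were normalized.

```python
from collections import Counter

# Hypothetical mini-dictionary mapping category names to word lists;
# the real merged dictionary is far larger.
DICTIONARY = {
    "posemo": {"happy", "love", "great"},
    "negemo": {"hate", "awful", "sad"},
    "interjection": {"oh", "ah", "yeah"},
}

def category_counts(tokens):
    """Count how many tokens fall into each dictionary category."""
    counts = Counter()
    for tok in tokens:
        for cat, words in DICTIONARY.items():
            if tok in words:
                counts[cat] += 1
    return counts

def presence_features(tokens):
    """Presence variant: 1 if a category occurs at all, else 0."""
    counts = category_counts(tokens)
    return {cat: int(counts[cat] > 0) for cat in DICTIONARY}

def frequency_features(tokens):
    """Frequency variant: raw count per category (assumed, not normalized)."""
    counts = category_counts(tokens)
    return {cat: counts[cat] for cat in DICTIONARY}

def unigram_features(tokens):
    """Unigram features: one count per distinct token."""
    return dict(Counter(tokens))

tokens = "oh i love love mondays".split()
print(presence_features(tokens))
print(frequency_features(tokens))
```

Feature dictionaries of this kind would then be vectorized and passed to a classifier such as an SVM or logistic regression; that step is omitted here.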

Results

Slide18

How hard is the task? Can humans do it?

Slide19

Human performance on the task
Two studies
1) We asked 3 judges to classify 10% of our S-N-P dataset (90 randomly selected tweets per category). We also trained our SVM and LogR classifiers using the remaining 90% of the data.
2) We asked another 3 judges to classify 10% of the S-NS dataset (90 per category; the NS category contained 45 positive and 45 negative tweets). We also trained SVM and LogR on the remaining 90% of the data.

Slide20

Humans on S-N-P
An overall agreement of 50% was achieved among the three judges, with a Fleiss’ Kappa value of 0.4788 (p<.05). The average accuracy was 62.59%.

When we considered only the 135 of 270 tweets on which all three judges agreed, the accuracy on that set was 86.67% (this can be an upper bound).

Slide21

Humans on S-NS
Results showed an agreement of 71.67% among the three judges, with a Fleiss’ Kappa value of 0.5861 (p<.05). The average accuracy rate was 66.85%.

When we considered only cases where all three judges agreed (129 out of 180), the accuracy on that set was 82.95% (this can be an upper bound).

Slide22
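For reference, the two agreement measures reported for the judges (the share of tweets on which all judges agree, and Fleiss' Kappa) can be computed as in the sketch below. This is the standard Fleiss' Kappa formula written with the standard library only; the function names and the toy ratings are assumptions, and the judges' actual labels are not available here.

```python
from collections import Counter

def full_agreement_rate(ratings):
    """Fraction of items on which all raters chose the same label."""
    return sum(len(set(item)) == 1 for item in ratings) / len(ratings)

def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of rater labels.

    Every item must be rated by the same number of raters. The result is
    undefined (division by zero) if all ratings use a single label.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})

    # counts[i][c]: number of raters who assigned category c to item i.
    counts = [Counter(item) for item in ratings]

    # Per-item agreement P_i, averaged into P_bar.
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Expected agreement P_e from the overall category proportions.
    total = n_items * n_raters
    p_e = sum((sum(c[cat] for c in counts) / total) ** 2 for cat in categories)

    return (p_bar - p_e) / (1 - p_e)

# Toy ratings from three hypothetical judges (S = sarcastic, NS = not).
toy = [["S", "S", "S"], ["NS", "NS", "NS"], ["S", "S", "NS"], ["NS", "NS", "S"]]
print(full_agreement_rate(toy), fleiss_kappa(toy))
```

The full-agreement rate matches the way the slides report agreement: 135 of 270 tweets is 50%, and 129 of 180 is about 71.67%.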

Humans vs Automatic Classification

S-N-P

S-NS

Slide23

Discussion
Hard task both for Automatic Measures and Humans
Some judges reported specific difficulties:
- Lack of context (e.g., world knowledge; context of conversation)
- Brevity of messages
Other issues/observations – We will have a whole class on Sarcasm Detection

Slide24

Detecting Conflicting Information
Explicit Contrary Meaning

User1: A shooting has just occurred at the Occupy Oakland encampment.
User2: Shootings happen in Oakland all the time and it had nothing to do with the Occupy movement.
User1: This shooting does have something to do with the Occupy movement because many of the witness's are the Occupiers and it happened only a few yards away from the encampment.
User3: On Twitter, Occupy Oakland has said the shooting was "related to the occupation. Please keep this man in your thoughts."

Slide25

Impact
Conflicting statements/beliefs can signal:
- anomalies in events (e.g., different theories about the cause of an event)
- anomalies in beliefs (change in beliefs)
- deception/lying
- misinformation
- misconception

Slide26

Recognizing Textual Entailment (RTE)
Given two text fragments – the Text (T) and the Hypothesis (H) – predict whether a human reader would say:
- That the H is true, given T
- That the H contradicts T
- That it can’t be determined whether or not H is true given T

T: John Smith, who was 65, resigned yesterday.
H: 65-year-old Mr. Smith left office.

T: UberSoft CEO Bill Jobs
H: Frank N. Furter is CEO of Ubersoft

Slide27

Approach
Framed as a 2-way Textual Entailment problem (contradictory, non-contradictory). Assume utterances are about the same topic/event.

1. Linguistic analysis
2. Graph alignment
3. Contradiction features & classification

T: A case of indigenously acquired rabies infection has been confirmed.
H: No case of rabies was confirmed.

[Figure: dependency graphs for T and H (edge labels det, amod, prep_of) and their alignment, with the values 0.10, 0.00, –0.75; token annotations for "rabies" (POS: NNS, NER: --, IDF: 0.027); a feature table with columns Feature, f_i, w_i and the row "Polarity difference" (-, -2.00); a score compared against a tuned threshold to decide "contradicts" vs. "doesn't contradict" (values shown: –2.00, 1.84).]

Event coreference

Slide28
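The final step of the contradiction approach (features combined into a score that is compared with a tuned threshold) can be sketched as a weighted sum. Only the "Polarity difference" weight of -2.00 comes from the slide; the feature value, the threshold of -1.0, the function names, and the choice that scores below the threshold count as contradictions are all assumptions made for this example.

```python
def contradiction_score(features, weights):
    """Weighted sum of feature values: score = sum_i f_i * w_i."""
    return sum(value * weights.get(name, 0.0) for name, value in features.items())

def classify(score, threshold):
    """Map a score to one of the two labels.

    Assumption: scores below the tuned threshold are labeled contradictions.
    The slide does not state the direction of the comparison.
    """
    return "contradicts" if score < threshold else "doesn't contradict"

# Weight for "polarity_difference" is from the slide; the rest is hypothetical.
weights = {"polarity_difference": -2.00}
features = {"polarity_difference": 1.0}  # T is affirmative, H is negated
threshold = -1.0                          # hypothetical tuned threshold

score = contradiction_score(features, weights)
print(score, classify(score, threshold))
```

In a trained system the weights would be learned from labeled T/H pairs and the threshold tuned on held-out data, rather than set by hand as here.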

Student Introduction
- Your education: PhD/Master/Undergrad and year
- Did you take an NLP course? Did you take an ML course?
- Are you doing or have you done research in NLP? If yes, briefly say in what area
- Any other info you want to share with the class?

Slide29

Outline
- Instructor Introduction
  - Background, Research Interests
- Student Introductions
- Class Overview
  - Class organization
  - Office Hours & TA
  - Website/details of topics covered in this class
  - Grading

Slide30

Class organization (except first two lectures)
- 50 min discussion of research articles led by students on topic of the week (intro on topic done previous week)
  - There will be 2 papers per class for discussion
  - 25 min each (15 min presentation, 10 min discussion)
- 5 minute break
- 30 minutes in-depth lecture/open questions for topic of the week
- 25 min intro lecture to topic of following week (to facilitate paper discussion)

Slide31

Office Hours
Instructor office hours: Thursday 6:00-7:00 (after class) or by appointment if needed
TA: Arpit Gupta
TA office hours: TBA

Slide32

Class Website
http://www1.cs.columbia.edu/~smara/teaching/E6998/S14/
Pay attention to top of page for announcements.

Slide33

Extracting social/interactional meaning
Sentiment (Positive or negative)
- Movie or Products or Politics: is a text positive or negative? “The movie was great”
- How can we automatically detect sentiment? (word level and text level)
Emotion (sad, happy) and Mood (depressed)
- Detecting expression of emotion/mood in language
- Applications:
  - Annoyance in talking to dialog systems
  - Uncertainty of students in tutoring
  - Detecting Trauma or Depression
Hedging & Beliefs
- Committed Belief (CB): W/S firmly believes p “John will arrive at 6”
- Non-committed Belief (NCB): W/S weakly believes p “John may arrive at 6”
- Reported Belief (RB): W/S is reporting someone else’s belief “John said he would arrive at 6”
- How can we automatically detect/tag beliefs?

Slide34

Extracting social/interactional meaning
Sarcasm
- Contrary of people’s actual sentiments or beliefs
- “I love shopping on Black Friday”
- “A Shooting in Oakland? That NEVER happend”
Agreement/Disagreement
- Agreement vs. disagreement with propositions (and people)
Perspective
- An aggregate of a person’s beliefs and sentiments w.r.t. topic/event/proposition
- How can we detect perspective automatically?
Deception
- Automatic ways to identify deceptive language

Slide35

Extracting social/interactional meaning
Power
- Different types of power: e.g. hierarchical, influence
- Applications:
  - Find influential people in online communities or those who want to become influential
  - Target ads to influential people in community
Extracting Social Networks from text
- Analyze online discussion and identify who the people are and how they are related (beyond metadata)
- Social network of characters from novels
Personality and Interpersonal Stance
- Romantic interest, flirtation, friendliness

Slide36

Grading
Critical Discussion of one of the research articles (40% of grade)
- Brief Presentation in class about the paper
- Lead a critical discussion on key positive and negative aspects
- Full list of papers up by Tuesday, Jan 28 11:59pm. Students select their top 5 papers before class on Jan 30. TA/Instructor assigns papers based on preference and, in case of conflict, first-come-first-served, by Feb 1 5pm.
Project about a topic discussed in class (or related) (60%)
- Computational Implementation
- Can be individual or team of 2-3
- Project Proposal (5th week of classes; receive feedback by week 6)
- Literature review on the chosen topic (9th week of classes; receive feedback by week 10)
- Final paper – conference/workshop format (8 pages) (last week of classes)
- Final project presentation (last week of classes)

Slide37

Next Class
Computational Models for learning semantic lexicons
2 papers for reading/discussion
- I will lead the discussion of one of the research articles to set up a model of what’s expected
- Second paper will be free discussion (unless there is a volunteer to present one of the papers)

Slide38

Resources
ACL anthology
- All the proceedings of main conferences in NLP as well as major journals.
- http://aclweb.org/anthology/
- (In recent years authors are encouraged to submit datasets and code)
Linguistic Data Consortium
- Annotated corpora
- http://catalog.ldc.upenn.edu/
- (If you are interested in access to some corpora for your project, ask the Instructor; most likely we have it)