Presenter: Liu, Ya Tian


Uploaded On 2018-11-09


Presentation Transcript

Slide1

Presenters: Liu, Ya; Tian, Yujia; Pham, Anh

TwitterMonitor: Trend Detection over the Twitter Stream

EvenTweet: Online Localized Event Detection from Twitter

Slide2

Michael Mathioudakis, Nick Koudas

TwitterMonitor: Trend Detection over the Twitter Stream

Slide3

INTRODUCTION

TwitterMonitor is a system that performs trend detection over the Twitter stream.

It identifies emerging topics on Twitter in real time and provides analytics that synthesize an accurate description of each topic.

Michael Mathioudakis, Nick Koudas: TwitterMonitor: trend detection over the Twitter stream. In: SIGMOD Conference, pp. 1155-1158, 2010

Slide4

TREND DETECTION AND ANALYSIS

Trends are detected in two steps and analyzed in a third step:

1) Identify ‘bursty’ keywords.

2) Group bursty keywords into trends.

3) Extract additional information about each trend to discover its interesting aspects.

Slide5

Detecting Bursty Keywords

Bursty keyword: a keyword that appears in the stream at an unusually high rate.

A burst suggests that a new topic has emerged and is worth exploring further.

Algorithm: QueueBurst

1) One-pass.

2) Real-time.

3) Adjustable against ‘spurious’ bursts.

4) Adjustable against spam.

5) Theoretically sound.
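The slide lists QueueBurst's properties but not its algorithm. The following Python sketch only illustrates what a one-pass, real-time burst check with simple guards might look like; it is a simplified stand-in, not QueueBurst. The class name, the running-average comparison, and the parameters `ratio`, `min_count`, and `alpha` are all assumptions.

```python
from collections import defaultdict


class BurstDetector:
    """Simplified one-pass burst detector (a stand-in, NOT QueueBurst).

    Keyword counts are accumulated per time window. When a window closes,
    each keyword's count is compared against an exponentially weighted
    average of its past window counts.
    """

    def __init__(self, ratio=3.0, min_count=5, alpha=0.2):
        self.ratio = ratio          # hypothetical: how many times above average counts as a burst
        self.min_count = min_count  # hypothetical: ignore tiny counts ("spurious" bursts)
        self.alpha = alpha          # hypothetical: smoothing factor of the running average
        self.avg = defaultdict(float)   # keyword -> running average count per window
        self.window = defaultdict(set)  # keyword -> users seen in the current window

    def add(self, user, words):
        # Each tweet is processed once (one pass). Counting distinct users
        # instead of tweets is a crude guard against one account spamming a word.
        for w in set(words):
            self.window[w].add(user)

    def close_window(self):
        """End the current window and return the keywords that burst in it."""
        bursty = []
        for w in set(self.avg) | set(self.window):
            count = len(self.window.get(w, ()))
            if count >= self.min_count and count > self.ratio * self.avg[w]:
                bursty.append(w)
            # Update the running average only after the comparison.
            self.avg[w] = (1 - self.alpha) * self.avg[w] + self.alpha * count
        self.window.clear()
        return sorted(bursty)
```

In the first window every keyword used by at least `min_count` users is flagged, since no history exists yet, so a warm-up period would be needed in practice.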

Slide6

From Bursty Keywords to Trends

Bursty keywords are grouped together. At every moment t, the system computes the set of bursty keywords and then divides it into subsets, each of which is a ‘trend’.

Algorithm: GroupBurst, based on co-occurrences.

It retrieves a few minutes’ history of tweets and groups keywords together if they co-occurred in a relatively large number of recent tweets.

It uses a greedy strategy that produces groups in a small number of steps.
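The grouping step can be sketched as a greedy merge over pairs of bursty keywords that co-occur often in recent tweets. This is an illustrative approximation, not the published GroupBurst algorithm; `group_keywords` and the `min_cooc` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def group_keywords(bursty, recent_tweets, min_cooc=3):
    """Greedily group bursty keywords that co-occur in many recent tweets.

    bursty: set of bursty keywords at the current moment t.
    recent_tweets: list of word sets from the last few minutes.
    min_cooc: hypothetical minimum number of shared tweets for a merge.
    Returns a list of keyword groups (sets), i.e. candidate trends.
    """
    # Count how often each pair of bursty keywords appears in the same tweet.
    pair_counts = Counter()
    for words in recent_tweets:
        present = sorted(set(words) & bursty)
        pair_counts.update(combinations(present, 2))

    # Greedy merging: visit the strongest pairs first and merge their groups.
    group_of = {k: {k} for k in bursty}
    for (a, b), n in pair_counts.most_common():
        if n < min_cooc:
            break  # pairs are sorted by count, so the rest are weaker
        if group_of[a] is not group_of[b]:
            merged = group_of[a] | group_of[b]
            for k in merged:
                group_of[k] = merged

    # Several keywords point to the same set object; keep each group once.
    unique = {id(g): g for g in group_of.values()}
    return sorted(unique.values(), key=lambda g: (-len(g), sorted(g)))
```

Because merging is transitive, two keywords can end up in the same trend through a shared neighbor even if they rarely co-occur directly.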

 

Slide7

Trend Analysis

Compose a more accurate description of each trend:

Identify more keywords associated with it. Context extraction algorithms (PCA, SVD, etc.) search the recent history and report the most correlated keywords.

Use Grapevine’s entity extractor to identify the entities.

Add frequently cited sources to the trend description.

Identify frequent geographical origins.

Produce a chart for each trend and keep it updated.
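A rough sketch of assembling a trend description from the tweets that mention the trend's keywords. Plain co-occurrence counting is swapped in here for the PCA/SVD context extraction named on the slide, and the tweet fields `words` and `location` are assumptions.

```python
from collections import Counter


def describe_trend(trend, tweets, top_n=3):
    """Build a simple description for a trend (a set of keywords).

    tweets: list of dicts with hypothetical fields "words" (set of str)
    and "location" (str or None).
    Co-occurrence counting stands in for PCA/SVD context extraction;
    it is NOT what TwitterMonitor uses.
    """
    related = Counter()
    origins = Counter()
    for tw in tweets:
        if not trend & tw["words"]:
            continue  # the tweet does not mention this trend
        related.update(tw["words"] - trend)
        if tw.get("location"):
            origins[tw["location"]] += 1
    return {
        "keywords": sorted(trend),
        "related": [w for w, _ in related.most_common(top_n)],
        "origins": [loc for loc, _ in origins.most_common(top_n)],
    }
```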

Slide8

Architecture

[Figure: TwitterMonitor architecture (Mathioudakis and Koudas, SIGMOD Conference 2010, p. 1156)]

Slide9

Architecture: Back-End

The StreamListener module receives, via the Twitter API, a sample of the stream consisting of 10M out of the 50M tweets posted per day. It then separates the tweet information into fields and exports two feeds:

Tweets with all their fields are reported to the Index module.

Only the text and timestamp of each tweet are reported to the Bursty Keywords Detection module.
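The two-feed split performed by StreamListener might look roughly like this. The field names and the in-memory `Feeds` sinks are hypothetical placeholders for the real Index and Bursty Keywords Detection modules.

```python
from dataclasses import dataclass, field


@dataclass
class Feeds:
    """Two hypothetical in-memory sinks standing in for the real modules."""
    index: list = field(default_factory=list)   # full records for the Index module
    bursty: list = field(default_factory=list)  # (text, timestamp) for burst detection


def on_tweet(raw, feeds):
    """Split one raw tweet into fields and route it to both feeds.

    `raw` is assumed to be a dict; the keys used here ("id", "user",
    "text", "created_at", "place") are illustrative only.
    """
    record = {
        "id": raw["id"],
        "user": raw.get("user"),
        "text": raw["text"],
        "timestamp": raw["created_at"],
        "place": raw.get("place"),
    }
    feeds.index.append(record)                                  # all fields
    feeds.bursty.append((record["text"], record["timestamp"]))  # text + timestamp only
```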

Slide10

Architecture: Back-End (Cont.)

After bursty keywords are identified and grouped into trends, the Trend Analysis module contacts the Index to retrieve information on the tweets that belong to each trend.
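The Trend Analysis module's lookup can be pictured as a query against an inverted index from keywords to tweets. `TweetIndex` below is a hypothetical minimal version; whether a tweet must match any or all keywords of a trend is a design choice, and this sketch uses "any".

```python
from collections import defaultdict


class TweetIndex:
    """Minimal inverted index: keyword -> ids of tweets containing it.

    A hypothetical stand-in for TwitterMonitor's Index module.
    """

    def __init__(self):
        self.tweets = {}                  # tweet id -> full record
        self.postings = defaultdict(set)  # keyword -> set of tweet ids

    def add(self, tweet_id, record):
        self.tweets[tweet_id] = record
        for word in set(record["text"].lower().split()):
            self.postings[word].add(tweet_id)

    def tweets_for_trend(self, trend):
        """Return the records of tweets containing any keyword of the trend."""
        ids = set()
        for kw in trend:
            ids |= self.postings.get(kw, set())
        return [self.tweets[i] for i in sorted(ids)]
```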

Slide11

Architecture: Front-End

[Figure: TwitterMonitor front-end (Mathioudakis and Koudas, SIGMOD Conference 2010, p. 1157)]

Slide12

Architecture: Front-End (Cont.)

A webpage reports recent trends in real time.

An interface allows users to rank trends by recency or by current activity rate, and to submit their own short descriptions of trends.

An additional tab displays daily trends.
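Ranking trends by recency or by current activity rate amounts to sorting on different keys. The `Trend` fields, in particular the activity window used to compute a rate, are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trend:
    keywords: tuple
    first_seen: float      # timestamp when the trend was detected
    recent_count: int      # tweets observed in the latest activity window
    window_seconds: float  # length of that window (hypothetical field)

    @property
    def activity_rate(self):
        # Tweets per second over the latest window.
        return self.recent_count / self.window_seconds


def rank_trends(trends, by="recency"):
    """Rank trends by recency (newest first) or current activity rate (highest first)."""
    if by == "recency":
        return sorted(trends, key=lambda t: t.first_seen, reverse=True)
    if by == "activity":
        return sorted(trends, key=lambda t: t.activity_rate, reverse=True)
    raise ValueError(f"unknown ranking criterion: {by!r}")
```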

Slide13

Demonstration

Every trend will be represented by its entities and its related bursty keywords.

The audience will have the option to use the interface to acquire more information:

They will be shown additional keywords and can skim through representative tweets.

They will be able to track a trend’s popularity over time and spot its origin.

They will interact with the system by ranking the displayed trends according to different criteria and by submitting descriptions.

Slide14

Hamed Abdelhaq, Christian Sengstock, and Michael Gertz

EvenTweet: Online Localized Event Detection from Twitter

Slide15

1. Introduction

2. Localized Event Detection

Temporal Keyword Extraction

Spatial Keyword Identification

Keyword Clustering

Cluster Scoring

3. System Overview

4. Demonstration

Slide16

INTRODUCTION

EvenTweet is a system that detects localized events from a stream of tweets in real time.

Only about 1% of tweets are georeferenced.

It continuously analyzes the most recent tweets within a time-based sliding window.

Each event is described by 1) related keywords and 2) an estimate of its start time and geographic location.

Hamed Abdelhaq, Christian Sengstock, Michael Gertz: EvenTweet: Online Localized Event Detection from Twitter. PVLDB 6(12): pp. 1326-1329 (2013)

Slide17

INTRODUCTION

Tracks the evolution of events over time at a fine-grained temporal resolution, using a scoring scheme that gives each event a score over time.

Does not estimate geo-coordinates for non-geotagged tweets, but is still able to identify localized events using a possibly small number of geotagged tweets:

Both geotagged and non-geotagged tweets are used to identify the words that best describe events.

Only geotagged tweets are used to estimate the spatial distribution of such words.

Slide18

1. Introduction

2. Localized Event Detection

Temporal Keyword Extraction

Spatial Keyword Identification

Keyword Clustering

Cluster Scoring

3. System Overview

4. Demonstration

Slide19

Localized Event Detection

Basic Definitions

Event: a phenomenon that stimulates people to post messages for a certain period of time.

Localized event: an event that happens within a small region, i.e., one with a small spatial extent (e.g., concerts, soccer matches, road works).

Slide20

Localized Event

A localized event is described as a tuple $le = (e_l, e_t, K)$, where:

$e_l$ is the event location, represented as a small set of connected rectangles.

$e_t$ is the start time.

$K$ is a set of words frequently published during the event time and at that location.
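The localized-event tuple (location, start time, word set K) maps naturally onto a small data structure. In this sketch, representing rectangles as longitude/latitude bounding boxes and the `covers` helper are assumptions, not part of EvenTweet's definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in (lon, lat) degrees (assumed representation)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon, lat):
        return self.min_lon <= lon <= self.max_lon and self.min_lat <= lat <= self.max_lat


@dataclass
class LocalizedEvent:
    """Localized event: location (connected rectangles), start time, words K."""
    location: list     # small set of connected Rect objects
    start_time: float  # event start time
    keywords: set      # K: words frequently published at that time and place

    def covers(self, lon, lat):
        # Hypothetical helper: a point lies in the event location if any rectangle contains it.
        return any(r.contains(lon, lat) for r in self.location)
```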

Slide21

Online Detection

Basic Notation:

Each tweet is a tuple $tw = (W, uid, l, t)$, where

$W$: a set of words

$uid$: a user id

$l = (lon, lat)$: a geographic location

$t$: a timestamp

We use a timeline divided into a sequence of equal-length time frames $(\ldots, f_{c-1}, f_c)$, where $f_c$ denotes the current time frame. Each time frame represents a short time interval during which tweets are posted.
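The tweet tuple and the equal-length time frames can be sketched as follows. The frame length, the frame origin, and making the location optional (most tweets are not geotagged) are assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Tweet:
    """tw = (W, uid, l, t)."""
    words: FrozenSet[str]               # W: a set of words
    uid: str                            # uid: a user id
    loc: Optional[Tuple[float, float]]  # l = (lon, lat); None if not geotagged (assumption)
    t: float                            # t: timestamp in seconds


def frame_index(t, frame_len, origin=0.0):
    """Map a timestamp to the index of its equal-length time frame.

    Assumption: frame i covers [origin + i*frame_len, origin + (i+1)*frame_len).
    """
    return int((t - origin) // frame_len)
```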

Slide22

Basic Notation (cont.)

We use a time-based sliding window $win^{k}_{f_c}$ composed of $k$ time frames, with $f_c$ as its end point.

The detection procedure of EvenTweet is triggered every time a new time frame elapses.
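The sliding window and its per-frame trigger might be sketched as below, assuming tweets arrive in timestamp order. `SlidingWindow` and the `on_frame` callback are hypothetical names.

```python
from collections import deque


class SlidingWindow:
    """Time-based sliding window over the last k time frames.

    When a tweet falls into a new frame, the finished frame is pushed into
    the window and the detection callback `on_frame` is triggered.
    Simplification: frames in which no tweets arrive are not pushed.
    """

    def __init__(self, k, frame_len, on_frame):
        self.frames = deque(maxlen=k)  # the oldest frame drops out automatically
        self.frame_len = frame_len
        self.on_frame = on_frame
        self.current = None            # index of the frame being filled
        self.buffer = []               # tweets of the current frame

    def add(self, t, tweet):
        idx = int(t // self.frame_len)
        if self.current is not None and idx != self.current:
            # The current frame has elapsed: slide the window and run detection.
            self.frames.append(self.buffer)
            self.on_frame(list(self.frames))
            self.buffer = []
        self.current = idx
        self.buffer.append(tweet)
```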

Slide23

1. Introduction

2. Localized Event Detection

Temporal Keyword Extraction

Spatial Keyword Identification

Keyword Clustering

Cluster Scoring

3. System Overview

4. Demonstration

Slide24

Temporal Keyword Extraction

Extract the words showing a bursty frequency in the current time frame (these words are called keywords, $Y_c$).

Given the set of words $W_c$ from the tweets published during the most recent time frame $f_c$, extract a subset $Y_c \subseteq W_c$ of words that are likely to describe localized events.

Slide25

Temporal Keyword Extraction (cont.)

Use the discrepancy paradigm to extract keywords based on their burstiness.

Assume that, during time frame $f_c$, the usage value $u(w, c)$ of word $w$ is normalized by the number of users publishing tweets containing $w$.
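A minimal sketch of computing usage values for one time frame. Counting distinct users rather than tweets is one reading of the slide; the optional normalization by the number of active users is an assumption, and EvenTweet's exact formula may differ.

```python
from collections import defaultdict


def usage_values(frame_tweets, normalize=False):
    """Compute a usage value u(w, c) for every word in one time frame f_c.

    frame_tweets: iterable of (uid, words) pairs posted during f_c.
    Assumption: u(w, c) counts distinct users who posted w, so a user
    repeating a word many times contributes only once. With
    normalize=True the count is further divided by the number of
    distinct active users in the frame (a hypothetical option).
    """
    users_per_word = defaultdict(set)
    active_users = set()
    for uid, words in frame_tweets:
        active_users.add(uid)
        for w in words:
            users_per_word[w].add(uid)
    if not normalize:
        return {w: len(users) for w, users in users_per_word.items()}
    n = len(active_users) or 1  # guard against an empty frame
    return {w: len(users) / n for w, users in users_per_word.items()}
```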

Slide26

Temporal Keyword Extraction (cont.)

In addition, $hist_w = (u(w,1), u(w,2), \ldots, u(w,m))$ is a fixed historical sequence of usage values for $w$, collected before the current time frame $f_c$, such that $m < c$.

It is used to describe the normal behavior of word $w$ over previous time frames.

Slide27

Temporal Keyword Extraction (cont.)

The discrepancy paradigm measures the deviation between the word usage value $u(w,c)$ in the current time frame and an expected word usage baseline $b(w)$, which is estimated from $hist_w$.

$hist_w$ is modeled as drawn from a Gaussian distribution with mean $b(w).\mu$ and standard deviation $b(w).\sigma$.

The higher the deviation, the higher the burstiness degree.

Slide28

Temporal Keyword Extraction (cont.)

The burstiness degree of a word $w$ is the z-score:

$$b\_degree(w, c) := \frac{u(w, c) - b(w).\mu}{b(w).\sigma}$$

Words whose burstiness degree exceeds 2, i.e., whose usage lies more than two standard deviations above the baseline mean, are chosen as keywords.

Keywords observed for the first time have $\mu = 0$ and $\sigma = 0$.
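Combining the baseline from $hist_w$ with the z-score test gives a short sketch of temporal keyword extraction. The slide leaves the $\sigma = 0$ case (e.g., a word seen for the first time) open; treating any usage above the mean as bursty is an assumption, as is the use of the population standard deviation.

```python
import statistics


def baseline(hist):
    """Estimate b(w) = (mu, sigma) from hist_w under a Gaussian model.

    An empty history (word observed for the first time) gives mu = sigma = 0.
    """
    if not hist:
        return 0.0, 0.0
    return float(statistics.mean(hist)), statistics.pstdev(hist)


def b_degree(u, mu, sigma):
    """z-score of the current usage u(w, c) against the baseline."""
    if sigma == 0.0:
        # Assumption: with no spread, any usage above the mean is maximally bursty.
        return float("inf") if u > mu else 0.0
    return (u - mu) / sigma


def extract_keywords(current, history, threshold=2.0):
    """Return Y_c: words whose burstiness degree exceeds `threshold`.

    current: dict word -> u(w, c) for the current frame f_c.
    history: dict word -> list of past usage values hist_w.
    """
    keywords = set()
    for w, u in current.items():
        mu, sigma = baseline(history.get(w, []))
        if b_degree(u, mu, sigma) > threshold:
            keywords.add(w)
    return keywords
```

For example, a word with history $(1, 2, 1, 2)$ has $\mu = 1.5$ and $\sigma = 0.5$, so a current usage of 10 gives a burstiness degree of 17.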

Slide29

1. Introduction

2. Localized Event Detection

Temporal Keyword Extraction

Spatial Keyword Identification

Keyword Clustering

Cluster Scoring

3. System Overview

4. Demonstration

Slide30

Spatial Keyword Identification

Find keywords that are highly localized, using only georeferenced tweets.

During the sliding window $win^{k}_{f_c}$:

Compute the usage ratio of $k_i$.

Calculate the density of keyword $k_i$ in cell $g$.

Repeat this for all cells in the grid $G$. This yields $S_i$, the discrete spatial distribution of $k_i$, also called the spatial signature of $k_i$.
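The slide does not spell out the formulas for the usage ratio and the density, so this sketch assumes simple definitions: the usage ratio is the share of users in a cell who used the keyword, and the density normalizes these ratios over all cells so that $S_i$ sums to 1. The square grid, `cell_of`, and `cell_size` are also assumptions.

```python
import math
from collections import defaultdict


def cell_of(lon, lat, cell_size):
    """Map a coordinate to a cell g of a uniform square grid G (assumed layout)."""
    return (math.floor(lon / cell_size), math.floor(lat / cell_size))


def spatial_signature(geo_tweets, keyword, cell_size):
    """Discrete spatial distribution S_i of `keyword` over grid cells.

    geo_tweets: (uid, words, lon, lat) tuples of geotagged tweets in the window.
    Assumed definitions (not given on the slide):
      usage ratio ur(k, g) = users in g who used k / all users in g
      density(k, g)        = ur(k, g) / sum of ur(k, g') over all cells g'
    """
    users_in_cell = defaultdict(set)
    users_with_kw = defaultdict(set)
    for uid, words, lon, lat in geo_tweets:
        g = cell_of(lon, lat, cell_size)
        users_in_cell[g].add(uid)
        if keyword in words:
            users_with_kw[g].add(uid)

    ratio = {g: len(users_with_kw[g]) / len(users_in_cell[g]) for g in users_with_kw}
    if not ratio:
        return {}  # keyword not used in any geotagged tweet
    total = sum(ratio.values())
    return {g: r / total for g, r in ratio.items()}
```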

Slide31

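Spatial Keyword Identification

Only georeferenced tweets are used.

Calculate the entropy $H(S_i)$ of each keyword's spatial signature.

Discard all keywords with entropy larger than a threshold $\rho$. Why? A high entropy means that the keyword's usage is spread across many cells, so the keyword is unlikely to describe a localized event.

We then have $Y_c$, the set of filtered keywords.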
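Entropy filtering over spatial signatures can be sketched with the standard Shannon entropy. The base-2 logarithm is an assumption (the base only rescales $\rho$), and the value of $\rho$ is not given on the slide.

```python
import math


def entropy(signature):
    """Shannon entropy H(S_i) = -sum_g S_i(g) * log2 S_i(g) of a spatial signature."""
    return -sum(p * math.log2(p) for p in signature.values() if p > 0)


def filter_localized(signatures, rho):
    """Keep the keywords whose spatial signature has entropy <= rho.

    signatures: dict keyword -> {cell: probability}.
    rho: entropy threshold (value chosen by the user).
    """
    return {k for k, s in signatures.items() if entropy(s) <= rho}
```

A keyword concentrated in one cell has entropy 0, while one spread evenly over four cells has entropy 2 bits.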

Slide32

Keyword Clustering

Each $S_i$ is a vector.

Event keywords are clustered using their spatial signatures $S_i$.

Similarity calculation: cosine similarity.
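With spatial signatures stored as sparse maps from grid cells to values, cosine similarity needs only the shared cells for the dot product. The sparse-dict representation and the zero result for empty signatures are assumptions made for this sketch.

```python
import math


def cosine_similarity(s1, s2):
    """Cosine similarity of two spatial signatures stored as sparse dicts {cell: value}.

    Cells missing from a dict are treated as 0.
    """
    dot = sum(v * s2.get(g, 0.0) for g, v in s1.items())
    n1 = math.sqrt(sum(v * v for v in s1.values()))
    n2 = math.sqrt(sum(v * v for v in s2.values()))
    if n1 == 0 or n2 == 0:
        return 0.0  # convention chosen here for empty signatures
    return dot / (n1 * n2)
```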


Cosine Similarity, Wikipedia, http://en.wikipedia.org/wiki/Cosine_similarity

Slide33

Keyword Clustering

There is a distance threshold $T$. If the distance between a new keyword and every existing cluster exceeds the threshold, the keyword forms a new cluster by itself.
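Threshold-based online clustering can be sketched as below: each keyword joins the nearest cluster when its cosine distance to that cluster's centroid is at most $T$, and otherwise starts a new cluster. This is a generic leader-style clustering sketch rather than EvenTweet's exact procedure; the class name and the running-mean centroid update are assumptions.

```python
import math


def _cos(s1, s2):
    """Cosine similarity of sparse dicts {cell: value}; 0.0 if either is empty."""
    dot = sum(v * s2.get(g, 0.0) for g, v in s1.items())
    n1 = math.sqrt(sum(v * v for v in s1.values()))
    n2 = math.sqrt(sum(v * v for v in s2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


class OnlineClusterer:
    """Incremental threshold clustering of keywords by spatial signature."""

    def __init__(self, T):
        self.T = T          # maximum cosine distance for joining a cluster
        self.clusters = []  # list of (members, centroid) pairs

    def add(self, keyword, signature):
        """Assign a keyword to a cluster and return that cluster's index."""
        best, best_dist = None, None
        for i, (_, centroid) in enumerate(self.clusters):
            d = 1.0 - _cos(signature, centroid)
            if best_dist is None or d < best_dist:
                best, best_dist = i, d
        if best is None or best_dist > self.T:
            # Too far from every cluster: the keyword forms a new cluster by itself.
            self.clusters.append(([keyword], dict(signature)))
            return len(self.clusters) - 1
        members, centroid = self.clusters[best]
        n = len(members)
        # Update the centroid as the running mean of member signatures.
        for g in set(centroid) | set(signature):
            centroid[g] = (centroid.get(g, 0.0) * n + signature.get(g, 0.0)) / (n + 1)
        members.append(keyword)
        return best
```

Unlike K-means, the number of clusters is not fixed in advance; it grows whenever a keyword falls outside the threshold.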


Saed Sayad, K-means clustering, http://www.saedsayad.com/clustering_kmeans.htm

Slide34

Cluster Scoring

Clusters are scored to determine which keyword clusters are more likely to refer to localized events and to filter out spurious clusters.

To score a cluster:

1. Score each keyword.

2. Sum up all the scores.

Slide35

Cluster Scoring

1. Score each keyword:

$$k\_score(i, cl) := o_i \cdot b\_degree(k_i, e_i) \cdot prominence(k_i, cl)$$

$e_i$: the time frame we’re looking at

$o_i$: the number of times $k_i$ was clustered in $cl$

$prominence(k_i, cl)$: the prominence of keyword $k_i$ in cluster $cl$

2. Sum up all scores:

$$score(cl) := \sum_{k_i \in cl} k\_score(i, cl)$$
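The scoring formulas translate directly into code once each keyword's $o_i$, burstiness degree, and prominence are known. In this sketch the prominence values are passed in precomputed, and the `min_score` cutoff used to discard spurious clusters is hypothetical.

```python
def k_score(o_i, b_degree_i, prominence_i):
    """k_score(i, cl) := o_i * b_degree(k_i, e_i) * prominence(k_i, cl)."""
    return o_i * b_degree_i * prominence_i


def cluster_score(keyword_stats):
    """score(cl): sum of k_score over the keywords in cluster cl.

    keyword_stats: iterable of (o_i, b_degree_i, prominence_i) tuples, one per
    keyword; prominence values are precomputed inputs in this sketch.
    """
    return sum(k_score(o, b, p) for o, b, p in keyword_stats)


def top_clusters(clusters, min_score):
    """Keep clusters scoring at least `min_score` (hypothetical cutoff), best first.

    clusters: dict cluster name -> list of keyword stats tuples.
    """
    scored = [(cluster_score(stats), name) for name, stats in clusters.items()]
    return [name for s, name in sorted(scored, reverse=True) if s >= min_score]
```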

Slide36

1. Introduction

2. Localized Event Detection

Temporal Keyword Extraction

Spatial Keyword Identification

Keyword Clustering

Cluster Scoring

3. System Overview

4. Demonstration

Slide37

System Overview

Slide38

Demonstration
