Meme-tracking and the Dynamics of the News Cycle

Uploaded On 2020-06-23

Presentation Transcript

Slide1

Meme-tracking and the Dynamics of the News Cycle

Jure Leskovec, Lars Backstrom, Jon Kleinberg

Presented by Noam Barkay

Slide2

What Is A Meme?

An idea, behavior, or style that spreads from person to person within a culture.

A meme is like a gene: it replicates and mutates, so it has many variations, and it is subject to selection.

Slide3

Main Goals Of The Research

Tracking memes and news on the internet over time, across mainstream media and blogs.

Modeling the phenomenon of the news cycle.

Slide4

Dataset

Data was collected during the last three months of the 2008 US presidential election.

1.65 million mainstream media sites and blogs.

90 million blog posts and news articles (roughly 1 million per day).

112 million quotes.

Slide5

Terminology

Item – a news article or a blog post.

Phrase – a quote that occurs in one or more items.

Phrase cluster – a collection of phrases deemed to be close textual variants of one another.

Phrase Graph

Slide6

The Phrase Graph

Keep only phrases with a word length of at least L = 4.

Keep only phrases with at least M = 10 occurrences.

Eliminate phrases for which at least a fixed fraction of their occurrences are on a single domain.
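
A minimal Python sketch of these three filters, assuming each occurrence arrives as a (phrase, domain) pair. The function name `filter_phrases` and the default single-domain threshold `max_domain_fraction=0.25` are hypothetical placeholders, not values taken from the slides.

```python
from collections import Counter, defaultdict


def filter_phrases(occurrences, min_words=4, min_count=10, max_domain_fraction=0.25):
    """Return the set of phrases that pass the length, frequency and domain filters.

    occurrences: iterable of (phrase, domain) pairs, one pair per occurrence.
    max_domain_fraction: hypothetical threshold; a phrase is dropped when at
    least this fraction of its occurrences come from a single domain.
    """
    counts = Counter()
    per_domain = defaultdict(Counter)
    for phrase, domain in occurrences:
        counts[phrase] += 1
        per_domain[phrase][domain] += 1

    kept = set()
    for phrase, n in counts.items():
        if len(phrase.split()) < min_words:
            continue  # fewer than L words
        if n < min_count:
            continue  # fewer than M occurrences
        top_domain_count = per_domain[phrase].most_common(1)[0][1]
        if top_domain_count / n >= max_domain_fraction:
            continue  # dominated by a single domain
        kept.add(phrase)
    return kept
```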

Edge (p → q) – p is strictly shorter than q, and either the directed edit distance from p to q is small or p and q share a run of at least k = 10 consecutive words.

Edge weight – decreases with the directed edit distance and increases with the frequency of q.
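
A rough sketch of the edge rule under stated assumptions: word-level Levenshtein distance stands in for the directed edit distance, `max_edit_distance` is a hypothetical threshold for "small", and the weight `freq[q] / (1 + dist)` is only one illustrative function that decreases with distance and increases with the frequency of q. The slides do not give the exact form.

```python
def word_edit_distance(a, b):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        cur = [i]
        for j, wb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete wa
                           cur[j - 1] + 1,              # insert wb
                           prev[j - 1] + (wa != wb)))   # substitute (or match)
        prev = cur
    return prev[-1]


def longest_common_run(a, b):
    """Length of the longest run of consecutive words shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for wa in a:
        cur = [0] * (len(b) + 1)
        for j, wb in enumerate(b, start=1):
            if wa == wb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def build_phrase_graph(freq, max_edit_distance=2, k=10):
    """Return {(p, q): weight} for edges p -> q, with p strictly shorter than q.

    freq: phrase -> frequency in the corpus.
    """
    words = {p: p.split() for p in freq}
    edges = {}
    for p in freq:
        for q in freq:
            wp, wq = words[p], words[q]
            if len(wp) >= len(wq):
                continue  # edges only go from shorter to strictly longer phrases
            dist = word_edit_distance(wp, wq)
            if dist <= max_edit_distance or longest_common_run(wp, wq) >= k:
                edges[(p, q)] = freq[q] / (1 + dist)  # illustrative weight
    return edges
```

The pairwise loop is quadratic in the number of phrases, so this form only suits small phrase sets.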

Slide7

The Phrase Graph

A small portion of the graph of variants of Sarah Palin’s quote:

“Our opponent is someone who sees America, it seems, as being so imperfect, imperfect enough that he’s palling around with terrorists who would target their own country.”

Slide8

The Phrase Graph

Slide9

Partitioning The Phrase Graph

Recognizing a good phrase cluster, given G: the outgoing paths from all phrases in the cluster should flow into a single root node.

A root in G is a node with no outgoing edges.
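
Because every path in a finite DAG ends at a node with no outgoing edges, "all paths flow into a single root" is equivalent to the cluster's induced subgraph having exactly one such node. A small Python check of that condition, assuming the graph is a dict mapping every node to its list of successors (the helper names are illustrative):

```python
def roots(graph):
    """Nodes of G with no outgoing edges (graph: node -> list of successors)."""
    return {v for v, succ in graph.items() if not succ}


def cluster_roots(graph, cluster):
    """Roots of the subgraph induced by `cluster` (no out-edge to another member)."""
    return {v for v in cluster
            if not any(w in cluster for w in graph.get(v, ()))}


def is_single_rooted(graph, cluster):
    # In a finite DAG every path ends at a node without outgoing edges, so all
    # paths flow into one root exactly when the induced subgraph has one root.
    return len(cluster_roots(graph, cluster)) == 1
```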

Slide10

Identify Phrase Clusters

Delete edges of small total weight from G, so that the graph falls apart into disjoint pieces, each with a single root phrase.

DAG Partitioning: given a directed acyclic graph with edge weights, delete a set of edges of minimum total weight so that each of the resulting components is single-rooted.

DAG partitioning is NP-hard.
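
Since the exact problem is NP-hard, it helps to be able to score a candidate solution. The sketch below (illustrative names, edges given as a `{(u, v): weight}` dict) deletes a proposed edge set, finds the weakly connected components of what remains, and returns the deleted weight only if every component is single-rooted.

```python
from collections import defaultdict


def evaluate_partition(edges, nodes, deleted):
    """Score a candidate deletion for DAG partitioning.

    edges: {(u, v): weight} for a DAG; nodes: all nodes; deleted: set of (u, v).
    Returns (cost, components); cost is None if some component has != 1 root.
    """
    out_deg = defaultdict(int)
    undirected = defaultdict(set)
    for u, v in edges:
        if (u, v) in deleted:
            continue
        out_deg[u] += 1
        undirected[u].add(v)
        undirected[v].add(u)

    # Weakly connected components via iterative depth-first search.
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in undirected[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(comp)

    # Each component must contain exactly one node with no outgoing edges.
    valid = all(sum(1 for v in comp if out_deg[v] == 0) == 1 for comp in components)
    cost = sum(edges[e] for e in deleted)
    return (cost if valid else None), components
```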

Slide11

Slide12

DAG Partitioning

It is enough to choose a single outgoing edge for each node to identify the optimal components.

Proceed from the roots down the DAG, and greedily assign each node to the cluster to which it has the most edges.
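
One way to code the greedy heuristic in Python, as a sketch rather than the authors' implementation. It assumes edges point from a phrase toward its longer variants, so roots are nodes with no successors, and it counts edges per cluster as the slide states (summing edge weights instead would be a natural variant). Ties are broken by the smallest root label, which is an arbitrary choice.

```python
from collections import Counter
from graphlib import TopologicalSorter


def greedy_dag_partition(successors):
    """successors: node -> list of out-neighbours (edges point toward roots).

    Returns node -> label of the root whose cluster the node is assigned to.
    """
    # TopologicalSorter reads the mapping as node -> predecessors, so passing
    # successors yields each node only after all of its out-neighbours: we
    # start at the roots and move down the DAG.
    order = TopologicalSorter(successors).static_order()
    cluster = {}
    for v in order:
        succ = successors.get(v, [])
        if not succ:
            cluster[v] = v  # a root starts its own cluster
            continue
        votes = Counter(cluster[w] for w in succ)  # edges into each cluster
        # Most edges wins; ties go to the smallest root label.
        cluster[v] = min(votes, key=lambda c: (-votes[c], c))
    return cluster
```

Keeping only the edges from each node into its assigned cluster gives the "single edge out of each node" structure described above.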

Slide13

Slide14

Phrase Volume Distribution

Phrase volume – the number of items containing the phrase.

Phrase volumes follow a power-law distribution.
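
A sketch of how phrase volume and a power-law exponent could be computed. The exponent uses the standard continuous maximum-likelihood estimator, alpha = 1 + n / Σ ln(x_i / x_min), which is only an approximation for integer counts. The choice of `x_min` and the input format (one set of phrases per item) are assumptions, and this is not necessarily how the distribution on this slide was fitted.

```python
import math
from collections import Counter


def phrase_volumes(items):
    """items: iterable of phrase collections, one per item.

    Volume of a phrase = number of items containing it (each item counted once).
    """
    vol = Counter()
    for phrases in items:
        vol.update(set(phrases))
    return vol


def powerlaw_alpha(values, x_min):
    """Continuous MLE of a power-law exponent over all values >= x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one value strictly above x_min")
    return 1 + len(tail) / log_sum
```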

Slide15

Global Analysis

Thread of a phrase cluster – the set of all items containing some phrase from the cluster.
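
A minimal sketch of the thread definition, assuming items are stored as `item_id -> (timestamp_in_hours, set_of_phrases)`. The hourly binning in `thread_volume` is a hypothetical choice for turning a thread into a volume-over-time curve.

```python
from collections import Counter


def thread_items(items, cluster):
    """Ids of all items that contain at least one phrase from `cluster` (a set)."""
    return {iid for iid, (_, phrases) in items.items() if phrases & cluster}


def thread_volume(items, cluster, bin_hours=1.0):
    """Number of thread items per time bin, as {bin_index: count}."""
    counts = Counter()
    for iid in thread_items(items, cluster):
        t, _ = items[iid]
        counts[int(t // bin_hours)] += 1
    return dict(sorted(counts.items()))
```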

Slide16

Creating The News Cycle Model

Two minimal ingredients that should be taken into account:

Imitation – different sources imitate one another.

Recency – new threads are favored over older ones.

Slide17

The Proposed Model

Time runs in discrete periods.

There is a collection of N media sources. Each source reports on a single thread in each time period.

At time t = 0, each source is reporting on a distinct thread.

In each time step t, a new thread j is produced, and each source must choose which thread to report on.

Slide18

The Proposed Model

A given source chooses to report on thread j with probability proportional to the product

$f(n_j)\,\delta(t - t_j)$

where:

$n_j$ – number of stories previously written about thread j.

$t$ – current time.

$t_j$ – the time when j was first created.

$\delta$ – a monotonically decreasing function of $t - t_j$ (recency).

$f$ – a monotonically increasing function of $n_j$ (imitation).
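
A small numeric illustration of the selection rule. Writing n_j for the number of stories previously written about thread j, t_j for the time j was created, f for the increasing imitation function and δ for the decreasing recency function, a source picks thread j with probability f(n_j)·δ(t − t_j) divided by the sum of the same product over all threads. The forms f(n) = n + 1 and δ(x) = 0.5^x below are arbitrary placeholders.

```python
def choice_probabilities(counts, created, t, f, delta):
    """Normalized f(n_j) * delta(t - t_j) for every thread j."""
    scores = [f(n) * delta(t - tj) for n, tj in zip(counts, created)]
    total = sum(scores)
    return [s / total for s in scores]


# Placeholder forms: f increasing in n_j, delta decreasing in the thread's age.
probs = choice_probabilities(
    counts=[4, 1, 0], created=[0, 1, 2], t=2,
    f=lambda n: n + 1,
    delta=lambda age: 0.5 ** age,
)
print([round(p, 3) for p in probs])  # [0.385, 0.308, 0.308]
```

The unnormalized scores here are 1.25, 1.0 and 1.0, so the oldest but most-imitated thread still gets the largest share, 1.25 / 3.25.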

Slide19

The Proposed Model

Results of a simulation of the model, with f taking a power-law functional form and the recency function taking an exponentially decaying form.
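
A compact simulation in the spirit of this slide, offered as a sketch under explicit assumptions: f(n) = (n + 1)^a (the +1 lets a brand-new thread with no stories be chosen), recency factor δ(x) = e^(−b·x), and all N sources choose simultaneously using the counts from the start of each step. The parameter values are arbitrary, not those behind the slide's plot.

```python
import math
import random


def simulate_news_cycle(n_sources=50, steps=200, a=1.0, b=0.5, seed=0):
    """Simulate the imitation/recency model.

    Returns history, where history[s][j] is the number of stories written
    about thread j during step s + 1 (threads are indexed by creation order).
    """
    rng = random.Random(seed)
    counts = [1] * n_sources        # t = 0: each source reports on a distinct thread
    created = [0] * n_sources
    history = []
    for t in range(1, steps + 1):
        counts.append(0)            # a new thread is produced at time t
        created.append(t)
        weights = [(n + 1) ** a * math.exp(-b * (t - tj))
                   for n, tj in zip(counts, created)]
        picks = rng.choices(range(len(counts)), weights=weights, k=n_sources)
        step = [0] * len(counts)
        for j in picks:
            step[j] += 1
        for j, c in enumerate(step):
            counts[j] += c          # update only after all sources have chosen
        history.append(step)
    return history
```

Setting `a=0` makes f constant (preference only to recency), while `b=0` makes the recency factor constant (preference only to imitation), which are the two degenerate cases contrasted on the next slide.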

Slide20

Preference only to recency

Preference only to imitation

Slide21

Thread Volume Increase And Decay

Given a thread p, its peak time is the median of the times at which the thread occurs in the dataset. Each thread is shifted in time so that its peak time is 0, and its volume is normalized so that the volume at this time is 1.

For each time t, we plot the median volume over the 1,000 threads with the largest volumes.
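
A sketch of this alignment in Python. The one-hour bins, the ±24-bin window and the decision to skip threads whose peak bin is empty are assumptions; selecting the 1,000 largest threads is left to the caller.

```python
import math
import statistics


def median_volume_curve(threads, window=24, bin_hours=1.0):
    """threads: list of lists of timestamps in hours, one list per thread.

    Each thread is shifted so its peak time (median timestamp) is 0, binned,
    and normalized so the volume in the peak bin is 1. Returns
    {offset_bin: median normalized volume across threads}.
    """
    curves = []
    for times in threads:
        peak = statistics.median(times)
        counts = {}
        for t in times:
            b = math.floor((t - peak) / bin_hours)
            counts[b] = counts.get(b, 0) + 1
        if counts.get(0, 0) == 0:
            continue  # assumption: skip threads whose peak bin is empty
        curves.append({b: c / counts[0] for b, c in counts.items()})
    return {b: statistics.median(c.get(b, 0.0) for c in curves)
            for b in range(-window, window + 1)}
```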

Slide22

News Media And Blogs

We wish to understand the relationship between news media and blogs.

Each of the 1.65 million sites is labeled as either “news media” or “blogs”.

Slide23

Time Lag For Blogs And News Media

We create two separate volume curves for each thread, one for news media and one for blogs.

Each aggregate curve is the median of the corresponding curves of the top 1,000 threads.
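
A sketch of the lag computation, assuming each thread's news and blog curves are dicts from an integer time bin (for example, hours relative to the thread's peak) to normalized volume. The lag is the distance between the peaks of the two median curves; the helper names are hypothetical.

```python
import statistics


def median_curve(curves):
    """Pointwise median of several {bin: volume} curves (missing bins count as 0)."""
    bins = sorted(set().union(*curves))
    return {b: statistics.median(c.get(b, 0.0) for c in curves) for b in bins}


def peak_lag(news_curves, blog_curves):
    """Peak bin of the median blog curve minus peak bin of the median news curve."""
    news = median_curve(news_curves)
    blogs = median_curve(blog_curves)
    return max(blogs, key=blogs.get) - max(news, key=news.get)
```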

Slide24

Handoff Of Phrases From News Media To Blogs

Now we calculate the ratio of blog volume to total volume for each thread.
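
A minimal sketch of the ratio, assuming the per-bin news and blog volumes of one thread are given as dicts; bins with no volume at all are skipped rather than assigned a ratio.

```python
def blog_fraction(news_volume, blog_volume):
    """Per time bin, the fraction of a thread's total volume that comes from blogs."""
    bins = sorted(set(news_volume) | set(blog_volume))
    out = {}
    for b in bins:
        n, g = news_volume.get(b, 0), blog_volume.get(b, 0)
        if n + g:
            out[b] = g / (n + g)
    return out
```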

Slide25

Lags Of Individual Sites

Slide26

Quotes Migrating From Blogs To News Media

Usually, phrases first appear in news media and then in blogs, but some phrases move the opposite way.

About 3.5% of quoted phrases tend to move from blogs to news media.
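
The slide does not spell out how a phrase is classified as migrating, so the following is only a simplified, hypothetical criterion for illustration: among phrases seen in both media types, count those whose first blog mention precedes their first news-media mention. It is not the authors' definition and would not be expected to reproduce the 3.5% figure.

```python
def blog_first_fraction(first_seen):
    """first_seen: phrase -> (first blog time or None, first news time or None).

    Hypothetical simplified criterion: among phrases that appear in both media
    types, the fraction whose first blog mention precedes the first news mention.
    """
    both = [(b, n) for b, n in first_seen.values()
            if b is not None and n is not None]
    if not both:
        return 0.0
    return sum(1 for b, n in both if b < n) / len(both)
```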

Slide27

Conclusion

We have developed a framework for tracking short, distinctive phrases (memes) in online text, and presented algorithms for identifying and clustering those phrases.

We observed a typical lag of 2.5 hours between the peaks of a phrase in the news media and in blogs, with a “heartbeat”-like handoff between news and blogs.

Slide28

Conclusion

How can we characterize the dynamics of mutation within phrases?

How does information change as it propagates?

A deeper understanding of simple mathematical models for the dynamics of the news cycle would be useful for media analysts.

Slide29

Thank You!
