Slide1
Meme-tracking and the Dynamics of the News Cycle
Jure Leskovec, Lars Backstrom, Jon Kleinberg
Presented by Noam Barkay
Slide2
What Is A Meme?
An idea, behavior, or style that spreads from person to person within a culture.
A meme is like a gene: it replicates, mutates into many variations, and is subject to selection.
Slide3
Main Goals Of The Research
Tracking memes and news on the internet over time, across mainstream media and blogs.
Modeling the phenomenon of the news cycle.
Slide4
Dataset
Data was collected during the last three months of the 2008 US presidential election.
1.65 million mainstream media sites and blogs.
90 million blog posts and news articles (1 million per day).
112 million quotes.
Slide5
Terminology
Item – a news article or a blog post.
Phrase – a quote that occurs in one or more items.
Phrase Cluster – a collection of phrases deemed to be close textual variants of one another.
Phrase Graph
Slide6
The Phrase Graph
Phrases with a word length of at least L = 4.
At least M = 10 occurrences.
Eliminate phrases for which at least an ε fraction of occurrences are on a single domain.
Edge (p, q) – p is strictly shorter than q, and either the directed edit distance from p to q is small or the two phrases share a consecutive overlap of at least k = 10 words.
Edge weight decreases with the directed edit distance and increases with the frequency of q.
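The edge rule above can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes that "directed edit distance" means the fewest word-level edits needed to turn p into some contiguous run of words in q, and the threshold max_distance=1 is a hypothetical value (the slide only says "small"). The function names are also illustrative.

```python
def directed_edit_distance(p_words, q_words):
    """Fewest word edits turning p into some contiguous span of q.

    Assumption: this approximate-substring definition stands in for the
    slide's "directed edit distance", which is not spelled out there.
    """
    # Row 0 is all zeros so that a match may start anywhere in q.
    prev = [0] * (len(q_words) + 1)
    for i, pw in enumerate(p_words, start=1):
        cur = [i]
        for j, qw in enumerate(q_words, start=1):
            cost = 0 if pw == qw else 1
            cur.append(min(prev[j] + 1,          # drop a word of p
                           cur[j - 1] + 1,       # absorb an extra word of q
                           prev[j - 1] + cost))  # match or substitute
        prev = cur
    # The match may end anywhere in q.
    return min(prev)


def longest_common_run(p_words, q_words):
    """Length of the longest run of consecutive words shared by p and q."""
    best = 0
    prev = [0] * (len(q_words) + 1)
    for pw in p_words:
        cur = [0]
        for j, qw in enumerate(q_words, start=1):
            run = prev[j - 1] + 1 if pw == qw else 0
            cur.append(run)
            best = max(best, run)
        prev = cur
    return best


def has_edge(p, q, max_distance=1, k=10):
    """True if the phrase graph would contain the edge (p, q)."""
    p_words, q_words = p.lower().split(), q.lower().split()
    if len(p_words) >= len(q_words):
        return False  # p must be strictly shorter than q
    return (directed_edit_distance(p_words, q_words) <= max_distance
            or longest_common_run(p_words, q_words) >= k)
```

For example, has_edge("palling around with terrorists", "he's palling around with terrorists who would target their own country") holds because the shorter phrase is contained word for word in the longer one, while the reverse direction is rejected by the length test.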
Slide7
The Phrase Graph
A small portion of the graph of variants of Sarah Palin’s quote:
“Our opponent is someone who sees America, it seems, as being so imperfect, imperfect enough that he’s palling around with terrorists who would target their own country.”
Slide8
The Phrase Graph
Slide9
Partitioning The Phrase Graph
Goal: recognize a good phrase cluster, given G.
The outgoing paths from all phrases in the cluster should flow into a single root node.
A root in G is a node with no outgoing edges.
Slide10
Identify Phrase Clusters
Delete edges of small total weight from G, so that it falls apart into disjoint pieces, each with a single root phrase.
DAG Partitioning: given a directed acyclic graph with edge weights, delete a set of edges of minimum total weight so that each of the resulting components is single-rooted.
DAG partitioning is NP-hard.
Slide11
Slide12
DAG Partitioning
Knowing a single retained outgoing edge for each node is enough to identify the components of the optimal solution.
Heuristic: proceed from the roots down the DAG, and greedily assign each node to the cluster to which it has the most edges.
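A minimal Python sketch of the greedy heuristic follows. It is an illustration under stated assumptions, not the authors' code: the graph is a hypothetical dict mapping each node to its successors and edge weights, clusters are named after their root, and ties in edge count are broken by total edge weight (the slide does not say how ties are resolved).

```python
from collections import defaultdict


def greedy_dag_partition(out_edges):
    """Assign every node of a DAG to the cluster of one root.

    out_edges: {node: {successor: weight}} (hypothetical input format).
    Returns {node: root of the cluster it was assigned to}.
    """
    nodes = set(out_edges)
    for successors in out_edges.values():
        nodes.update(successors)

    cluster = {}

    def assign(node):
        # Memoized recursion: successors are assigned before the node
        # itself, which is equivalent to working from the roots down.
        if node in cluster:
            return cluster[node]
        successors = out_edges.get(node, {})
        if not successors:
            cluster[node] = node  # a root starts its own cluster
            return node
        tally = defaultdict(lambda: [0, 0.0])  # root -> [edge count, weight]
        for succ, weight in successors.items():
            root = assign(succ)
            tally[root][0] += 1
            tally[root][1] += weight
        # Most edges wins; total weight is an assumed tie-breaker.
        best = max(tally, key=lambda r: (tally[r][0], tally[r][1]))
        cluster[node] = best
        return best

    for node in nodes:
        assign(node)
    return cluster
```

Because the recursion depth grows with the longest path in the graph, a very deep DAG would call for an explicit topological ordering instead.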
Slide13
[Figure: graph with nodes numbered 1 to 15]
Slide14
Phrase Volume Distribution
Phrase volume – the number of items containing the phrase.
Phrase volume follows a power-law distribution.
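As a rough illustration, phrase volume can be counted directly from the items, and a power-law exponent can be estimated from the tail of the resulting values. The sketch below assumes each item is given as a collection of phrases. The estimator is the standard continuous maximum-likelihood formula, alpha = 1 + n / sum(ln(x_i / x_min)), which is swapped in here for illustration; the slides do not say how the distribution was fitted, and x_min is a cutoff the caller has to choose.

```python
import math
from collections import Counter


def phrase_volumes(items):
    """Count, for each phrase, the number of items that contain it.

    items: iterable of collections of phrases (hypothetical input format).
    """
    volumes = Counter()
    for phrases in items:
        volumes.update(set(phrases))  # count each phrase once per item
    return volumes


def powerlaw_exponent(values, x_min):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Only values >= x_min are used. Illustrative choice of estimator,
    not the procedure used in the presented work.
    """
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    if log_sum == 0:
        raise ValueError("need at least one value strictly above x_min")
    return 1.0 + len(tail) / log_sum
```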
Slide15
Global Analysis
Thread of a phrase cluster – the set of all items containing some phrase from the cluster.
Slide16
Creating The News Cycle Model
Two minimal ingredients that should be taken into account:
Imitation – different sources imitate one another.
Recency – new threads are favored over older ones.
Slide17
The Proposed Model
Time runs in discrete periods.
There is a collection of N media sources. Each source reports on a single thread in one time period.
At time t = 0, each source is reporting on a distinct thread.
In each time step t, a new thread j is produced, and each source must choose which thread to report on.
Slide18
The Proposed Model
A given source chooses to report on thread j with probability proportional to the product f(n_j) · δ(t − t_j), where:
n_j – the number of stories previously written about thread j.
t – the current time.
t_j – the time when j was first created.
δ – a monotonically decreasing function of t − t_j.
f – a monotonically increasing function of n_j.
Slide19
The Proposed Model
Results of a simulation of the model, with the function f taking a power-law functional form and the recency function taking an exponentially decaying form.
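A simulation of this kind can be sketched in a few lines of Python. The exact functional forms and parameter values are assumptions made for illustration: the imitation term is taken as (1 + n)^gamma, so that a brand-new thread with no stories still has positive weight, and the recency term as exp(-decay * age). The values of gamma, decay, the number of sources, and the number of steps are all hypothetical.

```python
import math
import random


def simulate_news_cycle(num_sources=50, num_steps=100,
                        gamma=1.5, decay=0.3, seed=0):
    """Simulate sources choosing threads by imitation and recency.

    Returns (counts, born, history): stories per thread, creation time
    per thread, and the per-step volume of every thread.
    """
    rng = random.Random(seed)
    # At t = 0 each source reports on its own distinct thread.
    counts = [1] * num_sources   # stories written so far on each thread
    born = [0] * num_sources     # creation time of each thread
    history = []
    for t in range(1, num_steps + 1):
        # A new thread is produced in every time step.
        counts.append(0)
        born.append(t)
        # Weight = imitation term * recency term (assumed forms).
        weights = [(1 + n) ** gamma * math.exp(-decay * (t - t_born))
                   for n, t_born in zip(counts, born)]
        picks = rng.choices(range(len(counts)), weights=weights,
                            k=num_sources)
        step_volume = [0] * len(counts)
        for j in picks:
            step_volume[j] += 1
        # All sources choose from the counts as they stood before this step.
        for j, volume in enumerate(step_volume):
            counts[j] += volume
        history.append(step_volume)
    return counts, born, history
```

Setting gamma to 0 removes the imitation effect and setting decay to 0 removes the recency effect, which gives a simple way to explore what each ingredient contributes.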
Slide20
Preference only for recency
Preference only for imitation
Slide21
Thread Volume Increase And Decay
Given a thread p, its peak time t_p is the median of the times at which the thread occurs in the dataset.
Each thread is aligned so that t_p = 0 and normalized so that its volume at this time is 1.
For each time t, we plot the median volume over the 1,000 threads with the largest volumes.
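The alignment and normalization can be sketched as follows. This is a minimal illustration with assumed details: each thread is a list of numeric timestamps (for example, hours), volume is counted in bins of width bin_width, and statistics.median_low is used for the peak so that the bin at offset 0 always contains at least one item. The function names are hypothetical.

```python
from collections import Counter
from statistics import median, median_low


def aligned_volume(times, bin_width=1.0):
    """Volume curve of one thread, shifted so its peak time is 0
    and scaled so the volume at the peak is 1.

    times: timestamps of the items in the thread (assumed numeric).
    Returns {bin offset from peak: normalized volume}.
    """
    # median_low always returns an observed time, so bin 0 is non-empty.
    peak = median_low(times)
    bins = Counter(round((t - peak) / bin_width) for t in times)
    at_peak = bins[0]
    return {offset: count / at_peak for offset, count in bins.items()}


def median_curve(threads, offsets, bin_width=1.0):
    """Median normalized volume across threads at each requested offset."""
    curves = [aligned_volume(times, bin_width) for times in threads]
    return {k: median(curve.get(k, 0.0) for curve in curves)
            for k in offsets}
```

In practice, threads would first be ranked by total volume and only the largest ones passed to median_curve.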
Slide22
News Media And Blogs
We wish to understand the relationship between news media and blogs.
Each of the 1.65 million sites is labeled as “news media” or “blogs”.
Slide23
Time Lag For Blogs And News Media
We create two separate volume curves for each thread: one for news media and one for blogs.
Each plotted curve is the median of the top 1,000 threads’ curves.
Slide24
Handoff Of Phrases From News Media To Blogs
Now we calculate the ratio of blog volume to total volume for each thread.
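For a single thread, the ratio can be computed per time bin as sketched below. The input format (two dicts mapping a time offset to a count) and the function name are assumptions for illustration, as is the choice to skip bins in which the thread has no volume at all.

```python
def blog_fraction(blog_volume, news_volume):
    """Fraction of a thread's volume that comes from blogs, per time bin.

    blog_volume, news_volume: {time offset: item count} (assumed format).
    Bins with no items at all are skipped to avoid dividing by zero.
    """
    fractions = {}
    for offset in sorted(set(blog_volume) | set(news_volume)):
        blogs = blog_volume.get(offset, 0)
        news = news_volume.get(offset, 0)
        if blogs + news > 0:
            fractions[offset] = blogs / (blogs + news)
    return fractions
```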
Slide25
Lags Of Individual Sites
Slide26
Quotes Migrating From Blogs To News Media
Phrases usually appear first in news media and then in blogs, but some phrases move in the opposite direction.
About 3.5% of quoted phrases tend to move from blogs to news media.
Slide27
Conclusion
We have developed a framework for tracking short, distinctive phrases (memes) in online text, and presented algorithms for identifying and clustering those phrases.
We observed a typical lag of 2.5 hours between the peaks of a phrase in the news media and in blogs, with a “heartbeat”-like handoff between news and blogs.
Slide28
Conclusion
How can we characterize the dynamics of mutation within phrases?
How does information change as it propagates?
A deeper understanding of simple mathematical models for the dynamics of the news cycle would be useful for media analysts.
Slide29
Thank You!