Breaking News: Exploring Israeli News

Uploaded On 2018-12-05

Presentation Transcript

Slide1

Breaking News

Exploring Israeli News Bias using Simple Textual Analysis

Yuval Pinter, Shuki Tausig, Oren Persico

Slide2

Motivation / Hypotheses

Media is biased.
Israeli media is super-biased.
Machine learning can detect bias.
Headlines could be enough: "Headlines are journalism in its purest form" (Simon Jenkins, 1992).

Which is more significant: class bias or agenda bias?
The idea: classify the news outlet using basic features.
Most of the "agenda bias" part will have to wait.
No prior work as far as we know; the closest field is authorship attribution.

Slide3

Data

General news sites only
Homepage headlines only
Scraped at 15-minute intervals
July 2014 to May 2015
Most experiments use February 2015 data
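Scraping homepages at fixed intervals means pulling the headline text out of each page's HTML. The sketch below uses only Python's standard `html.parser`. It assumes, hypothetically, that headlines sit inside `<h1>`, `<h2>` or `<h3>` tags; real news sites would each need their own selectors. The class and function names are illustrative and do not come from the authors' code.

```python
from html.parser import HTMLParser
from typing import List


class HeadlineParser(HTMLParser):
    """Collect the text inside heading tags (hypothetical headline markup)."""

    HEADING_TAGS = {"h1", "h2", "h3"}

    def __init__(self) -> None:
        super().__init__()
        self._depth = 0              # > 0 while inside a heading tag
        self._buf: List[str] = []    # text chunks of the current heading
        self.headlines: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Join chunks and collapse runs of whitespace
                text = " ".join("".join(self._buf).split())
                if text:
                    self.headlines.append(text)
                self._buf = []

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def extract_headlines(html: str) -> List[str]:
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.headlines


page = '<div><h2><a href="#">Quake in <b>Nepal</b></a></h2><p>lead</p><h3>Second</h3></div>'
print(extract_headlines(page))  # ['Quake in Nepal', 'Second']
```

In a collection run, each homepage could be fetched (for example with `urllib.request.urlopen`) every 15 minutes and the returned headlines stored with the site name and a timestamp.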

Data and extraction code are available:
github.com/yuvalpinter/MediaAnalysis

Slide4

Data Samples

[Sample headline snapshots: Nov 23, 15:00; Feb 15, 15:30]

Slide5

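Text Processing

Consecutive-appearance de-duping
Tokenization (including lemmatization and affix deletion) using hspell (Har'el and Kenigsberg)
Mostly good, sometimes not so much:

הפרלמנט הירדני עמד דקת דומייה לזכר המחבלים => פרלמנט ירדן עימד דקה דומייה זכר מחבל
("The Jordanian parliament stood for a minute of silence in memory of the terrorists"; NRG, 20/11/2014, 0:15)

רעידת אדמה קטלנית כאלף נהרגו בנפאל: "שעות קריטיות" => רעידה דימה קטלוניה אילף נהרג נפאל שעה קריטי
("Deadly earthquake, about a thousand killed in Nepal: 'critical hours'"; Mako, 25/4/2015, 19:30. Here אדמה "earth" became דימה "imagined", קטלנית "deadly" became קטלוניה "Catalonia", and כאלף "about a thousand" became אילף "tamed".)

Slide6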
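The consecutive-appearance de-duping step from the Text Processing slide can be sketched as follows. Assumptions not stated on the slides: each site's scrapes arrive in time order as lists of headline strings, and a headline is kept only if it was absent from that site's immediately preceding scrape. `dedup_consecutive` is a hypothetical name.

```python
from typing import Iterable, List, Set


def dedup_consecutive(snapshots: Iterable[List[str]]) -> List[str]:
    """Drop headlines that were already present in the previous scrape.

    `snapshots` is one site's homepage headlines in scrape order
    (hypothetical format: one list of strings per 15-minute scrape).
    """
    kept: List[str] = []
    previous: Set[str] = set()
    for snapshot in snapshots:
        # dict.fromkeys removes repeats within one scrape while keeping order
        for headline in dict.fromkeys(snapshot):
            if headline not in previous:
                kept.append(headline)
        previous = set(snapshot)
    return kept


print(dedup_consecutive([["a", "b"], ["a", "c"], ["b"]]))  # ['a', 'b', 'c', 'b']
```

Because only the previous scrape is consulted, a headline that drops off the homepage and later returns is counted again; this is the "alternating headlines" issue listed under Future Work.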

Features

Form: character length, word count, word length (average/min/median/max), punctuation token count
Lexicon: quantile word/lemma frequencies (average/min/median/max), from wordlists (Hermit Dave) and Israblog (Linzen 2009)
Morphology: affix letters
Word features (these probably capture the media cycle)
Features and extraction code are available: http://www.the7eye.org.il/50916

Slide7
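The form features from the Features slide, plus a crude stand-in for the affix-letter features, fit in a few lines. Assumptions: words are maximal runs of `\w` characters (in Python 3 this includes Hebrew letters), a punctuation token is any single character that is neither a word character nor whitespace, and the affix-letter feature just counts words whose first letter is one of the Hebrew one-letter prefixes מ, ש, ה, ו, כ, ל, ב. That first-letter heuristic is hypothetical; the slides rely on hspell's morphological analysis instead.

```python
import re
from collections import Counter
from statistics import mean, median
from typing import Dict

WORD_RE = re.compile(r"\w+")
PUNCT_RE = re.compile(r"[^\w\s]")
# Hebrew one-letter prefixes; a first-letter check is only a heuristic
PREFIX_LETTERS = set("משהוכלב")


def form_features(headline: str) -> Dict[str, float]:
    words = WORD_RE.findall(headline)
    lengths = [len(w) for w in words] or [0]  # keeps empty input from failing
    return {
        "char_length": len(headline),
        "word_count": len(words),
        "word_len_avg": mean(lengths),
        "word_len_min": min(lengths),
        "word_len_median": median(lengths),
        "word_len_max": max(lengths),
        "punct_count": len(PUNCT_RE.findall(headline)),
    }


def affix_letter_counts(headline: str) -> Counter:
    """Count words by first letter when that letter could be a prefix."""
    return Counter(w[0] for w in WORD_RE.findall(headline) if w[0] in PREFIX_LETTERS)


print(form_features('Quake in Nepal: "critical hours"'))
print(affix_letter_counts("הפרלמנט הירדני עמד דקת דומייה לזכר המחבלים"))
```

Note that with this tokenizer the gershayim in an abbreviation such as רה"מ counts as punctuation and splits the word in two.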

Setup & Results

7 classes, 1785 headlines (all of February)
Weka's Random Forest
Accuracy: 10 trees: 45.4%; 50 trees: 49.5%
Most significant features: number of words, average word length, average position in word frequency table

Slide8
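Average position in a word frequency table was among the most significant features on the Setup & Results slide. One way to compute such position statistics is sketched below. Assumptions, not the authors' exact definition: the frequency list (for example one built from Hermit Dave's wordlists or Israblog) is already loaded as words sorted from most to least frequent; each position is divided by the list size to give a 0-1 quantile-like score; and words missing from the list get a score of 1.0.

```python
from statistics import mean, median
from typing import Dict, List, Sequence


def rank_table(wordlist: Sequence[str]) -> Dict[str, int]:
    """Map each word to its 0-based position in a frequency-sorted list."""
    table: Dict[str, int] = {}
    for position, word in enumerate(wordlist):
        table.setdefault(word, position)  # keep the first (most frequent) position
    return table


def frequency_features(words: List[str], table: Dict[str, int]) -> Dict[str, float]:
    size = len(table)  # assumes a non-empty table
    # Normalized position in [0, 1]; unknown words are treated as rarest (1.0)
    scores = [table.get(w, size) / size for w in words] or [1.0]
    return {
        "freq_avg": mean(scores),
        "freq_min": min(scores),
        "freq_median": median(scores),
        "freq_max": max(scores),
    }


table = rank_table(["the", "in", "of", "nepal", "quake"])
print(frequency_features(["quake", "in", "nepal", "deadly"], table))
```

The same functions apply to lemmas by passing lemmatized tokens and a lemma frequency list.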

Feature Example

[Chart: Character length; axis: Character count]

Slide9
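The Feature Example slide plots headline character length. A minimal sketch of how per-site averages for such a plot could be computed, assuming (hypothetically) that headlines are available as `(site, headline)` pairs:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def mean_char_length_by_site(rows: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    lengths: Dict[str, List[int]] = defaultdict(list)
    for site, headline in rows:
        lengths[site].append(len(headline))  # characters, including spaces
    return {site: mean(values) for site, values in lengths.items()}


print(mean_char_length_by_site([("ynet", "abcd"), ("ynet", "ab"), ("NRG", "abc")]))
```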

Pairwise Setup

Binary classifier accuracy (%), one value per pair of sites (21 pairs for 7 sites):
72.3, 88, 92.1, 73.4, 78.5, 76.5, 84.5, 91.8, 75.8, 78.1, 77.9, 72.9, 86.7, 79, 79.4, 88.9, 74, 78.2, 69.4, 64.9, 58.6
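With 7 sites there are 7 × 6 / 2 = 21 unordered site pairs, matching the 21 binary accuracies. The pairwise setup can be sketched as restricting the data to two sites at a time and scoring a binary classifier on each subset. In this sketch the `(site, features)` row format is an assumption, and `train_and_score` is a hypothetical placeholder for whichever classifier is used (the slides used Weka's Random Forest).

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

SITES = ["Mako", "Walla", "NRG", "Ha'aretz", "Ma'ariv", "Israel Hayom", "ynet"]

Row = Tuple[str, List[float]]  # (site label, feature vector)


def pairwise_subsets(rows: List[Row]) -> Dict[Tuple[str, str], List[Row]]:
    """One binary dataset per unordered pair of sites."""
    return {
        (a, b): [row for row in rows if row[0] in (a, b)]
        for a, b in combinations(SITES, 2)
    }


def pairwise_accuracy(
    rows: List[Row], train_and_score: Callable[[List[Row]], float]
) -> Dict[Tuple[str, str], float]:
    # train_and_score is hypothetical: it should return held-out accuracy
    return {pair: train_and_score(subset) for pair, subset in pairwise_subsets(rows).items()}


rows = [("ynet", [52.0]), ("NRG", [61.0]), ("Walla", [48.0])]
subsets = pairwise_subsets(rows)
print(len(subsets))  # 21
print(subsets[("NRG", "ynet")])  # [('ynet', [52.0]), ('NRG', [61.0])]
```

Sorting the resulting accuracies from lowest to highest would list the most similar pairs first, following the reading that higher accuracy means easier to classify and therefore less similar.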

Class over agenda:
Mako, Walla, and NRG form a cluster ("online ethos")
Ha'aretz and Ma'ariv are relatively unique (newspaper-derived)
Israel Hayom resembles its tabloid competitor ynet most, more than the agenda-sharing NRG
(Higher accuracy = easier to classify = less similar)

Slide10

Results: changing the scenery

Operation Protective Edge: July 8 to Aug. 26 (only 4 sites): 2768 headlines, 53.4% accuracy (10 trees)
Control: Oct. 8 to Nov. 26: 1877 headlines, 54.2%
Single week: Jan. 1-7 (no Walla): 426 headlines, 45.3% (10 trees), 51.4% (100 trees)
Single day: December 2 (Tuesday): 89 headlines, 39.3% (10 trees), 46% (100 trees)
Train on 5 months (Sep-Jan), test on Feb (no Walla): 8113 train / 1514 test, 45.8% (10 trees), 49.9% (50 trees)
Train on 9 months (July-Mar), test on Apr:
  No Walla: 14628 train / 1685 test, 40.9% (10 trees), 45.6% (50 trees)
  All sites: 15285 train / 2001 test, 35.8% (10 trees), 39.8% (50 trees)

Slide11
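The "train on earlier months, test on a later month" experiments on the changing-the-scenery slide amount to a date-based split that can optionally drop a site, as in the "no Walla" runs. A sketch, assuming rows of `(date, site, headline)`; `temporal_split` and its parameters are hypothetical names, not the authors' code.

```python
from datetime import date
from typing import Iterable, List, Sequence, Tuple

Row = Tuple[date, str, str]  # (scrape date, site, headline)


def temporal_split(
    rows: Iterable[Row],
    train_range: Tuple[date, date],
    test_range: Tuple[date, date],
    exclude_sites: Sequence[str] = (),
) -> Tuple[List[Row], List[Row]]:
    """Split rows by date; both ranges are inclusive and should not overlap."""
    train: List[Row] = []
    test: List[Row] = []
    for row in rows:
        day, site, _ = row
        if site in exclude_sites:
            continue
        if train_range[0] <= day <= train_range[1]:
            train.append(row)
        elif test_range[0] <= day <= test_range[1]:
            test.append(row)
    return train, test


# Toy rows mirroring "train on Sep-Jan, test on Feb (no Walla)"
rows = [
    (date(2014, 9, 3), "ynet", "h1"),
    (date(2015, 2, 10), "Walla", "h2"),
    (date(2015, 2, 11), "NRG", "h3"),
    (date(2015, 3, 1), "Mako", "h4"),
]
train, test = temporal_split(
    rows,
    train_range=(date(2014, 9, 1), date(2015, 1, 31)),
    test_range=(date(2015, 2, 1), date(2015, 2, 28)),
    exclude_sites=("Walla",),
)
print(len(train), len(test))  # 1 1
```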

Future Work

Better content ("agenda") features: topic models? sentiment?
Some weird phenomena to be ironed out:
  Alternating headlines: de-dup based on the most recent k
  Very similar headlines: merge, or use edit distance
Location-sensitive features:
  Headlines starting with נתניהו ("Netanyahu"): roughly balanced
  Headlines starting with רה"מ ("the PM"): 50% in Israel Hayom, another 25% in NRG
More text: main leads / other headlines

Slide12
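The two de-duplication ideas under Future Work can be combined in one sketch: compare each headline against the headlines of the last `k` scrapes rather than only the previous one, and treat headlines within a small edit distance as the same. The function names, the use of plain Levenshtein distance, and the `max_distance` threshold are assumptions for illustration.

```python
from collections import deque
from typing import Deque, Iterable, List, Set


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def dedup_recent_k(snapshots: Iterable[List[str]], k: int, max_distance: int = 0) -> List[str]:
    """Keep a headline unless it is within max_distance edits of a headline
    seen in any of the previous k scrapes (k=1, max_distance=0 reduces to
    plain consecutive-appearance de-duping)."""
    recent: Deque[Set[str]] = deque(maxlen=k)
    kept: List[str] = []
    for snapshot in snapshots:
        window = [old for snap in recent for old in snap]
        for headline in dict.fromkeys(snapshot):
            if all(levenshtein(headline, old) > max_distance for old in window):
                kept.append(headline)
        recent.append(set(snapshot))
    return kept


print(dedup_recent_k([["a", "b"], ["a", "c"], ["b"]], k=2))  # ['a', 'b', 'c']
print(dedup_recent_k([["Netanyahu: no deal"], ["Netanyahu: no deal!"]], k=1, max_distance=1))
# ['Netanyahu: no deal']
```

Comparing against every headline in the window is quadratic in the window size, which is acceptable for homepage-sized scrapes but would need indexing for larger text collections.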

Thanks!

github.com/yuvalpinter/MediaAnalysis

Slide13

Thanks!