
Topical Authority Detection and Sentiment Analysis on Top Influencers

phoebe-click
Uploaded On 2017-09-08



Presentation Transcript

Slide1

Topical Authority Detection and Sentiment Analysis on Top Influencers

Machine Learning with Large Datasets
Course Project (under the guidance of Prof. William W. Cohen)
Team Members: Manuel, Shubham and Soumya

Slide2

Outline

Introduction
Related Work
Problem Statement
Methodology
Results
Evaluation plan
Conclusion

Slide3

Introduction

Topical authority detection in social networks is an active research area
Important for recommending relevant feed to users interested in certain topics
Challenges:
    Results should not be overly biased towards popular authors (such as celebrities) or generic authorities (such as news channels)
    Relatively new users, who may not exist prior to an event but post dedicatedly on the topic, should also be considered

Slide4

Related Work

TwitterRank [2]: Authority detection in Twitter using the idea of PageRank
    Leverages topical similarity and link structure between users
    Fails to filter out spammers, or celebrities who are not always influential
Meeyoung Cha et al. [3] find that popular users who have high in-degree are not necessarily influential in terms of spawning retweets or mentions
Aditya Pal et al. [1] (considered as the baseline):
    Use clustering to identify influential vs. non-influential users on Twitter
    Rank users in the influential cluster, considering various important features

Slide5

Problem Statement

Aim:
    Perform authority detection on a collection of topics in Twitter for a time window
    Sentiment analysis to determine the influence of top users tweeting on specific topics on their respective communities
Period: June 6th 2010 to June 10th 2010
Topics:
    Oil Spill
    iPhone
    World Cup

Slide6

Methodology - User Metrics

OT = Original tweets

OT1: Number of original tweets

OT2: Number of links shared

OT3: Self-similarity score

OT4: Number of keyword hashtags used

CT = Conversational tweets

CT1: Number of conversational tweets

CT2: Tweets where conversation is initiated by the author

RT = Repeated tweets

RT1: Number of retweets of others’ tweets

RT2: Number of unique tweets retweeted by other users

RT3: Number of unique users who retweeted author’s tweets


M = Mentions

M1: Number of mentions of other users by the author

M2: Number of unique users mentioned by the author

M3: Number of mentions by others of the author

M4: Number of unique users mentioning the author

G = Graph Characteristics (restricted by the availability of data)

G1: Number of topically active followers

G2: Number of topically active friends

G3: Number of followers tweeting on topic after the author

G4: Number of friends tweeting on topic before the author

Slide7
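A minimal sketch of how a few of the user metrics above (OT1, OT2, CT1, RT1, M1 to M4) could be counted from raw tweets. The simplified record layout (`author`, `text`), the text-prefix rules for spotting retweets and conversational tweets, and the function name `user_metrics` are illustrative assumptions, not the project's MapReduce code.

```python
import re
from collections import defaultdict

MENTION = re.compile(r"@(\w+)")
URL = re.compile(r"https?://\S+")


def user_metrics(tweets):
    """Count per-user metrics from dicts with 'author' and 'text' keys (assumed layout)."""
    counts = defaultdict(
        lambda: {"OT1": 0, "OT2": 0, "CT1": 0, "RT1": 0, "M1": 0, "M3": 0}
    )
    mentioned = defaultdict(set)   # author -> users the author mentioned (for M2)
    mentioners = defaultdict(set)  # user -> authors who mentioned them (for M4)

    for tweet in tweets:
        author, text = tweet["author"], tweet["text"]
        if text.startswith("RT @"):
            # Assumption: a retweet is recognised by its "RT @" prefix.
            counts[author]["RT1"] += 1
            continue  # mentions inside retweets are not counted in this sketch
        if text.startswith("@"):
            # Assumption: a tweet that opens with a mention is conversational.
            counts[author]["CT1"] += 1
        else:
            counts[author]["OT1"] += 1
            counts[author]["OT2"] += len(URL.findall(text))
        for name in MENTION.findall(text):
            counts[author]["M1"] += 1
            counts[name]["M3"] += 1
            mentioned[author].add(name)
            mentioners[name].add(author)

    result = {}
    for user, c in counts.items():
        result[user] = dict(
            c,
            M2=len(mentioned.get(user, ())),
            M4=len(mentioners.get(user, ())),
        )
    return result
```

Users who are only mentioned and never tweet still get an entry, so that M3 and M4 are available for them. The remaining metrics (CT2, RT2, RT3, G1 to G4) need reply threads, retweet sources and the follower graph, which this simplified record layout does not carry.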

Methodology - Features Extracted

Topic Signal (TS)
Signal Strength (SS)
Non-Chat Signal (NCS)
Retweet Impact (RI) - modified
Mention Impact (MI)
Information Diffusion (ID)
Network Score (NS)
URL Impact (UI)

Slide8

Methodology - Features Formulae
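The formulae on this slide did not survive in the transcript. As a hedged illustration only, the sketch below computes seven of the listed features from the user metrics, using definitions recalled from the baseline of Pal and Counts; they should be checked against that paper before use. The project's modified Retweet Impact and its URL Impact are not recoverable from the transcript and are left out. The value of `LAMBDA`, the zero-handling helpers and the `total_tweets` argument (all tweets by the author, on or off topic) are assumptions.

```python
import math

LAMBDA = 0.05  # assumed weight of the conversation term in NCS


def _x_log_y(x, y):
    """Return x * ln(y), treating a non-positive y as a zero contribution."""
    return x * math.log(y) if y > 0 else 0.0


def _ratio(num, den):
    """Safe division that returns 0.0 for an empty denominator."""
    return num / den if den else 0.0


def features(m, total_tweets):
    """Map a dict of user metrics (OT1, CT1, ..., G4) to feature values."""
    return {
        # Share of the author's tweets that are on topic.
        "TS": _ratio(m["OT1"] + m["CT1"] + m["RT1"], total_tweets),
        # Original tweets relative to original tweets plus retweets.
        "SS": _ratio(m["OT1"], m["OT1"] + m["RT1"]),
        # Original tweets relative to chat, plus a small conversation term.
        "NCS": _ratio(m["OT1"], m["OT1"] + m["CT1"])
        + LAMBDA * (m["CT1"] - m["CT2"]) / (m["CT1"] + 1),
        # Baseline (unmodified) retweet impact.
        "RI": _x_log_y(m["RT2"], m["RT3"]),
        "MI": _x_log_y(m["M3"], m["M4"]) - _x_log_y(m["M1"], m["M2"]),
        "ID": math.log(m["G3"] + 1) - math.log(m["G4"] + 1),
        "NS": math.log(m["G1"] + 1) - math.log(m["G2"] + 1),
    }
```

The log terms dampen the effect of a single very active follower or retweeter, which is one way to keep celebrities and generic news accounts from dominating the feature values.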

Slide9

Methodology - Steps

Data in Twitter API format -> User Metrics
    MapReduce (using Hadoop on AWS)
Src-follows-Dest edge-list -> Adjacency Lists
User Metrics and Adjacency Lists -> Features
Features -> Clusters -> Influential Cluster
    Using Gaussian Mixture Model and Expectation Maximization
Influential Cluster -> Top 20 Influencers
    Using Gaussian Ranking
Sentiment Analysis and Visualization
    Using Liu Hu Lexicon and Gephi

Slide10
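The clustering and ranking steps in the methodology can be sketched roughly as follows. This is a toy, standard-library version under stated assumptions: two mixture components with diagonal covariance, a deterministic initialisation, the influential cluster taken to be the one with the larger mean feature sum, and a Gaussian rank score defined as the product of per-feature normal CDFs fitted on the influential cluster. The function names are hypothetical and the project's actual implementation may differ.

```python
import math
from statistics import NormalDist, mean, pstdev


def fit_two_gaussians(points, n_iter=30, var_floor=1e-6):
    """EM for a 2-component Gaussian mixture with diagonal covariance.

    Returns (means, responsibilities).
    """
    dims = len(points[0])
    # Assumed init: seed components at the points with smallest and largest sums.
    ordered = sorted(points, key=sum)
    means = [list(ordered[0]), list(ordered[-1])]
    variances = [[1.0] * dims, [1.0] * dims]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for p in points:
            logs = []
            for k in range(2):
                lp = math.log(weights[k])
                for d in range(dims):
                    v = variances[k][d]
                    lp -= 0.5 * math.log(2 * math.pi * v)
                    lp -= (p[d] - means[k][d]) ** 2 / (2 * v)
                logs.append(lp)
            top = max(logs)
            exps = [math.exp(lp - top) for lp in logs]
            resp.append([e / sum(exps) for e in exps])
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / (len(points) + 2e-12)
            for d in range(dims):
                mu = sum(r[k] * p[d] for r, p in zip(resp, points)) / nk
                var = sum(r[k] * (p[d] - mu) ** 2 for r, p in zip(resp, points)) / nk
                means[k][d] = mu
                variances[k][d] = max(var, var_floor)
    return means, resp


def rank_influencers(users, top_n=20):
    """users: dict of name -> feature tuple. Returns influential users, best first."""
    names = list(users)
    points = [users[n] for n in names]
    means, resp = fit_two_gaussians(points)
    # Assumption: the influential cluster has the larger mean feature sum.
    good = 0 if sum(means[0]) > sum(means[1]) else 1
    members = [n for n, r in zip(names, resp) if r[good] >= 0.5]
    if not members:
        return []
    # One normal distribution per feature, fitted on the influential cluster.
    dists = []
    for d in range(len(points[0])):
        column = [users[n][d] for n in members]
        dists.append(NormalDist(mean(column), max(pstdev(column), 1e-9)))

    def score(name):
        # Gaussian rank: product of the per-feature CDF values.
        return math.prod(dist.cdf(x) for dist, x in zip(dists, users[name]))

    return sorted(members, key=score, reverse=True)[:top_n]
```

Because the score multiplies CDF values, a user must be strong on every feature to rank highly; one very large feature cannot compensate for a weak one.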

Results - Authority Detection

Normalized

60069699: sandiebanandie
17918561: LATenvironment
17918827: latimesgreen
14323791: dbiello
58315230: mrt7384
138775765: BPOilSpill
3554721: NWF
28657802: climateprogress
47739450: ByronYork
152315367: Oil_Spill_News
22024951: SwampSchool
19029137: BrentSpiner
14717197: TPM
139909476: USGulfOilSpill
15458181: kate_sheppard
48365916: Fertic
138761645: GulfOilCleanup
11856592: msnbcvideo
81696616: alabamainsider
9848: jimmybuffett

Not Normalized

17918561: LATenvironment
138775765: BPOilSpill
3554721: NWF
14323791: dbiello
138761645: GulfOilCleanup
60069699: sandiebanandie
14192680: NOLAnews
139119046: BoycottBP
26642006: Alyssa_Milano
139909476: USGulfOilSpill
20582958: guardianeco
28657802: climateprogress
14293310: TIME
47739450: ByronYork
14138785: TelegraphNews
2467791: washingtonpost
58315230: mrt7384
139477825: BPOilNews
46969537: greenforyou
14511951: HuffingtonPost

Slide11

Results - Sentiment Analysis

dbiello: Negative Sentiment Influence
LATenvironment: Neutral Sentiment Influence

Slide12
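Lexicon-based sentiment scoring of the kind named in the methodology (Liu Hu lexicon) can be sketched as a count of positive and negative word hits. The two word sets below are tiny hypothetical stand-ins, not the real lexicon; a real run would load the full Hu and Liu opinion lexicon, which NLTK distributes as `nltk.corpus.opinion_lexicon`. The tokenisation rule and the function name `polarity` are also assumptions.

```python
import re

# Tiny stand-in word lists (hypothetical); replace with the full opinion lexicon.
POSITIVE = {"good", "great", "clean", "safe", "hope", "success"}
NEGATIVE = {"bad", "disaster", "spill", "toxic", "fail", "worst"}


def polarity(text):
    """Label text 'positive', 'negative' or 'neutral' by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

Applied to the tweets of an authority's followers, labels like these could be aggregated per community, which is one plausible reading of the "sentiment influence" shown for each top user.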

Evaluation - Clustering, Ranking and Authority

We randomly sample users from the “good” and “bad” clusters and ask people how relevant their tweets are for the topic.
Using the ranks (1 to 5) that respondents assign to the top k Twitter users in our ranking, we run NDCG to compare the respondents' ranks against our ranking.
With a final survey, we plan to ask people to rank the authoritativeness of the top k users in our ranking, with anonymized and non-anonymized tweets.

Slide13

Evaluation
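The NDCG (normalized discounted cumulative gain) comparison in the evaluation plan can be illustrated with a short sketch. It assumes the survey scores are listed in the order of our ranking, that a higher score means more relevant, and that the gain is linear in the score; the exponential-gain variant of NDCG would give different numbers.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain with linear gain and a log2 position discount."""
    rels = relevances[:k] if k else relevances
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))


def ndcg(relevances, k=None):
    """DCG of the given order divided by the DCG of the ideal (sorted) order."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0
```

A value of 1.0 means the ranking already lists users in the order the respondents would choose; lower values mean highly rated users appear too far down the list.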

Slide14

Conclusion

While the baseline had more authorities who seemed generic, such as news Twitter accounts, our results show more topical authorities.
We have also analyzed the sentiment influence of the top authorities, which can have further applications in formulating better marketing strategies for products and in influencing consumers.
Further, we plan to include evaluation results in our final report, and also improve upon the features related to the follower-following graph.

Slide15

Slide16

References

[1] Pal, Aditya, and Scott Counts. "Identifying topical authorities in microblogs." Proceedings of the fourth ACM international conference on Web search and data mining. ACM, 2011.
[2] Weng, Jianshu, et al. "Twitterrank: finding topic-sensitive influential twitterers." Proceedings of the third ACM international conference on Web search and data mining. ACM, 2010.
[3] Cha, Meeyoung, et al. "Measuring User Influence in Twitter: The Million Follower Fallacy." ICWSM 10.10-17 (2010): 30.
[4] Yoshida, M., & Yamaguchi, Y. (2015). Interactive Tagging Networks (Following/Followers and Tags on 1 million Twitter Users) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.16267
[5] Page, Lawrence, et al. "The PageRank citation ranking: bringing order to the web." (1999).
[6] Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006.

Slide17

Baseline Results

NWF
TIME
Huffingtonpost
NOLAnews
Reuters
CBSNews
LATenvironment
kate_sheppard
MotherNatureNet
mparent77772