Topical Authority Detection and Sentiment Analysis on Top Influencers
Machine Learning with Large Datasets
Course Project (under the guidance of Prof. William W. Cohen)
Team Members: Manuel, Shubham and Soumya
Outline
- Introduction
- Related Work
- Problem Statement
- Methodology
- Results
- Evaluation Plan
- Conclusion
Introduction
- Topical authority detection in social networks is an active research area.
- It is important for recommending relevant feeds to users interested in certain topics.
- Challenges:
  - Results should not be overly biased towards:
    - popular authors (such as celebrities)
    - generic authorities (such as news channels)
  - Relatively new users, who may not have existed before an event but post dedicatedly on the topic, should also be considered.
Related Work
- TwitterRank [2]: authority detection in Twitter using the idea of PageRank
  - Leverages topical similarity and link structure between users
  - Fails to filter out spammers, or celebrities who are not always influential
- Meeyoung Cha et al. [3] find that popular users with high in-degree are not necessarily influential in terms of spawning retweets or mentions
- Aditya Pal et al. [1] (considered as the baseline):
  - Use clustering to separate influential from non-influential users on Twitter
  - Rank users in the influential cluster, considering various features
Problem Statement
Aim:
- Perform authority detection on a collection of topics in Twitter for a time window
- Use sentiment analysis to determine the influence that top users tweeting on specific topics have on their respective communities
Period: June 6th 2010 to June 10th 2010
Topics:
- Oil Spill
- iPhone
- World Cup
Methodology - User Metrics
OT = Original tweets
OT1: Number of original tweets
OT2: Number of links shared
OT3: Self-similarity score
OT4: Number of keyword hashtags used
CT = Conversational tweets
CT1: Number of conversational tweets
CT2: Tweets where conversation is initiated by the author
RT = Repeated tweets
RT1: Number of retweets of others’ tweets
RT2: Number of unique tweets retweeted by other users
RT3: Number of unique users who retweeted author’s tweets
M = Mentions
M1: Number of mentions of other users by the author
M2: Number of unique users mentioned by the author
M3: Number of mentions by others of the author
M4: Number of unique users mentioning the author
G = Graph Characteristics (restricted by the availability of data)
G1: Number of topically active followers
G2: Number of topically active friends
G3: Number of followers tweeting on topic after the author
G4: Number of friends tweeting on topic before the author
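As a rough illustration of how a few of these metrics could be counted, the sketch below uses a simplified, hypothetical tweet record (the fields `author`, `text`, `retweet_of`, `reply_to` and `mentions` are assumptions, not the actual Twitter API format) and runs in memory rather than as a MapReduce job. It covers only OT1, CT1, RT1 and M1 to M4.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tweet:
    # Hypothetical simplified record; the real Twitter API format has more fields.
    author: str
    text: str
    retweet_of: Optional[str] = None   # author of the retweeted tweet, if any
    reply_to: Optional[str] = None     # author being replied to, if any
    mentions: list = field(default_factory=list)  # screen names mentioned


def user_metrics(tweets):
    """Count OT1, CT1, RT1 and M1..M4 per user for an iterable of Tweet records."""
    counts = defaultdict(lambda: defaultdict(int))
    mentioned_by_author = defaultdict(set)   # author -> users they mention
    mentioners_of = defaultdict(set)         # user -> authors mentioning them
    for t in tweets:
        m = counts[t.author]
        if t.retweet_of is not None:
            m["RT1"] += 1          # retweet of someone else's tweet
        elif t.reply_to is not None:
            m["CT1"] += 1          # conversational tweet
        else:
            m["OT1"] += 1          # original tweet
        for user in t.mentions:
            m["M1"] += 1           # mentions made by the author
            mentioned_by_author[t.author].add(user)
            counts[user]["M3"] += 1  # mentions received by the other user
            mentioners_of[user].add(t.author)
    for author, users in mentioned_by_author.items():
        counts[author]["M2"] = len(users)    # unique users mentioned
    for user, authors in mentioners_of.items():
        counts[user]["M4"] = len(authors)    # unique users mentioning them
    return {user: dict(m) for user, m in counts.items()}
```

In a MapReduce setting the loop body would roughly correspond to the map step (emitting per-user counter updates) and the per-user aggregation to the reduce step.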
Methodology - Features Extracted
- Topic Signal (TS)
- Signal Strength (SS)
- Non-Chat Signal (NCS)
- Retweet Impact (RI) - modified
- Mention Impact (MI)
- Information Diffusion (ID)
- Network Score (NS)
- URL Impact (UI)
Methodology - Features Formulae
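The formulae from this slide did not survive extraction. As a rough sketch only, the code below assumes the feature definitions of the baseline paper [1] for the features that are not marked as modified or new. The project's modified Retweet Impact and its URL Impact cannot be recovered from the slides and are omitted. The parameter names `total_tweets` and `lam`, the default `lam=0.05`, and the zero guards are assumptions of this sketch.

```python
import math


def safe_log(x):
    # Guard added for this sketch: treat the log of a non-positive count as 0.
    return math.log(x) if x > 0 else 0.0


def ratio(num, den):
    # Guard added for this sketch: an empty denominator yields 0.
    return num / den if den else 0.0


def features(m, total_tweets, lam=0.05):
    """Map a dict of user metrics (OT1, CT1, ...) to feature values.

    Assumed definitions, following the baseline paper [1]:
      TS  = (OT1 + CT1 + RT1) / total tweets by the author
      SS  = OT1 / (OT1 + RT1)
      NCS = OT1 / (OT1 + CT1) + lam * (CT1 - CT2) / (CT1 + 1)
      MI  = M3 * log(M4) - M1 * log(M2)
      ID  = log(G3 + 1) - log(G4 + 1)
      NS  = log(G1 + 1) - log(G2 + 1)
    """
    g = lambda key: m.get(key, 0)
    return {
        "TS": ratio(g("OT1") + g("CT1") + g("RT1"), total_tweets),
        "SS": ratio(g("OT1"), g("OT1") + g("RT1")),
        "NCS": ratio(g("OT1"), g("OT1") + g("CT1"))
               + lam * ratio(g("CT1") - g("CT2"), g("CT1") + 1),
        "MI": g("M3") * safe_log(g("M4")) - g("M1") * safe_log(g("M2")),
        "ID": math.log(g("G3") + 1) - math.log(g("G4") + 1),
        "NS": math.log(g("G1") + 1) - math.log(g("G2") + 1),
    }
```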
Methodology - Steps
- Data in Twitter API format -> User Metrics (MapReduce, using Hadoop on AWS)
- Src-follows-Dest edge list -> Adjacency Lists
- User Metrics and Adjacency Lists -> Features
- Features -> Clusters -> Influential Cluster (using a Gaussian Mixture Model and Expectation Maximization)
- Influential Cluster -> Top 20 Influencers (using Gaussian Ranking)
- Sentiment Analysis and Visualization (using the Liu Hu lexicon and Gephi)
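The Gaussian Ranking step can be sketched as follows, assuming it works as in the baseline paper [1]: each feature is modelled as a Gaussian over the users in the influential cluster, and a user's score is the product of the Gaussian CDF values of their features. Equal feature weights, the function name `gaussian_rank`, and the input layout are assumptions of this sketch.

```python
from statistics import NormalDist, mean, pstdev


def gaussian_rank(users, top_k=20):
    """Rank users by the product of per-feature Gaussian CDF values.

    `users` maps a user name to a list of feature values, with the same
    feature order for every user.
    """
    names = list(users)
    n_features = len(next(iter(users.values())))
    scores = {name: 1.0 for name in names}
    for f in range(n_features):
        column = [users[name][f] for name in names]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # constant feature: carries no ranking information
        dist = NormalDist(mu, sigma)
        for name in names:
            # A higher feature value gives a CDF value closer to 1.
            scores[name] *= dist.cdf(users[name][f])
    return sorted(names, key=lambda name: scores[name], reverse=True)[:top_k]
```

Because the score is a product, a user needs reasonably high values on all features to rank near the top; a single very low feature pulls the whole score down.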
Results - Authority Detection
Normalized:
60069699: sandiebanandie
17918561: LATenvironment
17918827: latimesgreen
14323791: dbiello
58315230: mrt7384
138775765: BPOilSpill
3554721: NWF
28657802: climateprogress
47739450: ByronYork
152315367: Oil_Spill_News
22024951: SwampSchool
19029137: BrentSpiner
14717197: TPM
139909476: USGulfOilSpill
15458181: kate_sheppard
48365916: Fertic
138761645: GulfOilCleanup
11856592: msnbcvideo
81696616: alabamainsider
9848: jimmybuffett

Not Normalized:
17918561: LATenvironment
138775765: BPOilSpill
3554721: NWF
14323791: dbiello
138761645: GulfOilCleanup
60069699: sandiebanandie
14192680: NOLAnews
139119046: BoycottBP
26642006: Alyssa_Milano
139909476: USGulfOilSpill
20582958: guardianeco
28657802: climateprogress
14293310: TIME
47739450: ByronYork
14138785: TelegraphNews
2467791: washingtonpost
58315230: mrt7384
139477825: BPOilNews
46969537: greenforyou
14511951: HuffingtonPost
Results - Sentiment Analysis
[Figure] dbiello: negative sentiment influence
[Figure] LATenvironment: neutral sentiment influence
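As a minimal sketch of lexicon-based sentiment scoring, the code below counts positive and negative words in a tweet and labels it by the sign of the difference. The two word sets are tiny illustrative stand-ins, not the Liu Hu lexicon itself, and the scoring rule and function names are assumptions rather than the project's exact procedure.

```python
import re

# Tiny illustrative word lists; the Liu Hu opinion lexicon is far larger.
POSITIVE = {"good", "great", "clean", "safe", "success"}
NEGATIVE = {"bad", "disaster", "toxic", "fail", "damage"}


def sentiment_score(text):
    """Return (positive count - negative count) divided by the token count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def label(score, threshold=0.0):
    """Map a score to 'positive', 'negative' or 'neutral'."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

Averaging such scores over the tweets of an author's followers would give one possible notion of the author's sentiment influence on their community.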
Evaluation - Clustering, Ranking and Authority
We randomly sample users from the “good” and “bad” clusters and ask people how relevant their tweets are to the topic. Using the ratings (1 to 5) that respondents assign to the top k Twitter users in our ranking, we compute NDCG (normalized discounted cumulative gain) to compare our ranking with the relative order implied by those ratings. With a final survey, we plan to ask people to rank the authoritativeness of the top k users in our ranking, using both anonymized and non-anonymized tweets.
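The NDCG comparison can be sketched as below. This version uses the rating itself as the gain and a log2 rank discount; other variants use an exponential gain, and which one the project intends is not stated, so treat the choice as an assumption.

```python
import math


def dcg(grades):
    # Discounted cumulative gain: the grade at 1-based rank i is divided by log2(i + 1).
    return sum(g / math.log2(i + 2) for i, g in enumerate(grades))


def ndcg(grades_in_our_order, k=None):
    """NDCG@k for relevance grades listed in the order of our ranking."""
    ranked = grades_in_our_order[:k]
    ideal = sorted(grades_in_our_order, reverse=True)[:k]
    best = dcg(ideal)
    return dcg(ranked) / best if best > 0 else 0.0
```

A value of 1.0 means our ranking orders the users exactly as the survey ratings would; lower values indicate that highly rated users appear too far down the list.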
Conclusion
- While the baseline returned more authorities that seemed generic, such as news Twitter accounts, our results show more topical authorities.
- We have also analyzed the sentiment influence of the top authorities, which could be applied to formulating better marketing strategies for products and to influencing consumers.
- We plan to include evaluation results in our final report and to improve the features related to the follower-following graph.
References
[1] Pal, Aditya, and Scott Counts. "Identifying topical authorities in microblogs." Proceedings of the fourth ACM international conference on Web search and data mining. ACM, 2011.
[2] Weng, Jianshu, et al. "TwitterRank: finding topic-sensitive influential twitterers." Proceedings of the third ACM international conference on Web search and data mining. ACM, 2010.
[3] Cha, Meeyoung, et al. "Measuring User Influence in Twitter: The Million Follower Fallacy." ICWSM 10.10-17 (2010): 30.
[4] Yoshida, M., & Yamaguchi, Y. (2015). Interactive Tagging Networks (Following/Followers and Tags on 1 million Twitter Users) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.16267
[5] Page, Lawrence, et al. "The PageRank citation ranking: bringing order to the web." (1999).
[6] Bishop, Christopher M. "Pattern recognition." Machine Learning 128 (2006).
Baseline Results
NWF
TIME
Huffingtonpost
NOLAnews
Reuters
CBSNews
LATenvironment
kate_sheppard
MotherNatureNet
mparent77772