Slide1
Scalable Real Time Prediction Algorithm for Drive by Download Attack on Twitter
By Amir Javed
Supervisors: Dr. Pete Burnap, Prof. Omer Rana
Slide2
Problem
Drive-by download attack on Twitter:
Attacker identifies trending topics.
Attacker tweets on the trending topic with a shortened URL, e.g. "#Trending topic lmao this tweet by @user was nuts Short_URL".
User clicks on the shortened URL.
User gets redirected to a malicious page.
The page infects the user's system (virus alert).
Challenges
To quickly detect accounts that are spreading malware.
To handle the large tweet load with the final algorithm.
To identify malicious websites before they disappear.
Slide3
Predicting Algorithm - Training Phase
Extract URLs from tweets.
Use a honeypot (bait) to identify malicious URLs.
The algorithm learns what is malicious or benign based on machine activity and tweet attributes.
Malicious URL / Benign URL
Slide4
Predicting Algorithm - Testing Phase
Extract URLs from tweets.
The algorithm segregates malicious and benign URLs based on initial machine activity and tweet attributes.
Malicious URL / Benign URL
Training Data
Slide5
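A minimal sketch of the training and testing phases from Slide3 and Slide4, assuming the honeypot has already labelled each URL. The feature names (`cpu`, `new_process`, `has_hashtag`), the toy rows and the hand-rolled naive Bayes classifier are illustrative assumptions, not the features or the implementation used in the presentation.

```python
import re
from collections import Counter, defaultdict
from math import log

URL_RE = re.compile(r"https?://\S+")

def extract_urls(tweet_text):
    """Return the http(s) URLs in a tweet, minus trailing punctuation."""
    return [u.rstrip(".,;:!?)\"'") for u in URL_RE.findall(tweet_text)]

class TinyNaiveBayes:
    """Categorical naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.values = defaultdict(set)              # feature -> values seen in training

    def fit(self, rows, labels):
        for row, label in zip(rows, labels):
            self.class_counts[label] += 1
            for name, value in row.items():
                self.feature_counts[(label, name)][value] += 1
                self.values[name].add(value)
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            score = log(count / total)  # log prior
            for name, value in row.items():
                seen = self.feature_counts[(label, name)][value]
                # Add-one smoothing, with one extra slot for unseen values.
                score += log((seen + 1) / (count + len(self.values[name]) + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Training phase: hypothetical rows, labelled by a honeypot run (assumed).
train_rows = [
    {"cpu": "high", "new_process": True, "has_hashtag": True},
    {"cpu": "high", "new_process": True, "has_hashtag": False},
    {"cpu": "low", "new_process": False, "has_hashtag": True},
    {"cpu": "low", "new_process": False, "has_hashtag": False},
]
train_labels = ["malicious", "malicious", "benign", "benign"]
model = TinyNaiveBayes().fit(train_rows, train_labels)

# Testing phase: classify a new URL from its initial machine activity.
urls = extract_urls("lmao this tweet by @user was nuts http://t.co/abc123")
verdict = model.predict({"cpu": "high", "new_process": True, "has_hashtag": True})
```

Here the classifier only sees discretised values; a real system would derive them from the honeypot's machine-activity logs and the tweet metadata.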
Experimental Results
We captured tweets containing URLs around two sporting events: the European Football Championship 2016 and the Olympics 2016.
First, the tweets were segregated into malicious and benign using the Capture HPC honeypot. Based on these segregated URLs, log files representing the timeline of events were created to train four machine learning models.
The four machine learning models were J48, Naïve Bayes, BayesNet and MLP.
The highest F-measure was 99.2%, for the Bayesian model.
The log file created for the Olympics 2016 was tested using the model trained on the Euro Cup data.
An F-measure of 83.2% was achieved for a log file created at 1 second.
Slide6
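For reference, the F-measure quoted in the Experimental Results is the harmonic mean of precision and recall. The sketch below computes it from confusion counts; the counts are made up for illustration, and the slides do not say whether the reported figures are per-class or weighted averages.

```python
def f_measure(tp, fp, fn):
    """F1 score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 8 malicious URLs caught, 2 false alarms, 2 missed.
score = f_measure(tp=8, fp=2, fn=2)
```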
Malware Propagation
Slide7
Future Plan
Research to better understand the propagation of malicious tweets on Twitter, so that malicious accounts can be removed at an early stage.
To incorporate prediction of the most dominant node spreading the malware, in order to stop malware propagation.
To deploy the predictive algorithm on the cloud using the Comet-Cloud architecture, enabling the algorithm to handle a large number of tweets.
Slide8
Thank you