Spam Email Detection

Presentation Transcript

Slide1

Spam Email Detection

Ethan Grefe

December 13, 2013

Slide2

Motivation

Spam email is constantly cluttering inboxes

Commonly removed using rule-based filters

Spam often has very similar characteristics

This allows it to be detected using machine learning:

Naïve Bayes classifiers

Support Vector Machines

Slide3

SVM Solution

Used training data from the CSDMC2010 SPAM corpus

4327 labeled emails

2949 non-spam messages (HAM)

1378 spam messages (SPAM)

Extracted features from the subject and body of emails

Used the resulting feature vectors to train an SVM classifier in Matlab

Slide4

Email Features

Features were determined by research and observation

Best results were obtained with the following features

Percentage of letters that are capitalized

Types of punctuation used

Average length of a word

Amount of HTML in the email

Slide5
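The four features listed on the Email Features slide can be made concrete with a short sketch. The slides used Matlab and do not say how each feature was measured, so everything below is an assumption: the function name `extract_features` is hypothetical, "types of punctuation" is taken to mean the number of distinct punctuation characters, and "amount of HTML" is taken to mean the number of tag-like substrings.

```python
import re
import string
from typing import List


def extract_features(subject: str, body: str) -> List[float]:
    """Map one email (subject + body) to a four-element feature vector.

    The exact feature definitions are assumptions; the slides only name them.
    """
    text = subject + "\n" + body

    # 1. Percentage of letters that are capitalized.
    letters = [ch for ch in text if ch.isalpha()]
    upper_pct = 100.0 * sum(ch.isupper() for ch in letters) / len(letters) if letters else 0.0

    # 2. Types of punctuation used (assumed: count of distinct punctuation characters).
    punct_types = len({ch for ch in text if ch in string.punctuation})

    # 3. Average length of a word (assumed: a word is a run of ASCII letters).
    words = re.findall(r"[A-Za-z]+", text)
    avg_word_len = sum(len(w) for w in words) / len(words) if words else 0.0

    # 4. Amount of HTML (assumed: number of substrings that look like tags).
    html_tags = len(re.findall(r"<[^<>]+>", text))

    return [upper_pct, float(punct_types), avg_word_len, float(html_tags)]
```

Each email then becomes one row of the feature matrix passed to the classifier. Note that with these definitions HTML tag names also count as words, which a more careful version might strip first.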

Classifier Results

Trained on a random 35% of emails

Tested SVM classifier on remaining 65%

Trained SVM using three different kernel functions
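As a rough illustration of the setup above, the sketch below shows a random 35%/65% split and the three kernel functions in plain Python. It is not the Matlab code behind the slides and it does not train an SVM; the function names, the fixed seed, the RBF width `sigma`, and the exact form of the quadratic kernel (a degree-2 polynomial kernel) are all assumptions.

```python
import math
import random


def random_split(items, train_fraction=0.35, seed=0):
    """Shuffle a copy of items and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def linear_kernel(x, y):
    """Plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian radial basis function; sigma is an assumed width parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def quadratic_kernel(x, y):
    """Assumed form: degree-2 polynomial kernel (x . y + 1)^2."""
    return (linear_kernel(x, y) + 1.0) ** 2
```

Applied to the 4327 labeled emails, this split gives roughly 1514 training emails and 2813 test emails; the exact counts used for the slides are not stated.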

Kernel Function    Spam Classification Rate    Ham Classification Rate    Total Classification Rate
RBF                80.06%                      92.33%                     86.20%
Linear             78.69%                      80.66%                     79.67%
Quadratic          82.75%                      84.85%                     83.80%

Slide6
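The three rates in the Classifier Results table can be read as per-class accuracies. The slides do not define the total rate, but the reported totals are consistent with the plain average of the spam and ham rates (for RBF, (80.06 + 92.33) / 2 ≈ 86.20), so the sketch below computes it that way. That reading is an inference, and the function name `classification_rates` is hypothetical.

```python
def classification_rates(true_labels, predicted_labels):
    """Return (spam_rate, ham_rate, total_rate) as percentages.

    Labels must be the strings "spam" or "ham". The total rate is assumed
    to be the unweighted mean of the two per-class rates.
    """
    seen = {"spam": 0, "ham": 0}
    correct = {"spam": 0, "ham": 0}
    for truth, predicted in zip(true_labels, predicted_labels):
        seen[truth] += 1
        if predicted == truth:
            correct[truth] += 1

    # Share of each class that was labeled correctly, guarding empty classes.
    spam_rate = 100.0 * correct["spam"] / seen["spam"] if seen["spam"] else 0.0
    ham_rate = 100.0 * correct["ham"] / seen["ham"] if seen["ham"] else 0.0
    total_rate = (spam_rate + ham_rate) / 2.0
    return spam_rate, ham_rate, total_rate
```

Reporting the two classes separately matters here because the corpus is unbalanced: a classifier that favors ham can score well overall while still letting a lot of spam through.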

Possible Improvements

Use Naïve Bayes to classify emails using word frequency

Obtain a wider variety of input features

Test other types of learning algorithms
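The first improvement, Naïve Bayes over word frequencies, could look something like the sketch below. It is a minimal multinomial Naïve Bayes with add-one smoothing written in plain Python; the class name, the tokenizer, and the smoothing choice are assumptions, since the slides only name the idea.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)  # emails seen per class
        self.word_counts = {}              # class -> Counter of word frequencies
        for text, label in zip(texts, labels):
            self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log word likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A model would be trained with `NaiveBayes().fit(texts, labels)` on the training emails and applied with `predict(text)` to each test email, which makes it directly comparable with the SVM results on the same split.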