
Xinyue Liu Can We Determine Whether an Email is - PowerPoint Presentation

Uploaded On 2018-03-17



Presentation Transcript

Slide1

Xinyue Liu

Can We Determine Whether an Email is SPAM?

Slide2

The Spambase Data Set

Source and Origin

Goal

Instances and Attributes

Examples

Tool
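Spambase describes each email mainly through word-frequency attributes. In the UCI description of the data set, a `word_freq_WORD` attribute is 100 times the number of occurrences of WORD divided by the total number of words in the email. A minimal sketch of that computation follows; the tokenizer, the function name `word_freq_percent`, and the example text and vocabulary are assumptions for illustration, not taken from the presentation.

```python
import re
from collections import Counter


def word_freq_percent(text, vocabulary):
    """For each word in `vocabulary`, return 100 * occurrences / total words in `text`."""
    # Assumed tokenizer: lowercase, split on anything that is not a letter or digit.
    tokens = [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {word: 0.0 for word in vocabulary}
    return {word: 100.0 * counts[word] / total for word in vocabulary}


# Hypothetical email text and vocabulary, for illustration only.
example = word_freq_percent(
    "Free money! Claim your free prize now",
    ["free", "money", "meeting"],
)
```

Here `example["free"]` is 2 occurrences out of 7 words, expressed as a percentage, and `example["meeting"]` is zero because the word does not occur.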

Goal: classify emails as spam or ham (non-spam) based on the frequencies of words in the email.

Slide3

Logistic Regression

Linear regression: assign a weight to each predictor, chosen to minimize the classification error.
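A minimal sketch of the weighting idea, carried over to logistic regression: the weighted sum of the predictors is passed through the logistic function to give a probability of spam. The slides do not say how the model was fit, so the gradient-descent fit on log loss below is an assumption, as are the function names and the made-up one-predictor data.

```python
import math


def sigmoid(z):
    """Logistic function: maps a real-valued score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability of spam: logistic function of the weighted sum of predictors."""
    score = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(score)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights by stochastic gradient descent on the log loss (assumed method)."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            error = predict_proba(weights, bias, xi) - yi
            weights = [w - lr * error * v for w, v in zip(weights, xi)]
            bias -= lr * error
    return weights, bias


# Made-up toy data: one word-frequency predictor, label 1 = spam, 0 = ham.
X = [[0.0], [0.5], [1.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = fit_logistic(X, y)
p_low = predict_proba(weights, bias, [0.2])   # low frequency: should look like ham
p_high = predict_proba(weights, bias, [4.5])  # high frequency: should look like spam
```

An email is then labeled spam when its predicted probability exceeds a threshold such as 0.5.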

Logistic regression

Slide4

Linear Discriminant Analysis (LDA)
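LDA assigns an email to the class with the largest posterior probability, which it obtains from Bayes' theorem. The formula itself did not survive in the transcript; the standard form is given here as a sketch. The notation is assumed, not taken from the slide: $\pi_k$ is the prior probability of class $k$, $f_k(x)$ is the density of the predictors within class $k$ (Gaussian with a shared covariance matrix in LDA), and $K = 2$ for spam versus ham.

```latex
\Pr(Y = k \mid X = x) = \frac{\pi_k \, f_k(x)}{\sum_{l=1}^{K} \pi_l \, f_l(x)}
```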

Bayes' Theorem

Slide5

10-fold Cross-Validation

Slide6
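In 10-fold cross-validation the data are split into ten folds; each fold serves once as the test set while the other nine are used for training, and the ten test errors are averaged. A minimal sketch of the splitting step is given here. The function name `k_fold_indices` and the random seed are assumptions, and the count of 4,601 emails comes from the UCI description of Spambase rather than from the slides.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# 4601 is the Spambase instance count per the UCI description (assumed here).
splits = list(k_fold_indices(4601, k=10))
```

Each email lands in exactly one test fold, so every observation is used for testing once and for training nine times.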

Logistic Regression: Mean Error Rate with 96% CI: 10.6% – 11.1%

Linear Discriminant Analysis (LDA): Mean Error Rate with 96% CI: 9.9% – 10.5%
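The slides do not say how the confidence intervals were computed. One common approach, shown here as a sketch under that caveat, is a normal-approximation interval around the mean of the per-fold error rates. The function name `mean_with_ci` is hypothetical, and the fold error rates below are made up for illustration; they are not the presentation's data.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_with_ci(values, confidence=0.96):
    """Return (mean, lower, upper) using a normal-approximation interval."""
    m = mean(values)
    standard_error = stdev(values) / math.sqrt(len(values))
    # Two-sided interval: a 96% level leaves 2% in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return m, m - z * standard_error, m + z * standard_error


# Made-up per-fold error rates, for illustration only.
fold_errors = [0.10, 0.11, 0.09, 0.12, 0.10, 0.11, 0.10, 0.09, 0.11, 0.10]
m, lo, hi = mean_with_ci(fold_errors)
```

With only ten folds, a t-distribution quantile would give a somewhat wider interval than the normal quantile used in this sketch.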

Conclusion

We can filter about 90% of spam emails using LDA.

Slide7

References

Trevor Hastie, Rob Tibshirani. "Statistical Learning." Statistical Learning. Stanford University Online CourseWare, 21 January 2014. Lecture. <http://online.stanford.edu/course/statistical-learning-winter-2014>.

Mark Hopkins, Erik Reeber, George Forman, and Jaap Suermondt. "Spambase Data Set." Spambase Data Set. Hewlett-Packard Labs, 1 July 1999. Web. 1 Mar. 2015. <http://archive.ics.uci.edu/ml/datasets/Spambase>.

"Logistic Regression." Logistic Regression. N.p., n.d. Web. 5 May 2015. <http://www.saedsayad.com/logistic_regression.htm>.

"Binary Classification." Linear Discriminant Analysis Classifier (LDAC). N.p., n.d. Web. 5 May 2015. <http://mlpy.sourceforge.net/docs/3.5/lin_class.html>.

Kaewchinporn, Chinnapat. "10-fold Cross-Validation." K-fold Cross-validation. N.p., n.d. Web. 5 May 2015. <http://scriptslines.com/blog/k-fold-cross-validation/>.

Slide8

Thank you!