ALPACAS: A Large-scale Privacy-aware Collaborative Anti-spam System


Z. Zhong, L. Ramaswamy and K. Li, IEEE INFOCOM 2008. Intelligent E-Commerce System Lab. Aettie Ji.


Slide1

ALPACAS: A Large-scale Privacy-aware Collaborative Anti-spam System

Z. Zhong, L. Ramaswamy and K. Li, IEEE INFOCOM 2008

Intelligent E-Commerce System Lab.

Aettie Ji

Slide2

OUTLINE

INTRODUCTION

PRIOR WORK

THE ALPACAS ANTI-SPAM FRAMEWORK

Feature-Preserving Fingerprint

Privacy-Preserving Collaboration Protocol

System Structure

EXPERIMENTS & RESULTS

DISCUSSION

CONCLUSION

Slide3

INTRODUCTION

Motivations

Recent spam attacks pose strong challenges to statistical filters, which have been popular.

Collaborative spam filtering is a natural defense paradigm, wherein information about spam is shared, since spammers send similar emails to several target receivers.

However, the privacy of the collaborating participants is an important challenge.

To protect privacy, digest approaches have been proposed, but they are not sufficient.

Slide4

INTRODUCTION

Contributions

ALPACAS: A Large-scale Privacy-Aware Collaborative Anti-spam System.

A resilient fingerprint generation technique, “feature-preserving transformation”, is proposed.

A privacy-preserving protocol is designed to control the amount of information to be shared.

The experimental results demonstrate that ALPACAS outperforms traditional stand-alone statistical filters.

Slide5

PRIOR WORK

Drawbacks of the existing collaborative anti-spam schemes (using DCC).

How does it work?

Participating servers in DCC share email digests computed through hash functions such as MD5.

The DCC system replies with recent statistics about the digests.

Drawbacks

Hashing schemes like MD5 generate a completely different hash value even if a single byte is altered.

The DCC scheme does not completely address the privacy issue: it remains open to inference-based privacy breaches.
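The first drawback can be seen with a few lines of Python. This is only an illustrative sketch: the two message strings and the helper name `md5_digest` are made up for the example, and it does not reproduce how DCC itself computes its digests.

```python
import hashlib


def md5_digest(text):
    """Return the MD5 digest of a message as a 32-character hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Two hypothetical spam messages that differ in a single byte ("o" -> "0").
original = "Buy cheap pills online now"
altered = "Buy cheap pills online n0w"

digest_original = md5_digest(original)
digest_altered = md5_digest(altered)

# Count how many hex positions differ between the two digests.
differing_positions = sum(a != b for a, b in zip(digest_original, digest_altered))

print(digest_original)
print(digest_altered)
print("differing hex positions:", differing_positions)
```

Because the two digests are unrelated, a digest-based system treats the altered copy as a brand-new message, which is the weakness a similarity-preserving fingerprint is meant to address.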

Slide6

THE ALPACAS ANTI-SPAM FRAMEWORK(1/2)

Challenges

To protect email privacy,

The messages have to be encrypted.

The encryption should retain the important features of the messages.

To avoid inference-based privacy breaches,

It is necessary to minimize the information revealed during the collaboration.

ALPACAS framework components

Feature-preserving fingerprint

Privacy-preserving protocol

DHT-based architecture

Slide7

THE ALPACAS ANTI-SPAM FRAMEWORK(2/2)

Fig. 1: ALPACAS System Overview

(a) ALPACAS Network

(b) Internal mechanism of EA4

Slide8

Feature-Preserving Fingerprint(1/4)

Shingle-based Message Transformation

Shingle: If two documents vary by a small amount, their shingle sets also differ by a small amount.


Fig. 2: ALPACAS Feature Sets, DCC and Razor Digests for 2 spam emails (Texts in bold font indicate differences)

Slide9

Feature-Preserving Fingerprint(2/4)

Shingle-based Message Transformation

Generation of the transformed feature set of message Ma (TFSet(Ma)):

Compute the Rabin fingerprint [11] of the consecutive tokens in a sliding window of length W.

Each fingerprint is in the range (0, 2^K - 1).

For a message with X tokens, X - W + 1 fingerprints are obtained.

The smallest Y are retained.

The similarity between Ma and Mb can then be calculated from their transformed feature sets TFSet(Ma) and TFSet(Mb).
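The generation steps above can be sketched in Python as follows. Several parts are assumptions made for illustration and are not taken from the slides: a truncated SHA-256 hash stands in for the Rabin fingerprint, tokens are obtained by lower-casing and splitting on whitespace, the parameter defaults are arbitrary, and the similarity is computed as a Jaccard ratio (shared features over all features) because the slide's own formula is not preserved in this text.

```python
import hashlib


def tfset(text, w=3, y=8, k=32):
    """Sketch of a transformed feature set: fingerprints of W-token windows.

    A truncated SHA-256 value is used as a stand-in for the Rabin fingerprint
    (an assumption); each fingerprint is reduced to the range 0 .. 2**k - 1.
    """
    tokens = text.lower().split()
    fingerprints = set()
    # A message with X tokens yields X - W + 1 windows.
    for i in range(len(tokens) - w + 1):
        window = " ".join(tokens[i:i + w])
        digest = hashlib.sha256(window.encode("utf-8")).digest()
        fingerprints.add(int.from_bytes(digest[:8], "big") % (1 << k))
    # Retain only the Y smallest fingerprints.
    return set(sorted(fingerprints)[:y])


def similarity(tfset_a, tfset_b):
    """Jaccard ratio of two feature sets (an assumed similarity measure)."""
    if not tfset_a and not tfset_b:
        return 0.0
    return len(tfset_a & tfset_b) / len(tfset_a | tfset_b)


msg_a = "buy cheap pills online now at our trusted discount store today"
msg_b = "buy cheap pills online now at our trusted discount store tonight"
msg_c = "the quarterly budget meeting has moved to the third floor room"

print(similarity(tfset(msg_a, y=100), tfset(msg_b, y=100)))
print(similarity(tfset(msg_a, y=100), tfset(msg_c, y=100)))
```

Unlike a whole-message digest, changing one token only changes the windows that contain it, so near-duplicate messages keep most of their features in common.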


Slide10

Feature-Preserving Fingerprint(3/4)

Shingle-based Message Transformation

In consideration of privacy preservation:

The Rabin fingerprint algorithm is a one-way hash function, so it is infeasible to reverse.

However, it is possible to infer a word or a group of words from an individual feature value.


Slide11

Feature-Preserving Fingerprint(4/4)

Term-level Privacy Preservation

Controlled shuffling

The email text is divided into h consecutive chunks of z consecutive tokens each.

The tokens in each chunk are shuffled in a pre-defined manner, while the ordering of the chunks is retained.

Each chunk is divided into y sub-chunks (y is a factor of z).

The tokens in chunk CKh are shuffled such that the token at the r-th position in the s-th sub-chunk is moved to the (r × y + s)-th position in CKh.

As a result, even if two messages contain an identical term, the shuffling can make the corresponding features differ.
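A minimal Python sketch of the shuffling rule is given below. It assumes that r and s are counted from zero and that a trailing chunk shorter than z tokens is left unshuffled; neither detail is stated on the slide, and the function names are hypothetical.

```python
def shuffle_chunk(chunk, y):
    """Move the token at position r of sub-chunk s to position r * y + s.

    Positions r and s are zero-based here (an assumption).
    """
    z = len(chunk)
    if z % y != 0:
        raise ValueError("y must be a factor of the chunk length z")
    sub_len = z // y
    shuffled = [None] * z
    for s in range(y):
        for r in range(sub_len):
            shuffled[r * y + s] = chunk[s * sub_len + r]
    return shuffled


def controlled_shuffle(tokens, z, y):
    """Shuffle tokens inside each z-token chunk, keeping the chunk order."""
    result = []
    for start in range(0, len(tokens), z):
        chunk = tokens[start:start + z]
        if len(chunk) == z:
            result.extend(shuffle_chunk(chunk, y))
        else:
            # Trailing partial chunk is kept as-is (an assumption).
            result.extend(chunk)
    return result


print(shuffle_chunk(list("abcdef"), 2))
print(controlled_shuffle(list("abcdefgh"), z=4, y=2))
```

With z = 6 and y = 2, the two sub-chunks `a b c` and `d e f` are interleaved, so tokens that were adjacent in the original text no longer fall into the same sliding window.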


Slide12

Privacy-Preserving Collaboration Protocol (1/3)

Spam/ham dichotomy

Protocol

EAj receives Ma, then computes TFSet(Ma).

EAj sends a query containing a subset of TFSet(Ma) to other agents.

EAk receives the query, then checks its spam/ham knowledge base (KB).

For each matching entry in the spam KB, EAk sends back the complete transformed feature set.

For each matching entry in the ham KB, EAk sends back a small, randomly selected part of the transformed feature set.


Revealing the contents of a spam email does not affect the privacy, whereas revealing information about a ham email constitutes a privacy breach.
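The responding side of the protocol could look roughly like the Python sketch below. It is an illustration under assumptions: an entry is treated as "matching" when it shares at least one feature with the query, the knowledge bases are plain lists of feature sets, and the function name `build_response` and the parameter `ham_reveal` (how many ham features to disclose) are hypothetical.

```python
import random


def build_response(query_features, spam_kb, ham_kb, ham_reveal=2, rng=None):
    """Sketch of how a queried agent might answer a query.

    Matching spam entries are returned in full; matching ham entries are
    returned only as a small random sample of their features.
    """
    rng = rng or random.Random()
    query = set(query_features)
    response = {"spam": [], "ham": []}
    for entry in spam_kb:
        if query & entry:  # "matching" = shares a feature (assumption)
            response["spam"].append(set(entry))
    for entry in ham_kb:
        if query & entry:
            count = min(ham_reveal, len(entry))
            response["ham"].append(set(rng.sample(sorted(entry), count)))
    return response


spam_kb = [{1, 2, 3, 4, 5}, {40, 41, 42}]
ham_kb = [{3, 10, 11, 12, 13}, {50, 51}]
reply = build_response({3, 99}, spam_kb, ham_kb, ham_reveal=2, rng=random.Random(0))
print(reply)
```

The asymmetry reflects the dichotomy stated above: full disclosure is acceptable for spam, while only a fragment of any ham entry leaves the agent.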

Slide13

Privacy-Preserving Collaboration Protocol (2/3)


Fig. 3: ALPACAS Protocol: Query and Response

Slide14

Privacy-Preserving Collaboration Protocol (3/3)

Protocol (cont'd)

EAj now computes the ratio of MaxSpamOvlp(Ma) to MaxHamOvlp(Ma) and decides whether Ma is spam or ham.

If the score is greater than a threshold λ, Ma is classified as spam; otherwise it is classified as ham.
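The decision step can be sketched as follows. The slides do not define MaxSpamOvlp and MaxHamOvlp precisely, so this sketch assumes they are the largest number of features that TFSet(Ma) shares with any returned spam set and any returned ham fragment, respectively. The handling of a zero ham overlap and the default threshold are also assumptions.

```python
def classify(tfset_ma, spam_sets, ham_sets, lam=1.0):
    """Classify a message from the overlaps reported by other agents.

    MaxSpamOvlp / MaxHamOvlp are taken as the maximum overlap sizes
    (an assumed definition); lam plays the role of the threshold λ.
    """
    max_spam_ovlp = max((len(tfset_ma & s) for s in spam_sets), default=0)
    max_ham_ovlp = max((len(tfset_ma & h) for h in ham_sets), default=0)
    if max_ham_ovlp == 0:
        # No ham evidence: any spam overlap decides the case (assumption).
        score = float("inf") if max_spam_ovlp > 0 else 0.0
    else:
        score = max_spam_ovlp / max_ham_ovlp
    return "spam" if score > lam else "ham"


print(classify({1, 2, 3, 4}, spam_sets=[{1, 2, 3}], ham_sets=[{4}]))
print(classify({1, 2}, spam_sets=[], ham_sets=[{1}]))
```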


Slide15

System Structure (1/2)

Design principle

A query should be sent to an email agent only if it has a reasonable chance of containing information about the email that is being verified. Contacting any other email agent not only introduces inefficiencies but also leads to unnecessary exposure of data.

DHT-based Architecture

EAj is responsible for maintaining information about all the emails whose TFSet has a feature element in the range allocated to it.

Slide16

System Structure (2/2)

DHT-based Architecture (cont'd)

There are N email agents.

All feature elements lie within (0, 2^K - 1).

The range (0, 2^K - 1) is divided into N overlapping regions as {(MinF_0, MaxF_0), (MinF_1, MaxF_1), ..., (MinF_{N-1}, 2^K - 1)}.

(MinF_j, MaxF_j) denotes the sub-range allocated to EAj.

For spam, EAj stores the entire TFSet.

For ham, EAj stores a subset of the TFSet.

If MinF_j ≤ Ft ≤ MaxF_j, then EAj is called the rendezvous agent of feature element Ft.
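The rendezvous rule can be sketched as a simple range lookup. The sub-ranges below are made-up values for K = 4 and N = 3, and the function names are hypothetical. The lookup returns every agent whose sub-range contains the feature, so it does not depend on how the regions are laid out.

```python
def rendezvous_agents(feature, ranges):
    """Return indices j of agents with MinF_j <= feature <= MaxF_j."""
    return [j for j, (min_f, max_f) in enumerate(ranges) if min_f <= feature <= max_f]


def agents_for_message(tfset, ranges):
    """Agents to contact for a message: rendezvous agents of all its features."""
    agents = set()
    for feature in tfset:
        agents.update(rendezvous_agents(feature, ranges))
    return sorted(agents)


# Hypothetical sub-ranges for K = 4 (features 0..15) and N = 3 agents.
ranges = [(0, 5), (6, 10), (11, 15)]

print(rendezvous_agents(7, ranges))
print(agents_for_message({1, 12}, ranges))
```

Only the agents returned by the lookup are queried, which follows the design principle of contacting an agent only when it has a reasonable chance of holding information about the message.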


Slide17

EXPERIMENTS & RESULTS

Benchmarked algorithm

Bogofilter, based on Bayesian filtering

Calculates a spamminess score for the email.

DCC, based on simple hash-based collaborative filtering

Counts the number of times the hash value of the email has been reported as spam.

Slide18

Experimental Setup

Dataset

TREC email corpus & SpamAssassin email corpus

The TREC corpus is divided into 67 email sets according to their target addresses (67 agents).

Half of each email set, including ham and spam, is used for training and the remainder for testing.

Each agent has a pre-classified email corpus (SpamAssassin) as its initial knowledge base.


Slide19

Performance Metrics

Spam filtering accuracy

A ham email that is classified as spam by the filtering scheme is termed a false positive.

Privacy of the collaborative anti-spam system

The message-level privacy breach percentage is defined as the ratio of the number of test ham messages suffering privacy compromises to the total number of test ham messages.

Communication overhead of the system

The per-test communication cost metric is defined as the total number of messages circulated in the system during the entire experiment.
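As a small worked example, the first two metrics can be computed as below. The false positive percentage is taken here relative to the number of ham test messages, which is an assumption since the slide does not state the denominator. The labels and counts are made up.

```python
def false_positive_pct(labels, predictions):
    """Percentage of ham messages that were classified as spam.

    The denominator (all ham test messages) is an assumption.
    """
    ham_predictions = [p for l, p in zip(labels, predictions) if l == "ham"]
    if not ham_predictions:
        return 0.0
    false_positives = sum(p == "spam" for p in ham_predictions)
    return 100.0 * false_positives / len(ham_predictions)


def privacy_breach_pct(breached_ham, total_ham):
    """Message-level privacy breach percentage over the test ham messages."""
    if total_ham == 0:
        return 0.0
    return 100.0 * breached_ham / total_ham


labels = ["ham", "ham", "spam", "ham"]
predictions = ["spam", "ham", "spam", "ham"]
print(false_positive_pct(labels, predictions))
print(privacy_breach_pct(5, 200))
```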


Slide20

SPAM Filtering Effectiveness


Fig. 4: False Positive Percentages of ALPACAS, BogoFilter and DCC

Fig. 5: False Negative Percentages of ALPACAS, BogoFilter and DCC

Fig. 6: System Overall Accuracy (DCC is not displayed because its FP is 0)

Slide21

Robustness Against Attacks


Fig. 7: System Robustness Against Good-Word Attacks

Fig. 8: System Robustness Against Character Replacement Attacks

Slide22

Privacy Awareness


Fig. 9: Privacy Breach in ALPACAS (Varying Number of Agents)

Slide23

Communication Overheads


Fig. 10: Communication Overheads of the ALPACAS and the DCC systems

Slide24

Message Transformation Algorithm Analysis


Fig. 11: False Positive of ALPACAS for Various Parameter Setup

Fig. 12: False Negative of ALPACAS for Various Parameter Setup

Fig. 13: Effectiveness of Controlled Shuffling Strategy

Slide25

DISCUSSION

Combining approaches such as statistical filtering with the feature-preserving transformation scheme.

Handling the dynamic nature of email agents in the system using replication and finger-table-based routing.

Approaches for preventing malicious email agents.

Slide26

CONCLUSION

In this paper, the design and evaluation of ALPACAS is presented.

The two novel features:

A feature preserving transformation technique

A privacy-preserving protocol

Our initial experiments show that ALPACAS

Is very effective in filtering spam.

Has high resilience towards various attacks.

Provides strong privacy protection for the participating entities.

