Slide1
PHMM Applications
Mark Stamp
Slide2
Applications
We consider 2 applications of PHMMs to problems in information security
Masquerade detection
Malware detection
Both show some strengths of PHMMs
Both are somewhat unique
PHMMs not always a first choice…
Slide3
PHMM for Masquerade Detection
Lin Huang
Mark Stamp
Slide4
Masquerader?
Masquerader makes unauthorized use of another user’s account
Masquerader tries to evade detection by pretending to be the other user
Can we detect masquerader?
Intrusion Detection System (IDS)
We consider special case where such an IDS is based on UNIX commands
Slide5
Schonlau Dataset
Collection of UNIX commands, 50 users
5k training commands per user, plus…
…10k “attack” commands per user
Also, a key to tell which blocks are attack and which belong to same user
Nominally, 100 blocks, 100 commands each
No real session start/end info provided
This could be an issue…
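The block layout described on this slide can be sketched as follows. This is only an illustration: it assumes each user's data is one list of 15,000 commands with the 5k training commands first, and the function name `split_user_commands` is hypothetical.

```python
def split_user_commands(commands, n_train=5000, block_size=100):
    """Split one user's commands into training data and fixed-size test blocks."""
    training = commands[:n_train]
    rest = commands[n_train:]
    # Cut the remaining commands into consecutive blocks of block_size
    blocks = [rest[i:i + block_size] for i in range(0, len(rest), block_size)]
    return training, blocks


# Toy usage with placeholder commands (not real Schonlau data)
commands = [f"cmd{i % 7}" for i in range(15000)]
training, blocks = split_user_commands(commands)
```

Each test block is then scored as a unit, which is why the missing session start/end information matters for sequence-based models.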
Slide6
Previous Work
Lots of papers use “Schonlau dataset”
Types of methods that have been used
Information theoretic
Text mining
Hidden Markov Model
Naïve Bayes
Sequences and bioinformatics
SVM, and Other
Slide7
Information Theoretic
Schonlau originally used compression-based scheme
The theory is that commands by same user should compress more
By subsequent standard, poor results
Some other similar work, but…
…no strong results based on compression
Compression for malware detection?
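A minimal sketch of the compression idea, assuming `zlib` as a stand-in compressor (this is not Schonlau's actual scheme, and the function names are hypothetical). The score is the number of extra compressed bytes needed when a test block is appended to the training data, so a lower score means the block looks more like the training user.

```python
import zlib


def compressed_size(commands):
    """Size in bytes of the zlib-compressed, space-joined command list."""
    return len(zlib.compress(" ".join(commands).encode("utf-8")))


def compression_score(training, block):
    """Extra compressed bytes needed when block is appended to training."""
    return compressed_size(training + block) - compressed_size(training)


# Toy data: the "same user" block repeats the training pattern,
# the "other user" block consists of commands never seen in training
training = ["ls", "cd", "cat", "vi"] * 250
same_user = ["ls", "cd", "cat", "vi"] * 25
other_user = [f"tool{i}" for i in range(100)]

same_score = compression_score(training, same_user)
other_score = compression_score(training, other_user)
```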
Slide8
Text Mining
Look for repetitive sequences
Can be used to detect particular user
Almost like a signature
PCA has also been used here
Repetitive sequences, i.e., patterns
PCA can find such structure
Training cost considered high
Other ways to do “text mining”?
Slide9
Hidden Markov Model
Need we say more?
HMM is one of the most popular detection strategies in this field
Results are good
Serves as benchmark in many (most) studies of other techniques
We implement HMM detector and compare to PHMM
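For readers who want to see how an HMM detector scores a block, here is a sketch of the standard scaled forward algorithm. The model values in the usage lines are toy numbers chosen for illustration, not parameters from the study, and it is assumed that every observation has nonzero probability under the model.

```python
import math


def hmm_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: returns log P(obs | model).

    pi[i]   : initial probability of state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability that state i emits symbol k
    obs     : list of symbol indices
    """
    n = len(pi)
    # Initialization: alpha_0(i) = pi_i * b_i(O_0)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction step: propagate through A, then weight by the emission
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                for j in range(n)
            ]
        # Rescale to avoid underflow and accumulate the log of the scale factors
        scale = sum(alpha)
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob


# Toy two-state model over three symbols (illustrative values only)
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]]
obs = [0, 1, 0, 2]
score = hmm_log_likelihood(obs, pi, A, B) / len(obs)  # log-likelihood per command
```

Dividing by the block length gives a per-command score, which makes blocks of different lengths comparable.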
Slide10
Naïve Bayes
Naïve Bayes (NB) relies on frequencies
No sequential info used
Very simple
Efficient training & scoring
Discuss naïve Bayes in later chapter
Close connection between HMM and NB
So, not too surprising that this works
But, surprising that it works so well
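A frequency-only score of this kind might look like the sketch below. The smoothing constant `alpha`, the fixed `vocabulary`, and the function names are assumptions made for the example, not details taken from the studies mentioned. Because the score is a sum over commands, reordering a block does not change it.

```python
import math
from collections import Counter


def train_frequencies(training, vocabulary, alpha=1.0):
    """Estimate smoothed command probabilities from training commands."""
    counts = Counter(training)
    total = len(training) + alpha * len(vocabulary)
    return {cmd: (counts[cmd] + alpha) / total for cmd in vocabulary}


def nb_score(block, probs):
    """Average log-probability per command; command order is ignored.

    Every command in block must be in the vocabulary used for training.
    """
    return sum(math.log(probs[cmd]) for cmd in block) / len(block)


# Toy usage
vocabulary = ["ls", "cd", "cat", "gcc", "make"]
training = ["ls"] * 50 + ["cd"] * 30 + ["cat"] * 20
probs = train_frequencies(training, vocabulary)
typical = ["ls", "cd", "ls", "cat"]
unusual = ["gcc", "make", "gcc", "make"]
typical_score = nb_score(typical, probs)
unusual_score = nb_score(unusual, probs)
```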
Slide11
Sequences and Bioinformatics
n-gram approaches very popular
Like HMM, also used as benchmark
Sequence alignment has been used
Based on Smith-Waterman algorithm
Like constructing MSA in PHMM
Closest previous work to PHMM
We’ll compare our PHMM results to both n-gram and HMM
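One simple way to turn n-grams into a score is to measure how many of a block's n-grams were also seen in training. The slides do not say which n-gram scoring rule was used, so the rule below and its function names are assumptions made for illustration.

```python
def ngrams(commands, n):
    """Return the list of n-grams (as tuples) in a command sequence."""
    return [tuple(commands[i:i + n]) for i in range(len(commands) - n + 1)]


def ngram_score(training, block, n=2):
    """Fraction of the block's n-grams that also occur in the training data."""
    known = set(ngrams(training, n))
    grams = ngrams(block, n)
    if not grams:
        return 0.0
    return sum(g in known for g in grams) / len(grams)


# Toy usage
training = ["ls", "cd", "ls", "cat", "ls", "cd"]
familiar = ["ls", "cd", "ls"]      # both bigrams appear in training
unfamiliar = ["cat", "cd", "cat"]  # neither bigram appears in training
familiar_score = ngram_score(training, familiar)
unfamiliar_score = ngram_score(training, unfamiliar)
```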
Slide12
Support Vector Machines
Several previous studies use SVM
SVM has nice geometric interpretation
SVMs very popular in machine learning
For masquerade detection, SVM results are about same as NB
Claimed that SVM is more efficient, as compared to naïve Bayes
But, naïve Bayes is very efficient…
Slide13
Other
Frequent and/or infrequent commands
Neither seems to perform well
“Hybrid Bayes one step Markov” and “hybrid multistep Markov”
Nice names, but not so good results
“Non-negative matrix factorization”
Good results
Ensemble (combination) approaches
Seem to offer slight improvement
Slide14
Experimental Results
Again, we compare HMM and n-grams to several PHMM models
All are tested on Schonlau dataset
Then we generate a simulated dataset
All tested again on simulated data
Why simulated data?
Schonlau data has limitations wrt PHMM
This will be explained later…
Slide15
HMM & n-Gram ROC Curves
First, compare HMM and n-grams
Slide16
HMM and n-Gram AUC
For ROC curves on previous slide…
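As a reminder of what the AUC summarizes, it can be computed directly from the scores as the probability that a randomly chosen legitimate block scores higher than a randomly chosen attack block. The sketch assumes higher scores mean "more like the legitimate user"; the function name and toy scores are illustrative only.

```python
def auc(benign_scores, attack_scores):
    """AUC as P(benign score > attack score), counting ties as 1/2."""
    wins = 0.0
    for b in benign_scores:
        for a in attack_scores:
            if b > a:
                wins += 1.0
            elif b == a:
                wins += 0.5
    return wins / (len(benign_scores) * len(attack_scores))


# Toy usage: one attack block (0.75) outscores two of the legitimate blocks
user_scores = [0.9, 0.8, 0.7, 0.6]
attack_scores = [0.75, 0.4, 0.3, 0.2]
result = auc(user_scores, attack_scores)
```

An AUC of 1.0 means perfect separation, while 0.5 is no better than guessing.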
Slide17
Training PHMM
How many sequences to use?
More sequences, better for E matrix…
…but worse for gaps
Length of each sequence?
For Schonlau dataset, we have 5k training commands per user
Where to begin/end sequences?
No good answers for Schonlau dataset
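To make the trade-off concrete, the sketch below estimates emission probabilities for one column of a multiple sequence alignment (MSA) and picks match columns by gap fraction. It assumes the common conventions of add-one smoothing and "fewer than half gaps means a match column"; the slides do not state which rules were used, and all names here are hypothetical. More aligned sequences give more counts per column, but typically also more gap symbols.

```python
from collections import Counter

GAP = "-"


def match_columns(msa, max_gap_fraction=0.5):
    """Indices of columns with fewer than max_gap_fraction gaps (match states)."""
    return [
        c for c in range(len(msa[0]))
        if sum(row[c] == GAP for row in msa) / len(msa) < max_gap_fraction
    ]


def emission_probs(msa, column, alphabet):
    """Add-one smoothed emission probabilities for one match column."""
    counts = Counter(row[column] for row in msa if row[column] != GAP)
    total = sum(counts.values()) + len(alphabet)
    return {s: (counts[s] + 1) / total for s in alphabet}


# Toy alignment of four short command sequences
msa = [
    ["ls", "cd", "-", "cat"],
    ["ls", "-", "-", "cat"],
    ["ls", "cd", "vi", "cat"],
    ["cd", "cd", "-", "-"],
]
alphabet = ["ls", "cd", "cat", "vi"]
columns = match_columns(msa)
first_column = emission_probs(msa, 0, alphabet)
```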
Slide18
PHMM Sequences
Note that all 5k commands used in each case
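One way to use all 5k commands regardless of the number of sequences is to cut the training data into contiguous, equally long pieces. Whether the original experiments split the data exactly this way is an assumption; the helper name is hypothetical, and the sequence counts 5, 10, and 20 are the ones named on the PHMM AUC slide.

```python
def split_into_sequences(commands, num_sequences):
    """Cut commands into num_sequences contiguous pieces of (nearly) equal length."""
    base, extra = divmod(len(commands), num_sequences)
    sequences, start = [], 0
    for k in range(num_sequences):
        # The first `extra` pieces get one additional command
        length = base + (1 if k < extra else 0)
        sequences.append(commands[start:start + length])
        start += length
    return sequences


# Toy usage with placeholder commands
commands = [f"cmd{i % 11}" for i in range(5000)]
cases = {k: split_into_sequences(commands, k) for k in (5, 10, 20)}
```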
Slide19
PHMM ROC Curves
ROC curves for each PHMM case
Any trend?
Slide20
PHMM AUC
AUC for each PHMM case
5, 10, and 20 sequences are best cases
Slide21
HMM, n-Gram, and PHMM
Again, for Schonlau dataset
Which method is better?
Slide22
HMM vs PHMM
HMM and PHMM give similar results on Schonlau dataset
Surprising that PHMM does so well
Why? No begin/end sequence info!
What if we had “better” sequences?
PHMM could certainly do better and maybe much, much better
But how to get a better dataset?
Slide23
Simulated Dataset
Generate Markov model for each user
Based on monograph & digraph stats
Like matrices π and A of an HMM
Now we can generate sequences
Use matrix π to select initial element
Then use matrix A to generate sequence
HMM must do well on this data (why?)
PHMM might do well… or not…
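The generation procedure on this slide can be sketched as follows. It is assumed here that π is taken from monograph (single-command) frequencies and A from normalized digraph (command-pair) counts; the function names, the fallback rule for commands with no observed successor, and the toy data are illustrative choices, not details from the study.

```python
import random
from collections import Counter, defaultdict


def train_markov(commands):
    """Estimate pi from monograph counts and A from digraph counts."""
    mono = Counter(commands)
    pi = {c: n / len(commands) for c, n in mono.items()}
    digraphs = defaultdict(Counter)
    for prev, nxt in zip(commands, commands[1:]):
        digraphs[prev][nxt] += 1
    A = {
        prev: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for prev, row in digraphs.items()
    }
    return pi, A


def generate(pi, A, length, rng):
    """Pick the first command from pi, then follow the rows of A."""
    def draw(dist):
        symbols = list(dist)
        return rng.choices(symbols, weights=[dist[s] for s in symbols])[0]

    seq = [draw(pi)]
    while len(seq) < length:
        row = A.get(seq[-1])
        # A command seen only at the very end of training has no row in A;
        # fall back to pi in that case (an assumption of this sketch)
        seq.append(draw(row if row else pi))
    return seq


# Toy usage with a fixed seed for repeatability
commands = ["ls", "cd", "ls", "cat", "ls", "cd", "vi", "ls"]
pi, A = train_markov(commands)
simulated = generate(pi, A, 20, random.Random(0))
```

Data generated this way is exactly the kind of process a Markov-based model assumes, which is relevant to the "(why?)" on the slide.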
Slide24
ROC Curves Simulated Data
HMM vs PHMM
Based on 5k training commands
Slide25
AUC for Simulated Data
Again, based on 5k training commands
Slide26
Real World Problem
Masquerade detection in real world
At first, we have little training data
Can’t protect user until we train a model
So, we want to train as soon as possible
Minimum training data needed to obtain a useful model?
We compare HMM and PHMM with 200, 400, and 800 training commands
Slide27
Limited Training Data
Simulated data
HMM vs PHMM
Big difference when very little training data available
Slide28
Limited Training Data
PHMM most impressive with very little data (especially wrt AUC_0.1)
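Assuming "AUC 0.1" on this slide denotes the partial AUC restricted to false positive rates of at most 0.1, it could be computed as in the sketch below. Dividing by `max_fpr` so that a perfect detector scores 1.0 is one common normalization and is an assumption here, as are the function names. Scores are taken to be higher for legitimate blocks, so a block is flagged as an attack when its score is at or below the threshold.

```python
def roc_points(benign_scores, attack_scores):
    """ROC points (FPR, TPR), flagging scores at or below a threshold as attack."""
    thresholds = sorted(set(benign_scores) | set(attack_scores))
    points = [(0.0, 0.0)]
    for t in thresholds:
        fpr = sum(b <= t for b in benign_scores) / len(benign_scores)
        tpr = sum(a <= t for a in attack_scores) / len(attack_scores)
        points.append((fpr, tpr))
    return points


def partial_auc(benign_scores, attack_scores, max_fpr=0.1):
    """Area under the ROC curve for FPR in [0, max_fpr], divided by max_fpr."""
    points = roc_points(benign_scores, attack_scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the segment at max_fpr by linear interpolation
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid rule
    return area / max_fpr


# Toy usage: perfectly separated scores
user_scores = [0.9, 0.8, 0.7, 0.6]
attack_scores = [0.1, 0.2, 0.3, 0.4]
result = partial_auc(user_scores, attack_scores, max_fpr=0.1)
```

Restricting attention to low false positive rates reflects how a detector would be run in practice, where frequent false alarms are unacceptable.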
Slide29
Limited Training Data
Same results as previous slide
Slide30
Optimal Masquerade Detection Strategy?
Obtain 200 commands, train PHMM
Use this PHMM model until a reliable set of 800+ commands is available
Then train HMM on 800+ commands
Use HMM from then on
Gives us a reliable model with limited data, and best model with more data
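The strategy on this slide amounts to a simple switching rule, sketched here with the thresholds 200 and 800 taken from the slide. The function name and the string labels are hypothetical.

```python
def choose_model(num_commands, phmm_min=200, hmm_min=800):
    """Return which model to use given how many reliable commands are available."""
    if num_commands >= hmm_min:
        return "HMM"
    if num_commands >= phmm_min:
        return "PHMM"
    return None  # not enough data to train either model yet


# Model choice as training data accumulates
schedule = [(n, choose_model(n)) for n in (100, 200, 400, 800, 5000)]
```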
Slide31
Another PHMM Advantage?
PHMM might be better when attacker hijacks ongoing session
Masquerader mimics average behavior
This is what is modeled by HMM
Harder to mimic sequential behavior
As modeled by PHMM
Depends on position in the sequence
This should be investigated further…
Slide32
PHMM for Malware Detection
Swapna Vemparala
Mark Stamp
Slide33
Malware Detection
In previous work, PHMM tested for metamorphic detection
Based on extracted opcodes
Results were generally not impressive
MSA has many gaps and PHMM is weak
Code transposition causes problems
And code transposition common in malware
Opcode sequence not strong wrt PHMM
Slide34
Malware Detection 2.0
Here, again apply PHMM to malware
But what to use as features?
Want feature(s) where…
Sequence/order is critical
And, difficult for malware writer to modify sequential information
What feature(s) to use?
(Static) opcodes not good in PHMM
Slide35
Software Birthmarks
Birthmark is inherent feature of code
In contrast to a watermark
We consider both static and dynamic birthmarks
Static: collected without executing
Dynamic: execution/emulation
Examples of each?
Advantages/disadvantages of each?
Slide36
This Research
Consider opcodes
Static feature, extracted by disassembly
Also consider API calls
Dynamic, use Buster Sandbox Analyzer
Compare HMM and PHMM for both
Then 3 cases for each malware family…
Static and dynamic HMM
Dynamic PHMM
Slide37
Data
Malware data from Malicia Project
Benign set of 20 Windows applications
Slide38
HMM & Opcode Sequences
Scatterplots and ROC curves for Security Shield
Slide39
HMM Results
Results for all families, static and dynamic birthmarks
Slide40
PHMM
Dynamic birthmarks, i.e., API calls
Slide41
Results
Static and dynamic HMM
And dynamic PHMM
Slide42
Bottom Line
In these cases, dynamic data gives better results
API calls better than (static) opcodes
HMM does very well on API calls…
…but PHMM can do even better
Sequential info matters in API calls!
Is PHMM really worth it?
Slide43
References
Masquerade detection
L. Huang and M. Stamp, Masquerade detection using profile hidden Markov models, Computers & Security, 30(8):732-747, November 2011
Malware detection
S. Vemparala, et al., Malware detection using dynamic birthmarks, 2nd International Workshop on Security & Privacy Analytics (IWSPA 2016), co-located with ACM CODASPY 2016, March 9-11, 2016