Hidden Markov Models for Software Piracy Detection
Shabana Kazi, Mark Stamp
HMMs for Piracy Detection
Intro
- Here, we apply metamorphic analysis to software piracy detection
- Very similar to techniques used in malware detection
- But, problem is completely different
  - Has nothing to do with malware
- We show that there are other applications of such techniques
Software Piracy
- Software piracy is major problem
- By 2009 estimate, $3 to $4 lost to piracy for every $1 in software sales
- Usually, piracy consists of taking software without modification
- In some cases, software is modified
  - Commercial theft of intellectual property
  - Thief really doesn’t want to get caught…
Software Piracy
- We assume software is stolen
  - And modified, making it hard to detect
- If completely rewritten from scratch, we won’t detect it by our approach
- Want to make life hard for bad guys
  - Ideally, major modifications required
- How much modification is needed before we cannot reliably detect?
Goals
- Technique applicable to any software
  - No special effort by developer
  - Nothing extra inserted into code
- We only require access to exe file
- Not a watermarking scheme
  - More like software “birthmark” analysis
- Also not plagiarism detection
  - Here, want a “deeper” analysis
Use Case
- You work for Alice’s Software Company
  - And you develop fancy software for ASC
- Trudy’s Software Company (TSC) develops suspiciously similar product
- You suspect TSC of stealing your code
  - Not identical, but seems similar
- What can you do?
- We’ve got some ideas that might help…
Use Case
- Using the technique discussed here
  - Can easily measure code similarity
- Low similarity?
  - Then no hope of proving code is stolen
- High similarity?
  - Further (costly) analysis is warranted
- High similarity does not prove code is stolen
  - But a good reason to take a closer look
Background
- Metamorphic software
  - Metamorphic techniques (dead code, permutation, substitution)
- HMM
  - Basic ideas and notation
  - The 3 problems and their solutions (discussed at a high level)
- We’ve seen all of this before
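Of the three HMM problems, scoring an observation sequence against a trained model is the one the rest of the talk relies on. A minimal sketch of the scaled forward algorithm follows, assuming discrete observations (for example, opcode indices) and the usual notation: `pi` for the initial distribution, `A` for state transitions, `B` for observation probabilities. The function name and the toy model values are invented for illustration, and dividing by the sequence length so that files of different sizes are comparable is an assumption, not something the slides state.

```python
import math

def log_likelihood_per_symbol(obs, pi, A, B):
    """Scaled forward algorithm: return log P(obs | model) / len(obs)."""
    if not obs:
        raise ValueError("observation sequence must be non-empty")
    n_states = len(pi)
    # alpha[i] = P(first observation, state i), before scaling
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through A, then weight by the emission in B.
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        # Rescale to avoid underflow; the log of the scales sums to log P(obs).
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob / len(obs)

# Toy 2-state, 3-symbol model (illustrative values only).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
print(log_likelihood_per_symbol([0, 1, 0, 2], pi, A, B))
```

Higher (less negative) values mean the sequence is more consistent with the model.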
Overview
- Training and scoring
  - Train HMM on slightly morphed copies of given “base” software
  - Slight morphing to avoid overfitting
  - Score morphed copies and other files
- Here, morphing serves to simulate modifications by attacker
- Want to know how much morphing required before detection fails
Metamorphic Generator
- Built our own metamorphic generator
- Morph based on extracted opcodes
- Morphing consists of dead code insertion
  - Specify a dead code percentage and number of blocks to insert
- Do not require that morphed code works
  - Makes detection more difficult, not easier
  - A worst-case scenario, detection-wise
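Dead code insertion at the opcode level can be sketched as below. This is not the authors' generator: the function name `insert_dead_code`, the near-equal block sizes, the uniformly random insertion points, and the wrap-around sampling from the dead-code pool are all assumptions made for illustration. Only the two parameters (dead code percentage and number of blocks) come from the slide.

```python
import random

def insert_dead_code(opcodes, dead_pool, percent, n_blocks, seed=None):
    """Return a morphed copy of `opcodes` with dead code inserted.

    percent  -- amount of dead code, as a percentage of the original length
    n_blocks -- number of contiguous blocks the dead code is split into
    """
    rng = random.Random(seed)
    total = round(len(opcodes) * percent / 100)
    if total == 0 or n_blocks <= 0 or not dead_pool:
        return list(opcodes)
    n_blocks = min(n_blocks, total)
    # Split `total` opcodes into n_blocks near-equal block sizes.
    base, extra = divmod(total, n_blocks)
    sizes = [base + (1 if k < extra else 0) for k in range(n_blocks)]
    # Pick insertion points and process them right to left, so that an
    # insertion never shifts a position that is still waiting to be used.
    positions = sorted(
        (rng.randint(0, len(opcodes)) for _ in sizes), reverse=True
    )
    morphed = list(opcodes)
    for pos, size in zip(positions, sizes):
        start = rng.randrange(len(dead_pool))
        # Contiguous run from the pool, wrapping around if it is too short.
        block = [dead_pool[(start + k) % len(dead_pool)] for k in range(size)]
        morphed[pos:pos] = block
    return morphed

base = ["mov", "push", "call", "pop", "ret"] * 20      # 100 opcodes
pool = ["nop", "xchg", "lea"] * 10                      # stand-in "normal" file
morphed = insert_dead_code(base, pool, percent=10, n_blocks=3, seed=1)
print(len(base), len(morphed))
```

Because the sketch works on opcode sequences only, the morphed result is not guaranteed to be a working program, which matches the worst-case assumption on the slide.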
Training
- Given a base executable file…
  - Extract its opcode sequence
- Generate 100 slightly morphed copies
  - Each morphed 10%, using dead code extracted from random “normal” file
- Train HMM on morphed copies
  - Using 5-fold cross validation
  - Note: We train one model for each “fold”
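The 5-fold split over the 100 morphed copies can be sketched as follows. The helper name `five_fold_splits`, the round-robin assignment of files to folds, and the file names are hypothetical; the slides only state that one model is trained per fold. With 100 copies, each fold trains on 80 files and holds out 20.

```python
def five_fold_splits(files, k=5):
    """Yield (train, test) lists; each file lands in exactly one test set."""
    for fold in range(k):
        # Every k-th file, starting at `fold`, is held out for this fold.
        test = files[fold::k]
        train = [f for i, f in enumerate(files) if i % k != fold]
        yield train, test

# Hypothetical names for the 100 morphed copies of one base file.
morphed_files = [f"morph_{i:03d}.asm" for i in range(100)]
for fold, (train, test) in enumerate(five_fold_splits(morphed_files)):
    print(fold, len(train), len(test))
```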
Training
[Figure: illustration of training process, with slightly morphed copies of base program]
Determine Threshold
- For each of the 5 folds
  - Train HMM
  - Score 20 morphed files (match set) and 15 normal (nomatch set)
  - Determine threshold based on scores
- Threshold is highest score of normal file
  - Implies FPR = 0; equivalently, TNR = 1 (for the given “fold”)
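The threshold rule for one fold can be sketched as below. It assumes that a higher score means a closer match to the model and that a file is flagged only when its score is strictly above the threshold; the function names and the example scores are made up for illustration.

```python
def threshold_for_fold(nomatch_scores):
    """Threshold is the highest score of any normal (nomatch) file."""
    return max(nomatch_scores)

def rates(match_scores, nomatch_scores, threshold):
    """Flag a file when score > threshold; return (TPR, FPR)."""
    true_pos = sum(s > threshold for s in match_scores)
    false_pos = sum(s > threshold for s in nomatch_scores)
    return true_pos / len(match_scores), false_pos / len(nomatch_scores)

# Invented scores for a single fold.
match = [-2.1, -2.3, -2.0, -2.6]
nomatch = [-4.8, -2.5, -6.0]
t = threshold_for_fold(nomatch)
tpr, fpr = rates(match, nomatch, t)
print(t, tpr, fpr)
```

Since no normal file can score strictly above the maximum normal score, the false positive rate is 0 by construction on that fold, at the possible cost of missing some match files.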
Setting a Threshold
[Figure: process used to set threshold]
Experiments
- Want to determine robustness
- For each base file tested…
  - Train to obtain HMM and threshold
  - Morph base file at various percentages
    - Using various morphing strategies
    - Refer to this morphing as tampering
  - Score each tampered copy
  - Classify, based on threshold
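The experiment loop for one base file and one model can be sketched as follows. Here `tamper` and `score` are hypothetical callables standing in for the morphing step and the HMM scoring step, and `robustness_curve` is an invented name; the sketch only shows how tampered copies are scored and classified against the threshold at each tampering percentage.

```python
def robustness_curve(base, tamper, score, threshold, percents, copies=100):
    """Map each tamper percentage to the fraction of copies still detected.

    tamper(base, pct) -> tampered opcode sequence   (hypothetical callable)
    score(sequence)   -> model score, higher = closer match (hypothetical)
    """
    curve = {}
    for pct in percents:
        detected = 0
        for _ in range(copies):
            if score(tamper(base, pct)) > threshold:
                detected += 1
        curve[pct] = detected / copies
    return curve

# Deterministic stand-ins, just to show the call shape.
fake_tamper = lambda base, pct: pct      # "tampered file" is only its percentage
fake_score = lambda tampered: -tampered  # more tampering gives a lower score
print(robustness_curve(None, fake_tamper, fake_score, threshold=-35,
                       percents=[10, 20, 30, 40, 50]))
```

The percentage at which the detected fraction starts to fall is the amount of tampering the attacker needs before detection fails.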
Experiments
[Figure: scoring tampered files]
Experiment Details
- For each base file
  - 6 models
  - 10 tamper percentages for each
  - 100 files each
- So, 6000 scores!
Experiment Details
- Tested 10 base files, each data point
- So 60,000 scores computed…
Experiment Details
- Repeated entire experiment 6 times
  - Using different number of blocks in training phase
- Training made little difference on scores
  - So, here we only give results where 1 block used in training phase
- In total 360,000 scores computed
  - And 360 “models” generated
  - That is, 1800 HMMs (one per fold)
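The counts on the three "Experiment Details" slides follow from multiplying the quantities given there. A short check, with variable names chosen here for illustration:

```python
models_per_base = 6      # models per base file
tamper_levels = 10       # tamper percentages per model
files_per_level = 100    # tampered files per percentage
base_files = 10          # base files tested
repetitions = 6          # experiment repeated with different block counts in training
folds = 5                # one HMM per cross-validation fold

scores_per_base = models_per_base * tamper_levels * files_per_level
scores_per_run = scores_per_base * base_files
total_scores = scores_per_run * repetitions
total_models = models_per_base * base_files * repetitions
total_hmms = total_models * folds

print(scores_per_base, scores_per_run, total_scores, total_models, total_hmms)
```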
Results: Bar Graph
Results: 3-d Plot
Conclusions
- Results look very promising
- Robust: high degree of morphing required before base file undetected
- Practical: only requires exe, no special effort when developing
  - Applies to any exe, at any time
- Overall, strong software “birthmark” strategy with practical implications
Future Work
- Statistical analysis somewhat weak
  - Results may be stronger than they appear
- Many other scores/combinations of scores can be tested
  - Results can only get better
- Consider other morphing techniques
  - And other file types (e.g., bytecode)
  - And mitigations for 1-block morphing…
References
S. Kazi and M. Stamp, Hidden Markov models for software piracy detection, Information Security Journal: A Global Perspective, 22:140-149, 2013