Hidden Markov Models for Software Piracy Detection

jane-oiler
Uploaded On 2015-10-27

Presentation Transcript

Slide1

Hidden Markov Models for Software Piracy Detection

Shabana Kazi
Mark Stamp

HMMs for Piracy Detection

Slide2

Intro

Here, we apply metamorphic analysis to software piracy detection
Very similar to techniques used in malware detection
But, problem is completely different
Has nothing to do with malware
We show that there are other applications of such techniques

Slide3

Software Piracy

Software piracy is major problem
By 2009 estimate, $3 to $4 lost to piracy for every $1 in software sales
Usually, piracy consists of taking software without modification
In some cases, software is modified
Commercial theft of intellectual property
Thief really doesn’t want to get caught…

Slide4

Software Piracy

We assume software is stolen
And modified, making it hard to detect
If completely rewritten from scratch, we won’t detect it by our approach
Want to make life hard for bad guys
Ideally, major modifications required
How much modification is needed before we cannot reliably detect?

Slide5

Goals

Technique applicable to any software
No special effort by developer
Nothing extra inserted into code
We only require access to exe file
Not a watermarking scheme
More like software “birthmark” analysis
Also not plagiarism detection
Here, want a “deeper” analysis

Slide6

Use Case

You work for Alice’s Software Company (ASC)
And you develop fancy software for ASC
Trudy’s Software Company (TSC) develops suspiciously similar product
You suspect TSC of stealing your code
Not identical, but seems similar
What can you do?
We’ve got some ideas that might help…

Slide7

Use Case

Using the technique discussed here
Can easily measure code similarity
Low similarity?
Then no hope of proving code is stolen
High similarity?
Further (costly) analysis is warranted
High similarity does not prove stolen
But a good reason to take a closer look

Slide8

Background

Metamorphic software
Metamorphic techniques (dead code, permutation, substitution)
HMM
Basic ideas and notation
The 3 problems and their solutions (discussed at a high level)
We’ve seen all of this before

Slide9

Overview

Training and scoring
Train HMM on slightly morphed copies of given “base” software
Slight morphing to avoid overfitting
Score morphed copies and other files
Here, morphing serves to simulate modifications by attacker
Want to know how much morphing required before detection fails

Slide10

Metamorphic Generator

Built our own metamorphic generator
Morph based on extracted opcodes
Morphing consists of dead code insertion
Specify a dead code percentage and number of blocks to insert
Do not require that morphed code works
Makes detection more difficult, not easier
A worst-case scenario, detection-wise
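
The dead code insertion described on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' generator: the function name insert_dead_code, the even split of dead code across blocks, the random choice of insertion points, and the reading of "percentage" as a fraction of the base opcode count are all assumptions. The opcode lists are made-up stand-ins for an extracted base sequence and for dead code taken from a "normal" file.

```python
import random


def insert_dead_code(base_opcodes, dead_pool, percent, num_blocks, rng=None):
    """Return a morphed copy of base_opcodes with dead code inserted.

    percent    -- dead code to add, as a percentage of the base length (assumed)
    num_blocks -- number of contiguous dead-code blocks to insert
    dead_pool  -- opcode sequence to draw dead code from (hypothetical source)
    """
    rng = rng or random.Random()
    total = round(len(base_opcodes) * percent / 100)
    if total <= 0 or num_blocks <= 0 or not dead_pool:
        return list(base_opcodes)
    num_blocks = min(num_blocks, total)
    # Split the dead-code budget as evenly as possible across the blocks.
    sizes = [total // num_blocks + (1 if i < total % num_blocks else 0)
             for i in range(num_blocks)]
    # Pick one insertion point per block, anywhere in the base sequence.
    positions = sorted(rng.randrange(len(base_opcodes) + 1)
                       for _ in range(num_blocks))
    morphed, prev = [], 0
    for pos, size in zip(positions, sizes):
        morphed.extend(base_opcodes[prev:pos])
        # Copy a contiguous run from the pool, wrapping around if needed.
        start = rng.randrange(len(dead_pool))
        morphed.extend(dead_pool[(start + k) % len(dead_pool)]
                       for k in range(size))
        prev = pos
    morphed.extend(base_opcodes[prev:])
    return morphed


# Illustrative inputs only; real sequences would come from a disassembler.
base = ["mov", "push", "call", "pop", "ret"] * 20
dead_pool = ["nop", "xchg", "inc", "dec"]
morphed = insert_dead_code(base, dead_pool, percent=10, num_blocks=2,
                           rng=random.Random(0))
```

Because the sketch works on opcode sequences only, the morphed output is not guaranteed to be runnable code, which matches the slide's point that working code is not required.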

Slide11

Training

Given a base executable file…
Extract its opcode sequence
Generate 100 slightly morphed copies
Each morphed 10%, using dead code extracted from random “normal” file
Train HMM on morphed copies
Using 5-fold cross validation
Note: We train one model for each “fold”
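
One way to picture the 5-fold split of the 100 morphed copies is sketched below. The helper k_fold_splits and the file names are hypothetical, and the round-robin assignment to folds is an assumption; the slides do not say how the folds were formed. HMM training itself is omitted.

```python
def k_fold_splits(items, k=5):
    """Yield (train, held_out) pairs so each item is held out exactly once."""
    folds = [items[i::k] for i in range(k)]  # round-robin assignment (assumed)
    for i in range(k):
        held_out = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, held_out


# Placeholder names standing in for the 100 slightly morphed copies.
copies = [f"morphed_{n:03d}" for n in range(100)]
splits = list(k_fold_splits(copies, k=5))

for train, held_out in splits:
    # One HMM would be trained on `train` here; `held_out` is kept for scoring.
    print(len(train), len(held_out))
```

With 100 copies and 5 folds, each model sees 80 copies in training and leaves 20 aside, which is consistent with the 20 morphed files scored per fold when the threshold is set.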

Slide12

Training

[Figure: illustration of training process, showing slightly morphed copies of base program]

Slide13

Determine Threshold

For each of the 5 folds
Train HMM
Score 20 morphed files (match set) and 15 normal (nomatch set)
Determine threshold based on scores
Threshold is highest score of normal file
Implies FPR = 0; equivalently, TNR = 1 (for the given “fold”)
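
The threshold rule for a single fold can be sketched as below. The function names are hypothetical, the scores are made-up illustrative values rather than results from the paper, and the sketch assumes that a higher score means "more like the base file" and that a file counts as a match only when its score is strictly above the threshold.

```python
def fold_threshold(nomatch_scores):
    """Threshold for one fold: the highest score of any normal (nomatch) file."""
    return max(nomatch_scores)


def is_match(score, threshold):
    """Classify as a match only if strictly above the threshold (assumed)."""
    return score > threshold


def rates(match_scores, nomatch_scores, threshold):
    """Return (TPR, FPR) for one fold at the given threshold."""
    tpr = sum(is_match(s, threshold) for s in match_scores) / len(match_scores)
    fpr = sum(is_match(s, threshold) for s in nomatch_scores) / len(nomatch_scores)
    return tpr, fpr


# Made-up scores for illustration only (e.g. log-likelihood per opcode).
match_scores = [-2.1, -2.3, -2.0, -2.6, -2.2]
nomatch_scores = [-4.8, -5.1, -3.9, -6.0]

threshold = fold_threshold(nomatch_scores)
tpr, fpr = rates(match_scores, nomatch_scores, threshold)
```

Since no normal file can score strictly above the maximum normal score, FPR is 0 by construction for that fold, and TNR = 1 - FPR = 1.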

Slide14

Setting a Threshold

[Figure: process used to set threshold]

Slide15

Experiments

Want to determine robustness
For each base file tested…
Train to obtain HMM and threshold
Morph base file at various percentages
Using various morphing strategies
Refer to this morphing as tampering
Score each tampered copy
Classify, based on threshold

Slide16

Experiments

[Figure: scoring tampered files]

Slide17

Experiment Details

For each base file
6 models
10 tamper percentages for each
100 files each
So, 6000 scores!

Slide18

Experiment Details

Tested 10 base files, each data point
So 60,000 scores computed…

Slide19

Experiment Details

Repeated entire experiment 6 times
Using different number of blocks in training phase
Training made little difference on scores
So, here we only give results where 1 block used in training phase
In total 360,000 scores computed
And 360 “models” generated
That is, 1800 HMMs (one per fold)
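
The totals quoted on the Experiment Details slides can be checked with a few lines of arithmetic. The variable names are illustrative, and the breakdown of the 360 models as 6 models per base file, times 10 base files, times 6 repetitions is an inference from the slide figures rather than something the slides state outright.

```python
models_per_base = 6     # models per base file
tamper_levels = 10      # tamper percentages per model
files_per_level = 100   # tampered files scored at each percentage
base_files = 10         # base files tested
repetitions = 6         # experiment repeats (different training block counts)
folds = 5               # one HMM per fold

scores_per_base = models_per_base * tamper_levels * files_per_level
scores_per_run = scores_per_base * base_files
total_scores = scores_per_run * repetitions

total_models = models_per_base * base_files * repetitions  # inferred breakdown
total_hmms = total_models * folds

print(scores_per_base, scores_per_run, total_scores, total_models, total_hmms)
```

The products reproduce the figures on the slides: 6000 scores per base file, 60,000 per run, 360,000 overall, 360 models, and 1800 HMMs.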

Slide20

Results: Bar Graph

Slide21

Results: 3-d Plot

Slide22

Conclusions

Results look very promising
Robust: high degree of morphing required before base file undetected
Practical: only requires exe, no special effort when developing
Applies to any exe, at any time
Overall, strong software “birthmark” strategy with practical implications

Slide23

Future Work

Statistical analysis somewhat weak
Results may be stronger than they appear
Many other scores/combinations of scores can be tested
Results can only get better
Consider other morphing techniques
And other file types (e.g., bytecode)
And mitigations for 1-block morphing …

Slide24

References

S. Kazi and M. Stamp, Hidden Markov models for software piracy detection, Information Security Journal: A Global Perspective, 22:140-149, 2013