CS 502 Directed Studies: Adversarial Machine Learning
Presentation Transcript

1. CS 502 Directed Studies: Adversarial Machine Learning
Dr. Alex Vakanski

2. Lecture 1: Introduction to Adversarial Machine Learning

3. Lecture Outline
- Machine Learning (ML)
- Adversarial ML (AML)
- Adversarial examples
- Attack taxonomy
- Common adversarial attacks
  - Noise, semantic attack, FGSM, BIM, PGD, DeepFool, CW attack
- Defense against adversarial attacks
  - Adversarial training, random resizing and padding, detecting adversarial examples
- Conclusion
- References
- Other AML resources

4. Machine Learning (ML)
- ML tasks: supervised, unsupervised, semi-supervised, self-supervised, meta learning, reinforcement learning
- Data collection and preprocessing: sensors, cameras, I/O devices, etc.
- Apply an ML algorithm
  - Training phase: learn the ML model (parameter learning, hyperparameter tuning)
  - Testing phase (inference): predict on unseen data
Slide credit: Binghui Wang - Adversarial Machine Learning: An Introduction
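To make the training/testing split concrete, here is a minimal sketch of the two phases using scikit-learn; the dataset, classifier choice, and split ratio are illustrative assumptions, not part of the original slide.

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Data collection/preprocessing: load a small image dataset and split it
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Training phase: learn the model parameters on the training set
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

# Testing phase (inference): predict on unseen data
print("Test accuracy:", model.score(X_test, y_test))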

5. ML is Ubiquitous
- Healthcare
Picture from: He Xiaoyi - Adversarial Machine Learning

6. Adversarial ML
- The classification accuracy of GoogLeNet on MNIST under adversarial attacks drops from 98% to 18% (ProjGrad attack) or 1% (DeepFool attack)
Picture from: https://blog.floydhub.com/introduction-to-adversarial-machine-learning/

7. Adversarial Examples
- What do you see?
Slide credit: Binghui Wang - Adversarial Machine Learning: An Introduction

8. Adversarial Examples
- The classifier misclassifies adversarially manipulated images
Slide credit: Binghui Wang - Adversarial Machine Learning: An Introduction

9. Adversarial Examples
- The differences between the original and manipulated images are very small (hardly noticeable to the human eye)
Figure panels: original image, attack image, difference
Slide credit: Binghui Wang - Adversarial Machine Learning: An Introduction

10. Adversarial Examples
- An adversarially perturbed image of a panda is misclassified as a gibbon
- To the human eye, the perturbed image looks indistinguishable from the original image
Figure: original image classified as panda (57.7% confidence); after adding small adversarial noise, the adversarial image is classified as gibbon (99.3% confidence)
Picture from: https://blog.floydhub.com/introduction-to-adversarial-machine-learning/

11. Adversarial Examples
- Similar example
Picture from: Szegedy (2014) - Intriguing Properties of Neural Networks

12. Adversarial Examples
- If a stop sign is adversarially manipulated and is not recognized by a self-driving car, it can result in an accident
Figure: stop sign + small adversarial noise -> ?
Slide credit: He Xiaoyi - Adversarial Machine Learning

13. Adversarial Examples
- Recent work manipulated a stop sign with adversarial patches
- This caused the DL model of a self-driving car to classify it as a Speed Limit 45 sign (100% attack success in lab tests, 85% in field tests)
Picture from: Eykholt (2017) - Robust Physical-World Attacks on Deep Learning Visual Classification

14. Adversarial Examples
- Lab test images for signs with target class Speed Limit 45
Picture from: Eykholt (2017) - Robust Physical-World Attacks on Deep Learning Visual Classification

15. Adversarial Examples
- In this example, a 3D-printed turtle is misclassified by a DNN as a rifle (video link)

16. Adversarial Examples
- A person wearing an adversarial patch is not detected by a person-detection model (YOLOv2)

17. Adversarial Examples
- A "train" in the hallway?
Picture from: Yevgeniy Vorobeychik, Bo Li - Adversarial Machine Learning Tutorial

18. Adversarial Examples
- Non-scientific: a Tesla owner checks whether the car can distinguish a person wearing a cover-up from a traffic cone (video link)

19. Adversarial Examples
- Abusive use of machine learning
  - Using GANs to generate fake content (a.k.a. deep fakes)
  - Videos of politicians saying things they never said
    - Barack Obama's deep fake, or House Speaker Nancy Pelosi appearing drunk in a video
    - Bill Hader's impersonation of Arnold Schwarzenegger
- Can have strong societal implications: elections, automated trolling, court evidence

20. Adversarial ML
- AML is a research field that lies at the intersection of ML and computer security
  - E.g., network intrusion detection, spam filtering, malware classification, biometric authentication (facial detection)
- ML algorithms in real-world applications mainly focus on increasing accuracy
  - However, few techniques and design decisions focus on keeping ML models secure and robust
- Adversarial ML: ML in adversarial settings
- Attack is a major component of AML
  - Bad actors do bad things
  - Their main objective is not to get detected (they change their behavior to avoid detection)
Slide credit: Binghui Wang - Adversarial Machine Learning: An Introduction

21. Attack Taxonomy
- Data poisoning (causative attack): attack on the training phase
  - Attackers perturb the training set to fool the model
    - Insert malicious inputs into the training set
    - Modify input instances in the training set
    - Change the labels of training inputs
  - Attackers attempt to influence or corrupt the ML model or the ML algorithm itself
Slide credit: Binghui Wang - Adversarial Machine Learning: An Introduction
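As a concrete illustration of the label-flipping variant of data poisoning, here is a minimal sketch; the synthetic dataset, poisoning rate, and classifier are assumptions made for the example, not part of the original slide.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=700, n_features=10, random_state=0)
X_train, X_test = X[:500], X[500:]
y_train, y_test = y[:500], y[500:]

# Data poisoning by label flipping: the attacker flips the labels of a
# fraction of the training set before the model is trained
poison_rate = 0.3  # assumed fraction of poisoned training labels
idx = rng.choice(len(y_train), size=int(poison_rate * len(y_train)), replace=False)
y_poisoned = y_train.copy()
y_poisoned[idx] = 1 - y_poisoned[idx]

clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)
print("Clean-model accuracy:   ", clean_model.score(X_test, y_test))
print("Poisoned-model accuracy:", poisoned_model.score(X_test, y_test))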

22. Attack Taxonomy
- Evasion attack (exploratory attack): attack on the testing phase
  - Attackers do not tamper with the ML model, but instead cause it to produce adversarial (incorrect) outputs
  - The evasion attack is the most common type of attack
Slide credit: Binghui Wang - Adversarial Machine Learning: An Introduction

23. Evasion Attack
- Evasion attacks can be further classified into:
  - White-box attack
    - Attackers have full knowledge of the ML model
    - I.e., they have access to parameters, hyperparameters, gradients, architecture, etc.
  - Black-box attack
    - Attackers don't have access to the ML model's parameters, gradients, or architecture
    - Perhaps they have some knowledge about the ML algorithm used
      - E.g., attackers may know that a ResNet50 model is used for classification, but they don't have access to the model parameters
    - Attackers may query the model to obtain knowledge (they can collect labeled examples)
Slide credit: Binghui Wang - Adversarial Machine Learning: An Introduction

24. Attack Taxonomy
- Depiction of the adversarial attack taxonomy from Alessio's Adversarial ML presentation at FloydHub
Figure panels: evasion attack, data poisoning attack
Picture from: https://blog.floydhub.com/introduction-to-adversarial-machine-learning/

25. Attack Taxonomy
- Each of the above attacks can further be:
  - Non-targeted attack
    - The goal is to mislead the classifier into predicting any label other than the ground-truth label
    - Most existing work deals with this goal
    - E.g., perturb an image of a military tank so that the model predicts any class other than military tank
  - Targeted attack
    - The goal is to mislead the classifier into predicting a specific target label for an image
    - More difficult
    - E.g., perturb an image of a turtle so that the model predicts it is a rifle
    - E.g., perturb an image of a Stop sign so that the model predicts it is a Speed Limit sign

26. Evasion Attacks
- Find a new input (similar to the original input) that is classified as another class (untargeted or targeted)
Figure: original input (warplane) -> adversarial attack image
Slide credit: He Xiaoyi - Adversarial Machine Learning

27. Evasion Attacks
- How to find adversarial images?
- Given an image x, which is labeled by the classifier (e.g., LogReg, SVM, or NN) as class y, i.e., h(x) = y
- Create an adversarial image x + δ by adding a small perturbation δ to the original image, such that the distance D(x, x + δ) is minimal
- The classifier should assign a label to the adversarial image that is different from y, i.e.:
  minimize D(x, x + δ)
  such that x + δ is classified as a target class t, h(x + δ) = t
  and each element of x + δ is in [0, 1] (so that it remains a valid image)

28. Evasion Attacks
- Distance metrics between x and x + δ:
  - L0 norm: the number of elements i in δ such that δ_i ≠ 0
    - Corresponds to the number of pixels that have been changed in the image
  - L1 norm: ||δ||_1 = Σ_i |δ_i|, the city-block (Manhattan) distance
  - L2 norm: ||δ||_2 = sqrt(Σ_i δ_i^2), the Euclidean distance (related to the mean-squared error)
  - L∞ norm: ||δ||_∞ = max_i |δ_i|, measures the maximum change to any of the pixels in the image
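A minimal sketch of how these four norms can be computed for a perturbation δ with NumPy; the array values are arbitrary and only for illustration.

import numpy as np

x = np.array([0.2, 0.5, 0.9, 0.4])        # original image (flattened), illustrative values
x_adv = np.array([0.2, 0.55, 0.7, 0.4])   # perturbed image
delta = x_adv - x

l0 = np.count_nonzero(delta)              # number of changed pixels
l1 = np.sum(np.abs(delta))                # Manhattan distance
l2 = np.sqrt(np.sum(delta ** 2))          # Euclidean distance
linf = np.max(np.abs(delta))              # largest single-pixel change

print(l0, l1, l2, linf)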

29. Spam Filtering Adversarial Game
- Based on the cumulative weights assigned to words, an email is classified as spam or as a legitimate message
Example email: "From: spammer@example.com / Cheap mortgage now!!!"
Feature weights: cheap = 1.0, mortgage = 1.5
Total score = 2.5 > 1.0 (threshold) -> Spam
Slide credit: Daniel Lowd - Adversarial Machine Learning

30. Spam Filtering Adversarial Game
- The spammers adapt to evade the classifier
Example email: "From: spammer@example.com / Cheap mortgage now!!! Eugene Oregon"
Feature weights: cheap = 1.0, mortgage = 1.5, Eugene = -1.0, Oregon = -1.0
Total score = 0.5 < 1.0 (threshold) -> OK
Slide credit: Daniel Lowd - Adversarial Machine Learning

31. Spam Filtering Adversarial Game
- The classifier is adapted by changing the feature weights
Example email: "From: spammer@example.com / Cheap mortgage now!!! Eugene Oregon"
Feature weights: cheap = 1.5, mortgage = 2.0, Eugene = -0.5, Oregon = -0.5
Total score = 2.5 > 1.0 (threshold) -> Spam
Slide credit: Daniel Lowd - Adversarial Machine Learning
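The scoring rule in the three slides above can be written as a few lines of code; the weights and threshold are the ones shown on the slides, while the helper function name is made up for the example.

def spam_score(words, weights):
    """Sum the weights of the words that appear in the email."""
    return sum(weights.get(w.lower(), 0.0) for w in words)

threshold = 1.0
email = "Cheap mortgage now!!! Eugene Oregon".replace("!", "").split()

# Round 1: the original classifier (slide 29)
weights_v1 = {"cheap": 1.0, "mortgage": 1.5}
# Round 2: the spammer adds "good" words and the score drops below the threshold (slide 30)
weights_v2 = {"cheap": 1.0, "mortgage": 1.5, "eugene": -1.0, "oregon": -1.0}
# Round 3: the defender re-weights the features (slide 31)
weights_v3 = {"cheap": 1.5, "mortgage": 2.0, "eugene": -0.5, "oregon": -0.5}

for w in (weights_v1, weights_v2, weights_v3):
    score = spam_score(email, w)
    print(score, "-> spam" if score > threshold else "-> ok")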

32. Common Adversarial Attacks
- Noise attack
- Semantic attack
- Fast gradient sign method (FGSM) attack
- Basic iterative method (BIM) attack
- Projected gradient descent (PGD) attack
- DeepFool attack
- Carlini-Wagner (CW) attack

33. Noise Attack
- Noise attack
  - The simplest form of adversarial attack
  - Noise is a random arrangement of pixels containing no information
  - In Python, such noise can be created with the randn() function, i.e., random numbers from a normal distribution (0 mean and 1 standard deviation)
  - It represents a non-targeted black-box evasion attack
Figure: image (prediction: gorilla) + noise -> prediction: fountain
Picture from: https://blog.floydhub.com/introduction-to-adversarial-machine-learning/
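A minimal sketch of the noise attack described above, using NumPy; the noise scale and image shape are assumptions for the example, and img would be a real image array in practice.

import numpy as np

rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))             # stand-in for a real image with pixel values in [0, 1]

# Noise attack: add random Gaussian noise (mean 0, std 1), scaled down,
# and clip so the result remains a valid image
scale = 0.1                                 # assumed noise magnitude
noise = scale * rng.standard_normal(img.shape)
img_noisy = np.clip(img + noise, 0.0, 1.0)

# img_noisy is then fed to the classifier; with enough noise the
# prediction can change (e.g., gorilla -> fountain in the slide's example)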

34. Semantic Attack
- Semantic attack
  - Hosseini (2017) - On the Limitation of Convolutional Neural Networks in Recognizing Negative Images
  - Uses negative images: reverse all pixel intensities
  - E.g., change the sign of all pixels if the pixel values are in the range [-1, 1]
Figure: original image (prediction: gorilla) -> negative image (prediction: Weimaraner, a dog breed)
Picture from: https://blog.floydhub.com/introduction-to-adversarial-machine-learning/
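Creating the negative image amounts to one line per pixel-range convention; the sketch below assumes the two value ranges mentioned on the slide and uses a random stand-in image.

import numpy as np

img = np.random.default_rng(0).random((224, 224, 3))  # stand-in image with values in [0, 1]

# Semantic attack: invert pixel intensities
img_neg_01 = 1.0 - img                # for images with values in [0, 1]
img_m11 = 2.0 * img - 1.0             # same image rescaled to [-1, 1]
img_neg_m11 = -img_m11                # for images in [-1, 1], just flip the sign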

35. FGSM Attack
- Fast gradient sign method (FGSM) attack
  - Goodfellow (2015) - Explaining and Harnessing Adversarial Examples
- An adversarial image x_adv is created by adding perturbation noise to an image x:
  x_adv = x + ε · sign(∇_x J(θ, x, y))
- Notation: input image x, cost function J, NN model h, NN weights (parameters) θ, gradient ∇ (Greek letter "nabla"), noise magnitude ε
- The perturbation noise is calculated from the gradient of the loss function with respect to the input image x for the true class label y
- This increases the loss for the true class y, so the model misclassifies the image x_adv
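A minimal PyTorch sketch of FGSM under the notation above; the model, image tensor, and ε value are assumed inputs, not part of the original slide.

import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.007):
    """Fast gradient sign method: x_adv = x + eps * sign(grad_x J(theta, x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)            # loss J for the true label y
    loss.backward()                                # gradient of the loss w.r.t. the input x
    x_adv = x + eps * x.grad.sign()                # one step in the direction that increases the loss
    return torch.clamp(x_adv, 0.0, 1.0).detach()   # keep the result a valid image

# Usage (assuming `model` is a trained classifier, `x` is a batch of images
# in [0, 1], and `y` holds the true labels):
# x_adv = fgsm_attack(model, x, y, eps=0.007)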

36. FGSM Attack
- FGSM is a white-box non-targeted evasion attack
  - White-box, since we need to know the gradients to create the adversarial image
- In this example, the noise magnitude is ε = 0.007
- Note: a nematode is a type of worm known as a roundworm

37. FGSM Attack
- Recall that training NNs is based on the gradient descent algorithm
  - The values of the network parameters (weights) w are iteratively changed until a minimum of the loss function is reached
  - Gradients of the loss function with respect to the model parameters (∇_w J) give the direction and magnitude for updating the parameters
  - The step size of each update is the learning rate α
Picture from: https://blog.floydhub.com/introduction-to-adversarial-machine-learning/

38. FGSM Attack
- The sign and magnitude of the gradient give the direction and the slope of the steepest descent
  - Left image: + and - sign of the gradient
  - Right image: small, adequate, and large slope of the weight update, based on the magnitude of the gradient
  - Middle image: small and large learning rate α
- To minimize the loss function, the weights w are changed in the opposite direction of the gradient, i.e., w = w - α · ∇_w J
Figure panel labels: sign + direction, sign - direction, slope too large, slope too small, slope right, large α, small α

39. FGSM Attack
- FGSM attack example
Figure: original image (prediction: car mirror) -> adversarial image (prediction: sunglasses)
Picture from: https://blog.floydhub.com/introduction-to-adversarial-machine-learning/

40. BIM Attack
- Basic iterative method (BIM) attack
  - Kurakin (2017) - Adversarial Examples in the Physical World
- BIM is a variant of FGSM: it repeatedly adds noise to the image x over multiple iterations in order to cause misclassification
- The number of iteration steps is t, and α is the amount of noise added at each step
- The perturbed image after iteration t is:
  x_adv^(t) = clip_{x,ε}( x_adv^(t-1) + α · sign(∇_x J(θ, x_adv^(t-1), y)) ), with x_adv^(0) = x
- Multiple steps of adding noise increase the chances of misclassifying the image
- Compare to FGSM: x_adv = x + ε · sign(∇_x J(θ, x, y))
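A minimal PyTorch sketch of BIM that repeats an FGSM-style step in a loop; the step size α, number of iterations, and ε are assumptions for the example.

import torch
import torch.nn.functional as F

def bim_attack(model, x, y, eps=0.03, alpha=0.007, steps=10):
    """Basic iterative method: repeated FGSM steps, clipped to the eps-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()               # one FGSM-style step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # stay within the eps-ball of x
            x_adv = torch.clamp(x_adv, 0.0, 1.0)                    # keep a valid image
    return x_adv.detach()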

41. BIM Attack
- BIM attack example: cell phone image

42. PGD Attack
- Projected gradient descent (PGD) attack
  - Madry (2017) - Towards Deep Learning Models Resistant to Adversarial Attacks
- PGD is an extension of BIM (and FGSM), where after each perturbation step the adversarial example is projected back onto the ε-ball of x using a projection function Π
- Different from BIM, PGD uses random initialization, by adding random noise drawn from a uniform distribution with values in the range [-ε, ε]
- PGD is regarded as the strongest first-order attack
  - First-order attack means that the adversary uses only the gradients of the loss function with respect to the input
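A minimal PyTorch sketch of PGD under these definitions: random start inside the ε-ball, then BIM-style steps with projection back onto the ball; the parameter values are assumptions for the example.

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.03, alpha=0.007, steps=40):
    """Projected gradient descent: random start, then iterated FGSM steps
    projected back onto the L-infinity eps-ball around x."""
    # Random initialization: uniform noise in [-eps, eps]
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    x_adv = torch.clamp(x_adv, 0.0, 1.0).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()
            # Projection onto the eps-ball of x (and the valid pixel range)
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
            x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv.detach()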

43. PGD Attack
- PGD attack example
Figure: original image (prediction: baboon) -> adversarial image (prediction: Egyptian cat)
Picture from: https://blog.floydhub.com/introduction-to-adversarial-machine-learning/

44. PGD Attack
- Gradient approaches can also be designed as targeted white-box attacks
- The added perturbation noise aims to minimize the loss function of the image for a specific class label
- In this example, the target class is maraca
- The iteration loop doesn't break until the image is classified into the target class, or until the maximum number of iterations is reached
Figure: original image (prediction: hippopotamus) -> adversarial image (prediction: maraca)
Picture from: https://blog.floydhub.com/introduction-to-adversarial-machine-learning/

45. PGD Attack
- For a targeted attack, if the target class label is denoted t, adversarial examples are created by using:
  x_adv^(k+1) = Π_ε( x_adv^(k) - α · sign(∇_x J(θ, x_adv^(k), t)) )
- I.e., it is based on minimizing the loss function with respect to the target class t
- This is the opposite of non-targeted attacks, which maximize the loss function with respect to the true class label y
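The targeted variant changes only the label used in the loss and the sign of the update; a minimal sketch of the modified step, with names and parameters assumed for the example (the projection onto the ε-ball would follow exactly as in the untargeted sketch above).

import torch
import torch.nn.functional as F

def targeted_pgd_step(model, x_adv, target, alpha=0.007):
    """One targeted step: move in the direction that DECREASES the loss for the
    target class t (the untargeted step instead increases the true-label loss)."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), target)
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv - alpha * x_adv.grad.sign()   # note the minus sign
    return x_adv.detach()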

46. DeepFool Attack
- DeepFool attack
  - Moosavi-Dezfooli (2015) - DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks
- DeepFool is an untargeted white-box attack
- It misclassifies the image with the minimal amount of perturbation possible
- There is no change visible to the human eye between the two images
Figure: image (prediction: cannon) -> perturbed image (prediction: projector), plus the difference image
Picture from: https://blog.floydhub.com/introduction-to-adversarial-machine-learning/

47. DeepFool Attack
- Image example
  - Original image: whale
  - Both DeepFool and FGSM perturb the image so that it is classified as turtle
  - DeepFool leads to a smaller perturbation
Figure: DeepFool (prediction: turtle) and FGSM (prediction: turtle), with the corresponding difference images

48. DeepFool Attack
- E.g., consider a linear classifier applied to objects from 2 classes: green and orange circles
  - The line that separates the 2 classes is called the hyperplane
  - Data points falling on either side of the hyperplane are attributed to different classes (such as a benign vs. a malicious class)
- Given an input x, DeepFool projects x onto the hyperplane and pushes it a bit beyond the hyperplane, thus misclassifying it
Figure: benign and malicious regions separated by a hyperplane; x is moved to x_adv just across the hyperplane
Picture from: Yevgeniy Vorobeychik, Bo Li - Adversarial Machine Learning

49. DeepFool Attack
- For a multiclass problem with linear classifiers, there are multiple hyperplanes that separate an input x from the other classes
  - E.g., an example with 4 classes is shown in the image below
- DeepFool finds the closest hyperplane to the input x0 (i.e., the most similar class among the other 3 classes)
- Then it projects the input onto that hyperplane and pushes it a little beyond it

50. DeepFool Attack
- For non-linear classifiers (such as neural networks), the authors perform several iterations of adding perturbations to the image
- At each iteration, the classifier function is linearized around the current image, and a minimal perturbation is calculated
- The algorithm stops when the class of the image changes to a label other than the true class
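For the binary linear case on the earlier slide (classifier f(x) = w·x + b), the minimal perturbation is the projection of x onto the hyperplane, pushed slightly beyond it. A minimal NumPy sketch, where w, b, and the overshoot factor are chosen only for illustration:

import numpy as np

def deepfool_linear_binary(x, w, b, overshoot=0.02):
    """Minimal perturbation for a linear binary classifier f(x) = w.x + b:
    project x onto the hyperplane w.x + b = 0 and push slightly past it."""
    f = np.dot(w, x) + b
    # Closed-form minimal (L2) perturbation that reaches the hyperplane
    r = -f * w / np.dot(w, w)
    # Overshoot a bit so the point actually crosses to the other side
    return x + (1 + overshoot) * r

w = np.array([1.0, 2.0])     # illustrative weights
b = -1.0                     # illustrative bias
x = np.array([2.0, 1.0])     # input currently classified as positive (f(x) = 3 > 0)
x_adv = deepfool_linear_binary(x, w, b)
print(np.dot(w, x_adv) + b)  # slightly negative: the label has flipped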

51. Carlini-Wagner (CW) Attack
- Carlini-Wagner (CW) attack
  - Carlini (2017) - Towards Evaluating the Robustness of Neural Networks
- The initial formulation for creating adversarial examples is difficult to solve
- Carlini and Wagner propose a reformulation of it that is solvable

52. Carlini-Wagner (CW) Attack
- The authors considered several variants of the function f
- The best results were obtained with the variant f6
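A sketch of the targeted L2 CW attack with the f6 objective from the paper (minimize ||δ||_2^2 + c·f6, with a tanh change of variables to keep the image valid), written in PyTorch; the function names, the constant c, the confidence κ, and the optimizer settings here are illustrative assumptions rather than the authors' exact implementation.

import torch

def cw_f6(logits, target, kappa=0.0):
    """f6 from Carlini-Wagner: positive while the target class does not yet
    have the largest logit, clipped at -kappa."""
    target_logit = logits.gather(1, target.unsqueeze(1)).squeeze(1)
    other = logits.clone()
    other.scatter_(1, target.unsqueeze(1), float("-inf"))
    other_max = other.max(dim=1).values
    return torch.clamp(other_max - target_logit, min=-kappa)

def cw_attack(model, x, target, c=1.0, steps=100, lr=0.01, kappa=0.0):
    """Sketch of the targeted L2 CW attack: minimize ||delta||_2^2 + c * f6."""
    # tanh change of variables keeps x + delta inside [0, 1] automatically
    w = torch.atanh((x * 2 - 1).clamp(-0.999, 0.999)).detach().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)
        loss = ((x_adv - x) ** 2).flatten(1).sum(1) + c * cw_f6(model(x_adv), target, kappa)
        opt.zero_grad()
        loss.sum().backward()
        opt.step()
    return (0.5 * (torch.tanh(w) + 1)).detach()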

53. Carlini-Wagner (CW) Attack
- Results on the MNIST dataset
Figure: adversarial examples produced by the L0, L2, and L∞ variants of the attack

54. Evasion Attacks on Black-Box Models
- Adversarial example transferability
  - Cross-model transferability: the same adversarial example is often misclassified by a variety of classifiers with different architectures
  - Cross-training-set transferability: the same adversarial example is often misclassified by models trained on different subsets of the training data
- Therefore, an attacker can take the following steps to reverse-engineer the classifier:
  1. Train their own (white-box) substitute model
  2. Generate adversarial samples with the substitute model
  3. Apply the adversarial samples to the target ML model
Slide credit: Binghui Wang - Adversarial Machine Learning: An Introduction
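A schematic sketch of the substitute-model (transfer) attack described in the three steps above; the model objects, the query set, and the reuse of the earlier fgsm_attack sketch are assumptions made for the example.

import torch
import torch.nn.functional as F

def train_substitute(substitute, target_model, queries, epochs=5, lr=1e-3):
    """Step 1: train a local white-box substitute on inputs labeled by
    querying the black-box target model (only its predicted labels are used)."""
    opt = torch.optim.Adam(substitute.parameters(), lr=lr)
    with torch.no_grad():
        labels = target_model(queries).argmax(dim=1)
    for _ in range(epochs):
        loss = F.cross_entropy(substitute(queries), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return substitute

# Step 2: generate adversarial samples on the substitute, e.g. with the
#         fgsm_attack sketch shown earlier: x_adv = fgsm_attack(substitute, x, y)
# Step 3: feed x_adv to the black-box target model; by transferability,
#         many of the examples are misclassified there as well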

55. Defense Against Adversarial Attacks
- Adversarial samples can cause any ML algorithm to fail
- However, they can also be used to build more accurate and robust models
- AML is a two-player game:
  - Attackers aim to produce strong adversarial examples that evade a model with high confidence while requiring only a small perturbation
  - Defenders aim to produce models that are robust to adversarial examples (i.e., the models don't have adversarial examples, or adversaries cannot find them easily)
- Defense strategies against adversarial attacks include:
  - Adversarial training
  - Detecting adversarial examples
  - Gradient masking
  - Robust optimization (regularization, certified defenses)
- A list of adversarial defenses can be found at this link

56. Adversarial Training
- Learning the model parameters using adversarial samples is referred to as adversarial training
- The training dataset is augmented with adversarial examples produced by known types of attacks
  - For each training input, add an adversarial example
- However, if a model is trained only on adversarial examples, its accuracy on regular examples will drop significantly
- Possible strategies:
  - Train the model from scratch using both regular and adversarial examples
  - Train the model on regular examples and afterwards fine-tune it with adversarial examples
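A minimal PyTorch sketch of one adversarial-training epoch that mixes clean and adversarial examples, reusing the assumed pgd_attack sketch from earlier; the loss weighting and the choice of attack are illustrative assumptions.

import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, attack, mix=0.5):
    """One epoch of adversarial training: for each batch, craft adversarial
    versions of the inputs and train on a mix of clean and adversarial loss."""
    model.train()
    for x, y in loader:
        x_adv = attack(model, x, y)                       # e.g., the pgd_attack sketch above
        loss_clean = F.cross_entropy(model(x), y)
        loss_adv = F.cross_entropy(model(x_adv), y)
        loss = (1 - mix) * loss_clean + mix * loss_adv    # train on both kinds of examples
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()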

57. Adversarial Training
- Training with and without negative images for the semantic attack
Figure/table: accuracy on regular images vs. accuracy on negative images, for a fine-tuned model and a model trained from scratch
Picture from: https://blog.floydhub.com/introduction-to-adversarial-machine-learning/

58. Adversarial Training
- The plots show the cross-entropy loss values for standard and adversarial training on the MNIST and CIFAR-10 datasets, while creating adversarial examples with the PGD attack (Madry, 2018)
- 20 runs are shown, each starting at a random point within the perturbation range
- The final loss values on adversarially trained models are much smaller than on models trained on the original training datasets
Picture from: Madry (2018) - Towards Deep Learning Models Resistant to Adversarial Attacks

59. Random Resizing and Padding
- Training the model with randomly resized images and random padding applied on all four sides has been shown to improve robustness to adversarial attacks
  - Xie (2018) - Mitigating Adversarial Effects Through Randomization
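A minimal sketch of the random resize-and-pad transformation using plain PyTorch tensor operations; the size ranges are assumptions chosen for the example, and the transform can be applied either as a training augmentation or at inference time.

import torch
import torch.nn.functional as F

def random_resize_and_pad(x, out_size=331, min_size=299):
    """Randomly resize a batch of images and zero-pad on all four sides so the
    output has a fixed size (the sizes here are assumed for the example)."""
    new_size = int(torch.randint(min_size, out_size, (1,)))
    x = F.interpolate(x, size=(new_size, new_size), mode="bilinear", align_corners=False)
    pad_total = out_size - new_size
    left = int(torch.randint(0, pad_total + 1, (1,)))
    top = int(torch.randint(0, pad_total + 1, (1,)))
    # F.pad order for 4D tensors: (left, right, top, bottom)
    return F.pad(x, (left, pad_total - left, top, pad_total - top), value=0.0)

# Usage: x_rand = random_resize_and_pad(images)   # images: (N, C, H, W) in [0, 1]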

60. Detecting Adversarial Examples
- A body of work has focused on distinguishing adversarial examples from regular clean examples
- If the defense method detects that an input example is adversarial, the classifier refuses to predict its class label
- Example detection defense methods:
  - Kernel Density (KD) detector based on Bayesian uncertainty features
    - Feinman (2017) - Detecting Adversarial Samples from Artifacts
  - Local Intrinsic Dimensionality (LID) of adversarial subspaces
    - Ma (2018) - Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality
  - Adversary detection networks
    - Metzen (2017) - On Detecting Adversarial Perturbations

61. Gradient Masking
- Gradient masking defense methods deliberately hide the gradient information of the model
  - Since most attacks are based on the model's gradient information
- Distillation defense: changes the scaling of the last hidden layer in NNs, hindering the calculation of gradients
  - Papernot (2016) - Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks
- Input preprocessing: discretization of the image's pixel values, resizing and cropping, or smoothing
  - Buckman (2018) - Thermometer Encoding: One Hot Way to Resist Adversarial Examples
- Defense-GAN: uses a GAN model to transform perturbed images into clean images
  - Samangouei (2017) - Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models

62. Robust Optimization
- Robust optimization aims to evaluate, and improve, the model's robustness to adversarial attacks
  - Consequently, it learns model parameters that minimize the misclassification of adversarial examples
- Regularization methods: train the model by penalizing large values of the parameters, or large values of the gradients
  - Cisse (2017) - Parseval Networks: Improving Robustness to Adversarial Examples
- Certified defenses: for a given dataset and model, find a lower bound on the minimal perturbation; the model is then guaranteed to be safe against any perturbation smaller than that bound
  - Raghunathan (2018) - Certified Defenses Against Adversarial Examples

63. Conclusion
- ML algorithms and methods are vulnerable to many types of attacks
- Adversarial examples exhibit transferability across ML models
  - I.e., both cross-model and cross-training-set transferability
- Adversarial examples can be leveraged to improve the performance or the robustness of ML models

64. References
- Introduction to Adversarial Machine Learning - blog post by Arunava Chakraborty
- Binghui Wang - Adversarial Machine Learning: An Introduction
- Daniel Lowd - Adversarial Machine Learning
- Yevgeniy Vorobeychik, Bo Li - Adversarial Machine Learning (Tutorial)

65. Other AML Resources
- Cleverhans: a repository from Google that implements the latest research in AML
  - The library is being updated to support TensorFlow 2, PyTorch, and JAX
- Adversarial Robustness Toolbox: a toolbox from IBM that implements state-of-the-art attacks and defenses
  - The algorithms are framework-independent and support TensorFlow, Keras, PyTorch, MXNet, XGBoost, LightGBM, CatBoost, etc.
- ScratchAI: a smaller AML library developed in PyTorch, explained in this blog post
- Robust ML Defenses: a list of adversarial defenses with code
- AML Tutorial by Bo Li, Dawn Song, and Yevgeniy Vorobeychik
- Nicholas Carlini's website