Le Yu, Xiapu Luo §, Xule Liu, Tao Zhang

Uploaded On 2020-10-06


Department of Computing, The Hong Kong Polytechnic University; The Hong Kong Polytechnic University Shenzhen Research Institute. {cslyu, csxluo, csxliu, cstzhang}@comp.polyu.edu.hk. 2016 DSN (CCF B). ID: 813127



Presentation Transcript

Slide1

Le Yu, Xiapu Luo §, Xule Liu, Tao Zhang

Department of Computing, The Hong Kong Polytechnic University

The Hong Kong Polytechnic University Shenzhen Research Institute

{cslyu, csxluo, csxliu, cstzhang}@comp.polyu.edu.hk

2016 DSN (CCF B)

Can We Trust the Privacy Policies of Android Apps?

Slide2

overview

A novel approach to automatically identify three kinds of problems in privacy policies, combining NLP and static analysis:

1. Incomplete privacy policy. The privacy policy does not cover all of the app's behaviors that access sensitive information.

2. Incorrect privacy policy. The privacy policy declares that the app will not access user information, but the app does.

3. Inconsistent privacy policy. The privacy policy of an app conflicts with the policies of its third-party libs.

Slide3

overview


Slide4

overview


(1) Privacy policy analysis module. It analyzes a privacy policy to determine the information (not) to be collected, used, retained, or disclosed.

(2) Static analysis module. It inspects an app's bytecode to decide whether the app will collect or retain private information.

(3) Problem identification module. It employs models of the three kinds of problems to identify incomplete, incorrect, and inconsistent privacy policies.

Slide5

01. Privacy Policy Analysis Module

Collect, use, retain, disclose, and their passive-voice forms.

Step 1: Sentence extraction: use the Natural Language Toolkit (NLTK) to divide the text into sentences.

Step 2: Syntactic analysis: use the Stanford Parser to obtain each sentence's syntactic tree and dependency relations.
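Step 1 can be illustrated with a rough sketch. The slide names NLTK for this step; to stay dependency-free, the code below swaps in a naive regular-expression splitter, so it is only a stand-in and will mishandle abbreviations that a real tokenizer copes with. The function name `split_sentences` and the sample policy text are hypothetical.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naive stand-in for a sentence tokenizer.

    Splits after '.', '!' or '?' when followed by whitespace.
    This is NOT NLTK's algorithm, only an illustration of the step.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

# Hypothetical policy text, not taken from the paper.
policy = "We collect your location. We will not share your contacts. Questions? Email us."
sentences = split_sentences(policy)
for sentence in sentences:
    print(sentence)
```

Each resulting sentence would then be handed to the syntactic parser in Step 2.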

Slide6

01. Privacy Policy Analysis Module

Step 3: Pattern generation: the seed pattern is subject-verb-object, and the initial verbs are “collect”, “use”, “retain”, and “disclose”.

Insert the subjects and objects whose frequencies are higher than the median into the subject list and the object list, respectively, and use them to find new patterns. Then look for further patterns using the subject-“allowed”-“access”-object pattern.
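The frequency filter in Step 3 might look roughly like the sketch below. It assumes that "the median" means the median of the per-item frequencies, which the slide does not spell out; the triples and the helper name `frequent_items` are hypothetical.

```python
from collections import Counter
from statistics import median

def frequent_items(items: list[str]) -> set[str]:
    """Return the items whose frequency is strictly above the median frequency."""
    counts = Counter(items)
    if not counts:
        return set()
    cutoff = median(counts.values())  # assumption: median over per-item counts
    return {item for item, n in counts.items() if n > cutoff}

# Hypothetical (subject, verb, object) triples matched by the seed pattern.
triples = [
    ("we", "collect", "location"),
    ("we", "use", "location"),
    ("we", "collect", "contacts"),
    ("app", "collect", "location"),
    ("partners", "use", "device id"),
]

subject_list = frequent_items([s for s, _, _ in triples])
object_list = frequent_items([o for _, _, o in triples])
print(subject_list, object_list)
```

The surviving subjects and objects would then be used to search for sentences with new verbs, which yields new patterns.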

Slide7

01. Privacy Policy Analysis Module

Step 5: Negation analysis: PPChecker determines whether a sentence is negative by checking for negation words.

Information elements extraction: main verb, action executor, resource, and constraint.
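A minimal sketch of the negation check and of a record holding the four information elements follows. The negation word list is illustrative and is not the list PPChecker uses; the class name `PolicyStatement` and the example sentences are hypothetical.

```python
from dataclasses import dataclass

# Illustrative negation words; the actual list used by PPChecker is not given on the slide.
NEGATION_WORDS = {"not", "no", "never", "neither", "nor", "cannot"}

@dataclass
class PolicyStatement:
    main_verb: str   # e.g. "collect"
    executor: str    # who performs the action
    resource: str    # the information involved
    constraint: str  # condition attached to the action, may be empty
    negative: bool

def is_negative(sentence: str) -> bool:
    """Treat a sentence as negative if any token is a negation word."""
    tokens = sentence.lower().replace("n't", " not").split()
    return any(tok.strip(".,;:!?") in NEGATION_WORDS for tok in tokens)

text = "We will not collect your location."
stmt = PolicyStatement("collect", "we", "location", "", is_negative(text))
print(stmt)
```

Checking only for the presence of negation words is simple but coarse: a double negation or a negation that scopes over a different clause would be misread.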

Slide8

02. Static Analysis Module

Collected information and retained information.

Android property graph (APG), abstract syntax tree (AST), interprocedural control-flow graph (ICFG), method call graph (MCG), and system dependency graph (SDG) of the app.

Collected information: detect invocations of each sensitive API by querying the graph database.

Retained information: static taint analysis.
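Both checks can be approximated on toy data. In the sketch below a plain dictionary stands in for the graph database, and a breadth-first reachability search stands in for static taint analysis, which in a real tool also tracks aliases, fields, and calling context. The two Android API names exist in the platform; their mapping to information types, the call graph, the data-flow edges, and the function names are assumptions made for illustration.

```python
from collections import deque

# Illustrative mapping from sensitive APIs to the information they return.
SENSITIVE_APIS = {
    "android.location.LocationManager.getLastKnownLocation": "location",
    "android.telephony.TelephonyManager.getDeviceId": "device id",
}

def collected_information(call_graph: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each kind of sensitive information to the methods that call an API returning it."""
    found: dict[str, set[str]] = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            info = SENSITIVE_APIS.get(callee)
            if info is not None:
                found.setdefault(info, set()).add(caller)
    return found

def reaches_sink(flows: dict[str, list[str]], source: str, sinks: set[str]) -> bool:
    """Simplified taint check: can data from `source` reach any sink along data-flow edges?"""
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node in sinks:
            return True
        for nxt in flows.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Hypothetical app: Tracker.start reads the location and writes it to a file.
call_graph = {
    "com.example.app.MainActivity.onCreate": ["com.example.app.Tracker.start"],
    "com.example.app.Tracker.start": ["android.location.LocationManager.getLastKnownLocation"],
}
flows = {"getLastKnownLocation()": ["loc"], "loc": ["msg"], "msg": ["FileOutputStream.write()"]}

collected = collected_information(call_graph)
retained = reaches_sink(flows, "getLastKnownLocation()", {"FileOutputStream.write()"})
print(collected, retained)
```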

Slide9

03. Problem identification module

1: Detecting incomplete privacy policy through description and code

2: Discovering incorrect privacy policy

3: Revealing inconsistent privacy policy:

(1) AppSent_i's and LibSent_j's main verbs belong to the same category (VP_collect, VP_use, VP_retain, or VP_disclose);

(2) AppSent_i is a negative sentence and LibSent_j is a positive sentence;

(3) AppSent_i and LibSent_j refer to the same resource.
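The three checks can be sketched over simplified statements. The inconsistency test mirrors conditions (1) to (3) above; the incomplete and incorrect tests are assumptions about how the models might be expressed, since the slide only names them. Resources are compared by exact string match here, which is a simplification, and all names and data are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Statement:
    category: str   # "collect", "use", "retain", or "disclose"
    resource: str
    negative: bool

Behavior = tuple[str, str]  # (category, resource) observed by static analysis

def incomplete(policy: list[Statement], behaviors: set[Behavior]) -> set[Behavior]:
    """Behaviors the policy does not mention at all (assumed model)."""
    mentioned = {(s.category, s.resource) for s in policy}
    return behaviors - mentioned

def incorrect(policy: list[Statement], behaviors: set[Behavior]) -> list[Statement]:
    """Negative statements contradicted by an observed behavior (assumed model)."""
    return [s for s in policy if s.negative and (s.category, s.resource) in behaviors]

def inconsistent(app: list[Statement], lib: list[Statement]) -> list[tuple[Statement, Statement]]:
    """Pairs satisfying conditions (1), (2), and (3) from the slide."""
    return [
        (a, l)
        for a in app
        for l in lib
        if a.category == l.category          # (1) same verb category
        and a.negative and not l.negative    # (2) app negative, lib positive
        and a.resource == l.resource         # (3) same resource
    ]

app_policy = [
    Statement("collect", "location", False),
    Statement("disclose", "contacts", True),
    Statement("collect", "device id", True),
]
lib_policy = [Statement("disclose", "contacts", False)]
behaviors = {("collect", "location"), ("collect", "device id"), ("retain", "location")}

print(incomplete(app_policy, behaviors))
print(incorrect(app_policy, behaviors))
print(inconsistent(app_policy, lib_policy))
```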

Slide10

04. Other papers

Automated Analysis of Privacy Requirements for Mobile Apps

(NDSS 17) (AAAI 16)

In this study we introduce a scalable system to help analyze and predict Android apps’ compliance with privacy requirements. Our analysis of 17,991 free Android apps shows the viability of combining machine learning-based privacy policy analysis with static code analysis of apps.

OPP-115

Policy checking -> static analysis -> identify and analyze potential privacy requirement inconsistencies between policies and apps -> predict such potential inconsistencies based on app metadata (Top Developer badge)

71% (6,198/8,696) of apps without a policy link are indeed not adhering to the policy requirement.

Apps with recent update years have a policy more often than those that were updated longer ago.

Apps with high install rates have a policy more often than apps with average/low install rates.

Top Developer badge / for young users

Classifier: OPP-115 -> using information gain and tf-idf, identify the most meaningful keywords for each practice and create sets of keywords -> extract all sentences from a policy that contain at least one of the keywords -> a second set of keywords that refers to the actions of a data practice -> ”share”: ”will/not share” -> SVM and LR
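The two keyword stages of the classifier pipeline can be sketched as follows. The keyword sets are hypothetical; the slide says the real ones were derived with information gain and tf-idf. The resulting feature would be passed to the SVM or LR classifier, which is not shown.

```python
# Hypothetical keyword sets, not the ones derived in the paper.
PRACTICE_KEYWORDS = ("share", "disclose", "third part")
NEGATED_ACTIONS = ("not share", "never share", "not disclose")

def candidate_sentences(sentences: list[str]) -> list[str]:
    """Stage 1: keep only sentences containing at least one practice keyword."""
    return [s for s in sentences if any(k in s.lower() for k in PRACTICE_KEYWORDS)]

def action_feature(sentence: str) -> str:
    """Stage 2: label whether the sharing action is affirmed or negated."""
    text = sentence.lower()
    return "not share" if any(k in text for k in NEGATED_ACTIONS) else "will share"

policy_sentences = [
    "We may share your device ID with advertisers.",
    "We will not share your contacts.",
    "Contact us by email.",
]
candidates = candidate_sentences(policy_sentences)
features = [action_feature(s) for s in candidates]
print(list(zip(candidates, features)))
```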

For each app, the system builds an API invocation map -> checks the package names of the callers against the package names of third-party libraries (10) to detect sharing of data -> only if the analysis detects that the library has the required permission (permission extraction) is the app classified as sharing device IDs with third parties.
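A rough sketch of that decision rule is given below. The library package prefix, the invocation map, and the function name are hypothetical; `getDeviceId` and `READ_PHONE_STATE` are real Android names, but tying the check to exactly this API and permission is an assumption made for illustration.

```python
# Hypothetical third-party library package prefixes (the paper checks against its own list).
THIRD_PARTY_PREFIXES = ("com.example.adlib.",)

DEVICE_ID_API = "android.telephony.TelephonyManager.getDeviceId"
REQUIRED_PERMISSION = "android.permission.READ_PHONE_STATE"

def shares_device_id(invocations: dict[str, list[str]], permissions: set[str]) -> bool:
    """Classify an app as sharing device IDs with third parties.

    `invocations` maps an API name to the classes that call it.
    The app is flagged only if a third-party class calls the API
    and the required permission is present.
    """
    if REQUIRED_PERMISSION not in permissions:
        return False
    callers = invocations.get(DEVICE_ID_API, [])
    return any(caller.startswith(THIRD_PARTY_PREFIXES) for caller in callers)

invocation_map = {DEVICE_ID_API: ["com.example.adlib.Tracker", "com.example.app.Login"]}
flagged = shares_device_id(invocation_map, {REQUIRED_PERMISSION})
print(flagged)
```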

Slide11

04. Other papers

The Creation and Analysis of a Website Privacy Policy Corpus

(ACL 2016)

We monitored Google Trends (Google, 2015) for one month (May 2015) to collect the top five search queries for each trend.

Then, for each query we retrieved the first five websites listed on each of the first 10 pages of results. The annotation scheme was then applied to additional policies and refined over multiple iterations during discussions among experts.

Automatically assign category labels to policy segments: a binary vector of category-specific labels per segment -> logistic regression, SVM, and HMM.
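The binary label vector can be illustrated with a short sketch. The category names below are an illustrative subset and should not be read as the corpus's full annotation scheme; the function name is hypothetical.

```python
# Illustrative subset of category names; consult the corpus for the actual scheme.
CATEGORIES = [
    "First Party Collection/Use",
    "Third Party Sharing/Collection",
    "Data Retention",
    "Data Security",
]

def to_label_vector(labels: set[str]) -> list[int]:
    """Encode a segment's category labels as one 0/1 entry per category."""
    unknown = labels - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return [1 if category in labels else 0 for category in CATEGORIES]

segment_labels = {"Data Retention", "Data Security"}
vector = to_label_vector(segment_labels)
print(vector)
```

Because a segment can carry several labels at once, each position in the vector can be learned as its own binary classification task.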

Slide12

04. Other papers

CLAUDETTE: an Automated Detector of Potentially Unfair Clauses in Online Terms of Service

We present an experimental study where machine learning is employed to automatically detect such potentially unfair clauses.

Slide13

04. Other papers

Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep Learning

It enables scalable, dynamic, and multi-dimensional queries on natural language privacy policies.

(1) an unsupervised stage, in which we build domain-specific word vectors (i.e., word embeddings) for privacy policies from unlabeled data, and (2) a supervised stage, in which we train a novel hierarchy of privacy-text classifiers, based on neural networks, that leverages the word vectors.

OPP-115

Slide14

04. Other papers

A Machine Learning Solution to Assess Privacy Policy Completeness

We define a set of privacy categories that the policy should cover based on privacy directives, regulations and common practice, then use text categorization and machine learning techniques to check which paragraphs in the natural language privacy policy belong to which category, and grade the policy based on the categories covered.

A high completeness grade only means the policy covers most of the categories, but says nothing about their semantic value.
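One plausible reading of the grade is the fraction of required categories that at least one paragraph was assigned to. The slide does not give the formula, so the sketch below is an assumption; the category names and function name are hypothetical.

```python
def completeness_grade(paragraph_categories: list[str], required: set[str]) -> float:
    """Fraction of required categories covered by at least one paragraph (assumed formula)."""
    if not required:
        raise ValueError("at least one required category is needed")
    covered = set(paragraph_categories) & required
    return len(covered) / len(required)

# Hypothetical required categories and per-paragraph predictions.
required = {"collection", "sharing", "retention", "security"}
predicted = ["collection", "collection", "security"]
grade = completeness_grade(predicted, required)
print(grade)
```

As the slide notes, such a grade measures coverage only: two policies with the same grade can differ greatly in what they actually promise.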

We selected Naïve Bayes (NB), Linear Support Vector Machine (LSVM), and Ridge Regression (Ridge) from the ‘linear’ algorithms, and the k-Nearest Neighbor (k-NN), the Decision Tree (DT), and the Support Vector Machine (SVM) with non-linear kernel from the ‘non-linear’ algorithms. The voting committee method combines the results of different classifiers into a voting committee.
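A voting committee can be sketched as a plain majority vote. The slide does not say how votes are combined or how ties are broken, so majority voting with first-seen tie-breaking is an assumption; the three classifiers below are hypothetical keyword stubs, not the trained models listed above.

```python
from collections import Counter
from typing import Callable

Classifier = Callable[[str], str]

def committee_vote(predictions: list[str]) -> str:
    """Return the most common label; ties go to the label seen first (assumption)."""
    if not predictions:
        raise ValueError("no predictions to combine")
    return Counter(predictions).most_common(1)[0][0]

def classify(paragraph: str, committee: list[Classifier]) -> str:
    """Ask every committee member for a label and combine the answers."""
    return committee_vote([member(paragraph) for member in committee])

# Hypothetical stub classifiers standing in for trained models.
def by_store(p: str) -> str:
    return "retention" if "store" in p.lower() else "other"

def by_days(p: str) -> str:
    return "retention" if "days" in p.lower() else "other"

def by_encrypt(p: str) -> str:
    return "security" if "encrypt" in p.lower() else "other"

label = classify("We store logs for 30 days.", [by_store, by_days, by_encrypt])
print(label)
```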

In our context, the pre-classified documents are paragraphs from manually labeled privacy policies.

In defining the privacy categories we considered different privacy regulations and directives, such as the EU 95/46/EC...