Slide1
Le Yu, Xiapu Luo §, Xule Liu, Tao Zhang
Department of Computing, The Hong Kong Polytechnic University
The Hong Kong Polytechnic University Shenzhen Research Institute
{cslyu, csxluo, csxliu, cstzhang}@comp.polyu.edu.hk
2016 DSN (CCF B)
Can We Trust the Privacy Policies of Android Apps?
Slide2
Overview
A novel approach (NLP and static analysis) to automatically identify three kinds of problems in privacy policies:
1. Incomplete privacy policy. The privacy policy does not cover all of the app's behaviors that access sensitive information.
2. Incorrect privacy policy. The privacy policy declares that the app will not access user information, but the app does.
3. Inconsistent privacy policy. The privacy policy of an app conflicts with that of its third-party libraries.
Slide3
Overview
Slide4
Overview
(1) Privacy policy analysis module. It analyzes a privacy policy to determine the information (not) to be collected, used, retained, or disclosed.
(2) Static analysis module. It inspects an app's bytecode to decide whether the app will collect or retain private information.
(3) Problem identification module. It applies models of the three kinds of problems to identify incomplete, incorrect, and inconsistent privacy policies.
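As a rough illustration of how the outputs of modules (1) and (2) could feed module (3), the sketch below models both outputs as plain sets of resource names. The function names and the set-based model are assumptions made for illustration, not PPChecker's actual interfaces.

```python
def find_incomplete(policy_mentioned, code_accessed):
    """Resources the code accesses that the policy does not mention at all."""
    return set(code_accessed) - set(policy_mentioned)


def find_incorrect(policy_denied, code_accessed):
    """Resources the policy says are NOT accessed, but the code accesses."""
    return set(policy_denied) & set(code_accessed)


# Hypothetical module outputs for one app.
policy_declared = {"location", "email"}   # policy says these are collected
policy_denied = {"contacts"}              # policy says these are not collected
code_accessed = {"location", "contacts", "device id"}

policy_mentioned = policy_declared | policy_denied
incomplete = find_incomplete(policy_mentioned, code_accessed)
incorrect = find_incorrect(policy_denied, code_accessed)
print(sorted(incomplete))
print(sorted(incorrect))
```

In this toy input, "device id" is accessed but never mentioned (incomplete), and "contacts" is denied but accessed (incorrect).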
Slide5
01. Privacy Policy Analysis Module
Target verbs: collect, use, retain, disclose, and their passive-voice forms.
Step 1: Sentence extraction: use the Natural Language Toolkit (NLTK) to divide the text into sentences.
Step 2: Syntactic analysis: use the Stanford Parser to obtain each sentence's syntactic tree and dependency relations.
Slide6
01. Privacy Policy Analysis Module
Step 3: Pattern generation: the seed pattern is subject-verb-object, and the initial verbs are “collect”, “use”, “retain”, and “disclose”.
Subjects and objects whose frequencies are higher than the median are inserted into the subject list and the object list, respectively, and these lists are used to find new patterns. Then, we look for further patterns using the subject-“allowed”-“access”-object pattern.
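One plausible reading of the median-frequency step can be sketched as follows. The sketch assumes that (subject, verb, object) triples have already been extracted from the dependency parse; the helper names and the toy triples are hypothetical, and the slides do not confirm that PPChecker requires both the subject and the object to match.

```python
from collections import Counter
from statistics import median

SEED_VERBS = {"collect", "use", "retain", "disclose"}


def frequent(items):
    """Return the items whose frequency is strictly above the median frequency."""
    counts = Counter(items)
    if not counts:
        return set()
    cutoff = median(counts.values())
    return {item for item, n in counts.items() if n > cutoff}


def expand_verbs(triples, seed_verbs=SEED_VERBS):
    """Find new verbs that link a frequent subject to a frequent object."""
    seeded = [t for t in triples if t[1] in seed_verbs]
    subjects = frequent(s for s, _, _ in seeded)
    objects = frequent(o for _, _, o in seeded)
    return {
        v for s, v, o in triples
        if v not in seed_verbs and s in subjects and o in objects
    }


# Toy (subject, verb, object) triples.
triples = [
    ("we", "collect", "location"),
    ("we", "use", "location"),
    ("we", "retain", "email"),
    ("partners", "disclose", "location"),
    ("we", "gather", "location"),
    ("users", "upload", "photos"),
]
new_verbs = expand_verbs(triples)
print(new_verbs)
```

Here "we" and "location" are the only subject and object above the median, so "gather" is picked up as a new verb while "upload" is not.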
Slide7
01. Privacy Policy Analysis Module
Step 5: Negation analysis: PPChecker determines whether a sentence is negative by checking for the existence of negation words.
Information elements extraction: main verb, action executor, resource, and constraint.
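A minimal sketch of the negation check and of a record for the four information elements is shown below. The negation word list and the field names are assumptions; the slides do not give PPChecker's actual list, and the elements are filled in by hand here instead of being read from a parse tree.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed negation word list, for illustration only.
NEGATION_WORDS = {"not", "no", "never", "neither", "nor"}


def is_negative(sentence: str) -> bool:
    """True if the sentence contains a negation word or an n't contraction."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return any(tok in NEGATION_WORDS or tok.endswith("n't") for tok in tokens)


@dataclass
class InformationElements:
    main_verb: str
    executor: str
    resource: str
    constraint: Optional[str] = None
    negative: bool = False


sentence = "We will not collect your location."
elements = InformationElements(
    main_verb="collect",
    executor="we",
    resource="location",
    negative=is_negative(sentence),
)
print(elements)
```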
Slide8
02. Static Analysis Module
Collected information and retained information.
Android property graph (APG), abstract syntax tree (AST), interprocedural control-flow graph (ICFG), method call graph (MCG), and system dependency graph (SDG) of the app.
Collected information: detect invocations of each sensitive API by querying the graph database.
Retained information: static taint analysis.
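The collected-information check can be approximated with a reachability query over a method call graph. In the sketch below an in-memory dictionary stands in for the graph database, and the app method names and the API-to-information mapping are hypothetical; it does not perform the taint analysis used for retained information.

```python
from collections import deque

# Assumed mapping from sensitive Android APIs to the information they expose.
SENSITIVE_APIS = {
    "android.telephony.TelephonyManager.getDeviceId": "device id",
    "android.location.LocationManager.getLastKnownLocation": "location",
}


def collected_information(call_graph, entry_points):
    """Breadth-first search from the entry points; report reachable sensitive APIs."""
    seen = set(entry_points)
    queue = deque(entry_points)
    collected = set()
    while queue:
        method = queue.popleft()
        if method in SENSITIVE_APIS:
            collected.add(SENSITIVE_APIS[method])
        for callee in call_graph.get(method, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return collected


# Toy call graph: caller -> list of callees.
call_graph = {
    "com.example.MainActivity.onCreate": ["com.example.Tracker.start"],
    "com.example.Tracker.start": [
        "android.location.LocationManager.getLastKnownLocation"
    ],
    "com.example.Unused.dead": [
        "android.telephony.TelephonyManager.getDeviceId"
    ],
}
result = collected_information(call_graph, ["com.example.MainActivity.onCreate"])
print(result)
```

Only the location API is reachable from the entry point, so the unreachable device ID call is not reported.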
Slide9
03. Problem Identification Module
1: Detecting incomplete privacy policy through description and code
2: Discovering incorrect privacy policy
3: Revealing inconsistent privacy policy:
(1) AppSent_i's and LibSent_j's main verbs belong to the same category (VP_collect, VP_use, VP_retain, or VP_disclose);
(2) AppSent_i is a negative sentence and LibSent_j is a positive sentence;
(3) AppSent_i and LibSent_j refer to the same resource.
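The three conditions can be written as a single predicate, assuming that a sentence pair is flagged only when all three hold. The verb-to-category table and the exact string comparison of resources are simplifications introduced for this sketch.

```python
from dataclasses import dataclass

# Assumed mapping from main verbs to the four verb categories.
VERB_CATEGORY = {
    "collect": "collect", "gather": "collect",
    "use": "use",
    "retain": "retain", "store": "retain",
    "disclose": "disclose", "share": "disclose",
}


@dataclass(frozen=True)
class PolicySentence:
    main_verb: str
    resource: str
    negative: bool


def is_inconsistent(app_sent, lib_sent):
    """Check conditions (1)-(3) for one app sentence and one library sentence."""
    app_cat = VERB_CATEGORY.get(app_sent.main_verb)
    lib_cat = VERB_CATEGORY.get(lib_sent.main_verb)
    same_category = app_cat is not None and app_cat == lib_cat      # (1)
    app_denies_lib_admits = app_sent.negative and not lib_sent.negative  # (2)
    same_resource = app_sent.resource == lib_sent.resource          # (3)
    return same_category and app_denies_lib_admits and same_resource


app = PolicySentence(main_verb="collect", resource="location", negative=True)
lib = PolicySentence(main_verb="gather", resource="location", negative=False)
print(is_inconsistent(app, lib))
```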
Slide10
04. Other papers
Automated Analysis of Privacy Requirements for Mobile Apps
(NDSS 17) (AAAI 16)
In this study we introduce a scalable system to help analyze and predict Android apps’ compliance with privacy requirements. Our analysis of 17,991 free Android apps shows the viability of combining machine learning-based privacy policy analysis with static code analysis of apps.
OPP-115
Policy checking -> static analysis -> identify and analyze potential privacy requirement inconsistencies between policies and apps -> predict such potential inconsistencies based on app metadata (Top Developer badge)
71% (6,198/8,696) of apps without a policy link are indeed not adhering to the policy requirement.
Apps that were updated recently more often have a policy than those that were updated longer ago.
Apps with high install rates more often have a policy than apps with average or low install rates.
Top Developer badge / for young users
Classifier: OPP-115 -> using information gain and tf-idf, identify the most meaningful keywords for each practice and create sets of keywords -> extract all sentences from a policy that contain at least one of the keywords -> apply a second set of keywords that refers to the actions of a data practice -> “share”: “will/not share” -> SVM and LR
For each app, the system builds an API invocation map -> check the package names of the callers against the package names of third-party libraries (10) to detect sharing of data -> only if the analysis detects that the library has the required permission (permission extraction) is the app classified as sharing device IDs with third parties
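The third-party sharing check can be sketched as a package-name match followed by a permission check. The library package names below are hypothetical, the invocation map is modeled as a dictionary from API name to caller classes, and the permission check is simplified to a lookup in the app's declared permissions.

```python
DEVICE_ID_API = "android.telephony.TelephonyManager.getDeviceId"
DEVICE_ID_PERMISSION = "android.permission.READ_PHONE_STATE"

# Hypothetical third-party library packages.
THIRD_PARTY_PACKAGES = ["com.example.ads", "com.example.analytics"]


def in_package(class_name, package):
    """True if the fully qualified class name lives under the package."""
    return class_name.startswith(package + ".")


def shares_device_id(invocation_map, permissions):
    """Flag sharing only if a library class calls the API and the permission is present."""
    if DEVICE_ID_PERMISSION not in permissions:
        return False
    callers = invocation_map.get(DEVICE_ID_API, [])
    return any(
        in_package(caller, package)
        for caller in callers
        for package in THIRD_PARTY_PACKAGES
    )


invocation_map = {
    DEVICE_ID_API: ["com.example.ads.Tracker", "com.myapp.MainActivity"],
}
print(shares_device_id(invocation_map, {DEVICE_ID_PERMISSION}))
```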
Slide11
04. Other papers
The Creation and Analysis of a Website Privacy Policy Corpus
(ACL 2016)
We monitored Google Trends (Google, 2015) for one month (May 2015) to collect the top five search queries for each trend.
Then, for each query we retrieved the first five websites listed on each of the first 10 pages of results. The annotation scheme was then applied to additional policies and refined over multiple iterations during discussions among experts.
Automatically assign category labels to policy segments: a binary vector of category-specific labels per segment -> logistic regression, SVM, and HMM.
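The binary label vector can be illustrated as follows. The three category names are an illustrative subset chosen for this sketch, not the corpus's full annotation scheme, and the classifier training itself is omitted.

```python
# Illustrative subset of category names (assumed for this sketch).
CATEGORIES = [
    "First Party Collection/Use",
    "Third Party Sharing/Collection",
    "Data Retention",
]


def to_label_vector(segment_labels):
    """Encode the set of categories assigned to one segment as a 0/1 vector."""
    return [1 if category in segment_labels else 0 for category in CATEGORIES]


segments = [
    {"First Party Collection/Use", "Data Retention"},
    {"Third Party Sharing/Collection"},
    set(),
]
vectors = [to_label_vector(labels) for labels in segments]

# One column per category: a separate binary classifier could be trained on each.
retention_column = [vector[2] for vector in vectors]
print(vectors)
print(retention_column)
```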
Slide12
04. Other papers
CLAUDETTE: an Automated Detector of Potentially Unfair Clauses in Online Terms of Service
We present an experimental study where machine learning is employed to automatically detect such potentially unfair clauses.
Slide13
04. Other papers
Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep Learning
It enables scalable, dynamic, and multi-dimensional queries on natural language privacy policies.
(1) an unsupervised stage, in which we build domain-specific word vectors (i.e., word embeddings) for privacy policies from unlabeled data, and (2) a supervised stage, in which we train a novel hierarchy of privacy-text classifiers, based on neural networks, that leverages the word vectors.
OPP-115
Slide14
04. Other papers
A Machine Learning Solution to Assess Privacy Policy Completeness
We define a set of privacy categories that the policy should cover based on privacy directives, regulations and common practice, then use text categorization and machine learning techniques to check which paragraphs in the natural language privacy policy belong to which category, and grade the policy based on the categories covered.
A high completeness grade only means that the policy covers most of the categories, but says nothing about their semantic value.
We selected Naïve Bayes (NB), Linear Support Vector Machine (LSVM), and Ridge Regression (Ridge) from the ‘linear’ algorithms and the k-Nearest Neighbor (k-NN), the Decision Tree (DT), and the Support Vector Machine (SVM) with non-linear kernel from the ‘non-linear’ algorithms. The voting committee method combines the results of different classifiers into a voting committee.
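A voting committee can be sketched as a per-paragraph majority vote over the classifiers' predictions. Majority voting is an assumption here, since the slides do not say how the committee combines votes, and the category labels are hypothetical.

```python
from collections import Counter


def committee_vote(predictions):
    """Majority vote per paragraph; ties go to the label seen first."""
    voted = []
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Each inner list holds one classifier's predicted category per paragraph.
predictions = [
    ["collection", "retention", "security"],   # e.g. NB
    ["collection", "security", "security"],    # e.g. LSVM
    ["retention", "retention", "security"],    # e.g. k-NN
]
committee = committee_vote(predictions)
print(committee)
```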
In our context, the pre-classified documents are paragraphs from manually labeled privacy policies.
In defining the privacy categories we considered different privacy regulations and directives, such as the EU 95/46/EC...