Slide 1
Mining Subjective Properties from the Web
Writers: Immanuel Trummer, Alon Halevy, Hongrae Lee, Sunita Sarawagi, Rahul Gupta
Presenting: Amir Taubenfeld

Slide 2
Outline for Today's Lecture
Motivation: the future of search is in structured data
Introduction to the Surveyor system
Getting into the details:
  Extracting subjective properties from the Web and the polarity of statements
  Determining the dominant opinion of the authors of the Web
Experimental evaluation & conclusions

Slide 3
Outline for Today's Lecture (recap)

Slide 4
Answering queries with links

Slide 5
Answering queries with links – PageRank
PageRank is the algorithm from which Google began. It calculates the probability that a person randomly clicking on links will arrive at any particular page.

Slide 6
Answering queries with links
PageRank is awesome! It shows us the best links for the words we are looking for, but it does not understand our queries.

Slide 7
Now: Answering queries from Structured Data

Slide 8
Now: Answering queries from Structured Data
We all remember YAGO from the Databases course.

Slide 9
Now: Answering queries from Structured Data
But regular knowledge bases don't capture subjective properties.

Slide 10
Current limitation: Objective Queries
We need a subjective knowledge base.

Slide 11
Subjective Property Mining
Objective: Create a subjective knowledge base.
Main challenge: there is no ground truth - we need to aggregate many opinions.

Slide 12
Outline for Today's Lecture (recap)

Slide 13
The Surveyor System
A specialized system for mining subjective properties from the Web ("Surveyor" is derived from "survey").

Slide 14
System overview – Course of action
Extract statements involving entities and subjective properties from the Web.
Determine the polarity of each statement.
Aggregate the results.
Determine the dominant opinion.

Slide 15
System overview – Extraction & aggregation

Slide 16
System overview – dominant opinion
Is it enough for concluding that kittens are cute and tigers are not cute?

Slide 17
System overview – dominant opinion
Is it enough for concluding that kittens are cute and tigers are not cute?
Of course not, otherwise I wouldn't have asked this question. But why?

Slide 18
System overview – dominant opinion
(Tel-Aviv, Safe city): pro: 5, contra: 10

Slide 19
System overview – dominant opinion
(Tel-Aviv, Safe city): pro: 5, contra: 10
Does it mean that Tel-Aviv is not safe?

Slide 20
System overview – dominant opinion
(Tel-Aviv, Safe city): pro: 5, contra: 10
Does it mean that Tel-Aviv is not safe?
No! We must consider skew.

Slide 21
System overview – dominant opinion
(Ibtin, Big city): pro: 0, contra: 0
Can we use this to our advantage?

Slide 22
System overview – dominant opinion
Can we use this to our advantage?

Slide 23
System overview – dominant opinion
Can we use this to our advantage?
Big cities tend to be mentioned more often on the Web than small cities.

Slide 24
System overview – dominant opinion
Example: Taking skew and correlation into account improves the model.

Slide 25
System overview – dominant opinion
Conclusion I: Skew & correlation exist, and we can use them to our advantage.
Conclusion II: Skew and correlation are property- & type-specific.

Slide 26
System overview – Putting it together

Slide 27
Outline for Today's Lecture (recap)

Slide 28
Getting into the details of Surveyor
The Surveyor system can be divided into two main algorithms:
Extracting evidence about entities from the Web (evidence = a statement connecting an entity to a property).
Aggregating the evidence from the previous step and determining the dominant opinion.

Slide 29
Outline for Today's Lecture (recap)

Slide 30
Extracting evidences – problem definition
Input: a collection of annotated Web documents; a knowledge base containing entities and their types.
Output: a set of tuples connecting entities to subjective properties, together with the polarity of each statement.

Slide 31
Extracting evidences – annotating documents
The Surveyor system receives as input a Web snapshot that was preprocessed with NLP methods such as the Stanford Parser. The output of such parsers is a dependency tree that represents the lexical structure of the sentence.

Slide 32
60 seconds on NLP
Natural language processing is a very interesting story. Unfortunately it is also a very long story, thus we will only discuss it in a nutshell.

Slide 33
60 seconds on NLP
Natural language processing refers to the ability of computers to process text in natural human language, such as Hebrew, rather than an artificial language such as Java.

Slide 34
60 seconds on NLP
In order to do that, we need to parse natural-language text into a more formal representation; usually this representation is a tree.

Slide 35
60 seconds on NLP
One basic model uses a probabilistic CFG in order to create a parsing tree that represents the formal structure of a given sentence.

Slide 36
60 seconds on NLP
Each derivation rule has a different probability, and the goal is to find the parsing tree with the highest total probability. A naïve algorithm has exponential running time, but using dynamic programming we can get polynomial complexity.

Slide 37
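As an aside, the polynomial-time dynamic program alluded to here can be sketched as a minimal probabilistic chart parser over grammars in Chomsky normal form; the toy grammar, its probabilities, and the example sentence are invented for illustration:

```python
# A minimal probabilistic-CKY-style sketch: the DP that finds the highest-
# probability parse under a PCFG in Chomsky normal form. The toy grammar
# and probabilities below are illustrative, not from the lecture.
def cky(words, lexical, binary):
    """lexical: {(A, word): prob}; binary: {(A, B, C): prob} for A -> B C.
    Returns {(i, j, A): best probability of A spanning words[i:j]}."""
    n = len(words)
    best = {}
    for i, w in enumerate(words):                      # width-1 spans
        for (A, word), p in lexical.items():
            if word == w:
                best[(i, i + 1, A)] = max(best.get((i, i + 1, A), 0.0), p)
    for width in range(2, n + 1):                      # wider spans by DP
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):                  # split point
                for (A, B, C), p in binary.items():
                    lp = best.get((i, k, B), 0.0)
                    rp = best.get((k, j, C), 0.0)
                    if lp and rp and p * lp * rp > best.get((i, j, A), 0.0):
                        best[(i, j, A)] = p * lp * rp
    return best

lexical = {("NP", "tigers"): 0.5, ("NP", "kittens"): 0.5, ("V", "chase"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
chart = cky(["tigers", "chase", "kittens"], lexical, binary)
# chart[(0, 3, "S")] is the probability of the best full parse: 0.25
```

The chart has O(n^2) spans and each is filled by trying O(n) split points, which is where the polynomial bound comes from.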
Back to the paper

Slide 38
Extracting evidences – matching patterns
Red = tokens that together form the property.
Green = the entities.

Slide 39
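To make the pattern-matching idea concrete, here is a toy sketch of extraction over raw text, using a regex in place of the paper's dependency-tree patterns; the pattern itself, the capitalized-entity assumption, and the example sentences are all illustrative:

```python
import re

# A toy sketch of the pattern-matching step: extracting (entity, property)
# evidence from "X is (a) <property> ..." sentences with a regex. This
# stands in for the real system, which matches patterns over Stanford-
# parser dependency trees, not raw strings.
PATTERN = re.compile(r"^(?P<entity>[A-Z][\w\s-]*?) is (a |an )?(?P<prop>\w+)")

def extract(sentence):
    m = PATTERN.match(sentence)
    return (m.group("entity"), m.group("prop")) if m else None

print(extract("France is warm"))                # ('France', 'warm')
print(extract("Greece is a southern country"))  # ('Greece', 'southern')
```

Even this toy version shows why the next step (filtering) is needed: it happily extracts ("Greece", "southern") although "southern" is objective, and would extract ("New York", "bad") from "New York is bad for parking" while ignoring the restriction.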
Extracting evidences – filtering
New York is bad for parking
France is warm
Greece is a southern country

Slide 40
Slide 41
Extracting evidences – filtering
New York is bad for parking
France is warm
Greece is a southern country
Solution: Check for subtrees that can represent restrictions.

Slide 42
Slide 43
Extracting evidences – filtering
New York is bad for parking
France is warm
Greece is a southern country
Solution: Check for subtrees that can represent restrictions.
Solution: Don't allow co-reference to the same entity.

Slide 44
Extracting evidences – determine polarity

Slide 45
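As an aside, a heavily simplified sketch of the polarity step: treat an evidence statement as "contra" when a negator appears just before the property. The window size and negator list are illustrative assumptions, not the paper's actual rules.

```python
# Toy polarity check (illustrative, not the paper's rules): an evidence
# statement counts as "contra" (-1) when a negator token appears in the
# few tokens before the property token, and "pro" (+1) otherwise.
NEGATORS = {"not", "n't", "never"}

def polarity(tokens, prop_index):
    """+1 if the property is asserted, -1 if it is negated."""
    window = tokens[max(0, prop_index - 3):prop_index]
    return -1 if any(t.lower() in NEGATORS for t in window) else +1

assert polarity(["Tel-Aviv", "is", "safe"], 2) == +1
assert polarity(["Tel-Aviv", "is", "not", "safe"], 3) == -1
```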
Outline for Today's Lecture (recap)

Slide 46
Estimating the dominant opinion – problem definition
Input: a knowledge base that links types to entities; the set of evidence tuples from stage 1.
Output: the dominant opinion on whether property P applies to entity E.

Slide 47
Estimating the dominant opinion
As we saw previously, estimating the dominant opinion based on majority vote counting does not work very well. We must take into consideration different types of biases.

Slide 48
Estimating the dominant opinion
Each property-type combination is associated with two probability distributions over the statement counts. The first distribution represents the probability of an evidence count given that the dominant opinion applies the property to the entity, whereas in the second, the dominant opinion does not apply.

Slide 49
Estimating the dominant opinion
Therefore, we assume that each evidence tuple was drawn from one of the two possible probability distributions. If we know how to express those two distributions, then we can calculate for each evidence tuple the probability with which it was drawn from one distribution or the other.

Slide 50
Estimating the dominant opinion

Slide 51
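A minimal sketch of this "which distribution drew the tuple" step, assuming the two distributions are already known as products of Poissons (the rates, the 0.5 prior, and the counts are illustrative assumptions):

```python
import math

# Sketch: given Poisson rates for (pro, contra) counts under dominant
# opinion '+' and under '-', compute P(dominant = '+' | counts) by Bayes'
# rule. All numeric parameters below are illustrative assumptions.
def poisson(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def p_dominant_pos(pro, contra, rates_pos, rates_neg, prior_pos=0.5):
    like_pos = poisson(pro, rates_pos[0]) * poisson(contra, rates_pos[1])
    like_neg = poisson(pro, rates_neg[0]) * poisson(contra, rates_neg[1])
    num = prior_pos * like_pos
    return num / (num + (1 - prior_pos) * like_neg)

# 5 pro vs 10 contra statements; under skew ('+' entities also attract
# many contra mentions) the dominant opinion can still come out '+'.
p = p_dominant_pos(pro=5, contra=10, rates_pos=(6.0, 9.0), rates_neg=(1.0, 12.0))
```

This mirrors the Tel-Aviv example: a raw 5-vs-10 count looks negative, but once the two per-class distributions encode the skew, the posterior can still favor '+'.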
Modeling user behavior
In order to model the probability of receiving a certain number of positive or negative statements, we must model the probability that a single user decides to issue a positive or negative statement.

Slide 52
Modeling user behavior

Slide 53
Modeling user behavior
Now, let's write our model as a Bayesian network.

Slide 54
Modeling user behavior
Now, let's write our model as a Bayesian network.
But first, what is a Bayesian network?

Slide 55
Bayesian networks – definition
From Wikipedia: a model that represents a set of random variables and their conditional dependencies via a directed acyclic graph.
Example nodes: Rain, Sprinkler, Grass wet.

Slide 56
Bayesian networks – example

Slide 57
Bayesian networks – example
If we know that the grass is wet, we can calculate the probability that it was raining.

Slide 58
Bayesian networks – example
We calculate each element in the sum from the tables:
P(GrassWet, Sprinkler, Rain) = P(GrassWet | Sprinkler, Rain) · P(Sprinkler | Rain) · P(Rain)

Slide 59
Slide 60
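The grass-wet calculation can be checked by enumeration. The conditional probability tables below are the ones from the Wikipedia sprinkler example and may differ from the numbers on the lecture's slides:

```python
from itertools import product

# Inference by enumeration on the classic Rain/Sprinkler/GrassWet network,
# using the CPT values from the Wikipedia sprinkler example (assumption:
# the lecture's tables may differ). Computes P(Rain | GrassWet).
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {True: {True: 0.01, False: 0.99},    # P(S | R), keyed [rain][s]
               False: {True: 0.4, False: 0.6}}
P_GRASS = {(True, True): 0.99, (True, False): 0.9,   # P(G=wet | S, R),
           (False, True): 0.8, (False, False): 0.0}  # keyed (sprinkler, rain)

def joint(rain, sprinkler, grass_wet):
    p = P_RAIN[rain] * P_SPRINKLER[rain][sprinkler]
    pg = P_GRASS[(sprinkler, rain)]
    return p * (pg if grass_wet else 1 - pg)

# P(Rain | GrassWet) = sum_S P(R, S, G) / sum_{R, S} P(R, S, G)
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
p_rain_given_wet = num / den   # about 0.3577 with these tables
```

Each joint term is exactly the table product P(G | S, R) · P(S | R) · P(R) from the factorization above.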
Modeling user behavior
Now we are ready to write our model as a Bayesian network.

Slide 61
Modeling user behavior
The network is built up node by node: the dominant opinion, each user's opinion, whether each user posts their opinion, and the resulting pro/contra statement counts.

Slide 65
Modeling user behavior
Nodes: the Dominant opinion feeds each user's opinion (User 1 opinion … User n opinion); each opinion feeds whether that user posts it (User 1 posts opinion? … User n posts opinion?); the posts determine the # Pro statements and # Contra statements.
Given: the statement counts. To infer: the dominant opinion.

Slide 66
Modeling user behavior
Notation: D = Dominant opinion, O = User's Opinion, S = User makes Statement, C = Count.

Slide 69
Modeling user behavior
Our goal is to compute P(D | C), the probability of each dominant opinion given the observed counts.

Slide 70
Modeling user behavior
Our goal is to compute P(D | C), which by Bayes' rule is:
P(D | C) = P(C | D) · P(D) / P(C)

Slide 71
Modeling user behavior
Because C is a deterministic function of the statement variables S, we first solve for P(S | D).

Slide 72
Modeling user behavior
And from the Bayesian network, we obtain (for each user i):
P(S_i | D) = Σ_{O_i} P(S_i | O_i) · P(O_i | D)

Slide 73
Modeling user behavior
The count variables are obtained by summing up n statement variables, each of which can be +, −, or neutral. We assume that the statement variables are independent for different users, since the chance that two randomly selected documents on the Web are authored by the same person is negligible. This implies that the pair of counts follows a Multinomial distribution with parameters n and the per-user statement probabilities P(S = + | D) and P(S = − | D).

Slide 74
Modeling user behavior
And by assuming that n is very big compared to the expected counts, we can approximate the distribution as a product of two Poisson distributions, with rates n · P(S = + | D) and n · P(S = − | D).

Slide 75
Modeling user behavior
The bottom line is this: if we know the model parameters, then we can use our Bayesian network to compute two expressions that depend on the pro and contra counts. The first represents the probability distribution over counts when the dominant opinion is positive, whereas the second represents the distribution when the dominant opinion is negative.

Slide 76
Modeling user behavior
Example: if we assume the agreement parameter is relatively high, and the probability of posting a statement when the dominant opinion is '+' is significantly higher than when it is '−', we get the two distributions that we saw in the big-cities example.

Slide 77
Estimating model parameters
But how do we choose the model parameters?

Slide 78
Estimating model parameters
But how do we choose the model parameters?
Well, it is a kind of maximum-likelihood problem.

Slide 79
Maximum Likelihood – Reminder
A method for estimating the parameters of a statistical model given a set of sample data.
Example: estimating a Bernoulli parameter from observed samples.

Slide 80
Estimating model parameters
But in our case we don't have all the sample data.

Slide 81
Slide 82
Estimating model parameters – definitions
E: the set of tuples with positive/negative Evidence counts.
D: a vector of random variables that represent the possible Dominant opinion for each entity.
The parameter vector: the parameters for our Bayesian network (which we are trying to estimate).

Slide 83
Estimating model parameters

Slide 84
Estimating model parameters

Slide 85
Estimating model parameters
Line 6: done using the Bayesian network we defined previously.
Line 7: done by a special ML method that takes into account all possible sample data sets; each data set is weighted using the probabilities from the previous step.

Slide 86
Estimating model parameters – definitions
The "expectation maximization" function which we want to maximize: an expansion of the ML function that considers all the possible data sets, each one with a different weight. The weight function is the probability function that we computed in the previous step.

Slide 87
Estimating model parameters
Problem: exponential number of terms.

Slide 88
Estimating model parameters
Problem: exponential number of terms.
Thus, we will work with a summarized version that is linear in the number of entities.

Slide 89
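A compact sketch of the overall estimation loop. This is a simplified stand-in for the paper's estimator, under stated assumptions: each entity's (pro, contra) counts are drawn from a two-Poisson product whose rates depend on the hidden dominant opinion, and EM alternates an E-step (posterior over dominant opinions, via the model) with an M-step (responsibility-weighted parameter updates); the initial values and data are illustrative.

```python
import math

# Simplified EM for the aggregation model (illustrative stand-in, not the
# paper's exact estimator). Hidden: dominant opinion D in {+, -} per
# entity. Observed: (pro, contra) counts, modeled as products of Poissons.
def poisson(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def em(data, iters=50):
    # data: list of (pro, contra) counts; initial guesses are arbitrary.
    prior_pos, rates = 0.5, {"+": [2.0, 1.0], "-": [1.0, 2.0]}
    for _ in range(iters):
        resp = []                                     # E-step: P(D=+ | counts)
        for pro, con in data:
            lp = prior_pos * poisson(pro, rates["+"][0]) * poisson(con, rates["+"][1])
            ln = (1 - prior_pos) * poisson(pro, rates["-"][0]) * poisson(con, rates["-"][1])
            resp.append(lp / (lp + ln))
        prior_pos = sum(resp) / len(resp)             # M-step: weighted means
        for d, sign in (("+", 1), ("-", 0)):
            w = [r if sign else 1 - r for r in resp]
            rates[d][0] = sum(wi * p for wi, (p, _) in zip(w, data)) / sum(w)
            rates[d][1] = sum(wi * c for wi, (_, c) in zip(w, data)) / sum(w)
    return prior_pos, rates

# Entities with clearly '+'-like counts (many pro) and '-'-like counts.
data = [(6, 1), (5, 0), (7, 2), (1, 6), (0, 5), (2, 7)]
prior_pos, rates = em(data)
```

Each iteration touches every entity once, which matches the "linear in the number of entities" point above.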
Estimating model parameters
Last step – differentiate and compare to 0.

Slide 90
Estimating model parameters
Last step – differentiate and compare to 0.
I will leave it for you as a home exercise.

Slide 91
Estimating model parameters
Last step – differentiate and compare to 0.
I will leave it for you as a home exercise. Submit it to Schreiber, cell 777.

Slide 92
Outline for Today's Lecture (recap)

Slide 93
Experimental evaluation
Surveyor was applied on a 40TB annotated Web snapshot. The data processing pipeline was executed on a large cluster (5,000 nodes) and took 2 hours. It inferred the dominant opinion for over 4 billion entity-property pairs.

Slide 94
Statistics

Slide 95
Experiment against AMT workers
Selected 500 entity-property pairs: 5 types × 20 entities × 5 properties. Compared against 20 AMT (Amazon Mechanical Turk) workers: each worker was asked about each of the 500 entity-property pairs, for 10,000 opinions in total.

Slide 96
Experiment against AMT workers

Slide 97
Experiment against AMT workers

Slide 98
Conclusions
Introduced the new problem of "Subjective Property Mining".
There is a need for a special type of system to solve this problem.
Introduced the Surveyor system.