Distributed Tree Kernels and Distributional Semantics: Between Syntactic Structures and Compositional Distributional Semantics
Fabio Massimo Zanzotto, ART Group, Dipartimento di Ingegneria dell'Impresa, University of Rome "Tor Vergata"

Prequel
Recognizing Textual Entailment (RTE)
The task (Dagan et al., 2005): given a text T and a hypothesis H, decide whether T implies H.
T1: "Farmers feed cows animal extracts"
H1: "Cows eat animal extracts"
P1: T1 → H1
RTE as a classification task: selecting the best learning algorithm and defining the feature space.
Learning RTE Classifiers: the feature space
Training examples:
P1: T1 "Farmers feed cows animal extracts" → H1 "Cows eat animal extracts"
P2: T2 "They feed dolphins fish" → H2 "Fish eat dolphins"
P3: T3 "Mothers feed babies milk" → H3 "Babies eat milk"
Relevant features for classification: rules with variables (first-order rules) that pair feed and eat fragments and track how the arguments map, e.g. feed(X, Y) → eat(X, Y) versus feed(X, Y) → eat(Y, X).
Learning RTE Classifiers: the feature space
Rules with variables (first-order rules) are encoded as pairs of syntactic tree fragments sharing variables, e.g. a fragment [VP [VB feed] [NP X] [NP Y]] paired with an [S … [VP [VB eat] …]] fragment over the same X and Y.
Zanzotto & Moschitti, Automatic learning of textual entailments with cross-pair similarities, COLING-ACL, 2006

RTE 2 Results
First Author (Group)       Accuracy   Average Precision
Hickl (LCC)                75.4%      80.8%
Tatu (LCC)                 73.8%      71.3%
Zanzotto (Milan & Rome)    63.9%      64.4%
Adams (Dallas)             62.6%      62.8%
Bos (Rome & Leeds)         61.6%      66.9%
Adding semantics: Distributional Semantics
Mehdad, Moschitti, Zanzotto, Syntactic/Semantic Structures for Textual Entailment Recognition, Proceedings of NAACL, 2010
[Figure: rules with variables whose anchor verbs, killed → died and murdered → died, are linked through distributional semantics.]
Promising!
Compositional Distributional Semantics (CDS)
Mitchell & Lapata (2008) set out a general model for bigrams that assigns a distributional meaning z to a sequence of two words "x y":
z = f(x, y, R, K)
where R is the relation between x and y and K is external knowledge. An active research area!
Compositional Distributional Semantics (CDS)
[Figure: a "distributional" semantic space containing vectors for hands, car, and moving; composing "distributional" meaning yields vectors for moving hands and moving car.]
Compositional Distributional Semantics (CDS)
Mitchell & Lapata (2008): z = f(x, y, R, K), where R is the relation between x and y and K is external knowledge.
[Figure: the vectors x for moving and y for hands are composed by f into z, the vector for moving hands.]
CDS: Full Additive Model
The full additive model: z = A_R x + B_R y.
Matrices A_R and B_R can be estimated with positive examples taken from dictionaries and with multivariate regression models; e.g. the dictionary entry contact /ˈkɒntækt/ [kon-takt] 2. close interaction pairs the headword contact with the phrase close interaction.
Zanzotto, Korkontzelos, Fallucchi, Manandhar, Estimating Linear Models for Compositional Distributional Semantics, Proceedings of the 23rd COLING, 2010
CDS: Recursive Full Additive Model
Let's scale up to sentences by recursively applying the model, e.g. for "cows eat animal extracts": compose animal and extracts with the N-N matrices, then combine the result with eat and cows through the V-N matrices.
Let's apply it to RTE: extremely poor results.
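The recursion can be sketched as follows, assuming one estimated matrix pair per syntactic relation (random stand-ins here) and an illustrative binary bracketing of the example sentence:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 50

# One (A_R, B_R) pair per syntactic relation R of the full additive model.
rels = {R: (rng.normal(size=(d, d)) / np.sqrt(d),
            rng.normal(size=(d, d)) / np.sqrt(d))
        for R in ("VN", "NN")}
lex = {w: rng.normal(size=d) for w in ("cows", "eat", "animal", "extracts")}

def compose(node):
    """Recursively apply z = A_R x + B_R y over a binary tree.

    A node is either a word (leaf) or a triple (R, left, right)."""
    if isinstance(node, str):
        return lex[node]
    R, left, right = node
    A, B = rels[R]
    return A @ compose(left) + B @ compose(right)

# «cows eat animal extracts»: N-N composes the noun pair, V-N climbs up.
tree = ("VN", "cows", ("VN", "eat", ("NN", "animal", "extracts")))
z = compose(tree)
```

Each level multiplies the lower vectors by another matrix pair, which is exactly what makes the behavior of deep compositions hard to control.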
Recursive Full Additive Model: a closer look
Evaluating the similarity between «chickens eat beef extracts» and «cows eat animal extracts»: the dot product between the two recursively composed vectors.
Ferrone & Zanzotto, Linear Compositional Distributional Semantics and Structural Kernels, Proceedings of the Joint Symposium on Semantic Processing, 2013
Ferrone & Zanzotto, Linear Compositional Distributional Semantics and Structural Kernels, Proceedings of the Joint Symposium on Semantic Processing, 2013
Recursive Full Additive Model: a closer look
[Figure: each term of the similarity factorizes into a structure part and a meaning part; is each factor < 1?]
The prequel…
Recognizing Textual Entailment: feature spaces of the rules with variables, then adding distributional semantics.
Distributional Semantics: binary CDS, then recursive CDS.
The recurring tension: structure versus meaning.
Distributed Tree Kernels
Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Tree Kernels
[Figure: the parse tree of "Farmers feed cows animal extracts" is mapped to a high-dimensional vector indexed by tree fragments t_i, t_j, …; a tree kernel is the dot product between two such vectors.]
Tree Kernels in Smaller Vectors
[Figure: the same fragment space, compressed into a much smaller vector.]
CDS desiderata:
- vectors are smaller;
- vectors are obtained with a compositional function.
Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Names for the «Distributed» World
Distributed Trees (DT), Distributed Tree Fragments (DTF), Distributed Tree Kernels (DTK): as we are encoding trees in small vectors, the tradition is distributed structures (Plate, 1994).
DTK: Expected properties and challenges
- Distributed Tree Fragments can be built compositionally.
- Distributed Tree Fragments are a nearly orthonormal base that embeds R^m in R^d.
- Distributed Trees can be efficiently computed.
- DTKs should approximate Tree Kernels.
Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Compositionally building Distributed Tree Fragments
Basic elements:
- N: a set of nearly orthogonal random vectors, one per node label;
- a basic vector composition function with some ideal properties.
A distributed tree fragment is the application of the composition function to the node vectors, in the order given by a depth-first visit of the tree.
Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
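A concrete sketch of the depth-first construction. The composition function here is a stand-in with the required non-commutative flavor (a fixed random permutation before an element-wise product), not the paper's exact definition; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4096
_vecs = {}

def label_vec(label):
    """Nearly orthogonal random unit vector for a node label."""
    if label not in _vecs:
        x = rng.normal(size=d)
        _vecs[label] = x / np.linalg.norm(x)
    return _vecs[label]

# A fixed random permutation makes the element-wise product
# non-commutative and non-associative.
perm = rng.permutation(d)

def comp(a, b):
    return np.sqrt(d) * (a[perm] * b)   # rescaled so norms stay near 1

def dtf(node):
    """Distributed tree fragment: apply the composition function to the
    node vectors in the order given by a depth-first visit."""
    if isinstance(node, str):
        return label_vec(node)
    label, *children = node
    out = label_vec(label)
    for child in children:
        out = comp(out, dtf(child))
    return out

frag = ("VP", ("VB", "feed"), "NP", "NP")
e = dtf(frag)
```

Distinct fragments end up with nearly orthogonal vectors, so dot products between sums of such encodings approximate counts of shared fragments.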
Building Distributed Tree Fragments
Properties of the ideal function:
- non-commutativity, with a very high degree k;
- non-associativity;
- bilinearity.
Approximation: we demonstrated that DTFs are a nearly orthonormal base.
Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Building Distributed Trees
Given a tree T, the distributed representation of its subtrees is the vector
DT(T) = Σ_{τ ∈ S(T)} DTF(τ)
where S(T) is the set of the subtrees of T (in the experiments, weighted by a decay factor λ).
[Figure: S(T) for the parse of "Farmers feed cows animal extracts" lists its subtrees.]
Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Building Distributed Trees: a more efficient approach
N(T) is the set of nodes of T. s(n) is defined recursively, with a base case when n is terminal and a composition over the children vectors when n → c1 … cm. Computing a Distributed Tree this way is linear with respect to the size of N(T).
Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Building Distributed Trees: a more efficient approach
Assuming the ideal basic composition function, it is possible to show that the recursion exactly computes DT(T) (see Theorem 1 in the paper).
Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
DTK: Expected properties and challenges
Distributed Tree Fragments are a nearly orthonormal base that embeds R^m in R^d:
- Property 1 (Nearly Unit Vectors)
- Property 2 (Nearly Orthogonal Vectors)
Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
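Both properties are easy to check numerically. A minimal sketch, assuming label vectors are sampled as normalized Gaussians (the label set is illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8192   # vector dimension used in the talk's experiments

# One random unit vector per node label (normalized Gaussian sampling).
labels = ["S", "NP", "VP", "VB", "NNS", "feed", "cows", "extracts"]
N = {}
for lab in labels:
    x = rng.normal(size=d)
    N[lab] = x / np.linalg.norm(x)

# Property 1 (Nearly Unit Vectors): holds by construction here.
norms = [float(np.linalg.norm(N[lab])) for lab in labels]

# Property 2 (Nearly Orthogonal Vectors): pairwise dot products concentrate
# around 0, with standard deviation about 1/sqrt(d), roughly 0.011 for d = 8192.
dots = [float(N[a] @ N[b])
        for i, a in enumerate(labels) for b in labels[i + 1:]]
print(max(abs(x) for x in dots))   # small
```

This concentration is what lets sums of composed vectors behave like sparse symbolic feature vectors.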
Experimental evaluation
- Evaluation of concrete composition functions: how well can concrete composition functions approximate the ideal function?
- Direct analysis: how well do DTKs approximate the original tree kernels (TKs)?
- Task-based analysis: how well do DTKs perform on actual NLP tasks, with respect to TKs?
Vector dimension = 8192
Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Towards reality: the ideal composition function is just that, ideal! Proposed approximations:
- shuffled normalized element-wise product;
- shuffled circular convolution.
It is possible to show that the properties of the ideal function statistically hold for the two approximations.
Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
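The two approximations can be sketched under one plausible reading (shuffle the arguments with fixed random permutations; the exact formulation is in the ICML paper), together with a statistical check of norm preservation and non-commutativity:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 8192
p1, p2 = rng.permutation(d), rng.permutation(d)   # fixed random shuffles

def unit():
    x = rng.normal(size=d)
    return x / np.linalg.norm(x)

def shuf_prod(a, b):
    """Shuffled normalized element-wise product, O(d)."""
    return np.sqrt(d) * (a[p1] * b[p2])

def shuf_circ_conv(a, b):
    """Shuffled circular convolution, via FFT in O(d log d)."""
    return np.real(np.fft.ifft(np.fft.fft(a[p1]) * np.fft.fft(b[p2])))

a, b = unit(), unit()
results = {}
for comp in (shuf_prod, shuf_circ_conv):
    ab, ba = comp(a, b), comp(b, a)
    norm = float(np.linalg.norm(ab))     # statistically close to 1
    cos = float(ab @ ba) / (norm * float(np.linalg.norm(ba)))
    results[comp.__name__] = (norm, cos) # cos near 0: non-commutative
```

The permutations break commutativity and associativity, while the rescaling (explicit for the product, implicit for the convolution of unit vectors) keeps composed vectors near unit norm.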
Direct Analysis
Spearman's correlation between DTK and TK values; test trees taken from the QC corpus and the RTE corpus.
Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Task-based Analysis
Question Classification and Recognizing Textual Entailment, with these realizations of the ideal function: shuffled normalized element-wise product and shuffled circular convolution.
Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012
Remarks
Distributed Tree Fragments (DTF) are a nearly orthonormal base that embeds R^m in R^d; Distributed Trees (DT) can be efficiently computed; Distributed Tree Kernels (DTK) approximate Tree Kernels.
Side effect: reduced time complexity
Tree kernels (TK) (Collins & Duffy, 2001) have quadratic complexity. Current techniques control this complexity (Moschitti, 2006), (Rieck et al., 2010), (Shin et al., 2011). DTKs change the complexity class, as they can be used with Linear Kernel Machines.

                  SVM+TK            LinearSVM+DTK, Shuf. Prod.   LinearSVM+DTK, Shuf. Circ. Conv.
Training          O(n^3 |N(T)|^2)   O(n |N(T)| d)                O(n |N(T)| d log d)
Classification    O(n |N(T)|^2)     O(|N(T)| d)                  O(|N(T)| d log d)

n: # of training examples; |N(T)|: # of nodes of the tree T
Zanzotto & Dell'Arciprete, Distributed Convolution Kernels on Countable Sets, Journal of Machine Learning Research, accepted conditioned on minor revisions
Sequel
- Towards structured prediction: Distributed Representation Parsing
- Generalizing the theory: Distributed Convolution Kernels on Countable Sets
- Adding back distributional semantics: Distributed Smoothed Tree Kernels
Distributed Representation Parsing (DRP): the idea
[Figure: map a sentence directly to the vector that the Distributed Tree Encoder (DT) would produce from its parse tree.]
Zanzotto & Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL Workshop on CVSC, 2013
Zanzotto & Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL Workshop on CVSC, 2013
Distributed Representation Parsing (DRP): the idea
[Figure: for «We booked the flight», the upper path runs a Symbolic Parser (SP) followed by the Distributed Tree Encoder (DT); the lower path, Distributed Representation Parsing (DRP), composes a Sentence Encoder (D) with a Transducer (P).]
Zanzotto & Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL Workshop on CVSC, 2013
DRP: Sentence Encoder
Non-lexicalized sentence models: bag-of-postags; n-grams of postags.
Lexicalized sentence models: unigrams; unigrams + n-grams of postags.
Zanzotto & Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL Workshop on CVSC, 2013
DRP: Transducer
Estimation: Principal Component Analysis and Partial Least Squares estimation of T = P S, where T stacks the target distributed trees and S the sentence encodings.
Approximation: the Moore-Penrose pseudoinverse (Penrose, 1955) derives P = T S⁺, where k is the number of selected singular values used to compute S⁺.
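The linear transducer estimation can be sketched with numpy. Toy random encodings stand in for the real sentence and tree vectors, and S⁺ is computed from a truncated SVD with k kept singular values; names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
j, d, n = 50, 200, 500   # sentence-encoding dim, tree-vector dim, # sentences

# Columns of S are sentence encodings D(s); columns of T are the target
# distributed trees DT(SP(s)). Random stand-ins for the real encodings.
S = rng.normal(size=(j, n))
P_true = rng.normal(size=(d, j))
T = P_true @ S                        # the linear model T = P S

def truncated_pinv(M, k):
    """Moore-Penrose pseudoinverse keeping the k largest singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[:k].T @ np.diag(1.0 / s[:k]) @ U[:, :k].T

k = j                                 # in the talk, k is tuned on Section 24;
P_hat = T @ truncated_pinv(S, k)      # full rank here recovers P exactly
```

Truncating k below full rank trades exactness for robustness to noise, which is why k is tuned on held-out data in the experiments.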
Experimental set-up
Data: English Penn Treebank with the standard split; distributed trees with three λ values (0, 0.2, 0.4) and two models (unlexicalized/lexicalized); dimension of the reduced space: 4,096 and 8,192.
System comparison: Distributed Symbolic Parser DSP(s) = DT(SP(s)); symbolic parser: Bikel parser (Bikel, 2004) with Collins settings (Collins, 2003).
Parameter estimation: parameters k for the pseudo-inverse and j for the sentence encoders D; maximization of the similarity (see parsing performance) on Section 24.
Zanzotto & Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL Workshop on CVSC, 2013
«Distributed» Parsing Performance
[Figure: evaluation measure and results for unlexicalized and lexicalized trees.]
Zanzotto & Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL Workshop on CVSC, 2013
Generalizing the theory: Distributed Convolution Kernels on Countable Sets
Distributed Convolution Kernels on Countable Sets
The following general property holds: the distributed kernel approximates the original, DCK ≈ CK, where CK is a convolution kernel and DCK is the related distributed convolution kernel.
Implemented distributed convolution kernels: Distributed Tree Kernel, Distributed Subpath Kernel, Distributed Route Kernel, Distributed String Kernel, Distributed Partial Tree Kernel.
Zanzotto & Dell'Arciprete, Distributed Convolution Kernels on Countable Sets, Journal of Machine Learning Research, accepted conditioned on minor revisions
Adding back distributional semantics: Distributed Smoothed Tree Kernels
Going back to RTE and distributional semantics
Mehdad, Moschitti, Zanzotto, Syntactic/Semantic Structures for Textual Entailment Recognition, Proceedings of NAACL, 2010
[Figure: the rules with variables anchored on killed → died and murdered → died, linked through distributional semantics.]
Promising!
Ferrone & Zanzotto, Linear Compositional Distributional Semantics and Structural Kernels, Proceedings of the Joint Symposium on Semantic Processing, 2013
A Novel Look at the Recursive Full Additive Model
[Figure: each term of the similarity factorizes into a structure part and a meaning part.]
A Novel Look at the Recursive Full Additive Model
Zanzotto, Ferrone, Baroni, When the whole is not greater than the sum of its parts: A decompositional look at compositional distributional semantics, re-submitted
Choosing the structure vectors so that their dot product is (nearly) 1 if Struct_i = Struct_j and (nearly) 0 if Struct_i ≠ Struct_j.
«Convolution Conjecture»
Compositional distributional models based on linear algebra and convolution kernels are intimately related: the similarity equations between two vectors/tensors obtained with CDSMs can be decomposed into operations performed on the subparts of the input phrases. For example: Convolution Kernel ↔ Recursive Full Additive Model.
Zanzotto, Ferrone, Baroni, When the whole is not greater than the sum of its parts: A decompositional look at compositional distributional semantics, re-submitted
Distributed Smoothed Tree Kernels
A lexicalized tree fragment pairs a syntactic fragment with the distributional vector of its head, e.g. the fragment (S (NP) (VP (VB killed) (NP))) with the vector for killed. For such a fragment:
synt( ) = (S (NP) (VP (VB) (NP))), the structure without the lexical anchor;
head( ) = killed, the distributional vector of the head.
Ferrone, Zanzotto, Towards Syntax-aware Compositional Distributional Semantic Models, Proceedings of CoLing, 2014
In general, for a lexicalized tree we define its representation from synt( ) and head( ): the Distributed Smoothed Tree combines the distributed vectors of the syntactic fragments with the distributional vectors of their heads, and the resulting kernel is a dot (Frobenius) product.
Ferrone, Zanzotto, Towards Syntax-aware Compositional Distributional Semantic Models, Proceedings of CoLing, 2014
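Why the Frobenius product is the natural notion here can be sketched under one assumption about the representation (each lexicalized fragment as the outer product of its structure vector and its head vector; dimensions and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
d, m = 1024, 300   # distributed (structure) dim, distributional (meaning) dim

def unit(size):
    x = rng.normal(size=size)
    return x / np.linalg.norm(x)

def dst(fragments):
    """Distributed smoothed tree: sum over lexicalized fragments of the
    outer product synt-vector x head-vector (an assumed form)."""
    return sum(np.outer(s, h) for s, h in fragments)

killed, murdered = unit(m), unit(m)
synt = unit(d)                       # same unlexicalized structure for both

T1 = dst([(synt, killed)])
T2 = dst([(synt, murdered)])

# Frobenius product <T1, T2>_F factorizes into a structural match times the
# distributional similarity of the heads.
k = float(np.sum(T1 * T2))
expected = float((synt @ synt) * (killed @ murdered))
```

The identity ⟨s₁ ⊗ h₁, s₂ ⊗ h₂⟩_F = (s₁ · s₂)(h₁ · h₂) is what lets structural matching be smoothed by head similarity.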
What's next…
[Recap map]
Prequel: Recognizing Textual Entailment → feature spaces of the rules with variables → adding distributional semantics; Distributional Semantics → binary CDS → recursive CDS; structure versus meaning.
Sequel: Tree Kernels → Distributed Tree Kernels → Distributed Representation Parsing → Distributed Convolution Kernels on Countable Sets.
What's next: adding back distributional meaning to the Distributed Convolution Kernels on Countable Sets; lexicalized Distributed Representation Parsing.
Distributed Tree Kernels and Distributional Semantics: between syntactic structures and compositional distributional semantics.
After what's next… what's for?
Applications: indexing structured information for fast syntax-aware Information Retrieval; Semantic Text Similarity; fast document summarization; indexing structured information for XML Information Retrieval; any other suggestion?
Accelerator: optimizing the code with GPU programming (CUDA).
Credits
Lorenzo Dell'Arciprete, Marco Pennacchiotti, Alessandro Moschitti, Yashar Mehdad, Ioannis Korkontzelos, Lorenzo Ferrone
Code (Distributed Tree Kernels and Distributed Convolution Kernels): http://code.google.com/p/distributed-tree-kernels/
Distributed Tree Kernels, Compositional Distributional Semantics, Brain & Computer
[Figure: a parse tree and its distributed counterpart.]
If you want to read more…
Distributed Tree Kernels: Zanzotto, F. M. & Dell'Arciprete, L. Distributed Tree Kernels, Proceedings of the International Conference on Machine Learning, 2012
Tree Kernels and Distributional Semantics: Mehdad, Y.; Moschitti, A. & Zanzotto, F. M. Syntactic/Semantic Structures for Textual Entailment Recognition, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010
Compositional Distributional Semantics: Zanzotto, F. M.; Korkontzelos, I.; Fallucchi, F. & Manandhar, S. Estimating Linear Models for Compositional Distributional Semantics, Proceedings of the 23rd International Conference on Computational Linguistics (COLING), 2010
Distributed and Distributional Tree Kernels: Zanzotto, F. M. & Dell'Arciprete, L. Distributed Representations and Distributional Semantics, Proceedings of the ACL-HLT 2011 Workshop on Distributional Semantics and Compositionality (DiSCo), 2011
My first life: Learning Textual Entailment Recognition Systems
Initial idea: Zanzotto, F. M. & Moschitti, A. Automatic learning of textual entailments with cross-pair similarities, ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, 2006
First refinement of the algorithm: Moschitti, A. & Zanzotto, F. M. Fast and Effective Kernels for Relational Learning from Texts, Proceedings of the 24th Annual International Conference on Machine Learning, 2007
Adding shallow semantics: Pennacchiotti, M. & Zanzotto, F. M. Learning Shallow Semantic Rules for Textual Entailment, Proceedings of the International Conference RANLP, 2007
A comprehensive description: Zanzotto, F. M.; Pennacchiotti, M. & Moschitti, A. A Machine Learning Approach to Textual Entailment Recognition, Natural Language Engineering, 2009
My first life: Learning Textual Entailment Recognition Systems (continued)
Adding distributional semantics: Mehdad, Y.; Moschitti, A. & Zanzotto, F. M. Syntactic/Semantic Structures for Textual Entailment Recognition, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010
A valid kernel with an efficient algorithm: Zanzotto, F. M. & Dell'Arciprete, L. Efficient kernels for sentence pair classification, Conference on Empirical Methods in Natural Language Processing, 2009; Zanzotto, F. M.; Dell'Arciprete, L. & Moschitti, A. Efficient Graph Kernels for Textual Entailment Recognition, Fundamenta Informaticae
Applications: Zanzotto, F. M.; Pennacchiotti, M. & Tsioutsiouliklis, K. Linguistic Redundancy in Twitter, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2011
Extracting RTE corpora: Zanzotto, F. M. & Pennacchiotti, M. Expanding textual entailment corpora from Wikipedia using co-training, Proceedings of the COLING Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources, 2010
Learning verb relations: Zanzotto, F. M.; Pennacchiotti, M. & Pazienza, M. T. Discovering asymmetric entailment relations between verbs using selectional preferences, ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics
My second life: Parallels between Brains and Computers
Zanzotto, F. M. & Croce, D. Comparing EEG/ERP-like and fMRI-like Techniques for Reading Machine Thoughts, BI 2010: Proceedings of the Brain Informatics Conference, Toronto, 2010
Zanzotto, F. M.; Croce, D. & Prezioso, S. Reading what Machines "Think": a Challenge for Nanotechnology, Joint Conferences on Advanced Materials, 2009
Zanzotto, F. M. & Croce, D. Reading what machines "think", BI 2009: Proceedings of the Brain Informatics Conference, Beijing, China, October 2009
Prezioso, S.; Croce, D. & Zanzotto, F. M. Reading what machines "think": a challenge for nanotechnology, Journal of Computational and Theoretical Nanoscience, 2011
Zanzotto, F. M.; Dell'Arciprete, L. & Korkontzelos, Y. Rappresentazione distribuita e semantica distribuzionale dalla prospettiva dell'Intelligenza Artificiale [Distributed representation and distributional semantics from the perspective of Artificial Intelligence], Teorie & Modelli, 2010
Thank you for your attention.
Structured Feature Spaces: Dimensionality Reduction
[Figure: the tree-fragment feature space for the parse of "Farmers feed cows animal extracts".]
Traditional dimensionality reduction techniques (Singular Value Decomposition, Random Indexing, Feature Selection) are not applicable.
Computational Complexity of DTK
n: size of the tree; k: selected tree fragments; q, w: reducing factors; O(·): worst-case complexity; A(·): average-case complexity.
Time Complexity Analysis: DTK time complexity is independent of the tree sizes!
Outline
DTK: expected properties and challenges; Model: Distributed Tree Fragments, Distributed Trees; Experimental evaluation; Remarks; Back to Compositional Distributional Semantics; Future work
Towards Distributional Distributed Trees
Distributed Tree Fragments: non-terminal nodes n and terminal nodes w both get random vectors.
Distributional Distributed Tree Fragments: non-terminal nodes n get random vectors; terminal nodes w get distributional vectors.
Caveat (Property 2): random vectors are nearly orthogonal; distributional vectors are not.
Zanzotto & Dell'Arciprete, Distributed Representations and Distributional Semantics, Proceedings of the ACL Workshop DiSCo, 2011
Experimental Set-up
Task-based comparison: corpora RTE 1, 2, 3, 5; measure: accuracy. Distributed/distributional vector size: 250. Distributional vectors: corpus UKWaC (Ferraresi et al., 2008); LSA applied with k = 250.
Accuracy Results: [figure]
Zanzotto & Dell'Arciprete, Distributed Representations and Distributional Semantics, Proceedings of the ACL Workshop DiSCo, 2011
Adding semantics: shallow semantics
Pennacchiotti & Zanzotto, Learning Shallow Semantic Rules for Textual Entailment, Proceedings of the International Conference RANLP, 2007
Learning example:
T: "For my younger readers, Chapman killed John Lennon more than twenty years ago."
H: "John Lennon died more than twenty years ago."
T → H
A generalized rule: [S [NP X] [VP [VB killed] [NP Y]]] → [S [NP Y] [VP [VB died]]], with variables carrying types and killed/died linked by a causes relation.
Empirical Evaluation of Properties
- Non-commutativity: OK
- Distributivity over the sum: OK
- Norm preservation: ?
- Orthogonality preservation: ?