Presentation Transcript

Slide 1

Distributed Tree Kernels and Distributional Semantics: Between Syntactic Structures and Compositional Distributional Semantics

Fabio Massimo Zanzotto
ART Group, Dipartimento di Ingegneria dell'Impresa
University of Rome "Tor Vergata"

Slide 2

Prequel

Slide 3

Recognizing Textual Entailment (RTE)

The task (Dagan et al., 2005): given a text T and a hypothesis H, decide whether T implies H.

T1: "Farmers feed cows animal extracts"
H1: "Cows eat animal extracts"
P1: T1 → H1

RTE as a classification task:
- Selecting the best learning algorithm
- Defining the feature space

Slide 4

Recognizing Textual Entailment (RTE)

T1: "Farmers feed cows animal extracts"
H1: "Cows eat animal extracts"
P1: T1 → H1

Slide 5

Learning RTE Classifiers: the feature space

Training examples:
T1: "Farmers feed cows animal extracts"   H1: "Cows eat animal extracts"   P1: T1 → H1
T2: "They feed dolphins fish"             H2: "Fish eat dolphins"          P2: T2 ↛ H2
T3: "Mothers feed babies milk"            H3: "Babies eat milk"            P3: T3 → H3

Relevant features for classification: rules with variables (first-order rules), such as
"X feed Y → X eat Y" (consistent with P1 and P3)
"X feed Y → Y eat X" (inconsistent with P2)

Slide 6

Learning RTE Classifiers: the feature space

Rules with variables (first-order rules) are pairs of syntactic tree fragments sharing variables, e.g. a fragment S(NP-X VP(VB-feed NP-Y …)) rewriting into S(NP-X VP(VB-eat NP-Y)).

Zanzotto & Moschitti, Automatic learning of textual entailments with cross-pair similarities, Coling-ACL, 2006

RTE 2 Results

Average Precision | Accuracy | First Author (Group)
80.8%             | 75.4%    | Hickl (LCC)
71.3%             | 73.8%    | Tatu (LCC)
64.4%             | 63.9%    | Zanzotto (Milan & Rome)
62.8%             | 62.6%    | Adams (Dallas)
66.9%             | 61.6%    | Bos (Rome & Leeds)

Slide 7

Adding semantics: Distributional Semantics

Mehdad, Moschitti, Zanzotto, Syntactic/Semantic Structures for Textual Entailment Recognition, Proceedings of NAACL, 2010

A learned rule such as S(NP-X VP(VB-killed NP)) → S(NP-X VP(VB-died)) can be generalized through distributional semantics to S(NP-X VP(VB-murdered NP)) → S(NP-X VP(VB-died)), since "killed" and "murdered" are distributionally similar.

Promising!!!

Slide 8

Compositional Distributional Semantics (CDS)

Mitchell & Lapata (2008) set a general model for bigrams that assigns a distributional meaning z to a sequence of two words "x y":

    z = f(x, y; R, K)

where R is the relation between x and y, and K is an external knowledge source.

An active research area!

Slide 9

Compositional Distributional Semantics (CDS)

In a distributional semantic space, the vectors for "hands", "car", and "moving" can be composed to obtain distributional meanings for the phrases "moving hands" and "moving car".

Slide 10

Compositional Distributional Semantics (CDS)

Mitchell & Lapata (2008) set a general model for bigrams that assigns a distributional meaning z to a sequence of two words "x y":

    z = f(x, y; R, K)

where R is the relation between x and y, and K is an external knowledge source. For example, composing x = "moving" with y = "hands" yields z, the distributional meaning of "moving hands".

Slide 11
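Two common concrete instantiations of f are Mitchell & Lapata's additive and multiplicative models; a minimal sketch with toy vectors (the numbers and the tiny vocabulary are hypothetical, for illustration only):

```python
import numpy as np

# Toy co-occurrence vectors (hypothetical counts).
space = {
    "moving": np.array([2.0, 0.0, 1.0, 3.0]),
    "hands":  np.array([1.0, 2.0, 0.0, 1.0]),
}

def add_compose(x, y):
    """Additive model: z = x + y."""
    return x + y

def mult_compose(x, y):
    """Multiplicative model: z = x * y (element-wise)."""
    return x * y

z_add = add_compose(space["moving"], space["hands"])
z_mult = mult_compose(space["moving"], space["hands"])
print(z_add.tolist())   # [3.0, 2.0, 1.0, 4.0]
print(z_mult.tolist())  # [2.0, 0.0, 0.0, 3.0]
```

Both are special cases of z = f(x, y; R, K) in which R and K play no role.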

CDS: Full Additive Model

The full additive model composes the two vectors through two relation-specific matrices:

    z = A_R x + B_R y

Matrices A_R and B_R can be estimated with:
- positive examples taken from dictionaries (e.g., contact /ˈkɒntækt/ [kon-takt] 2. close interaction)
- multivariate regression models

Zanzotto, Korkontzelos, Fallucchi, Manandhar, Estimating Linear Models for Compositional Distributional Semantics, Proceedings of the 23rd COLING, 2010

Slide 12
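The multivariate regression step can be sketched as an ordinary least-squares problem over (x, y, z) triples; the data below are random synthetic stand-ins, not the dictionary examples of the slide:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Hypothetical ground-truth matrices the regression should recover.
A_true = rng.normal(size=(d, d))
B_true = rng.normal(size=(d, d))

# Synthetic training triples with z = A_R x + B_R y (noiseless).
X = rng.normal(size=(200, d))
Y = rng.normal(size=(200, d))
Z = X @ A_true.T + Y @ B_true.T

# Multivariate least squares: stack [x; y] and solve Z ~ [X Y] W.
XY = np.hstack([X, Y])                       # shape (200, 2d)
W, *_ = np.linalg.lstsq(XY, Z, rcond=None)   # shape (2d, d)
A_est, B_est = W[:d].T, W[d:].T

print(np.allclose(A_est, A_true) and np.allclose(B_est, B_true))  # True
```

With noiseless data and more examples than unknowns, the estimate is exact up to numerical precision.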

CDS: Recursive Full Additive Model

For "cows eat animal extracts", the model is applied recursively along the parse tree, e.g. (with indicative bracketing and relation labels VN and N):

    z = A_VN cows + B_VN (A_VN eat + B_VN (A_N animal + B_N extracts))

Let's scale up to sentences by recursively applying the model!
Let's apply it to RTE… Extremely poor results :(

Slide 13

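The recursive application sketched above fits in a short recursion over a bracketed tree; the matrices and word vectors below are random stand-ins, and the bracketing of the example sentence is assumed:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
vec = {w: rng.normal(size=d) for w in ["cows", "eat", "animal", "extracts"]}

# One (A_R, B_R) pair per relation type (random stand-ins; in practice
# these would be estimated by regression as on the previous slide).
mats = {R: (rng.normal(size=(d, d)), rng.normal(size=(d, d))) for R in ["VN", "N"]}

def compose(tree):
    """Recursive full additive model: leaves are word vectors,
    each internal node with relation R applies z = A_R x + B_R y."""
    if isinstance(tree, str):
        return vec[tree]
    R, left, right = tree
    A, B = mats[R]
    return A @ compose(left) + B @ compose(right)

# "cows eat animal extracts" under an assumed binary bracketing.
z = compose(("VN", "cows", ("VN", "eat", ("N", "animal", "extracts"))))
print(z.shape)  # (4,)
```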
Recursive Full Additive Model: a closer look

Comparing «chickens eat beef extracts» and «cows eat animal extracts»: the similarity is evaluated as the dot product between the two recursively composed vectors.

Ferrone & Zanzotto, Linear Compositional Distributional Semantics and Structural Kernels, Proceedings of Joint Symposium of Semantic Processing, 2013

Slide 14

Ferrone & Zanzotto, Linear Compositional Distributional Semantics and Structural Kernels, Proceedings of Joint Symposium of Semantic Processing, 2013

Recursive Full Additive Model: a closer look

Each composed vector interleaves a structure part and a meaning part; in the similarity between two composed vectors the two parts mix, and the structural contribution is scaled by factors smaller than 1.

Slide 15

The prequel …

- Recognizing Textual Entailment: feature spaces of the rules with variables (structure)
- Adding distributional semantics: Distributional Semantics, Binary CDS, Recursive CDS (meaning)

Slide 16

Distributed Tree Kernels

Slide 17

Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Tree Kernels

A tree kernel implicitly maps a tree T (e.g., the parse tree of "Farmers feed cows animal extracts") into the space of all its tree fragments t_i, t_j, …, and counts the fragments shared by two trees.

Slide 18

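For reference, the subset tree kernel of Collins & Duffy (2001), which DTKs later approximate, fits in a short recursion; a sketch with trees as nested tuples and decay λ = 1:

```python
# A minimal sketch of the Collins & Duffy (2001) subset tree kernel.
# Trees are nested tuples: (label, child, ...), with (POS, "word") preterminals.

def production(t):
    return (t[0],) + tuple(c if isinstance(c, str) else c[0] for c in t[1:])

def delta(n1, n2, lam=1.0):
    """Number of shared fragments rooted in n1 and n2 (decay lam)."""
    if production(n1) != production(n2):
        return 0.0
    if all(isinstance(c, str) for c in n1[1:]):  # matching preterminals
        return lam
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        score *= 1.0 + delta(c1, c2, lam)
    return score

def nodes(t):
    yield t
    for c in t[1:]:
        if not isinstance(c, str):
            yield from nodes(c)

def tree_kernel(t1, t2, lam=1.0):
    return sum(delta(n1, n2, lam) for n1 in nodes(t1) for n2 in nodes(t2))

t1 = ("S", ("NP", ("NNS", "Farmers")), ("VP", ("VB", "feed"), ("NP", ("NNS", "cows"))))
t2 = ("S", ("NP", ("NNS", "Mothers")), ("VP", ("VB", "feed"), ("NP", ("NNS", "babies"))))
print(tree_kernel(t1, t2))  # 19.0
```

The double loop over node pairs is what makes the kernel quadratic in the tree sizes, which is the cost DTKs remove.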
Tree Kernels in Smaller Vectors

The goal is to approximate the tree-fragment feature space with much smaller vectors, meeting the CDS desiderata:
- vectors are smaller
- vectors are obtained with a compositional function

Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Slide 19

Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Names for the «Distributed» World

- Distributed Trees (DT)
- Distributed Tree Fragments (DTF)
- Distributed Tree Kernels (DTK)

As we are encoding trees in small vectors, the tradition is distributed structures (Plate, 1994).

Slide 20

DTK: Expected properties and challenges

- Distributed Tree Fragments can be compositionally built
- Distributed Tree Fragments are a nearly orthonormal base of R^d
- Distributed Trees can be efficiently computed
- DTKs should approximate Tree Kernels

Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Slide 21

DTK: Expected properties and challenges

- Distributed Tree Fragments can be compositionally built
- Distributed Tree Fragments are a nearly orthonormal base that embeds R^m in R^d
- Distributed Trees can be efficiently computed
- DTKs should approximate Tree Kernels

Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Slide 22

Compositionally building Distributed Tree Fragments

Basic elements:
- N: a set of nearly orthogonal random vectors for node labels
- ⊙: a basic vector composition function with some ideal properties

A distributed tree fragment is the application of the composition function ⊙ on the node vectors, according to the order given by a depth-first visit of the tree.

Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Slide 23
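A minimal sketch of this construction, using the shuffled normalized element-wise product (introduced later in the talk) as a stand-in for the ideal composition function ⊙; the labels, trees, and dimension are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
d = 1024
perm = rng.permutation(d)
labels = {}

def label_vec(label):
    """Nearly orthogonal random unit vectors for node labels."""
    if label not in labels:
        v = rng.normal(size=d)
        labels[label] = v / np.linalg.norm(v)
    return labels[label]

def comp(a, b):
    """Stand-in composition: shuffled normalized element-wise product."""
    z = a[perm] * b
    return z / np.linalg.norm(z)

def dtf(tree):
    """Distributed tree fragment: fold the composition function over
    the node-label vectors in depth-first order."""
    def df(t):
        yield t[0]
        for c in t[1:]:
            if isinstance(c, tuple):
                yield from df(c)
            else:
                yield c
    out = None
    for lab in df(tree):
        v = label_vec(lab)
        out = v if out is None else comp(out, v)
    return out

v1 = dtf(("VP", ("VB", "feed"), ("NP",)))
v2 = dtf(("VP", ("VB", "eat"), ("NP",)))
print(round(float(v1 @ v1), 6))   # 1.0: nearly unit vectors
print(abs(float(v1 @ v2)))        # small: distinct fragments are nearly orthogonal
```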

Building Distributed Tree Fragments

Properties of the ideal function ⊙:
- Non-commutativity, with a very high degree k
- Non-associativity
- Bilinearity
- Approximation properties, under which we demonstrated that DTF are a nearly orthonormal base

Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Slide 24

DTK: Expected properties and challenges

- Distributed Tree Fragments can be compositionally built
- Distributed Tree Fragments are a nearly orthonormal base that embeds R^m in R^d
- Distributed Trees can be efficiently computed
- DTKs should approximate Tree Kernels

Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Slide 25

Building Distributed Trees

Given a tree T, the distributed representation of its subtrees is the vector

    DT(T) = Σ_{τ ∈ S(T)} DTF(τ)

where S(T) is the set of the subtrees of T.

Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Slide 26

Building Distributed Trees: a more efficient approach

N(T) is the set of nodes of T, and s(n) is defined recursively as:
- the vector of n, if n is terminal
- the composition of the vector of n with the s-vectors of its children, if n → c1…cm

Computing a Distributed Tree is linear with respect to the size of N(T).

Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Slide 27
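The recursive definition of s(n) and the single-pass computation of DT(T) can be sketched as follows; the composition function and the vectors are random stand-ins, as before:

```python
import numpy as np

rng = np.random.default_rng(11)
d = 1024
perm = rng.permutation(d)
vecs = {}

def nvec(label):
    """Random unit vector per node label (sampled on demand)."""
    if label not in vecs:
        v = rng.normal(size=d)
        vecs[label] = v / np.linalg.norm(v)
    return vecs[label]

def comp(a, b):
    """Stand-in composition: shuffled normalized element-wise product."""
    z = a[perm] * b
    return z / np.linalg.norm(z)

def distributed_tree(tree):
    """DT(T) in one depth-first pass: each node contributes its
    recursively composed vector s(n), so the cost is linear in |N(T)|."""
    total = np.zeros(d)

    def s(node):
        nonlocal total
        if isinstance(node, str):        # terminal node
            v = nvec(node)
        else:                            # n -> c1 ... cm
            v = nvec(node[0])
            for child in node[1:]:
                v = comp(v, s(child))
        total += v
        return v

    s(tree)
    return total

tree = ("S", ("NP", ("NNS", "Farmers")), ("VP", ("VB", "feed")))
dt = distributed_tree(tree)
print(dt.shape)  # (1024,)
```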

Building Distributed Trees: a more efficient approach

Assuming the ideal basic composition function ⊙, it is possible to show that this recursive formulation exactly computes DT(T) (see Theorem 1 in the paper).

Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Slide 28

DTK: Expected properties and challenges

- Distributed Tree Fragments can be compositionally built
- Distributed Tree Fragments are a nearly orthonormal base that embeds R^m in R^d:
  Property 1 (Nearly Unit Vectors); Property 2 (Nearly Orthogonal Vectors)
- Distributed Trees can be efficiently computed
- DTKs should approximate Tree Kernels

Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Slide 29

Experimental evaluation

- Evaluation of concrete composition functions: how well can concrete composition functions approximate the ideal function ⊙?
- Direct analysis: how well do DTKs approximate the original tree kernels (TKs)?
- Task-based analysis: how well do DTKs perform on actual NLP tasks, with respect to TKs?

Vector dimension d = 8192

Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Slide 30

Towards the reality: Approximating ⊙

⊙ is an ideal function! Proposed approximations:
- Shuffled normalized element-wise product
- Shuffled circular convolution

It is possible to show that the properties of ⊙ statistically hold for the two approximations.

Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Slide 31
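A sketch of the second approximation: circular convolution is computable in O(d log d) via the FFT, and shuffling the two inputs with fixed random permutations removes its commutativity, as the ideal function requires (the permutations and vectors below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 2048

def circ_conv(a, b):
    """Circular convolution in O(d log d) via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def shuffled_circ_conv(a, b, p1, p2):
    """Permuting the inputs with two different fixed permutations
    breaks the commutativity of plain circular convolution."""
    return circ_conv(a[p1], b[p2])

p1, p2 = rng.permutation(d), rng.permutation(d)
a = rng.normal(scale=1 / np.sqrt(d), size=d)   # expected norm close to 1
b = rng.normal(scale=1 / np.sqrt(d), size=d)

print(np.allclose(circ_conv(a, b), circ_conv(b, a)))   # True: commutative
print(np.allclose(shuffled_circ_conv(a, b, p1, p2),
                  shuffled_circ_conv(b, a, p1, p2)))   # False: order matters
```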

Direct Analysis

Spearman's correlation between DTK and TK values, with test trees taken from the QC corpus and the RTE corpus.

Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Slide 32

Task-based Analysis

Question Classification and Recognizing Textual Entailment, with these realizations of the ideal function ⊙:
- Shuffled normalized element-wise product
- Shuffled circular convolution

Zanzotto & Dell'Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012

Slide 33

Remarks

Distributed Trees (DT), Distributed Tree Fragments (DTF), and Distributed Tree Kernels (DTK):
- are a nearly orthonormal base that embeds R^m in R^d
- can be efficiently computed
- approximate Tree Kernels

Slide 34

Side effect: reduced time complexity

Tree kernels (TK) (Collins & Duffy, 2001) have quadratic complexity. Current techniques control this complexity (Moschitti, 2006), (Rieck et al., 2010), (Shin et al., 2011). DTKs change the complexity, as they can be used with Linear Kernel Machines:

                 SVM+TK           LinearSVM+DTK (Shuf. Prod.)   LinearSVM+DTK (Shuf. Circ. Conv.)
Training         O(n³ |N(T)|²)    O(n |N(T)| d)                 O(n |N(T)| d log d)
Classification   O(n |N(T)|²)     O(|N(T)| d)                   O(|N(T)| d log d)

n: # of training examples; |N(T)|: # of nodes of the tree T

Zanzotto & Dell'Arciprete, Distributed Convolution Kernels on Countable Sets, Journal of Machine Learning Research, accepted conditioned to minor revisions

Slide 35

Sequel

- Towards Structured Prediction: Distributed Representation Parsing
- Generalizing the theory: Distributed Convolution Kernels on Countable Sets
- Adding back distributional semantics: Distributed Smoothed Tree Kernels

Slide 36

Sequel

- Towards Structured Prediction: Distributed Representation Parsing
- Generalizing the theory: Distributed Convolution Kernels on Countable Sets
- Adding back distributional semantics: Distributed Smoothed Tree Kernels

Slide 37

Distributed Representation Parsing (DRP): the idea

Map a sentence directly to the distributed representation of its parse tree, i.e., to the output of the Distributed Tree Encoder (DT).

Zanzotto & Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL-Workshop on CVSC, 2013

Slide 38

Zanzotto & Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL-Workshop on CVSC, 2013

Distributed Representation Parsing (DRP): the idea

There are two routes from a sentence («We booked the flight») to a distributed tree:
- a Symbolic Parser (SP) followed by the Distributed Tree Encoder (DT)
- Distributed Representation Parsing (DRP): a Sentence Encoder (D) followed by a Transducer (P), with no explicit parse tree

Slide 39

Zanzotto & Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL-Workshop on CVSC, 2013

DRP: Sentence Encoder

- Non-lexicalized sentence models: bag-of-postags; n-grams of postags
- Lexicalized sentence models: unigrams; unigrams + n-grams of postags

Slide 40

Zanzotto & Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL-Workshop on CVSC, 2013

DRP: Transducer

- Estimation: Principal Component Analysis and Partial Least Squares estimation of the linear transducer P in T = PS
- Approximation: the Moore-Penrose pseudoinverse (Penrose, 1955) to derive P from T and S, where k is the number of selected singular values

Slide 41
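The pseudoinverse-based estimation of the transducer can be sketched on synthetic data; the dimensions and matrices here are hypothetical, and with noiseless, full-rank data the recovery is exact:

```python
import numpy as np

rng = np.random.default_rng(5)
n, d_s, d_t = 300, 40, 60   # sentences, sentence-encoding dim, distributed-tree dim

S = rng.normal(size=(d_s, n))         # column s_i: encoded sentence i
P_true = rng.normal(size=(d_t, d_s))  # hypothetical ground-truth transducer
T = P_true @ S                        # column t_i: distributed tree of sentence i

def estimate_transducer(T, S, k):
    """Solve T ~ P S with a truncated Moore-Penrose pseudoinverse of S,
    keeping only the k largest singular values."""
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    S_pinv = Vt[:k].T @ np.diag(1.0 / sv[:k]) @ U[:, :k].T
    return T @ S_pinv

P_est = estimate_transducer(T, S, k=d_s)  # full rank: exact recovery expected
print(np.allclose(P_est @ S, T))  # True
```

In practice k is chosen smaller than the full rank, trading exactness for regularization.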

Experimental set-up

Data: English Penn Treebank with standard split; distributed trees with 3 values of λ (0, 0.2, 0.4) and 2 models (unlexicalized/lexicalized); dimension of the reduced space: 4,096 and 8,192.
System comparison: Distributed Symbolic Parser DSP(s) = DT(SP(s)); Symbolic Parser: Bikel Parser (Bikel, 2004) with Collins settings (Collins, 2003).
Parameter estimation: k for the pseudo-inverse and j for the sentence encoders D, by maximization of the similarity (see parsing performance) on Section 24.

Zanzotto & Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL-Workshop on CVSC, 2013

Slide 42

«Distributed» Parsing Performance

Evaluation measure: the «distributed» parsing performance, reported for unlexicalized and for lexicalized trees.

Zanzotto & Dell'Arciprete, Transducing Sentences to Syntactic Feature Vectors: an Alternative Way to "Parse"?, Proceedings of the ACL-Workshop on CVSC, 2013

Slide 43

Sequel

- Towards Structured Prediction: Distributed Representation Parsing
- Generalizing the theory: Distributed Convolution Kernels on Countable Sets
- Adding back distributional semantics: Distributed Smoothed Tree Kernels

Slide 44

Distributed Convolution Kernels on Countable Sets

The following general property holds: the dot product between two distributed representations approximates the corresponding kernel value, where CK is a convolution kernel and DCK is the related distributed convolution kernel.

Implemented Distributed Convolution Kernels:
- Distributed Tree Kernel
- Distributed Subpath Kernel
- Distributed Route Kernel
- Distributed String Kernel
- Distributed Partial Tree Kernel

Zanzotto & Dell'Arciprete, Distributed Convolution Kernels on Countable Sets, Journal of Machine Learning Research, accepted conditioned to minor revisions

Slide 45

Sequel

- Towards Structured Prediction: Distributed Representation Parsing
- Generalizing the theory: Distributed Convolution Kernels on Countable Sets
- Adding back distributional semantics: Distributed Smoothed Tree Kernels

Slide 46

Going back to RTE and distributional semantics

Mehdad, Moschitti, Zanzotto, Syntactic/Semantic Structures for Textual Entailment Recognition, Proceedings of NAACL, 2010

As in the prequel, a learned rule such as S(NP-X VP(VB-killed NP)) → S(NP-X VP(VB-died)) can be generalized through distributional semantics to S(NP-X VP(VB-murdered NP)) → S(NP-X VP(VB-died)), since "killed" and "murdered" are distributionally similar.

Promising!!!

Slide 47

Ferrone & Zanzotto, Linear Compositional Distributional Semantics and Structural Kernels, Proceedings of Joint Symposium of Semantic Processing, 2013

A Novel Look at the Recursive Full Additive Model

Each composed vector interleaves a structure part and a meaning part; in the similarity between two composed vectors the two parts mix, and the structural contribution is scaled by factors smaller than 1.

Slide 48

A Novel Look at the Recursive Full Additive Model

Choosing the matrices so that the structural parts behave like indicators:

    ≈ 1 if Struct_i = Struct_j
    ≈ 0 if Struct_i ≠ Struct_j

Zanzotto, Ferrone, Baroni, When the whole is not greater than the sum of its parts: A decompositional look at compositional distributional semantics, re-submitted

Slide 49

«Convolution Conjecture»

Compositional Distributional Models based on linear algebra and Convolution Kernels are intimately related: the similarity equations between two vectors/tensors obtained with CDSMs can be decomposed into operations performed on the subparts of the input phrases. For example: Convolution Kernel ↔ Recursive Full Additive Model.

Zanzotto, Ferrone, Baroni, When the whole is not greater than the sum of its parts: A decompositional look at compositional distributional semantics, re-submitted

Slide 50

Distributed Smoothed Tree Kernels

A lexicalized tree fragment such as S(NP VP(VB:killed NP)) is split into its syntactic structure and its lexical head:

    synt( S(NP VP(VB:killed NP)) ) = S(NP VP(VB NP))
    head( S(NP VP(VB:killed NP)) ) = killed

so that structurally identical fragments with distributionally similar heads (e.g., "killed" and "murdered") can match softly.

Ferrone, Zanzotto, Towards Syntax-aware Compositional Distributional Semantic Models, Proceedings of CoLing, 2014

Slide 51

Distributed Smoothed Tree Kernels

In general, for a lexicalized tree fragment τ, we define its distributed representation by combining the distributed vector of synt(τ) with the distributional vector of head(τ).

Ferrone, Zanzotto, Towards Syntax-aware Compositional Distributional Semantic Models, Proceedings of CoLing, 2014

Slide 52

Distributed Smoothed Tree Kernels

The distributed smoothed tree aggregates these representations over all lexicalized fragments; the resulting kernel is the dot (Frobenius) product between two distributed smoothed trees.

Ferrone, Zanzotto, Towards Syntax-aware Compositional Distributional Semantic Models, Proceedings of CoLing, 2014

Slide 53
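One way to realize this on toy data: represent each lexicalized fragment as the outer product of a structure vector and a head vector, and compare fragments with the Frobenius product. All vectors below are random stand-ins for the distributed and distributional spaces used in the paper, with "murdered" built close to "killed" to mimic distributional similarity:

```python
import numpy as np

rng = np.random.default_rng(9)
d_s, d_w = 512, 50

def unit(v):
    return v / np.linalg.norm(v)

# Random stand-ins: structure vectors are nearly orthogonal unit vectors,
# head vectors would come from a distributional semantic space.
synt_vec = {s: unit(rng.normal(size=d_s)) for s in ["S(NP VP(VB NP))", "S(NP VP(VB))"]}
head_vec = {w: unit(rng.normal(size=d_w)) for w in ["killed", "died"]}
head_vec["murdered"] = unit(head_vec["killed"] + 0.05 * rng.normal(size=d_w))

def dstk_rep(synt, head):
    """A lexicalized fragment as the outer product synt ⊗ head
    (a d_s x d_w matrix)."""
    return np.outer(synt_vec[synt], head_vec[head])

def frobenius(M1, M2):
    """Frobenius (element-wise dot) product between representations."""
    return float(np.sum(M1 * M2))

m_killed   = dstk_rep("S(NP VP(VB NP))", "killed")
m_murdered = dstk_rep("S(NP VP(VB NP))", "murdered")
m_died     = dstk_rep("S(NP VP(VB))", "died")

# Same structure with similar heads scores high; a different structure
# pushes the score towards zero regardless of the heads.
print(frobenius(m_killed, m_murdered) > frobenius(m_killed, m_died))  # True
```

The Frobenius product factorizes as (structure similarity) × (head similarity), which is exactly the soft matching described above.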

What's next...

Slide 54

Recap: from Recognizing Textual Entailment (feature spaces of the rules with variables) and Distributional Semantics (Binary CDS, Recursive CDS) in the prequel, to Tree Kernels, Distributed Tree Kernels, Distributed Representation Parsing, and Distributed Convolution Kernels on Countable Sets in the sequel, bridging structure and meaning.

Slide 55

The same map, extended with what's next: adding back distributional meaning, through Distributed Convolution Kernels on Countable Sets and Lexicalized Distributed Representation Parsing. Distributed Tree Kernels and Distributional Semantics: Between Syntactic Structures and Compositional Distributional Semantics.

Slide 56

After what's next… what's for?

Applications:
- Indexing structured information for fast syntax-aware Information Retrieval
- Semantic Text Similarity
- Fast Document Summarization
- Indexing structured information for XML Information Retrieval
- Any other suggestion?

Accelerator: optimizing the code with GPU programming (CUDA)

Slide 57

Credits

Lorenzo Dell'Arciprete, Marco Pennacchiotti, Alessandro Moschitti, Yashar Mehdad, Ioannis Korkontzelos, Lorenzo Ferrone

Code for Distributed Tree Kernels and Distributed Convolution Kernels: http://code.google.com/p/distributed-tree-kernels/

Slide 58

Slide 59

Distributed Tree Kernels, Compositional Distributional Semantics, Brain&Computer

Slide 60

If you want to read more…

Distributed Tree Kernels: Zanzotto, F. M. & Dell'Arciprete, L. Distributed Tree Kernels, Proceedings of the International Conference on Machine Learning, 2012
Tree Kernels and Distributional Semantics: Mehdad, Y.; Moschitti, A. & Zanzotto, F. M. Syntactic/Semantic Structures for Textual Entailment Recognition, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010
Compositional Distributional Semantics: Zanzotto, F. M.; Korkontzelos, I.; Fallucchi, F. & Manandhar, S. Estimating Linear Models for Compositional Distributional Semantics, Proceedings of the 23rd International Conference on Computational Linguistics (COLING), 2010
Distributed and Distributional Tree Kernels: Zanzotto, F. M. & Dell'Arciprete, L. Distributed Representations and Distributional Semantics, Proceedings of the ACL-HLT 2011 workshop on Distributional Semantics and Compositionality (DiSCo), 2011

Slide 61

My first life: Learning Textual Entailment Recognition Systems

Initial idea: Zanzotto, F. M. & Moschitti, A. Automatic learning of textual entailments with cross-pair similarities, ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, 2006
First refinement of the algorithm: Moschitti, A. & Zanzotto, F. M. Fast and Effective Kernels for Relational Learning from Texts, Proceedings of the 24th Annual International Conference on Machine Learning, 2007
Adding shallow semantics: Pennacchiotti, M. & Zanzotto, F. M. Learning Shallow Semantic Rules for Textual Entailment, Proceedings of the International Conference RANLP 2007, 2007
A comprehensive description: Zanzotto, F. M.; Pennacchiotti, M. & Moschitti, A. A Machine Learning Approach to Textual Entailment Recognition, NATURAL LANGUAGE ENGINEERING, 2009

Slide 62

My first life: Learning Textual Entailment Recognition Systems

Adding distributional semantics: Mehdad, Y.; Moschitti, A. & Zanzotto, F. M. Syntactic/Semantic Structures for Textual Entailment Recognition, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010
A valid kernel with an efficient algorithm: Zanzotto, F. M. & Dell'Arciprete, L. Efficient kernels for sentence pair classification, Conference on Empirical Methods on Natural Language Processing, 2009; Zanzotto, F. M.; Dell'Arciprete, L. & Moschitti, A. Efficient Graph Kernels for Textual Entailment Recognition, FUNDAMENTA INFORMATICAE
Applications: Zanzotto, F. M.; Pennacchiotti, M. & Tsioutsiouliklis, K. Linguistic Redundancy in Twitter, Proceedings of the 2011 Conference on Empirical Methods on Natural Language Processing (EMNLP), 2011
Extracting RTE corpora: Zanzotto, F. M. & Pennacchiotti, M. Expanding textual entailment corpora from Wikipedia using co-training, Proceedings of the COLING-Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources, 2010
Learning verb relations: Zanzotto, F. M.; Pennacchiotti, M. & Pazienza, M. T. Discovering asymmetric entailment relations between verbs using selectional preferences, ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics

Slide 63

My second life: Parallels between Brains and Computers

Zanzotto, F. M. & Croce, D. Comparing EEG/ERP-like and fMRI-like Techniques for Reading Machine Thoughts, BI 2010: Proceedings of the Brain Informatics Conference, Toronto, 2010
Zanzotto, F. M.; Croce, D. & Prezioso, S. Reading what Machines "Think": a Challenge for Nanotechnology, Joint Conferences on Advanced Materials, 2009
Zanzotto, F. M. & Croce, D. Reading what machines "think", BI 2009: Proceedings of the Brain Informatics Conference, Beijing, China, October 2009
Prezioso, S.; Croce, D. & Zanzotto, F. M. Reading what machines "think": a challenge for nanotechnology, JOURNAL OF COMPUTATIONAL AND THEORETICAL NANOSCIENCE, 2011
Zanzotto, F. M.; Dell'Arciprete, L. & Korkontzelos, Y. Rappresentazione distribuita e semantica distribuzionale dalla prospettiva dell'Intelligenza Artificiale ("Distributed representation and distributional semantics from the perspective of Artificial Intelligence"), TEORIE & MODELLI, 2010

Slide 64

Thank you for your attention

Slide 65

Structured Feature Spaces: Dimensionality Reduction

The tree-fragment feature space of a tree kernel is implicit and huge. Traditional dimensionality reduction techniques, such as Singular Value Decomposition, Random Indexing, and Feature Selection, are not applicable to it.

Slide 66

Computational Complexity of DTK

Notation: n is the size of the tree; k the number of selected tree fragments; q, w reducing factors; O(.) worst-case complexity; A(.) average-case complexity.

Slide 67

Time Complexity Analysis

DTK time complexity is independent of the tree sizes!

Slide 68

Outline

- DTK: Expected properties and challenges
- Model: Distributed Tree Fragments; Distributed Trees
- Experimental evaluation
- Remarks
- Back to Compositional Distributional Semantics
- Future Work

Slide 69

Towards Distributional Distributed Trees

Distributed Tree Fragments: non-terminal nodes n and terminal nodes w both use random vectors.
Distributional Distributed Tree Fragments: non-terminal nodes n use random vectors; terminal nodes w use distributional vectors.
Caveat (Property 2): random vectors are nearly orthogonal, distributional vectors are not.

Zanzotto & Dell'Arciprete, Distributed Representations and Distributional Semantics, Proceedings of the ACL-workshop DiSCo, 2011

Slide 70

Experimental Set-up

Task-based comparison. Corpus: RTE 1, 2, 3, 5; measure: accuracy; distributed/distributional vector size: 250. Distributional vectors: corpus UKWaC (Ferraresi et al., 2008), LSA applied with k = 250.

Zanzotto & Dell'Arciprete, Distributed Representations and Distributional Semantics, Proceedings of the ACL-workshop DiSCo, 2011

Slide 71

Accuracy Results

Zanzotto & Dell'Arciprete, Distributed Representations and Distributional Semantics, Proceedings of the ACL-workshop DiSCo, 2011

Slide 72

Adding semantics: Shallow semantics

Pennacchiotti & Zanzotto, Learning Shallow Semantic Rules for Textual Entailment, Proceedings of RANLP, 2007

Learning example:
T: "For my younger readers, Chapman killed John Lennon more than twenty years ago."
H: "John Lennon died more than twenty years ago."

A generalized rule with typed variables: S(NP-X VP(VB-killed NP-Y)) → S(NP-Y VP(VB-died)), where the lexical relation "killed causes died" licenses the generalization.

Slide 73

Empirical Evaluation of Properties

- Non-commutativity: OK
- Distributivity over the sum: OK
- Norm preservation: ?
- Orthogonality preservation: ?

Slide 74

Distributed Representation Parsing (DRP)

From «We booked the flight», a distributed tree can be obtained either through the Symbolic Parser (SP) followed by the Distributed Tree Encoder (DT), or directly through the Sentence Encoder (D) followed by the Transducer (P).