Slide 1

Learning From Chris Dyer Group

Chao Xing

CSLT, Tsinghua

Slide 2

Why?

Chris Dyer's group achieved a lot of brilliant results in 2015, and their research interests match ours.

In some areas, our two groups think in almost the same way, but we did not do as well as they did.

This PPT introduces some of the work they did in 2015.

Slide 3

Learning Word Representations with Hierarchical Sparse Coding

In this paper, they propose a novel model that decomposes a PMI matrix under a hierarchical forest constraint.

The forest structure organizes the latent dimensions into a set of trees, as illustrated below.

Slide 4

Learning Word Representations with Hierarchical Sparse Coding

The factorization cost function turns from the plain matrix-factorization objective into a sparse coding objective with a structured regularizer (a sketch of both is given below).

With sparse coding, only the non-zero elements need to be considered, instead of factorizing the matrix over all of its elements. This speeds up the earlier model.
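As a rough sketch of the two objectives (my reconstruction from [1], not necessarily the slide's exact notation), plain sparse coding factorizes X ≈ DA by solving

\[
\min_{\mathbf{D},\,\mathbf{A}} \;
  \| \mathbf{X} - \mathbf{D}\mathbf{A} \|_2^2
  \;+\; \lambda \sum_{c} \| \mathbf{a}_c \|_1 ,
\]

with the columns of D norm-constrained or l2-regularized, and the hierarchical version replaces the l1 penalty with a tree-structured (forest) group lasso in which each group is a node of the forest together with its descendants:

\[
\Omega(\mathbf{a}_c) \;=\;
  \sum_{v} \lambda_v
  \,\bigl\| \bigl[\, a_{c,v},\; a_{c,\mathrm{Desc}(v)} \,\bigr] \bigr\|_2 .
\]

Under this regularizer a latent dimension tends to be active only if its ancestors in the forest are active, which is what gives the learned representations their hierarchical organization.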

Slide 5

Learning Word Representations with Hierarchical Sparse Coding

The results are as follows:

The results are consistent with our experiments on CVSS 500.

Slide 6

Sparse Over-complete Word Vector Representations

In this paper, they want to find a transform from pre-trained word vectors to sparse and, optionally, binary vectors.

Slide 7

Sparse Over-complete Word Vector Representations

In the sparse coding process, the objective function minimizes the reconstruction error of the pre-trained vectors plus a sparsity penalty on the codes.

In their so-called Model A, they optimize this objective with an overcomplete code, so the new vectors are longer than the input vectors.

In their Model B, the objective becomes a non-negative sparse coding cost function, which additionally constrains the codes (and the dictionary) to be non-negative. A sketch of both objectives is given below.
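As a rough sketch (my reconstruction from [3]; the slide shows the exact formulas as images), Model A solves

\[
\min_{\mathbf{D},\,\mathbf{A}} \;
  \sum_{i=1}^{V}
  \bigl\| \mathbf{x}_i - \mathbf{D}\,\mathbf{a}_i \bigr\|_2^2
  \;+\; \lambda \| \mathbf{a}_i \|_1
  \;+\; \tau \| \mathbf{D} \|_2^2 ,
  \qquad K > L ,
\]

where x_i is a pre-trained L-dimensional vector and a_i its K-dimensional overcomplete sparse code, and Model B is the same objective with non-negativity constraints

\[
\mathbf{D} \in \mathbb{R}_{\ge 0}^{L \times K},
\qquad
\mathbf{a}_i \in \mathbb{R}_{\ge 0}^{K} .
\]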

Slide 8

Sparse Over-complete Word Vector Representations

How to binarize?

The sparse vectors are binarized by thresholding each element against the mean of the positive-valued elements and the mean of the negative-valued elements of the vector.
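A minimal sketch of one plausible reading of this thresholding rule, in Python (the choice of thresholds and the handling of values between the two means are my assumptions, not necessarily the exact rule in [3]):

import numpy as np

def binarize(A: np.ndarray) -> np.ndarray:
    """Binarize sparse codes by thresholding against the positive mean.

    Assumed rule: entries at or above the mean of the positive-valued
    elements become 1, everything else becomes 0.  A signed variant could
    analogously map entries at or below the mean of the negative-valued
    elements to -1.
    """
    pos = A[A > 0]
    mu_pos = pos.mean() if pos.size else np.inf  # no positive entries -> all zeros
    return (A >= mu_pos).astype(np.int8)

# toy usage
A = np.array([[0.0, 0.9, 0.1],
              [0.4, 0.0, -0.3]])
print(binarize(A))  # [[0 1 0] [0 0 0]] with mu_pos = (0.9 + 0.1 + 0.4) / 3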

Slide 9

Sparse Over-complete Word Vector Representations

They test the quality of the transformed word vectors on five benchmarks: word similarity, sentiment analysis, question classification, text classification, and noun phrase bracketing.

Slide 10

Non-distributional Word Vector Representations

This paper constructs task-independent word vector representations using linguistic knowledge derived from pre-constructed linguistic resources such as WordNet (Miller, 1995), FrameNet (Baker et al., 1998), the Penn Treebank (Marcus et al., 1993), etc.

Slide 11

Non-distributional Word Vector Representations

They built their non-distributional word vectors from knowledge bases. Such word vectors are highly sparse and very long, as sketched below.
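Under my own assumptions (the toy lexicon and feature names below are made up for illustration; the real vectors are built from WordNet, FrameNet, and the other resources listed in [5]), the construction looks roughly like this in Python:

# Toy "knowledge base": word -> set of binary linguistic features.
lexicon = {
    "dog": {"POS.NOUN", "WN.HYPERNYM.animal", "WN.SYNSET.dog.n.01"},
    "cat": {"POS.NOUN", "WN.HYPERNYM.animal", "WN.SYNSET.cat.n.01"},
    "run": {"POS.VERB", "FN.FRAME.Self_motion"},
}

# Index every feature seen in any resource; this is the vector dimensionality.
features = sorted({f for feats in lexicon.values() for f in feats})
feat_id = {f: i for i, f in enumerate(features)}

# Each word becomes a long, sparse binary vector, stored here as the set of
# active dimension indices (1 where the word has the feature, 0 elsewhere).
word_vectors = {w: {feat_id[f] for f in feats} for w, feats in lexicon.items()}

def cosine(u: set, v: set) -> float:
    """Cosine similarity between two binary vectors given as index sets."""
    if not u or not v:
        return 0.0
    return len(u & v) / (len(u) ** 0.5 * len(v) ** 0.5)

print(cosine(word_vectors["dog"], word_vectors["cat"]))  # 2 shared features -> ~0.67
print(cosine(word_vectors["dog"], word_vectors["run"]))  # no overlap -> 0.0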

They combined 8 knowledge bases, obtaining 119,257 synset-level word vectors, each with 172,418 features.

Slide 12

Non-distributional Word Vector Representations

The results show that, unlike distributional word vectors, which need a lot of training time and are data-dependent, non-distributional word vectors can also achieve good results.

Slide 13

Retrofitting Word Vectors to Semantic Lexicons

This paper proposes a method for refining vector space representations using relational information from semantic lexicons, by encouraging linked words to have similar vector representations. It makes no assumptions about how the input vectors were constructed.

Slide 14

Retrofitting Word Vectors to Semantic Lexicons

The contribution of this paper is a graph-based learning technique for using lexical relational resources to obtain higher quality semantic vectors, which the authors call "retrofitting."

Slide 15

Retrofitting Word Vectors to Semantic Lexicons

Let V = {w1, . . . , wn} be a vocabulary, i.e., the set of word types, and let Ω be an ontology that encodes semantic relations between words in V.
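As a reminder of the method (my reconstruction from [6]; notation may differ slightly from the paper), let q̂_i be the original vector of word w_i, q_i its retrofitted vector, and E the edge set of the ontology. Retrofitting minimizes

\[
\Psi(Q) \;=\; \sum_{i=1}^{n}
  \Bigl[\,
    \alpha_i \,\| \mathbf{q}_i - \hat{\mathbf{q}}_i \|^2
    \;+\; \sum_{(i,j)\in E} \beta_{ij}\,\| \mathbf{q}_i - \mathbf{q}_j \|^2
  \,\Bigr] ,
\]

which balances staying close to the original vectors against agreeing with lexicon neighbours. The objective is convex in Q and can be minimized by sweeping over the vocabulary with the closed-form update

\[
\mathbf{q}_i \;\leftarrow\;
  \frac{\alpha_i\,\hat{\mathbf{q}}_i \;+\; \sum_{j:(i,j)\in E} \beta_{ij}\,\mathbf{q}_j}
       {\alpha_i \;+\; \sum_{j:(i,j)\in E} \beta_{ij}}
\]

for a small number of iterations.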

Slide 16

Retrofitting Word Vectors to Semantic Lexicons

The previous slides treated retrofitting as post-processing; this slide shows how to use the semantic lexicons during learning instead.

Slide 17

Retrofitting Word Vectors to Semantic Lexicons

Two results tables are shown. The first compares different semantic lexicons across different tasks when using retrofitting.

Slide 18

Retrofitting Word Vectors to Semantic Lexicons

This result compares different ways to use retrofitting: during training or after training (post-processing).

I think both of them are reasonable.

Slide 19

References

[1] Dani Yogatama, Manaal Faruqui, Chris Dyer, and Noah A. Smith. Learning Word Representations with Hierarchical Sparse Coding. In Proc. of ICML, 2015.

[2] Omer Levy and Yoav Goldberg. Neural Word Embeddings as Implicit Matrix Factorization. In Proc. of NIPS, 2014.

[3] Manaal Faruqui, Yulia Tsvetkov, Dani Yogatama, Chris Dyer, and Noah A. Smith. Sparse Overcomplete Word Vector Representations. In Proc. of ACL, 2015.

[4] Manaal Faruqui and Chris Dyer. Non-distributional Word Vector Representations. In Proc. of ACL, 2015.

Slide 20

References

[5] Manaal Faruqui and Chris Dyer. Non-distributional Word Vector Representations. In Proc. of ACL, 2015.

[6] Manaal Faruqui, Jesse Dodge, Sujay Kumar Jauhar, Chris Dyer, Eduard Hovy, and Noah A. Smith. Retrofitting Word Vectors to Semantic Lexicons. In Proc. of NAACL-HLT, pages 1606-1615, 2015.