Slide 1
Learning from the Chris Dyer Group
Chao Xing
CSLT, Tsinghua

Slide 2
Why?
Chris Dyer's group achieved a number of brilliant results in 2015, and their research interests closely match ours.
In some areas our two groups think in almost the same way, but we did not do as well as they did.
This presentation introduces some of their work from 2015.

Slide 3
Learning Word Representations with Hierarchical Sparse Coding
In this paper, they propose a novel model that decomposes a PMI matrix under a hierarchical forest constraint: the code dimensions are organized into a forest of trees, and each coordinate is penalized together with its descendants, so that more general dimensions are activated before more specific ones.
The forest structure is shown in the figure on this slide.
Slide 4
Learning Word Representations with Hierarchical Sparse Coding
The factorization cost function changes from: (equation image)
to: (equation image)
For sparse coding, they need only consider the non-zero elements, rather than factorizing the matrix over all of its entries, which makes this faster than the former model. A sketch of the two objectives follows.
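The two cost functions appeared as images on the original slide. The following is a hedged reconstruction from the ICML 2015 paper, with approximate notation of my own: the plain sparse-coding objective penalizes each code vector with an ℓ1 norm, and the hierarchical version replaces that penalty with a group norm over each coordinate together with its descendants in the forest.

```latex
% Sketch, not a verbatim copy of the paper's equations.
% Plain sparse coding of the (PMI-like) matrix X with dictionary D, codes a_v:
\min_{\mathbf{D},\mathbf{A}} \; \sum_{v=1}^{V} \Big( \| \mathbf{x}_v - \mathbf{D}\mathbf{a}_v \|_2^2
    + \lambda \, \| \mathbf{a}_v \|_1 \Big)

% Hierarchical (forest) regularizer: group i contains coordinate i and all of
% its descendants, so zeroing a node's group zeroes its whole subtree and
% general dimensions activate before specific ones:
\min_{\mathbf{D},\mathbf{A}} \; \sum_{v=1}^{V} \Big( \| \mathbf{x}_v - \mathbf{D}\mathbf{a}_v \|_2^2
    + \lambda \sum_{i=1}^{M} \big\| \langle a_{v,i}, \; a_{v,\mathrm{Descendants}(i)} \rangle \big\|_2 \Big)
```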
Slide 5
Learning Word Representations with Hierarchical Sparse Coding
The results are shown in the table on this slide; they are consistent with our experiments on CVSS 500.

Slide 6
Sparse Overcomplete Word Vector Representations
In this paper, they want to find a transform from pre-trained word vectors to sparse (and optionally binary) vectors.
Slide 7
Sparse Overcomplete Word Vector Representations
In the sparse coding process, the objective function seeks to optimize: (equation image)
In their model A, they want to optimize: (equation image)
The non-negative sparse coding cost function is: (equation image)
In their model B, the objective function becomes: (equation image)
A hedged sketch of the core formulations appears below.
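The four objectives were images on the original slide, so the exact correspondence to the "model A"/"model B" labels is not guaranteed. From the ACL 2015 paper, the two core formulations are (a sketch, notation mine): ℓ1-regularized sparse coding with a dictionary penalty, and its non-negative variant.

```latex
% Hedged reconstruction from Faruqui et al. (ACL 2015); notation mine.
% Sparse coding of pre-trained vectors x_i into codes a_i with dictionary D:
\min_{\mathbf{D},\,\mathbf{A}} \; \sum_{i=1}^{V} \Big( \| \mathbf{x}_i - \mathbf{D}\mathbf{a}_i \|_2^2
    + \lambda \, \| \mathbf{a}_i \|_1 \Big) + \tau \, \| \mathbf{D} \|_2^2

% Non-negative variant: the same loss with elementwise constraints on D and a_i:
\min_{\mathbf{D} \ge 0,\; \mathbf{a}_i \ge 0} \; \sum_{i=1}^{V} \Big( \| \mathbf{x}_i - \mathbf{D}\mathbf{a}_i \|_2^2
    + \lambda \, \| \mathbf{a}_i \|_1 \Big) + \tau \, \| \mathbf{D} \|_2^2
```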
Slide 8
Sparse Overcomplete Word Vector Representations
How do they binarize?
The rule (equation image) thresholds each element of the vector, where one threshold is the mean of the positive-valued elements and the other is the mean of the negative-valued elements.
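The exact rule was an image on the slide; the following is a hypothetical Python sketch of one natural reading, in which values above the positive mean map to 1, values below the negative mean map to -1, and the rest to 0. The thresholds match the slide's description, but the mapping itself is an assumption, not the paper's published equation.

```python
import numpy as np

def binarize(a: np.ndarray) -> np.ndarray:
    """Hypothetical binarization of a sparse code vector `a`.

    Uses the mean of the positive entries and the mean of the negative
    entries as thresholds, as described on the slide; the exact mapping
    below is an assumption, not the paper's exact equation.
    """
    pos, neg = a[a > 0], a[a < 0]
    b = np.zeros_like(a)
    if pos.size:
        b[a >= pos.mean()] = 1.0   # strongly positive entries -> 1
    if neg.size:
        b[a <= neg.mean()] = -1.0  # strongly negative entries -> -1
    return b
```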
Slide 9
Sparse Overcomplete Word Vector Representations
They test the quality of the transformed word vectors on five benchmarks:
word similarity,
sentiment analysis,
question classification,
text classification, and
noun phrase bracketing.

Slide 10
Non-distributional Word Vector Representations
This paper constructs task-independent word vector representations using linguistic knowledge derived from pre-constructed linguistic resources such as WordNet (Miller, 1995), FrameNet (Baker et al., 1998), and the Penn Treebank (Marcus et al., 1993).

Slide 11
Non-distributional Word Vector Representations
They built their non-distributional word vectors from knowledge bases; such vectors are highly sparse and high-dimensional. A toy sketch of the construction follows.
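As an illustration only (the mini-lexicon and feature names below are hypothetical, not taken from the paper's resources), each word's vector is a binary indicator over linguistic features harvested from lexicons:

```python
# Hypothetical mini-lexicon: word -> set of linguistic features, standing in
# for features harvested from resources like WordNet or the Penn Treebank.
LEXICON = {
    "dog":  {"POS.NOUN", "WN.SYNSET.canine", "WN.HYPERNYM.animal"},
    "run":  {"POS.VERB", "WN.SYNSET.locomotion"},
    "fast": {"POS.ADJ", "POS.ADV", "WN.ANTONYM.slow"},
}

# Build a global feature index, then a sparse binary vector per word.
features = sorted({f for feats in LEXICON.values() for f in feats})
index = {f: i for i, f in enumerate(features)}

def word_vector(word: str) -> list[int]:
    """Binary indicator vector: 1 iff the word carries the feature."""
    vec = [0] * len(features)
    for f in LEXICON.get(word, ()):
        vec[index[f]] = 1
    return vec

print(word_vector("dog"))  # -> [0, 0, 1, 0, 0, 1, 1, 0]
```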
Combining 8 knowledge bases, they obtained 119,257 synset-level word vectors, each with 172,418 features.

Slide 12
Non-distributional Word Vector Representations
The results show that, unlike distributional word vectors, which need long training times and are data-dependent, non-distributional word vectors can also achieve good results.

Slide 13
Retrofitting Word Vectors to Semantic Lexicons
This paper proposes a method for refining vector-space representations using relational information from semantic lexicons, by encouraging linked words to have similar vector representations; it makes no assumptions about how the input vectors were constructed.
Slide 14
Retrofitting Word Vectors to Semantic Lexicons
The contribution of this paper is a graph-based learning technique for using lexical relational resources to obtain higher-quality semantic vectors, which the authors call "retrofitting."

Slide 15
Retrofitting Word Vectors to Semantic Lexicons
Let V = {w1, . . . , wn} be a vocabulary, i.e., the set of word types, and let Ω be an ontology that encodes semantic relations between the words in V.
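The objective itself was displayed as an image. From the NAACL 2015 paper, retrofitting minimizes the following convex objective, where the q̂_i are the input (pre-trained) vectors, E is the ontology's edge set, and α_i, β_ij are weights (the paper uses α_i = 1 and β_ij = degree(i)⁻¹):

```latex
\Psi(Q) = \sum_{i=1}^{n} \Big[ \alpha_i \, \| \mathbf{q}_i - \hat{\mathbf{q}}_i \|^2
        + \sum_{(i,j) \in E} \beta_{ij} \, \| \mathbf{q}_i - \mathbf{q}_j \|^2 \Big]

% Solved by iterating the closed-form coordinate update until convergence:
\mathbf{q}_i \leftarrow
  \frac{ \sum_{j : (i,j) \in E} \beta_{ij} \, \mathbf{q}_j + \alpha_i \, \hat{\mathbf{q}}_i }
       { \sum_{j : (i,j) \in E} \beta_{ij} + \alpha_i }
```

A minimal Python sketch of the update loop (the function name and dict-based graph representation are mine; the paper reports that about 10 iterations suffice):

```python
import numpy as np

def retrofit(q_hat: dict[str, np.ndarray],
             edges: dict[str, list[str]],
             alpha: float = 1.0,
             n_iters: int = 10) -> dict[str, np.ndarray]:
    """Iterative retrofitting updates; beta_ij = 1 / degree(i), alpha_i = alpha."""
    q = {w: v.copy() for w, v in q_hat.items()}
    for _ in range(n_iters):
        for w, neighbors in edges.items():
            nbrs = [n for n in neighbors if n in q]
            if not nbrs:
                continue
            beta = 1.0 / len(nbrs)
            # Weighted average of the word's lexicon neighbors and its
            # original (pre-retrofitting) vector.
            num = beta * sum(q[n] for n in nbrs) + alpha * q_hat[w]
            den = beta * len(nbrs) + alpha
            q[w] = num / den
    return q
```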
Slide 16
Retrofitting Word Vectors to Semantic Lexicons
The method described so far is post-processing; this slide shows a variant that retrofits word vectors during learning.

Slide 17
Retrofitting Word Vectors to Semantic Lexicons
Two results tables are shown. The first compares different semantic lexicons on different tasks when using retrofitting.

Slide 18
Retrofitting Word Vectors to Semantic Lexicons
This result compares different ways of using retrofitting: during training versus after training.
I think both of them are reasonable.

Slide 19
References
[1] Dani Yogatama, Manaal Faruqui, Chris Dyer, and Noah A. Smith. Learning Word Representations with Hierarchical Sparse Coding. In Proc. of ICML, 2015.
[2] Omer Levy and Yoav Goldberg. Neural Word Embeddings as Implicit Matrix Factorization. In Proc. of NIPS, 2014.
[3] Manaal Faruqui, Yulia Tsvetkov, Dani Yogatama, Chris Dyer, and Noah A. Smith. Sparse Overcomplete Word Vector Representations. In Proc. of ACL, 2015.
[4] Manaal Faruqui and Chris Dyer. Non-distributional Word Vector Representations. In Proc. of ACL, 2015.

Slide 20
References
[5] Manaal Faruqui, Jesse Dodge, Sujay Kumar Jauhar, Chris Dyer, Eduard Hovy, and Noah A. Smith. Retrofitting Word Vectors to Semantic Lexicons. In Proc. of NAACL-HLT, 2015, pages 1606–1615.