A theory of semantic spaces with some applications

Uploaded On 2020-08-28

Presentation Transcript

Slide1

A theory of semantic spaces with some applications

Xiangen Hu

CCNU & UoM

Slide2

Agenda

Introduction

Basic semantic comparison techniques

Examples of semantic spaces

A general framework

A few applications

Hands-on (if time permits)

Summary

Slide3

Introduction

Slide4

Introduction

As more information becomes available, it becomes more difficult to find and discover what we need.

We need new tools to help us organize, search, and understand these vast amounts of information.

Slide7

Introduction

Semantic Representation Analysis (SRA) enables us to automatically organize, understand, search, and summarize large electronic archives.

Discover the hidden themes that pervade the collection.

Annotate the documents according to those themes.

Use annotations to organize, summarize, and search the texts.

Slide8

Introduction

Slide9

Introduction

Semantic Representation Analysis (SRA) is part of a semester course, such as "computational linguistics"

What will be covered

Basic semantic comparison techniques

Examples of semantic spaces

A general framework

A few applications

Hands-on (if time permits)

Summary

Slide10

Basic Semantic Comparison Techniques

Slide11

Basic Semantic Comparison Techniques

What is Semantics?

Slide12

Basic Semantic Comparison Techniques

What is Semantics?

Slide13

Basic Semantic Comparison Techniques

What is Semantic Structure?

... ... semantic structure is defined as the arrangement of the terms relative to each other as represented in a metric space in which items judged more similar are placed closer to each other than items judged as less similar... ...

Romney, A. K., Boyd, J. P., Moore, C. C., Batchelder, W. H., & Brazill, T. J. (1996). Culture as shared cognitive representations. Proceedings of the National Academy of Sciences, 93(10), 4699-4705.

Slide14

Basic Semantic Comparison Techniques

Slide15

Basic Text Comparison Techniques

How can we compare two pieces of text?

String/Word Matching

Key String/Word Matching

Weighted Key String/Word Matching
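The three matching techniques listed on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tokenizer, the use of set overlap (Jaccard) as the score, and the function names are not from the slides.

```python
import re

def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_match(a, b):
    """Plain word matching: overlap of the two word sets (Jaccard, an assumption)."""
    wa, wb = set(tokenize(a)), set(tokenize(b))
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0

def key_word_match(a, b, keys):
    """Key word matching: the same overlap, restricted to a chosen keyword set."""
    ka, kb = set(tokenize(a)) & set(keys), set(tokenize(b)) & set(keys)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0

def weighted_key_word_match(a, b, weights):
    """Weighted key word matching: each keyword counts by its weight."""
    ka, kb = set(tokenize(a)) & set(weights), set(tokenize(b)) & set(weights)
    total = sum(weights[w] for w in ka | kb)
    return sum(weights[w] for w in ka & kb) / total if total else 0.0

text_a = "The cat sat on the mat"
text_b = "A cat lay on the mat"
plain = word_match(text_a, text_b)
keyed = key_word_match(text_a, text_b, {"cat", "mat", "dog"})
weighted = weighted_key_word_match(text_a, text_b, {"cat": 2.0, "sat": 1.0, "lay": 1.0})
```

None of these scores can see that "sat" and "lay" are related, which is the gap that extended keyword matching and semantic similarity are meant to fill.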

Slide16

Basic Text Comparison Techniques

Semantic

How can we compare two pieces of text?

String/Word Matching

Key String/Word Matching

Weighted Key String/Word Matching

Extended Weighted Key String/Word Matching

Semantic Similarity

Slide17

Basic Text Comparison Techniques

Semantic

How can we compare two pieces of text?

String/Word Matching

Key String/Word Matching

Weighted Key String/Word Matching

Extended Weighted Key String/Word Matching

Semantic Similarity

Deep level comparison

Parsing (syntactic), regular expression comparison, etc.

Slide18

Basic Semantic Comparison Techniques

Extended Weighted Key String/Word Matching and Semantic Similarity

Recommended reading

Jones (2010): Redundancy in perceptual and linguistic experience: Comparing feature-based and distributional models of semantic representation

Turney & Pantel (2010): From Frequency to Meaning: Vector Space Models of Semantics

McNamara (2010): Computational Methods to Extract Meaning From Text and Advance Theories of Human Cognition

Slide19

Examples of semantic spaces

Slide20

Examples of semantic spaces

Question: How to (automatically) find extended keywords?

Synonym?

Existing thesaurus (what would be the problem?)

Slide21

Examples of semantic spaces

Intuition from matrix algebra

SVD keeps relations between rows of a matrix.
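One way to read this claim is in terms of dot products between rows. The sketch below uses NumPy and a made-up 4x4 matrix (both are assumptions, not from the slides). It checks that the rows of U·diag(s) have exactly the same pairwise dot products as the rows of the original matrix, and that keeping only the k largest singular values preserves them approximately.

```python
import numpy as np

# Toy matrix: rows are terms, columns are documents (values are made up).
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 1.0, 2.0, 2.0],
])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Row coordinates in the SVD basis: row i of A corresponds to row i of U * s.
rows_full = U * s

# With all singular values kept, row-to-row dot products are unchanged,
# because A @ A.T == (U * s) @ (U * s).T.
gram_original = A @ A.T
gram_full = rows_full @ rows_full.T

# Keeping only the k largest singular values gives a lower-dimensional
# representation in which the row relations hold approximately.
k = 2
rows_k = U[:, :k] * s[:k]
gram_k = rows_k @ rows_k.T
max_error_k = np.abs(gram_original - gram_k).max()
```

The reduced rows `rows_k` are the kind of vectors that LSA-style methods use in place of the raw matrix rows.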

Slide22

Examples of semantic spaces

Latent Semantic Analysis

Slide23

Term-Document Matrix

Slide24

Example of TD Matrix

Slide25

Singular Value Decomposition

Slide26

Example of SVD

Slide27

Parameters in SVD

Slide28

Examples of semantic spaces

Slide29

Examples of semantic spaces

Latent Semantic Analysis (LSA)

==> Variations of LSA

HAL

Topics Model

BEAGLE 

...  

Slide30

Examples of semantic spaces

Slide31

A General Framework

Slide32

Challenges

There are different methods for semantic encoding

LSA (also called LSI, LSM), HAL, Topics Model, etc.

Within each encoding method, many variations

Example: LSA (7 parameters)

Slide33

Challenges

There is no way to determine which one to use!

Slide34

Challenges

There are different methods for semantic encoding

LSA (also called LSI, LSM), HAL, Topics Model, etc.

Within each encoding method, many variations

Example: LSA (7 parameters)

How can we evaluate a semantic space?

Need for a framework

Slide35

A General Framework

Some basic hypotheses

Semantic Structure (Romney et al., 1996, from the perspective of anthropology)

... ... semantic structure is defined as the arrangement of the terms relative to each other as represented in a metric space in which items judged more similar are placed closer to each other than items judged as less similar... ... 

Slide36

A General Framework

Some basic hypotheses

General hypotheses about semantic spaces (summarized by Turney & Pantel, 2010)

Statistical semantics hypothesis: Statistical patterns of human word usage can be used to figure out what people mean

Bag of words hypothesis: The frequencies of words in a document tend to indicate the relevance of the document to a query

Distributional hypothesis: Words that occur in similar contexts tend to have similar meanings

Extended distributional hypothesis: Patterns that co-occur with similar pairs tend to have similar meanings

Latent relation hypothesis: Pairs of words that co-occur in similar patterns tend to have similar semantic relations
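The distributional hypothesis can be illustrated with a very small sketch: represent each word by counts of the words that appear near it, then compare those count vectors. The toy corpus, the window size of 2, and the function names are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt

def context_vectors(sentences, window=2):
    """For each word, count the words seen within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            context = words[lo:i] + words[i + 1:hi]
            vectors.setdefault(word, Counter()).update(context)
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases mice",
    "the dog chases cats",
    "students read books",
]
vecs = context_vectors(corpus)
sim_cat_dog = cosine(vecs["cat"], vecs["dog"])      # shared contexts: the, drinks, chases
sim_cat_books = cosine(vecs["cat"], vecs["books"])  # no shared contexts
```

"cat" and "dog" never co-occur in this corpus, yet they come out as similar because they occur in similar contexts.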

Slide37

A General Framework

Some (very) early work (of Hu)

1985-1986: Dynamic Fuzzy Sets

a mathematical model for adverbs and adjectives

Slide38

A General Framework

Some (very) early work (of Hu)

1985-1986: Dynamic Fuzzy Sets: a mathematical model for adverbs and adjectives

Framework of context

Existence of difference

Interpreted with preferences

Comparison of order relations

Example: How do we "measure" the similarity or difference when "great weather!" comes from two different people?

Slide39

A General Framework

Some (very) early work (of Hu)

Some later work

A Mathematical Model of Semantics (Hu, 2005)

Concept of "layers": words, phrases, sentences, paragraphs, documents

Formal framework

Language neutral

Computational (vector-based)

Implementable

Slide40

A General Framework

Some (very) early work (of Hu)

Some later work

A Mathematical Model of Semantics (Hu, 2005)

Essence of semantic space:

Semantic similarity between items can be computed (numerically).

"semantic of any item (words, phrases, etc) in a given language is embedded within its relations with other items"

Slide41

A General Framework

Three assumptions

 

Hierarchical assumption

Semantics of different levels of a language entity may be represented differently

Representational assumption

Semantics of any level of language entities can be represented numerically or algebraically.

Slide42

A General Framework

Three assumptions

Computational assumption

Semantics of a higher-level language entity is computed as a function of semantics of its lower-level language entities. 

There exists a (numerical) semantic similarity measure that measures any two items at the lowest level.
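A minimal sketch of the computational assumption follows. The word vectors are made up, and both the composition function (vector sum) and the similarity measure (cosine) are common choices assumed here; the slides do not prescribe either.

```python
from math import sqrt

# Hypothetical 3-dimensional word vectors (lowest level); values are made up.
WORD_VECTORS = {
    "good":    [0.9, 0.1, 0.0],
    "great":   [0.8, 0.2, 0.0],
    "weather": [0.1, 0.9, 0.1],
    "exam":    [0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Numerical similarity measure between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def compose(words):
    """Higher-level vector as a function (here: the sum) of lower-level vectors."""
    dims = len(next(iter(WORD_VECTORS.values())))
    total = [0.0] * dims
    for w in words:
        for i, x in enumerate(WORD_VECTORS.get(w, [0.0] * dims)):
            total[i] += x
    return total

s1 = compose("good weather".split())
s2 = compose("great weather".split())
s3 = compose("good exam".split())
sim_close = cosine(s1, s2)  # phrases that differ by a near-synonym
sim_far = cosine(s1, s3)    # phrases that share only one word
```

Any other composition function could be substituted for the sum; the assumption only requires that the higher-level representation be computable from the lower-level ones.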

Slide43

A General Framework

Slide44

Definition

A vector-based semantic space contains five components:

A set of "words" $X_0 = \{x_1, \ldots, x_N\}$.

A hierarchy of layers $X_1, \ldots, X_M$, where an element in the set $X_i$ is a finite ordered array of elements in $X_{i-1}$ ($i = 1, \ldots, M$).

Vector representation for elements in each of the layers.

Measure of similarity (based on the representation) between elements within each of the layers.

Mappings from lower-level representations to higher-level representations.
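The five components can be mirrored in a small data structure. The class below is a sketch under stated assumptions: the name `SemanticSpace`, the use of nested tuples for the ordered arrays, cosine as the similarity measure, and vector addition as the mapping between layers are all illustrative choices, and a single similarity measure and mapping are reused for every layer for brevity.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, Dict, List, Sequence

Vector = List[float]

def cosine(u, v):
    """Default similarity measure (an assumed choice)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def add_vectors(vectors):
    """Default mapping from lower-layer vectors to one higher-layer vector."""
    return [sum(column) for column in zip(*vectors)]

@dataclass
class SemanticSpace:
    words: Dict[str, Vector]                                   # components 1 and 3 (layer 0)
    similarity: Callable[[Vector, Vector], float] = cosine     # component 4
    lift: Callable[[Sequence[Vector]], Vector] = add_vectors   # component 5

    def represent(self, item):
        """Vector for an item: a word (str) or an ordered array of lower items."""
        if isinstance(item, str):
            return self.words[item]
        # Component 2: a higher-layer item is an ordered array of lower items.
        return self.lift([self.represent(child) for child in item])

    def compare(self, a, b):
        """Similarity between two items of the same layer."""
        return self.similarity(self.represent(a), self.represent(b))

space = SemanticSpace(words={"a": [1.0, 0.0], "b": [0.0, 1.0]})
phrase = space.represent(("a", "b"))              # layer 1: array of words
sentence = space.represent((("a", "b"), ("a",)))  # layer 2: array of phrases
```

Because addition ignores order, this particular mapping does not use the ordering of the arrays; an order-sensitive mapping could be passed as `lift` instead.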

Slide45

Definition

A framework for semantic space

"ordered array"

Existence of similarity measure (based on vector representation) at each layer

Existence of mapping from lower layer representations to higher layer representations

Slide46

Definition

Induced Semantic Structure (ISS)

Only consider relationships (based on the similarity measure) between items within each layer

For any item, consider (numerical) relations with all other items in the same layer

"nearest neighbor"

Ordering information

Numerical information
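As a rough sketch of the idea, the ISS of one item can be computed by scoring every other item in the same layer and sorting by similarity. The vectors, the cosine measure, and the function name `induced_structure` are assumptions for illustration.

```python
from math import sqrt

def cosine(u, v):
    """Similarity measure of the layer (cosine is an assumed choice)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def induced_structure(vectors, target):
    """Rank every other item in the layer by similarity to `target`.

    Returns (neighbor, similarity) pairs, most similar first. The order of
    the list carries the ordering information; the scores carry the
    numerical information.
    """
    scored = [(other, cosine(vectors[target], vec))
              for other, vec in vectors.items() if other != target]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up vectors for four items in one layer.
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.3, 0.0],
    "car": [0.1, 0.9, 0.2],
    "bus": [0.0, 0.8, 0.4],
}
iss_cat = induced_structure(vectors, "cat")
nearest_neighbor = iss_cat[0][0]
```

Note that the result mentions only items and scores, not the vectors themselves, so it no longer depends on how the vectors were produced.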

Slide47

Definition

Induced Semantic Structure (ISS)

Slide48

Definition

Induced Semantic Structure (ISS)

Slide49

Similarity Measures based on ISS

Slide50

Similarity Measures based on ISS

Slide51

Similarity Measures based on ISS

Slide52

What does it do?

Provide new measures

Introduce new ways to measure semantic overlap (similarity) at a higher level.

Measure semantic overlap (similarity) between two semantic spaces

Domain-specific semantic structure

Individualized, domain-specific semantic structure

Slide53

Domain-Specific Semantic Spaces

Outline of Algorithms and Applications

Xiangen Hu

Slide54

Slide55

Overview of Steps

Basic Text Data (Corpus)

Algorithms

DSSPP

Some Details

Slide56

Basic Text Data (Corpus)

Definitions

SULE: Smallest unit of language entity.

Most of the time it is at the level of words (letter strings), but there are cases where the smallest unit may include special combinations of words, such as the phrase "to be or not to be".

SLE: Smallest language environment.

Most of the time it is a paragraph.

Some cases are customized: for example, a discussion of a topic may include several lines of text (such as dialog in a story).

In some cases it may be all n-sentence sliding ("walking") windows.

Slide57

Basic Text Data (Corpus)

Definitions

Global Weight of the SULE: How much each SULE is weighted

Most often it is weighted as a function of the "inverse document frequency": the more SLEs (documents) a SULE appears in, the lower its weight.

Customized Weight of the SULE: Replacing/calculating the weight by considering experts' judgements, or glossaries in a given domain.

Local Weight of SLE: How important the SLE is in the corpus.

A function of the "size" of the SLE

A function of the "density" of important SULE (of a given domain).
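The three kinds of weight can be sketched as follows. The specific formulas are assumptions: the global weight uses the common log(N / document frequency) form of inverse document frequency, the customized weight simply overrides values supplied by an expert or glossary, and the local weight is taken to be the density of important SULE in the SLE.

```python
from math import log

def global_weights(sles):
    """IDF-style global weight per SULE: log(N / number of SLEs containing it)."""
    n = len(sles)
    doc_freq = {}
    for sle in sles:
        for sule in set(sle):
            doc_freq[sule] = doc_freq.get(sule, 0) + 1
    return {sule: log(n / df) for sule, df in doc_freq.items()}

def customized_weights(weights, expert_weights):
    """Replace global weights with expert or glossary values where given."""
    return {**weights, **expert_weights}

def local_weight(sle, important):
    """Local weight of an SLE as the density of important SULE in it."""
    return sum(1 for s in sle if s in important) / len(sle) if sle else 0.0

# Each SLE is a list of SULE (here: words); the corpus is made up.
sles = [
    ["force", "mass", "the"],
    ["force", "the"],
    ["the", "energy"],
]
gw = global_weights(sles)
cw = customized_weights(gw, {"force": 5.0})
lw = local_weight(sles[0], {"force", "mass"})
```

A SULE that appears in every SLE, such as "the" here, receives a global weight of zero and so contributes nothing to later comparisons.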

Slide58

Basic Text Data (Corpus)

Definitions

SULE, SLE

Global Weight of SULE

Local Weight of SLE

Raw Data Structure 

The Matrix: SULE by SLE matrix: 

Size determined by SULE, SLE

Entry of a matrix determined by 

Frequency of SULE in SLE

Global Weight of SULE

Local Weight of SLE

It is sparse

It is a function of the parameters of 

SULE, SLE, Global Weight of SULE, Local Weight of SLE
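A sparse SULE-by-SLE matrix of this kind can be held in a plain dictionary that stores only the non-zero entries. In the sketch below, the entry is assumed to be the product frequency × global weight × local weight; the slides say only that the entry is determined by these three quantities, so the product is an illustrative choice, as are the function names.

```python
from collections import Counter

def build_matrix(sles, global_weight, local_weight):
    """Sparse SULE-by-SLE matrix stored as {(sule, sle_index): value}.

    Each entry is frequency * global weight of the SULE * local weight of
    the SLE (an assumed combination); zero entries are not stored.
    """
    matrix = {}
    for j, sle in enumerate(sles):
        for sule, freq in Counter(sle).items():
            value = freq * global_weight.get(sule, 1.0) * local_weight[j]
            if value != 0.0:
                matrix[(sule, j)] = value
    return matrix

def row(matrix, sule, n_sles):
    """Dense row for one SULE: its weighted occurrence in every SLE."""
    return [matrix.get((sule, j), 0.0) for j in range(n_sles)]

sles = [["a", "b", "a"], ["b", "c"]]
gw = {"a": 2.0, "b": 0.5, "c": 1.0}   # made-up global weights
lw = [1.0, 2.0]                       # made-up local weights, one per SLE
m = build_matrix(sles, gw, lw)
row_a = row(m, "a", len(sles))
```

Changing the definition of SULE or SLE, or either weighting scheme, changes the matrix, which is why the slide describes it as a function of those four parameters.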

Slide59

Algorithms

In the SULE-by-SLE matrix (S-S-M), each SULE is already uniquely defined by its row: how the SULE appears across all SLE in the corpus

Capture all information about each SULE

Capture "first order" relation between any two SULE

Questions/Issues

How to capture "high order" relations?

How many different "high order" relations?

Commonly used algorithms

Slide60

Algorithms

Commonly used algorithms

Latent Semantic Analysis

Sparse matrix SVD

Standard already there

Topics Model

Good alternative

Probabilistic Topics Model

Probabilistic LSA

HAL

BEAGLE

Slide61

DSSPP

Summary of Semantic Spaces

s.1) Semantics of any SULE can be extracted from large amounts of linguistic data (in any language) and represented mathematically as numerical vectors in a high-dimensional space, encoded/decoded by digital computers.

s.2) Semantic similarity between any pair of SULE can be computed when they are represented as in s.1).

s.3) Semantics of any SULE can also be represented by its semantic relations to all other SULE that are derived from s.2).

s.4) For any body of text, the semantic space is the collection of numerical vectors built from s.1).

Slide62

DSSPP

Summary of Semantic Spaces

Induced semantic structure (ISS)

For any semantic space from s.4), the induced semantic structure (ISS) is an alternative semantic representation built from s.2).

For any semantic space with N items, the ISS is the collection of (T,A,a,w), where T is the target SULE, A is any other SULE, a is the association (computed with the semantic space), and w is the weight of A.

A semantic space can be created by s.2), which depends on the method of a). The ISS of a semantic space created in d.1) no longer explicitly depends on the method of s.1).

Given any two semantic spaces created from s.1) with different parameters (original source of text, different dimensionality, or different encoding/decoding algorithms), if they share common items (lexicon), then the two semantic spaces can be compared (numerically and computably) by their respective ISS.
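To make the last point concrete, here is one possible way to compare two semantic spaces through their ISS. The comparison score (average overlap of each item's top-k neighbors over the shared lexicon) is an assumption chosen for simplicity, not the measure defined in the talk; the spaces and function names are made up. Note that the two spaces have different dimensionality, which does not matter once only the ISS are compared.

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def iss(space, lexicon):
    """For each target in `lexicon`, the other items ranked by similarity."""
    structure = {}
    for t in lexicon:
        others = [a for a in lexicon if a != t]
        structure[t] = sorted(others, key=lambda a: cosine(space[t], space[a]),
                              reverse=True)
    return structure

def compare_spaces(space_a, space_b, k=1):
    """Average overlap of top-k neighbor sets over the shared lexicon."""
    lexicon = sorted(set(space_a) & set(space_b))
    iss_a, iss_b = iss(space_a, lexicon), iss(space_b, lexicon)
    overlaps = [len(set(iss_a[t][:k]) & set(iss_b[t][:k])) / k for t in lexicon]
    return sum(overlaps) / len(overlaps) if overlaps else 0.0

# A 2-dimensional space and a 3-dimensional space over a partly shared lexicon.
space_a = {"cat": [1.0, 0.0], "dog": [0.9, 0.2], "car": [0.1, 1.0], "bus": [0.2, 0.9]}
space_b = {"cat": [1.0, 0.1, 0.0], "dog": [0.9, 0.0, 0.1],
           "car": [0.0, 1.0, 0.9], "bus": [0.1, 0.9, 1.0], "tram": [0.0, 1.0, 1.0]}
# Same vectors as space_a, but with "cat" and "car" swapped.
space_c = {"cat": [0.1, 1.0], "dog": [0.9, 0.2], "car": [1.0, 0.0], "bus": [0.2, 0.9]}

agreement_ab = compare_spaces(space_a, space_b, k=1)
agreement_ac = compare_spaces(space_a, space_c, k=1)
```

Spaces A and B agree on every nearest neighbor despite being built in different dimensions, while A and C share a dimensionality but induce different structures.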

Slide63

DSSPP

Summary of Semantic Spaces

Induced semantic structure (ISS)

Domain-Specific Semantic Processing (DSSP)

Semantic spaces can be used in their original form (e.g., vector-based representation)

We always use ISS, instead of their original form.

ISS are in the explicit form of SULE (e.g., (T,A,a,w))

Domain-Specific Semantic Processing can be achieved by the following methods

Domain-specific processing from the start, when processing the corpus.

Domain-specific processing only at ISS

Limiting ISS to SULE in specific domains

Papers can be written for this.

Definition of Domains

Human-generated vs. statistical algorithm

Slide64

DSSPP

Summary of Semantic Spaces

Induced semantic structure (ISS)

Domain-Specific Semantic Processing (DSSP)

Application of ISS

New semantic similarity measures between SULEs derived from ISS

Three types: combinatorial, permutational, quantitative

No longer limited to a single scalar value.

Papers can be written about these three types of similarity measures.

Semantic differences between any two ISS (and thus between their original semantic encoding methods)

Papers can be written for this.
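The slides name the three types of measure without defining them, so the sketch below is only one plausible reading and every formula in it is an assumption: "combinatorial" is taken as overlap of the top-k neighbor sets, "permutational" as agreement in the rank order of shared neighbors, and "quantitative" as cosine between the association values. Each ISS is represented as a dictionary from neighbor A to association a, with made-up values.

```python
from itertools import combinations
from math import sqrt

def top_k(iss, k):
    """The k neighbors with the highest association."""
    return {a for a, _ in sorted(iss.items(), key=lambda p: p[1], reverse=True)[:k]}

def combinatorial(iss_x, iss_y, k=3):
    """Which neighbors are shared, ignoring order (Jaccard of top-k sets)."""
    tx, ty = top_k(iss_x, k), top_k(iss_y, k)
    union = tx | ty
    return len(tx & ty) / len(union) if union else 0.0

def permutational(iss_x, iss_y):
    """Fraction of shared-neighbor pairs that both ISS rank in the same order."""
    shared = sorted(set(iss_x) & set(iss_y))
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 0.0
    same = sum(1 for a, b in pairs
               if (iss_x[a] - iss_x[b]) * (iss_y[a] - iss_y[b]) > 0)
    return same / len(pairs)

def quantitative(iss_x, iss_y):
    """Cosine between the association values on shared neighbors."""
    shared = set(iss_x) & set(iss_y)
    dot = sum(iss_x[a] * iss_y[a] for a in shared)
    norm = (sqrt(sum(iss_x[a] ** 2 for a in shared))
            * sqrt(sum(iss_y[a] ** 2 for a in shared)))
    return dot / norm if norm else 0.0

iss_cat = {"pet": 0.9, "fur": 0.8, "milk": 0.6, "road": 0.1}
iss_dog = {"pet": 0.8, "fur": 0.9, "bone": 0.7, "road": 0.2}
scores = (combinatorial(iss_cat, iss_dog),
          permutational(iss_cat, iss_dog),
          quantitative(iss_cat, iss_dog))
```

The result is a triple rather than a single number, in line with the remark that similarity is no longer limited to one scalar value.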

 

Slide65

Some Details

Encoding the Sparse Matrix (SULE by SLE)

Semantic Encoding Methods (LSA, etc.)

Management and computation in CLOUD

Streamline processes with configurable parameters

Slide66

A few applications

Slide67

A few applications

Similarity Comparison Made Easy

 

Unified framework

Any semantic space that provides "semantic" measures between terms.

Slide68

A few applications

Similarity Comparison Made Easy

 

Domain-Specific

Semantic relations can be "projected" onto domains.

Slide69

A few applications

Similarity Comparison Made Easy

 

Flexible

Control what is considered when computing semantic relations.

Slide70

Semantic "Spectrum" Display

Similar to spectrum analysis in physics

Slide71

Semantic "Spectrum" Display

Similar to spectrum analysis in physics

Slide72

Semantic "Spectrum" Display

Similar to spectrum analysis in physics

Slide73

Content Analysis

Example Application in ITS

Slide74

Apply to Social Media Analysis

Slide75

Apply to Social Media Analysis

Virus of the mind

The science of memes.

Slide76

Apply to Social Media Analysis

A simple data structure of social media

 

(Senders, Receivers, Environment, Messages, Time) 

A Google DataStore style sparse matrix
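The five-field record and the sparse layout can be sketched with plain Python containers. This is an in-memory stand-in only: it does not use any Google Datastore API, and the class names, field types, and query methods are assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple

class Event(NamedTuple):
    """One social media record: (Sender, Receiver, Environment, Message, Time)."""
    sender: str
    receiver: str
    environment: str
    message: str
    time: int  # e.g. seconds since some epoch; the unit is an assumption

class EventStore:
    """Sparse store: only observed (sender, receiver) cells hold any data."""

    def __init__(self):
        self.cells = defaultdict(list)

    def add(self, event):
        self.cells[(event.sender, event.receiver)].append(event)

    def sent_by(self, sender):
        return [e for (s, _), events in self.cells.items()
                if s == sender for e in events]

    def received_by(self, receiver):
        return [e for (_, r), events in self.cells.items()
                if r == receiver for e in events]

store = EventStore()
store.add(Event("ann", "bob", "forum", "great weather!", 1))
store.add(Event("ann", "cho", "forum", "great weather!", 2))
store.add(Event("bob", "cho", "chat", "did you see this?", 3))
ann_sent = store.sent_by("ann")
cho_received = store.received_by("cho")
```

With messages stored this way, each message text can be passed to a semantic space for classification, and the sender and receiver fields record who passed it to whom.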

With a semantic framework that is individualized & domain-specific, the following questions can be explored

Slide77

Apply to Social Media Analysis

With a semantic framework that is individualized & domain-specific,

Messages can be classified into culturally meaningful categories that are potential memes (viruses of the mind)

How do these viruses spread from mind to mind (sender, receiver)?

Who is actively sending (spreading) the viruses?

Who is easily "infected"?

HUMAN-centered vs. MEME-centered

Slide78

Apply to Social Media Analysis

CDC & "CMC"

Enabling theory and technology

Theory and implementation of semantic spaces

Slide79

Summary