
Finite State Morphology - PowerPoint Presentation

Uploaded On 2018-02-26


Alexander Fraser & Luisa Berlanda, fraser@cis.uni-muenchen.de, CIS, Ludwig-Maximilians-Universität München. Computational Morphology and Electronic Dictionaries, summer semester (SoSe) 2016, 25 May 2016. SFST.



Presentation Transcript

Slide 1

Finite State Morphology

Alexander Fraser & Luisa Berlanda
fraser@cis.uni-muenchen.de
CIS, Ludwig-Maximilians-Universität München
Computational Morphology and Electronic Dictionaries
Summer semester (SoSe) 2016
25 May 2016

Slide 2

SFST

programming language for developing finite-state transducers
compiler which translates programs to transducers
tools for
  applying transducers
  printing transducers
  comparing transducers

Slide 3

SFST Example Session

> echo "Hello\ World\!" > test.fst      (storing a small test program)
> fst-compiler test.fst test.a          (calling the compiler)
test.fst: 2
> fst-mor test.a                        (interactive transducer usage)
reading transducer...                   (transducer is loaded)
finished.
analyze> Hello World!                   (input)
Hello World!                            (recognised)
analyze> Hello World                    (another input)
no result for Hello World               (not recognised)
analyze> q                              (terminate program)

Slide 4

Transducer Variables

% list of verbs with regular inflection
$Vroot$ = walk | talk | bark

% regular verbal inflection
$Vinfl$ = <V>:<> (\
    [<inf><n3s>]:<> |\
    {<3s>}:{s} |\
    {<ger>}:{ing} |\
    {<past>}:{ed})

% list of nouns with regular inflection
$Nroot$ = hat | head | trick

% regular nominal inflection
$Ninfl$ = <N>:<> (\
    {<sg>}:{} |\
    {<pl>}:{s})

% combine stems and inflectional endings
$Vroot$ $Vinfl$ | $Nroot$ $Ninfl$

Slide 5
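The transducer built from `$Vroot$ $Vinfl$ | $Nroot$ $Ninfl$` in the preceding program can be pictured as a finite set of analysis/surface string pairs. The following is a plain-Python emulation of that set, not SFST code; the names `PAIRS`, `analyze` and `generate` are assumptions made for this sketch.

```python
# Plain-Python emulation of the toy grammar; this is not SFST code.
VROOT = ["walk", "talk", "bark"]
NROOT = ["hat", "head", "trick"]

# (analysis tags, surface ending) pairs mirroring $Vinfl$ and $Ninfl$.
VINFL = [("<V><inf>", ""), ("<V><n3s>", ""), ("<V><3s>", "s"),
         ("<V><ger>", "ing"), ("<V><past>", "ed")]
NINFL = [("<N><sg>", ""), ("<N><pl>", "s")]


def build_pairs():
    """Concatenate every root with every ending of its word class."""
    pairs = []
    for roots, endings in ((VROOT, VINFL), (NROOT, NINFL)):
        for root in roots:
            for tags, ending in endings:
                pairs.append((root + tags, root + ending))
    return pairs


PAIRS = build_pairs()


def analyze(surface):
    """Surface form -> all analysis strings (like fst-mor's analyze mode)."""
    return sorted(a for a, s in PAIRS if s == surface)


def generate(analysis):
    """Analysis string -> all surface forms."""
    return sorted(s for a, s in PAIRS if a == analysis)


print(analyze("walks"))        # ['walk<V><3s>']
print(analyze("walk"))         # ['walk<V><inf>', 'walk<V><n3s>']
print(generate("hat<N><pl>"))  # ['hats']
```

The bare form `walk` receives two analyses because `[<inf><n3s>]:<>` maps either tag to the empty string.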

Homework

Write a pipeline that maps all letters to lowercase and orders them backwards.

Family Huber has three children. Their first child is called Mia, the next one Toni and the last one Pia. Family Band has three children as well: Michael, Paul and Pia.

Write a program that can tell us the following details about the children: which family he/she belongs to, whether it is a son or a daughter, and whether he/she was the first, second or third child.

Output format: <family><family name><first, second child><gender>

Slide 6

Solution

Write a pipeline that maps all letters to lowercase and orders them backwards.

[a-z]:[A-Z]* || [ZYXWVUTSRQPONMLKJIHGFEDCBA]:[A-Z]*

Slide 7
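The two-stage pipeline solution (lowercase/uppercase pairing composed with the reversed alphabet) can be traced symbol by symbol. The sketch below emulates the composition operator `||` with Python dictionaries. It assumes the reversed character list contains all 26 letters, and the names `stage1`, `stage2`, `generate` and `analyze` are illustrative only.

```python
import string

UPPER = string.ascii_uppercase
LOWER = string.ascii_lowercase

# Stage 1, [a-z]:[A-Z]: analysis symbol -> surface symbol, paired by position.
stage1 = dict(zip(LOWER, UPPER))
# Stage 2, [ZYX...CBA]:[A-Z]: analysis symbol -> surface symbol.
stage2 = dict(zip(UPPER[::-1], UPPER))

# Composition (||): the surface side of stage 1 feeds the analysis side of stage 2.
composed = {a: stage2[stage1[a]] for a in LOWER}
# Analysis runs the same relation backwards, from surface to analysis.
inverse = {surface: analysis for analysis, surface in composed.items()}


def generate(word):
    """Analysis -> surface; None if a symbol is outside the relation."""
    if any(ch not in composed for ch in word):
        return None
    return "".join(composed[ch] for ch in word)


def analyze(word):
    """Surface -> analysis; None if a symbol is outside the relation."""
    if any(ch not in inverse for ch in word):
        return None
    return "".join(inverse[ch] for ch in word)


print(analyze("ABC"))   # zyx
print(generate("abc"))  # ZYX
```

In this emulation, analysing an uppercase word yields lowercase letters taken from the reversed alphabet.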

Solution

{<family><huber>}:{} ({<1><daughter>}:{mia} | {<1><son>}:{toni} | {<2><daughter>}:{pia}) |\
{<family><band>}:{} ({<1><son>}:{michael} | {<2><son>}:{paul} | {<1><daughter>}:{pia})

Slide 8
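The family transducer from the preceding solution is a small lookup relation between tag strings and names. A minimal Python emulation follows, with the entries copied from the solution as given. The function name `analyze` is an assumption of this sketch.

```python
# (analysis, surface) pairs copied from the family solution; not SFST code.
ENTRIES = [
    ("<family><huber><1><daughter>", "mia"),
    ("<family><huber><1><son>", "toni"),
    ("<family><huber><2><daughter>", "pia"),
    ("<family><band><1><son>", "michael"),
    ("<family><band><2><son>", "paul"),
    ("<family><band><1><daughter>", "pia"),
]


def analyze(name):
    """Return every analysis whose surface side equals the given name."""
    return [analysis for analysis, surface in ENTRIES if surface == name]


print(analyze("mia"))  # ['<family><huber><1><daughter>']
print(analyze("pia"))
```

Because a child called Pia exists in both families, `analyze("pia")` returns two analyses, just as a transducer returns all paths matching an ambiguous input.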

Lexicon Files

$Vroot$ = "verb.lex"
$Nroot$ = "noun.lex"

The command "filename" reads the respective file line by line and forms the disjunction of all lines.

Only the symbols : \ < > % are treated as operators.

Slide 9
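Reading a lexicon file amounts to turning each line into one alternative of a disjunction. The following Python sketch shows that idea only; it does not interpret the operator symbols. The file content written here (the three verbs from the earlier `$Vroot$` definition) and the helper name `read_lexicon` are assumptions.

```python
import os
import tempfile


def read_lexicon(path):
    """Return the non-empty lines of a lexicon file, one alternative per line."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


# Write a hypothetical verb.lex into a temporary folder and read it back.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "verb.lex")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("walk\ntalk\nbark\n")
    vroot = read_lexicon(path)

# The list corresponds to the disjunction walk | talk | bark.
print(" | ".join(vroot))  # walk | talk | bark
```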

Lexicon Files

$Vroot$ = "verb.lex"

$Vinfl$ = <V>:<> (\
    [<inf><n3s>]:<> |\
    {<3s>}:{s} |\
    {<ger>}:{ing} |\
    {<past>}:{ed})

$Nroot$ = "noun.lex"

$Ninfl$ = <N>:<> (\
    {<sg>}:{} |\
    {<pl>}:{s})

$Vroot$ $Vinfl$ | $Nroot$ $Ninfl$

Slide 10

Symbol Set Variables

#cons# = bcdfghjklmnpqrstvwxzß
#CONS# = BCDFGHJKLMNPQRSTVWXZß
#Cons# = #cons# #CONS#

#vowel# = aeiouäöü
#VOWEL# = AEIOUÄÖÜ
#Vowel# = #vowel# #VOWEL#

#letter# = #vowel# #cons#
#LETTER# = #VOWEL# #CONS#

[#LETTER# #letter#]:[#letter# #LETTER#]*

What would you get for Hallo and Ruß?

Slide 11

Solution

What would you get for Hallo and Ruß?

hALLO
rUß

Slide 12
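The answer to the Hallo/Ruß question follows from the positional pairing of the two symbol ranges in `[#LETTER# #letter#]:[#letter# #LETTER#]`. This Python sketch rebuilds the pairing from the symbol sets defined on the Symbol Set Variables slide. The function name `swap_case` is an assumption.

```python
# Symbol sets copied from the slide.
cons = "bcdfghjklmnpqrstvwxzß"
CONS = "BCDFGHJKLMNPQRSTVWXZß"
vowel = "aeiouäöü"
VOWEL = "AEIOUÄÖÜ"
letter = vowel + cons
LETTER = VOWEL + CONS

# [#LETTER# #letter#]:[#letter# #LETTER#] pairs the symbols by position,
# so A pairs with a, a pairs with A, and ß pairs with itself.
PAIRS = dict(zip(LETTER + letter, letter + LETTER))


def swap_case(word):
    """Map each symbol through the pairing; None if a symbol is not covered."""
    if any(ch not in PAIRS for ch in word):
        return None
    return "".join(PAIRS[ch] for ch in word)


print(swap_case("Hallo"))  # hALLO
print(swap_case("Ruß"))    # rUß
```

Since `y` is in none of the symbol sets, a word containing it is not accepted by this relation.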

Alphabet

The alphabet defines the set of available symbol pairs, which is relevant for the wildcard symbol ".", the negation operator "!" and the replacement operators (introduced later).

ALPHABET = [A-Z] [A-Z]:[a-z]

The expression on the right-hand side is compiled into a transducer, and the set of character pairs is extracted from its transitions.

A:.       is here identical to A:[Aa]
.         is identical to .:.
[^A-Z]    all characters appearing in the alphabet which are not uppercase letters, i.e. the set of lowercase letters
.*        maps mixed letter sequences to all-uppercase letter sequences (analysis)

Slide 13
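The effect of `ALPHABET = [A-Z] [A-Z]:[a-z]` on the wildcard can be checked by treating the alphabet as a set of (analysis, surface) symbol pairs. The following Python sketch does this; it is an emulation, and the helper names are assumptions.

```python
import string

UPPER = string.ascii_uppercase

# ALPHABET = [A-Z] [A-Z]:[a-z] as a set of (analysis, surface) pairs.
ALPHABET = {(u, u) for u in UPPER} | {(u, u.lower()) for u in UPPER}


def surface_symbols(analysis_symbol):
    """Emulates A:. by listing the surface symbols paired with a symbol."""
    return sorted(s for a, s in ALPHABET if a == analysis_symbol)


def complement_of_uppercase():
    """Emulates [^A-Z]: alphabet symbols that are not uppercase letters."""
    symbols = {x for pair in ALPHABET for x in pair}
    return sorted(symbols - set(UPPER))


def analyze(surface):
    """Emulates .* in analysis mode: surface string -> analysis string."""
    result = []
    for ch in surface:
        # In this alphabet every surface symbol has exactly one analysis symbol.
        candidates = sorted(a for a, s in ALPHABET if s == ch)
        if not candidates:
            return None
        result.append(candidates[0])
    return "".join(result)


print(surface_symbols("A"))  # ['A', 'a']
print(analyze("HeLLo"))      # HELLO
```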

Alphabet

fullform.lex:
house<N><sg>
house<>:s<N><pl>
walk<V><inf>
walk<>:i<>:n<>:g<V><ger>

emorph.fst:
ALPHABET = [a-zA-Z] [<V><N><sg><pl><ger><inf>]:<>
"fullform.lex" || .*

This reads the lexicon and deletes the grammatical markers on the surface side.

Slide 14
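Each line of the full-form lexicon encodes an analysis string and a surface string at once. The sketch below splits the four fullform.lex entries into those two sides, treating `<>` as the empty symbol and dropping the grammatical markers on the surface side as the ALPHABET line describes. The regular expression and the names `parse_entry` and `analyze` are assumptions of this Python emulation.

```python
import re

# A symbol is either a multi-character symbol such as <N> (or the empty
# symbol <>) or a single letter; a pair is SYMBOL optionally followed by :SYMBOL.
SYMBOL = r"<[^<>]*>|[A-Za-z]"
PAIR = re.compile(rf"({SYMBOL})(?::({SYMBOL}))?")

MARKERS = {"<V>", "<N>", "<sg>", "<pl>", "<ger>", "<inf>"}

LEXICON = [
    "house<N><sg>",
    "house<>:s<N><pl>",
    "walk<V><inf>",
    "walk<>:i<>:n<>:g<V><ger>",
]


def parse_entry(line):
    """Split one lexicon line into (analysis string, surface string)."""
    analysis, surface = [], []
    for match in PAIR.finditer(line):
        upper, lower = match.group(1), match.group(2)
        if lower is None:
            lower = upper          # a:a is written as plain a
        if upper != "<>":
            analysis.append(upper)
        # Markers are mapped to <> on the surface, i.e. deleted.
        if lower != "<>" and lower not in MARKERS:
            surface.append(lower)
    return "".join(analysis), "".join(surface)


ENTRIES = [parse_entry(line) for line in LEXICON]


def analyze(word):
    return [analysis for analysis, surface in ENTRIES if surface == word]


print(parse_entry("house<>:s<N><pl>"))  # ('house<N><pl>', 'houses')
print(analyze("walking"))               # ['walk<V><ger>']
```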

Orthographic Rules

Replace operator: t ^-> (l __ r) applies the mapping implemented in the transducer t in the left context l and the right context r. l and r are automata (i.e. transducers mapping strings to themselves).

e-elision: bake<V>ing → baking

$Morph$ = bake<V>{<ger>}:{ing}

ALPHABET = [A-Za-z] <V>

% delete e before <V>e or <V>i
$e-elision$ = e:<> ^-> (__ <V> [ei])

% apply the rule
$Morph$ = $Morph$ || $e-elision$

% delete the <V> marker
ALPHABET = [A-Za-z] <V>:<>

$Morph$ || .*

Slide 15
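The e-elision rule can be followed step by step on a list of symbols. The sketch below is a Python emulation of a context-dependent replacement, not the SFST operator itself. It assumes that both contexts are checked against the unmodified input, and the names `tokenize`, `replace_in_context` and `surface` are hypothetical.

```python
import re


def tokenize(text):
    """Split into symbols; <V> counts as one multi-character symbol."""
    return re.findall(r"<[^<>]*>|.", text)


def replace_in_context(symbols, target, replacement, left, right):
    """Rewrite `target` wherever `left` accepts what precedes it
    and `right` accepts what follows it."""
    out = []
    for i, sym in enumerate(symbols):
        if sym == target and left(symbols[:i]) and right(symbols[i + 1:]):
            out.extend(replacement)
        else:
            out.append(sym)
    return out


def e_elision(symbols):
    """e:<> ^-> (__ <V> [ei]): delete e before <V>e or <V>i."""
    return replace_in_context(
        symbols, "e", [],
        left=lambda before: True,  # the left context is empty
        right=lambda after: after[:1] == ["<V>"] and after[1:2] in (["e"], ["i"]),
    )


def surface(form):
    """Apply the rule, then delete the <V> marker."""
    symbols = e_elision(tokenize(form))
    return "".join(s for s in symbols if s != "<V>")


print(surface("bake<V>ing"))  # baking
print(surface("bake<V>s"))    # bakes
```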

Orthographic Rules

What does this program do?

$Morph$ = bake<V>{<ger>}:{ing} |\
    crash<V>{<3><sg>}:s |\
    happy<ADJ>{<comp>}:{er} |\
    fly<V>{<3><sg>}:s

ALPHABET = [A-Za-z] <V><N><ADJ>

$e-elision$ = e:<> ^-> (__ <V> [ei])
$e-epenthesis$ = (<V> <>:e) ^-> ([sh] __ s)
$y2i$ = y:i ^-> ([^ae] __ [<ADJ><V>] e)
$y2ie$ = y:{ie} ^-> ([^ae] __ [<V><N>] s)

$Morph$ = $Morph$ || $e-elision$ || $e-epenthesis$ || $y2i$ || $y2ie$

ALPHABET = [A-Za-z] [<V><N><ADJ>]:<>

$Morph$ || .*

Slide 16

Solution

e-epenthesis: crash<V>s → crashes

y to i: happy<ADJ>er → happier
        fly<V>s → flies

Only in the analyze mode!

Slide 17
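The four orthographic rules (e-elision, e-epenthesis, y2i, y2ie) can be approximated as a cascade of regular-expression substitutions applied in the order of the composition. This is a Python sketch under simplifying assumptions: each rule is a one-directional rewrite on the intermediate string, and `[^ae]` is read as "any character other than a or e". The names `RULES` and `surface` are illustrative.

```python
import re

# (pattern, replacement) pairs in the order of the || composition.
RULES = [
    (r"e(?=<V>[ei])", ""),                    # $e-elision$
    (r"(?<=[sh])<V>(?=s)", "<V>e"),           # $e-epenthesis$
    (r"(?<=[^ae])y(?=(?:<ADJ>|<V>)e)", "i"),  # $y2i$
    (r"(?<=[^ae])y(?=(?:<V>|<N>)s)", "ie"),   # $y2ie$
]
MARKER = r"<(?:V|N|ADJ)>"


def surface(form):
    """Apply every rule in turn, then delete the category markers."""
    for pattern, replacement in RULES:
        form = re.sub(pattern, replacement, form)
    return re.sub(MARKER, "", form)


for form in ("bake<V>ing", "crash<V>s", "happy<ADJ>er", "fly<V>s"):
    print(form, "->", surface(form))
# bake<V>ing -> baking
# crash<V>s -> crashes
# happy<ADJ>er -> happier
# fly<V>s -> flies
```

A form such as `play<V>s` is left alone by both y rules because the `y` follows an `a`.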

Agreement Variables

$Morph$ = big<ADJ>{<comp>}:{er} | fat<ADJ>{<comp>}:{er}

$cons$ = [bcdfghjklmnpqrstvwxz]
$vowel$ = [aeiouy]

#=g# = bdglmnpt
$g$ = [#=g#] <>:[#=g#]

ALPHABET = [A-Za-z] <V><N><ADJ>

$gemination$ = $g$ ^-> ($cons$ $vowel$ __ <ADJ> e)

$Morph$ = $Morph$ || $gemination$

ALPHABET = [A-Za-z] [<V><N><ADJ>]:<>

$Morph$ || .*

Slide 18

Solution

analyze> bigger
big<ADJ><comp>
analyze> fater
no result for fater
analyze> fatter
fat<ADJ><comp>

Slide 19
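The gemination result can be reproduced with a regular-expression backreference, which plays a role similar to the agreement variable `#=g#`: both occurrences must take the same value. This is a Python emulation under that assumption, with the two lexicon entries taken from the Agreement Variables program. The names `RULE`, `surface` and `analyze` are hypothetical.

```python
import re

CONS = "bcdfghjklmnpqrstvwxz"
VOWEL = "aeiouy"
GEMINATE = "bdglmnpt"

# $g$ ^-> ($cons$ $vowel$ __ <ADJ> e): double the consonant in this context.
RULE = re.compile(rf"(?<=[{CONS}][{VOWEL}])([{GEMINATE}])(?=<ADJ>e)")


def surface(intermediate):
    """Apply gemination, then delete the category markers."""
    doubled = RULE.sub(r"\1\1", intermediate)  # \1\1 repeats the same consonant
    return re.sub(r"<(?:V|N|ADJ)>", "", doubled)


# analysis string -> intermediate form before the orthographic rule
LEXICON = {
    "big<ADJ><comp>": "big<ADJ>er",
    "fat<ADJ><comp>": "fat<ADJ>er",
}
SURFACE_TO_ANALYSIS = {surface(mid): ana for ana, mid in LEXICON.items()}


def analyze(word):
    """Return the analysis for a surface form, or None if there is none."""
    return SURFACE_TO_ANALYSIS.get(word)


print(analyze("bigger"))  # big<ADJ><comp>
print(analyze("fater"))   # None
print(analyze("fatter"))  # fat<ADJ><comp>
```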

Thank you for your attention