Slide1
Finite State Morphology
Alexander Fraser & Luisa Berlanda
fraser@cis.uni-muenchen.de
CIS, Ludwig-Maximilians-Universität München
Computational Morphology and Electronic Dictionaries
Summer semester (SoSe) 2016
2016-05-25
Slide2
SFST
a programming language for developing finite-state transducers
a compiler which translates programs to transducers
tools for applying, printing and comparing transducers
Slide3
SFST Example Session
> echo "Hello\ World\!" > test.fst      storing a small test program
> fst-compiler test.fst test.a          calling the compiler
test.fst: 2
> fst-mor test.a                        interactive transducer usage
reading transducer...                   transducer is loaded
finished.
analyze> Hello World!                   input
Hello World!                            recognised
analyze> Hello World                    another input
no result for Hello World               not recognised
analyze> q                              terminate program
Slide4
Transducer Variables
% list of verbs with regular inflection
$Vroot$ = walk | talk | bark
% regular verbal inflection
$Vinfl$ = <V>:<> (\
    [<inf><n3s>]:<> |\
    {<3s>}:{s} |\
    {<ger>}:{ing} |\
    {<past>}:{ed})
% list of nouns with regular inflection
$Nroot$ = hat | head | trick
% regular nominal inflection
$Ninfl$ = <N>:<> (\
    {<sg>}:{} |\
    {<pl>}:{s})
% combine stems and inflectional endings
$Vroot$ $Vinfl$ | $Nroot$ $Ninfl$
Slide5
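The relation defined by the transducer-variable program on the preceding slide can be imitated in plain Python. This is only a sketch for illustration, not SFST: it enumerates the (analysis, surface) pairs explicitly, and the names `build_relation`, `analyze` and `generate` are made up for this example.

```python
# Plain-Python stand-in for the SFST program (assumption: not how SFST works internally).
V_ROOTS = ["walk", "talk", "bark"]
N_ROOTS = ["hat", "head", "trick"]

# Each entry pairs an analysis-side tag sequence with its surface ending.
V_INFL = [("<V><inf>", ""), ("<V><n3s>", ""), ("<V><3s>", "s"),
          ("<V><ger>", "ing"), ("<V><past>", "ed")]
N_INFL = [("<N><sg>", ""), ("<N><pl>", "s")]


def build_relation():
    """Return every (analysis, surface) pair: stems concatenated with endings."""
    pairs = []
    for roots, inflections in ((V_ROOTS, V_INFL), (N_ROOTS, N_INFL)):
        for root in roots:
            for tags, ending in inflections:
                pairs.append((root + tags, root + ending))
    return pairs


RELATION = build_relation()


def analyze(surface):
    """Map a surface form to all of its analyses (like fst-mor's analyze mode)."""
    return sorted(a for a, s in RELATION if s == surface)


def generate(analysis):
    """Map an analysis string to all of its surface forms."""
    return sorted(s for a, s in RELATION if a == analysis)


if __name__ == "__main__":
    print(analyze("walked"))
    print(analyze("walk"))
    print(generate("hat<N><pl>"))
```

Because `<inf>` and `<n3s>` both map to the empty string, the bare stem `walk` receives two analyses, which is the ambiguity the character class `[<inf><n3s>]:<>` expresses.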
Homework
Write a pipeline that maps all letters to lowercase and orders them backwards.
The Huber family has three children. Their first child is called Mia, the next one Toni and the last one Pia. The Band family has three children as well: Michael, Paul and Pia.
Write a program that can tell us the following details about the children: which family he/she belongs to, whether he/she is a son or a daughter, and whether he/she was the first, second or third child.
Output format: <family><family name><first, second child><gender>
Slide6
Solution
Write a pipeline that maps all letters to lowercase and orders them backwards
[a-z]:[A-Z]* || [ZYXWVUTSRQPONMLKJIHGFEDCBA]:[A-Z]*
Slide7
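The pipeline solution for the first homework task can be traced in plain Python. The sketch assumes that "orders them backwards" means mapping each letter to its mirror image in the alphabet (A to Z, B to Y, and so on), which is what the second transducer of the composition suggests, and that the input is read in analyze mode, i.e. from the surface side to the analysis side. The function name `analyze` is chosen for this example only.

```python
import string

UPPER = string.ascii_uppercase  # "ABC...XYZ"
LOWER = string.ascii_lowercase  # "abc...xyz"

# Second transducer, read from surface to analysis: A -> Z, B -> Y, ...
MIRROR = dict(zip(UPPER, reversed(UPPER)))
# First transducer, read from surface to analysis: A -> a, B -> b, ...
TO_LOWER = dict(zip(UPPER, LOWER))


def analyze(surface):
    """Return the analysis string, or None if a symbol is not an uppercase letter."""
    if any(ch not in MIRROR for ch in surface):
        return None
    return "".join(TO_LOWER[MIRROR[ch]] for ch in surface)


if __name__ == "__main__":
    print(analyze("HELLO"))
    print(analyze("Hello"))
```

Lowercase input is rejected here because the surface side of the composed transducer contains only uppercase letters.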
Solution
{<family><huber>}:{} ({<1><daughter>}:{mia} | {<1><son>}:{toni} | {<2><daughter>}:{pia}) |\
{<family><band>}:{} ({<1><son>}:{michael} | {<2><son>}:{paul} | {<1><daughter>}:{pia})
Slide8
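The family transducer from the preceding solution can be mirrored by a small lookup table. This is a sketch for illustration only: the tags are copied from the solution as given (the ordinal tags there appear to count within each gender), and the names `FAMILIES` and `analyze` are made up for the example.

```python
# (first name, tag sequence) per family, copied from the SFST solution.
FAMILIES = {
    "huber": [("mia", "<1><daughter>"), ("toni", "<1><son>"), ("pia", "<2><daughter>")],
    "band": [("michael", "<1><son>"), ("paul", "<2><son>"), ("pia", "<1><daughter>")],
}


def analyze(name):
    """Return every analysis for a first name; ambiguous names yield several."""
    results = []
    for family, children in FAMILIES.items():
        for child, tags in children:
            if child == name:
                results.append(f"<family><{family}>{tags}")
    return results


if __name__ == "__main__":
    for query in ("toni", "pia", "anna"):
        print(query, analyze(query))
```

The name `pia` occurs in both families, so it receives two analyses, just as the disjunction in the transducer would produce two paths for the same surface string.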
Lexicon Files
$Vroot$ = "verb.lex"
$Nroot$ = "noun.lex"
The command "filename" reads the respective file line by line and forms the disjunction of all lines.
Only the symbols : \ < > % are treated as operators.
Slide9
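Reading a lexicon file as a disjunction of its lines can be pictured as collecting the lines into a set of alternatives. The sketch below is plain Python, not SFST; the file contents are hypothetical (the slides do not show `verb.lex`) and are supplied through an in-memory handle so the example is self-contained.

```python
import io


def read_lexicon(handle):
    """Each non-empty line becomes one alternative of the disjunction."""
    return {line.rstrip("\n") for line in handle if line.strip()}


# Hypothetical contents standing in for verb.lex.
verb_lex = io.StringIO("walk\ntalk\nbark\n")
VROOT = read_lexicon(verb_lex)

if __name__ == "__main__":
    print(sorted(VROOT))
```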
Lexicon Files
$Vroot$ = "verb.lex"
$Vinfl$ = <V>:<> (\
    [<inf><n3s>]:<> |\
    {<3s>}:{s} |\
    {<ger>}:{ing} |\
    {<past>}:{ed})
$Nroot$ = "noun.lex"
$Ninfl$ = <N>:<> (\
    {<sg>}:{} |\
    {<pl>}:{s})
$Vroot$ $Vinfl$ | $Nroot$ $Ninfl$
Slide10
Symbol Set Variables
#cons# = bcdfghjklmnpqrstvwxzß
#CONS# = BCDFGHJKLMNPQRSTVWXZß
#Cons# = #cons# #CONS#
#vowel# = aeiouäöü
#VOWEL# = AEIOUÄÖÜ
#Vowel# = #vowel# #VOWEL#
#letter# = #vowel# #cons#
#LETTER# = #VOWEL# #CONS#
[#LETTER# #letter#]:[#letter# #LETTER#]*
What would you get for Hallo and Ruß?
Slide11
Solution
What would you get for Hallo and Ruß?
hALLO
rUß
Slide12
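The answers hALLO and rUß can be checked by pairing the two symbol lists position by position, which is how a character-class pair such as `[abc]:[xyz]` is read. The following is a plain-Python sketch, not SFST; the function name `swap_case` is made up for this example.

```python
VOWEL_LOWER = "aeiouäöü"
VOWEL_UPPER = "AEIOUÄÖÜ"
CONS_LOWER = "bcdfghjklmnpqrstvwxzß"
CONS_UPPER = "BCDFGHJKLMNPQRSTVWXZß"

letter = VOWEL_LOWER + CONS_LOWER   # #letter# = #vowel# #cons#
LETTER = VOWEL_UPPER + CONS_UPPER   # #LETTER# = #VOWEL# #CONS#

# [#LETTER# #letter#]:[#letter# #LETTER#] pairs the symbols position by position.
SWAP = dict(zip(LETTER + letter, letter + LETTER))


def swap_case(word):
    """Apply the pairing to every symbol; None if a symbol is not covered."""
    if any(ch not in SWAP for ch in word):
        return None
    return "".join(SWAP[ch] for ch in word)


if __name__ == "__main__":
    print(swap_case("Hallo"))
    print(swap_case("Ruß"))
```

The ß is listed in both the lowercase and the uppercase consonant set, so it is paired with itself and stays unchanged. The letter y is in neither set, so a word containing it is not accepted.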
Alphabet
The alphabet defines the set of available symbol pairs, which is relevant for the wildcard symbol ".", the negation operator "!" and the replacement operators (introduced later).
ALPHABET = [A-Z] [A-Z]:[a-z]
The expression on the right-hand side is compiled into a transducer and the set of character pairs is extracted from its transitions.
A:.       is here identical to A:[Aa]
.         is identical to .:.
[^A-Z]    all characters appearing in the alphabet which are not uppercase letters, i.e. the set of lowercase letters
.*        maps mixed letter sequences to all-uppercase letter sequences (analysis)
Slide13
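The role of the alphabet described on the preceding slide can be made concrete by building the pair set for `ALPHABET = [A-Z] [A-Z]:[a-z]` by hand. This is a plain-Python sketch under the assumption that pairs are written analysis:surface; the helper names are invented for the example.

```python
import string

UPPER = string.ascii_uppercase

# Identity pairs A:A ... Z:Z plus case pairs A:a ... Z:z, as (analysis, surface).
ALPHABET = {(c, c) for c in UPPER} | {(c, c.lower()) for c in UPPER}


def surface_wildcard(analysis_symbol):
    """Expand `X:.`: all surface symbols the alphabet pairs with X."""
    return sorted(s for a, s in ALPHABET if a == analysis_symbol)


def negated_class(excluded):
    """Expand `[^...]`: all symbols occurring in the alphabet minus the excluded ones."""
    symbols = {a for a, _ in ALPHABET} | {s for _, s in ALPHABET}
    return symbols - set(excluded)


def analyze(surface):
    """Apply `.*` from surface to analysis; returns the set of analysis strings."""
    results = {""}
    for ch in surface:
        candidates = {a for a, s in ALPHABET if s == ch}
        results = {prefix + a for prefix in results for a in candidates}
    return results


if __name__ == "__main__":
    print(surface_wildcard("A"))
    print(analyze("HeLLo"))
```

A surface symbol that is not part of any pair, such as a digit, leaves the result set empty, which corresponds to "no result" in the interactive tool.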
Alphabet
fullform.lex:
house<N><sg>
house<>:s<N><pl>
walk<V><inf>
walk<>:i<>:n<>:g<V><ger>
emorph.fst:
ALPHABET = [a-zA-Z] [<V><N><sg><pl><ger><inf>]:<>
"fullform.lex" || .*
reads the lexicon and deletes the grammatical markers on the surface side.
Slide14
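How a full-form lexicon entry such as `house<>:s<N><pl>` splits into an analysis string and a surface string can be shown with a small parser. This is a plain-Python sketch, not the SFST reader: it handles only the notation used in the four entries of the preceding slide (single characters, `<tag>` symbols and `x:y` pairs), and the function names are invented for the example.

```python
import re

# One token: a symbol (a <tag> or a single character),
# optionally followed by ":" and a second symbol.
TOKEN = re.compile(r"(<[^>]*>|.)(?::(<[^>]*>|.))?")


def parse_entry(line):
    """Split a lexicon line into parallel analysis and surface symbol lists."""
    analysis, surface = [], []
    for match in TOKEN.finditer(line):
        upper, lower = match.group(1), match.group(2)
        analysis.append(upper)
        surface.append(upper if lower is None else lower)
    return analysis, surface


def entry_to_pair(line):
    """Return (analysis, surface) after deleting the markers on the surface side."""
    analysis, surface = parse_entry(line)
    analysis_string = "".join(s for s in analysis if s != "<>")
    # Mimics composing with `.*` under an alphabet that maps every tag to <>.
    surface_string = "".join(s for s in surface if not s.startswith("<"))
    return analysis_string, surface_string


if __name__ == "__main__":
    for entry in ("house<N><sg>", "house<>:s<N><pl>", "walk<>:i<>:n<>:g<V><ger>"):
        print(entry_to_pair(entry))
```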
Orthographic Rules
Replace operator: t ^-> (l __ r)
applies the mapping implemented by the transducer t wherever it is preceded by the left context l and followed by the right context r. l and r are automata (i.e. transducers mapping strings to themselves).
e-elision: bake<V>ing → baking
$Morph$ = bake<V>{<ger>}:{ing}
ALPHABET = [A-Za-z] <V>
$e-elision$ = e:<> ^-> (__ <V> [ei])      % delete e before <V>e or <V>i
$Morph$ = $Morph$ || $e-elision$          % apply the rule
ALPHABET = [A-Za-z] <V>:<>                % delete the <V> marker
$Morph$ || .*
Slide15
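The effect of the e-elision rule followed by the deletion of the `<V>` marker can be approximated with a regular-expression substitution. This is a sketch in plain Python, not SFST: the lookahead stands in for the right context of the replace operator, and the function names are made up for the example.

```python
import re


def e_elision(intermediate):
    """Delete an e that is directly followed by <V>e or <V>i."""
    return re.sub(r"e(?=<V>[ei])", "", intermediate)


def delete_marker(intermediate):
    """Second step: remove the <V> marker from the surface string."""
    return intermediate.replace("<V>", "")


def to_surface(intermediate):
    """Apply the rule first, then delete the marker, as in the composition."""
    return delete_marker(e_elision(intermediate))


if __name__ == "__main__":
    print(to_surface("bake<V>ing"))
    print(to_surface("walk<V>ing"))
```

The order matters: the marker has to stay in place until the rule has applied, because the rule's right context refers to `<V>`.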
Orthographic Rules
What does this program do?
$Morph$ = bake<V>{<ger>}:{ing} | crash<V>{<3><sg>}:{s} | happy<ADJ>{<comp>}:{er} | fly<V>{<3><sg>}:{s}
ALPHABET = [A-Za-z] <V><N><ADJ>
$e-elision$ = e:<> ^-> (__ <V> [ei])
$e-epenthesis$ = (<V> <>:e) ^-> ([sh] __ s)
$y2i$ = y:i ^-> ([^ae] __ [<ADJ><V>] e)
$y2ie$ = y:{ie} ^-> ([^ae] __ [<V><N>] s)
$Morph$ = $Morph$ || $e-elision$ || $e-epenthesis$ || $y2i$ || $y2ie$
ALPHABET = [A-Za-z] [<V><N><ADJ>]:<>
$Morph$ || .*
Slide16
Solution
e-epenthesis:
crash<V>s → crashes
y to i:
happy<ADJ>er → happier
fly<V>s → flies
Only in analyze mode!
Slide17
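The four orthographic rules from the exercise (e-elision, e-epenthesis, y2i and y2ie) can be traced step by step with regular expressions applied in the same order as the composition. This is a plain-Python approximation, not SFST: lookbehind and lookahead stand in for the left and right contexts, `[^ae]` here matches any character other than a or e rather than the alphabet complement, and all names are invented for the example.

```python
import re

# (pattern, replacement) in the order of the composition.
RULES = [
    (r"e(?=<V>[ei])", ""),                   # e-elision
    (r"(?<=[sh])<V>(?=s)", "<V>e"),          # e-epenthesis: insert e after <V>
    (r"(?<=[^ae])y(?=(?:<ADJ>|<V>)e)", "i"),  # y2i
    (r"(?<=[^ae])y(?=(?:<V>|<N>)s)", "ie"),   # y2ie
]
MARKERS = re.compile(r"<(?:V|N|ADJ)>")


def to_surface(intermediate):
    """Apply each rule in turn, then delete the category markers."""
    for pattern, replacement in RULES:
        intermediate = re.sub(pattern, replacement, intermediate)
    return MARKERS.sub("", intermediate)


if __name__ == "__main__":
    for form in ("bake<V>ing", "crash<V>s", "happy<ADJ>er", "fly<V>s"):
        print(form, "->", to_surface(form))
```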
Agreement Variables
$Morph$ = big<ADJ>{<comp>}:{er} | fat<ADJ>{<comp>}:{er}
$cons$ = [bcdfghjklmnpqrstvwxz]
$vowel$ = [aeiouy]
#=g# = bdglmnpt
$g$ = [#=g#] <>:[#=g#]
ALPHABET = [A-Za-z] <V><N><ADJ>
$gemination$ = $g$ ^-> ($cons$ $vowel$ __ <ADJ> e)
$Morph$ = $Morph$ || $gemination$
ALPHABET = [A-Za-z] [<V><N><ADJ>]:<>
$Morph$ || .*
Slide18
Solution
analyze> bigger
big<ADJ><comp>
analyze> fater
no result for fater
analyze> fatter
fat<ADJ><comp>
Slide19
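The gemination rule with its agreement variable behaves much like a regular-expression backreference: the inserted consonant has to be the same one that was just matched. The sketch below is plain Python, not SFST; it reproduces the three `analyze>` queries from the preceding solution, and the names `generate`, `analyze` and `LEXICON` are invented for the example.

```python
import re

# consonant + vowel to the left, <ADJ>e to the right; the captured consonant
# plays the role of the agreement variable #=g#.
GEMINATION = re.compile(
    r"(?<=[bcdfghjklmnpqrstvwxz][aeiouy])([bdglmnpt])(?=<ADJ>e)"
)

LEXICON = ["big", "fat"]


def generate(stem):
    """Build the comparative surface form of an adjective stem."""
    intermediate = stem + "<ADJ>er"
    intermediate = GEMINATION.sub(r"\1\1", intermediate)  # double the consonant
    return intermediate.replace("<ADJ>", "")


# Invert the generated forms to imitate analyze mode.
SURFACE_TO_ANALYSIS = {generate(stem): stem + "<ADJ><comp>" for stem in LEXICON}


def analyze(surface):
    """Return the analysis string, or None when there is no result."""
    return SURFACE_TO_ANALYSIS.get(surface)


if __name__ == "__main__":
    for query in ("bigger", "fater", "fatter"):
        print(query, "->", analyze(query))
```

Because the replacement is obligatory in this sketch, the ungeminated form `fater` is never generated and therefore has no analysis.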
Thank you for your attention