Modeling infant word segmentation: Another example of discovery fueled by CHILDES

Presentation Transcript

Slide1

Modeling infant word segmentation: Another example of discovery fueled by CHILDES

Alejandrina Cristia

Laboratoire de Sciences Cognitives et Psycholinguistique

@ Language Emergence: Competition, Usage, and Analyses, 2019-06-06

Slide2

No overt & unambiguous word/morpheme boundaries in the input…

“no silences”

Kuhl 2004

Slide3

“no silences”

Kuhl 2004

… yet by the end of the first year, infants know some words/morphemes

‘Feet’ ‘mommy’ ‘baby’ ‘alldone’ ‘tobed’

Tincoff & Jusczyk 2012; Bergelson & Swingley 2012; Ngon et al. 2014

Slide4

How to study segmentability?

mommy talking

… cute … something shiny go by?

Let’s just get to the facts.

Slide5

Today’s menu

A methodology for studying word form segmentation using models

Segmentability differences for child-directed versus adult-directed register (in French)

… bilingual versus monolingual settings (English, Spanish, & Catalan)

Implications for infant studies

Slide6

Today’s menu

A methodology for studying word form segmentation using models

Segmentability differences for child-directed versus adult-directed register (in French)

… bilingual versus monolingual settings (English, Spanish, & Catalan)

Implications for infant studies

Slide7

Input representation

Acoustic

+ realistic…

… provided representations match babies’

- few appropriate corpora (natural discourse & good quality audio)

- only one (reproducible) algorithm

Symbolic (‘Phonological text’)

+ lots of corpora can be used

+ lots of algorithms proposed

+ algorithms represent a wide range of strategies

- assumes babies represent input abstractly, with zero errors

Slide8

Example

*MOT: look at the doggie

Phonologize: lUk At D2 dOgi

Remove word boundaries & unitize: l U k A t D 2 d O g i

Segment with some algorithm: lU kAt D2 dO gi

Evaluate:

Precision = 1 of the 5 words found were words in the input = .2

Recall = 1 of the 4 words in the input was recovered = .25

Token F-score = 2 * (Precision * Recall) / (Precision + Recall)

Note -- one can also unitize at the syllable level:

lUk At D2 dO gi (input)

lUk At D2 dO gi (output)
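The precision, recall, and token F-score arithmetic in this example can be checked with a short sketch. It is not the WordSeg evaluation code: the function names `token_spans` and `token_scores` are made up for illustration, and it assumes each character of the transcription is one phoneme and that a found word counts as correct only when both of its edges match a word in the input.

```python
def token_spans(words):
    """Return the set of (start, end) character offsets covered by each word."""
    spans, start = set(), 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def token_scores(gold_utterance, segmented_utterance):
    """Token precision, recall and F-score for one utterance.

    Both arguments are strings whose words are separated by spaces.
    A token is a hit only if its start and end both match a gold word.
    """
    gold = token_spans(gold_utterance.split())
    found = token_spans(segmented_utterance.split())
    hits = len(gold & found)
    precision = hits / len(found)
    recall = hits / len(gold)
    fscore = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, fscore


# Gold words vs. the hypothetical algorithm output from the slide.
precision, recall, fscore = token_scores("lUk At D2 dOgi", "lU kAt D2 dO gi")
print(precision, recall, round(fscore, 2))  # 0.2 0.25 0.22
```

Only "D2" has both edges in the right place, which gives 1/5 and 1/4 as on the slide, and a token F-score of about .22.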

Slide9

Example algorithms

1. Baseline: simplest strategies

Every sentence is a word (SentBase)

Every syllable is a word (SyllBase)

2. Sub-lexical: goal is to “cut” using local cues

Transitional Probabilities (TP): TP_abs, TP_rel (x Absolute/Relative threshold)

Diphone-Based Segmentation (DiBS)

3. Lexical: goal is to learn a set of “minimal recombinable units”

Adaptor Grammar (AG)

Phonotactics from Utterances Determine Distributional Lexical Elements (Puddle)

Johnson + 2007; Monaghan + 2010; Daland + 2009; Saksida + 2016; Lignos 2012

Package: wordseg.readthedocs.io

Preprint: https://osf.io/nx49h/

Bernard et al. 2019 Beh Res Meth
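To make the baseline and sub-lexical strategies concrete, here is a minimal sketch of SentBase, SyllBase, and a forward transitional-probability segmenter with an absolute threshold. It does not use the WordSeg package and may differ from its implementation. The toy corpus, the function names, and the choice of the mean TP as the absolute threshold are all assumptions made for illustration.

```python
from collections import Counter


def sent_base(utterance):
    """SentBase: the whole utterance is treated as a single word."""
    return ["".join(utterance)]


def syll_base(utterance):
    """SyllBase: every unit (here, a syllable) is treated as a word."""
    return list(utterance)


def train_tp(corpus):
    """Forward TP: P(b | a) = count(a followed by b) / count(a followed by anything)."""
    pair_counts, first_counts = Counter(), Counter()
    for utterance in corpus:
        for a, b in zip(utterance, utterance[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def segment_tp_abs(utterance, tp, threshold):
    """Cut between two units whenever their TP falls below the threshold."""
    words, current = [], [utterance[0]]
    for a, b in zip(utterance, utterance[1:]):
        if tp.get((a, b), 0.0) < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words


# Hypothetical toy corpus, already unitized into syllables.
corpus = [
    ["lUk", "At", "D2", "dO", "gi"],
    ["lUk", "D2", "kI", "ti"],
    ["At", "D2", "dO", "gi"],
    ["lUk", "At", "mi"],
    ["D2", "kI", "ti"],
    ["dO", "gi"],
]
tp = train_tp(corpus)
threshold = sum(tp.values()) / len(tp)  # assumption: mean TP over pair types
print(sent_base(corpus[0]))
print(syll_base(corpus[0]))
print(segment_tp_abs(corpus[0], tp, threshold))
```

On this toy corpus the TP segmenter keeps "dO gi" together because "gi" always follows "dO", and it under-segments the start of the utterance. A relative threshold would instead cut at local dips in TP.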

Slide10

The process in WordSeg

Package: wordseg.readthedocs.io

Preprint: https://osf.io/nx49h/

Bernard et al. 2019 Beh Res Meth

Slide11

Sample results: precision, recall, & F-score are correlated

Providence corpus (Demuth, Culbertson, & Alter, 2006) on CHILDES

Slide12

Sample results: Effects of algorithm and input representation

Naima, in Providence corpus (Demuth, Culbertson, & Alter, 2006) on CHILDES

Slide13

Today’s menu

A methodology for studying word form segmentation using models

Segmentability differences for child-directed versus adult-directed register (in French)

… bilingual versus monolingual settings (English, Spanish, & Catalan)

Implications for infant studies

Slide14

Why look at register?

In child-directed speech, probably…

More utterances consist of a single word (+ all models)

Utterances are overall shorter in length (+ all models)

*MOT: Attends!

*MOT: Ouaistuvastemettreausoleilpourtesecherlescheveux!

Slide15

Why look at register?

In child-directed speech, probably…

More utterances consist of a single word (+ all models)

Utterances are overall shorter in length (+ all models)

Utterances are more repetitious (+? lexical models)

*MOT: coucoucoucousitufaisaisdespetitssourirestoi.

*MOT: tumefaisdespetitssouriresXXXcoucoumongrand.

*MOT: coucoutumefaisdessouriresoupas.

Slide16

(Ask me about crosslinguistic extensions if curious!)

Japanese: RIKEN corpus

Collected in the lab → adult-directed speech is with experimenter

English: Winnipeg corpus

Collected with child-worn device worn whole day → adult-directed speech is among caregivers

French: LENA-Lyon corpus (Le Normand et al. HomeBank)

Collected with child-worn device worn whole day → adult-directed speech is among caregivers

Bogdan Ludusan

Georgia Loukatou

Slide17

French “wild” ADS

on Le Normand, Canault, & Van Thai’s LENA-Lyon corpus

Loukatou + 2019 Proc Cog Sci

Slide18

CDS-ADS: Conclusions

Overall trend for better performance for child- than adult-directed speech

But:

reversed for some algorithms

effect of register < 15% (in the best controlled cases, 2%)

Slide19

Today’s menu

A methodology for studying word form segmentation using models

Segmentability differences for child-directed versus adult-directed register (in French)

… bilingual versus monolingual settings (English, Spanish, & Catalan)

Implications for infant studies

Slide20

Why study word segmentation in a bilingual setting?

‘Feet’ ‘mommy’ ‘baby’ ‘alldone’ ‘tobed’

‘pié’ ‘mamá’ ‘bebé’ …

Bilinguals need to:

Learn words, like monolinguals do, but in two languages

Overall less input in each language

Hoff + 2012

Fibla & Cristia (submitted very soon, I hope)

Slide21

Questions & predictions

Are segmentation strategies equally successful when applied to bilingual and monolingual corpora?

→ Measure the performance of previously studied segmentation algorithms in a controlled monolingual versus bilingual corpus.

Possible outcomes:

The confusion hypothesis: variable and inconsistent input

→ Poorer performance for the bilingual than for the monolingual

The resistant hypothesis: (if switching only at utterance edges) local statistical and lexical cues are still reliable

→ Similar performance for the bilingual and the monolingual

Fibla & Cristia (submitted very soon, I hope)

Slide22

Creating bilingual corpora
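The transcript does not preserve how the bilingual corpora were built. As one hypothetical illustration of a bilingual corpus in which language switches happen only at utterance edges, the sketch below alternates blocks of utterances from two monolingual corpora. The function name `interleave_corpora`, the `block_size` parameter, and the toy data are assumptions, not the authors' procedure.

```python
from itertools import zip_longest


def interleave_corpora(corpus_a, corpus_b, block_size=1):
    """Alternate blocks of `block_size` utterances from two monolingual corpora.

    Utterances are never split, so every language switch falls at an
    utterance edge. Leftover utterances from the longer corpus are kept.
    """
    def blocks(corpus):
        return [corpus[i:i + block_size] for i in range(0, len(corpus), block_size)]

    mixed = []
    for block_a, block_b in zip_longest(blocks(corpus_a), blocks(corpus_b), fillvalue=[]):
        mixed.extend(block_a)
        mixed.extend(block_b)
    return mixed


# Hypothetical one-word utterances in two languages.
english = ["feet", "mommy", "baby"]
spanish = ["pié", "mamá", "bebé"]
bilingual = interleave_corpora(english, spanish)
print(bilingual)
```

A controlled comparison would also need the monolingual and bilingual corpora matched in size, which this sketch does not handle.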

Slide23

Slide24

Slide25

Three cases of bilingual < monolingual

Slide26

Three cases of bilingual < monolingual

11 cases of bilingual ‘in between’ monolingual

Slide27

Today’s menu

A methodology for studying word form segmentation using models

Segmentability differences for child-directed versus adult-directed register (in French)

… bilingual versus monolingual settings (English, Spanish, & Catalan)

Implications for infant studies

Slide28

Effects of algorithm and input representation

Size of algorithm x level effect = 40-60%?

Cristia + 2019 Open Mind

Slide29

Effect of register on LENA-Lyon corpus

Size of register effect < 10%?

Loukatou + 2019 Proc Cog Sci

Slide30

Effect of bilingualism

Size of bilingualism effect ~ 0%?

Fibla & Cristia (submitted very soon, I hope)

Slide31

Today’s menu

A methodology for studying word form segmentation using models

Segmentability differences as a function of language properties

… child-directed versus adult-directed register (in Japanese, English, & French)

… bilingual versus monolingual settings (English, Spanish, & Catalan)

Implications for infant studies

Slide32

What may babies be doing? Using CDI results & frequency effects

Larsen + 2017 Interspeech & in prep

Slide33

What may babies be doing? Using CDI results & frequency effects

Larsen + 2017 Interspeech & in prep

Coefficient of determination R² = .1

Slide34

Slide35

Slide36

Slide37

phoneme-based models

Slide38

syllable-based models

phoneme-based models

Slide39

Cut only at utterance edges

frequency of words in isolation
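If the learner cuts only at utterance edges, the only true words it recovers are those that occur as whole utterances, so what it can learn about a word depends on how often that word appears in isolation. A minimal sketch of that count follows. The function name `isolation_frequency` and the toy utterances are assumptions for illustration.

```python
from collections import Counter


def isolation_frequency(utterances):
    """Count how often each word occurs as a complete, one-word utterance."""
    counts = Counter()
    for utterance in utterances:
        words = utterance.split()
        if len(words) == 1:
            counts[words[0]] += 1
    return counts


# Hypothetical orthographic utterances with gold word boundaries.
utterances = ["look at the doggie", "doggie", "look", "doggie", "all done"]
freq = isolation_frequency(utterances)
print(freq.most_common())
```

Words that never occur alone, such as "at" here, get a count of zero under this strategy no matter how frequent they are overall.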

Slide40

To be continued…

Slide41

Thanks to...

Families who agree to be recorded & for their data to be shared

Researchers who record them and share on TalkBank

TalkBank ~ Brian MacWhinney

& you!

Slide42

Japanese “lab” ADS

on Reiko Mazuka’s RIKEN corpus

much of this is in Ludusan et al. 2017 ACL

(now working on journal paper with more material)

Slide43

English “wild” ADS

on Melanie Soderstrom’s Winnipeg corpus

Cristia + 2019 Open Mind