Thai Broadcast News Corpus Construction and Evaluation

Uploaded On 2015-11-30


Markpong Jongtaveesataporn, Chai Wutiwiwatchai, Koji Iwano, Sadaoki Furui (Tokyo Institute of Technology, Japan; NECTEC, Thailand)

Presentation Transcript

Slide1

Thai Broadcast News Corpus Construction and Evaluation

Markpong Jongtaveesataporn, Chai Wutiwiwatchai, Koji Iwano, Sadaoki Furui

Tokyo Institute of Technology, Japan

NECTEC, Thailand

Slide2

Background on Thai speech recognition research

[Timeline figure; axes: year and difficulty; year ticks: 1987, 1995, 1999, 2003, 2005, 2007]

Isolated syllable recognition

Isolated word recognition

Connected sub-word recognition

Small task continuous speech recognition

LVCSR

Broadcast news transcription system

Figure annotation: Newspaper read-speech recognition (Thienlikit et al., 2004)

Slide3

Development of Thai Broadcast News Transcription System

Research on broadcast news transcription systems for Thai lags behind other languages
  English: 1995 (Stern, 1997)
  Japanese: 1997 (Matsuoka et al., 1997)
  Mandarin: 1998 (Guo et al., 1998)
  Italian: 2000 (Federico et al., 2000)

We need to speed up our research activities to catch up with others

Targets

Development of Thai broadcast news corpus
  Speech corpus: training and testing data
  Text corpus: language modeling

Development of a prototype system

Slide4

Speech corpus

Structure information of broadcast news was annotated
  Section, Speaker’s turn, Segments

Property tags were annotated for each speaker’s turn
  Speaker’s name, if known
  Speaker’s gender: male / female
  Speaking mode: planned / spontaneous
  Background noise: clean / music / noise

Only speech from announcers speaking in the studio was transcribed

Transcription and annotation were created by one transcriber and checked by another transcriber

Slide5

Structure of broadcast news

Episode: one broadcast news session
  Section 1: one news topic
  Section 2
  Section 3

Slide6

Structure of broadcast news

Episode: one broadcast news session
  Section 1: one news topic
    Speaker’s turn: speaker A
    Speaker’s turn: speaker B
    Speaker’s turn: speaker A

Slide7

Structure of broadcast news

Episode: one broadcast news session
  Section 1: one news topic
    Speaker’s turn: speaker A
      Segment: one sentence or clause
      Segment: one sentence or clause
      Segment: one sentence or clause

Slide8

Speech corpus

Structure information of broadcast news was annotated
  Section, Speaker’s turn, Segments

Property tags were annotated for each speaker’s turn
  Speaker’s name, if known
  Speaker’s gender: male / female
  Speaking mode: planned / spontaneous
  Background noise: clean / music / noise

Only speech from announcers speaking in the studio was transcribed

Transcription and annotation were created by one transcriber and checked by another transcriber

Slide9

Example of structure information

Episode: one broadcast news session
  Section 1: Sports
    Speaker’s turn: Mr. A, male, planned speech, clean speech
      Segment: sentence A
      Segment: sentence B
      Segment: sentence C

Slide10

Speech corpus

Structure information of broadcast news was annotated
  Section, Speaker’s turn, Segments

Property tags were annotated for each speaker’s turn
  Speaker’s name, if known
  Speaker’s gender: male / female
  Speaking mode: planned / spontaneous
  Background noise: clean / music / noise

Only speech from announcers speaking in the studio was transcribed

Transcription and annotation were created by one transcriber and checked by another transcriber
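The annotation hierarchy described on these slides (episode, section, speaker's turn, segment, plus property tags on each turn) can be pictured as nested records. The following is a minimal Python sketch under that reading, not the corpus's actual file format: every class, field, and function name is hypothetical, and only the tag values (male / female, planned / spontaneous, clean / music / noise) come from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    text: str  # one sentence or clause


@dataclass
class SpeakerTurn:
    speaker_name: Optional[str]  # None when the speaker is unknown
    gender: str                  # "male" or "female"
    speaking_mode: str           # "planned" or "spontaneous"
    background: str              # "clean", "music", or "noise"
    segments: List[Segment] = field(default_factory=list)


@dataclass
class Section:
    topic: str  # one news topic
    turns: List[SpeakerTurn] = field(default_factory=list)


@dataclass
class Episode:
    # one broadcast news session
    sections: List[Section] = field(default_factory=list)


def count_segments(episode: Episode) -> int:
    """Count all segments across every section and speaker's turn."""
    return sum(len(turn.segments)
               for section in episode.sections
               for turn in section.turns)


# Mirrors the example slide: a Sports section with one turn by "Mr. A".
episode = Episode(sections=[
    Section(topic="Sports", turns=[
        SpeakerTurn(
            speaker_name="Mr. A",
            gender="male",
            speaking_mode="planned",
            background="clean",
            segments=[Segment("sentence A"),
                      Segment("sentence B"),
                      Segment("sentence C")],
        )
    ])
])
```

Keeping the property tags on the turn rather than on each segment follows the slides, which state that tags were annotated per speaker's turn.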

Slide11

Text corpus

No structure information was annotated

Additional information

Speaking mode: planned / spontaneous

Slide12

Problems of Thai transcription text

No space between words

Definition of a word is very ambiguous

No good morphological analyzer

Difficulties in the transcription and checking process

Manually word-segmented transcription was made
  Instructions were created for transcribers

Automatically segmented transcription (future target)

Slide13

Broadcast news collection

News programs from one public TV station in Thailand were recorded

Total of 105 news episodes
  Speech corpus: 35 news episodes, 17 hours
  Text corpus: 70 news episodes

Slide14

Analysis of speech corpus

Slide15

Information of speech & text corpora

| Attribute | Speech corpus | Text corpus |
| --- | --- | --- |
| No. of sentences | 13k | 32k |
| No. of words | 224k | 573k |
| No. of unique words | 10k | 14k |
| No. of phonemes | 899k | - |
| No. of speakers | 8 female, 4 male | - |

Slide16

Data used in experiments

Test set data
  Randomly selected from the speech corpus
  3,000 utterances

Acoustic model training data for the baseline system
  Phonetically balanced sentence speech corpora
    LOTUS (Kasuriya et al., 2003) and the corpus developed internally
  Read speech corpora
  40.3 hours (68 male and 68 female)

Acoustic model adaptation data
  Selected from the speech corpus
  No overlap between adaptation data and test set data

Language model training data
  Text corpus + transcripts from the speech corpus, excluding the test set
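The two constraints stated here (a randomly selected test set, and adaptation data that never overlaps it) can be sketched as a single shuffle followed by disjoint slices. This is an illustrative Python sketch only: the function name, the seed parameter, and the utterance IDs are hypothetical, and the slides do not say how the selection was actually implemented.

```python
import random
from typing import List, Sequence, Tuple


def split_test_and_adaptation(
    utterance_ids: Sequence[str],
    n_test: int,
    n_adapt: int,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Randomly pick a test set and a disjoint adaptation set.

    Shuffling once and slicing guarantees the two sets cannot overlap.
    """
    if n_test + n_adapt > len(utterance_ids):
        raise ValueError("not enough utterances for the requested split")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = list(utterance_ids)
    rng.shuffle(shuffled)
    test_ids = shuffled[:n_test]
    adapt_ids = shuffled[n_test:n_test + n_adapt]
    return test_ids, adapt_ids


# Placeholder IDs for illustration; the sizes 3000 and 200 are the ones
# quoted on the slides for the test set and one adaptation set.
ids = [f"utt{i:05d}" for i in range(5000)]
test_ids, adapt_ids = split_test_and_adaptation(ids, n_test=3000, n_adapt=200)
```

The same exclusion applies to the language model data: transcripts of the test utterances would be filtered out before training.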

Slide17

Experimental condition

Acoustic model
  Gender-dependent acoustic model
  12 MFCCs, delta, and delta energy
  Triphones, 1000 tied-states, 8 Gaussian mixtures

Language model
  Tri-grams
  Dictionary size: about 18k words

TITech WFST speech recognition system (Dixon et al., 2007) was used as a speech decoder

Slide18

Acoustic model adaptation

Supervised adaptation using MLLR

F-condition adaptation
  F0: clean, planned
  F1: clean, spontaneous
  F3: music noise
  F4: other noise
  Adaptation data: 200 utterances, regardless of speaker, randomly selected from the speech corpus

Speaker adaptation
  Adaptation data: 200 utterances, regardless of F-condition, randomly selected from the speech corpus
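The F-conditions listed on this slide can be derived from the two property tags annotated on each speaker's turn. Below is a small Python sketch of that mapping. The function name and the string values are hypothetical, and treating F3 and F4 as independent of speaking mode is an assumption read off the slide's wording, not something the slides state outright.

```python
def f_condition(speaking_mode: str, background: str) -> str:
    """Map a turn's property tags to an F-condition label.

    speaking_mode: "planned" or "spontaneous"
    background:    "clean", "music", or "noise"
    """
    if background == "clean":
        if speaking_mode == "planned":
            return "F0"  # clean, planned
        if speaking_mode == "spontaneous":
            return "F1"  # clean, spontaneous
        raise ValueError(f"unknown speaking mode: {speaking_mode!r}")
    if background == "music":
        return "F3"  # music noise (speaking mode ignored: assumption)
    if background == "noise":
        return "F4"  # other noise (speaking mode ignored: assumption)
    raise ValueError(f"unknown background: {background!r}")
```

Grouping utterances by this label gives the pools from which per-condition adaptation data could be drawn.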

Slide19

WER results

Speaker adaptation yielded better WER

| F-condition | Time proportion | No. of words |
| --- | --- | --- |
| F0 | 35.3% | 17160 |
| F1 | 1.0% | 629 |
| F3 | 14.0% | 7882 |
| F4 | 49.7% | 27542 |

Slide20

Discussion

High WER

Mismatched recording conditions

The speech corpus was only used as testing and adaptation data

Small text corpus

Inefficient language model

Slide21

Conclusion

Construction of the first Thai broadcast news corpus and an overview of the corpus analysis were presented

The speech corpus was annotated with structure information, which is useful for further research

An LVCSR system was set up and tested with the corpus

Slide22

Future work

Applying our Thai language modeling technique (Jongtaveesataporn et al., 2007)
  Compound pseudo-morpheme (CPM) unit
  Pseudo-morpheme error rate (F0 condition)
    Manually-segmented word unit system: 20.5%
    CPM unit system: 19.9%

Improving language model by using newspaper text

Collaboration with NECTEC: additional 50 hours of speech corpus

Slide23

Thank you

Slide26

Background

[Timeline figure; axes: year and difficulty; year ticks: 1987, 1995, 1999, 2003, 2005, 2007]

Isolated syllable recognition

Isolated word recognition

Connected sub-word recognition

Small task continuous speech recognition

LVCSR

Broadcast news LVCSR

Figure annotation: Newspaper read-speech recognition (Thienlikit, 2004)

Slide27

Development of Thai Broadcast News LVCSR System

Development of an LVCSR system requires speech and text corpora

Existing speech corpora for Thai LVCSR research
  NECTEC-ATR
  LOTUS (NECTEC)
  GlobalPhone (CMU)
  Annotation: Newspaper read-speech

Development of Thai broadcast news corpus
  Speech corpus: training and testing data
  Text corpus: language modeling

Development of a prototype of LVCSR system

Slide28

Experiments & Developed corpora

Speech corpus
  The size of the speech corpus is still rather small
  It was used in three ways
    Test data
    Adaptation data
    A part of transcription text was used for training LM

Text corpus
  It was used for training LM

Slide29

Perplexity & OOV rates

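| F-condition | Perplexity (male) | Perplexity (female) | OOV rate (male) | OOV rate (female) |
| --- | --- | --- | --- | --- |
| F0 | 107.5 | 106.9 | 0.9 | 0.8 |
| F1 | 126.4 | 100.1 | 0.9 | 0.6 |
| F3 | 145.2 | 100.0 | 0.7 | 0.9 |
| F4 | 141.6 | 157.6 | 1.5 | 1.9 |
| Overall | 126.9 | 125.6 | 1.2 | 1.3 |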
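For readers unfamiliar with the two measures in this table, the following Python sketch shows their standard definitions: the out-of-vocabulary (OOV) rate is the share of test tokens missing from the recognizer's dictionary, and perplexity is the exponential of the average negative log-probability the language model assigns per token. The function names are hypothetical, reporting the OOV rate as a percentage is an assumption about the table's units, and this is not the tooling used to produce the numbers above.

```python
import math
from typing import Iterable, Sequence, Set


def oov_rate(tokens: Sequence[str], vocabulary: Set[str]) -> float:
    """Percentage of tokens that do not appear in the vocabulary."""
    if not tokens:
        raise ValueError("token sequence is empty")
    n_oov = sum(1 for token in tokens if token not in vocabulary)
    return 100.0 * n_oov / len(tokens)


def perplexity(log_probs: Iterable[float]) -> float:
    """Perplexity from per-token natural-log probabilities."""
    values = list(log_probs)
    if not values:
        raise ValueError("no log-probabilities given")
    return math.exp(-sum(values) / len(values))


# Toy example with placeholder tokens: one of four tokens is unknown.
example_rate = oov_rate(["a", "b", "c", "d"], {"a", "b", "c"})
# A model that assigns probability 1/4 to every token has perplexity 4.
example_ppl = perplexity([math.log(0.25)] * 4)
```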

Slide30

Transcription process

[Flow diagram of the transcription process]

Guideline

Speech corpus transcribing (4 persons) -> Speech corpus checking (2 persons) -> Lexical entries checking (1 person) -> Speech corpus

Text corpus transcribing (7 persons) -> Lexical entries checking (1 person) -> Text corpus

Slide31

Speech corpus

Transcription and annotation of about 17 hours of TV broadcast news

Tool: “Transcriber” (Barras et al., 2001)

Additional information
  Speaker information: name, gender
  Speaking mode: planned/spontaneous speech

Speech from announcers speaking in the studio

Slide32

Transcription conventions

Guideline for the transcription process
  Segment segmentation
  Word segmentation
  Repeating word
  Thai/English abbreviation
  Number entity
  Special tags

Slide33

Introduction

Thai speech processing research in TokyoTech
  Dialogue system [Whittiwiwattchai, 2003]
  LVCSR system
    Dictation system [Tianlikid, 2005]
    Broadcast news recognition system

Slide34

Overview

Introduction

Corpus description

Recording and transcription processes

Corpus evaluation

Conclusion

Slide35

Thai language corpora

Large language corpora are crucial to a state-of-the-art natural language processing system

Thai speech resources for speech processing
  NECTEC-ATR
  LOTUS (NECTEC)
  GlobalPhone (CMU)
  TSynC-1 (NECTEC)

Annotations: Newspaper read-speech; Unit-selection speech synthesis

Slide36

WER Result

| F-condition | Time proportion | WER (%), male | WER (%), female |
| --- | --- | --- | --- |
| F0 | 28.1% | 44.4 | 40.8 |
| F1 | 1.5% | 62.4 | 60.2 |
| F3 | 11.5% | 82.2 | 72.4 |
| F4 | 58.9% | 54.9 | 57.5 |
| Overall | 100% | 56.8 | 45.5 |
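As a reminder of what the WER column measures, here is a minimal Python sketch of the standard word error rate: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name and the toy tokens are hypothetical, and this is not the scoring tool used for the table. For Thai, the result depends on how the text is segmented into words, which is why the transcription conventions matter.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """WER in percent: edit distance over words / number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(reference)


# Toy example: one substitution (b -> x) and one deletion (d) over 4 words.
example_wer = word_error_rate("a b c d".split(), "a x c".split())
```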

Slide37

Text corpus

Text transcribed from 35 hours of TV broadcast news

Additional information
  Speaking mode: planned/spontaneous

Slide38

Transcription conventions (1)

Sentence segmentation
  No sentence marker in Thai language
  Ambiguous

Grammatically, there are 3 types of sentence
  Simple sentence
  Compound sentence
  Complex sentence
  Annotation: composed of several clauses or simple sentences

A sentence was defined as a simple sentence or clause, delimited with the help of breaths

Slide39

Transcription conventions (2)

Word segmentation
  No word boundary marker in Thai language
  Leads to difficulties in transcription and data checking processes
  Too ambiguous to define all rules
  A few rules of simple segmentation patterns were defined
  Undefined patterns were left to the decision of transcribers

Slide40

Transcription conventions (3)

Repeating word

Thai/English abbreviation

Number entity

Special tags
  Disfluencies, filled-pauses, exclamations
  Foreign words
  Some other events: uncertainly transcribed part, etc.

Slide41

Recorded programs

News programs from one public TV station in Thailand were recorded

Total of 105 news episodes
  Speech corpus: 35 news episodes, about 17 hours of speech data
  Text corpus: 70 news episodes