
ICASSP2013 - PowerPoint Presentation

Uploaded On 2016-06-01



Presentation Transcript

Slide1

ICASSP 2013

SLP-L1: Human Spoken Language Acquisition and Learning

Hsiao-Tsung Hung

Slide2

Outline

SLP-L1.1: FEEDBACK UTTERANCES FOR COMPUTER-AIDED LANGUAGE LEARNING USING ACCENT REDUCTION AND VOICE CONVERSION METHOD
Sixuan Zhao, Soo Ngee Koh, Ing Yann Soon, Kang Kwong Luke, Nanyang Technological University, Singapore

SLP-L1.2: A DIALOGUE GAME FRAMEWORK WITH PERSONALIZED TRAINING USING REINFORCEMENT LEARNING FOR COMPUTER-ASSISTED LANGUAGE LEARNING
Pei-hao Su, Yow-Bang Wang, Tien-han Yu, Lin-shan Lee, National Taiwan University, Taiwan

SLP-L1.3: AUDIOVISUAL SYNTHESIS OF EXAGGERATED SPEECH FOR CORRECTIVE FEEDBACK IN COMPUTER-ASSISTED PRONUNCIATION TRAINING
Junhong Zhao, IECAS, China; Hua Yuan, Tsinghua University, China; Wai-Kim Leung, Helen Meng, CUHK, Hong Kong SAR of China; Jia Liu, Tsinghua University, China; Shanhong Xia, IECAS, China

SLP-L1.4: A NOVEL DISCRIMINATIVE METHOD FOR PRONUNCIATION QUALITY ASSESSMENT
Junbo Zhang, Fuping Pan, Bin Dong, Yonghong Yan, Institute of Acoustics, Chinese Academy of Sciences, China

SLP-L1.5: MISPRONUNCIATION DETECTION VIA DYNAMIC TIME WARPING ON DEEP BELIEF NETWORK-BASED POSTERIORGRAMS
Ann Lee, Yaodong Zhang, James Glass, Massachusetts Institute of Technology, United States

SLP-L1.6: TOWARD UNSUPERVISED DISCOVERY OF PRONUNCIATION ERROR PATTERNS USING UNIVERSAL PHONEME POSTERIORGRAM FOR COMPUTER-ASSISTED LANGUAGE LEARNING
Yow-Bang Wang, Lin-Shan Lee, National Taiwan University, Taiwan

Slide3

TOWARD UNSUPERVISED DISCOVERY OF PRONUNCIATION ERROR PATTERNS USING UNIVERSAL PHONEME POSTERIORGRAM FOR COMPUTER-ASSISTED LANGUAGE LEARNING

Yow-Bang Wang, Lin-Shan Lee

National Taiwan University, Taiwan

Slide4

Introduction

Manual labeling is very time-consuming.

For error pattern (EP) detection, the expertise needed to define and label EPs can make annotation even more difficult and expensive.

Building an HMM-based ASR system for each language and acoustic condition can be costly.

Well-annotated corpora are lacking.

In this paper, we draw on experience from unsupervised speech pattern discovery and propose a preliminary framework for automatically discovering EPs from a corpus of learners’ recordings without relying on expert knowledge.

Slide5

Problem Definition

We assume the task is to discover the EPs for each phoneme, given a corpus of learners’ speech.

Each time, we are given a set of acoustic segments corresponding to a specific phoneme, and the goal is to divide this set into several clusters, each of which corresponds to an EP.
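The clustering task stated above can be sketched with plain K-means. This is only an illustration under stated assumptions: each acoustic segment is assumed to have been reduced to a fixed-length feature vector already (the slides do not say how), and the function name `kmeans` and the toy data are hypothetical.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm on fixed-length feature vectors; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centers

# Toy "segments" of one phoneme: two well-separated groups stand in for two EPs.
segments = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, centers = kmeans(segments, k=2)
print(labels)
```

Each resulting cluster would then be read as one candidate EP for that phoneme. K-means needs the number of clusters in advance, which is why it only fits the setting where the number of EPs is known.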

Slide6

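Proposed Framework for Unsupervised EP Discovery

SAMPA: ㄚ => a => 010…, ㄨ => u => 001…, ㄠ => au => 011…

MFCC39

ASTMIC (Mandarin), TIMIT (English)

Effect of different levels of granularity on clustering

K-means => the number of clusters K is known

GMM-MDL => the number of clusters is unknown

Expected to reduce speaker variation

Slide7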

GMM-MDL

MDL: minimum description length

Idea: treat model building as a data compression problem, aiming to represent more information with fewer bits.

Objective function:
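A minimal sketch of how an MDL criterion can pick the number of mixture components. It assumes the common two-part form MDL(K) = -log L + (P/2) log N, where L is the data likelihood, P the number of free parameters and N the number of samples; this form is an assumption and may differ from the objective used in the paper. The 1-D mixture, the quantile initialization, and all function names are illustrative simplifications.

```python
import math
import random

def log_likelihood(xs, weights, means, variances):
    """Total log-likelihood of 1-D samples under a Gaussian mixture."""
    total = 0.0
    for x in xs:
        p = sum(w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances))
        total += math.log(max(p, 1e-300))
    return total

def fit_gmm_1d(xs, k, iters=100):
    """Plain EM for a k-component 1-D Gaussian mixture."""
    n = len(xs)
    ordered = sorted(xs)
    grand_mean = sum(xs) / n
    grand_var = sum((x - grand_mean) ** 2 for x in xs) / n
    weights = [1.0 / k] * k
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]  # evenly spaced quantiles
    variances = [grand_var] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            dens = [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            norm = max(sum(dens), 1e-300)
            resp.append([d / norm for d in dens])
        # M-step: re-estimate parameters; the variance floor avoids collapse.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-3)
    return weights, means, variances

def mdl_score(loglik, n_params, n_samples):
    """Two-part description length: data cost plus model cost (assumed form)."""
    return -loglik + 0.5 * n_params * math.log(n_samples)

def select_num_components(xs, max_k=4):
    """Fit mixtures with 1..max_k components and keep the one with the lowest MDL."""
    scores = {}
    for k in range(1, max_k + 1):
        w, m, v = fit_gmm_1d(xs, k)
        # A 1-D mixture has (k - 1) weights + k means + k variances free parameters.
        scores[k] = mdl_score(log_likelihood(xs, w, m, v), 3 * k - 1, len(xs))
    return min(scores, key=scores.get), scores

rng = random.Random(1)
data = [rng.gauss(0, 1) for _ in range(100)] + [rng.gauss(8, 1) for _ in range(100)]
best_k, scores = select_num_components(data)
print(best_k, scores)
```

The penalty term grows with the number of components, so an extra component is kept only when it shortens the encoding of the data by more than it costs to describe.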

 

Slide8

Experimental Results

Clustering is performed separately for each phoneme.

Slide9

Corpus, EP definition and annotation

278 learners

30 sentences, 6 to 24 characters each

There are 39 canonical Mandarin phoneme units in total, and language teachers summarized 152 EPs based on their expert knowledge and pedagogical experience.

The definition of EPs covers not only phoneme-level substitution but also insertion and deletion, and it is not limited to any specific corpus, including the one mentioned above.

Slide10

Experimental Results

K-means with known number of EPs

Slide11

Experimental Results

GMM-MDL with automatically estimated number of EPs

Note that both UPP and log-UPP yielded, on average, 1 to 3 more automatically derived EPs than human-defined EPs. In contrast, MFCC resulted in fewer clusters.

Slide12

A DIALOGUE GAME FRAMEWORK WITH PERSONALIZED TRAINING USING REINFORCEMENT LEARNING FOR COMPUTER-ASSISTED LANGUAGE LEARNING

Pei-hao Su, Yow-Bang Wang, Tien-han Yu, Lin-shan Lee

National Taiwan University, Taiwan

Slide13

Introduction

We propose a dialogue game framework for language learning that combines pronunciation scoring with a statistical dialogue manager based on a tree-structured dialogue script designed by language teachers.

Sentences to be learned can be adaptively selected for each learner, based on the pronunciation units practiced and the scores obtained as the dialogue progresses.

Slide14

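Markov Decision Process

State:

Sentence index

Quantized percentage of poorly-pronounced units (predefined threshold)

Indices of the worst-pronounced units

Action: based on the current state, select the next sentence to practice.

Reward Function:

More Practice Needed: defined over the poorly-pronounced units. The lower a unit's score, the higher its importance; v is a tuning parameter. The term uses the number of times a unit appears in the selected sentence and the average number of times it appears in a dialogue.

Practice completeness: the units that can be practiced / all units

Overall objective function:

Slide15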

Learner Simulation From Real Data

Because it is practically infeasible to collect “enough” real dialogue episodes for policy training, studies have focused on generating simulated users to interact with the dialogue manager.

Real Learner Data

278 learners

36 different countries

30 sentences (6~24 characters)

Slide16

Simulated Learner Creation

[Figure: simulated learner creation. Diagram labels: all pronunciation units considered (Initial/Finals, Tone); missing value; US? JP? TH? JP?; GMM; unsupervised clustering; choose one mixture by mixture weight; reinforcement learning; policy (State → Action).]

Slide17

Training Phase: PSV Clustering

Small problem: some units do not appear in the utterance.

Treat them as missing (latent) data.

Latent: a certain variable is never observed.

Missing at random

Incomplete data (??? = unknown):

71.3  92.0  60.1  ???   70.5  80.2  66.5  ???   ……..
71.3  ???   14.1  50.6  43.5  80.2  26.5  20.0  ……..
20.3  91.0  ???   ???   20.5  80.2  46.5  ???   ……..
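A minimal sketch of how missing scores can be handled when assigning an incomplete score vector to a mixture component, and of drawing a simulated learner by choosing one mixture by mixture weight. Diagonal covariances are assumed here so that a missing entry can simply be left out of the likelihood; the slides do not state the covariance type. The function names and the two-component toy model are hypothetical.

```python
import math
import random

def log_component_density(scores, mean, var):
    """Log density of the observed entries (None = missing) under a diagonal Gaussian."""
    total = 0.0
    for x, m, v in zip(scores, mean, var):
        if x is None:  # missing unit: marginalized out, contributes nothing
            continue
        total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total

def responsibilities(scores, weights, means, variances):
    """Posterior probability of each mixture component given the observed entries."""
    logs = [math.log(w) + log_component_density(scores, m, v)
            for w, m, v in zip(weights, means, variances)]
    peak = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(value - peak) for value in logs]
    norm = sum(unnorm)
    return [u / norm for u in unnorm]

def sample_simulated_learner(weights, means, variances, rng):
    """Choose one mixture by mixture weight, then draw a full score vector from it."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return k, [rng.gauss(m, math.sqrt(v)) for m, v in zip(means[k], variances[k])]

# Hypothetical two-component model over three pronunciation units.
weights = [0.6, 0.4]
means = [[80.0, 85.0, 75.0], [30.0, 40.0, 35.0]]
variances = [[25.0, 25.0, 25.0], [25.0, 25.0, 25.0]]

print(responsibilities([78.0, None, 70.0], weights, means, variances))
print(sample_simulated_learner(weights, means, variances, random.Random(0)))
```

Because the sampled vector is complete, a simulated learner has a score for every unit even though no single real learner was observed on all of them.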

 Slide18

Training Phase: Reinforcement Learning

Q-learning is used to learn the expected return.

Optimal policy

Choose the action with the highest Q value with probability 1 − ε and the remaining actions with probability ε.

Example Q update (Q=10, Q=9, Q=18):

Q’ = (1 − α) · 18 + α [7 + γ · 10]
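The Q update and the action selection rule above can be sketched as follows. The standard tabular Q-learning update is used, written as a weighted average of the old value and the new target. The step size `alpha`, discount `gamma`, the uniform choice among the non-greedy actions, and the state names are assumptions made for illustration.

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    """Pick the highest-Q action with probability 1 - epsilon, otherwise one of the others."""
    best = max(range(len(q_row)), key=q_row.__getitem__)
    others = [a for a in range(len(q_row)) if a != best]
    if others and rng.random() < epsilon:
        return rng.choice(others)  # assumed: uniform over the remaining actions
    return best

def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Tabular Q-learning: blend the old value with reward plus discounted best next value."""
    target = reward + gamma * max(q[next_state])
    q[state][action] = (1 - alpha) * q[state][action] + alpha * target
    return q[state][action]

# Values from the slide's example: current Q = 18, reward = 7, next-state Q values 10 and 9.
q_table = {"current": [18.0], "next": [10.0, 9.0]}
new_q = q_update(q_table, "current", 0, reward=7.0, next_state="next", alpha=0.5, gamma=1.0)
print(new_q)

print(epsilon_greedy([10.0, 9.0, 18.0], epsilon=0.1, rng=random.Random(0)))
```

With `alpha=0.5` and `gamma=1.0` (both assumed), the example moves the stored value halfway from 18 toward the target 7 + 10 = 17.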

 Slide19

EXPERIMENT

We compared the proposed approach with the following policies:

Always select the sentence with the most diverse pronunciation units relative to the learner’s practiced units

Always select the sentence with the highest count of worst-pronounced units

Cast the above two heuristic policies as two actions in an MDP.
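The two heuristic baselines can be sketched as simple selection functions. "Most diverse" is read here as "contains the most distinct units the learner has not practiced yet", which is an interpretation of the slide rather than a confirmed definition. The data layout (a dict from sentence id to its list of pronunciation units) and all names are hypothetical.

```python
def most_diverse_sentence(sentences, practiced):
    """Sentence containing the most distinct units the learner has not practiced yet."""
    return max(sentences, key=lambda s: len(set(sentences[s]) - practiced))

def most_worst_units_sentence(sentences, worst_units):
    """Sentence with the highest count of the learner's worst-pronounced units."""
    return max(sentences, key=lambda s: sum(u in worst_units for u in sentences[s]))

# Toy script: sentence id -> pronunciation units it contains.
sentences = {
    "s1": ["b", "a", "m", "a"],
    "s2": ["zh", "ong", "w", "en"],
    "s3": ["sh", "i", "sh", "i"],
}

print(most_diverse_sentence(sentences, practiced={"b", "a", "m"}))
print(most_worst_units_sentence(sentences, worst_units={"sh", "zh"}))
```

Casting the two heuristics as MDP actions then amounts to letting the learned policy choose, in each state, which of these two functions picks the next sentence.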

Slide20

Slide21

Fig. 7. Average scores and coverage percentages of pronunciation units for an example testing simulated learner with random and proposed policies (v=0,1).
