ICASSP 2013, SLP-L1: Human Spoken Language Acquisition and Learning
Hsiao-Tsung Hung
Outline
SLP-L1.1: FEEDBACK UTTERANCES FOR COMPUTER-AIDED LANGUAGE LEARNING USING ACCENT REDUCTION AND VOICE CONVERSION METHOD
Sixuan Zhao, Soo Ngee Koh, Ing Yann Soon, Kang Kwong Luke, Nanyang Technological University, Singapore
SLP-L1.2: A DIALOGUE GAME FRAMEWORK WITH PERSONALIZED TRAINING USING REINFORCEMENT LEARNING FOR COMPUTER-ASSISTED LANGUAGE LEARNING
Pei-hao Su, Yow-Bang Wang, Tien-han Yu, Lin-shan Lee, National Taiwan University, Taiwan
SLP-L1.3: AUDIOVISUAL SYNTHESIS OF EXAGGERATED SPEECH FOR CORRECTIVE FEEDBACK IN COMPUTER-ASSISTED PRONUNCIATION TRAINING
Junhong Zhao, IECAS, China; Hua Yuan, Tsinghua University, China; Wai-Kim Leung, Helen Meng, CUHK, Hong Kong SAR of China; Jia Liu, Tsinghua University, China; Shanhong Xia, IECAS, China
SLP-L1.4: A NOVEL DISCRIMINATIVE METHOD FOR PRONUNCIATION QUALITY ASSESSMENT
Junbo Zhang, Fuping Pan, Bin Dong, Yonghong Yan, Institute of Acoustics, Chinese Academy of Sciences, China
SLP-L1.5: MISPRONUNCIATION DETECTION VIA DYNAMIC TIME WARPING ON DEEP BELIEF NETWORK-BASED POSTERIORGRAMS
Ann Lee, Yaodong Zhang, James Glass, Massachusetts Institute of Technology, United States
SLP-L1.6: TOWARD UNSUPERVISED DISCOVERY OF PRONUNCIATION ERROR PATTERNS USING UNIVERSAL PHONEME POSTERIORGRAM FOR COMPUTER-ASSISTED LANGUAGE LEARNING
Yow-Bang Wang, Lin-Shan Lee, National Taiwan University, Taiwan
TOWARD UNSUPERVISED DISCOVERY OF PRONUNCIATION ERROR PATTERNS USING UNIVERSAL PHONEME POSTERIORGRAM FOR COMPUTER-ASSISTED LANGUAGE LEARNING
Yow-Bang Wang, Lin-Shan Lee
National Taiwan University, Taiwan
Introduction
Manual labeling is very time-consuming. For error pattern (EP) detection, the expertise needed to define and label EPs makes the process even more difficult and expensive.
Building an HMM-based ASR system for each language and acoustic condition can be costly.
Well-annotated corpora are lacking.
In this paper, we draw on experience from unsupervised speech pattern discovery and propose a preliminary framework for automatically discovering EPs from a corpus of learners' recordings, without relying on expert knowledge.
Problem Definition
We assume the task is to discover the EPs of each phoneme, given a corpus of learners' speech. Each time, we are given a set of acoustic segments corresponding to one specific phoneme. The goal is to divide this set into several clusters, each of which corresponds to an EP.
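To make the setup concrete, here is a minimal sketch of how segments could be grouped per phoneme before clustering. The source does not say how variable-length segments are turned into comparable objects. This sketch therefore assumes that each segment is summarized by averaging its frame-level feature vectors (for example, posteriorgram frames). The function names `segment_vector` and `group_by_phoneme` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]


def segment_vector(frames: Sequence[Frame]) -> List[float]:
    """Average frame-level features into one fixed-length vector.

    Assumption: mean pooling is used to summarize a variable-length segment.
    """
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def group_by_phoneme(
    segments: Sequence[Tuple[str, Sequence[Frame]]]
) -> Dict[str, List[List[float]]]:
    """Collect one vector per segment, keyed by its canonical phoneme.

    Each list in the result is the set that is later divided into EP clusters.
    """
    groups: Dict[str, List[List[float]]] = defaultdict(list)
    for phoneme, frames in segments:
        groups[phoneme].append(segment_vector(frames))
    return dict(groups)
```

For example, two segments labeled "a" and one labeled "u" yield a dictionary with two vectors under "a" and one under "u". Each entry is then clustered independently.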
Proposed Framework for Unsupervised EP Discovery
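Framework diagram:
Features: MFCC39; universal phoneme posteriorgram (UPP) over SAMPA-based units, e.g. ㄚ => a => 010…, ㄨ => u => 001…, ㄠ => au => 011…
Corpora: ASTMIC (Mandarin), TIMIT (English)
Effect of different levels of granularity on clustering
K-means => number of clusters K known; GMM-MDL => K unknown
Expected to reduce speaker variation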
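The SAMPA mapping examples in the diagram (ㄚ => a => 010…, ㄨ => u => 001…, ㄠ => au => 011…) suggest multi-hot targets over a universal unit set. In these targets, a diphthong switches on the bits of its components. The sketch below is minimal and rests on several assumptions:

- a three-slot inventory, where the first slot stands for every other unit;
- a decomposition table that splits au into a and u.

The inventory, the tables, and `multi_hot` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical universal inventory: only "a" and "u" come from the slide;
# "other" stands in for the leading 0 bit shown there.
UNIVERSAL_UNITS: List[str] = ["other", "a", "u"]

# Zhuyin -> SAMPA examples taken from the slide.
ZHUYIN_TO_SAMPA: Dict[str, str] = {"ㄚ": "a", "ㄨ": "u", "ㄠ": "au"}

# Assumption: each SAMPA symbol decomposes into one or more universal units.
SAMPA_TO_UNITS: Dict[str, List[str]] = {"a": ["a"], "u": ["u"], "au": ["a", "u"]}


def multi_hot(zhuyin: str) -> List[int]:
    """Return a multi-hot target over UNIVERSAL_UNITS for one Zhuyin symbol."""
    units = SAMPA_TO_UNITS[ZHUYIN_TO_SAMPA[zhuyin]]
    return [1 if u in units else 0 for u in UNIVERSAL_UNITS]
```

Under these assumptions, `multi_hot("ㄠ")` reproduces the 011 pattern from the slide.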
GMM-MDL
MDL: minimum description length
Idea: treat model building as a data compression problem, aiming to represent as much information as possible with as few bits as possible.
Objective function:
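The slide's objective function did not survive extraction. For illustration only, this sketch uses a common two-part MDL form for a GMM:

MDL(K) = −log L(X | θ_K) + (P_K / 2) log N

Here P_K is the number of free parameters of a K-component model and N is the number of samples. This form is an assumption and is not necessarily the paper's exact objective. The sketch counts parameters for diagonal-covariance mixtures and picks the K with the smallest score. The function names are hypothetical, and the log-likelihoods are assumed to come from GMMs already fitted with EM.

```python
import math
from typing import Dict


def gmm_num_params(k: int, dim: int) -> int:
    """Free parameters of a K-component diagonal-covariance GMM:
    (k - 1) mixture weights + k*dim means + k*dim variances."""
    return (k - 1) + 2 * k * dim


def mdl_score(log_likelihood: float, num_params: int, num_samples: int) -> float:
    """Two-part code length: data cost plus 0.5 * P * log N model cost."""
    return -log_likelihood + 0.5 * num_params * math.log(num_samples)


def select_k(log_likelihoods: Dict[int, float], dim: int, num_samples: int) -> int:
    """Return the K whose fitted GMM gives the smallest MDL score.

    `log_likelihoods` maps K to the total log-likelihood of a GMM with
    K components that has already been fitted to the data.
    """
    return min(
        log_likelihoods,
        key=lambda k: mdl_score(log_likelihoods[k], gmm_num_params(k, dim), num_samples),
    )
```

With hypothetical log-likelihoods {1: -500, 2: -420, 3: -410, 4: -405} for 100 two-dimensional samples, the penalty outweighs the small likelihood gains beyond K = 2. `select_k` therefore returns 2.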
Experimental Results
Clustering is performed separately for each phoneme.
Corpus, EP definition and annotation
278 learners; 30 sentences, each 6 to 24 characters long
There are 39 canonical Mandarin phoneme units in total. Language teachers summarized 152 EPs based on their expert knowledge and pedagogical experience.
The definition of EPs covers not only phoneme-level substitution but also insertion and deletion. It is not limited to any specific corpus, including the one mentioned above.
Experimental Results
K-means with a known number of EPs
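Clustering with a known number of EPs can be illustrated with plain Lloyd's K-means on the per-segment vectors of one phoneme. This is a minimal sketch, and the function names are hypothetical. For determinism it assumes the first k points serve as initial centroids; practical systems usually use random or k-means++ seeding.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    points: Sequence[Vector], k: int, max_iter: int = 100
) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's K-means with K fixed in advance (e.g. the number of defined EPs).

    Assumption: the first k points are used as initial centroids.
    """
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each segment vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                dim = len(members[0])
                centroids[c] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]
    return labels, centroids
```

Each resulting cluster label plays the role of one automatically derived EP for that phoneme.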
Experimental Results
GMM-MDL with an automatically estimated number of EPs
Note that both UPP and log-UPP yielded, on average, 1 to 3 more automatically derived EPs than the human-defined EPs. In contrast, MFCC resulted in fewer clusters.
A DIALOGUE GAME FRAMEWORK WITH PERSONALIZED TRAINING USING REINFORCEMENT LEARNING FOR COMPUTER-ASSISTED LANGUAGE LEARNING
Pei-hao Su, Yow-Bang Wang, Tien-han Yu, Lin-shan Lee
National Taiwan University, Taiwan
Introduction
We propose a dialogue game framework for language learning. It combines pronunciation scoring with a statistical dialogue manager, built on a tree-structured dialogue script designed by language teachers. The sentences to be practiced can be selected adaptively for each learner, based on the pronunciation units practiced and the scores obtained as the dialogue progresses.
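Markov Decision Process
State:
Sentence index
Quantized percentage of poorly pronounced units (units scoring below a predefined threshold)
Indices of the worst-pronounced units
Action: select the next sentence to practice based on the current state
Reward function:
More Practice Needed
Practice Completeness
Overall objective function:
Equation annotations: poorly pronounced units, where a lower score means higher importance and v is a tuning parameter; the number of times a unit appears in the selected dialogue and in an average dialogue; the units that can be practiced; all units.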
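The reward equations themselves are not recoverable from the slide. This sketch covers only two pieces:

- building the state tuple listed above;
- one plausible reading of "practice completeness" as the fraction of all units that the learner can practice, following the equation annotations.

The function names, the number of quantization bins, the threshold, and the example units are hypothetical.

```python
from typing import Dict, Set, Tuple


def build_state(
    sentence_index: int,
    unit_scores: Dict[str, float],
    threshold: float,
    num_bins: int = 4,
    num_worst: int = 2,
) -> Tuple[int, int, Tuple[str, ...]]:
    """State = (sentence index, quantized % of poor units, worst units).

    A unit counts as poorly pronounced when its score is below `threshold`.
    """
    poor = [u for u, s in unit_scores.items() if s < threshold]
    fraction = len(poor) / len(unit_scores) if unit_scores else 0.0
    # Quantize the fraction in [0, 1] into levels 0 .. num_bins - 1.
    level = min(int(fraction * num_bins), num_bins - 1)
    # Lowest scores first; ties broken by unit name for determinism.
    ranked = sorted(unit_scores, key=lambda u: (unit_scores[u], u))
    return sentence_index, level, tuple(ranked[:num_worst])


def practice_completeness(practiced: Set[str], all_units: Set[str]) -> float:
    """Assumed form: units that can be practiced divided by all units."""
    return len(practiced & all_units) / len(all_units)
```

For scores {"b": 40, "a": 90, "zh": 30, "ing": 75} and a threshold of 60, half of the units are poor. The state at sentence 3 is therefore (3, 2, ("zh", "b")).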
Learner Simulation From Real Data
It is practically infeasible to collect "enough" real dialogue episodes for policy training. Studies have therefore focused on generating simulated users to interact with the dialogue manager.
Real learner data: 278 learners from 36 different countries; 30 sentences (6 to 24 characters each)
Simulated Learner Creation
Diagram elements: all pronunciation units considered (Initials/Finals, tones); unsupervised clustering into a GMM whose mixtures are tentatively labeled "US?", "JP?", "TH?"; missing values; choose one mixture according to the mixture weights; reinforcement learning of the policy (state, action).
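The "choose one mixture by mixture weight" step can be sketched as ancestral sampling from a GMM:

1. Pick a component according to its weight.
2. Draw a score vector from that component's Gaussian.

This sketch assumes diagonal covariances. The function name and the parameter layout (per-component lists of means and standard deviations) are hypothetical, and the GMM is assumed to have been fitted beforehand on real learners' score vectors.

```python
import random
from typing import List, Sequence


def sample_simulated_learner(
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    stds: Sequence[Sequence[float]],
    rng: random.Random,
) -> List[float]:
    """Draw one simulated learner's pronunciation scores from a diagonal GMM.

    1) choose a mixture component with probability proportional to its weight;
    2) sample each unit's score from that component's Gaussian.
    """
    component = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return [rng.gauss(m, s) for m, s in zip(means[component], stds[component])]
```

Passing a seeded `random.Random` makes the simulated learner population reproducible across policy-training runs.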
Training Phase: PSV (Pronunciation Score Vector) Clustering
A small problem: some units do not appear in a given utterance.
Treat them as missing (latent) data. Latent: a certain variable is never observed. Missing at random.
Example of incomplete data (??? = unknown):
71.3 92.0 60.1 ??? 70.5 80.2 66.5 ??? …
71.3 ??? 14.1 50.6 43.5 80.2 26.5 20.0 …
20.3 91.0 ??? ??? 20.5 80.2 46.5 ??? …
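One way to handle the "???" entries when clustering is to marginalize them out. With a diagonal-covariance Gaussian, marginalizing the missing dimensions amounts to dropping their terms from the log-density. This sketch shows only that likelihood and the resulting component posteriors (the E-step quantity) under the missing-at-random assumption. The paper's exact EM variant is not given on the slide, and the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Score = Optional[float]  # None marks a unit that did not appear ("???")


def diag_gaussian_loglik(
    x: Sequence[Score], mean: Sequence[float], var: Sequence[float]
) -> float:
    """Log-density over the observed dimensions only.

    With a diagonal covariance, marginalizing a missing dimension removes its term.
    """
    total = 0.0
    for xi, mu, v in zip(x, mean, var):
        if xi is None:
            continue
        total += -0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
    return total


def responsibilities(
    x: Sequence[Score],
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> List[float]:
    """Posterior probability of each mixture component given the observed scores."""
    logs = [
        math.log(w) + diag_gaussian_loglik(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    exps = [math.exp(lp - top) for lp in logs]  # log-sum-exp for stability
    z = sum(exps)
    return [e / z for e in exps]
```

In an M-step built on this, each dimension's mean and variance would be re-estimated only from the vectors where that unit was observed.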
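Training Phase: Reinforcement Learning
Q-learning is used to learn the expected return.
Optimal policy (ε-greedy): choose the action with the highest Q value with probability 1 − ε, and the remaining actions with probability ε.
Example update: current Q = 18, reward 7, next-state action values Q = 10 and Q = 9 (maximum 10):
Q' = 18 + α[7 + γ·10 − 18]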
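The update and action-selection rule can be sketched with a tabular Q-learner. The update is Q(s,a) ← Q(s,a) + α[r + γ max Q(s',a') − Q(s,a)]. Action selection is the ε-greedy rule stated above. The dictionary-based Q table and the function names are hypothetical, and the learning rate α and discount γ used below are illustrative values only.

```python
import random
from typing import Dict, List, Tuple

QTable = Dict[Tuple[object, object], float]


def q_update(
    q: QTable, s, a, reward: float, s_next, next_actions: List,
    alpha: float, gamma: float,
) -> float:
    """One tabular Q-learning step; unseen (state, action) pairs default to 0."""
    best_next = max((q.get((s_next, b), 0.0) for b in next_actions), default=0.0)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (reward + gamma * best_next - old)
    return q[(s, a)]


def epsilon_greedy(q: QTable, s, actions: List, epsilon: float, rng: random.Random):
    """Highest-Q action with probability 1 - epsilon; otherwise one of the
    remaining actions, chosen uniformly at random."""
    best = max(actions, key=lambda b: q.get((s, b), 0.0))
    others = [b for b in actions if b != best]
    if others and rng.random() < epsilon:
        return rng.choice(others)
    return best
```

Take hypothetical values α = 0.5 and γ = 0.9. The slide's example (current Q = 18, reward 7, best next-state Q = 10) then gives 18 + 0.5 × (7 + 9 − 18) = 17.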
EXPERIMENT
We compared the proposed approach with the following policies:
Always select the sentence with the most diverse pronunciation units relative to the units the learner has already practiced.
Always select the sentence with the largest number of worst-pronounced units.
Cast the above two heuristic policies as two actions in an MDP.
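The two heuristic baselines can be sketched as simple arg-max rules over candidate sentences. The first rule is ambiguous on the slide, so this sketch reads it as "the sentence that adds the most units the learner has not yet practiced"; that reading is an assumption. Candidate sentences are represented as lists of unit labels, and all names are hypothetical. Ties go to the lowest sentence index.

```python
from typing import Dict, List, Set


def most_diverse_policy(candidates: Dict[int, List[str]], practiced: Set[str]) -> int:
    """Baseline 1 (assumed reading): sentence adding the most unpracticed units."""
    return max(sorted(candidates), key=lambda i: len(set(candidates[i]) - practiced))


def most_worst_units_policy(candidates: Dict[int, List[str]], worst: Set[str]) -> int:
    """Baseline 2: sentence with the most occurrences of worst-pronounced units."""
    return max(sorted(candidates), key=lambda i: sum(1 for u in candidates[i] if u in worst))
```

The third comparison system would wrap these two functions as the action set of an MDP. A learned policy then chooses which heuristic to apply at each turn.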
Fig. 7. Average scores and coverage percentages of pronunciation units for an example simulated test learner under the random and proposed policies (v = 0, 1).