

Slide1

Dual-domain Hierarchical Classification of Phonetic Time Series

Hossein Hamooni, Abdullah Mueen
University of New Mexico
Department of Computer Science

Slide2

What is a Phoneme?

Phonemes are very small units of intelligible sound (usually less than 200 ms).

Phonetic spelling is the sequence of phonemes that a word comprises.

Example:

Coat ([kōt] /K OW T/)
From ([frəm] /F R AH M/)
Impressive ([imˈpresiv] /IH M P R EH S IH V/)

Slide3

Phoneme Classification

What is phoneme classification?
Input: a short segment of an audio signal.
Output: which phoneme it is.

Phoneme classification is a complex task:

More than 100 classes (based on the International Phonetic Alphabet)
Variation in speakers, dialects, accents, environmental noise, etc.

Phoneme classification can be used in:
Robust speech recognition
Accent/dialect detection
Speech quality scoring

Slide4

Related Work

Different methods for phoneme classification have been used in the literature:
Hidden Markov model [Lee, 1989]
Neural network [Schwarz, 2009]
Deep belief network [Mohamed, 2012]
Support vector machine [Salomon, 2001]
Hierarchical methods [Dekel, 2005]
Boltzmann machine [Mohamed, 2010]

Although the data mining community has shown that k-NN classifiers can work well on time series data, they have not yet been applied to phonemes.

[C. Lopes, F. Perdigao, 2011]

Slide5

Our Dual-domain Approach

Time Domain:
Using k-NN
Dynamic Time Warping (DTW)
Expensive
Speed up by lower bounding techniques
Frequency Domain:
Using k-NN
Euclidean distance between Mel-frequency cepstral coefficients (MFCC)
Fast

Slide6

Real Example

Slide7

Challenge

DTW is expensive (quadratic time and space complexity)
We need a speed-up technique
Solution: lower bounding techniques
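The quadratic cost and the two k-NN distances from the dual-domain approach can be seen in a short Python sketch. It is a minimal illustration, not the authors' code. It makes four assumptions: a squared-error DTW cost, a warping window |i - j| <= w (a Sakoe-Chiba-style band), MFCC features already reduced to fixed-length vectors, and majority voting for k-NN. The names `dtw_distance`, `euclidean_distance`, and `knn_classify` are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Series = Sequence[float]


def dtw_distance(a: Series, b: Series, w: int) -> float:
    """Squared-error DTW between a and b, restricted to |i - j| <= w.

    Returns math.inf when no warping path fits inside the window,
    which happens when the lengths differ by more than w.
    """
    n, m = len(a), len(b)
    # cost[i][j]: cheapest alignment of a[:i] with b[:j]. Filling the full
    # (n + 1) x (m + 1) table is what makes unconstrained DTW quadratic
    # in both time and space.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Visit only the cells inside the window around the diagonal.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def euclidean_distance(x: Series, y: Series) -> float:
    """Euclidean distance between two equal-length feature vectors (e.g. MFCCs)."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have equal length")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


def knn_classify(
    query: Series,
    train: List[Tuple[Series, str]],
    distance: Callable[[Series, Series], float],
    k: int = 1,
) -> str:
    """Label query by majority vote among its k nearest training items."""
    scored = sorted((distance(query, series), label) for series, label in train)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    time_train = [([0, 1, 2, 1, 0], "AH"), ([0, 0, 3, 3, 0, 0], "K")]
    query = [0, 1, 2, 2, 1, 0]
    print(knn_classify(query, time_train, lambda a, b: dtw_distance(a, b, w=2)))  # AH

    mfcc_train = [([1.0, 0.5], "AH"), ([-1.0, 2.0], "K")]
    print(knn_classify([0.9, 0.6], mfcc_train, euclidean_distance))  # AH
```

The window w bounds how far an alignment may stray from the diagonal. If two signals differ in length by more than w, no valid path exists. The time-domain classifier handles signals of different lengths this way, while the frequency-domain classifier compares fixed-length vectors directly.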

Slide8

DTW Lower bounding

Resampling to equal length does not always work!

Slide9

DTW Lower bounding

We use the prefix of the longer signal (Prefixed LB_Keogh).
We show that Prefixed LB_Keogh is a lower bound if:
w > difference between the lengths of the two signals
We set w = c × (length of the longer signal)
We ignore all pairs of signals that do not satisfy the above condition.
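One possible reading of these conditions is sketched below in Python. It is not the authors' implementation. It assumes squared-error DTW with a band |i - j| <= w, builds the LB_Keogh envelope around the shorter signal, and compares only the equal-length prefix of the longer signal against it. Under that reading the bound holds because every prefix point must be matched to some point of the shorter signal inside its window. The names `window_size`, `banded_dtw`, `prefixed_lb_keogh`, and `nn_search` are hypothetical, and so is rounding w down.

```python
import math
from typing import List, Optional, Sequence, Tuple

Series = Sequence[float]


def window_size(len_a: int, len_b: int, c: float) -> int:
    """w = c * (length of the longer signal), rounded down (rounding is an assumption)."""
    return int(c * max(len_a, len_b))


def banded_dtw(a: Series, b: Series, w: int) -> float:
    """Squared-error DTW restricted to |i - j| <= w (math.inf if no path fits)."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def prefixed_lb_keogh(shorter: Series, longer: Series, w: int) -> float:
    """Lower bound on banded_dtw(shorter, longer, w).

    The envelope is built around the shorter signal (computed naively here),
    and only the prefix longer[:len(shorter)] is compared with it. Each prefix
    point is matched to some point of `shorter` inside its window, so its
    distance to the envelope never exceeds the cost of that match.
    """
    n = len(shorter)
    total = 0.0
    for j in range(n):
        window = shorter[max(0, j - w): min(n, j + w + 1)]
        upper, lower = max(window), min(window)
        x = longer[j]
        if x > upper:
            total += (x - upper) ** 2
        elif x < lower:
            total += (x - lower) ** 2
    return total


def nn_search(
    query: Series, train: List[Tuple[Series, str]], c: float
) -> Tuple[Optional[str], float]:
    """1-NN under banded DTW, skipping DTW when the lower bound rules a candidate out."""
    best_label, best_dist = None, math.inf
    for series, label in train:
        shorter, longer = sorted((query, series), key=len)
        w = window_size(len(query), len(series), c)
        if len(longer) - len(shorter) >= w:
            continue  # condition w > length difference fails: ignore this pair
        if prefixed_lb_keogh(shorter, longer, w) >= best_dist:
            continue  # cannot beat the best match found so far
        d = banded_dtw(shorter, longer, w)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist
```

In `nn_search` the cheap bound is evaluated before the expensive DTW. Candidates whose bound already reaches the best distance found so far are skipped without changing the result. This kind of pruning is where a lower-bounding speedup comes from.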

[Figure: Speedup vs. training set size (×10^4)]
[Figure: Accuracy (%) vs. window size (c%), annotated c = 30%]

Slide10

Data Collection

370,000 phonemes are segmented from:

Data is publicly available.

Slide11

Phoneme Segmentation

The Penn Phonetics Lab Forced Aligner (p2fa)
Takes a signal and a transcript
Produces timing segmentations (word level and phoneme level)

Slide12

Accuracy (All layers)

10-fold cross-validation
100 random phonemes in each fold

Slide13

Accented Phoneme Classification

[Figure: Accuracy vs. training set size (×10^4), MFCC vs. DTW]

British vs. American accent
Using the Oxford test set
2-class classification problem
No hierarchy

Slide14

Conclusion

We present a dual-domain hierarchical method for phoneme classification.

We generate a novel dataset of 370,000 phonemes.
We achieve up to 73% accuracy for 39 classes.
Our lower bounding technique gives up to 3X speedup.

Slide15

Thank You

Data and code available at:

http://cs.unm.edu/~hamooni/papers/Dual_2014