Speech Compression - PowerPoint Presentation

Uploaded On 2018-02-27


Presentation Transcript

Slide1

Speech Compression - Course outline and rules - Properties of the speech signal

A. Enis Cetin

Slide2

Course Outline

Week 1: Properties of Speech Signals (Pitch, Formants, Phonemes, etc.); Quantization and PCM
Week 2: Vector Quantization, DPCM and ADPCM
Week 3: Subband Coding (Wavelet-based Speech Coding)
Week 4: AR random processes and Autoregressive (AR) Modelling of Speech
Week 5: LPC-10 Standard (Linear Predictive Coding), pitch estimation, and midterm exam
Week 6: Line Spectral Frequencies, LPC-10e and MELP
Week 7: Analysis-by-Synthesis LPC Coding and Code-Excited Linear Predictive Coding (CELP)
Week 8: Harmonic Speech Coding
Week 9: European Digital Cellular Telephony Standards (GSM)
Week 10: North American Digital Cellular Telephony Standards

Slide3

Grading

Midterm Exam: 30%
Homework: 10%
Term Project (implementation of a speech coding algorithm using MATLAB or on an Android or iPhone platform): 20%
Final Exam: 40%

Slide4

Books and References

Key reference:
- Digital Speech: Coding for Low Bit Rate Communication Systems, 2nd Edition, Oct 29, 2004, by A. M. Kondoz, Wiley

Other related books:
- Signal Compression, Coding of Speech, Audio, Image and Video (Selected Topics in Electronics and Systems), May 1997, by N. S. Jayant
- Digital Speech Processing Using Matlab (Signals and Communication Technology), Dec 4, 2013, by E. S. Gopi, Springer
- Voice Compression and Communications: Principles and Applications for Fixed and Wireless Channels, Sep 11, 2001, by Lajos L. Hanzo and F. Clare A. Somerville
- Theory and Applications of Digital Speech Processing, Mar 13, 2010, by Lawrence Rabiner and Ronald Schafer
- Discrete-Time Speech Signal Processing: Principles and Practice, Nov 8, 2001, by Thomas F. Quatieri, Prentice Hall
- Video, Speech, and Audio Signal Processing and Associated Standards (The Digital Signal Processing Handbook), Nov 20, 2009, by Vijay Madisetti
- Speech and Audio Signal Processing: Processing and Perception of Speech and Music, Aug 23, 2011, by Ben Gold and Nelson Morgan

Slide5

Speech Generation

Air is blown out of the lungs
The vocal folds (cords) in the larynx vibrate
The shape of the vocal tract introduces resonances
The tongue, teeth, lips and velum (nasal passage) modify the sound

Slide6

Speech signal (figure; sampling frequency = 8 kHz)

Slide7

Hearing

Sound waves reach our ears
They vibrate the eardrum
This causes the fluid in the spiral-shaped cochlea to vibrate
The fluid vibrates the hairs inside the cochlea
Different frequencies vibrate different hairs
This converts the time domain to the frequency domain (the Fourier transform magnitude of speech is more important than its phase)
People hear with their brains

Slide8

Phonemes

Phonemes are considered the fundamental units of speech
Changing a phoneme can change the meaning of the word: “pat” to “fat”, “pat” to “rat”
There are more phonemes than letters in the Latin (Roman) alphabet
The concept of a phonetic alphabet was first invented by the Phoenicians in the Middle East

Slide9

US English Vowels

AA   wAshington
AE   fAt, bAd
AH   bUt, hUsh
AO   lAWn, mAll
AW   hOW, sOUth
AX   About, cAnoe
AY   hIde, bUY
EH   gEt, fEAther
ER   makER, sEARch
EY   gAte, EIght
IH   bIt, shIp
IY   bEAt, shEEp
OW   lOne, nOse
OY   tOY, OYster
UH   fUll
UW   fOOl

Slide10

Vowels

Almost periodic signals
They have high amplitudes compared to other phonemes (consonants)
They are all voiced sounds (vocal cords vibrate)
Information content is low compared to consonants: Washington = w-sh-n-gt-n versus a-i-o

Slide11

Period (1/pitch) of Vowels

Pitch: rate of vibration of the vocal cords during voiced speech
- Males: 80-140 times a second
- Females: 130-220 times a second
- Children: 180-320 times a second
First formant (F1): first peak of the spectrum of the smoothed speech signal of a vowel
Some consonants can also be periodic, e.g. “l”, “r”

Slide12
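To make the pitch ranges on the vowel-period slide concrete, the sketch below converts a pitch frequency into a period in samples at 8 kHz and estimates pitch from a frame with a plain autocorrelation search. It is only an illustration: the function names, the 80-320 Hz search range, and the synthetic three-harmonic test frame are assumptions, and the course may use a different pitch estimator.

```python
import math

FS = 8000  # sampling frequency in Hz, as used throughout the slides


def pitch_period_samples(pitch_hz, fs=FS):
    """Number of samples in one pitch period at sampling rate fs."""
    return fs / pitch_hz


def estimate_pitch(frame, fs=FS, fmin=80.0, fmax=320.0):
    """Estimate pitch (Hz) as the lag that maximizes the autocorrelation.

    The default search range spans the lowest male and highest child
    pitch values quoted on the slide (an assumed choice).
    """
    min_lag = int(fs / fmax)
    max_lag = int(fs / fmin)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        value = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        if value > best_value:
            best_lag, best_value = lag, value
    return fs / best_lag


# Synthetic stand-in for a vowel: 125 Hz fundamental plus two harmonics, 40 ms.
f0 = 125.0
frame = [
    math.sin(2 * math.pi * f0 * n / FS)
    + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / FS)
    + 0.25 * math.sin(2 * math.pi * 3 * f0 * n / FS)
    for n in range(320)
]

print(pitch_period_samples(80))   # 100.0 samples per period
print(pitch_period_samples(320))  # 25.0 samples per period
print(estimate_pitch(frame))      # 125.0 for this synthetic frame
```

At 8 kHz the quoted pitch range therefore corresponds to periods of roughly 25 to 100 samples, which is why the lag search is restricted to that interval.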

US English ConsonantsStops: P, B, T, D, K, G Fricatives: F, V, HH, S, Z, SH, ZHAffricatives: CH, JHNasals: N, M, NGGlides: L, R, Y, W They can be voiced or unvoiced!They carry more “information” compared to vowels.Slide13

Voiced and Unvoiced Speech

Speech signal segments can also be classified as voiced (e.g., a, e, i, …, b, r) or unvoiced (p, t, ch, …)
A voiced speech segment has relatively high energy and is almost periodic; its period gives the pitch of the voiced speech
The unvoiced part of speech looks like random noise with no periodicity (but it is not white noise!)
Mixed segments are neither voiced nor unvoiced, but a mixture of the two. They occur in transition regions, where there is a change either from voiced to unvoiced or from unvoiced to voiced

Slide14
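The voiced/unvoiced/mixed distinction can be sketched with two simple frame features. The slide names energy and periodicity as the cues; the zero-crossing rate used here is a swapped-in, commonly used proxy for "noise-like", not something the slides prescribe. The thresholds, function names, and synthetic frames are hypothetical, and the uniform noise is only a stand-in for unvoiced speech (which, as the slide notes, is not white noise).

```python
import math
import random

FS = 8000  # Hz


def short_time_energy(frame):
    """Average energy per sample of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def classify_frame(frame, energy_threshold=0.05, zcr_threshold=0.25):
    """Label a frame 'voiced', 'unvoiced' or 'mixed'.

    Both thresholds are hypothetical values chosen for this toy example.
    """
    high_energy = short_time_energy(frame) > energy_threshold
    low_zcr = zero_crossing_rate(frame) < zcr_threshold
    if high_energy and low_zcr:
        return "voiced"
    if not high_energy and not low_zcr:
        return "unvoiced"
    return "mixed"


rng = random.Random(0)
n_samples = 160  # 20 ms at 8 kHz

# High-energy periodic frame versus low-energy noise-like frame.
voiced_like = [math.sin(2 * math.pi * 125 * n / FS) for n in range(n_samples)]
unvoiced_like = [0.1 * rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

print(classify_frame(voiced_like))
print(classify_frame(unvoiced_like))
```

Frames where the two features disagree fall into the "mixed" class, mirroring the transition regions described on the slide.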

Speech Waveform Examples

(Figure: voiced and unvoiced speech waveforms. The unvoiced speech is amplified 5 times! Sampling frequency is 8 kHz.)

Slide15

Voiced and Unvoiced Consonants

Unvoiced consonants (no vibration): “p”, “t”, “ch”, “k”, “f”, “th”, “s”, “sh”, “h”
Voiced consonants (vocal cords vibrate): “b”, “d”, “j” of joke, “g”, “v”, “th” of that, “z”, “s” of vision, “m”, “n”, “ng” of thing, “l”, “r”, “w”, “y” of you
Voiced consonants have low amplitudes compared to vowels

Slide16

Number of Phonemes in a Language

US English: 43
UK English: 44
Japanese: 25
Hindi: 81
Hawaiian: about 12
Since identifying phonemes is a kind of “quantization”, the counts can differ from book to book

Slide17

Prosody of Speech Intonation Tune or melody Duration How long/short of each phoneme  Phrasing Where the breaks are Slide18

Narrow Band Speech Compression

Narrow-band speech compression (telephone speech, Skype, Messenger, Google-Talk, …)
- Sampling frequency = 8 kHz (narrow band)
- Intelligible speech
- We can recognize the speaker
Wide-band speech and audio compression
- Sampling frequency = 44.1 kHz
- MP3 (MPEG-1 Audio Layer III)
- Music and teleconferencing
- CD: almost no compression
We will mainly study narrow-band speech compression in this course

Slide19

Narrow-band Speech Compression

Waveform coding
- Tries to preserve the shape of the waveform
- e.g., PCM, DPCM, subband coding (wavelet); MP3 is also a waveform coder
- High bit rates: 16 kbit/s to 64 kbit/s
Vocoders (voice coders)
- Parametric coders: they extract parameters from the speech and transmit the parameters to the receiver
- The shape of the waveform is not important
- e.g., LPC-10, MELP, CELP, GSM vocoder
- Low bit rates, e.g., 2.4 kbit/s

Slide20
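The bit rates quoted for waveform coders and vocoders are easier to compare against uncompressed PCM. The short calculation below is a sketch: it assumes 8 bits per sample for telephone PCM (which yields the 64 kbit/s figure) and 16-bit stereo for CD audio; the function names are made up for illustration.

```python
def pcm_bit_rate_kbps(sampling_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in kbit/s."""
    return sampling_hz * bits_per_sample * channels / 1000.0


def compression_ratio(reference_kbps, coded_kbps):
    """How many times smaller the coded stream is than the reference."""
    return reference_kbps / coded_kbps


def bits_per_sample(coded_kbps, sampling_hz=8000):
    """Average number of bits the coder spends on each input sample."""
    return coded_kbps * 1000.0 / sampling_hz


telephone_pcm = pcm_bit_rate_kbps(8000, 8)            # 64.0 kbit/s
cd_audio = pcm_bit_rate_kbps(44100, 16, channels=2)   # 1411.2 kbit/s

print(telephone_pcm, cd_audio)
for name, rate in [("waveform coder, 16 kbit/s", 16.0),
                   ("vocoder, 2.4 kbit/s", 2.4)]:
    print(name,
          "ratio vs 64 kbit/s PCM:", round(compression_ratio(telephone_pcm, rate), 2),
          "bits/sample:", round(bits_per_sample(rate), 2))
```

A 2.4 kbit/s vocoder spends well under one bit per input sample on average, which is only possible because it transmits model parameters rather than the waveform itself.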

MOS: Mean Opinion Score Scale

Grade (MOS)    Subjective opinion               Quality
5 Excellent    Imperceptible                    Transparent
4 Good         Perceptible, but not annoying    Toll
3 Fair         Slightly annoying                Communication
2 Poor         Annoying                         Synthetic
1 Bad          Very annoying                    Bad

Measure of the quality of a coder
It is based on psychological tests
Sound quality testing was carried out by people with “golden ears”

Slide21
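As a small illustration of the MOS scale, the score of a coder is the arithmetic mean of the grades that listeners assign on the 1 to 5 scale. The ratings below are invented for the example, and the helper name is hypothetical.

```python
GRADES = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mean_opinion_score(ratings):
    """Average of listener grades, each of which must be an integer 1..5."""
    for rating in ratings:
        if rating not in GRADES:
            raise ValueError(f"invalid grade: {rating!r}")
    return sum(ratings) / len(ratings)


# Hypothetical grades from eight listeners for one coder.
ratings = [4, 4, 5, 3, 4, 4, 3, 5]
mos = mean_opinion_score(ratings)
print(mos)  # 4.0
```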

Comparison of telephone band speech coding standards

Standard     Year   Algorithm        Bit rate (kb/s)   MOS             Delay (ms)
G.711        1972   Companded PCM    64                4.3             0.125
G.726        1991   VBR-ADPCM        16/24/32/40       toll            0.125
G.728        1994   LD-CELP          16                4               0.625
G.729        1995   CS-ACELP         8                 4               15
G.723.1      1995   A/MP-MLQ CELP    5.3/6.3           toll            37.5
ITU 4        -      -                4                 toll            25
GSM FR       1989   RPE-LTP          13                3.7             20
GSM EFR      1995   ACELP            12.2              4               20
GSM/2        1994   VSELP            5.6               3.5             24.375
IS54         1989   VSELP            7.95              3.6             20
IS96         1993   Q-CELP           0.8/2/4/8.5       3.5             20
JDC          1990   VSELP            6.7               communication   20
JDC/2        1993   PSI-CELP         3.45              communication   40
Inmarsat-M   1990   IMBE             4.15              3.4             78.75
FS1015       1984   LPC-10           2.4               synthetic       112.5
FS1016       1991   CELP             4.8               3               37.5
New FS 2.4   1997   MELP             2.4               3               45.5

Slide22
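The first row of the comparison table, G.711, uses companded PCM. The sketch below shows the idea with the continuous μ-law curve, F(x) = sgn(x) ln(1 + μ|x|) / ln(1 + μ) with μ = 255, followed by an 8-bit uniform quantizer. It is a simplified illustration under stated assumptions, not the bit-exact segmented encoder of the standard, and the helper names are made up.

```python
import math

MU = 255.0  # mu-law parameter used for 8-bit telephony


def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def quantize_uniform(v, bits=8):
    """Uniform quantizer on [-1, 1]; returns the cell-centre value."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(levels - 1, int((v + 1.0) / step))
    return -1.0 + (index + 0.5) * step


def companded_quantize(x, bits=8):
    """Compress, quantize uniformly, then expand."""
    return mu_law_expand(quantize_uniform(mu_law_compress(x), bits))


x = 0.01  # a low-amplitude sample, where companding helps most
uniform_error = abs(quantize_uniform(x) - x)
companded_error = abs(companded_quantize(x) - x)
print(uniform_error, companded_error)
```

For small amplitudes the companded quantizer gives a noticeably smaller error than a plain 8-bit uniform quantizer, which is the motivation for companding low-level speech samples.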

Experiments

Speech sampled at 8 kHz and quantized to 1 bit/sample (sign information only) is still intelligible
All-pass filter the speech: you may not notice any difference between input and output
Speech and audio waveforms are zero-mean signals
Speech can be assumed to be wide-sense stationary during each phoneme

Slide23
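The 1 bit/sample experiment can be reproduced in a few lines. The sketch below removes the mean, keeps only the sign of each sample, and reports the resulting bit rate at 8 kHz. The test signal is a synthetic sinusoid with a DC offset rather than real speech, and the function names are assumptions; judging intelligibility requires listening to actual recordings.

```python
import math

FS = 8000  # Hz


def remove_mean(signal):
    """Subtract the average so the signal is zero mean."""
    mean = sum(signal) / len(signal)
    return [x - mean for x in signal]


def sign_quantize(signal):
    """Keep only the sign of each sample: 1 bit per sample."""
    return [1 if x >= 0 else -1 for x in signal]


def count_zero_crossings(signal):
    """Number of adjacent sample pairs whose signs differ."""
    return sum(1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0))


# Synthetic stand-in: 200 Hz tone with a DC offset, 50 ms long.
raw = [0.3 + math.sin(2 * math.pi * 200 * n / FS) for n in range(400)]
centered = remove_mean(raw)
one_bit = sign_quantize(centered)

bit_rate = FS * 1  # bits per second at 1 bit/sample
print(bit_rate)    # 8000
print(count_zero_crossings(centered), count_zero_crossings(one_bit))
```

The sign signal keeps every zero crossing of the centered input, which is one informal way to see why so much of the timing structure survives at 8 kbit/s.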

References

Digital Speech: Coding for Low Bit Rate Communication Systems, 2nd Edition, Oct 29, 2004, by A. M. Kondoz, Wiley
Lecture notes, Speech Processing 11-492/18-492, CMU
A-law and Mu-law Companding Implementations Using TMS320C54x, Texas Instruments, Application note SPRA163A