Slide1
Speech Compression - Course outline and rules - Properties of the speech signal
A. Enis Cetin
Slide2
Course Outline
Week 1: Properties of Speech Signals, Pitch, Formants, Phonemes, etc.; Quantization and PCM
Week 2: Vector Quantization, DPCM and ADPCM
Week 3: Subband Coding (Wavelet-based Speech Coding)
Week 4: AR random processes and Autoregressive (AR) Modelling of Speech
Week 5: LPC-10 Standard (Linear Predictive Coding), pitch estimation, and midterm exam
Week 6: Line Spectral Frequencies, LPC-10e and MELP
Week 7: Analysis-by-Synthesis LPC Coding and Code-Excited Linear Predictive Coding (CELP)
Week 8: Harmonic Speech Coding
Week 9: European Digital Cellular Telephony Standards (GSM)
Week 10: North American Digital Cellular Telephony Standards
Slide3
Grading
Midterm Exam: 30%
Homework: 10%
Term Project (implementation of a speech coding algorithm in MATLAB or on an Android or iPhone platform): 20%
Final Exam: 40%
Slide4
Books and References
Key Reference:
- Digital Speech: Coding for Low Bit Rate Communication Systems, 2nd Edition, Oct 29, 2004, by A. M. Kondoz, Wiley
Other related books:
- Signal Compression, Coding of Speech, Audio, Image and Video (Selected Topics in Electronics and Systems), May 1997, by N. S. Jayant
- Digital Speech Processing Using Matlab (Signals and Communication Technology), Dec 4, 2013, by E. S. Gopi, Springer
- Voice Compression and Communications: Principles and Applications for Fixed and Wireless Channels, Sep 11, 2001, by Lajos L. Hanzo and F. Clare A. Somerville
- Theory and Applications of Digital Speech Processing, Mar 13, 2010, by Lawrence Rabiner and Ronald Schafer
- Discrete-Time Speech Signal Processing: Principles and Practice, Nov 8, 2001, by Thomas F. Quatieri, Prentice Hall
- Video, Speech, and Audio Signal Processing and Associated Standards (The Digital Signal Processing Handbook), Nov 20, 2009, by Vijay Madisetti
- Speech and Audio Signal Processing: Processing and Perception of Speech and Music, Aug 23, 2011, by Ben Gold and Nelson Morgan
Slide5
Speech Generation
Air is pushed out of the lungs
The vocal folds (cords) in the larynx vibrate
The shape of the vocal tract introduces resonances
Tongue, teeth, lips, and velum (nasal passage) modify the sound
Slide6
Speech signal (figure)
Sampling frequency = 8 kHz
Slide7
Hearing
Sound waves reach our ears
They vibrate the eardrum
This causes the fluid in the cochlea to vibrate
The cochlea is spiral shaped
The fluid vibrates hairs inside the cochlea
Different frequencies vibrate different hairs
The ear thus converts the time-domain signal to the frequency domain (the Fourier transform magnitude of speech is more important than its phase).
People hear with their brains
Slide8
Phonemes
Considered the fundamental units of speech
Changing a phoneme can change the meaning of the word: “pat” to “fat”, “pat” to “rat”
There are more phonemes than letters in the Latin (Roman) alphabet.
The concept of a phonetic alphabet was first invented by the Phoenicians in the Middle East.
Slide9
US English Vowels
Symbol  Examples
AA      wAshington
AE      fAt, bAd
AH      bUt, hUsh
AO      lAWn, mAll
AW      hOW, sOUth
AX      About, cAnoe
AY      hIde, bUY
EH      gEt, fEAther
ER      makER, sEARch
EY      gAte, EIght
IH      bIt, shIp
IY      bEAt, shEEp
OW      lOne, nOse
OY      tOY, OYster
UH      fUll
UW      fOOl
Slide10
Vowels
Almost periodic signals
They have high amplitudes compared to other phonemes (consonants).
They are all voiced sounds (vocal cords vibrate).
Information content is low compared to consonants: Washington = w-sh-n-gt-n (consonants) versus a-i-o (vowels)
Slide11
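Period (1/pitch) of Vowels
Pitch: rate of vibration of the vocal cords during voiced speech
Males: 80-140 times a second (Hz)
Females: 130-220 times a second (Hz)
Children: 180-320 times a second (Hz)
First formant: the first peak of the smoothed spectrum (spectral envelope) of a vowel. By the usual convention F0 denotes the pitch (fundamental) frequency, and the formants are numbered F1, F2, ...
(Some consonants can also be periodic, e.g. “l”, “r”)
Slide12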
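To make the pitch ranges on the preceding slide concrete, here is a minimal Python sketch of pitch estimation by picking the autocorrelation peak. Autocorrelation is one common approach and is an assumption here, not necessarily the method used in the course. The function name `estimate_pitch_hz`, the 40 ms frame length, and the synthetic three-harmonic test signal are all illustrative; only the 8 kHz sampling rate and the 80-320 Hz search range come from the slides.

```python
import math

def estimate_pitch_hz(frame, fs=8000, f_min=80.0, f_max=320.0):
    """Return a pitch estimate (Hz) from the largest autocorrelation value."""
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]      # remove any DC offset
    lag_min = int(fs / f_max)          # shortest pitch period searched (samples)
    lag_max = int(fs / f_min)          # longest pitch period searched (samples)
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalised autocorrelation at this lag
        r = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag

# Synthetic "vowel": 100 Hz fundamental plus two harmonics, 40 ms at 8 kHz.
fs = 8000
f0 = 100.0
frame = [
    math.sin(2 * math.pi * f0 * n / fs)
    + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / fs)
    + 0.25 * math.sin(2 * math.pi * 3 * f0 * n / fs)
    for n in range(320)
]
pitch = estimate_pitch_hz(frame, fs)
print(f"Estimated pitch: {pitch:.1f} Hz")
```

On real speech this simple peak picker can lock onto a multiple or a fraction of the true period, so practical estimators usually add normalisation and smoothing across frames.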
US English Consonants
Stops: P, B, T, D, K, G
Fricatives: F, V, HH, S, Z, SH, ZH
Affricates: CH, JH
Nasals: N, M, NG
Glides: L, R, Y, W
They can be voiced or unvoiced!
They carry more “information” compared to vowels.
Slide13
Voiced and Unvoiced Speech
Speech signal segments can also be classified as voiced (e.g., a, e, i, ..., b, r) or unvoiced (e.g., p, t, ch, ...).
Voiced segments: relatively high energy and almost periodic; the period determines the pitch of voiced speech.
Unvoiced segments: look like random noise with no periodicity (but not white noise!).
Mixed segments: neither voiced nor unvoiced, but a mixture of the two. They occur at transition regions, where the speech changes either from voiced to unvoiced or from unvoiced to voiced.
Slide14
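The voiced/unvoiced distinction on the preceding slide can be sketched in Python with two frame-level features: short-time energy (voiced speech has relatively high energy) and zero-crossing rate (noise-like signals cross zero far more often than a low-pitched periodic one). The zero-crossing feature, both threshold values, the function names, and the rule that labels disagreeing features as "mixed" are assumptions for illustration, not part of the slides. The white-noise test frame is only a crude stand-in, since real unvoiced speech is not white noise.

```python
import math
import random

def short_time_energy(frame):
    """Average power of the frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def classify_frame(frame, energy_thresh=0.05, zcr_thresh=0.25):
    """Label a frame using two illustrative, untuned thresholds."""
    high_energy = short_time_energy(frame) > energy_thresh
    low_zcr = zero_crossing_rate(frame) < zcr_thresh
    if high_energy and low_zcr:
        return "voiced"
    if not high_energy and not low_zcr:
        return "unvoiced"
    return "mixed"  # the two features disagree

fs = 8000
# 30 ms test frames: a 120 Hz sinusoid and low-level uniform noise.
voiced_frame = [math.sin(2 * math.pi * 120 * n / fs) for n in range(240)]
rng = random.Random(0)
noise_frame = [rng.uniform(-0.1, 0.1) for _ in range(240)]

print(classify_frame(voiced_frame), classify_frame(noise_frame))
```

Thresholds like these would have to be tuned to the recording level and background noise of real data.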
Speech Waveform Examples (figure)
Unvoiced speech is amplified 5 times!
Sampling frequency is 8 kHz
Slide15
Voiced and Unvoiced Consonants
Unvoiced consonants (no vocal cord vibration): “p”, “t”, “ch”, “k”, “f”, “th”, “s”, “sh”, “h”
Voiced consonants (vocal cords vibrate): “b”, “d”, “j” of joke, “g”, “v”, “th” of that, “z”, “s” of vision, “m”, “n”, “ng” of thing, “l”, “r”, “w”, “y” of you
Voiced consonants have low amplitudes compared to vowels
Slide16
Number of Phonemes in a Language
US English: 43
UK English: 44
Japanese: 25
Hindi: 81
Hawaiian: about 12
Since identifying phonemes is a kind of “quantization”, the phoneme counts can differ from book to book.
Slide17
Prosody of Speech
Intonation: tune or melody
Duration: how long or short each phoneme is
Phrasing: where the breaks are
Slide18
Narrow Band Speech Compression
Narrow-band speech compression (telephone speech, Skype, Messenger, Google-Talk…)
- Sampling frequency = 8 kHz (narrow band)
- Intelligible speech
- We can recognize the speaker
Wide-band speech and audio compression
- Sampling frequency = 44.1 kHz
- MP3 (MPEG-1 Audio Layer III)
- Music and teleconferencing
- CD: almost no compression
We will mainly study narrow-band speech compression in this course
Slide19
Narrow-band Speech Compression
Waveform coding
- tries to preserve the shape of the waveform
- e.g., PCM, DPCM, subband coding (wavelet); MP3 is also a waveform coder
- High bit rates: 64 kbit/s down to 16 kbit/s
Vocoders (voice coders)
- parametric coders: they extract parameters from the speech, and the parameters are transmitted to the receiver
- the shape of the waveform is not important
- e.g., LPC-10, MELP, CELP, GSM vocoder
- Low bit rates, e.g., 2.4 kbit/s
Slide20
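The bit rates quoted on the two preceding slides follow from simple arithmetic: bit rate = sampling frequency × bits per sample × number of channels. The short Python sketch below works this out. The 8 bits/sample for telephone PCM is implied by the 64 kbit/s figure at 8 kHz; the 16-bit stereo format used for the CD line is the standard CD audio format and is not stated in the slides. The function name `pcm_bit_rate_kbps` is illustrative.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0

# Narrow-band telephone speech: 8 kHz, 8 bits/sample, mono.
telephone_pcm = pcm_bit_rate_kbps(8000, 8)

# CD audio (assumed standard format): 44.1 kHz, 16 bits/sample, stereo.
cd_audio = pcm_bit_rate_kbps(44100, 16, channels=2)

# Compression ratio of a 2.4 kbit/s vocoder relative to 64 kbit/s PCM.
vocoder_kbps = 2.4
ratio = telephone_pcm / vocoder_kbps

print(f"Telephone PCM: {telephone_pcm} kbit/s")
print(f"CD audio:      {cd_audio} kbit/s")
print(f"64 kbit/s PCM vs 2.4 kbit/s vocoder: about {ratio:.1f}:1")
```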
MOS: Mean Opinion Score Scale
Grade (MOS)   Subjective opinion              Quality
5 Excellent   Imperceptible                   Transparent
4 Good        Perceptible, but not annoying   Toll
3 Fair        Slightly annoying               Communication
2 Poor        Annoying                        Synthetic
1 Bad         Very annoying                   Bad
Measure of the quality of a coder
It is based on psychological tests
Sound quality testing was carried out by people with “golden ears”.
Slide21
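As its name says, a mean opinion score is the arithmetic mean of the grades given by the listeners on the 1 to 5 scale of the preceding slide. A minimal Python sketch follows; the ratings list is hypothetical data invented for illustration, and `mean_opinion_score` is an illustrative name.

```python
GRADES = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}

def mean_opinion_score(ratings):
    """Average the listener grades; each grade must be an integer from 1 to 5."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in GRADES for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return sum(ratings) / len(ratings)

# Hypothetical grades from eight listeners for one coder.
ratings = [4, 5, 3, 4, 4, 3, 5, 4]
mos = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f}")
```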
Comparison of telephone band speech coding standards
Standard    Year  Algorithm      Bit rate (kb/s)  MOS∗       Delay+ (ms)
G.711       1972  Companded PCM  64               4.3        0.125
G.726       1991  VBR-ADPCM      16/24/32/40      toll       0.125
G.728       1994  LD-CELP        16               4          0.625
G.729       1995  CS-ACELP       8                4          15
G.723.1     1995  A/MP-MLQ CELP  5.3/6.3          toll       37.5
ITU 4       –     –              4                toll       25
GSM FR      1989  RPE-LTP        13               3.7        20
GSM EFR     1995  ACELP          12.2             4          20
GSM/2       1994  VSELP          5.6              3.5        24.375
IS54        1989  VSELP          7.95             3.6        20
IS96        1993  Q-CELP         0.8/2/4/8.5      3.5        20
JDC         1990  VSELP          6.7              commun.    20
JDC/2       1993  PSI-CELP       3.45             commun.    40
Inmarsat-M  1990  IMBE           4.15             3.4        78.75
FS1015      1984  LPC-10         2.4              synthetic  112.5
FS1016      1991  CELP           4.8              3          37.5
New FS 2.4  1997  MELP           2.4              3          45.5
Slide22
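The first row of the table above, G.711, is companded PCM: samples are compressed logarithmically before uniform quantization so that low-amplitude samples get finer resolution. The Python sketch below uses the continuous mu-law formula with mu = 255. This is an illustration under stated assumptions: G.711 itself uses an 8-bit piecewise-linear approximation of this curve (or A-law), and the function names and the mid-rise uniform quantiser are illustrative choices.

```python
import math

MU = 255.0  # mu-law parameter used in North American and Japanese telephony

def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to [-1, 1] with logarithmic compression."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def quantize_uniform(y, bits=8):
    """Uniform mid-rise quantiser on [-1, 1] (an illustrative choice)."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(int((y + 1.0) / step), levels - 1)
    return -1.0 + (index + 0.5) * step

def companded_pcm(x, bits=8):
    """Compress, quantise uniformly, then expand."""
    return mu_law_expand(quantize_uniform(mu_law_compress(x), bits))

# Compare the two quantisers on a low-amplitude sample.
x = 0.01
err_uniform = abs(quantize_uniform(x) - x)
err_companded = abs(companded_pcm(x) - x)
print(f"uniform error:   {err_uniform:.6f}")
print(f"companded error: {err_companded:.6f}")
```

For small samples the companded quantiser gives a smaller error than plain 8-bit uniform quantization, at the cost of coarser steps near full scale.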
Experiments
Speech sampled at 8 kHz and quantized to 1 bit/sample (sign information only) is still intelligible.
All-pass filter the speech: you may not notice any difference between input and output.
Speech and audio waveforms are zero-mean signals.
Speech can be assumed to be wide-sense stationary during each phoneme.
Slide23
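The first two experiments on the preceding slide can be set up with a few lines of Python. The sketch keeps only the sign of each sample (1 bit/sample) and applies a first-order all-pass filter, which changes the phase but leaves the magnitude response equal to 1 at every frequency. The filter coefficient a = 0.5, the function names, and the 200 Hz test tone are assumptions for illustration; with real speech the outputs would have to be judged by listening, which this code does not do.

```python
import cmath
import math

def sign_quantize(x):
    """1 bit/sample: keep only the sign of each sample."""
    return [1.0 if s >= 0 else -1.0 for s in x]

def allpass_filter(x, a=0.5):
    """First-order all-pass: y[n] = -a*x[n] + x[n-1] + a*y[n-1]."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for s in x:
        out = -a * s + x_prev + a * y_prev
        y.append(out)
        x_prev, y_prev = s, out
    return y

def allpass_response(omega, a=0.5):
    """Frequency response H(e^{j*omega}) of the filter above."""
    z_inv = cmath.exp(-1j * omega)
    return (-a + z_inv) / (1 - a * z_inv)

fs = 8000
x = [math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]  # 0.1 s test tone

bits = sign_quantize(x)
y = allpass_filter(x)

# Magnitude response at a few frequencies (radians/sample): all equal to 1.
gains = [abs(allpass_response(w)) for w in (0.1, 1.0, 2.0, 3.0)]
print([round(g, 6) for g in gains])
```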
References
Digital Speech: Coding for Low Bit Rate Communication Systems, 2nd Edition, Oct 29, 2004, by A. M. Kondoz, Wiley
Lecture notes, Speech Processing 11-492/18-492, CMU
A-law and Mu-law Companding Implementations Using TMS320C54x, Texas Instruments, Application note SPRA163A