Real-time Enhancement of Noisy Speech Using Spectral Subtraction
Slide1
M. Tech. project, EE Dept., IIT Bombay, Jun. 2013
Real-time Enhancement of Noisy Speech Using Spectral Subtraction
Santosh K. Waddi (10307932)
Supervisor: Prof. P. C. Pandey
IIT Bombay, June 2013
Slide2
Overview
1. Introduction
2. Speech Enhancement Using Spectral Subtraction
3. Investigations Using Offline Processing
4. Implementation for Real-time Processing
5. Summary & Conclusions
Slide3
1. Introduction
Sensorineural hearing loss
- Increased hearing thresholds and high-frequency loss
- Decreased dynamic range & abnormal loudness growth
- Reduced speech perception due to increased spectral & temporal masking
→ Decreased speech intelligibility in noisy environments
Signal processing in hearing aids
- Frequency-selective amplification
- Automatic volume control
- Multichannel dynamic range compression (settable attack time, release time, and compression ratios)
Processing for reducing the effect of increased spectral masking in sensorineural loss
- Binaural dichotic presentation (Lunner et al. 1993, Kulkarni et al. 2012)
- Spectral contrast enhancement (Yang et al. 2003)
- Multiband frequency compression (Arai et al. 2004, Kulkarni et al. 2012)
Slide4
Techniques for reducing the background noise
- Directional microphone
- Adaptive filtering (a second microphone needed for noise reference)
- Single-channel noise suppression using spectral subtraction (Boll 1979, Berouti et al. 1979, Martin 1994, Kamath & Loizou 2002, Loizou 2007, Lu & Loizou 2008, Paliwal et al. 2010)
Processing steps
Dynamic estimation of non-stationary noise spectrum
- During non-speech segments using voice activity detection
- Continuously using statistical techniques
Estimation of noise-free speech spectrum
- Spectral noise subtraction
- Multiplication by noise suppression function
Speech resynthesis (using enhanced magnitude and noisy phase)
Slide5
Research objective
Real-time single-input speech enhancement for use in hearing aids and other sensory aids (cochlear prostheses, etc.) for hearing-impaired listeners and in communication devices
Main challenges
- Noise estimation without voice activity detection, to avoid errors under low SNR & during long speech segments
- Low signal delay (algorithmic + computational) for real-time application
- Low computational complexity & memory requirement for implementation on a low-power processor
Slide6
2. Speech Enhancement Using Spectral Subtraction
- Dynamic estimation of non-stationary noise spectrum
- Estimation of noise-free speech spectrum
- Speech resynthesis
Slide7
Generalized spectral subtraction (Berouti et al. 1979)
Power subtraction
Windowed speech spectrum = $X_n(k)$
Estimated noise magnitude spectrum = $D_n(k)$
Estimated speech spectrum:
$Y_n(k) = \left[\, |X_n(k)|^2 - (D_n(k))^2 \,\right]^{0.5} e^{j \angle X_n(k)}$
Problems: residual noise due to under-subtraction; distortion in the form of musical noise & clipping due to over-subtraction.
Generalized form:
$|Y_n(k)| = \left[\, |X_n(k)|^{\gamma} - \alpha\, (D_n(k))^{\gamma} \,\right]^{1/\gamma}$, if $|X_n(k)| > (\alpha + \beta)^{1/\gamma} D_n(k)$
$|Y_n(k)| = \beta^{1/\gamma} D_n(k)$, otherwise
- $\gamma$ = exponent factor (2: power subtraction, 1: magnitude subtraction)
- $\alpha$ = over-subtraction factor (for limiting the effect of short-term variations in noise spectrum)
- $\beta$ = floor factor to mask the musical noise due to over-subtraction
Re-synthesis with noisy phase without explicit phase calculation:
$Y_n(k) = |Y_n(k)|\, X_n(k) / |X_n(k)|$
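The subtraction rule and the phase-free resynthesis on this slide can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the project's fixed-point implementation: the function name, the default parameter values, and the small constant guarding against division by zero are assumptions made here.

```python
import numpy as np

def generalized_spectral_subtraction(X, D, alpha=2.0, beta=0.01, gamma=1.0):
    """Apply the generalized subtraction rule to one frame.

    X: complex spectrum of the windowed noisy frame, X_n(k)
    D: estimated noise magnitude spectrum, D_n(k)
    alpha, beta, gamma: over-subtraction, floor, and exponent factors
    """
    X = np.asarray(X, dtype=complex)
    D = np.asarray(D, dtype=float)
    mag = np.abs(X)
    # Bins above this threshold get the subtracted magnitude, others the floor.
    threshold = (alpha + beta) ** (1.0 / gamma) * D
    floor = beta ** (1.0 / gamma) * D
    # Clamp at zero so the fractional power is never taken of a negative value
    # (those bins are replaced by the floor anyway).
    diff = np.maximum(mag ** gamma - alpha * D ** gamma, 0.0)
    Y_mag = np.where(mag > threshold, diff ** (1.0 / gamma), floor)
    # Noisy phase without explicit phase calculation: X / |X|.
    # The 1e-12 guard for silent bins is an assumption of this sketch.
    return Y_mag * X / np.maximum(mag, 1e-12)

# Example: three bins, unit noise estimate, magnitude subtraction (gamma = 1).
Y = generalized_spectral_subtraction([10 + 0j, 1 + 0j, 3j], [1.0, 1.0, 1.0])
```

With $\alpha = 2$ and $\beta = 0.01$, the first and third bins are reduced by twice the noise estimate, while the second bin falls below the threshold and is set to the spectral floor.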
Slide8
Multi-band spectral subtraction (Kamath & Loizou 2002)
- Noise does not affect the spectrum uniformly
- Speech spectrum divided into B non-overlapping bands; spectral subtraction performed independently in each band
- Test material: 10 sentences from HINT database; noise: speech-shaped noise; noisy speech: 0 dB and 5 dB SNR
- Evaluation: Itakura-Saito (IS) distance as an objective measure
- Improvement over conventional power spectral subtraction, with very little trace of musical noise
Geometric approach to spectral subtraction (Lu & Loizou 2008)
- Does not assume the cross-terms to be zero
- Test material: NOIZEUS database; noise: babble, street, car, white; noisy speech: 0 dB, 5 dB and 15 dB SNR
- Evaluation: mean square error (MSE), PESQ, log-likelihood ratio
- Cross-terms can be ignored at very low and high SNRs but not near 0 dB
- Processed output: no audible musical noise; smooth and pleasant residual noise
- Performed significantly better than power spectral subtraction in all conditions
Slide9
Noise estimation
Minimal-tracking algorithms
- Minimum statistics (Martin 1994): tracks the noise as minima of past frames
- Minimum tracking (Doblinger 1995): smooths the noisy speech power spectra in each frequency bin using non-linear smoothing
Time-recursive averaging algorithms
- SNR-dependent recursive averaging (Lin 2003): noisy speech decomposed into sub-band signals; noisy signal power is smoothed and noise estimated adaptively; smoothing parameter: function of estimated SNR
- Weighted spectral averaging (Hirsch & Ehrlicher 1995): first-order recursive weighted average, over 400 ms, of past spectral magnitude values that are below an adaptive threshold
Slide10
Improved minima-controlled recursive averaging (Cohen 2003)
- Two iterations of smoothing and tracking
- First iteration: rough voice activity detection is provided in each frequency band
- Second iteration: smoothing excludes strong speech components, making the minimum tracking robust during speech activity
- Smoothing parameter: frequency-dependent & dynamically adjusted by signal presence probability
- Lower estimation error than minimum statistics
- When combined with a log-spectral amplitude estimator: higher segmental SNR improvement than minimum statistics
Histogram-based technique (Hirsch & Ehrlicher 1995)
- Histogram: noisy speech over 400 ms
- Noise estimate: maximum of the distribution in each sub-band
- To avoid spikes: estimated values smoothed along the time axis
- Objective evaluation: relative error
- Relative error is low compared to the weighted spectral average method (Hirsch & Ehrlicher 1995)
Slide11
Cascaded-median based estimation (Basha & Pandey 2012)
- Moving median approximated by a p-point q-stage cascaded median, with a saving in memory & computation for real-time implementation
Quantile-based noise estimation (Stahl 2000)
- Speech signal energy: low in most of the frames, high in only 10 – 20 % of the frames
- Noise estimation: selecting a certain quantile value from previous frames of the noisy speech spectrum
- Quantile selection can be frequency-dependent and SNR-dependent
- Median-based noise estimation works well and is robust
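The quantile idea can be illustrated with a small NumPy sketch. It assumes a buffer of past magnitude spectra stored as a 2-D array (frames × frequency bins); the function name and the synthetic data are hypothetical, and a fixed quantile of 0.5 (the median) is used rather than a frequency- or SNR-dependent one.

```python
import numpy as np

def quantile_noise_estimate(mag_frames, quantile=0.5):
    """Per-bin noise estimate as a quantile over past frames.

    mag_frames: array of shape (num_frames, num_bins) holding the
    magnitude spectra of the most recent noisy-speech frames.
    quantile=0.5 gives the median-based estimate.
    """
    mag_frames = np.asarray(mag_frames, dtype=float)
    return np.quantile(mag_frames, quantile, axis=0)

# Synthetic example: noise magnitude 1.0 in every frame, with "speech"
# raising the magnitude to 10.0 in every fifth frame (about 20 % of frames).
frames = np.ones((81, 4))
frames[::5] = 10.0
noise = quantile_noise_estimate(frames)
```

Because the high-energy frames are a minority, the median stays at the noise level in each bin, which is the property that lets this approach work without a voice activity detector.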
Slide12
MBNE vs CMBNE comparison

Median                          Storage per freq. bin    No. of sortings per frame per freq. bin
M-point                         2M                       (M – 1)/2
p-point q-stage (M = p^q)       p·q                      p(p – 1)/2

Project objective
Implementation of generalized spectral subtraction along with cascaded-median based noise estimation for real-time processing using a low-power DSP
- Selection of optimal set of processing steps and parameters, using offline processing
- Implementation on a DSP board with a 16-bit fixed-point processor & evaluation
Condition for reducing sorting operations and storage: low p, q ≈ ln(M)
p = 3 → code simplification for sorting operations
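A p-point q-stage cascaded median can be sketched as follows. This is one plausible reading of the scheme, assumed here for illustration: each stage buffers p values, and once a stage is full its median is passed to the next stage, so the last stage produces an estimate once every p^q frames. The class name, its interface, and the update schedule are assumptions; the implementation in Basha & Pandey 2012 may differ in detail.

```python
import numpy as np

class CascadedMedian:
    """p-point q-stage cascaded median over successive spectral frames."""

    def __init__(self, p=3, q=4, n_bins=1):
        self.p, self.q = p, q
        # Storage: p values per stage, q stages, for every frequency bin.
        self.buf = np.zeros((q, p, n_bins))
        self.count = [0] * q      # number of values held in each stage
        self.estimate = None      # latest output of the final stage

    def update(self, frame):
        """Feed one magnitude frame; return the latest estimate (or None)."""
        value = np.asarray(frame, dtype=float)
        for stage in range(self.q):
            self.buf[stage, self.count[stage]] = value
            self.count[stage] += 1
            if self.count[stage] < self.p:
                return self.estimate          # stage not full yet
            self.count[stage] = 0
            # Stage is full: its p-point median feeds the next stage.
            value = np.median(self.buf[stage], axis=0)
        self.estimate = value                 # final stage produced an output
        return self.estimate

# Example: 81 increasing frames (0, 1, ..., 80) in a single bin.
cm = CascadedMedian(p=3, q=4, n_bins=1)
for i in range(81):
    est = cm.update([i])
```

For this monotonic input the cascaded result coincides with the true 81-point median; for general input it is only an approximation, which the slides evaluate through PESQ scores.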
Slide13
3. Investigations Using Offline Processing
Test material
- Speech material 1: Recording with three isolated vowels, a Hindi sentence, an English sentence (-/a/-/i/-/u/– “aayiye aap kaa naam kyaa hai?” – “Where were you a year ago?”) from a male speaker. Referred to as "VHSES"
- Speech material 2: Six sentences from NOIZEUS database of one male speaker
- Noise: white, pink, street, babble, car, and train noises
- SNR: ∞, 18, 15, 12, 9, 6, 3, 0, -3, -6 dB
Evaluation methods
- Informal listening
- Objective evaluation using PESQ measure (0 – 4.5)
Investigations (f_s = 10 kHz)
- Overlap of 50% & 75%: indistinguishable outputs
- γ = 1 (magnitude subtraction): higher tolerance to variation in α, β values
Slide14
Investigation on noise estimation
[Figure: Scatter plots for magnitude spectra. Speech material: VHSES. Panels: (a) Clean speech signal, (b) White noise, (c) Noisy speech: white noise, 3 dB SNR, (d) Noisy speech: white noise, 0 dB SNR]
Slide15
[Figure: Scatter plots for magnitude spectra. Speech material: NOIZEUS. Panels: (a) Clean speech signal, (b) White noise, (c) Noisy speech: white noise, 3 dB SNR, (d) Noisy speech: white noise, 0 dB SNR]
Slide16
[Figure: Mean, median and minimum of magnitude spectra of clean speech signal, noise and noisy speech (white, SNR: 0 dB). Speech material: VHSES. Panels: (a) Mean, (b) Median, (c) Minimum]
The noisy-signal median tracks the noise median, and the noisy-signal minimum tracks the noise minimum, at almost all frequencies
Slide17
[Figure: Mean, median and minimum of magnitude spectra of clean speech signal, noise and noisy speech (white, SNR: 0 dB). Speech material: NOIZEUS. Panels: (a) Mean, (b) Median, (c) Minimum]
Slide18
Objective evaluation of the accuracy of noise estimation
[Figure: Relative RMS error (dB). Panels: (a) Speech material: VHSES, (b) Speech material: NOIZEUS]
Relative RMS error (dB) decreases as SNR decreases
Slide19
Effect of window length and noise estimation duration
- Processing: Magnitude spectral subtraction with median-based noise estimation
- High PESQ score: noise estimation across 81 past frames & 20 – 40 ms window length
- 30 ms window length was chosen (approximately 1.2 s duration)
[Figure panels: (a) Speech: VHSES, noise: white, SNR: 0 dB, (b) Speech: NOIZEUS, noise: white, SNR: 0 dB]
Slide20
Comparison of enhanced speech using MBNE and CMBNE
- MBNE requires large memory & is computation intensive
- 3-point 4-stage cascaded median significantly reduces memory requirement & computations
- Reduction in storage requirement per freq. bin: from 162 to 12 samples
- Reduction in number of sorting operations per frame per freq. bin: from 40 to 3
- Informal listening: perceptually the same
- Objective evaluation: almost the same in most cases, with a maximum difference of 0.06

Noise type    Unproc.    Proc. using MBNE    Proc. using CMBNE
white         1.55       1.84                1.84
babble        1.75       1.80                1.81
street        1.83       2.08                2.04
pink          1.60       2.00                1.98
train         2.05       2.40                2.35
car           1.72       1.95                1.89

PESQ score of the enhanced speech. Speech: NOIZEUS, SNR: 0 dB
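The storage and sorting figures quoted on this slide follow from the per-bin cost expressions of the MBNE vs CMBNE comparison (2M and (M – 1)/2 for an M-point median; p·q and p(p – 1)/2 for a p-point q-stage cascaded median). A short check, with hypothetical function names:

```python
def median_costs(M):
    """Per-bin costs of an M-point moving median (MBNE)."""
    return {"storage": 2 * M, "sortings_per_frame": (M - 1) // 2}

def cascaded_median_costs(p, q):
    """Per-bin costs of a p-point q-stage cascaded median (CMBNE)."""
    return {"storage": p * q, "sortings_per_frame": p * (p - 1) // 2}

# 81 past frames correspond to M = 3**4, i.e. p = 3, q = 4.
mbne = median_costs(3 ** 4)
cmbne = cascaded_median_costs(3, 4)
```

Here `mbne` evaluates to 162 samples and 40 sortings, and `cmbne` to 12 samples and 3 sortings, matching the reductions listed above.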
Slide21
Effect of spectral subtraction parameters
- Processing: Magnitude spectral subtraction using 3-point 4-stage CMBNE
- Analysis-synthesis: 30 ms window length & 50% overlap
- Spectral floor factor β: 0.01 appropriate for all the cases
- Subtraction factor α: in 2 – 2.5 for VHSES and in 1.2 – 1.4 for NOIZEUS speech material
Phase estimation for spectral subtraction
- Processing: Magnitude spectral subtraction using 3-point 4-stage CMBNE
- Phase: zero phase, cepstrum-based (Rabiner & Schafer 1978), Quatieri & Oppenheim 1981, Nawab et al. 1983
- Analysis-synthesis: 50% overlap rect. win., 75% overlap rect. win., Griffin-Lim method (Griffin & Lim 1984)
- Informal listening: no improvement from using estimated phase over using the noisy phase
- Objective evaluation: signal estimated using noisy phase has higher PESQ score
Slide22
Comparison of proposed method with other methods
- Proposed method: Magnitude spectral subtraction with cascaded-median based noise estimation. Analysis-synthesis with 30 ms window length and 50% overlap
- Comparison: spectral-subtractive, statistical-model based, and subspace algorithms (implementations available on CD accompanying Loizou 2007: specsub, mband, ga, wiener_iter, wiener_as, wiener_wt, mt_mask, audnoise, mmse, logmmse, logmmse_spu, stsa_weuclid, stsa_wcosh, stsa_mis, klt, pklt)

Enhancement method    white    babble    street    pink    train    car
unproc.               1.54     1.73      1.78      1.59    2.00     1.67
specsub               1.78     1.74      1.85      2.01    2.33     1.91
mband                 1.43     1.88      2.06      1.72    2.61     2.09
ga                    1.82     2.05      2.42      2.11    2.67     2.26
mmse                  1.95     1.86      1.99      2.22    2.69     2.24
klt                   2.51     1.89      1.84      2.27    2.32     1.91
Proposed method       2.14     1.93      2.16      2.31    2.63     2.15

Comparison of PESQ scores for VHSES speech material, 0 dB SNR (columns: noise type)
Slide23
Comparison of PESQ scores for NOIZEUS speech material, 0 dB SNR (columns: noise type)

Enhancement method    white    babble    street    pink    train    car
unproc.               1.55     1.75      1.83      1.60    2.05     1.72
specsub               1.63     1.58      1.81      1.78    2.04     1.77
mband                 1.59     1.84      2.02      1.73    2.30     1.91
ga                    1.61     1.82      2.07      1.86    2.35     1.96
mmse                  1.82     1.78      2.02      2.01    2.43     1.99
klt                   1.89     1.71      1.87      1.97    2.07     1.78
Proposed method       1.83     1.81      2.03      1.95    2.33     1.90

Observation: PESQ scores of the proposed method are comparable to the best ones
Slide24
Discussion
- FFT length N = 512 & higher: indistinguishable outputs
- Processing: Magnitude spectral subtraction with 3-point 4-stage CMBNE, analysis-synthesis with 30 ms window length & 50% overlap
- Informal listening: significant enhancement for all noises at different SNRs
- Spectral subtraction parameters: β = 0.01 appropriate for all the cases, α in 2 – 2.5 for VHSES and in 1.2 – 1.4 for NOIZEUS speech material
- SNR advantage: 4 – 13 dB for VHSES & 2 – 7 dB for NOIZEUS speech material

              Material: VHSES               Material: NOIZEUS
Noise type    SNR advantage    Optimal α    SNR advantage    Optimal α
white         13 dB            2.0          7 dB             1.4
babble        4 dB             2.0          2 dB             1.2
street        5 dB             2.5          4 dB             1.4
pink          11 dB            2.0          7 dB             1.4
train         7 dB             2.0          5 dB             1.4
car           6 dB             2.0          5 dB             1.4
Slide25
4. Implementation for Real-time Processing
16-bit fixed-point DSP: TI/TMS320C5515
- 16 MB memory space: 320 KB on-chip RAM with 64 KB dual-access RAM, 128 KB on-chip ROM
- Three 32-bit programmable timers, 4 DMA controllers each with 4 channels
- FFT hardware accelerator (8 to 1024-point FFT)
- Max. clock speed: 120 MHz
DSP board: eZdsp
- 4 MB on-board NOR flash for user program
- Codec TLV320AIC3204: stereo ADC & DAC, 16/20/24/32-bit quantization, 8 – 192 kHz sampling
Development environment for C: TI's 'CCStudio, ver. 4.0'
Slide26
Implementation
- One codec channel (ADC and DAC) with 16-bit quantization
- Sampling frequency: 10 kHz
- Window length of 30 ms (L = 300) with 50% overlap, FFT length N = 512
- Storage of input samples, spectral values, processed samples: 16-bit real & 16-bit imaginary parts
Slide27
Data transfers and buffering operations (S = L/2)
DMA cyclic buffers (each block with S samples)
- 3-block input buffer
- 2-block output buffer
Pointers (incremented cyclically on DMA interrupt)
- current input block
- just-filled input block
- current output block
- write-to output block
Signal delay
- Algorithmic: 1 frame (30 ms)
- Computational ≤ frame shift (15 ms)
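The analysis-synthesis scheme behind these buffers (L = 300 samples, shift S = L/2, N = 512 at 10 kHz) can be sketched offline as a windowed overlap-add loop. This is a floating-point illustration only, not the DMA-driven fixed-point code: the periodic Hann window is an assumption made here so that 50%-overlapped windows sum to one, and `modify` is a hypothetical hook standing in for the per-frame spectral processing.

```python
import numpy as np

FS = 10_000          # sampling frequency (Hz)
L = 300              # window length: 30 ms at 10 kHz
S = L // 2           # frame shift: 50% overlap (15 ms)
N = 512              # FFT length

def process_stream(x, modify=lambda spec: spec):
    """Overlap-add analysis-synthesis with an optional spectral modifier."""
    x = np.asarray(x, dtype=float)
    # Periodic Hann window (assumed): shifted copies at hop S add up to 1.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(L) / L)
    y = np.zeros(len(x))
    for start in range(0, len(x) - L + 1, S):
        frame = x[start:start + L] * window
        spec = np.fft.rfft(frame, N)              # zero-padded to N points
        out = np.fft.irfft(modify(spec), N)[:L]   # keep the L valid samples
        y[start:start + L] += out                 # overlap-add
    return y

# With no spectral modification the interior of the signal is reconstructed.
rng = np.random.default_rng(0)
x = rng.standard_normal(3000)
y = process_stream(x)
algorithmic_delay_ms = 1000 * L / FS   # one frame
frame_shift_ms = 1000 * S / FS         # upper bound on computational delay
```

In a real-time version each new block of S samples completes one frame, which is why the algorithmic delay is one frame (30 ms) and the computation must finish within one frame shift (15 ms).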
Slide28
Results
[Figure: PESQ score vs SNR for noisy and enhanced speech using offline and real-time processing. Panels: (a) Speech: VHSES, (b) Speech: NOIZEUS]
- Offline proc. improvement: 0.57 – 0.80 for VHSES & 0.28 – 0.44 for NOIZEUS
- Real-time proc. improvement: 0.39 – 0.71 for VHSES & 0.22 – 0.32 for NOIZEUS
Slide29
Example of processing: -/a/-/i/-/u/– "aayiye aap kaa naam kyaa hai?" – "Where were you a year ago?", with white noise at 3 dB SNR
[Figure panels: (a) Clean speech, (b) Noisy speech, (c) Offline processed, (d) Real-time processed]
Slide30

Noise type    Unproc.    Offline proc.    Real-time proc.
white         1.54       2.14             2.10
babble        1.73       1.93             1.87
street        1.78       2.16             1.92
pink          1.59       2.31             2.20
train         2.00       2.63             2.45
car           1.67       2.15             2.09

Comparison of PESQ scores of enhanced speech between offline and real-time processing. Speech: VHSES, SNR: 0 dB
- Real-time processing tested using white, babble, car, pink, train noises: real-time processed output perceptually similar to the offline processed output
- Signal delay = 48 ms
- Lowest clock for satisfactory operation = 16.4 MHz → processing capacity used ≈ 1/7 of the capacity with highest clock (120 MHz)
Slide31
5. Summary & Conclusions
- Investigation & implementation of spectral subtraction for real-time operation: magnitude spectrum subtraction and resynthesis using noisy phase, along with cascaded-median based dynamic noise estimation for reducing computation and memory requirement
- Enhancement of speech with different types of additive stationary and non-stationary noise: SNR advantage of 4 – 13 dB for VHSES & 2 – 7 dB for NOIZEUS
- Implementation for real-time operation using 16-bit fixed-point processor TI/TMS320C5515: implementation with 10 kHz sampling using 1/7 of processing capacity, signal delay = 48 ms
Further work
- Frequency & a posteriori SNR-dependent subtraction & spectral floor factors
- Combination of speech enhancement technique with other processing techniques in the sensory aids
- Implementation using other processors
- Subjective evaluation of intelligibility and quality of enhanced speech
Slide32Thank You
Slide33
Abstract
Sensorineural loss is generally associated with increased spectral masking due to widened auditory filters, and listeners with this kind of hearing impairment often experience great difficulty when the speech is contaminated by noise. This thesis presents investigations for real-time enhancement of noisy speech using spectral subtraction for suppressing the external noise. Investigation using offline processing for enhancing the noisy speech with different types of noise and SNR values is carried out to select the optimal set of steps and parameters for real-time processing. PESQ score is used for objective comparison of quality of the enhanced speech. Results show that median-based noise estimation is effective in estimating noise from noisy speech without a voice activity detector, for a wide variety of stationary and non-stationary noises and range of SNR values, and that a cascaded median can be used as an approximation to the median for significantly reducing the computation and memory requirement, without adversely affecting the noise estimation. Speech enhancement using magnitude spectrum subtraction with 3-point 4-stage cascaded median for noise estimation and resynthesis using noisy phase resulted in improvements in PESQ scores in the range 0.28 – 0.44 for speech material from NOIZEUS database with added white noise. Resynthesis using phase estimated from the enhanced magnitude spectrum did not result in any further improvement in the scores. The processing technique is implemented and tested for satisfactory operation, with sampling frequency of 10 kHz, 30 ms analysis window with 50% overlap, using a DSP board based on 16-bit fixed-point DSP processor TMS320C5515 with on-chip FFT hardware. The implementation uses data transfer and buffering operations devised for an efficient realization of analysis-synthesis, and codec and DMA for acquisition of the input signal and outputting of the processed output signal. The real-time operation is achieved with signal delay of approximately 48 ms and using about one-seventh of the computing capacity of the processor.
Slide34
References
[1] H. Levitt, J. M. Pickett, and R. A. Houde, Eds., Sensory Aids for the Hearing Impaired. New York: IEEE Press, 1980, pp. 3–10.
[2] J. M. Pickett, The Acoustics of Speech Communication: Fundamentals, Speech Perception Theory, and Technology. Boston, Mass.: Allyn Bacon, 1999, pp. 289–323.
[3] H. Dillon, Hearing Aids. New York: Thieme Medical, 2001.
[4] B. C. J. Moore, An Introduction to the Psychology of Hearing. London, UK: Academic, 1997, pp. 66–107.
[5] T. Lunner, S. Arlinger, and J. Hellgren, “8-channel digital filter bank for hearing aid use: preliminary results in monaural, diotic, and dichotic modes,” Scand. Audiol. Suppl., vol. 38, pp. 75–81, 1993.
[6] P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, “Binaural dichotic presentation to reduce the effects of spectral masking in moderate bilateral sensorineural hearing loss,” Int. J. Audiol., vol. 51, no. 4, pp. 334–344, 2012.
[7] J. Yang, F. Luo, and A. Nehorai, “Spectral contrast enhancement: Algorithms and comparisons,” Speech Commun., vol. 39, no. 1–2, pp. 33–46, 2003.
[8] T. Arai, K. Yasu, and N. Hodoshima, “Effective speech processing for various impaired listeners,” in Proc. 18th Int. Cong. Acoust. (ICA 2004), Kyoto, Japan, 2004, pp. 1389–1392.
[9] P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, “Multi-band frequency compression for improving speech perception by listeners with moderate sensorineural hearing loss,” Speech Commun., vol. 54, no. 3, pp. 341–350, 2012.
[10] P. C. Loizou, Speech Enhancement: Theory and Practice. New York: CRC, 2007.
[11] S. F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Trans. Acoust., Speech, Signal Process., vol. 27, no. 2, pp. 113–120, 1979.
[12] M. Berouti, R. Schwartz, and J. Makhoul, “Enhancement of speech corrupted by acoustic noise,” in Proc. IEEE ICASSP, 1979, Washington, DC, pp. 208–211.
[13] S. Kamath and P. Loizou, “A multi-band spectral subtraction method for enhancing speech corrupted by colored noise,” in Proc. IEEE ICASSP, 2002, Orlando, Florida, vol. 4, pp. IV–4164.
[14] Y. Lu and P. C. Loizou, “A geometric approach to spectral subtraction,” Speech Commun., vol. 50, no. 6, pp. 453–466, 2008.
Slide35
[15] K. Paliwal, K. Wojcicki, and B. Schwerin, “Single-channel speech enhancement using spectral subtraction in the short-time modulation domain,” Speech Commun., vol. 52, no. 5, pp. 450–475, 2010.
[16] R. Martin, “Spectral subtraction based on minimum statistics,” in Proc. Eur. Signal Process. Conf., 1994, pp. 1182–1185.
[17] I. Cohen, “Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging,” IEEE Trans. Speech Audio Process., vol. 11, no. 5, pp. 466–475, 2003.
[18] H. Hirsch and C. Ehrlicher, “Noise estimation techniques for robust speech recognition,” in Proc. IEEE ICASSP, 1995, Detroit, MI, pp. 153–156.
[19] V. Stahl, A. Fisher, and R. Bipus, “Quantile based noise estimation for spectral subtraction and Wiener filtering,” in Proc. IEEE ICASSP, 2000, Istanbul, Turkey, pp. 1875–1878.
[20] G. Doblinger, “Computationally efficient speech enhancement by spectral minima tracking in subbands,” in Proc. 4th Eur. Conf. Speech Commun. and Technology (EUROSPEECH’95), Madrid, Spain, 1995, pp. 1513–1516.
[21] L. Lin, W. H. Holmes, and E. Ambikairajah, “Adaptive noise estimation algorithm for speech enhancement,” Electronics Letters, vol. 39, no. 9, pp. 754–755, 2003.
[22] C. Ris and S. Dupont, “Assessing local noise level estimation methods: application to noise robust ASR,” Speech Commun., vol. 34, no. 1–2, pp. 141–158, 2001.
[23] S. K. Basha and P. C. Pandey, “Real-time enhancement of electrolaryngeal speech by spectral subtraction,” in Proc. Nat. Conf. on Commun. 2012 (NCC 2012), Kharagpur, India, 2012, pp. 516–520.
[24] S. K. Waddi, P. C. Pandey, and N. Tiwari, “Speech enhancement using spectral subtraction and cascaded-median based noise estimation for hearing impaired listeners,” in Proc. Nat. Conf. Commun. (NCC 2013), Delhi, India, 2013, paper no. 1569696063.
[25] ITU, “Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs,” ITU-T Rec. P.862, 2001.
[26] Y. Hu and P. C. Loizou, “Subjective evaluation and comparison of speech enhancement algorithms,” Speech Commun., vol. 49, pp. 588–601, 2007.
[27] T. F. Quatieri and A. V. Oppenheim, “Iterative techniques for minimum phase signal reconstruction from phase or magnitude,” IEEE Trans. Acoust., Speech, Signal Process., vol. 29, no. 6, pp. 1187–1193, 1981.
[28] S. H. Nawab, T. F. Quatieri, and J. S. Lim, “Signal-reconstruction from short time Fourier transform magnitude,” IEEE Trans. Acoust., Speech, Signal Process., vol. 31, no. 4, pp. 986–998, 1983.
Slide36
[29] L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals. Englewood Cliffs, New Jersey: Prentice Hall, 1978, pp. 356–362.
[30] D. W. Griffin and J. S. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 2, pp. 236–243, 1984.
[31] Spectrum Digital, Inc. (2010) TMS320C5515 eZdsp USB Stick Technical Reference. [online]. Available: support.spectrumdigital.com/boards/usbstk5515/reva/files/usbstk5515_TechRef_RevA.pdf
[32] Texas Instruments, Inc. (2011) TMS320C5515 Fixed-Point Digital Signal Processor. [online]. Available: focus.ti.com/lit/ds/symlink/tms320c5515.pdf
[33] Texas Instruments, Inc. (2008) TLV320AIC3204 Ultra Low Power Stereo Audio Codec. [online]. Available: focus.ti.com/lit/ds/symlink/tlv320aic3204.pdf