Structured sparse acoustic modeling for speech separation

Presentation Transcript

1. Structured sparse acoustic modeling for speech separation
Afsaneh Asaei
Joint work with: Mohammad Golbabaee, Herve Bourlard, Volkan Cevher

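2. Speech separation problem
[Figure: sources s1 to s5 captured by microphones x1 and x2 through acoustic channels such as φ11, φ21, φ42, φ52]
With more sources than microphones, source separation is ill-posed; sparsity is essential to deal with it.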
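Why the problem is ill-posed can be seen with a toy example. The sketch below assumes an instantaneous (non-convolutive) mixture with random gains standing in for the acoustic channels, two microphones and five sources as in the figure; the names `A`, `s`, and `x` are illustrative only. Infinitely many source vectors explain the same recordings, so a prior such as sparsity is needed to pick one.

```python
import numpy as np

# Hypothetical instantaneous mixture: 2 microphones, 5 sources (as in the slide figure).
# Real acoustic channels are convolutive; random gains stand in for them here.
rng = np.random.default_rng(0)
n_mics, n_sources = 2, 5
A = rng.standard_normal((n_mics, n_sources))  # assumed mixing matrix

s = np.zeros(n_sources)
s[[1, 3]] = [1.0, -0.5]   # only two sources active: a sparse source vector
x = A @ s                 # microphone observations

# A has rank at most 2, so its null space has dimension >= 3:
# adding any null-space vector to s yields another exact solution.
null_dim = n_sources - np.linalg.matrix_rank(A)
_, _, vt = np.linalg.svd(A)
z = vt[-1]                # unit vector in the null space of A
s_alt = s + z
print(null_dim, np.allclose(A @ s_alt, x))

# The minimum-energy solution also fits x exactly, but is generally not sparse.
s_min = np.linalg.pinv(A) @ x
print(np.round(s_min, 3))
```

Here `s`, `s_alt`, and `s_min` all reproduce `x` exactly; only an additional model (sparsity) distinguishes the true sources.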

3. Listening results: http://www.idiap.ch/~aasaei/MONC-Demo.html

4. Key idea: incorporate the acoustic channel model for speech separation. Cast the speech separation problem as spatio-spectral information recovery from compressive acoustic measurements.
[Diagram: structured sparse speech representation, acoustic reverberation models, and a microphone array combine into structured sparse acoustic modeling for speech separation]

5. Spectral sparsity
[Figure: spectrographic speech of Source 1, Source 2, and Source 3, and their overlapping speech; N sources, M sensors; annotated "< M source"]

6. Spectral sparsity: the information-bearing components of speech are compressible, which enables high-accuracy speech recognition.
[Figures: original spectrogram and auditory spectrogram, from “Hearing is Believing”, R. Stern and N. Morgan, IEEE Signal Processing Magazine, Nov. 2012]
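Compressibility can be quantified as the fraction of energy captured by the few largest spectral coefficients. The sketch below uses a synthetic harmonic frame (five harmonics of a 200 Hz fundamental plus weak noise) as a hypothetical stand-in for voiced speech; the helper `energy_in_top_k` and the 10% budget are illustrative choices, not taken from the slides.

```python
import numpy as np

def energy_in_top_k(coeffs, k):
    """Fraction of total energy captured by the k largest-magnitude coefficients."""
    energy = np.abs(coeffs) ** 2
    top = np.sort(energy)[::-1][:k]
    return top.sum() / energy.sum()

# Hypothetical "voiced" frame: harmonics of a 200 Hz fundamental plus weak noise.
fs, n = 16000, 512
t = np.arange(n) / fs
rng = np.random.default_rng(1)
frame = sum(np.sin(2 * np.pi * 200 * h * t) / h for h in range(1, 6))
frame = frame + 0.01 * rng.standard_normal(n)

# Windowed one-sided spectrum of the frame.
spectrum = np.fft.rfft(frame * np.hanning(n))
frac = energy_in_top_k(spectrum, k=len(spectrum) // 10)  # keep 10% of the bins
print(f"{frac:.3f} of the energy lies in 10% of the frequency bins")
```

A high fraction means the frame can be approximated well by a sparse set of spectral coefficients, which is the sense of "compressible" used on the slide.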

7. Spectral sparsity: overlapping spectrographic speech is approximately disjoint.
[Figures: histogram of the energy of the point-wise product of the spectrograms of two independent sources; diagonal Gram matrix]
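The disjointness measures on this slide can be sketched numerically. The example assumes randomly generated sparse magnitude spectrograms (each source active in about 5% of time-frequency bins) in place of real speech; the normalization of the overlap energy is an illustrative choice. When sources rarely share bins, the point-wise product carries little energy and the Gram matrix of the normalized spectrograms is close to diagonal.

```python
import numpy as np

rng = np.random.default_rng(2)
n_freq, n_frames, n_src = 64, 100, 3

# Hypothetical sparse magnitude spectrograms: each source is active in ~5% of
# time-frequency bins, drawn independently (a stand-in for real speech).
specs = [(rng.random((n_freq, n_frames)) < 0.05) * rng.random((n_freq, n_frames))
         for _ in range(n_src)]

# Energy of the point-wise product of two spectrograms, relative to their own energies
# (bounded by 1 via Cauchy-Schwarz; 0 means fully disjoint supports).
prod = specs[0] * specs[1]
rel_overlap = np.sum(prod ** 2) / np.sqrt(np.sum(specs[0] ** 4) * np.sum(specs[1] ** 4))

# Gram matrix of the unit-norm vectorized spectrograms.
V = np.stack([s.ravel() / np.linalg.norm(s) for s in specs], axis=1)
G = V.T @ V
print(round(rel_overlap, 3))
print(np.round(G, 3))  # close to the identity when sources rarely overlap
```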

8. Spatial sparsity: the planar area of the room is discretized into a grid of cells (X1 to X25, a 5 × 5 grid, in the figure); the locations of the sound sources are sparse on this grid.
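The grid discretization can be sketched as a map from source positions to a sparse indicator vector. The function `grid_indicator`, the row-by-row cell numbering from the origin corner, and the example positions are assumptions for illustration; the floor dimensions reuse the 8.2 m × 3.6 m room from the evaluation slide.

```python
import numpy as np

def grid_indicator(positions, room_xy, n_cells_xy):
    """Map source (x, y) positions to a sparse indicator vector over a planar grid.
    Cells are numbered row by row starting from the origin corner (assumed convention)."""
    nx, ny = n_cells_xy
    cell_w, cell_h = room_xy[0] / nx, room_xy[1] / ny
    z = np.zeros(nx * ny)
    for x, y in positions:
        ix = min(int(x // cell_w), nx - 1)  # column of the cell containing the source
        iy = min(int(y // cell_h), ny - 1)  # row of the cell containing the source
        z[iy * nx + ix] = 1.0
    return z

# Hypothetical example: a 5 x 5 grid over an 8.2 m x 3.6 m floor plan, two speakers.
z = grid_indicator([(1.0, 1.0), (6.0, 2.9)], room_xy=(8.2, 3.6), n_cells_xy=(5, 5))
print(np.flatnonzero(z), np.count_nonzero(z))
```

Two speakers occupy 2 of 25 cells, so the location vector is sparse; finer grids make it sparser still.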

9. Objective: spatio-spectral sparse representation of overlapping speech sources.
Goal: model the acoustic reverberant channel.
[Equation image lost; its annotations mark the number of microphones and the number of cells on the grid]

10. Image model and Green's function of sound propagation
[Equation image lost; annotated terms: multipath channel, reflection coefficient, speed of sound, sensor location, source location, number of reflections, and microphone array measurement matrix]
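A minimal sketch of these ingredients, assuming the standard frequency-domain free-space Green's function exp(-j2πfd/c)/(4πd) and an image model in which each virtual source contributes a Green's function scaled by its reflection coefficient. The speed of sound (343 m/s), the function names, and the example geometry are assumptions, not values from the slides.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed room-temperature value)

def green_free_space(freq, src, mic, c=C):
    """Free-space Green's function between a point source and a sensor at one frequency."""
    d = np.linalg.norm(np.asarray(src, dtype=float) - np.asarray(mic, dtype=float))
    return np.exp(-2j * np.pi * freq * d / c) / (4 * np.pi * d)

def multipath_channel(freq, sources, coeffs, mic, c=C):
    """Image-model channel: Green's functions of the actual source and its virtual images,
    each scaled by its reflection coefficient (1.0 for the direct path)."""
    return sum(a * green_free_space(freq, s, mic, c) for s, a in zip(sources, coeffs))

def measurement_matrix(freq, mics, grid_points, c=C):
    """M x G matrix of free-space Green's functions from every grid cell to every microphone."""
    return np.array([[green_free_space(freq, g, m, c) for g in grid_points] for m in mics])

# Hypothetical geometry: two microphones, three candidate grid cells, one image source.
mics = [(4.0, 1.8), (4.1, 1.8)]
grid = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]
Phi = measurement_matrix(1000.0, mics, grid)
h = multipath_channel(1000.0, [(1.0, 1.0), (-1.0, 1.0)], [1.0, 0.7], mics[0])
print(Phi.shape, abs(h))
```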

11. Structured sparsity underlying multipath propagation
[Figure panels: spatial sparsity of the actual sources; structured sparsity of the actual and virtual sources; reverberant acoustic image map]
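The actual/virtual source structure can be sketched with the first-order images of the Allen and Berkley image method. This simplified version assumes a 2-D rectangular room with walls at x = 0, x = Lx, y = 0, y = Ly and only first-order reflections; the reflection coefficient 0.7 and the helper name are illustrative. The actual source plus its images form the structured support pattern shown in the image map.

```python
import numpy as np

def first_order_images(src, room_xy, beta):
    """First-order image (virtual) sources of a point source in a rectangular 2-D room
    with walls at x=0, x=Lx, y=0, y=Ly. beta is one reflection coefficient for all
    walls, or four values ordered (x=0, x=Lx, y=0, y=Ly)."""
    x, y = src
    lx, ly = room_xy
    images = [(-x, y), (2 * lx - x, y), (x, -y), (x, 2 * ly - y)]
    coeffs = np.full(4, beta, dtype=float) if np.isscalar(beta) else np.asarray(beta, dtype=float)
    return np.array(images), coeffs

src = (1.0, 1.0)
imgs, coeffs = first_order_images(src, room_xy=(8.2, 3.6), beta=0.7)

# Actual + virtual sources with their coefficients (1.0 for the actual source).
all_sources = np.vstack([src, imgs])
all_coeffs = np.concatenate([[1.0], coeffs])
print(all_sources)
```

Higher-order images follow by reflecting the images again; those steps are omitted here to keep the sketch short.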

12. New factorized formulation of multipath acquisition: $X = \Phi P S$
$\Phi$: free-space Green's function matrix
$P$: permutation map from actual sources to actual/virtual sources, built from the image map of the ith source
$S$: source matrix, the spatio-spectral content of frames at a given frequency
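The factorized formulation can be sketched with matrix dimensions, reading the slide's garbled equation as X = Φ P S. Everything numeric below is hypothetical: Φ is a random complex stand-in for the Green's function matrix, and each column of P is assumed to hold 1 at the actual source cell and reflection coefficients at a few image cells.

```python
import numpy as np

rng = np.random.default_rng(3)
M, G, N, T = 8, 25, 2, 50  # mics, grid cells, sources, frames (hypothetical sizes)

# Phi: M x G free-space Green's function matrix; random complex stand-in here.
Phi = (rng.standard_normal((M, G)) + 1j * rng.standard_normal((M, G))) / np.sqrt(2)

# P: G x N map from actual sources to actual/virtual source cells (assumed structure:
# 1 at the actual source cell, reflection coefficients at hypothetical image cells).
P = np.zeros((G, N))
P[[6, 2, 11], 0] = [1.0, 0.6, 0.5]    # source 1 and two of its images
P[[18, 23, 13], 1] = [1.0, 0.7, 0.4]  # source 2 and two of its images

# S: N x T source matrix (content of T frames at one frequency).
S = rng.standard_normal((N, T)) + 1j * rng.standard_normal((N, T))

X = Phi @ P @ S  # M x T multichannel measurements
print(X.shape, np.count_nonzero(P))
```

The point of the factorization is that P is very sparse (6 nonzeros out of 50 here) and structured per source, even though Φ and X are dense.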

13. Measurement correlation
Structured sparsity underlies the correlation matrix.
Goal: estimation of [symbol lost in extraction], which enables source localization and estimation of the absorption coefficients.
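A sketch of the measurement correlation, assuming the sample estimate R = X X^H / T and mutually uncorrelated, unit-power sources (consistent with the near-diagonal Gram matrix of slide 7). Under that assumption R approaches A A^H, where the hypothetical matrix `A` stands in for the product of the Green's function matrix and the image map.

```python
import numpy as np

def sample_correlation(X):
    """Sample correlation matrix R = X X^H / T of M-channel measurements X (M x T)."""
    return X @ X.conj().T / X.shape[1]

rng = np.random.default_rng(4)
M, N, T = 8, 2, 20000
# A stands in for Phi @ P (hypothetical random values).
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
# Uncorrelated, unit-power complex sources.
S = (rng.standard_normal((N, T)) + 1j * rng.standard_normal((N, T))) / np.sqrt(2)

R = sample_correlation(A @ S)
target = A @ A.conj().T
rel_err = np.linalg.norm(R - target) / np.linalg.norm(target)
print(round(rel_err, 4))  # shrinks roughly like 1 / sqrt(T)
```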

14. Group sparse representation
By the Kronecker product property (Kronecker product, element-wise conjugate), the unknown has a group sparse representation in which only N groups (N = number of sources) contain nonzero elements. Identifying those groups determines the source locations; recovering the corresponding elements of [symbol lost in extraction] and normalizing them by the source energy determines the absorption coefficients.
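The Kronecker product property can be checked numerically. The sketch assumes column-stacking vectorization and a correlation of the form R = Φ Q Φ^H with Q = Σ_i e_i p_i p_i^H, where p_i is a hypothetical image map of source i and e_i its energy; then vec(R) = (conj(Φ) ⊗ Φ) vec(Q), and vec(Q) is a sum of per-source blocks e_i (conj(p_i) ⊗ p_i).

```python
import numpy as np

def vec(A):
    """Column-stacking vectorization."""
    return A.reshape(-1, order="F")

rng = np.random.default_rng(5)
M, G = 4, 9
Phi = rng.standard_normal((M, G)) + 1j * rng.standard_normal((M, G))

# Hypothetical image maps (disjoint supports) and source energies.
p1 = np.zeros(G); p1[[0, 4]] = [1.0, 0.5]  # actual source at cell 0, image at cell 4
p2 = np.zeros(G); p2[[7]] = [1.0]
energies = [2.0, 1.0]
Q = sum(e * np.outer(p, p.conj()) for e, p in zip(energies, (p1, p2)))

R = Phi @ Q @ Phi.conj().T
# Kronecker product property: vec(Phi Q Phi^H) = (conj(Phi) kron Phi) vec(Q).
lhs = vec(R)
rhs = np.kron(Phi.conj(), Phi) @ vec(Q)
print(np.allclose(lhs, rhs))

# vec(Q) = sum_i e_i * (conj(p_i) kron p_i): one block of nonzeros per source.
blocks = sum(e * np.kron(p.conj(), p) for e, p in zip(energies, (p1, p2)))
print(np.count_nonzero(vec(Q)))

# Normalizing by the source energy recovers the relative image coefficient (0.5).
print(Q[4, 0] / energies[0])
```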

15. Joint localization and absorption coefficient estimation via group sparse recovery
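The slides do not specify the recovery algorithm, so the following is only a generic sketch of group sparse recovery: group lasso solved by proximal gradient (ISTA) with group soft-thresholding. The measurement matrix, group sizes, active groups (2 and 7), and regularization weight are all hypothetical.

```python
import numpy as np

def group_soft_threshold(z, groups, lam):
    """Proximal operator of lam * sum_g ||z_g||_2: shrink each group as a block,
    zeroing it when its norm is at most lam."""
    out = np.zeros_like(z)
    for g in groups:
        norm = np.linalg.norm(z[g])
        if norm > lam:
            out[g] = (1.0 - lam / norm) * z[g]
    return out

def group_ista(A, y, groups, lam, n_iter=2000):
    """Minimize 0.5 * ||y - A z||^2 + lam * sum_g ||z_g||_2 by proximal gradient."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    z = np.zeros(A.shape[1], dtype=np.result_type(A, y))
    for _ in range(n_iter):
        grad = A.conj().T @ (A @ z - y)
        z = group_soft_threshold(z - step * grad, groups, lam * step)
    return z

rng = np.random.default_rng(6)
n_meas, n_groups, gsize = 30, 10, 4
groups = [list(range(k * gsize, (k + 1) * gsize)) for k in range(n_groups)]
A = rng.standard_normal((n_meas, n_groups * gsize)) / np.sqrt(n_meas)
z_true = np.zeros(n_groups * gsize)
for k in (2, 7):  # two active groups (hypothetical "source locations")
    z_true[groups[k]] = rng.standard_normal(gsize) + np.sign(rng.standard_normal(gsize))
y = A @ z_true

z_hat = group_ista(A, y, groups, lam=0.01)
group_norms = [np.linalg.norm(z_hat[g]) for g in groups]
active = sorted(np.argsort(group_norms)[-2:])
print(active)
```

Identifying the groups with the largest norms plays the role of localization; the values inside those groups are what would be normalized to obtain coefficients.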

16. Room impulse response

17. Numerical evaluations
Multichannel Overlapping Numbers Corpus (MONC): utterances from the Numbers corpus are played back and recorded by an 8-channel circular microphone array in a room of 8.2 m × 3.6 m × 2.4 m with a reverberation time of 300 ms.
The acoustic channel is inverse-filtered, followed by linear post-filtering to enhance the separated signals.
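As a rough sanity check on the room setup (not part of the method in the slides), Sabine's formula T60 = 0.161 V / (S α) relates the reverberation time to a mean absorption coefficient α, given room volume V and surface area S. The sketch below applies it to the stated dimensions and 300 ms reverberation time; Sabine's formula is a swapped-in textbook approximation that assumes a diffuse field.

```python
def sabine_mean_absorption(lx, ly, lz, t60, k=0.161):
    """Mean absorption coefficient implied by Sabine's formula T60 = k * V / (S * alpha)."""
    volume = lx * ly * lz                           # m^3
    surface = 2 * (lx * ly + lx * lz + ly * lz)     # m^2
    return k * volume / (surface * t60)

alpha = sabine_mean_absorption(8.2, 3.6, 2.4, t60=0.3)
print(round(alpha, 2))  # 0.33
```

This gives V ≈ 70.8 m³, S ≈ 115.7 m², and a mean absorption of about 0.33, an order-of-magnitude reference for the estimated absorption coefficients.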

18. Absorption coefficients

19. Word recognition rate

20. Perceptual quality

21. Concluding remarks
Characterization of the acoustic measurements in reverberant enclosures enables acoustic-aware source separation, with high quality and high recognition rate.
- Estimation of the reflections and attenuations for an unconstrained environment
- Reconstruction of the sound field using the plenacoustic function
- Calibration of the acoustic measurement model
- Non-uniform sampling of the acoustic field
- Extension to continuous sources
- Incorporation of signal-dependent models and low-rank structures
- Post-processing of the signal recovery residual error

22. References
J. B. Allen and D. A. Berkley, “Image method for efficiently simulating small-room acoustics,” Journal of the Acoustical Society of America, vol. 65, no. 4, pp. 943-950, 1979.
A. Asaei, M. Golbabaee, H. Bourlard, and V. Cevher, “Structured Sparsity Models for Multiparty Speech Recovery from Convolutive Recordings,” submitted to IEEE Transactions on Audio, Speech, and Language Processing (TASL), 2012.
I. Dokmanic, Y. M. Lu, and M. Vetterli, “Can one hear the shape of a room: The 2-D polygonal case,” ICASSP 2011.
A. Asaei, H. Bourlard, and V. Cevher, “Model-based compressive sensing for multi-party distant speech recognition,” in Intl. Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011.
“The Multichannel Overlapping Numbers Corpus,” Idiap resources, available online: http://www.cslu.ogi.edu/corpora/monc.pdf
Thank you!