IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 7, NO. 6, NOVEMBER 1999, p. 609

Robustness of Group-Delay-Based Method for Extraction of Significant Instants of Excitation from Speech Signals

P. Satyanarayana Murthy and B. Yegnanarayana, Senior Member, IEEE

Abstract: In this paper, we study the robustness of a group-delay-based method for determining the instants of significant excitation in speech signals. These instants correspond to the instants of glottal closure for voiced speech. The method uses the properties of the global phase characteristics of minimum phase […]

II. DETERMINATION OF INSTANTS OF EXCITATION

In this section, we briefly present the group-delay-based method proposed in [9] and [10] for determining the instants of significant excitation from speech signals, and propose some refinements to the method. The method is based on the global phase characteristics of minimum phase signals. Since the average group-delay of a minimum phase system is zero [11], the average slope of the phase spectrum of the impulse response of the system corresponds to the location of the excitation impulse within the analysis frame [9]. In practice, the computed phase spectrum or the group-delay function depends on the window function used for analysis. To reduce the effects of the window function on the estimated group-delay function, it is preferable to compute the group-delay function from the linear prediction (LP) residual signal. The residual signal is also preferable because some characteristics of the glottal source can be seen better in the residual error signal than in the speech signal. The average slope of the phase spectrum is the same for the speech signal and the residual signal, because the inverse filter of the LP analysis is a minimum phase system [12].

The residual signal is derived by inverse filtering the speech signal, and the inverse filter is obtained using LP analysis. For LP analysis, a frame size of about 25 ms for every 10 ms may be chosen [9], [10]. The instants of significant excitation can be derived from the LP residual signal as follows [10]. Around each sampling instant, a 10 ms segment of the LP residual signal is considered and the group-delay function is computed using the formula [13]

$$\tau(\omega) = \frac{X_R(\omega)\,Y_R(\omega) + X_I(\omega)\,Y_I(\omega)}{X_R^2(\omega) + X_I^2(\omega)} \tag{1}$$

where $X_R + jX_I$ and $Y_R + jY_I$ are the Fourier transforms of the windowed residual
$x(n)$ and $n\,x(n)$, respectively. The group-delay function is smoothed using a three-point median filter to remove any discontinuities. The negative of the average of the smoothed group-delay function is called the phase slope. The phase slope value is computed at each sampling instant to obtain the phase slope function. If the instant of significant excitation within a frame is at the midpoint of the frame, then the phase slope is zero. Therefore the positive zero-crossings of the phase slope function correspond to the instants of significant excitation. The short-time (1–3 ms) energy of the LP residual signal around the instant can be used to represent the strength of excitation associated with the instant [9], [10].

Fig. 1(a)–(d) show a segment of speech signal, the LP residual signal, the phase slope function, and the extracted instants with estimated strengths, respectively. The speech signal shown corresponds to the utterance /dzua/, where […] is a voiced palatal fricative as in […]. Sometimes the LP residual signal may contain spurious impulses which may result in wrong estimation of the instants of significant excitation, as can be seen in Fig. 1(d), where the strengths are computed using the short-time energy of the residual signal centered around the estimated instants of significant excitation. The effect of these spurious impulses can be reduced by enhancing the region around the instants of significant excitation relative to other regions in the LP residual signal. This can be accomplished by deriving a weight function for the LP residual signal.

Fig. 1. (a) Clean speech for the utterance /dzua/. (b) LP residual signal derived from the signal in (a). (c) Phase slope function. (d) Significant instants, weighted by their strengths, derived from the signal in (a). (e) Significant instants, derived from the signal in (a) using the proposed algorithm.

The weight function is derived here by smoothing the LP residual signal with a Hamming window of duration 0.75 ms (eight samples at 11 kHz sampling rate). This smoothing reduces the noise fluctuations in the residual signal. The short-time energy of the smoothed residual signal is computed at every sample using a frame size of 1.4 ms (15 samples at 11 kHz sampling rate). The short-time energy curve will have large amplitudes around the significant instants. The short-time energy is normalized to a maximum value of one and is used as a weight function for the residual signal to enhance the regions in the residual signal around the significant excitations. The weighted residual signal is used to derive the instants of significant excitation. The phase slope function is smoothed using a five-point Hamming window. Positive zero-crossings of the smoothed phase slope function are used as the instants of significant excitation. Fig. 1(e) shows the plot of the instants derived after these refinements. Some of the errors in the estimation of instants in Fig. 1(d) are corrected in Fig. 1(e). The different steps in the algorithm for the computation of the instants of significant excitation are given in Fig. 9.

III. MEASURE OF STRENGTH OF EXCITATION

Reliability of the extracted instants depends on the strength of excitation around the instants. In [9], [10] the short-time energy of the LP residual signal was used to represent the strength of excitation at each instant. In some cases it is difficult to use the short-time energy around the instant as a measure of the strength, especially when the residual signal is noisy, as in the region BC in Fig. 1(b). Moreover, the derived residual signal energy depends on the effectiveness of the LP analysis for these segments. We propose an alternative measure for the strength of excitation, which is based on the use of the Frobenius norm. In [8] the Frobenius norm of a signal prediction matrix, formed by using the samples in a frame of about 3 ms, was proposed to locate the instants of glottal closure. The Frobenius norm was computed at each sampling instant. The locations of the peaks in the plot of the Frobenius norm as a function of time were considered as the desired instants. In this section we propose that the Frobenius norm [14] of the signal prediction matrix [8], formed by using the samples in a 3-ms frame of differenced speech centered around the identified instant of significant excitation, can be used to represent the strength of excitation at that instant.

Consider a frame of the differenced speech signal with $M$ samples, […]. Assuming a linear prediction order $p$, the following prediction error vector can be formed:

$$\mathbf{e} = \mathbf{S}\,\mathbf{a} \tag{2}$$

where
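$\mathbf{S}$ is the Toeplitz signal prediction matrix of dimension […], given by (3) […], and $\mathbf{a} = [1, a_1, \ldots, a_p]^T$ is the augmented vector of LPC's. Assuming $s(n)$ are the samples of a signal at the output of an all-pole system excited by a periodic impulse train, there is a linear dependence between the column vectors of $\mathbf{S}$ when the instant of excitation is not included in the analysis frame [8]. The error vector is then zero. But when the instant of excitation is included, the norm of the error vector goes up. The amplitudes of signal samples in the signal prediction matrix also go up, because of the excitation. Thus, the Frobenius norm of the signal prediction matrix, computed as the square root of the sum of all squared elements of the matrix, also goes up. The square of the Euclidean norm of $\mathbf{e}$, which is a measure of the energy (strength) of excitation, is given by

$$\|\mathbf{e}\|_2^2 = \frac{\|\mathbf{e}\|_2^2}{\|\mathbf{S}\|_F^2}\,\|\mathbf{S}\|_F^2 \tag{4}$$

where $\|\mathbf{S}\|_F$ is the Frobenius norm of $\mathbf{S}$. The ratio $\|\mathbf{e}\|_2^2 / \|\mathbf{S}\|_F^2$ is upper bounded by $\|\mathbf{a}\|_2^2$. Ignoring the variation in this ratio compared to $\|\mathbf{S}\|_F^2$, we can use $\|\mathbf{S}\|_F$ as a measure of the strength of excitation. Computing the Euclidean norm of $\mathbf{e}$ from (2), we get

$$\|\mathbf{e}\|_2^2 = \mathbf{a}^T \mathbf{S}^T \mathbf{S}\,\mathbf{a} \tag{5}$$

$$\|\mathbf{e}\|_2^2 = R(\mathbf{a})\,\|\mathbf{a}\|_2^2 \tag{6}$$

where $R(\mathbf{a})$ is the Rayleigh quotient of $\mathbf{S}^T\mathbf{S}$ [14]. It is shown in Appendix A [see (A.8)] that

$$\sigma_{\min}^2 \le R(\mathbf{a}) \le \sigma_{\max}^2 \tag{7}$$

where $\sigma_i$ are the singular values of $\mathbf{S}$, and $\sigma_i^2$ are also the eigenvalues of $\mathbf{S}^T\mathbf{S}$. It is also known that the square of the Frobenius norm is the sum of squared singular values [15]. So we have the inequality

$$\sigma_{\min}^2 \le \frac{\|\mathbf{S}\|_F^2}{p+1} \le \sigma_{\max}^2 \tag{8}$$

since $\|\mathbf{S}\|_F^2/(p+1)$ is the arithmetic mean of the squared singular values. It is known that all the singular values rise in magnitude when there is an excitation within the analysis frame and fall when there is no excitation [8]. Therefore, both $R(\mathbf{a})$ in (7) and $\|\mathbf{S}\|_F^2/(p+1)$ in (8) track these changes. Therefore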
$\|\mathbf{S}\|_F$ can be used as a measure of the strength of excitation. We note that though this is a measure of energy of the residual signal, it is computed directly from the speech signal. It is to be noted that since the square of the Frobenius norm of the signal prediction matrix is the sum of squares of all samples in the matrix, it is nothing but the short-time energy of the speech signal computed from the weighted samples of the speech signal.

To illustrate the need for a measure for the strength of excitation, let us consider the differentiated glottal pulses [Fig. 2(a)] generated using the LF model [16]. All the parameters of the model are kept constant except the time constant of the return phase and the instant of peak positive excitation. To vary the rate of closure, the time constant of the return phase is increased from 0.05 to 1.5 ms from left to right. The amplitudes of the pulses are progressively scaled up (from left to right) so that all the pulses have equal negative peak amplitudes. These differentiated glottal pulses are used to excite an all-pole model to obtain the synthetic voiced sound shown in Fig. 2(c). It should be noted that, in the first 40 ms of the speech signal, the signal components due to higher formants can be clearly seen. This is due to the sharp closing phase, which results in a magnitude spectrum of the excitation pulses that is less steep. This feature is not seen in the latter portion of the signal in Fig. 2(c) due to the gradual closing phase. The second derivative of the glottal pulse and the twelfth-order LP residual signal are shown in Fig. 2(b) and (d), respectively. From these figures it is evident
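that the amplitudes of the excitation impulses are higher for the glottal pulses with sharper closure. The strength of excitation is higher for sharper closure, although the amplitude and energy of the speech signal in Fig. 2(c) are nearly the same throughout for all the glottal pulse shapes. It should be noted in Fig. 2(a) that the energy concentration is higher for the pulses in the initial portion than in the latter portion of the signal.

Fig. 2. (a) Differentiated glottal pulses. (b) Second derivative of glottal pulses. (c) Synthetic signal. (d) Residual signal derived from the signal in (c). (e) First-order difference of the signal in (c).

If we consider the differenced signal of Fig. 2(c), as shown in Fig. 2(e), we notice that the strength of excitation is also evident in the differenced signal. It can also be seen by considering a difference operation $(1 - z^{-1})$ on the $z$-transform of the signal, $S(z) = G(z)V(z)$, where $G(z)$ corresponds to the differentiated glottal pulse excitation, and $V(z)$ corresponds to the vocal tract system. We have

$$(1 - z^{-1})\,S(z) = \left[(1 - z^{-1})\,G(z)\right] V(z). \tag{9}$$

Thus, the differenced signal can be viewed as a signal that results from the excitation of the vocal tract system with the second derivative of the glottal pulse. The second derivative of the glottal pulse in Fig. 2(b) and the differenced signal in Fig. 2(e) both show the characteristics of the strength of excitation. These figures suggest that the Frobenius norm of the differenced signal can be used as a measure of the strength of excitation around the instant of significant excitation.

IV. ROBUSTNESS OF THE GROUP-DELAY-BASED METHOD

In this section we shall examine the robustness of the group-delay-based method for two types of degradations, namely, additive random noise and echo/reverberation.

A. Robustness Against Additive Noise

Let us consider an excitation signal $x(n)$ consisting of an impulse of amplitude $A$ at time $n_0$ and a zero-mean additive white Gaussian noise $w(n)$:

$$x(n) = A\,\delta(n - n_0) + w(n). \tag{10}$$

The Fourier transform of $x(n)$ is

$$X(\omega) = A\,e^{-j\omega n_0} + W(\omega) \tag{11}$$

where

$$W(\omega) = |W(\omega)|\,e^{j\theta_w(\omega)}. \tag{12}$$

$|W(\omega)|$ and $\theta_w(\omega)$ are random variables corresponding to the magnitude and phase of $W(\omega)$, respectively. Without loss of generality, the phase spectrum $\theta_w(\omega)$ can be assumed to have a uniform probability density function over the range $[-\pi, \pi]$ [17]. Let $|X(\omega)|$ and $\theta_x(\omega)$ be the magnitude and phase of $X(\omega)$, respectively. Then

$$\ln|X(\omega)| + j\theta_x(\omega) = \ln A - j\omega n_0 + \ln\!\left[1 + \frac{|W(\omega)|}{A}\,e^{j(\theta_w(\omega) + \omega n_0)}\right]. \tag{13}$$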
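As a numerical illustration of the impulse-in-noise model in (10), the following Python sketch computes the group-delay function with the formula in (1), applies three-point median smoothing as in Section II, and averages the result to estimate the impulse location. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`group_delay`, `median3`, `impulse_location`), the absence of any analysis window, and the small constant guarding the denominator are all illustrative choices.

```python
import numpy as np

def group_delay(x):
    """Group delay tau(k) = (X_R*Y_R + X_I*Y_I) / (X_R^2 + X_I^2),
    where X = DFT{x(n)} and Y = DFT{n*x(n)}."""
    n = np.arange(len(x))
    X = np.fft.fft(x)
    Y = np.fft.fft(n * x)
    den = np.maximum(X.real**2 + X.imag**2, 1e-12)  # guard against division by zero (assumption)
    return (X.real * Y.real + X.imag * Y.imag) / den

def median3(v):
    """Three-point median filter; the two end points are left unchanged."""
    out = v.copy()
    out[1:-1] = np.median(np.stack([v[:-2], v[1:-1], v[2:]]), axis=0)
    return out

def impulse_location(x):
    """Average of the median-smoothed group delay, in samples from frame start."""
    return float(np.mean(median3(group_delay(x))))

# Example: impulse of amplitude A at n0 in white Gaussian noise (hypothetical values).
rng = np.random.default_rng(0)
N, n0, A = 110, 40, 1.0            # 110 samples is about 10 ms at 11 kHz
sigma = A / np.sqrt(N * 100.0)     # noise level chosen so that A^2 / (N sigma^2) = 100
x = np.zeros(N)
x[n0] = A
noisy = x + sigma * rng.standard_normal(N)
print(impulse_location(x), impulse_location(noisy))
```

For the noise-free impulse the group delay equals $n_0$ at every frequency, so the average recovers the location exactly; with noise added, the estimate stays close to $n_0$ as long as the noise transform is small relative to $A$.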
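It is shown in Appendix B [see (B.4)] that

$$E\!\left[\frac{|W(\omega)|}{A}\right] \le 10^{-\rho/20} \tag{14}$$

where $E[\cdot]$ denotes ensemble average and $\rho$ is the excitation SNR, defined as the logarithm of the ratio of average excitation signal power per sample $(A^2/N)$ to the average noise power per sample $(\sigma^2)$:

$$\rho = 10\log_{10}\frac{A^2}{N\sigma^2}\ \text{dB}. \tag{15}$$

For $\rho = 0$ dB, the upper bound on the expected value of the magnitude of $W(\omega)/A$ is one. If the Fourier transform in (11) is evaluated using an $N$-point discrete Fourier transform (DFT), the magnitude of the DFT $|W(k)|/A$ can be shown to be less than one with 99% confidence for $\rho \ge$ […] dB [see Appendix B, (B.7)]. Expanding the third term on the right-hand side of (13) by Taylor series expansion, the phase term of (13) can be approximated as

$$\theta_x(\omega) \approx -\omega n_0 + \frac{|W(\omega)|}{A}\sin\!\big(\theta_w(\omega) + \omega n_0\big). \tag{16}$$

The group-delay function $\tau(\omega)$ is given by

$$\tau(\omega) = -\frac{d\theta_x(\omega)}{d\omega}. \tag{17}$$

Hence, the average value of the group-delay function is given by

$$\bar{\tau} = \frac{1}{2\pi}\int_{-\pi}^{\pi}\tau(\omega)\,d\omega = -\frac{1}{2\pi}\big[\theta_x(\pi) - \theta_x(-\pi)\big]. \tag{18}$$

Substituting (16) in (18) and noting that $\theta_x(\omega)$ is an odd function of $\omega$ and that the second term in (16) vanishes at $\omega = \pm\pi$, we have

$$\bar{\tau} = n_0 \tag{19}$$

i.e., the average value of the group-delay function gives the location of the impulse. In practice, the group-delay function is computed at discrete frequencies, and hence the computed average deviates from (19). Random fluctuations and spikes appear in the group-delay function [18]. These spikes may bias the mean value of the group-delay function. Therefore, it is preferable to use median smoothing of the computed group-delay function before computing the average.

So far we have considered an excitation signal corrupted by additive noise. Let us now consider a noisy speech signal

$$y(n) = s(n) + w(n) \tag{20}$$

where $s(n)$ is the speech signal and $w(n)$ is the additive white noise. To derive the instants of significant excitation, let us consider the LP residual signal. The frequency response of the inverse filter obtained from the LP analysis is given by

$$A(\omega) = \sum_{k=0}^{p} a_k\,e^{-j\omega k} \tag{21}$$

where $a_0 = 1$ and $a_k$, $k = 1, \ldots, p$, are the LP coefficients (LPC's). The residual error signal obtained after inverse filtering is given by

$$r(n) = e_s(n) + e_w(n) \tag{22}$$

where $e_s(n)$ is the component at the output of the inverse filter due to the speech signal $s(n)$, and $e_w(n)$ is the colored noise due to filtering of the white noise $w(n)$. Note that even though the speech signal is assumed to be the output of an all-pole system, the noisy signal $y(n)$ corresponds to a pole-zero system [19]. The power spectrum of the colored noise component $e_w(n)$ is given by

$$E\big[|E_w(\omega)|^2\big] = N\sigma^2\,|A(\omega)|^2. \tag{23}$$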
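The inverse filtering described around (21) and (22) can be sketched in Python as follows. This is an illustrative sketch only: it assumes the autocorrelation method of LP analysis on a single frame, the sign convention $e(n) = \sum_k a_k s(n-k)$ with $a_0 = 1$, and hypothetical function names (`lp_coefficients`, `inverse_filter`, `worst_case_noise_gain`); the paper does not specify these details. The last function evaluates the largest value of $|A(\omega)|^2$ on a frequency grid, which is the factor by which white noise power can grow at the output of the inverse filter.

```python
import numpy as np

def lp_coefficients(frame, order):
    """Autocorrelation-method LP analysis.
    Returns a = [1, a_1, ..., a_p] with prediction error e(n) = sum_k a[k] * s(n-k)."""
    m = len(frame)
    r = np.correlate(frame, frame, mode="full")[m - 1:m + order]  # lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    tail = np.linalg.solve(R, -r[1:order + 1])                    # normal equations
    return np.concatenate(([1.0], tail))

def inverse_filter(signal, a):
    """FIR inverse filter: residual(n) = sum_k a[k] * signal(n-k)."""
    return np.convolve(signal, a)[:len(signal)]

def worst_case_noise_gain(a, nfft=1024):
    """Maximum of |A(w)|^2 over a grid of nfft//2 + 1 frequencies in [0, pi]."""
    return float(np.max(np.abs(np.fft.rfft(a, nfft)) ** 2))

# Example: a two-pole synthetic signal driven by an impulse train (hypothetical values).
excitation = np.zeros(800)
excitation[10::80] = 1.0
s = np.zeros(800)
for n in range(800):
    s[n] = excitation[n]
    if n >= 1:
        s[n] += 1.2 * s[n - 1]
    if n >= 2:
        s[n] -= 0.8 * s[n - 2]
a = lp_coefficients(s, order=2)
residual = inverse_filter(s, a)
print(a, worst_case_noise_gain(a))
```

In this example the residual is close to the original impulse train, while the inverse filter has a gain well above one at high frequencies, which is why white noise in the input is amplified in the residual.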
Fig. 3. (a) Synthetic speech of Fig. 2(c) at an average SNR of 5 dB. (b) LP residual signal derived from the signal in (a). (c) The true locations of the instants of significant excitation. (d) The instants of significant excitation derived from the noisy signal in (a).

The second moment $E[|E_w(\omega)|^2]$ depends on the frequency $\omega$. Let us consider the worst case situation, i.e., the maximum value of $E[|E_w(\omega)|^2]$. Let

$$\sigma_e^2 = \sigma^2\,|A(\omega)|^2_{\max} \tag{24}$$

where $|A(\omega)|^2_{\max}$ is the maximum value of $|A(\omega)|^2$, given by

$$|A(\omega)|^2_{\max} = \max_{\omega}\,|A(\omega)|^2. \tag{25}$$

In the expression for $\rho$ in (15), $\sigma^2$ is replaced by $\sigma_e^2$. Assuming that $|A(\omega)|^2_{\max} > 1$, the effective $\rho$ for the residual signal is reduced. The above analysis is valid even when the speech is corrupted by additive colored random noise, except that $\sigma_e^2$ now also depends on the maximum value of the power spectrum of the colored noise.

The robustness of estimation of the instant of excitation depends on the excitation SNR $\rho$. For a constant additive noise power $\sigma^2$, $\rho$ will decrease as the strength of the excitation decreases. This is illustrated in Fig. 3 for a noisy case of the synthetic signal generated by exciting an all-pole filter with the differentiated glottal pulses of Fig. 2(a). The overall SNR of the speech signal is 5 dB. Note that the periodicity cannot be immediately seen from the noise-corrupted speech signal. Since it is a synthetic case, the strength of excitation can be approximated by the amplitude of the second derivative of the glottal pulse shown in Fig. 2(b). Fig. 3(c) shows the actual instants of significant excitation. Fig. 3(d) shows the instants of significant excitation estimated from the noisy speech signal. The figure shows that the accuracy of the extracted instants depends on the excitation signal-to-noise ratio. Reliability of the extracted instants decreases with a decrease in the excitation SNR, as can be seen from the deviation of the instants in Fig. 3(d) relative to the instants in Fig. 3(c). The excitation SNR $\rho$ is defined as the ratio of the square of the amplitude of the impulse and the noise power. Note that even though the average SNR of the speech signal is nearly constant, i.e., 5 dB, the excitation SNR decreases from left to right on the time scale.

B. Robustness Against Echo and Reverberation

Let us consider the following reverberant signal $x_r(n)$ for an impulse of strength $A$ and delayed by $n_0$ samples.
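$$x_r(n) = A\sum_{k=0}^{\infty}\alpha^k\,\delta(n - n_0 - kd) \tag{26}$$

where $\alpha$ is the attenuation factor $(0 < \alpha < 1)$ and $d$ is the delay due to reverberation. The Fourier transformation of (26) yields

$$X_r(\omega) = |X_r(\omega)|\,e^{j\theta_r(\omega)} = \frac{A\,e^{-j\omega n_0}}{1 - \alpha\,e^{-j\omega d}} \tag{27}$$

where $|X_r(\omega)|$ and $\theta_r(\omega)$ are the magnitude and phase of the Fourier transform of $x_r(n)$, respectively. Taking the natural logarithm on both sides of (27), we get [20]

$$\ln|X_r(\omega)| + j\theta_r(\omega) = \ln A - j\omega n_0 - \ln\!\big(1 - \alpha\,e^{-j\omega d}\big). \tag{28}$$

Neglecting the higher order terms in the Taylor series expansion of the last term above, the phase component is given by

$$\theta_r(\omega) \approx -\omega n_0 - \alpha\sin(\omega d). \tag{29}$$

The group-delay is

$$\tau(\omega) = -\frac{d\theta_r(\omega)}{d\omega} \approx n_0 + \alpha d\cos(\omega d). \tag{30}$$

The mean value of the group-delay $\bar{\tau}$ is

$$\bar{\tau} = n_0. \tag{31}$$

For a single echo, the term $-\ln(1 - \alpha e^{-j\omega d})$ in (28) can be replaced by $\ln(1 + \alpha e^{-j\omega d})$. The expression for the phase is the same as in (29), and hence the group-delay for the case of echo is the same as in (30). It should be noted that the above analysis is valid only under mild echo and reverberant conditions $(\alpha \ll 1)$. We have also assumed that the signal characteristics are stationary. Due to nonstationarity of speech signals, the model of reverberation in (26) may not be valid in real situations.

C. Robustness Due to Weighting of the LP Residual Signal

In this section, we show that suitable weighting of the LP residual signal improves the robustness of the algorithm for extraction of the instants of significant excitation. This is because the excitation SNR $\rho$ can be improved by weighting, as shown below. Consider the impulse-in-noise sequence $x(n)$ in (10). Let $h(n)$ be a positive window function such that $0 < h(n) \le 1$. Let $x_h(n) = h(n)\,x(n)$ be the weighted excitation signal, such that the impulse at $n = n_0$ is given the maximal weight of $h(n_0) = 1$. By following the steps in the analysis presented in Section IV-A, we have

$$\theta_{x_h}(\omega) \approx -\omega n_0 + \frac{|W_h(\omega)|}{A}\sin\!\big(\theta_{w_h}(\omega) + \omega n_0\big) \tag{32}$$

where

$$W_h(\omega) = |W_h(\omega)|\,e^{j\theta_{w_h}(\omega)} = \sum_{n} h(n)\,w(n)\,e^{-j\omega n} \tag{33}$$

and $\theta_{x_h}(\omega)$ is the phase of the Fourier transform of the weighted excitation sequence $x_h(n)$. The approximation in (32) is justified provided that $|W_h(\omega)|/A \ll 1$. Assuming that $w(n)$ are zero-mean Gaussian random variables with variance $\sigma^2$, we have from (33)

$$E\big[|W_h(\omega)|^2\big] = \sigma^2 E_h \tag{34}$$

where

$$E_h = \sum_{n} h^2(n). \tag{35}$$

Following the steps in the analysis presented in Appendix B, we define the weighted excitation SNR as

$$\rho_h = 10\log_{10}\frac{A^2}{\sigma^2 E_h}\ \text{dB}. \tag{36}$$

Using (15), we get

$$\rho_h = \rho + 10\log_{10}\frac{N}{E_h}\ \text{dB}. \tag{37}$$

Note that for the case without weighting of the LP residual, $h(n) = 1$ for all $n$. Therefore, from (35) and (37), $\rho_h = \rho$. For any other window function with a broad peak around the location of the impulse, $E_h < N$, i.e., $\rho_h > \rho$. Thus, there is some gain in the excitation SNR. For the limiting case of a weight function with a narrow peak at $n_0$, the gain in the excitation SNR tends to $10\log_{10} N$ dB.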
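The weighting idea can be illustrated with a short Python sketch. The first function follows the weight-function recipe of Section II (Hamming smoothing over eight samples, short-time energy over 15 samples, normalization to a maximum of one). The second evaluates the SNR gain $10\log_{10}(N/\sum_n h^2(n))$, which follows if the noise is white and the impulse keeps a weight of one; that expression is derived here under those assumptions rather than quoted from the paper. The function names and the way frame edges are handled (`mode="same"`) are illustrative choices.

```python
import numpy as np

def weight_function(residual, smooth_len=8, energy_len=15):
    """Smooth with a Hamming window, take the short-time energy at every sample,
    and normalize so that the largest weight is one."""
    h = np.hamming(smooth_len)
    smoothed = np.convolve(residual, h / h.sum(), mode="same")
    energy = np.convolve(smoothed ** 2, np.ones(energy_len), mode="same")
    return energy / energy.max()

def snr_gain_db(weights):
    """Excitation SNR gain from weighting: 10*log10(N / sum(weights^2)).
    Assumes white noise and a weight of one at the impulse location."""
    return 10.0 * np.log10(len(weights) / np.sum(weights ** 2))

# Example: one impulse in white noise (hypothetical values).
rng = np.random.default_rng(1)
N, n0 = 110, 55
x = 0.05 * rng.standard_normal(N)
x[n0] += 1.0
w = weight_function(x)
print(int(np.argmax(w)), snr_gain_db(w))
```

A uniform weight gives a gain of 0 dB, while a weight concentrated on a single sample gives $10\log_{10} N$ dB; the practical weight function falls between these two extremes.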
Fig. 4. (a) Clean speech for the utterance /dzua/. (b) Strengths of excitation based on the Frobenius norm. (c) Speech degraded by ambient noise. (d) Significant instants derived from the signal in (c). (e) Telephone speech. (f) Significant instants derived from the signal in (e).

V. PERFORMANCE EVALUATION OF THE GROUP-DELAY-BASED METHOD

In this section, we consider some examples of speech data under actual conditions of degradation, and examine the performance of the group-delay-based method for extraction of the instants of significant excitation. Since we do not have a method for estimating the SNR of the strength of excitation for signals with natural degradations, the results can only be interpreted from our a priori knowledge of the characteristics of the excitation for different categories of sounds. Wherever appropriate, the Frobenius norm of the differenced speech signal can be used as a measure of the strength of excitation.

Fig. 4 shows the performance of the algorithm for noise and telephone channel degradations for the segment of speech given in Fig. 1(a). The strengths of excitation at the extracted instants computed using the Frobenius norm are shown in Fig. 4(b). For this signal, the strength of excitation is lower for the segment /u/ in the region BC compared to the region CD. The noisy speech signal in Fig. 4(c) corresponds to the same speech as in Fig. 4(a), but recorded by a microphone placed 50 cm away from the speaker. The signal in the region AB is affected by the additive noise more than the signal in the region CD due to lower signal amplitudes. Hence the instants extracted for the signal in region AB are not reliable. Most of the extracted instants [Fig. 4(d)] for the signal in the region BC are correct, even though in Fig. 4(c) there appears to be no visible periodicity in the signal in the region BC. From Fig. 4(b) and (d), it can be seen that the instants are correctly extracted for the signal in the region CD.

The results are similar for the case of telephone speech shown in Fig. 4(e) and (f). In the telephone speech shown in Fig. 4(e), the signal in the region AB is lost and it is significantly attenuated in the region BC. This is because the low first formant of the vowel
/u/ is severely attenuated due to the bandpass nature of the telephone channel characteristics. The errors in the region AB are due to low levels of the signal itself in that region. It is important to note that although the signal level is high in the region BC for the clean speech, the strength of excitation is low for the instants in that region. Hence, the extracted instants in this region are more prone to errors compared to the extracted instants in the region CD.

A systematic investigation was carried out to study the accuracy of the extracted instants for synthetic and natural vowels. Histograms of the spread of the errors are shown in Figs. 5 and 6 for five synthetic and natural vowels /a/, /e/, /i/, /o/, and /u/, respectively, for an overall SNR of 10 dB. All the synthetic vowels are generated by the same LF-model-based differentiated glottal pulses. The length of each pulse was chosen to be 80 samples. In the case of the natural vowels, the glottal cycle duration varied from 9 ms for vowel […] to 7 ms for vowel […]. In Fig. 5, the histogram for each synthetic vowel is obtained by computing the histogram of deviations of the estimated instants of significant excitation from the true locations for 50 realizations of noise. There are ten glottal cycles in the signal for each vowel and hence we get 500 such deviations for each vowel. In Fig. 6, the deviations are obtained by subtracting the estimated locations from the locations extracted from the clean speech signal. A larger spread of the histograms indicates a larger deviation of the extracted instants from the true locations of the instants. The errors are typically larger for the close vowels /i/ and /u/ than for the open vowels […] and […]. For the synthetic case shown in Fig. 5, all the instants have the same strength and hence the spread of errors is smaller than in the case of natural vowels.

It is important to note that the variation in the spread of the errors for different vowels is also due to the artifacts of the LP analysis. For the synthetic case shown in Fig. 5, the spread is larger for the close vowels /i/ and /u/ despite the excitation strength being the same for all the five vowels, because of the dominance of the first formant in the LP analysis of the noise-corrupted signals for these close vowels. This is also true in the case of natural vowels shown in Fig. 6. There is a systematic bias in the estimated locations of the instants of excitation for
the case of synthetic vowels. The bias is about two samples for the average glottal cycle length of 80 samples. That is, the bias is about 3% (2/80 = 2.5%). The bias may have been caused by weighting the LP residual signal before computing the instants of excitation. The weight function depends on the nature of the voiced sound, and the extent of degradation caused by noise. That is why the bias is positive in some cases and negative in some other cases.

Fig. 5. Histogram of errors in the estimated instants for five synthetic vowels for SNR = 10 dB. (a) /a/, (b) /e/, (c) /i/, (d) /o/, (e) /u/.

Fig. 6. Histogram of errors in the estimated instants for five natural vowels for SNR = 10 dB. (a) […], (b) […], (c) […], (d) […], (e) […].

Errors in the extracted instants were also studied for utterances taken from the standard NTIMIT [21], [22] data for male and female speech. Since the TIMIT [23] data was available for reference, the spread was estimated using the deviations of the extracted instants for the NTIMIT data from those for the TIMIT data. The TIMIT and NTIMIT data taken for study were lowpass filtered and downsampled to 8 kHz before processing. The TIMIT and NTIMIT data were first time-aligned before the deviations were computed. The histograms of errors for one male voice and one female voice are shown in Figs. 7 and 8. The data for the male voice corresponds to the file […] in the TIMIT/NTIMIT database. The data for the female voice corresponds to the file […]. The instants of significant excitation were extracted only from the voiced regions, which were identified using the phonetic transcription
files provided with the TIMIT database. From Figs. 7 and 8, it can be seen that there are more values of deviation in the histogram of deviations for female speech than for the male speech. This is because the average pitch of the female speaker is about 210 Hz and that of the male speaker is about 100 Hz. So there are more glottal cycles in the utterance of the female speaker than for the male speaker. The spread of errors is larger for these utterances compared to the errors for the vowels in Fig. 6, because the SNR is different for different segments in this case, whereas for vowels it was constant. The speech SNR varies over a range of 20–50 dB for the utterances taken from the TIMIT data and over a range of 5–40 dB for the utterances taken from the NTIMIT data for both male and female voices. The SNR for different segments was computed as the ratio of the energy of the signal samples to the energy of the noise samples in the silence regions. The bias and spread of the errors in Figs. 7 and 8 can be attributed not only to the variations of SNR for different segments, but also to the weight function used on the LP residual signal before computing the instants of excitation.

Fig. 7. Histogram of errors for the utterance "She had your dark suit in greasy wash water all year" uttered by a male speaker.

Fig. 8. Histogram of errors for the utterance "She had your dark suit in greasy wash water all year" uttered by a female speaker.

VI. CONCLUSION

In this paper, we have demonstrated that the group-delay-based method proposed in [9] and [10] is indeed robust against degradations in speech due to additive noise and channel distortion. The robustness is due to the fact that the energy of the signal is concentrated around the instant of significant excitation, which for voiced speech corresponds to the instant around glottal closure. We have discussed the importance of the strength of excitation, which cannot be directly inferred from the speech signal. We have shown that the errors in the extracted instants are small for many practical signals such as in the NTIMIT speech data.

APPENDIX A
BOUNDS ON THE RAYLEIGH QUOTIENT

Let the singular value decomposition (SVD) [15] of $\mathbf{S}$ be

$$\mathbf{S} = \mathbf{U}\,\boldsymbol{\Sigma}\,\mathbf{V}^T \tag{A.1}$$

where the columns of $\mathbf{U}$ and $\mathbf{V}$ are the left and right singular vectors of $\mathbf{S}$, respectively.
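$\boldsymbol{\Sigma}$ is the matrix of singular values $\sigma_i$. Therefore

$$\mathbf{S}^T\mathbf{S} = \mathbf{V}\,\boldsymbol{\Sigma}^T\boldsymbol{\Sigma}\,\mathbf{V}^T. \tag{A.2}$$

So $\sigma_i^2$ are the eigenvalues of $\mathbf{S}^T\mathbf{S}$ and the columns of $\mathbf{V}$ are its eigenvectors. The Rayleigh quotient of $\mathbf{S}^T\mathbf{S}$ is defined as [14]

$$R(\mathbf{a}) = \frac{\mathbf{a}^T\mathbf{S}^T\mathbf{S}\,\mathbf{a}}{\mathbf{a}^T\mathbf{a}} \tag{A.3}$$

where $\mathbf{a} \ne \mathbf{0}$. Assuming that the eigenvalues of $\mathbf{S}^T\mathbf{S}$ are all distinct, its eigenvectors form an orthonormal basis in $\mathbb{R}^{p+1}$. Hence, $\mathbf{a}$ can be expressed as

$$\mathbf{a} = \sum_{i=1}^{p+1} c_i\,\mathbf{v}_i \tag{A.4}$$

where $c_i$ are the components of $\mathbf{a}$ w.r.t. the basis $\{\mathbf{v}_i\}$. Premultiplying both sides of (A.4) by $\mathbf{S}^T\mathbf{S}$ and noting that $\sigma_i^2$ and $\mathbf{v}_i$ are its eigenvalues and eigenvectors, respectively, we have

$$\mathbf{S}^T\mathbf{S}\,\mathbf{a} = \sum_{i=1}^{p+1} c_i\,\sigma_i^2\,\mathbf{v}_i. \tag{A.5}$$

Premultiplying (A.5) by $\mathbf{a}^T$ and noting that the eigenvectors form an orthonormal set, we have

$$\mathbf{a}^T\mathbf{S}^T\mathbf{S}\,\mathbf{a} = \sum_{i=1}^{p+1} c_i^2\,\sigma_i^2. \tag{A.6}$$

From (A.3), (A.4), and (A.6), we have

$$R(\mathbf{a}) = \frac{\sum_{i=1}^{p+1} c_i^2\,\sigma_i^2}{\sum_{i=1}^{p+1} c_i^2}. \tag{A.7}$$

From (A.7), it is clear that

$$\sigma_{\min}^2 \le R(\mathbf{a}) \le \sigma_{\max}^2 \tag{A.8}$$

i.e., the Rayleigh quotient is bounded by the extreme eigenvalues of $\mathbf{S}^T\mathbf{S}$.

APPENDIX B
EXCITATION SIGNAL-TO-NOISE RATIO

For the zero-mean Gaussian distributed random variables $w(n)$, the Fourier transform $W(\omega)$ is a complex zero-mean Gaussian random variable. Therefore we have

$$E\big[|W(\omega)|^2\big] = N\sigma^2. \tag{B.1}$$

Since the square of the mean is always less than the second moment, i.e.,

$$\big(E[|W(\omega)|]\big)^2 \le E\big[|W(\omega)|^2\big] \tag{B.2}$$

we have

$$E\big[|W(\omega)|\big] \le \sqrt{N}\,\sigma. \tag{B.3}$$

Hence

$$E\!\left[\frac{|W(\omega)|}{A}\right] \le 10^{-\rho/20} \tag{B.4}$$

where $\rho$ is the excitation SNR:

$$\rho = 10\log_{10}\frac{A^2}{N\sigma^2}\ \text{dB}. \tag{B.5}$$

Let us consider an $N$-point discrete Fourier transform (DFT) of the sequence given in (10), computed at $\omega_k = 2\pi k/N$. It can be shown [24] that the real and imaginary parts of the DFT of $w(n)$, $W_R(k)$ and $W_I(k)$, are (real) independent identically distributed (i.i.d.) Gaussian random variables for $k \ne 0, N/2$. Therefore, the mean and variance of $W_R(k)$ and $W_I(k)$ are $0$ and $N\sigma^2/2$, respectively. Under these conditions the magnitude of the DFT of $w(n)$ is Rayleigh distributed [24]. Since we have the knowledge of both the mean and variance, we get

$$E\!\left[\frac{|W(k)|}{A}\right] = \frac{\sqrt{\pi}}{2}\,10^{-\rho/20} \approx 0.886 \times 10^{-\rho/20} \tag{B.6}$$

which is indeed close to the upper bound $10^{-\rho/20}$ given in (B.4) above. From the cumulative distribution function of a Rayleigh distribution [25], we may write

$$P = 1 - \exp\!\left(-\frac{r^2}{N\sigma^2}\right) \tag{B.7}$$

where $P$ is the probability that $|W(k)|$ is less than $r$. From (B.7), we note that $|W(k)|/A < 1$ with more than 99% confidence, when $\rho \ge$ […] dB.

Fig. 9. Algorithm for determination of instants of significant excitation.

ACKNOWLEDGMENT

The authors would like to thank Dr. H. A. Murthy for providing the data required for some of the studies in this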
paper, and the three anonymous reviewers for their critical comments, which greatly helped improve the presentation of the paper.

REFERENCES

[1] K. S. Nathan, Y.-T. Lee, and H. F. Silverman, "A time-varying analysis method for rapid transitions in speech," IEEE Trans. Signal Processing, vol. 39, pp. 815–824, Apr. 1991.
[2] A. K. Krishnamurthy, "Glottal source estimation using a sum-of-exponentials model," IEEE Trans. Signal Processing, vol. 40, pp. 682–686, Mar. 1992.
[3] C. Hamon, E. Moulines, and F. J. Charpentier, "A diphone synthesis system based on time domain prosodic modifications of speech," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Glasgow, U.K., May 1989, pp. 238–241.
[4] T. V. Ananthapadmanabha and B. Yegnanarayana, "Epoch extraction from linear prediction residual for identification of closed glottis interval," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-27, pp. 309–319, Aug. 1979.
[5] H. W. Strube, "Determination of the instant of glottal closure," J. Acoust. Soc. Amer., vol. 56, pp. 1625–1629, 1974.
[6] T. V. Ananthapadmanabha and B. Yegnanarayana, "Epoch extraction of voiced speech," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-23, pp. 562–570, Dec. 1975.
[7] Y. M. Cheng and D. O'Shaughnessy, "Automatic and reliable estimation of glottal closure instant and period," IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, pp. 1805–1814, Dec. 1989.
[8] C. Ma, Y. Kamp, and L. F. Willems, "A Frobenius norm approach to glottal closure detection from the speech signal," IEEE Trans. Speech, Audio Processing, vol. 2, pp. 258–265, Apr. 1994.
[9] R. Smits and B. Yegnanarayana, "Determination of instants of significant excitation in speech using group delay functions," IEEE Trans. Speech, Audio Processing, vol. 3, pp. 325–333, Sept. 1995.
[10] B. Yegnanarayana and R. Smits, "A robust method for determining instants of major excitations in voiced speech," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Detroit, MI, May 1995, pp. 776–779.
[11] E. A. Robinson, T. S. Durrani, and L. G. Peardon, Geophysical Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1986.
[12] J. Makhoul, "Linear prediction: A tutorial review," Proc. IEEE, vol. 63, pp. 561–580, Apr. 1975.
[13] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1975.
[14] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, MD: Johns Hopkins Univ. Press, 1983.
[15] S. J. Leon, Linear Algebra with Applications. New York: Macmillan, […].
[16] G. Fant, "Glottal flow: Models and interaction," J. Phonet., vol. 14, pp. 393–399, Oct.–Dec. 1986.
[17] X. Li and N. M. Bilgutay, "Wiener filter realization for target detection using group delay statistics," IEEE Trans. Signal Processing, vol. 41, pp. 2067–2074, June 1993.
[18] B. Yegnanarayana and H. A. Murthy, "Significance of group delay functions in spectrum estimation," IEEE Trans. Signal Processing, vol. 40, pp. 2281–2289, Sept. 1992.
[19] S. M. Kay, Modern Spectral Estimation: Theory and Application. Englewood Cliffs, NJ: Prentice-Hall, 1988.
[20] R. C. Kemeriat and D. G. Childers, "Signal detection and extraction by cepstrum techniques," IEEE Trans. Inform. Theory, vol. IT-18, pp. 745–759, Nov. 1972.
[21] C. Jankowski, A. Kalyanswamy, S. Basson, and J. Spitz, "NTIMIT: A phonetically balanced, continuous speech, telephone bandwidth speech database," in Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing, vol. 1, Albuquerque, NM, Apr. 1990, pp. 109–112.
[22] C. Jankowski, "The NTIMIT speech database," from documentation accompanying the NTIMIT CD-ROM, Nynex Sci. Technol. Ctr., White Plains, NY, Jan. 1991.
[23] W. M. Fisher, G. R. Doddington, and K. M. Goudie-Marshall, "The DARPA speech recognition research database: Specifications and status," in Proc. DARPA Workshop on Speech Recognition, Feb. 1986, pp. […].
[24] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation. Englewood Cliffs, NJ: Prentice-Hall, 1993.
P. Satyanarayana Murthy was born in Kakinada, India, in 1971. He received the B.E. degree in electronics and communication engineering from Chaitanya Bharathi Institute of Technology, Osmania University, Hyderabad, the M.Tech. and Ph.D. degrees in electrical engineering from the Indian Institute of Technology (IIT), Madras, in 1994 and 1999, respectively. From January to July 1994, he was a Senior Project Officer in the Department of Computer Science and Engineering, IIT. He is currently a Manager with Speech and Software Technologies, Madras. His research interest is in speech signal processing.

B. Yegnanarayana (M'78–SM'84) was born in India on January 9, 1944. He received the B.E., M.E., and Ph.D. degrees in electrical communication engineering from the Indian Institute of Science, Bangalore, India, in 1964, 1966, and 1974, respectively. He was a Lecturer from 1966 to 1974 and an Assistant Professor from 1974 to 1978 in the Department of Electrical Communication Engineering, Indian Institute of Science. From 1966 to 1971, he was engaged in the development of environmental test facilities for the Acoustic Laboratory, Indian Institute of Science. From 1977 to 1980, he was a visiting Associate Professor of computer science at Carnegie Mellon University, Pittsburgh, PA. He was a Visiting Scientist at ISRO Satellite Center, Bangalore, from July to December 1980. Since 1980, he has been a Professor in the Department of Computer Science and Engineering, Indian Institute of Technology, Madras. He was a Visiting Professor at the Institute for Perception Research, Eindhoven Technical University, Eindhoven, The Netherlands, from July 1994 to January 1995. Since 1972, he has been working on problems in the area of speech signal processing. He is presently engaged in research activities in digital signal processing, speech recognition, and neural networks.

Dr. Yegnanarayana is a member of the Computer Society of India, a Fellow of the Institution of Electronics and Telecommunications Engineers of India, a Fellow of the Indian National Science Academy, and a Fellow of the Indian National Academy of Engineering.