IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 7, NO. 6, NOVEMBER 1999, p. 609

Robustness of Group-Delay-Based Method for Extraction of Significant Instants of Excitation from Speech Signals

P. Satyanarayana Murthy and B. Yegnanarayana, Senior Member, IEEE
Abstract: In this paper, we study the robustness of a group-delay-based method for determining the instants of significant excitation in speech signals. These instants correspond to the instants of glottal closure for voiced speech. The method uses the properties of the global phase characteristics of minimum phase

II. DETERMINATION OF INSTANTS OF EXCITATION

In this section, we briefly present the group-delay-based method proposed in [9] and [10] for determining the instants of significant excitation from speech signals, and propose some refinements to the method. The method is based on the global phase characteristics of minimum phase signals. Since the average group delay of a minimum phase system is zero [11], the average slope of the phase spectrum of the impulse response of the system corresponds to the location of the excitation impulse within the analysis frame [9]. In practice, the computed phase spectrum or the group-delay function depends on the window function used for analysis. To reduce the effects of the window function on the estimated group-delay function, it is preferable to compute the group-delay function from the LP residual signal. The residual signal is also preferable because some characteristics of the glottal source can be seen better in the residual error signal than in the speech signal. The average slope of the phase spectrum of the speech signal is the same for the residual signal also, because the inverse filter of the LP analysis is a minimum phase system [12]. The residual signal is derived by inverse filtering the speech signal, and the inverse filter is obtained using LP analysis. For LP analysis, a frame size of about 25 ms for every 10 ms may be chosen [9], [10].

The instants of significant excitation can be derived from the LP residual signal as follows [10]. Around each sampling instant a 10 ms segment of the LP residual signal is considered and the group-delay function is computed using the formula [13]

$$\tau(\omega) = \frac{X_R(\omega) Y_R(\omega) + X_I(\omega) Y_I(\omega)}{X_R^2(\omega) + X_I^2(\omega)} \tag{1}$$

where $X(\omega) = X_R(\omega) + jX_I(\omega)$ and $Y(\omega) = Y_R(\omega) + jY_I(\omega)$ are the Fourier transforms of the windowed residual $x(n)$ and $nx(n)$,
respectively. The group-delay function is smoothed using a three-point median filter to remove any discontinuities in the group-delay function. The negative of the average of the smoothed group-delay function is called phase slope. The phase slope value is computed at each sampling instant to obtain the phase slope function. If the instant of significant excitation within a frame is at the midpoint of the frame, then the phase slope is zero. Therefore the positive zero-crossings of the phase slope function correspond to the instants of significant excitation. Short-time (1–3 ms) energy of the LP residual signal around the instant can be used to represent the strength of excitation associated with the instant [9], [10].

Fig. 1(a)–(d) show a segment of speech signal, the LP residual signal, the phase slope function and the extracted instants with estimated strengths, respectively. The speech signal shown corresponds to the utterance /dzua/, which contains a voiced palatal fricative.

Fig. 1. (a) Clean speech for the utterance /dzua/. (b) LP residual signal derived from the signal in (a). (c) Phase slope function. (d) Significant instants, weighted by their strengths, derived from the signal in (a). (e) Significant instants, derived from the signal in (a) using the proposed algorithm.

Sometimes the LP residual signal may contain some spurious impulses which may result in wrong estimation of the instants of significant excitation, as can be seen in Fig. 1(d), where the strengths are computed using the short-time energy of the residual signal centered around the estimated instants of significant excitation. The effect of these spurious impulses can be reduced by enhancing the region around the instants of significant excitation relative to other regions in the LP residual signal. This can be accomplished by deriving a weight function for the LP residual signal. The weight function is derived here by smoothing the LP residual signal with a Hamming window of duration 0.75 ms (eight samples at 11 kHz sampling rate). This smoothing reduces the noise
fluctuations in the residual signal. The short-time energy of the smoothed residual signal is computed at every sample using a frame size of 1.4 ms (15 samples at 11 kHz sampling rate). The short-time energy curve will have large amplitudes around the significant instants. The short-time energy is normalized to a maximum value of one and is used as a weight function for the residual signal to enhance the regions in the residual signal around the significant excitations. The weighted residual signal is used to derive the instants of significant excitation. The phase slope function is smoothed using a five-point Hamming window. Positive zero-crossings of the smoothed phase slope function are used as the instants of significant excitation. Fig. 1(e) shows the plot of the instants derived after these refinements. Some of the errors in the estimation of instants in Fig. 1(d) are corrected in Fig. 1(e). The different steps in the algorithm for the computation of the instants of significant excitation are given in Fig. 9.

III. MEASURE OF STRENGTH OF EXCITATION

Reliability of the extracted instants depends on the strength of excitation around the instants. In [9], [10] the short-time energy of the LP residual signal was used to represent the strength of excitation at each instant. In some cases it is difficult to use the short-time energy around the instant as a measure of the strength, especially when the residual signal is noisy, as in the region BC in Fig. 1(b). Moreover, the derived residual signal energy depends on the effectiveness of the LP analysis for these segments. We propose an alternative measure for the strength of excitation, which is based on the use of the Frobenius norm. In [8] the Frobenius norm of a signal prediction matrix, formed by using the samples in a frame of about 3 ms, was proposed to locate the instants of glottal closure. The Frobenius norm was computed at each sampling instant. The locations of the peaks in the plot of the Frobenius norm as a function of time were considered as the desired instants. In this section we propose that the Frobenius norm [14] of the signal prediction matrix [8] formed by using the samples in a 3-ms frame of differenced speech centered around the identified instant of significant excitation can be used to represent the strength of excitation at that instant.

Consider a frame of the differenced speech signal with $N$ samples, $s(1), s(2), \ldots, s(N)$. Assuming a linear prediction order $p$, the following prediction error vector can be formed:

$$\mathbf{e} = \mathbf{S}\mathbf{a} \tag{2}$$

where $\mathbf{S}$ is the Toeplitz signal prediction matrix of dimension $(N-p) \times (p+1)$

$$\mathbf{S} = \begin{bmatrix} s(p+1) & s(p) & \cdots & s(1) \\ s(p+2) & s(p+1) & \cdots & s(2) \\ \vdots & \vdots & \ddots & \vdots \\ s(N) & s(N-1) & \cdots & s(N-p) \end{bmatrix} \tag{3}$$

and $\mathbf{a} = [1, a_1, \ldots, a_p]^T$ is the augmented vector of LPCs. Assuming $s(n)$ are the samples of a signal at the output of an all-pole system excited by a periodic impulse train, there is a linear dependence between the column vectors of $\mathbf{S}$ when the instant of excitation is not included in the analysis frame [8]. The error vector is then zero. But when the instant of excitation is included, the norm of the error vector goes up. The amplitudes of signal samples in the signal prediction matrix also go up, because of the excitation. Thus, the Frobenius norm of the signal prediction matrix, computed as the square root of the sum of all squared elements of the matrix, also goes up. The square of the Euclidean norm of $\mathbf{e}$, which is a measure of the energy (strength) of excitation, is given by

$$\|\mathbf{e}\|_2^2 = \frac{\|\mathbf{e}\|_2^2}{\|\mathbf{S}\|_F^2}\,\|\mathbf{S}\|_F^2 \tag{4}$$

where $\|\mathbf{S}\|_F$ is the Frobenius norm of $\mathbf{S}$. The ratio $\|\mathbf{e}\|_2^2/\|\mathbf{S}\|_F^2$ is upper bounded by $\|\mathbf{a}\|_2^2$. Ignoring the variation in this ratio compared to that in $\|\mathbf{S}\|_F^2$, we can use $\|\mathbf{S}\|_F^2$ as a measure of the strength of excitation. Computing the Euclidean norm of $\mathbf{e}$ from (2), we get

$$\|\mathbf{e}\|_2^2 = \mathbf{a}^T \mathbf{S}^T \mathbf{S}\,\mathbf{a} \tag{5}$$

$$\phantom{\|\mathbf{e}\|_2^2} = R(\mathbf{a})\,\mathbf{a}^T\mathbf{a} \tag{6}$$

where $R(\mathbf{a})$ is the Rayleigh quotient of $\mathbf{S}^T\mathbf{S}$ [14]. It is shown in Appendix A [see (A.8)] that

$$\sigma_{\min}^2 \le R(\mathbf{a}) \le \sigma_{\max}^2 \tag{7}$$

where $\sigma_i$ are the singular values of $\mathbf{S}$, and $\sigma_i^2$ are also the eigenvalues of $\mathbf{S}^T\mathbf{S}$. It is also known that the square of the Frobenius norm is the sum of squared singular values [15]. So we have the inequality

$$\sigma_{\min}^2 \le \frac{\|\mathbf{S}\|_F^2}{p+1} \le \sigma_{\max}^2 \tag{8}$$

since $\|\mathbf{S}\|_F^2/(p+1)$ is the arithmetic mean of the squared singular values. It is known that all the singular values rise in magnitude when there is an excitation within the analysis frame and fall when there is no excitation [8]. Therefore, both $R(\mathbf{a})$ in (7) and $\|\mathbf{S}\|_F^2/(p+1)$ in (8) track these changes. Therefore, $\|\mathbf{S}\|_F^2$
can be used as a measure of the strength of excitation. We note that though this is a measure of energy of the residual signal, it is computed directly from the speech signal. It is to be noted that since the square of the Frobenius norm of the signal prediction matrix is the sum of squares of all samples in the matrix, it is nothing but the short-time energy of the speech signal computed from the weighted samples of the speech signal.

To illustrate the need for a measure for the strength of excitation, let us consider the differentiated glottal pulses [Fig. 2(a)] generated using the LF model [16]. All the parameters of the model are kept constant except the time constant of the return phase and the instant of peak positive excitation. To vary the rate of closure, the time constant of the return phase is increased from 0.05 to 1.5 ms from left to right. The amplitudes of the pulses are progressively scaled up (from left to right) so that all the pulses have equal negative peak amplitudes. These differentiated glottal pulses are used to excite an all-pole model to obtain a synthetic voiced sound shown in Fig. 2(c). It should be noted that, in the first 40 ms of the speech signal, the signal components due to higher formants can be clearly seen. This is due to the sharp closing phase, which results in a magnitude spectrum of the excitation pulses that is less steep. This feature is not seen in the latter portion of the signal in Fig. 2(c) due to the gradual closing phase. The second derivative of the glottal pulse and the twelfth-order LP residual signal are shown in Fig. 2(b) and (d), respectively. From these figures it is evident
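that the amplitudes of the excitation impulses are higher for the glottal pulses with sharper closure. The strength of excitation is higher for sharper closure, although the amplitude and energy of the speech signal in Fig. 2(c) are nearly the same throughout for all the glottal pulse shapes. It should be noted in Fig. 2(a) that the energy concentration is higher for the pulses in the initial portion than in the latter portion of the signal. If we consider the differenced signal of Fig. 2(c), as shown in Fig. 2(e), we notice that the strength of excitation is also evident in the differenced signal. It can also be seen by considering a difference operation $(1 - z^{-1})$ on the $z$ transform of the signal, $S(z) = G(z)V(z)$, where $G(z)$ corresponds to the differentiated glottal pulse excitation, and $V(z)$ corresponds to the vocal tract system. We have

$$(1 - z^{-1})S(z) = \left[(1 - z^{-1})G(z)\right]V(z). \tag{9}$$

Thus, the differenced signal can be viewed as a signal that results due to the excitation of the vocal tract system with the second derivative of the glottal pulse. The second derivative of the glottal pulse in Fig. 2(b) and the differenced signal in Fig. 2(e) both show the characteristics of the strength of excitation. These figures suggest that the Frobenius norm of the differenced signal can be used as a measure of the strength of excitation around the instant of significant excitation.

Fig. 2. (a) Differentiated glottal pulses. (b) Second derivative of glottal pulses. (c) Synthetic signal. (d) Residual signal derived from the signal in (c). (e) First-order difference of the signal in (c).

IV. ROBUSTNESS OF THE GROUP-DELAY-BASED METHOD

In this section we shall examine the robustness of the group-delay-based method for two types of degradations, namely, additive random noise and echo/reverberation.

A. Robustness Against Additive Noise

Let us consider an excitation signal $x(n)$ consisting of an impulse of amplitude $A$ at time $n_0$ and a zero-mean additive white Gaussian noise $u(n)$:

$$x(n) = A\,\delta(n - n_0) + u(n). \tag{10}$$

The Fourier transform of $x(n)$ is

$$X(\omega) = A e^{-j\omega n_0} + U(\omega) \tag{11}$$

where

$$U(\omega) = |U(\omega)|\,e^{j\phi(\omega)}. \tag{12}$$

$|U(\omega)|$ and $\phi(\omega)$ are random variables corresponding to the magnitude and phase of $U(\omega)$, respectively. Without loss of generality, the phase spectrum $\phi(\omega)$ can be assumed to have a uniform probability density function over the range $[-\pi, \pi]$ [17]. Let $|X(\omega)|$ and $\theta(\omega)$ be the magnitude and phase of $X(\omega)$, respectively. Taking the natural logarithm on both sides of (11), we get

$$\ln|X(\omega)| + j\theta(\omega) = \ln A - j\omega n_0 + \ln\!\left[1 + \frac{|U(\omega)|}{A}\,e^{j(\phi(\omega) + \omega n_0)}\right]. \tag{13}$$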
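It is shown in Appendix B [see (B.4)] that

$$\left\langle \frac{|U(\omega)|}{A} \right\rangle \le 10^{-\rho/20} \tag{14}$$

where $\langle\cdot\rangle$ denotes ensemble average and $\rho$ is the excitation SNR, defined as the logarithm of the ratio of average excitation signal power per sample ($A^2/N$ for a frame of $N$ samples) to the average noise power per sample ($\sigma_u^2$):

$$\rho = 10\log_{10}\frac{A^2}{N\sigma_u^2}\ \text{dB}. \tag{15}$$

For $\rho = 0$ dB, the upper bound on the expected value of the magnitude of $U(\omega)/A$ is one. If the Fourier transform in (11) is evaluated using an $N$-point discrete Fourier transform (DFT), the magnitude of the DFT of $u(n)/A$ can be shown to be less than one with 99% confidence for $\rho \ge 6.6$ dB [see App. B, (B.7)]. Expanding the third term on the right hand side of (13) by Taylor series expansion, the phase term of (13) can be approximated as

$$\theta(\omega) \approx -\omega n_0 + \frac{|U(\omega)|}{A}\sin\big(\phi(\omega) + \omega n_0\big). \tag{16}$$

The group-delay function $\tau(\omega)$ is given by

$$\tau(\omega) = -\frac{d\theta(\omega)}{d\omega}. \tag{17}$$

Hence, the average value of the group-delay function is given by

$$\bar{\tau} = \frac{1}{2\pi}\int_{-\pi}^{\pi}\tau(\omega)\,d\omega = -\frac{1}{2\pi}\big[\theta(\pi) - \theta(-\pi)\big]. \tag{18}$$

Substituting (16) in (18) and noting that $\theta(\omega)$ is an odd function of $\omega$ and that the second term in (16) vanishes at $\omega = \pm\pi$, we have

$$\bar{\tau} = n_0 \tag{19}$$

i.e., the average value of the group-delay function gives the location of the impulse. In practice, the group-delay function is computed at discrete frequencies, and hence the computed average deviates from (19). Random fluctuations and spikes appear in the group-delay function [18]. These spikes may bias the mean value of the group-delay function. Therefore, it is preferable to use median smoothing of the computed group-delay function before computing the average.

So far we have considered an excitation signal corrupted by additive noise. Let us now consider a noisy speech signal

$$y(n) = s(n) + u(n) \tag{20}$$

where $s(n)$ is the speech signal and $u(n)$ is the additive white noise. To derive the instants of significant excitation, let us consider the LP residual signal. The frequency response of the inverse filter obtained from the LP analysis is given by

$$H(\omega) = \sum_{k=0}^{p} a_k e^{-j\omega k} \tag{21}$$

where $a_0 = 1$ and $a_k$, $k = 1, \ldots, p$, are the LP coefficients (LPCs). The residual error signal obtained after inverse filtering is given by

$$r(n) = r_s(n) + r_u(n) \tag{22}$$

where $r_s(n)$ is the component at the output of the inverse filter due to the speech signal $s(n)$ and $r_u(n)$ is the colored noise due to filtering of the white noise $u(n)$. Note that even though the speech signal is assumed to be the output of an all-pole system, the noisy signal $y(n)$ corresponds to a pole-zero system [19]. The power spectrum of the colored noise component $r_u(n)$ is given by

$$P_u(\omega) = \sigma_u^2\,|H(\omega)|^2. \tag{23}$$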
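The inverse filtering step behind (21) and (22) can be sketched in a few lines. What follows is only a minimal illustration under stated assumptions: it uses the autocorrelation method with the Levinson-Durbin recursion, which the text does not specify, and the function names (`lp_coefficients`, `inverse_filter`, `synth_voiced`) and the second-order test resonator are hypothetical choices made for the example.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation sums r(0..max_lag) of a frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lp_coefficients(x, order):
    """Levinson-Durbin recursion (assumed here; the paper does not name a solver).

    Returns the inverse-filter coefficients [1, a_1, ..., a_p] and the
    final prediction error energy.
    """
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def inverse_filter(x, a):
    """LP residual: r(n) = sum_k a_k x(n - k), with a_0 = 1."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]


def synth_voiced(n_samples, period, a):
    """Hypothetical test signal: unit impulse train through the all-pole filter 1/A(z)."""
    s = []
    for n in range(n_samples):
        u = 1.0 if n % period == 0 else 0.0
        s.append(u - sum(a[k] * s[n - k] for k in range(1, len(a)) if n - k >= 0))
    return s


# Usage: recover the excitation impulses of a synthetic resonance.
signal = synth_voiced(400, 80, [1.0, -1.3, 0.8])
coeffs, _ = lp_coefficients(signal, 2)
residual = inverse_filter(signal, coeffs)
```

For this noise-free synthetic frame the residual is close to the original impulse train, so its largest samples fall at the excitation instants. With additive noise, the same filter also shapes the noise, which is the colored component in (22).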
Fig. 3. (a) Synthetic speech of Fig. 2(c) at an average SNR of 5 dB. (b) LP residual signal derived from the signal in (a). (c) The true locations of the instants of significant excitation. (d) The instants of significant excitation derived from the noisy signal in (a).

The second moment of the Fourier transform of the colored noise depends on the frequency $\omega$. Let us consider the worst case situation, i.e., the maximum value of this second moment. Let

$$\sigma_c^2 = \sigma_u^2\,H_{\max}^2 \tag{24}$$

where $H_{\max}^2$ is the maximum value of $|H(\omega)|^2$, given by

$$H_{\max}^2 = \max_{\omega}\,|H(\omega)|^2. \tag{25}$$

In the expression for $\rho$ in (15), the $\sigma_u^2$ is replaced by $\sigma_c^2$. Assuming that $H_{\max}^2 > 1$, the effective $\rho$ for the residual signal is reduced. The above analysis is valid even when the speech is corrupted by additive colored random noise, except that $\sigma_c^2$ now also depends on the maximum value of the power spectrum of the colored noise.

The robustness of estimation of the instant of excitation depends on the excitation SNR $\rho$. For a constant additive noise power, $\rho$ will decrease as the strength of the excitation decreases. This is illustrated in Fig. 3 for a noisy case of the synthetic signal generated by exciting an all-pole filter with the differentiated glottal pulses of Fig. 2(a). The overall SNR of the speech signal is 5 dB. Note that the periodicity cannot be immediately seen from the noise corrupted speech signal. Since it is a synthetic case, the strength of excitation can be approximated to the amplitude of the second derivative of the glottal pulse shown in Fig. 2(b). Fig. 3(c) shows the actual instants of significant excitation. Fig. 3(d) shows the instants of significant excitation estimated from the noisy speech signal. The figure shows that the accuracy of the extracted instants depends on the excitation signal-to-noise ratio. Reliability of the extracted instants decreases with a decrease in the excitation SNR, as can be seen from the deviation of the instants in Fig. 3(d) relative to the instants in Fig. 3(c). The excitation SNR $\rho$ is defined as the ratio of the square of the amplitude of the impulse and the noise power. Note that even though the average SNR of the speech signal is nearly constant, i.e., 5 dB, the excitation SNR is decreasing from left to right on the time scale.

B. Robustness Against Echo and Reverberation

Let us consider the following reverberant signal $x(n)$ for an impulse of strength $A$ and delayed by $n_0$ samples.
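$$x(n) = A\sum_{k=0}^{\infty}\alpha^k\,\delta(n - n_0 - k n_r) \tag{26}$$

where $\alpha$ is the attenuation factor ($0 < \alpha < 1$) and $n_r$ is the delay due to reverberation. The Fourier transformation of (26) yields

$$|X(\omega)|\,e^{j\theta(\omega)} = \frac{A e^{-j\omega n_0}}{1 - \alpha e^{-j\omega n_r}} \tag{27}$$

where $|X(\omega)|$ and $\theta(\omega)$ are the magnitude and phase of the Fourier transform of $x(n)$, respectively. Taking natural logarithm on both sides of (27), we get [20]

$$\ln|X(\omega)| + j\theta(\omega) = \ln A - j\omega n_0 - \ln\!\left(1 - \alpha e^{-j\omega n_r}\right). \tag{28}$$

Neglecting the higher order terms in the Taylor series expansion of the last term above, the phase component is given by

$$\theta(\omega) \approx -\omega n_0 - \alpha\sin(\omega n_r). \tag{29}$$

The group-delay is

$$\tau(\omega) = -\frac{d\theta(\omega)}{d\omega} \approx n_0 + \alpha n_r\cos(\omega n_r). \tag{30}$$

The mean value of the group-delay $\bar{\tau}$ is

$$\bar{\tau} = n_0. \tag{31}$$

For a single echo, the term $-\ln(1 - \alpha e^{-j\omega n_r})$ in (28) can be replaced by $\ln(1 + \alpha e^{-j\omega n_r})$. The expression for the phase is the same as in (29) and hence the group-delay for the case of echo is the same as in (30). It should be noted that the above analysis is valid only under mild echo and reverberant conditions ($\alpha \ll 1$). We have also assumed that the signal characteristics are stationary. Due to nonstationarity of speech signals, the model of reverberation in (26) may not be valid in real situations.

C. Robustness Due to Weighting of the LP Residual Signal

In this section, we show that suitable weighting of the LP residual signal improves the robustness of the algorithm for extraction of the instants of significant excitation. This is because the excitation SNR $\rho$ can be improved by weighting, as shown below. Consider the impulse-in-noise sequence $x(n)$ in (10). Let $w(n)$ be a positive window function such that $0 < w(n) \le 1$. Let $x_w(n) = w(n)x(n)$ be the weighted excitation signal, such that the impulse at $n_0$ is given the maximal weight of $w(n_0) = 1$. By following the steps in the analysis presented in Section IV-A, we have

$$\theta_w(\omega) \approx -\omega n_0 + \frac{|U_w(\omega)|}{A}\sin\big(\phi_w(\omega) + \omega n_0\big) \tag{32}$$

where

$$U_w(\omega) = |U_w(\omega)|\,e^{j\phi_w(\omega)} = \sum_{n=0}^{N-1} w(n)\,u(n)\,e^{-j\omega n} \tag{33}$$

and $\theta_w(\omega)$ is the phase of the Fourier transform of the weighted excitation sequence $x_w(n)$. The approximation in (32) is justified provided that $|U_w(\omega)| < A$. Assuming that $u(n)$ are zero-mean Gaussian random variables with variance $\sigma_u^2$, we have from (33)

$$\left\langle |U_w(\omega)|^2 \right\rangle = \sigma_u^2\,E_w \tag{34}$$

where

$$E_w = \sum_{n=0}^{N-1} w^2(n). \tag{35}$$

Following the steps in the analysis presented in Appendix B, we define the weighted excitation SNR as

$$\rho_w = 10\log_{10}\frac{A^2}{\sigma_u^2 E_w}\ \text{dB}. \tag{36}$$

Using (15), we get

$$\rho_w = \rho + 10\log_{10}\frac{N}{E_w}\ \text{dB}. \tag{37}$$

Note that for the case without weighting of the LP residual, $w(n) = 1$ for all $n$. Therefore, from (35) and (37), $\rho_w = \rho$. For any other window function with a broad peak around the location of the impulse, i.e., $E_w < N$, we have $\rho_w > \rho$. Thus, there is some gain in the excitation SNR. For the limiting case of a weight function with a narrow peak at $n_0$, the gain in the excitation SNR tends to $10\log_{10}N$ dB.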
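The behaviour analyzed in Section IV can be checked numerically on a single impulse-in-noise frame. The sketch below is illustrative only and is not the authors' implementation. It assumes a direct $O(N^2)$ DFT, the real/imaginary-part group-delay formula of Section II, a three-point median filter, and a weight function built from the smoothing and short-time-energy recipe of Section II with the 8- and 15-sample lengths quoted there. All function names are hypothetical.

```python
import cmath
import math
import random


def dft(x):
    """Direct O(N^2) DFT; adequate for the short frames used here."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def group_delay(x):
    """Per-bin group delay (X_R*Y_R + X_I*Y_I) / |X|^2, with Y the DFT of n*x(n)."""
    X = dft(x)
    Y = dft([t * v for t, v in enumerate(x)])
    tau = []
    for xk, yk in zip(X, Y):
        power = xk.real ** 2 + xk.imag ** 2
        tau.append((xk.real * yk.real + xk.imag * yk.imag) / power
                   if power > 1e-12 else 0.0)
    return tau


def median3(v):
    """Three-point median smoothing; the two end points are left unchanged."""
    out = list(v)
    for i in range(1, len(v) - 1):
        out[i] = sorted(v[i - 1:i + 2])[1]
    return out


def estimate_location(x):
    """Average of the median-smoothed group delay, i.e. the estimated impulse position."""
    tau = median3(group_delay(x))
    return sum(tau) / len(tau)


def hamming(length):
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1)) for i in range(length)]


def weight_function(x, smooth_len=8, energy_len=15):
    """Smooth with a Hamming window, take short-time energy, normalize to a maximum of one."""
    n = len(x)
    h = hamming(smooth_len)
    c = smooth_len // 2
    smoothed = [sum(h[j] * x[t + c - j] for j in range(smooth_len) if 0 <= t + c - j < n)
                for t in range(n)]
    half = energy_len // 2
    energy = [sum(smoothed[m] ** 2 for m in range(max(0, t - half), min(n, t + half + 1)))
              for t in range(n)]
    peak = max(energy) or 1.0  # guard against an all-zero frame
    return [e / peak for e in energy]


def estimate_location_weighted(x):
    """Same estimate, computed on the frame after applying the weight function."""
    w = weight_function(x)
    return estimate_location([wi * xi for wi, xi in zip(w, x)])


def impulse_in_noise(n, n0, amplitude, sigma, rng):
    """Impulse of given amplitude at n0 plus white Gaussian noise of std sigma."""
    x = [rng.gauss(0.0, sigma) for _ in range(n)]
    x[n0] += amplitude
    return x


# Usage: compare the plain and weighted estimates on one noisy frame.
frame = impulse_in_noise(64, 20, 1.0, 0.05, random.Random(7))
plain, weighted = estimate_location(frame), estimate_location_weighted(frame)
```

For a noise-free frame both estimators return the impulse position, since the group delay is then constant across bins. On noisy frames, repeating the comparison over many noise realizations gives a rough empirical check of the gain predicted for weighting. A single frame proves nothing either way.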
Fig. 4. (a) Clean speech for the utterance /dzua/. (b) Strengths of excitation based on the Frobenius norm. (c) Speech degraded by ambient noise. (d) Significant instants derived from the signal in (c). (e) Telephone speech. (f) Significant instants derived from the signal in (e).

V. PERFORMANCE EVALUATION OF THE GROUP-DELAY-BASED METHOD

In this section, we consider some examples of speech data under actual conditions of degradation, and examine the performance of the group-delay-based method for extraction of the instants of significant excitation. Since we do not have a method for estimating the SNR of the strength of excitation for signals with natural degradations, the results can only be interpreted from our a priori knowledge of the characteristics of the excitation for different categories of sounds. Wherever appropriate, the Frobenius norm of the differenced speech signal can be used as a measure of the strength of excitation.

Fig. 4 shows the performance of the algorithm for noise and telephone channel degradations for the segment of speech given in Fig. 1(a). The strengths of excitation at the extracted instants computed using the Frobenius norm are shown in Fig. 4(b). For this signal, the strength of excitation is lower for the segment /u/ in the region BC compared to the region CD. The noisy speech signal in Fig. 4(c) corresponds to the same speech as in Fig. 4(a), but recorded by a microphone placed 50 cm away from the speaker. The signal in the region AB is affected by the additive noise more than the signal in the region CD due to lower signal amplitudes. Hence the instants extracted for the signal in region AB are not reliable. Most of the extracted instants [Fig. 4(d)] for the signal in the region BC are correct, even though in Fig. 4(c) there appears to be no visible periodicity in the signal in the region BC. From Fig. 4(b) and (d), it can be seen that the instants are correctly extracted for the signal in the region CD. The results are similar for the case of telephone speech shown in Fig. 4(e) and (f). In the telephone speech shown in Fig. 4(e), the signal in the region AB is lost and it is significantly attenuated in the region BC. This is because the low first formant of the vowel /u/
is severely attenuated due to the bandpass nature of the telephone channel characteristics. The errors in the region AB are due to low levels of the signal itself in that region. It is important to note that although the signal level is high in the region BC for the clean speech, the strength of excitation is low for the instants in that region. Hence, the extracted instants in this region are more prone to errors compared to the extracted instants in the region CD.

A systematic investigation was carried out to study the accuracy of the extracted instants for synthetic and natural vowels. Histograms of the spread of the errors are shown in Figs. 5 and 6 for five synthetic and natural vowels /a/, /e/, /i/, /o/, and /u/, respectively, for an overall SNR of 10 dB. All the synthetic vowels are generated by the same LF-model-based differentiated glottal pulses. The length of each pulse was chosen to be 80 samples. In the case of the natural vowels, the glottal cycle duration varied from 9 ms to 7 ms across the vowels. In Fig. 5, the histogram for each synthetic vowel is obtained by computing the histogram of deviations of the estimated instants of significant excitation from the true locations for 50 realizations of noise. There are ten glottal cycles in the signal for each vowel and hence we get 500 such deviations for each vowel. In Fig. 6, the deviations are obtained by subtracting the estimated locations from the locations extracted from the clean speech signal. Larger spread of the histograms indicates larger deviation of the extracted instants from the true locations of the instants. The errors are typically larger for the close vowels /i/ and /u/ than for the more open vowels. For the synthetic case shown in Fig. 5, all the instants have the same strength and hence the spread of errors is less compared to the case of natural vowels. It is important to note that the variation in the spread of the errors for different vowels is also due to the artifacts of the LP analysis. For the synthetic case shown in Fig. 5, the spread is larger for the close vowels /i/ and /u/, despite the excitation strength being the same for all the five vowels, because of the dominance of the first formant in the LP analysis of the noise-corrupted signals for these close vowels. This is also true in the case of natural vowels shown in Fig. 6. There is a systematic bias in the estimated locations of the instants of excitation for
the case of synthetic vowels. The bias is about two samples for the average glottal cycle length of 80 samples. That is, the bias is about 3%. The bias may have been caused due to weighting the LP residual signal before computing the instants of excitation. The weight function depends on the nature of the voiced sound, and the extent of degradation caused by noise. That is why the bias is positive in some cases and negative in some other cases.

Fig. 5. Histogram of errors in the estimated instants for five synthetic vowels for SNR = 10 dB. (a) /a/, (b) /e/, (c) /i/, (d) /o/, (e) /u/.

Errors in the extracted instants were also studied for utterances taken from the standard NTIMIT [21], [22] data for male and female speech. Since the TIMIT [23] data was available for reference, the spread was estimated using the deviations of the extracted instants for the NTIMIT data from those for the TIMIT data. The TIMIT and NTIMIT data taken for study were lowpass filtered and downsampled to 8 kHz before processing. The TIMIT and NTIMIT data was first time-aligned before the deviations were computed. The histograms of errors for one male voice and one female voice are shown in Figs. 7 and 8. The data for the male voice corresponds to a file in the TIMIT/NTIMIT
database. The data for the female voice likewise corresponds to a file in this database. The instants of significant excitation were extracted only from the voiced regions, which were identified using the phonetic transcription files provided with the TIMIT database.

Fig. 6. Histogram of errors in the estimated instants for five natural vowels for SNR = 10 dB. (a) /a/, (b) /e/, (c) /i/, (d) /o/, (e) /u/.

Fig. 7. Histogram of errors for the utterance "She had your dark suit in greasy wash water all year" uttered by a male speaker.

From Figs. 7 and 8, it can be seen that there are more values of deviation in the histogram of deviations for female speech than for the male speech. This is because the average pitch of the female speaker is about 210 Hz and that of the male speaker is about 100 Hz. So there are more glottal cycles in the utterance of the female speaker than for the male speaker. The spread of errors is larger for these utterances compared to the errors for the vowels in Fig. 6, because the SNR is different for different segments in this case, whereas for vowels it was constant. The speech SNR varies over a range of 20–50 dB for the utterances taken from the TIMIT data and over a range of 5–40 dB for the utterances taken from the NTIMIT data for both male and female voices. The SNR for different segments was computed as the ratio of the energy of the signal samples to the energy of the noise samples in the silence regions. The bias and spread
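of the errors in Figs. 7 and 8 can be attributed not only to the variations of SNR for different segments, but also to the weight function used on the LP residual signal before computing the instants of excitation.

Fig. 8. Histogram of errors for the utterance "She had your dark suit in greasy wash water all year" uttered by a female speaker.

VI. CONCLUSION

In this paper, we have demonstrated that the group-delay-based method proposed in [9] and [10] is indeed robust against degradations in speech due to additive noise and channel distortion. The robustness is due to the fact that the energy of the signal is concentrated around the instant of significant excitation, which for voiced speech corresponds to the instant around glottal closure. We have discussed the importance of the strength of excitation, which cannot be directly inferred from the speech signal. We have shown that the errors in the extracted instants are small for many practical signals such as in the NTIMIT speech data.

APPENDIX A
BOUNDS ON THE RAYLEIGH QUOTIENT

Let the singular value decomposition (SVD) [15] of $\mathbf{S}$ be

$$\mathbf{S} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T \tag{A.1}$$

where the columns of $\mathbf{U}$ and $\mathbf{V}$ are the left and right singular vectors of $\mathbf{S}$, respectively. $\boldsymbol{\Sigma}$ is the matrix of singular values $\sigma_i$. Therefore

$$\mathbf{S}^T\mathbf{S} = \mathbf{V}\boldsymbol{\Sigma}^T\boldsymbol{\Sigma}\mathbf{V}^T. \tag{A.2}$$

So $\sigma_i^2$ are the eigenvalues of $\mathbf{S}^T\mathbf{S}$ and the columns $\mathbf{v}_i$ of $\mathbf{V}$ are its eigenvectors. The Rayleigh quotient of $\mathbf{S}^T\mathbf{S}$ is defined as [14]

$$R(\mathbf{a}) = \frac{\mathbf{a}^T\mathbf{S}^T\mathbf{S}\,\mathbf{a}}{\mathbf{a}^T\mathbf{a}} \tag{A.3}$$

where $\mathbf{a} \in \mathbb{R}^{p+1}$, $\mathbf{a} \ne \mathbf{0}$. Assuming that the eigenvalues of $\mathbf{S}^T\mathbf{S}$ are all distinct, its eigenvectors form an orthonormal basis in $\mathbb{R}^{p+1}$. Hence, $\mathbf{a}$ can be expressed as

$$\mathbf{a} = \sum_{i=1}^{p+1} c_i\,\mathbf{v}_i \tag{A.4}$$

where $c_i$ are the components of $\mathbf{a}$ w.r.t. the basis vectors $\mathbf{v}_i$. Premultiplying both sides of (A.4) by $\mathbf{S}^T\mathbf{S}$ and noting that $\sigma_i^2$ and $\mathbf{v}_i$ are its eigenvalues and eigenvectors, respectively, we have

$$\mathbf{S}^T\mathbf{S}\,\mathbf{a} = \sum_{i=1}^{p+1} c_i\,\sigma_i^2\,\mathbf{v}_i. \tag{A.5}$$

Premultiplying (A.5) by $\mathbf{a}^T$ and noting that the eigenvectors form an orthonormal set, we have

$$\mathbf{a}^T\mathbf{S}^T\mathbf{S}\,\mathbf{a} = \sum_{i=1}^{p+1} c_i^2\,\sigma_i^2. \tag{A.6}$$

From (A.3), (A.4), and (A.6), we have

$$R(\mathbf{a}) = \frac{\sum_{i} c_i^2\,\sigma_i^2}{\sum_{i} c_i^2}. \tag{A.7}$$

From (A.7), it is clear that

$$\sigma_{\min}^2 \le R(\mathbf{a}) \le \sigma_{\max}^2 \tag{A.8}$$

i.e., the Rayleigh quotient is bounded by the extreme eigenvalues of $\mathbf{S}^T\mathbf{S}$.

Fig. 9. Algorithm for determination of instants of significant excitation.

APPENDIX B
EXCITATION SIGNAL-TO-NOISE RATIO

For the zero-mean Gaussian distributed random variables $u(n)$, the Fourier transform $U(\omega)$ is a complex zero-mean Gaussian random variable. Therefore we have

$$\left\langle |U(\omega)|^2 \right\rangle = N\sigma_u^2. \tag{B.1}$$

Since the square of the mean is always less than or equal to the second moment, i.e.,

$$\left\langle |U(\omega)| \right\rangle^2 \le \left\langle |U(\omega)|^2 \right\rangle \tag{B.2}$$

we have

$$\left\langle |U(\omega)| \right\rangle \le \sqrt{N}\,\sigma_u. \tag{B.3}$$

Hence

$$\left\langle \frac{|U(\omega)|}{A} \right\rangle \le \frac{\sqrt{N}\,\sigma_u}{A} = 10^{-\rho/20} \tag{B.4}$$

where $\rho$ is the excitation SNR:

$$\rho = 10\log_{10}\frac{A^2}{N\sigma_u^2}\ \text{dB}. \tag{B.5}$$

Let us consider an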
$N$-point discrete Fourier transform (DFT) of the sequence given in (10), computed at $\omega_k = 2\pi k/N$, $k = 0, 1, \ldots, N-1$. It can be shown [24] that the real and imaginary parts of the DFT of $u(n)$, $U_R(k)$ and $U_I(k)$, are (real) independent identically distributed (i.i.d.) Gaussian random variables for $k \ne 0, N/2$. Therefore, the mean and variance of $U_R(k)$ and $U_I(k)$ are zero and $N\sigma_u^2/2$, respectively. Under these conditions the magnitude of the DFT of $u(n)$ is Rayleigh distributed [24]. Since we have the knowledge of both the mean and variance, we get

$$\left\langle \frac{|U(k)|}{A} \right\rangle = \frac{\sqrt{\pi N}\,\sigma_u}{2A} \approx 0.886 \times 10^{-\rho/20} \tag{B.6}$$

which is indeed close to the upper bound $10^{-\rho/20}$ given in (B.4) above. From the cumulative distribution function of a Rayleigh distribution [25], we may write

$$P = 1 - \exp\!\left(-\frac{r^2}{N\sigma_u^2}\right) \tag{B.7}$$

where $P$ is the probability that $|U(k)|$ is less than $r$. From (B.7), we note that $|U(k)| < A$ with more than 99% confidence, when $\rho \ge 6.6$ dB.

ACKNOWLEDGMENT

The authors would like to thank Dr. H. A. Murthy for providing the data required for some of the studies in this paper, and the three anonymous reviewers for their critical comments, which greatly helped improve the presentation of the paper.

REFERENCES

[1] K. S. Nathan, Y.-T. Lee, and H. F. Silverman, "A time-varying analysis method for rapid transitions in speech," IEEE Trans. Signal Processing, vol. 39, pp. 815–824, Apr. 1991.
[2] A. K. Krishnamurthy, "Glottal source estimation using a sum-of-exponentials model," IEEE Trans. Signal Processing, vol. 40, pp. 682–686, Mar. 1992.
[3] C. Hamon, E. Moulines, and F. J. Charpentier, "A diphone synthesis system based on time domain prosodic modifications of speech," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Glasgow, U.K., May 1989, pp. 238–241.
[4] T. V. Ananthapadmanabha and B. Yegnanarayana, "Epoch extraction from linear prediction residual for identification of closed glottis interval," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-27, pp. 309–319, Aug. 1979.
[5] H. W. Strube, "Determination of the instant of glottal closure," J. Acoust. Soc. Amer., vol. 56, pp. 1625–1629, 1974.
[6] T. V. Ananthapadmanabha and B. Yegnanarayana, "Epoch extraction of voiced speech," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-23, pp. 562–570, Dec. 1975.
[7] Y. M. Cheng and D. O'Shaughnessy, "Automatic and reliable estimation of glottal closure instant and period," IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, pp. 1805–1814, Dec. 1989.
[8] C. Ma, Y. Kamp, and L. F. Willems, "A Frobenius norm approach to glottal closure detection from the speech signal," IEEE Trans. Speech, Audio Processing, vol. 2, pp. 258–265, Apr. 1994.
[9] R. Smits and B. Yegnanarayana, "Determination of instants of significant excitation in speech using group delay functions," IEEE Trans. Speech, Audio Processing, vol. 3, pp. 325–333, Sept. 1995.
[10] B. Yegnanarayana and R. Smits, "A robust method for determining instants of major excitations in voiced speech," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Detroit, MI, May 1995, pp. 776–779.
[11] E. A. Robinson, T. S. Durrani, and L. G. Peardon, Geophysical Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1986.
[12] J. Makhoul, "Linear prediction: A tutorial review," Proc. IEEE, vol. 63, pp. 561–580, Apr. 1975.
[13] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1975.
[14] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, MD: Johns Hopkins Univ. Press, 1983.
[15] S. J. Leon, Linear Algebra with Applications. New York: Macmillan.
[16] G. Fant, "Glottal flow: Models and interaction," J. Phonet., vol. 14, pp. 393–399, Oct.–Dec. 1986.
[17] X. Li and N. M. Bilgutay, "Wiener filter realization for target detection using group delay statistics," IEEE Trans. Signal Processing, vol. 41, pp. 2067–2074, June 1993.
[18] B. Yegnanarayana and H. A. Murthy, "Significance of group delay functions in spectrum estimation," IEEE Trans. Signal Processing, vol. 40, pp. 2281–2289, Sept. 1992.
[19] S. M. Kay, Modern Spectral Estimation: Theory and Application. Englewood Cliffs, NJ: Prentice-Hall, 1988.
[20] R. C. Kemerait and D. G. Childers, "Signal detection and extraction by cepstrum techniques," IEEE Trans. Inform. Theory, vol. IT-18, pp. 745–759, Nov. 1972.
[21] C. Jankowski, A. Kalyanswamy, S. Basson, and J. Spitz, "NTIMIT: A phonetically balanced, continuous speech, telephone bandwidth speech database," in Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing, vol. 1, Albuquerque, NM, Apr. 1990, pp. 109–112.
[22] C. Jankowski, "The NTIMIT speech database," from documentation accompanying the NTIMIT CD-ROM, Nynex Sci. Technol. Ctr., White Plains, NY, Jan. 1991.
[23] W. M. Fisher, G. R. Doddington, and K. M. Goudie-Marshall, "The DARPA speech recognition research database: Specifications and status," in Proc. DARPA Workshop on Speech Recognition, Feb. 1986.
[24] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation. Englewood Cliffs, NJ: Prentice-Hall, 1993.

P. Satyanarayana Murthy was born in Kakinada, India, in 1971. He received the B.E. degree in electronics and communication engineering from Chaitanya Bharathi Institute of Technology, Osmania University, Hyderabad, and the M.Tech. and Ph.D. degrees in electrical engineering from the Indian Institute of Technology (IIT), Madras, in 1994 and 1999, respectively. From January to July 1994, he was a Senior Project Officer in the Department of Computer Science and Engineering, IIT. He is currently a Manager with Speech and Software Technologies, Madras. His research interest is in speech signal processing.

B. Yegnanarayana (M'78–SM'84) was born in India on January 9, 1944. He received the B.E., M.E., and Ph.D. degrees in electrical communication engineering from the Indian Institute of Science, Bangalore, India, in 1964, 1966, and 1974, respectively. He was a Lecturer from 1966 to 1974 and an Assistant Professor from 1974 to 1978 in the Department of Electrical Communication Engineering, Indian Institute of Science. From 1966 to 1971, he was engaged in the development of environmental test facilities for the Acoustic Laboratory, Indian Institute of Science. From 1977 to 1980, he was a visiting Associate Professor of computer science at Carnegie Mellon University, Pittsburgh, PA. He was a Visiting Scientist at ISRO Satellite Center, Bangalore, from July to December 1980. Since 1980, he has been a Professor in the Department of Computer Science and Engineering, Indian Institute of Technology, Madras. He was a Visiting Professor at the Institute for Perception Research, Eindhoven Technical University, Eindhoven, The Netherlands, from July 1994 to January 1995. Since 1972, he has been working on problems in the area of speech signal processing. He is presently engaged in research activities in digital signal processing, speech recognition, and neural networks. Dr. Yegnanarayana is a member of the Computer Society of India, a Fellow of the Institution of Electronics and Telecommunications Engineers of India, a Fellow of the Indian National Science Academy, and a Fellow of the Indian National Academy of Engineering.