Electronic mail: loizou@utdallas.edu
J. Acoust. Soc. Am., August 2007. © 2007 Acoustical Society of America. 1165/1165/8/$23.00

...target signal. Several studies (Roman et al., 2003; Roman and Wang, 2006; Cooke, 2006; Brungart et al., 2006; Anzalone et al., 2006) have attempted to answer these questions and demonstrated that speech synthesized from the ideal binary mask is highly intelligible even when extracted from multi-source mixtures (Roman et al., 2003) or under reverberant conditions (Roman and Wang, 2006). The ideal binary “mask” takes values of 0 and 1, and is constructed by comparing the local SNR in each time-frequency unit against a threshold (e.g., 0 dB). The ideal mask is commonly applied to the time-frequency representation of a mixture signal and eliminates portions of a signal (those assigned to a “0” value) while allowing others (those assigned to a “1” value) to pass through intact.

Roman et al. (2003) assessed the performance of an algorithm that used location cues and an ideal time-frequency binary mask to synthesize speech. Large improvements in intelligibility were obtained from partial spectro-temporal information extracted from the ideal time-frequency mask. Similar findings were also reported by Brungart et al. (2006), for a range of SNR thresholds (12 to 0 dB) used for constructing the ideal binary mask. A different method for constructing the ideal binary mask was used by Anzalone et al. (2006), based on comparisons of the speech energy detected in various bands against a preset threshold. The threshold value was chosen such that a fixed percentage of the total energy contained in the entire stimulus was above this threshold. Results with the ideal speech energy detector indicated significant reductions in speech reception thresholds (SRTs) for both normal-hearing and hearing-impaired listeners. Cooke (2006) used a computational model of glimpsing along with behavioral data collected from normal-hearing listeners on a consonant identification task. Several different glimpsing models were tested, differing in the local SNR used for detection, the minimum glimpse size, and the use of information in the masked regions. Close fits to listeners' performance on a consonant task were obtained with local SNR thresholds in the range of 2 to 8 dB.

The ideal time-frequency mask used in the above intelligibility studies for synthesizing speech makes the implicit assumption that all units falling below a prescribed SNR (e.g., 0 dB) are not detectable and should therefore be eliminated. While this assumption is valid in situations wherein there is little or no spectral overlap between the masker and the target signal in individual units, it is not valid for speech babble or other broadband types of maskers where there exists a great deal of spectral overlap between the masker and the target. It is very likely that the masker has enough energy to distort the signal, but not to the point that it makes the target signal undetectable. Nonsimultaneous masking effects, for instance, are not taken into account when zeroing out the units falling below the SNR threshold. Furthermore, it is known from intelligibility studies (Drullman, 1995) that the weak elements of speech lying below the noise level do contribute to some extent (up to 4 dB) to intelligibility and should therefore be preserved.

A different approach is taken in this paper to address the above limitations of using the ideal binary mask as a tool to study speech segregation or auditory scene analysis. In the proposed approach, rather than eliminating completely any unit falling below the SNR threshold, we consider retaining those units. The mask is no longer binary but takes real values. Speech is synthesized by retaining all units falling below the local SNR threshold while carefully controlling the duration and frequency region of the units above the SNR threshold. The synthesized stimuli better approximate the acoustic stimuli encountered by normal-hearing listeners in a real-world noisy scenario. Under this framework, the present study aims to answer the question of what is a useful glimpse and to examine the various factors that could potentially influence glimpsing in noise.

The total duration of glimpsing is one of many factors hypothesized to influence performance. In most CASA-based methods, it is assumed that glimpsing opportunities are available throughout the utterance. In practice, only a portion of the signal might be glimpsed, which in turn raises the question: What is the minimum duration of glimpsing required to achieve high levels of performance? An experiment is conducted in the present study to answer this question. In the study by Miller and Licklider (1950), 50% of the stimulus was uninterrupted and available for glimpsing, with performance steadily improving as the total duration increased. Listeners, however, had access to the full spectrum during the uninterrupted portions of speech, an assumption that generally does not hold in a complex listening situation. Only a portion of the spectrum is typically available to listeners for glimpsing in noisy environments, depending on the temporal/spectral characteristics of the masker. This, in turn, raises another question: What is the influence of the location and/or width of the frequency region that is available for glimpsing? Clearly, the glimpse window width (i.e., glimpse window duration) will affect the answer to this question, and for that reason we examine systematically in experiment 1 the influence of glimpse window width for different frequency regions of glimpsing. Previous studies showed that listeners can exploit glimpse window widths lasting as long as a phoneme for sentence/word recognition tasks (e.g., Miller and Licklider, 1950), and as short as 10 ms for a double-vowel identification task (Culling and Darwin, 1994). In most of these studies, however, listeners had either access to the full spectrum or disjoint segments of the spectrum (i.e., “checkerboard” noise) occurring periodically in time. These conditions might not reflect the true scenario in noisy environments faced by listeners wherein glimpsing opportunities may occur randomly in both time and frequency.

The findings from the present study have important implications for CASA and speech enhancement algorithms aiming to improve speech intelligibility. In many of the above studies, it is assumed that an ideal binary mask is available throughout the utterance and across the whole spectrum. In a practical system, the binary mask needs to be estimated from the noisy data, and that is a challenging task, particularly in adverse noisy conditions. Since it is practically impossible to compute accurately the ideal binary mask for all frames and all frequencies, it is of interest to determine at the very least the region in the spectrum that is perceptually most important and also the minimum duration of glimpsing required to synthesize highly intelligible speech.

1166 J. Acoust. Soc. Am., Vol. 122, No. 2, August 2007. N. Li and P. C. Loizou: Glimpsing in noise
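The construction of the ideal binary mask described in the introduction (compare the local SNR of each time-frequency unit against a threshold, keep the unit if it passes, zero it otherwise) can be illustrated with a short sketch. This is only a minimal illustration under stated assumptions, not the processing used in any of the cited studies: the function names `local_snr_db` and `ideal_binary_mask`, the use of `>=` at the threshold, the small `eps` guard, and the toy magnitudes are all hypothetical choices made here.

```python
import numpy as np

def local_snr_db(target_mag, masker_mag, eps=1e-12):
    """Local SNR in dB for every time-frequency unit (magnitude inputs)."""
    # eps is an assumed guard against division by zero / log of zero.
    return 20.0 * np.log10((target_mag + eps) / (masker_mag + eps))

def ideal_binary_mask(target_mag, masker_mag, threshold_db=0.0):
    """Return 1.0 where the local SNR meets the threshold, 0.0 elsewhere."""
    # Treating "equal to the threshold" as a pass is an assumption of this sketch.
    snr = local_snr_db(target_mag, masker_mag)
    return (snr >= threshold_db).astype(float)

# Toy magnitudes: 3 frequency bins x 2 frames (made-up numbers).
target = np.array([[2.0, 0.5],
                   [1.0, 1.0],
                   [0.1, 4.0]])
masker = np.array([[1.0, 1.0],
                   [1.0, 2.0],
                   [1.0, 1.0]])

mask = ideal_binary_mask(target, masker, threshold_db=0.0)

# Stand-in for the mixture representation (phases ignored for simplicity).
mixture = target + masker
masked_mixture = mask * mixture  # units assigned "0" are eliminated
```

Units assigned a "1" pass through unchanged and units assigned a "0" are removed entirely. That removal is the step the approach proposed in this paper avoids.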
... 0–1 kHz, a middle-frequency (MF) band (1–3 kHz), and a high-frequency (HF) band (above 3 kHz). These bands were chosen to assess the individual contribution of formant frequencies (F1 and F2) on glimpsing in noise. The LF band contains primarily F1 information and the MF band contains F2 information. In addition to the above three bands, we also considered a low-to-mid-frequency (LF+MF) band: 0–3 kHz. This band was included as it contains both F1 and F2 information critically important for speech recognition. For comparative purposes, we also considered the following two conditions: a condition spanning the full signal bandwidth (FF), and a condition, termed RF, in which the LF, MF and HF bands were randomly selected in each frame with equal probability.

To assess the effect of number of glimpses (i.e., the number of glimpse opportunities) on speech recognition, we created stimuli with different glimpse window widths (glimpse window durations). More specifically, we created stimuli with glimpse window widths of 20, 200, 400, and 800 ms, spanning the duration of a phoneme to a few words. The glimpse window width is defined here as the total duration of a single glimpse spanning multiple, and neighboring in time, frames of speech. For instance, a single 200-ms glimpse is composed of ten consecutive frames (20 ms each), all containing glimpse information in a prescribed frequency band. Similarly, one 400-ms glimpse is composed of 20 consecutive frames, and one 800-ms glimpse is composed of 40 consecutive frames. The total duration of all glimpses introduced over the whole utterance was fixed to 800 ms. This number was chosen as it corresponds approximately to 33% of the total duration of most sentences in the IEEE database (average duration of sentences in the IEEE corpus was 2.4 s with a standard deviation of 0.3 s). Cooke (2006) reported that speech corrupted by eight talkers contains approximately 30% glimpses based on a 3-dB SNR. Since the signal processing involved is based on spectrally modifying the masker spectrum on a frame-by-frame basis, which is 20 ms in our experiments, we chose 20 ms as the smallest window width to be evaluated. Pilot data showed that glimpse window widths between 20 and 200 ms yielded comparable performance. Given that the total duration of all glimpses across the whole utterance was fixed at 800 ms, we created stimuli that had either forty 20-ms window glimpses, four 200-ms window glimpses, two 400-ms window glimpses, or one 800-ms window glimpse. The time location of each glimpse within the utterance was selected randomly. For comparative purposes, we also constructed stimuli in which the glimpses were present throughout the whole duration of each utterance.

In summary, we created stimuli which had low-frequency glimpse information (LF), middle-frequency glimpse information (MF), high-frequency glimpse information (HF), low-to-mid frequency (LF+MF) glimpse information, randomly selected frequency information (RF), and full-bandwidth glimpse information (FF). For each of the above spectral regions, the glimpse window width was set to 20, 200, 400, 800 ms, and the whole utterance. To assess the potential gain in intelligibility introduced by glimpsing, we also included as a baseline condition the unmodified noisy sentences (5-dB SNR). Two lists of sentences (i.e., 20 sentences) were used per condition, and none of the lists were repeated across conditions.

4. Procedure

The experiments were performed in a sound-proof room (Acoustic Systems, Inc) using a PC connected to a Tucker-Davis system 3. Stimuli were played to the listeners monaurally through Sennheiser HD 250 Linear II circumaural headphones at a comfortable listening level. Prior to the test, each subject listened to a set of noisy sentences to get familiar with the testing procedure. During the test, the subjects were asked to write down the words they heard. The order of the test conditions was randomized across subjects.

B. Results and discussion

The mean scores for all conditions are shown in Fig. 3. Performance was measured in terms of percent of words identified correctly (all words were scored). The mean baseline score of the unprocessed stimuli was 25.8% correct (s.d. = 9.2%). Two-way ANOVA (repeated measures) indicated a significant effect of glimpse window width (F[4,12] = 193.9), a significant effect of frequency band location (F[5,15] = 122.9), and a significant interaction (F[20,60] = 7.75).

Post hoc tests (Fisher's LSD) were run to examine whether there were any differences in performance between the various glimpse window widths. This analysis aims to answer the question whether it is more beneficial to have multiple, but short, glimpse opportunities or few, but long, glimpse opportunities. Separate analysis was performed for each frequency band. For the LF band, and considering only glimpse window widths from 20 to 800 ms, performance peaked at 400 ms. That is, performance at 400 ms was significantly higher than performance at 20, 200, or 800 ms. A different pattern emerged for the other frequency bands. For the MF, HF, LF+MF, and RF bands, performance remained relatively flat across all glimpse window widths (20–800 ms). That is, there was no statistically significant difference in performance between the 20, 200, or 800-ms conditions.

FIG. 3. (Color online) Mean subject recognition performance as a function of glimpse window width (in ms) for different frequency bands. The “infty” condition corresponds to the condition in which the indicated frequency bands were glimpsed throughout the whole utterance. The baseline condition corresponds to the unprocessed stimuli embedded in 5-dB SNR. Error bars indicate standard errors of the mean.

When the full bandwidth (FF) was available for glimpsing, performance peaked at 20 ms. This suggests that it is more beneficial to have multiple, but short (20 ms), glimpse opportunities rather than few, but long (400–800 ms), glimpse opportunities. This finding applies only to the full-bandwidth condition, which does not reflect the realistic scenario of listening in noise. It does, however, have important implications for speech enhancement algorithms. If an enhancement algorithm improves the spectral SNR across the whole signal bandwidth, and does so for at least 33% of the utterance (which is the duration used in experiment 1), then there is a good likelihood that the algorithm will significantly improve speech intelligibility. In practice, it is extremely challenging to improve the spectral SNR at all frequencies; hence, it is more practical to look for frequency bands that perform as well or nearly as well as when glimpsing the full signal bandwidth (more on this follows).

Next, we examined the effect of frequency band location on glimpsing in noise. We were interested in knowing whether a particular frequency band offers more benefit than others in terms of intelligibility; hence, we ran protected Fisher's LSD tests on the data for a fixed glimpse-window width. Results indicated the LF+MF band performed significantly better than the other bands (LF, MF, RF) in nearly all conditions. The exception was in the 400 and 800-ms conditions, wherein performance with the LF band was not statistically different from the performance obtained with the LF+MF band. Comparison between the performance obtained with the LF+MF band and the full bandwidth condition indicated that the intelligibility scores did not differ significantly in three of the five conditions tested. More specifically, performance with the LF+MF band in the 200-ms, 400-ms, and whole utterance glimpse conditions was the same as that obtained with the FF band (whole bandwidth), and was significantly lower than the FF condition only in the 20 and 800-ms conditions. The finding that the LF+MF band condition performed the best and attained in nearly all cases the upper bound in performance (i.e., was as good as FF) is not surprising given that the LF+MF band contains F1 and F2 information critically important for speech recognition. The implication of this finding for speech enhancement and CASA applications is that in order to improve speech intelligibility it is extremely important to improve at the very least the spectral SNR in the region of 0–3 kHz (LF+MF), which is the region containing F1 and F2 information.

Finally, we assessed the gain in speech intelligibility introduced by glimpsing in the various frequency bands. This gain is assessed in reference to the baseline noisy condition (5-dB SNR). Figure 4 plots the difference in score between the scores reported in Fig. 3 and the baseline score. Protected Fisher's LSD tests were run to examine whether there were any significant differences between the scores obtained with and without glimpsing (baseline score). Asterisks in Fig. 4 indicate the presence of statistically significant differences. Results indicated that introducing glimpses in the LF band produced small, but statistically significant, improvement in performance. This outcome is consistent with the findings by Anzalone et al. (2006), who applied, in one condition, the ideal speech energy detector only to the lower frequencies (70–500 Hz). Significant reductions in SRT were obtained by both normal-hearing and hearing-impaired listeners when the ideal speech detector was applied only to the lower frequencies (Anzalone et al., 2006). Considerably larger, and significant, improvements were obtained in our study when glimpses were introduced in the LF+MF region. No significant gain in intelligibility was observed when the glimpses were introduced in the HF band in any of the conditions (20–800 ms). Also, no significant gain was observed when glimpses were introduced in the MF band (200 ms) or in the RF band (200, 400 ms). As one might expect, large improvements (50%) were observed when glimpses were introduced in all frames throughout the utterance. Performance in the RF condition was consistently poor in nearly all conditions. This suggests that it is more difficult for listeners to integrate glimpses available in different frequency regions at different times than to integrate glimpses available in the same region across time. It should be pointed out that the glimpses in the RF condition appeared randomly in time and frequency and differed in this respect from the checkerboard type of noise used in other studies (e.g., Buss et al., 2003; Howard-Jones and Rosen, 1993), which appeared periodically.

FIG. 4. (Color online) Difference in performance between that reported in Fig. 3 with glimpsed stimuli, and the baseline performance (26.8% correct). Asterisks (* for the 0.05 level, and **) indicate statistically significant differences between the performance obtained with glimpsed stimuli and baseline stimuli. The “infty” condition corresponds to the condition in which the indicated frequency bands were glimpsed throughout the whole utterance.

The local SNR threshold used for defining the glimpses in the present experiment was fixed at 0 dB, and its value can understandably influence the outcome of the experiment. Interested to know whether a different pattern of results would be obtained with different SNR threshold values, we ran a follow-up experiment in which we varied the SNR threshold from 6 to 12 dB. Five new subjects were recruited for this experiment. The same signal-processing technique described in Sec. II A 3 (see Fig. 1) was adopted to construct stimuli with glimpses available in the LF+MF band. This band was chosen as it performed nearly as well as the FF condition (full spectrum available). The glimpse window width was set to 20 ms. The procedure outlined in Sec. II A 4 was followed.

The results, plotted in terms of percent correct, are shown in Fig. 5 as a function of the local SNR threshold. ANOVA with repeated measures indicated a non-significant (F[5,20] = 1.78, p = 0.163) effect of SNR threshold on performance. Performance increased slightly, but non-significantly, as the SNR threshold increased, and remained the same for negative values of the SNR threshold. It is worth noting that the plateau in performance seen in Fig. 5 is partially consistent with that observed by Brungart et al. (2006) using the ideal binary mask. The main difference between our study and that of Brungart et al. is that in our case performance remained flat even for positive SNR thresholds, whereas in Brungart et al., performance dropped precipitously for SNR thresholds above 0 dB. This difference is attributed to the fact that in Brungart et al. units falling below the SNR threshold were zeroed out; hence, the number of retained units progressively decreased as the SNR threshold increased. In contrast, in our study all units falling below the SNR threshold were retained (see Appendix A).

FIG. 5. Mean subject recognition performance as a function of the local SNR threshold for stimuli glimpsed in the LF+MF band. Error bars indicate standard errors of the mean.

In summary, the results from the present experiment indicate that the glimpse window width as well as the SNR threshold had only a minor effect on performance. Glimpsing in noise was primarily affected by the location of the frequency band containing glimpses. High gains in intelligibility were obtained when glimpse information was available in the F1 and F2 region (0–3 kHz).

III. EXPERIMENT 2: EFFECT OF TOTAL GLIMPSE DURATION ON SPEECH INTELLIGIBILITY

In the previous experiment, we fixed the total glimpse duration to 800 ms, corresponding roughly to 33% of the total duration for most utterances in the IEEE corpus. As shown in Fig. 3, large improvements in intelligibility were observed when the total glimpsing duration increased from 33% to 100% (compare the “infty” condition against all other conditions). This suggests that the total glimpse duration can have a significant effect on intelligibility. For that reason, we examine next the effect of total glimpse duration on performance.

A. Methods

1. Subjects and material

Nine new normal-hearing listeners participated in this experiment. All subjects were native speakers of American English, and were paid for their participation. Subjects' ages ranged from 18 to 40 years, with the majority being undergraduate students from the University of Texas at Dallas. The speech material consisted of sentences taken from the IEEE database (IEEE, 1969). As in experiment 1, the sentences were corrupted by a 20-talker babble masker (Auditec CD, St. Louis) at 5-dB SNR.

2. Signal processing

The method used to introduce glimpses in the time-frequency plane was the same as that used in experiment 1 (see Fig. 1). Given the relatively weak effect of glimpse window width on performance, we set the glimpse window width to 20 ms for this experiment. Unlike experiment 1, we varied the total glimpse duration to 20%, 30%, 50%, 60%, 70%, 80%, and 100% of the whole utterance. In the 50% condition, for instance, glimpses were introduced in half of the 20-ms frames in the utterance. The time placement of the glimpses was random. Glimpses were introduced in two different bands, the LF band (0–1 kHz) and the LF+MF band (0–3 kHz). These two bands were chosen as they were found in experiment 1 to yield significant gains in intelligibility (see Fig. 4). To assess any potential gain in intelligibility introduced by glimpsing, we also included as a baseline condition the unmodified noisy sentences (5-dB SNR). Two sentence lists were used per condition, and none of the lists were repeated.

3. Procedure

The procedure was identical to that used in experiment 1.

B. Results and discussion

The mean scores for all conditions are shown in Fig. 6. Performance was measured in terms of percent of words identified correctly. Two-way ANOVA (repeated measures) indicated a significant effect of total glimpse duration (F[6,24] = 81.5), a significant effect of frequency band location (F = 269.7), and a significant interaction (F[6,24] = 16.54).

As expected, performance improved as more glimpses were introduced in both LF and LF+MF conditions. Protected Fisher's LSD tests were run to examine at which point (glimpse duration) performance reached an asymptote. Results indicated that, when the glimpses were introduced in the LF band, performance reached an asymptote at 80% of utterance duration. That is, scores obtained with 80% glimpse duration did not differ significantly (p = 0.981) from those obtained with 100% duration (i.e., whole utterance) and were significantly higher than all other conditions (below 80%). In stark contrast, analysis of the LF+MF scores indicated that the asymptote occurred when glimpsing 60% of the utterance. Performance with glimpsing 100% duration (whole utterance) did not differ significantly (p = 0.093) from that obtained with glimpsing 60% of the utterance.

FIG. 6. (Color online) Mean subject recognition performance as a function of the percentage of the utterance glimpsed for two frequency bands. Error bars indicate standard errors of the mean.

The findings of experiment 2 are in close agreement with those of Miller and Licklider (1950). Near-perfect identification was achieved when only 50% of the signal was available for glimpsing during the uninterrupted portions. In their study, the listeners had access to the full clean spectrum of the target signal during the “on” segments of the signal. In our case, listeners had access to the full noisy spectrum but only the LF+MF band was above the SNR threshold and presumably available for glimpsing. For this type of stimuli containing partially masked spectral information, listeners required at least 60% of the total duration of the utterance to obtain high levels of speech understanding.

The results from the present experiment suggest that the extent of the benefit introduced by glimpsing relies heavily on both the total duration of glimpsing and the frequency band glimpsed. This suggests that, in order for CASA and enhancement algorithms to improve speech intelligibility, glimpsing in the LF+MF band needs to occur more than 50% of the time.

IV. CONCLUSIONS

A signal processing technique (Fig. 1) was proposed that can be used as a tool for studying auditory scene analysis and speech segregation in the presence of various types of maskers. Unlike the time-frequency masks used in the previous studies (e.g., Roman et al., 2003; Brungart et al., 2006), the proposed time-frequency mask is not binary but takes real values. The present study primarily focused on identifying factors that may influence glimpsing speech in noise with the proposed time-frequency mask. Experiment 1 investigated the effect of glimpse window width and frequency location of the glimpse for a fixed duration (33% of utterance) of glimpsing. Experiment 2 investigated the effect of total glimpse duration for two frequency bands. From the results of these two experiments, we can draw the following conclusions:

(1) The frequency location of the glimpses had a significant effect on speech recognition, with the highest performance obtained for the LF+MF band and the lowest for the HF band. Performance with the LF+MF band was found to be as good as performance with the FF band in the majority of the conditions tested.

(2) The glimpse window width and SNR threshold had a relatively minor effect on performance (see Figs. 3 and 5), at least for the range of values considered.

(3) Relative to the unprocessed stimuli (5-dB SNR), small, but statistically significant, improvements in intelligibility were obtained when the glimpses were available in the LF band, and comparatively larger improvements were obtained when the glimpses were available in the LF+MF band containing F1 and F2 information.

(4) Listeners were able to integrate glimpsed information more easily when the glimpses were consistently taken from the same frequency region over time. Performance with the RF band (randomly chosen bands) was significantly lower than performance obtained with the other frequency bands.

(5) The total glimpse duration had the strongest effect on performance. High levels of speech understanding were obtained when more than 60% of the utterance duration was glimpsed in the LF+MF band, at least for the multitalker babble considered in this study. Relative to the unprocessed sentences (5-dB SNR), this corresponds to an improvement of 64 percentage points (from 26% to 90%).

The above results have strong implications for speech enhancement and CASA algorithms aiming to improve intelligibility of speech embedded in multitalker babble. For these algorithms to improve speech intelligibility, it is extremely important to improve the spectral SNR in the region of 0–3 kHz (LF+MF band), which is the region containing F1 and F2 information. Furthermore, it is not necessary to improve the spectral SNR in all frames (i.e., whole utterance) but in at least 60% of the utterance.

This research was supported by Grant No. R01 DC007527 from the National Institute of Deafness and other Communication Disorders, NIH.

APPENDIX A: A TECHNIQUE FOR INTRODUCING GLIMPSES

In this appendix, we describe the signal processing technique used for modifying the masker magnitude spectra to obtain glimpses in specific regions of the spectrum. We start by expressing the noisy speech spectrum in the frequency domain as follows:

$$Y(k,t) = X(k,t) + D(k,t), \tag{A1}$$

where $Y(k,t)$, $X(k,t)$, and $D(k,t)$ are the complex FFT spectra of the noisy speech, clean speech, and masker, respectively, obtained at time (frame) $t$ and frequency bin $k$. In our case, multitalker babble was added in experiment 1 to the speech signal at 5-dB SNR. The spectral SNR in time-frequency unit $(k,t)$ is given by

$$\mathrm{SNR}(k,t) = 10 \log_{10} \frac{|X(k,t)|^2}{|D(k,t)|^2}, \tag{A2}$$

where $|\cdot|$ indicates the magnitude spectrum. For all $k$ falling within the prescribed frequency region (i.e., the glimpse region), the spectral SNR in time-frequency unit $(k,t)$ is compared against a threshold, $T$, and the masker magnitude spectrum is modified accordingly if $\mathrm{SNR}(k,t) < T$ or left unaltered if $\mathrm{SNR}(k,t) \ge T$. More precisely,

$$\text{if } \mathrm{SNR}(k,t) < T \text{ then } |\hat{D}(k,t)| = \alpha\,|D(k,t)| \text{ end}, \tag{A3}$$

where $|\hat{D}(k,t)|$ is the modified masker spectrum, $\alpha$ is given by

$$\alpha = \frac{|X(k,t)|}{|D(k,t)| \cdot 10^{T/20}}, \tag{A4}$$

and $T$ is the SNR threshold given in decibels. In experiment 1, $T$ was set to 0 dB. The operation described in Eq. (A3) is applied to all units falling within the glimpse region. For units falling outside the glimpse region, the following operation is applied to ensure that the spectral SNR of the remaining target units is below the SNR threshold:

$$\text{if } \mathrm{SNR}(k,t) \ge T \text{ then } |\hat{D}(k,t)| = \alpha\,|D(k,t)| \text{ end}, \tag{A5}$$

where $\alpha$ is given by Eq. (A4). The two types of scaling done to the masker spectrum by Eq. (A3) and Eq. (A5) ensure that only the prescribed frequency band contains glimpsing information. After applying Eq. (A3) to all units inside the glimpse region and Eq. (A5) to all units outside the glimpse region, we reconstruct the noisy speech in frame $t$ by taking the inverse Fourier transform of the modified noisy spectrum, $\hat{Y}(k,t) = X(k,t) + \hat{D}(k,t)$.

Note that we cannot directly compare the outcome obtained in the HF condition in the present study with that obtained by Anzalone et al. (2006). This is because the high-frequency condition tested in the study by Anzalone et al. included all frequencies above 1.5 kHz, whereas in the present study the HF condition included all frequencies above 3 kHz.

Anzalone, M., Calandruccio, L., Doherty, K., and Carney, L. (2006). “Determination of the potential benefit of time-frequency gain manipulation,” Ear Hear., 480–492.
Brungart, D., Chang, P., Simpson, B., and Wang, D. (2006). “Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation,” J. Acoust. Soc. Am., 4007–4018.
Buss, E., Hall, J. W., and Grose, J. H. (2003). “Spectral integration of synchronous and asynchronous cues to consonant identification,” J. Acoust. Soc. Am. 115, 2278–2285.
Cooke, M. P., Green, P. D., and Crawford, M. D. “Handling missing data in speech recognition,” Proc. 3rd Int. Conf. Spok. Lang. Proc.
Cooke, M. “Glimpsing speech,” J. Phonetics, 579–584.
Cooke, M. “Making sense of everyday speech: A glimpsing account,” in Speech Separation by Humans and Machines, edited by P. Divenyi (Kluwer Academic, Dordrecht), pp. 305–314.
Cooke, M. P., Green, P. D., Josifovski, L., and Vizinho, A. “Robust automatic speech recognition with missing and uncertain acoustic data,” Speech Commun., 267–285.
Cooke, M. P. (2006). “A glimpse model of speech perception in noise,” J. Acoust. Soc. Am. 119, 1562–1573.
Culling, J., and Darwin, C. (1994). “Perceptual and computational separation of simultaneous vowels: Cues arising from low frequency beating,” J. Acoust. Soc. Am., 1559–1569.
Drullman, R. (1995). “Speech intelligibility in noise: Relative contribution of speech elements above and below the noise level,” J. Acoust. Soc. Am., 1796–1798.
Festen, J., and Plomp, R. “Effects of fluctuating noise and interfering speech on the speech-reception threshold for impaired and normal hearing,” J. Acoust. Soc. Am., 1725–1736.
Howard-Jones, P. A., and Rosen, S. (1993). “Uncomodulated glimpsing in ‘checkerboard’ noise,” J. Acoust. Soc. Am., 2915–2922.
IEEE (1969). “IEEE Recommended Practice for Speech Quality Measurements,” IEEE Trans. Audio Electroacoust., 225–246.
Loizou, P. Speech Enhancement: Theory and Practice (CRC Press, Taylor Francis Group, Boca Raton, FL).
Miller, G. “The masking of speech,” Psychol. Bull., 105–129.
Miller, G. A., and Licklider, J. C. R. (1950). “The intelligibility of interrupted speech,” J. Acoust. Soc. Am., 167–173.
Roman, N., Wang, D., and Brown, G. (2003). “Speech segregation based on sound localization,” J. Acoust. Soc. Am. 114, 2236–2252.
Roman, N., and Wang, D. (2006). “Pitch-based monaural segregation of reverberant speech,” J. Acoust. Soc. Am., 458–469.
Wang, D. “On ideal binary mask as the computational goal of auditory scene analysis,” in Speech Separation by Humans and Machines, edited by P. Divenyi (Kluwer Academic, Dordrecht), pp. 181–187.
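The masker-scaling procedure described in Appendix A can be sketched in a few lines. The equations themselves did not survive in this transcript, so the sketch below is one plausible reading of the prose rather than the authors' implementation: inside the glimpse region, units below the threshold have the masker magnitude rescaled so that the spectral SNR reaches the threshold; outside it, units at or above the threshold have the masker rescaled so that the SNR falls to the threshold (or below it by a margin). The names `introduce_glimpses`, `in_region`, and `margin_db`, the preservation of the masker phase, and the toy spectra are assumptions introduced here. STFT analysis and resynthesis are omitted.

```python
import numpy as np

def spectral_snr_db(X, D, eps=1e-12):
    """Spectral SNR in dB per time-frequency unit from complex spectra."""
    # 20*log10 of the magnitude ratio equals 10*log10 of the power ratio.
    return 20.0 * np.log10((np.abs(X) + eps) / (np.abs(D) + eps))

def introduce_glimpses(X, D, in_region, threshold_db=0.0, margin_db=0.0):
    """Rescale masker magnitudes so only `in_region` units exceed the threshold.

    X, D      : complex arrays (bins x frames), clean speech and masker spectra.
    in_region : boolean array, True for units that should carry a glimpse.
    margin_db : hypothetical extra attenuation of SNR outside the region.
    """
    snr = spectral_snr_db(X, D)
    masker_phase = np.exp(1j * np.angle(D))  # assumed: masker phase is kept

    def masker_at(snr_db):
        # Masker spectrum whose magnitude puts each unit exactly at snr_db.
        return (np.abs(X) / 10.0 ** (snr_db / 20.0)) * masker_phase

    raise_snr = in_region & (snr < threshold_db)    # create glimpses
    lower_snr = ~in_region & (snr >= threshold_db)  # remove stray glimpses

    D_hat = D.astype(complex)  # astype returns a copy, D is left untouched
    D_hat[raise_snr] = masker_at(threshold_db)[raise_snr]
    D_hat[lower_snr] = masker_at(threshold_db - margin_db)[lower_snr]
    return X + D_hat, D_hat

# Toy spectra: 2 bins x 2 frames; bin 0 is the (made-up) glimpse region.
X = np.array([[2.0 + 0j, 0.5 + 0j], [1.0 + 0j, 3.0 + 0j]])
D = np.array([[1.0 + 0j, 1.0 + 0j], [0.0 + 2.0j, 1.0 + 0j]])
in_region = np.array([[True, True], [False, False]])

Y_hat, D_hat = introduce_glimpses(X, D, in_region, threshold_db=0.0, margin_db=3.0)
new_snr = spectral_snr_db(X, D_hat)
```

No unit is zeroed out in this sketch: the target spectrum is kept everywhere and only the masker level changes, which is the property the paper contrasts with the ideal binary mask. With `margin_db=0.0` the outside units land exactly at the threshold; whether they should sit at or strictly below it cannot be determined from the transcript.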