Chapter 2 Keyword Spotting Methods

A. Moyal et al., Phonetic Search Methods for Large Speech Databases, SpringerBriefs in Electrical and Computer Engineering, DOI 10.1007/978-1-4614-6489-1_2, Springer Science+Business Media New York 2013

This chapter will review in detail the three KWS methods, LVCSR KWS, Acoustic KWS and Phonetic Search KWS, followed by a discussion and comparison of the methods.

2.1 LVCSR-Based KWS

Performing KWS on textual databases is relatively straightforward. The text is perused for a given list of words and the location of the words is tagged within the text. Translating this method for use in speech databases is a two-stage process. First, an LVCSR engine is employed to transform the entire speech signal into text. The LVCSR engine performs the search for the most probable sequence of words based on the Viterbi search algorithm, using acoustic models, a large lexicon of words and a language model. In the second stage, the KWS mechanism utilizes established text-based search methods to locate the keywords within the text. An indexing phase can be performed on the resulting text in order to accelerate the search response time. This method will be referred to as LVCSR-based KWS. Fig. 2 illustrates the two sequential stages involved in LVCSR-based KWS.

[Fig. 2 An LVCSR keyword spotting system: one-time transformation of a speech database (DB) into a textual word DB, and KWS engine. Blocks shown: Input Speech, Front-End Processing, Acoustic feature vectors, Decoder, Knowledge Sources (Acoustic Models, Language Model, Lexicon), Recognized word sequence, Second stage - Keyword Detection, Keyword List, Keyword sequences.]

2.2 Acoustic KWS

Another common KWS method is Acoustic KWS. Using this method, the engine does not attempt to transcribe the entire stream of speech. Like the LVCSR-based method, this method employs the Viterbi search. That is, the system employs a speech recognition engine on the speech. However, rather than a large vocabulary which is intended to cover all potentially spoken words, a smaller set of designated keywords is used as the recognition vocabulary (Thambiratnam 2005) and general speech models (as part of the acoustical models) are used to model non-keyword speech (Szoke et al. 2005). Thus, acoustic KWS can be performed in only one stage, as illustrated in Fig. 3.

[Fig. 3 An acoustic keyword spotting system. Blocks shown: Input Speech, Processing, Acoustic feature vectors, Decoder, Knowledge Sources (Acoustic Models, KW Pronunciation), Keyword sequences.]

2.3 Phonetic Search KWS

As its name suggests, phonetic search KWS utilizes a phonetic search engine. In the first stage, a phoneme decoder is employed once to transform the speech input into a textual sequence. However, rather than producing a string of words, the decoder transforms the speech signal into a string (or lattice) of phonemes (Amir et al. 2001; Yu and Seide 2004; Thambiratnam and Sridharan 2005). In the second stage, the phonetic search engine employs a distance measure to compute the textual distance between the phoneme sequences that correspond to the keyword vocabulary and the phoneme sequences within the phoneme string (Alon 2005). As shown in Fig. 4, the phonetic search engine uses two types of input data: a list of keywords, where each word is represented by a sequence of phonemes, and a speech database which has been run through a phoneme decoder to produce a sequence of recognized phonemes.

[Fig. 4 A phonetic search system: one-time transformation of a speech DB to a textual phoneme DB, and KWS phonetic search engine. Blocks shown: Input Speech DB, Phoneme Decoder (Acoustic Models, Phoneme LM), Textual Phoneme Sequence DB, KW List, Phonetic Search KWS Engine, Keyword hypotheses.]

2.4 Discussion: Why Phonetic Search?

Each of the three KWS methods presented above has advantages and shortcomings. The crucial parameters to evaluate are response time, KWS performance, and keyword flexibility (James and Young 1994; Dharanipragada and Roukos 2002; Mamou et al. 2007; Thambiratnam and Sridharan 2007; Schneider 2011).

2.4.1 Response Time

In terms of overall computational complexity, LVCSR-based KWS and phonetic search KWS both implement a two-stage process: (1) transformation of speech to text (word sequences in the case of LVCSR and phoneme sequences in the case of phonetic search) and (2) a keyword search (word-based in the case of LVCSR and phoneme-based in the case of phonetic search). Acoustic-based KWS, on the other hand, is performed in one stage and operates on the speech itself with no textual transformation.

Although a keyword search that is implemented on fully transcribed text in the LVCSR method is fast (particularly if the text has also been indexed), the LVCSR method is usually at a disadvantage in comparison to the phonetic search and acoustic methods, because an LVCSR engine demands a large vocabulary and a complex language model to produce recognition results, resulting in a high level of complexity during the pre-processing stage.

The phonetic search method performs phoneme recognition using phoneme transition probabilities (di-phones) with no lexicon or word-level language model. During the search stage, however, phonetic search KWS uses a textual sequence distance measure that requires more computation. This is because the phonetic search must generate word-level hypotheses based on phoneme sequences, while in LVCSR-based KWS the textual output is already word-level (Burget et al. 2006).

In contrast, acoustic-based KWS uses a vocabulary consisting only of the keywords and does not require a language model at all. Because the acoustic-based method operates on the speech itself and requires only a small vocabulary, it is appropriate for real-time keyword spotting or KWS in small speech databases. However, this means that general speech must be well-modeled (Thambiratnam 2005) to avoid extensive over-detection (false alarms).

2.4.2 KWS Performance

The spontaneous speech and poor recording quality of speech databases often lead to deficient LVCSR performance (Butzberger et al. 1992; Cardillo et al. 2002). The large number of disfluencies, including mispronounced words, false starts, filled pauses, overlapping speech, speaker noises and background noise found in spontaneous speech (Butzberger et al. 1992; Gishri and Silber-Varod 2010) often results in outputs strewn with word insertions, deletions and substitutions. Thus the “most probable” word sequences produced by the engine may not adequately reflect the actual input speech. This, in turn, affects the reliability of the keyword search.

The same is true with regard to phonetic search results. Poor phoneme recognition may yield lower keyword recognition performance in comparison with the acoustic KWS method, which works on the speech itself by searching for a specific sequence of phonemes without textual transformation.

2.4.3 Keyword Flexibility

In comparison to the phonetic search method, which runs on sequences of phonemes rather than words, the LVCSR method is at a disadvantage when it comes to keyword flexibility (Cardillo et al. 2002; Burget et al. 2006; Wallace et al. 2007). Using the phonetic search method allows application users total freedom in changing the designated keywords, since the textual transformation into phonemes is not restricted by a vocabulary. Adding new keywords is a simple procedure that entails re-running the phonetic search on the phoneme sequences, but does not require re-running the phoneme decoder.

The textual transformation produced by an LVCSR engine, on the other hand, is constrained by the recognition vocabulary and the language model employed. Thus, unless the designated keywords were part of the original recognition vocabulary, they cannot be changed without repeating the recognition process (Clements et al. 2001; Cardillo et al. 2002; Szoke et al. 2005; Mamou and Ramabhadran 2008). Since keywords are in many cases names or domain-specific vernacular, they are often not found in standard lexicons (Wallace et al. 2007; Gishri and Silber-Varod 2010). This is a substantial shortcoming of the LVCSR method.

Acoustic-based KWS also represents an impractical solution for searching large databases that require rapid and flexible searching capabilities. Because it consists of only one stage, the entire process needs to be re-run on the speech database each time a new keyword dictionary is introduced.

The majority of applications require keyword flexibility, as well as the shortest possible response time when searching very large speech databases, which makes the phonetic search KWS method more attractive than the LVCSR and acoustic-based options. Thus, the focus of the following chapters will be on phonetic search KWS, and the implementation of an algorithm for the reduction of computational complexity in the phonetic search KWS process.
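
As a rough illustration of the second stage of phonetic search KWS described in Sect. 2.3, the following Python sketch compares each keyword's phoneme sequence against windows of an already decoded phoneme string. It is a minimal sketch under stated assumptions, not the method used by the authors: the distance measure is assumed to be a plain unit-cost edit (Levenshtein) distance, the decoder output is assumed to be a single phoneme string rather than a lattice, and the names `edit_distance`, `phonetic_search` and `max_distance`, as well as the phoneme symbols, are hypothetical.

```python
from typing import Dict, List, Tuple


def edit_distance(a: List[str], b: List[str]) -> int:
    """Unit-cost Levenshtein distance between two phoneme sequences.

    Assumption: every substitution, insertion and deletion costs 1.
    The chapter does not specify the distance measure actually used.
    """
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete pa
                            curr[j - 1] + 1,      # insert pb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def phonetic_search(decoded: List[str],
                    keywords: Dict[str, List[str]],
                    max_distance: int = 1) -> List[Tuple[str, int, int]]:
    """Return (keyword, start index, distance) for each window of the
    decoded phoneme string within max_distance of a keyword pronunciation.

    Simplification: windows have the same length as the keyword, so
    decoder substitutions are handled better than insertions/deletions.
    """
    hits = []
    for word, pron in keywords.items():
        n = len(pron)
        for start in range(len(decoded) - n + 1):
            dist = edit_distance(pron, decoded[start:start + n])
            if dist <= max_distance:
                hits.append((word, start, dist))
    return hits


# Stage 1 output (assumed): the speech database decoded once into phonemes.
decoded = ["dh", "ax", "k", "ae", "t", "s", "ae", "t"]

# Stage 2: search for a keyword given as a phoneme sequence.
print(phonetic_search(decoded, {"cat": ["k", "ae", "t"]}, max_distance=0))

# A new keyword only requires re-running the search, not the decoder.
print(phonetic_search(decoded, {"sat": ["s", "ae", "t"]}, max_distance=0))
```

Raising `max_distance` tolerates phoneme recognition errors at the price of more false alarms; with `max_distance=1`, for example, the window `s ae t` would also be reported as a hypothesis for "cat".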
