3 Windowing

Speech is a non-stationary signal whose properties change quite rapidly over time. This is entirely natural and a nice thing, but it makes it impossible to use the DFT or autocorrelation as such. For most phonemes the properties of the speech remain invariant for a short period of time (5-100 ms). Thus, for a short window of time, traditional signal processing methods can be applied relatively successfully. Most speech processing is in fact done in this way: by taking short (possibly overlapping) windows and processing them. A short window of signal like this is called a frame.

From the implementation point of view, windowing corresponds to what is known in filter design as the window method: a long signal (speech, for instance, or an ideal impulse response) is multiplied by a window function of finite length, giving a finite-length, (usually) weighted version of the original signal. An illustration is in Figure 1.

[Figure 1. Top panel: "Original signal and Hanning window..."; bottom panel: "...ta-da! A windowed frame of the signal"]
Figure 1: The original signal and its windowed version below.

In speech processing the shape of the window function is not that crucial, but usually some soft window is used, such as Hanning, Hamming, triangle or half parallelogram, rather than one with right angles. The reason is the same as in filter design: the sidelobes are substantially smaller than those of a rectangular window (Figure 2). Moreover, in LPC analysis (to be studied later on) the signal is presumed to be 0 outside the window, hence the rectangular window produces an abrupt change in the signal, which usually distorts the analysis.

We must keep in mind that in speech processing (unlike in filter design, for instance) the methods are seldom well grounded mathematically. Usually the goal is to implement a system that works as well as possible in some application. The criteria of the application may be buried under complex human-related criteria and preferences that hinder precise mathematical modeling. Examples are the quality of coded speech, the intelligibility of synthesized speech and the pleasantness of enhanced speech.
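The sidelobe argument above can be checked numerically. The following Matlab sketch compares the highest sidelobe of a rectangular window with that of a Hanning window. The window length of 64 samples and the FFT length of 8192 are arbitrary values chosen for this illustration, and the Hanning window is written out with the formula used by Matlab's hanning() so that no toolbox is required.

```matlab
% Peak sidelobe level of a rectangular and a Hanning window.
L = 64;                                  % window length in samples (assumed value)
nfft = 8192;                             % zero-padded FFT length, gives a dense frequency grid
wins = [ones(L,1), ...                   % column 1: rectangular window
        0.5*(1 - cos(2*pi*(1:L)'/(L+1)))];   % column 2: same definition as hanning(L)

psl = zeros(1,2);                        % peak sidelobe levels in dB
for c = 1:2
    mag = abs(fft(wins(:,c), nfft));
    mag = mag(1:nfft/2) / mag(1);        % half spectrum, normalised to 0 dB at DC
    k = 2;
    while mag(k) < mag(k-1)              % walk down the main lobe to the first null
        k = k + 1;
    end
    psl(c) = 20*log10(max(mag(k:end)));  % highest sidelobe beyond the main lobe
end
fprintf('rectangular: %.1f dB, Hanning: %.1f dB\n', psl(1), psl(2));
```

The price of the lower sidelobes is a wider main lobe, which is why the choice of window is a trade-off rather than a fixed rule.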
[Figure 2. Top panel: "A few window types..."; bottom panel: "...and their responses (normalised)"; x-axis: normalised frequency (1 = Nyquist); y-axis: response, dB]
Figure 2: A bunch of windows and their amplitude responses.

Based on this, windowing should be approached with a certain degree of freedom, being prepared to change the window function when necessary. Example: in speech coding the aim is to represent the samples exactly as they are, in which case a rectangular window is used. On the other hand, when the LPC parameters are analyzed, a soft non-symmetric window is used in order to minimize the delay. In speech recognition the windows are usually overlapping 10 ms windows, which are analyzed in order to form a hypothesis of the current phoneme. The hypotheses are combined over several frames and finally the decision is made so as to maximize the joint probability.

If one wants to shape the speech (not only analyze it), it is a good idea to use overlapping windows that sum approximately to 1. For instance: the coarsest coding system in the world, where in each frame the 10 highest samples of the DFT are quantized and the others are set to 0. The inverse transform then gives back an approximation of the original signal. Let's implement this, because it nicely brings up different aspects related to windowing, analysis and synthesis. The Matlab source script can be found below.

function syn = FFT_coding_en(ind,N);
% syn = FFT_coding_en(ind,N);
% This is the English version of FFT_koodaus_fi.m. The operation of
% FFT_coding_en.m is identical to that of FFT_koodaus_fi.m.
% Windowing demo: the speech is processed in
% windows of length 60 ms and round edges (if ind==0), or in
% windows of length 30 ms and a rectangular shape (if ind==1).
% FFT is calculated and all but N+1 biggest-amplitude frequency bins are
% set to zero. Then, the output speech (syn) is synthesized using IFFT.
% The speech data used by this function is yhdeksän16.mat.

load yhdeksän16.mat
x = x(:);                        % force as column vector
fs = 16000;                      % sampling frequency

if (ind==0),
  awinlen = round(fs*0.06);      % length of the analysis window, 60 ms
  % the window function is almost arbitrarily chosen:
  % we make a window that is flat in the middle and has soft edges
  temp = hanning(fs*0.01);       % the soft edges are here
  awinfun = [temp(1:length(temp)/2); ones(awinlen-length(temp),1); ...
             temp(length(temp)/2+1:end)];
  swinlen = round(awinlen/2);    % length of the synthesis window is
                                 % half of the analysis window
  swinfun = hanning(swinlen);    % window function for synthesis
  nforward = swinlen/2;          % how many samples there are between
                                 % successive frames
end
if (ind==1),
  awinlen = round(fs*0.03);      % length of the analysis window, 30 ms
  awinfun = boxcar(awinlen);     % now a rectangular window shape is chosen
  swinlen = round(awinlen);      % now the length of the synthesis window =
                                 % length of the analysis window
  swinfun = boxcar(swinlen);     % window function for synthesis
  nforward = swinlen;            % how many samples there are between
                                 % successive frames
end

% draw the analysis window and the synthesis window centred on it
figure
plot(awinfun);
hold on
plot(awinlen/2-swinlen/2+(0:swinlen-1), swinfun, 'r-.');
title('analysis window (blue) and synthesis window (red)')
axis([0 1000 0 1.4])
print -depsc a_and_s_window.eps

fftind = 1:floor(awinlen/2)+1;   % indices of half of FFT vector
n = 1+ceil(awinlen/2);           % the middle sample of the frame
syn = zeros(size(x));
flag = 0;                        % flag for doing some decisions related to drawing
while (n+ceil(awinlen/2) < length(x))
  awinind = n-ceil(awinlen/2)+(0:awinlen-1);   % indices of the analysis
                                               % window in current frame
  frame = x(awinind).*awinfun;   % frame
  Frame = fft(frame);            % FFT of the frame
  % we search (N+1)th largest absolute value
  [val,sind] = sort(abs(Frame(fftind)));
  valN = val(end-N);
  Frame(abs(Frame)<valN) = 0;    % set to zero all but N+1 highest-amplitude
                                 % frequency bins
  iframe = real(ifft(Frame));    % inverse FFT, the real part is taken
                                 % to avoid precision problems
  swinind = n-swinlen/2+(0:swinlen-1);         % synthesis windowing
  swin = iframe(1+awinlen/2-swinlen/2+(0:swinlen-1)).*swinfun;
  syn(swinind) = syn(swinind)+swin;            % overlap-add
  if ((n>3000) & ~flag)
    % draw a picture of the frame
    figure
    plot(x(awinind),'k');        % this one can hardly be seen in the picture
                                 % if the rectangular window is used
    hold on
    plot(frame,'b-.');
    plot(awinlen/2-swinlen/2+(0:swinlen-1),swin,'r-.');
    title('original speech (black), analysis window (blue) and synthesis window (red)');
    print -depsc frame.eps
    flag = 1;
  end
  n = n+nforward;
end

% compare the original and the coded signal
figure
plot(x,'k');
hold on
plot(syn,'r');
title('original speech (black) and modified speech (red)');
print -depsc result.eps

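The thresholding step of the listing above can be tried in isolation on a synthetic frame. The sketch below is not part of the original script: the test signal (three sinusoids at 200, 1000 and 3000 Hz, all falling exactly on FFT bins), the choice N = 1 and the small tolerance in the comparison are assumptions made for this illustration.

```matlab
% Keep only the N+1 strongest frequency bins of one synthetic frame.
fs = 16000;                          % sampling frequency, as in the listing
L = 960;                             % frame length, 60 ms
t = (0:L-1)'/fs;
strong = sin(2*pi*200*t) + 0.5*sin(2*pi*1000*t);   % two strong components
frame = strong + 0.05*sin(2*pi*3000*t);            % plus one weak component

N = 1;                               % N+1 = 2 bins are kept
Frame = fft(frame);
half = 1:floor(L/2)+1;               % bins of the non-negative frequencies
val = sort(abs(Frame(half)));
valN = val(end-N);                   % (N+1)th largest magnitude
% The tolerance guards against rounding differences between a bin and its
% mirror image at the negative frequency (an addition to the original test).
Frame(abs(Frame) < valN*(1 - 1e-9)) = 0;
coded = real(ifft(Frame));           % back to the time domain

kept = nnz(Frame(half));             % number of surviving bins in the half spectrum
err = max(abs(coded - strong));      % the weak 3000 Hz component has been removed
fprintf('bins kept: %d, max error against the two strong components: %g\n', kept, err);
```

With real speech the components do not fall on exact bins, so zeroing small bins changes the waveform much more than in this idealised case.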
The idea is the following: the speech is first windowed for FFT analysis with a relatively long (60 ms) and nearly rectangular window. The purpose is to get some idea of the frequency content in the window, but so that the windowing does not excessively affect the shape of the frame. As a (cautionary) example, the function also offers the possibility of using a rectangular window of length 30 ms, but in practice it is not advisable to use this option since it considerably degrades the speech quality.

Next, the FFT of the analysis frame is calculated and all small-energy frequencies are set to zero. The inverse FFT is then applied to this partially zero spectrum to get the time-domain signal.

The next step is a new windowing (in case the 60 ms window was chosen): we window only the central part of the frame, and the edges are faded. We use the Hanning window, which has the nice property of summing to 1 when the time difference between successive windows is half of the length of the window (with Matlab's hanning, exactly 1 when the length L is an odd integer and the step is (L+1)/2 samples). The method of using overlapping short-time signals and forming the reconstruction by summing partially overlapping frames is called overlap-add.

If, as an experiment, the 30 ms rectangular window is chosen, the successive frames do not overlap at all; instead, the synthesis consists of concatenating the time-domain frames produced by the inverse FFT as such. In this case, discontinuities will occur at the frame boundaries, which makes the output speech sound very unnatural.

[Figure 3. Panel title: "Sum of half-overlapping Hanning windows"]
Figure 3: The sum of Hanning windows with a time difference of half the length is almost 1, which makes them attractive in synthesis windowing.

When the 60 ms window is used (i.e., ind=0) and the value 9 is chosen for N (i.e., the 10 largest-amplitude frequency bins of the FFT are preserved), the function produces the plots shown in Figures 4-6.

Figure 4: Analysis and synthesis windows used in the function.

[Figure 5. Panel title: "orig. speech (black), analysis window (blue) and synthesis window (red)"]
Figure 5: Original speech (black), analysis window (blue), synthesis window (red).

Figure 6: Original speech (black) and coded speech (red).
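
The summing property of half-overlapping Hanning windows can be verified with a few lines of Matlab. This is a minimal sketch rather than part of the original script: the window lengths 481 and 480 and the number of frames are assumed values, and the window is written out with the formula used by Matlab's hanning() so that no toolbox is needed.

```matlab
% Sum of half-overlapping Hanning windows (the weights seen by overlap-add).
hanningwin = @(L) 0.5*(1 - cos(2*pi*(1:L)'/(L+1)));   % same definition as hanning(L)
nframes = 6;                          % number of successive windows (assumed)

% Case 1: odd length, step (L+1)/2
L = 481;
hop = (L+1)/2;
total = zeros(hop*(nframes-1) + L, 1);
for m = 0:nframes-1
    idx = m*hop + (1:L);
    total(idx) = total(idx) + hanningwin(L);
end
interior = total(hop+1 : end-hop);    % skip the fade-in and fade-out at the ends
maxdev_odd = max(abs(interior - 1));

% Case 2: even length, step L/2 (as swinlen = 480, nforward = 240 in the listing)
L2 = 480;
hop2 = L2/2;
total2 = zeros(hop2*(nframes-1) + L2, 1);
for m = 0:nframes-1
    idx = m*hop2 + (1:L2);
    total2(idx) = total2(idx) + hanningwin(L2);
end
interior2 = total2(hop2+1 : end-hop2);
maxdev_even = max(abs(interior2 - 1));

fprintf('max deviation from 1: odd length %g, even length %g\n', maxdev_odd, maxdev_even);
```

In the odd case the sum equals 1 up to rounding error, while in the even case a small ripple remains, which matches the "almost 1" of Figure 3.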