tatiana-dople
Uploaded On 2015-12-08

3 Windowing

Speech is a non-stationary signal whose properties change quite rapidly over time. This is entirely natural and a nice thing, but it makes it impossible to use the DFT or autocorrelation as such. For most phonemes the properties of the speech remain invariant for a short period of time (5-100 ms). Thus, within a short window of time, traditional signal processing methods can be applied relatively successfully. Most speech processing is in fact done this way: by taking short windows (possibly overlapping) and processing them. A short window of signal like this is called a frame.

From the implementation point of view, windowing corresponds to what is known in filter design as the window method: a long signal (speech, for instance, or an ideal impulse response) is multiplied by a window function of finite length, giving a finite-length, (usually) weighted version of the original signal. An illustration is given in Figure 1.

[Figure 1, upper panel: "Original signal and Hanning window..."; lower panel: "...ta-da! Windowed frame of the signal"]

Figure 1: The original signal and its windowed version below.

In speech processing the shape of the window function is not that crucial, but usually some soft window is used, such as Hanning, Hamming, triangle or half-parallelogram, i.e. one without right angles. The reason is the same as in filter design: the side lobes are substantially smaller than those of a rectangular window (Figure 2). Moreover, in LPC analysis (to be studied later on) the signal is presumed to be 0 outside the window, so a rectangular window produces an abrupt change in the signal, which usually distorts the analysis.

We must keep in mind that in speech processing (unlike in filter design, for instance) the methods are seldom well grounded mathematically. Usually the goal is to implement a system that works as well as possible in some application. The criteria of the application may be buried under complex human-related criteria and preferences that hinder precise mathematical modeling: for instance, the quality of coded speech, the intelligibility of synthesized speech or the pleasantness of enhanced speech.
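The side-lobe argument above can be checked numerically. The following is only a minimal sketch, not part of the original notes: the window length of 512 samples and the 16-fold zero padding are arbitrary assumptions, and the Hanning window is written out by formula (assumed to give the same values as hanning(N)) so that no toolbox is needed.

```matlab
% Peak side-lobe level of a rectangular and a Hanning window (illustrative sketch).
N = 512;                  % window length in samples (assumed value)
nfft = 16 * N;            % zero-padded FFT length, gives a dense frequency grid
w_rect = ones(N, 1);
% Hanning window written out by formula; assumed to match hanning(N)
w_hann = 0.5 * (1 - cos(2*pi*(1:N)' / (N + 1)));
wins = [w_rect, w_hann];
psl = zeros(1, 2);        % peak side-lobe level in dB: [rectangular, Hanning]
for c = 1:2
    W = abs(fft(wins(:, c), nfft));
    WdB = 20 * log10(W(1:nfft/2) / W(1) + eps);  % normalized: main-lobe peak at 0 dB
    first_null = find(diff(WdB) > 0, 1);         % first index where the response turns upward
    psl(c) = max(WdB(first_null:end));           % highest lobe beyond the main lobe
end
fprintf('rectangular: %.1f dB, Hanning: %.1f dB\n', psl(1), psl(2));
```

The highest side lobe of the rectangular window should come out roughly 13 dB below the main lobe and that of the Hanning window more than 30 dB below it, which is the difference the text refers to.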
[Figure 2, upper panel: "A few window types..."; lower panel: "...and their responses (normalized)", x-axis: normalized frequency (1 = Nyquist), y-axis: response, dB]

Figure 2: A bunch of windows and their amplitude responses.

Based on this, windowing should be approached with a certain degree of freedom, and one should be prepared to change the window function when necessary. Example: in speech coding, when the aim is to represent the samples exactly as they are, a rectangular window is used. On the other hand, when the LPC parameters are analyzed, a soft non-symmetric window is used in order to minimize the delay. In speech recognition the windows are usually overlapping 10 ms windows, which are analyzed in order to form a hypothesis of the current phoneme. The hypotheses are combined over several frames, and finally the decision is made so as to maximize the joint probability.

If one wishes to shape the speech (not only analyze it), it is a good idea to use overlapping windows that sum approximately to 1. For instance: the coarsest coding system in the world, where in each frame the 10 highest samples of the DFT are quantized and the others are set to 0. The inverse transform then restores an approximation of the original signal. Let's implement this, because it nicely brings up different aspects related to windowing, analysis and synthesis. The Matlab source script can be found in and below.

function syn = FFT_coding_en(ind, N)
% syn = FFT_coding_en(ind, N);
%
% This is the English version of FFT_koodaus_fi.m. The operation of
% FFT_coding_en.m is identical to that of FFT_koodaus_fi.m.
% Windowing demo: the speech is processed in
% windows of length 60 ms and round edges (if ind==0), or in
% windows of length 30 ms and a rectangular shape (if ind==1).
% FFT is calculated and all but N+1 biggest-amplitude frequency bins are
% set to zero. Then, the output speech ('syn') is synthesized using IFFT.
% The speech data used by this function is yhdeksän16.mat.
%
% Lines marked [reconstructed] are missing from the transcript and have
% been filled in from context.

load yhdeksän16.mat
x = x(:);      % force as column vector
fs = 16000;    % sampling frequency

if (ind == 0)
  awinlen = round(fs*0.06);   % length of the analysis window, 60 ms
  % the window function is almost arbitrarily chosen:
  % we make a window that is flat in the middle and has soft edges
  temp = hanning(fs*0.01);    % the soft edges are here
  awinfun = [temp(1:length(temp)/2); ones(awinlen-length(temp),1); ...
             temp(length(temp)/2+1:end)];   % [reconstructed] second half of the edge
  swinlen = round(awinlen/2); % length of the synthesis window is
                              % half of the analysis window
  swinfun = hanning(swinlen); % window function for synthesis
  nforward = swinlen/2;       % how many samples there are between
                              % successive frames
end                           % [reconstructed]
if (ind == 1)
  awinlen = round(fs*0.03);   % length of the analysis window, 30 ms
  awinfun = boxcar(awinlen);  % now a rectangular window shape is chosen
  swinlen = round(awinlen);   % now the length of the synthesis window =
                              % length of the analysis window
  swinfun = boxcar(swinlen);  % window function for synthesis
  nforward = swinlen;         % how many samples there are between
                              % successive frames
end                           % [reconstructed]

figure; plot(awinfun, 'b');   % [reconstructed]
hold on
plot(awinlen/2-swinlen/2+(0:swinlen-1), swinfun, 'r-.');   % [start of line reconstructed]
title('analysis window (blue) and synthesis window (red)')
axis([0 1000 0 1.4])
print -depsc a_and_s_window.eps

fftind = 1:floor(awinlen/2)+1;   % indices of half of FFT vector
n = 1+ceil(awinlen/2);           % the middle sample of the frame
syn = zeros(size(x));
flag = 0;   % flag for doing some decisions related to drawing
while (n+ceil(awinlen/2) <= length(x))   % [comparison operator reconstructed]
  awinind = n-ceil(awinlen/2)+(0:awinlen-1);   % indices of the analysis
                                               % window in current frame
  frame = x(awinind).*awinfun;   % frame
  Frame = fft(frame);            % FFT of the frame
  % we search (N+1)th largest absolute value
  [val, sind] = sort(abs(Frame(fftind)));
  valN = val(end-N);
  Frame(abs(Frame) < valN) = 0;  % set to zero all but N+1 highest-amplitude
                                 % frequency bins [comparison operator reconstructed]
  iframe = real(ifft(Frame));    % inverse FFT, the real part is taken
                                 % to avoid precision problems
  swinind = n-swinlen/2+(0:swinlen-1);   % synthesis windowing [variable name reconstructed]
  swin = iframe(1+awinlen/2-swinlen/2+(0:swinlen-1)).*swinfun;
  syn(swinind) = syn(swinind)+swin;      % overlap-add
  if ((n > 3000) && ~flag)   % draw a picture of the frame [comparison operator reconstructed]
    figure;                  % [reconstructed]
    plot(x(awinind), 'k');   % this one can hardly be seen in the picture
                             % if the rectangular window is used
    hold on
    plot(frame, 'b-.');
    plot(awinlen/2-swinlen/2+(0:swinlen-1), swin, 'r-.');
    title('original speech (black), analysis window (blue) and synthesis window (red)')
    print -depsc frame.eps
    flag = 1;
  end                        % [reconstructed]
  n = n+nforward;
end                          % [reconstructed]

figure; plot(x, 'k');        % [reconstructed]
hold on
plot(syn, 'r');              % [reconstructed]
title('original speech (black) and modified speech (red)');
print -depsc result.eps
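The bin-selection step of the listing can be tried in isolation on a synthetic frame. This is only a sketch under stated assumptions: the test signal (four cosines at 200, 500, 1500 and 4000 Hz, placed exactly on FFT bins), the value N = 2 and the absence of an analysis window are all choices made here for illustration and are not taken from the original notes.

```matlab
% Keep the N+1 largest-magnitude FFT bins of one frame and resynthesize (sketch).
fs = 16000;                 % sampling frequency, as in the listing
L = 960;                    % 60 ms frame
t = (0:L-1)' / fs;
% Assumed test frame: three strong components and one weak one
frame = 1.0*cos(2*pi*200*t) + 0.5*cos(2*pi*500*t) ...
      + 0.25*cos(2*pi*1500*t) + 0.01*cos(2*pi*4000*t);
N = 2;                      % keep N+1 = 3 bins of the half spectrum
Frame = fft(frame);
half = 1:floor(L/2)+1;      % indices of the non-redundant half of the spectrum
val = sort(abs(Frame(half)));     % magnitudes in ascending order
valN = val(end-N);                % the (N+1)th largest magnitude
Frame(abs(Frame) < valN) = 0;     % zero every bin below that threshold
kept = nnz(Frame(half));          % number of surviving bins in the half spectrum
rec = real(ifft(Frame));          % back to the time domain
err = max(abs(rec - frame));      % what was lost: the weak 4000 Hz component
fprintf('bins kept: %d, max error: %.4f\n', kept, err);
```

Because the threshold is applied to the whole spectrum, the mirror-image bins of the kept components survive as well, which is why the inverse FFT is (up to rounding) real.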
The idea is the following: the speech is first windowed for FFT analysis with a relatively long (60 ms) and nearly rectangular window. The purpose is to get some idea of the frequency content in the window, but in such a way that the windowing does not excessively affect the shape of the frame. As a (cautionary) example, the function has also been written to allow a rectangular window of length 30 ms, but in practice it is not advisable to use this option, since it considerably degrades the speech quality.

Next the FFT of the analysis frame is calculated and all small-energy frequencies are set to zero. The inverse FFT is then applied to this partially zero spectrum to get the time-domain signal.

The next step is a new windowing (in case the 60 ms window was chosen): we window only the central part of the frame, and the edges are faded out. We use a Hanning window, which has the nice property of summing to 1 when the time difference between successive windows is half of the length of the window (exactly 1 when the length is an odd integer). The method of using overlapping short-time signals and forming the reconstruction by summing partially overlapping frames is called overlap-add.

If, as an experiment, the 30 ms rectangular window is chosen, the successive frames do not overlap at all; instead, the synthesis consists of concatenating the time-domain frames produced by the inverse FFT as such. In this case, discontinuities occur at the frame boundaries, which makes the output speech sound very unnatural.

[Figure 3, plot title: "Sum of half-overlapping Hanning windows"]

Figure 3: The sum of Hanning windows with a time difference of half of the length is almost 1, which makes them attractive in synthesis windowing.

When the 60 ms window is used (i.e., ind=0) and the value 9 is chosen for N (i.e., the 10 largest-amplitude frequency bins of the FFT are preserved), the function produces the plots shown in Figures 4-6.

Figure 4: Analysis and synthesis windows used in the function.

Figure 5: Original speech (black), analysis window (blue), synthesis window (red).

Figure 6: Original speech (black) and coded speech (red).
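The claim that half-overlapping Hanning windows sum to approximately 1 can be checked with a few lines. The sketch below is an illustration rather than part of the original script: it writes the Hanning window out by formula (assumed to give the same values as hanning(L)), uses the 480-sample synthesis window of the listing together with an assumed odd length of 481, and takes the hop for the odd length as (L+1)/2.

```matlab
% Overlap-add of half-overlapping Hanning windows (illustrative sketch).
lens = [480 481];   % even length as in the listing, and an odd length (assumed)
hops = [240 241];   % time difference between windows: half the length, rounded up
nwin = 8;           % number of successive windows to add (arbitrary)
maxdev = zeros(size(lens));   % largest deviation of the sum from 1
for c = 1:numel(lens)
    L = lens(c);
    hop = hops(c);
    w = 0.5 * (1 - cos(2*pi*(1:L)' / (L + 1)));   % assumed to match hanning(L)
    total = zeros(hop*(nwin-1) + L, 1);
    for k = 0:nwin-1
        idx = k*hop + (1:L);
        total(idx) = total(idx) + w;              % overlap-add of the bare windows
    end
    interior = total(hop+1 : end-hop);  % skip the fade-in and fade-out at the ends
    maxdev(c) = max(abs(interior - 1));
end
fprintf('max deviation from 1: even length %.2e, odd length %.2e\n', maxdev(1), maxdev(2));
```

For the even length the sum stays within a fraction of a percent of 1, and for the odd length the deviation is at the level of rounding error, in line with the remark in the text.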
