
Comparing Audio Signals Phase misalignment - PowerPoint Presentation

cheryl-pisano
Uploaded On 2018-10-26




Presentation Transcript

Slide1

Comparing Audio Signals

What makes it difficult?

Phase misalignment
Deeper peaks and valleys
Pitch misalignment
Energy misalignment
Embedded noise
Length of vowels
Phoneme variance

Slide2

Review: Minimum Distance Algorithm

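         E  X  E  C  U  T  I  O  N
      0  1  2  3  4  5  6  7  8  9
   I  1  1  2  3  4  5  6  6  7  8
   N  2  2  2  3  4  5  6  7  7  7
   T  3  3  3  3  4  5  5  6  7  8
   E  4  3  4  3  4  5  6  6  7  8
   N  5  4  4  4  4  5  6  7  7  7
   T  6  5  5  5  5  5  5  6  7  8
   I  7  6  6  6  6  6  6  5  6  7
   O  8  7  7  7  7  7  7  6  5  6
   N  9  8  8  8  8  8  8  7  6  5

Array[i,j] = min{1 + Array[i-1,j], cost(i,j) + Array[i-1,j-1], 1 + Array[i,j-1]}

Slide3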

Pseudo Code (minDistance(target, source))

n = number of characters in source
m = number of characters in target

Create array, distance, with dimensions n+1, m+1
FOR r=0 TO n
    distance[r,0] = r
FOR c=0 TO m
    distance[0,c] = c
FOR each row r from 1 TO n
    FOR each column c from 1 TO m
        IF source[r]=target[c] cost = 0 ELSE cost = 1    //characters numbered from 1
        distance[r,c] = minimum of
            distance[r-1,c] + 1,       //deletion (skip a source character)
            distance[r,c-1] + 1,       //insertion (skip a target character)
            distance[r-1,c-1] + cost   //substitution
Result is in distance[n,m]

Slide4
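
The minDistance pseudo code from the preceding slide could be written in Python roughly as follows. This is a minimal sketch: the name min_distance and the 0-based string indexing are choices made here, not part of the slides.

```python
def min_distance(target: str, source: str) -> int:
    """Unit-cost minimum edit distance between source and target."""
    n, m = len(source), len(target)
    # distance[r][c] = cost of matching the first r source characters
    # against the first c target characters
    distance = [[0] * (m + 1) for _ in range(n + 1)]
    for r in range(n + 1):
        distance[r][0] = r
    for c in range(m + 1):
        distance[0][c] = c
    for r in range(1, n + 1):
        for c in range(1, m + 1):
            # Python strings are 0-based, so row r refers to source[r - 1]
            cost = 0 if source[r - 1] == target[c - 1] else 1
            distance[r][c] = min(
                distance[r - 1][c] + 1,         # skip a source character
                distance[r][c - 1] + 1,         # skip a target character
                distance[r - 1][c - 1] + cost,  # match or substitute
            )
    return distance[n][m]


print(min_distance("EXECUTION", "INTENTION"))  # 5
```

The result, 5, is the bottom-right entry of the EXECUTION / INTENTION array in the review slide.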

Is Minimum Distance Applicable?

Maybe?

The optimal distance at indices [a,b] is a function of the costs at smaller indices. This suggests that a dynamic programming approach may work.

Problems

The cost function is more complex. A binary equal or not equal doesn’t work.

We need to define a distance metric. But what should that metric be? Answer: It depends on which audio features we use.

Longer vowels may still represent the same speech. The classical solution is not to apply a cost when going from index [i-1,j] or [i,j-1] to [i,j]. Unfortunately, this assumption can lead to singularities, which result in incorrect comparisons.

Slide5

Complexity of Minimum Distance

The basic algorithm is O(m*n), where m is the length (in samples) of one audio signal and n is the length of the other. If m=n, the algorithm is O(n^2). Why? Count the number of cells that need to be filled in.

O(n^2) may be too slow. Alternate solutions have been devised:

Don’t fill in all of the cells.
Use a multi-level approach.

Question: Are the faster approaches needed for our purposes? Perhaps not!

Slide6
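
To get a feel for the cell count on the complexity slide, the short calculation below compares two cases. The numbers are hypothetical: two 2-second signals, a 16,000 Hz sample rate, and 100 feature frames per second are all assumptions made for illustration.

```python
def cells_to_fill(m: int, n: int) -> int:
    """One cell per (i, j) index pair, so an m-by-n comparison fills m*n cells."""
    return m * n


seconds = 2                 # assumed length of each signal
sample_rate = 16_000        # assumed samples per second
frames_per_second = 100     # assumed feature frames per second

# Comparing raw samples against raw samples
per_sample = cells_to_fill(seconds * sample_rate, seconds * sample_rate)
# Comparing one feature frame against one feature frame
per_frame = cells_to_fill(seconds * frames_per_second, seconds * frames_per_second)

print(per_sample)  # 1024000000
print(per_frame)   # 40000
```

Under these assumed numbers, comparing frames instead of raw samples keeps the array small, which is one reason the faster approaches may not be needed.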

Don’t Fill in all of the Cells

Problem: May miss the optimal minimum distance path.

Slide7
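
The slide on not filling in all of the cells does not name a method. One common choice, assumed here, is to fill only a band of cells around the diagonal (often called a Sakoe-Chiba band). The sketch below uses single numbers as features and an absolute-difference cost with no penalty for repeated frames; both are assumptions. It also shows the problem the slide warns about: a narrow band can miss the optimal path.

```python
import math


def banded_distance(a, b, band):
    """Fill only cells with |i - j| <= band; all other cells stay at infinity."""
    n, m = len(a), len(b)
    array = [[math.inf] * (m + 1) for _ in range(n + 1)]
    array[0][0] = 0.0
    for i in range(1, n + 1):
        lo = max(1, i - band)
        hi = min(m, i + band)
        for j in range(lo, hi + 1):
            d = abs(a[i - 1] - b[j - 1])
            array[i][j] = d + min(array[i - 1][j],
                                  array[i - 1][j - 1],
                                  array[i][j - 1])
    # Infinity here means the band was too narrow to reach the last cell
    return array[n][m]


a = [0, 0, 0, 0, 1]
b = [0, 1, 1, 1, 1]
print(banded_distance(a, b, band=4))  # 0.0 (every cell is filled)
print(banded_distance(a, b, band=1))  # 2.0 (the optimal path lies outside the band)
```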

The Multilevel Approach

Concept

1. Down sample to coarsen the array
2. Run the algorithm
3. Refine the array (up sample)
4. Adjust the solution
5. Repeat steps 3-4 till the original sample rate is restored

Notes

The multilevel approach is a common technique for reducing many algorithms’ complexity from O(n^2) to O(n lg n).
An example is partitioning a graph to balance work loads among threads or processors.

Slide8
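
The multilevel steps from the preceding slide can be sketched as below. Several details are assumptions made for illustration: features are single numbers, coarsening averages adjacent pairs, the cost is an absolute difference with no penalty for repeated frames, and the refined search is limited to cells within a small radius of the coarse path. The overall scheme resembles what is published as FastDTW, but this is not a reference implementation of it.

```python
import math


def dtw_path(a, b, window=None):
    """Minimum-cost path over the allowed 1-based cells (all cells if window is None)."""
    n, m = len(a), len(b)
    if window is None:
        window = {(i, j) for i in range(1, n + 1) for j in range(1, m + 1)}
    cost = {(0, 0): 0.0}
    back = {}
    for i, j in sorted(window):  # row by row, so predecessors come first
        best, prev = min((cost.get(p, math.inf), p)
                         for p in ((i - 1, j), (i - 1, j - 1), (i, j - 1)))
        cost[(i, j)] = abs(a[i - 1] - b[j - 1]) + best
        back[(i, j)] = prev
    path, cell = [], (n, m)
    while cell != (0, 0):        # walk back from the last cell to the start
        path.append(cell)
        cell = back[cell]
    return cost[(n, m)], path[::-1]


def coarsen(x):
    """Step 1: halve the resolution by averaging adjacent pairs."""
    return [sum(x[k:k + 2]) / len(x[k:k + 2]) for k in range(0, len(x), 2)]


def expand(path, n, m, radius):
    """Step 3: map each coarse cell to its fine cells, widened by radius."""
    window = set()
    for ci, cj in path:
        for i in range(2 * ci - 1 - radius, 2 * ci + radius + 1):
            for j in range(2 * cj - 1 - radius, 2 * cj + radius + 1):
                if 1 <= i <= n and 1 <= j <= m:
                    window.add((i, j))
    return window


def multilevel_dtw(a, b, radius=1, min_size=4):
    """Steps 1-5: solve a coarse problem, then refine near its path."""
    if len(a) <= min_size or len(b) <= min_size:
        return dtw_path(a, b)    # small enough to fill every cell
    _, coarse_path = multilevel_dtw(coarsen(a), coarsen(b), radius, min_size)
    return dtw_path(a, b, expand(coarse_path, len(a), len(b), radius))


x = [0, 0, 1, 2, 3, 3, 2, 1, 0, 0]
y = [0, 1, 2, 3, 2, 1, 0, 0, 0, 0]
approx_cost, approx_path = multilevel_dtw(x, y)
exact_cost, _ = dtw_path(x, y)
print(approx_cost, exact_cost)
```

Because the refined search only visits cells near the coarse path, its cost can never be lower than the exact cost, and it may be higher when the coarse path is misleading.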

Singularities

Assumption

The minimum distance comparing two signals only depends on the previous adjacent entries.
The cost function accounts for the varied length of a particular phoneme, which causes the cost in particular array indices to no longer be well-defined.

Problem:

The algorithm can compute incorrectly due to mismatched alignments.

Possible solutions:

Compare based on the change of feature values between windows instead of the values themselves.
Pre-process to eliminate the causes of the mismatches.

Slide9
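
The first possible solution on the singularities slide, comparing changes between windows rather than the values themselves, can be illustrated with a small sketch. The feature values are made up, and the use of simple first differences is an assumption.

```python
def deltas(features):
    """Change in the feature value from each window to the next."""
    return [nxt - cur for cur, nxt in zip(features, features[1:])]


a = [1.0, 2.0, 4.0, 4.0, 3.0]
b = [3.0, 4.0, 6.0, 6.0, 5.0]   # same shape as a, shifted up by 2

raw_cost = sum(abs(x - y) for x, y in zip(a, b))
delta_cost = sum(abs(x - y) for x, y in zip(deltas(a), deltas(b)))

print(raw_cost)    # 10.0
print(delta_cost)  # 0.0
```

The two signals differ in every window when compared by value, yet their changes between windows are identical, so a cost based on the changes treats them as matching.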

Possible Preprocessing

Remove the phase from the audio:
    Compute the Fourier transform
    Perform discrete cosine transform on the amplitudes
Normalize the energy of voiced audio:
    Compute the energy of both signals
    Multiply the larger by the percentage difference
Remove the DC offset: Subtract the average amplitude from all samples
Brick Wall Normalize the peaks and valleys:
    Find the average peak and valley value
    Set values larger than the average equal to the average
Normalize the pitch: Use PSOLA to align the pitch of the two signals
Remove duplicate frames: Auto correlate frames at pitch points
Remove noise from the signal: implement a noise removal algorithm
Normalize the speed of the speech:

Slide10
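
Three of the preprocessing steps from the preceding slide are simple enough to sketch directly. The function names are hypothetical, and two readings are assumptions: energy matching scales the higher-energy signal by the square root of the energy ratio, and brick wall normalization clips to the average local peak and the average local valley.

```python
import math


def remove_dc_offset(samples):
    """Subtract the average amplitude from all samples."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def energy(samples):
    """Sum of squared amplitudes."""
    return sum(s * s for s in samples)


def match_energy(a, b):
    """Scale the higher-energy signal down so both signals have equal energy."""
    ea, eb = energy(a), energy(b)
    if ea == 0 or eb == 0:
        return list(a), list(b)      # nothing sensible to scale
    if ea > eb:
        gain = math.sqrt(eb / ea)    # assumed: amplitude gain = sqrt(energy ratio)
        return [s * gain for s in a], list(b)
    gain = math.sqrt(ea / eb)
    return list(a), [s * gain for s in b]


def brick_wall(samples):
    """Clip samples to the average local peak and the average local valley."""
    inner = range(1, len(samples) - 1)
    peaks = [samples[k] for k in inner
             if samples[k - 1] < samples[k] > samples[k + 1]]
    valleys = [samples[k] for k in inner
               if samples[k - 1] > samples[k] < samples[k + 1]]
    if not peaks or not valleys:
        return list(samples)         # too short or too flat to normalize
    top = sum(peaks) / len(peaks)
    bottom = sum(valleys) / len(valleys)
    return [min(max(s, bottom), top) for s in samples]


print(remove_dc_offset([1, 2, 3]))                   # [-1.0, 0.0, 1.0]
print(brick_wall([0, 4, 0, -4, 0, 2, 0, -2, 0]))
```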

Which Audio Features?

Cepstrals: They are statistically independent and phase differences are removed
ΔCepstrals, or ΔΔCepstrals: Reflect how the signal is changing from one frame to the next
Energy: Distinguishes the frames that are voiced versus those that are unvoiced
Normalized LPC Coefficients: Represent the shape of the vocal tract normalized by vocal tract length for different speakers.

These are the popular features used for speech recognition

Slide11
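
To illustrate why cepstral features remove phase differences, the sketch below computes cepstral-style coefficients for one frame: magnitude spectrum, then logarithm, then discrete cosine transform. It is a simplified sketch under stated assumptions: a direct (slow) Fourier sum, a small floor added before the logarithm, and none of the windowing or filter banks that practical speech front ends usually add. The frame values are made up.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the Fourier transform; the phase is discarded here."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def dct2(values):
    """Unnormalized discrete cosine transform (type II)."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (t + 0.5) / n)
                for t, v in enumerate(values))
            for k in range(n)]


def cepstral_coefficients(frame, floor=1e-10):
    """Log magnitude spectrum followed by a DCT (floor avoids log of zero)."""
    log_mag = [math.log(m + floor) for m in magnitude_spectrum(frame)]
    return dct2(log_mag)


frame = [1.0, 0.5, -0.3, 0.8, -0.6, 0.1, 0.4, -0.2]
shifted = frame[3:] + frame[:3]      # same samples, circularly shifted

c1 = cepstral_coefficients(frame)
c2 = cepstral_coefficients(shifted)
print(max(abs(p - q) for p, q in zip(c1, c2)))  # very close to zero
```

A circular shift changes only the phase of the Fourier transform, so both frames give the same coefficients up to rounding error.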

Which Distance Metric?

General Formula: array[i,j] = distance(i,j) + min{array[i-1,j], array[i-1,j-1], array[i,j-1]}

Assumption: There is no cost assessed for duplicate or eliminated frames.

Distance Formula:

Euclidean: sum the squared differences between features
Linear: sum the absolute value of the distance between features
Weighting the features: Multiply each metric’s difference by a weighting factor to give greater/lesser emphasis to certain features

Example of a distance metric using linear distance

∑ w[i] |fa[i] - fb[i]| where fa[i] and fb[i] are a particular audio feature for signals a and b, and w[i] is that feature’s weight
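
The general formula and the weighted linear distance above can be combined into a short sketch. This kind of recurrence is commonly known as dynamic time warping. The function names, the feature vectors, and the weights below are hypothetical, and each frame is assumed to be a list of feature values.

```python
import math


def weighted_linear_distance(fa, fb, w):
    """Sum of w[i] * |fa[i] - fb[i]| over the features of two frames."""
    return sum(wi * abs(x - y) for wi, x, y in zip(w, fa, fb))


def compare(frames_a, frames_b, weights):
    """array[i,j] = distance(i,j) + min of the three previous adjacent cells."""
    n, m = len(frames_a), len(frames_b)
    array = [[math.inf] * (m + 1) for _ in range(n + 1)]
    array[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = weighted_linear_distance(frames_a[i - 1], frames_b[j - 1], weights)
            # No extra cost for duplicate or eliminated frames
            array[i][j] = d + min(array[i - 1][j],
                                  array[i - 1][j - 1],
                                  array[i][j - 1])
    return array[n][m]


weights = [1.0, 0.5]                       # hypothetical feature weights
a = [[1.0, 0.0], [2.0, 1.0], [3.0, 1.0]]
b = [[1.0, 0.0], [1.0, 0.0], [2.0, 1.0], [3.0, 1.0]]   # first frame duplicated

print(compare(a, b, weights))                      # 0.0
print(compare([[0.0, 0.0]], [[1.0, 2.0]], weights))  # 2.0
```

The duplicated first frame in b adds no cost, which is the assumption stated on the slide.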