
1 Haim Kaplan, Uri Zwick - PowerPoint Presentation

Uploaded On 2018-12-15

Tel Aviv University, March 2016. Last updated March 28, 2017. Algorithms in Action: Fast Fourier Transform.



Presentation Transcript

Slide1

Haim Kaplan, Uri Zwick

Tel Aviv University

March 2016. Last updated: March 28, 2017

Algorithms in Action

Fast Fourier Transform

Slide2

String Matching

Pattern: abracadabra

Text: abraabracadabracadabraabara

Given a text of length $n$ and a pattern of length $m$, find all occurrences of the pattern in the text.

The naïve algorithm runs in $O(mn)$ time.

Several classical algorithms run in $O(m+n)$ time. [Knuth-Morris-Pratt (1977)] [Boyer-Moore (1977)]
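For reference, the naïve algorithm can be sketched in a few lines. The function name `find_all_naive` is illustrative and not from the slides; each of the candidate alignments is compared character by character, which gives the quadratic bound.

```python
def find_all_naive(text, pattern):
    """Return every index i such that pattern occurs in text starting at i."""
    n, m = len(text), len(pattern)
    # Try each alignment; the slice comparison costs up to m character checks.
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]


matches = find_all_naive("abraabracadabracadabraabara", "abracadabra")
```

On the example from the slide, the pattern occurs at (0-based) positions 4 and 11.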

 Slide3

More String Matching Problems

Pattern: abracadabra

Text: abraabracadabracadabraabara

Count the number of matches/mismatches in each alignment of the pattern with the text. (Find all alignments with at most $k$ mismatches.)

Allow a wildcard (“don’t care”) ($*$) that matches any (single) symbol in the pattern and/or text.

“Traditional” string matching techniques are not so efficient for these extensions.

Slide4

(Cross-)Correlation

 Slide5

(Cross-)Correlation

A convolution without the initial reversal, with a shift of indices.

The correlation of two vectors of length $n$ can be computed in $O(n \log n)$ time.
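As an illustration, a correlation can be reduced to a convolution with one vector reversed and then computed with an FFT. The sketch below makes several assumptions that are not taken from the slides: the convention $c_k = \sum_j a_{j+k} b_j$ (the slides' own indexing may differ), integer inputs, and output restricted to the shifts where `b` fully overlaps `a`. The names `fft`, `convolve`, and `correlate` are illustrative.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)  # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def convolve(a, b):
    """Linear convolution of two integer sequences via FFT."""
    size = 1
    while size < len(a) + len(b) - 1:
        size *= 2
    fa = fft([complex(x) for x in a] + [0j] * (size - len(a)))
    fb = fft([complex(x) for x in b] + [0j] * (size - len(b)))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # The inverse transform above is unnormalized, so divide by size here.
    return [round((v / size).real) for v in prod[:len(a) + len(b) - 1]]


def correlate(a, b):
    """c[k] = sum_j a[j + k] * b[j] for k = 0 .. len(a) - len(b)."""
    m = len(b)
    conv = convolve(a, b[::-1])  # reversing b turns correlation into convolution
    return conv[m - 1:len(a)]    # the index shift mentioned on the slide


c = correlate([1, 2, 3, 4], [1, 1])
```

Here `c` holds the sums of adjacent pairs, `[3, 5, 7]`. Rounding is safe only while the exact values stay well within floating-point precision.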

 Slide6

(Cross-)Correlation (unequal lengths)

Slide7

(Cross-)Correlation

Slide8

(Cross-)Correlation

Slide9

(Cross-)Correlation

Slide10

(Cross-)Correlation

Slide11

(Cross-)Correlation

Slide12

(Cross-)Correlation

Slide13

(Cross-)Correlation

 Slide14

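(Cross-)Correlation

If $a$ is of length $n$ and $b$ of length $m$, where $m \le n$, then $(a \otimes b)_k = \sum_j a_{j+k}\, b_j$, where the sum is over all $j$ with $0 \le j < m$ and $0 \le j+k < n$.

Exercise: The correlation of two vectors of lengths $n$ and $m$, where $m \le n$, can be computed in $O(n \log m)$ time.

Sometimes, only the values $(a \otimes b)_0, \ldots, (a \otimes b)_{n-m}$, corresponding to a full overlap of $b$ with a shift of $a$, are of interest.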

 Slide15

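Counting mismatches [Fischer-Paterson (1974)]

Let $\Sigma$ be the alphabet of the pattern and text.

We may assume that $|\Sigma| \le n + m$. (Why?)

For every $\sigma \in \Sigma$ create two Boolean strings from the pattern $P$ and the text $T$:

$P_\sigma[i] = 1$ iff $P[i] = \sigma$

$\bar{T}_\sigma[j] = 1$ iff $T[j] \ne \sigma$

Correlation of $P_\sigma$ and $\bar{T}_\sigma$ counts mismatches involving $\sigma$.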

 Slide16

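Counting mismatches

Pattern: abracadabra
10010101001

Text: abraabracadabracadabraabara
011001101010110101011001010

(In the pattern string, 1 marks an a; in the text string, 1 marks a symbol other than a.)

Slide17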

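Counting mismatches

Pattern: abracadabra
10010101001

Text: abraabracadabracadabraabara
011001101010110101011001010

With the pattern aligned at the start of the text, positions 5 and 10 (counting from 0) hold a in the pattern and a different symbol in the text: 2 mismatches involving a.

Slide18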

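Counting mismatches

Let $\Sigma$ be the alphabet of the pattern and text.

We may assume that $|\Sigma| \le n + m$. (Why?)

For every $\sigma \in \Sigma$ create two Boolean strings from the pattern $P$ and the text $T$:

$P_\sigma[i] = 1$ iff $P[i] = \sigma$

$\bar{T}_\sigma[j] = 1$ iff $T[j] \ne \sigma$

Correlation of $P_\sigma$ and $\bar{T}_\sigma$ counts mismatches involving $\sigma$.

Summing over all $\sigma \in \Sigma$ we get the total no. of mismatches.

Complexity: $O(|\Sigma|\, n \log m)$ word operations. (Each word assumed to hold $O(\log n)$ bits.)

Fast only if $|\Sigma|$ is small.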
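The reduction above can be sketched as follows. This is a minimal illustration, not the authors' code: `count_mismatches` is a hypothetical name, and the helper `correlate` is a direct quadratic-time correlation used only to keep the example short; an FFT-based routine with the same output would be needed for the stated bound.

```python
def correlate(a, b):
    """Direct correlation: c[k] = sum_j a[j + k] * b[j], full overlaps only."""
    n, m = len(a), len(b)
    return [sum(a[j + k] * b[j] for j in range(m)) for k in range(n - m + 1)]


def count_mismatches(text, pattern):
    """Number of mismatching positions at every alignment of pattern in text."""
    n, m = len(text), len(pattern)
    total = [0] * max(0, n - m + 1)
    # Symbols absent from the pattern have an all-zero indicator, so skip them.
    for sigma in set(pattern):
        p_sigma = [1 if ch == sigma else 0 for ch in pattern]   # pattern has sigma
        t_not_sigma = [1 if ch != sigma else 0 for ch in text]  # text differs from sigma
        for k, c in enumerate(correlate(t_not_sigma, p_sigma)):
            total[k] += c  # mismatches involving sigma at alignment k
    return total


counts = count_mismatches("abraabracadabracadabraabara", "abracadabra")
exact = [k for k, c in enumerate(counts) if c == 0]
```

Alignments with a count of 0 are exact occurrences; on the running example these are positions 4 and 11.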

 Slide19

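Counting mismatches with wildcards [Fischer-Paterson (1974)]

For every $\sigma \in \Sigma$ create two Boolean strings from the pattern $P$ and the text $T$:

$P_\sigma[i] = 1$ iff $P[i] = \sigma$

$\bar{T}_\sigma[j] = 1$ iff $T[j] \ne \sigma$ and $T[j] \ne *$

Complexity: $O(|\Sigma|\, n \log m)$ word operations.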
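With wildcards, only the indicator strings change: a wildcard in the pattern never sets a bit in any pattern indicator, and a wildcard in the text never sets a bit in any text indicator. The sketch below assumes `*` as the wildcard symbol; `count_mismatches_wildcards` is a hypothetical name, and `correlate` is again a direct quadratic stand-in for an FFT-based correlation.

```python
def correlate(a, b):
    """Direct correlation: c[k] = sum_j a[j + k] * b[j], full overlaps only."""
    n, m = len(a), len(b)
    return [sum(a[j + k] * b[j] for j in range(m)) for k in range(n - m + 1)]


def count_mismatches_wildcards(text, pattern, wildcard="*"):
    """Mismatch count per alignment; a wildcard on either side never mismatches."""
    n, m = len(text), len(pattern)
    total = [0] * max(0, n - m + 1)
    for sigma in set(pattern) - {wildcard}:
        p_sigma = [1 if ch == sigma else 0 for ch in pattern]
        # 1 only where the text symbol is neither sigma nor the wildcard
        t_bar = [1 if ch != sigma and ch != wildcard else 0 for ch in text]
        for k, c in enumerate(correlate(t_bar, p_sigma)):
            total[k] += c
    return total


wild_counts = count_mismatches_wildcards("abraabra*adabracadabraabara", "abracada*ra")
```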

 Slide20

Counting mismatches with wildcards

Pattern: abracada*ra
10010101001

Text: abraabra*adabracadabraabara
011001100010110101011001010

Pattern: abracada*ra
10010101001

Text: abraabraca*abracadabraabara
011001101000110101011001010

Slide21

Counting mismatches with wildcards

If we only want to find exact matches, replace each character by a specific bit string.
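One possible reading of this encoding step is sketched below, under assumptions that are not spelled out in the transcript: every symbol gets a fixed-length binary code, a wildcard becomes a run of wildcards of the same length, and mismatches between the resulting binary strings are counted with two correlations (pattern 1 against text 0, and pattern 0 against text 1). Only alignments that are multiples of the code length correspond to alignments of the original strings. The names are illustrative, and `correlate` is a direct quadratic stand-in for an FFT-based correlation.

```python
def correlate(a, b):
    """Direct correlation: c[k] = sum_j a[j + k] * b[j], full overlaps only."""
    n, m = len(a), len(b)
    return [sum(a[j + k] * b[j] for j in range(m)) for k in range(n - m + 1)]


def exact_matches_binary(text, pattern, wildcard="*"):
    """Exact matches with wildcards via a fixed-length binary encoding."""
    alphabet = sorted((set(text) | set(pattern)) - {wildcard})
    bits = max(1, (len(alphabet) - 1).bit_length()) if alphabet else 1
    code = {ch: format(i, "0{}b".format(bits)) for i, ch in enumerate(alphabet)}
    code[wildcard] = wildcard * bits  # a wildcard stays a wildcard in every bit
    t = "".join(code[ch] for ch in text)
    p = "".join(code[ch] for ch in pattern)
    # Pattern bit 1 facing text bit 0, and pattern bit 0 facing text bit 1.
    c10 = correlate([1 if c == "0" else 0 for c in t], [1 if c == "1" else 0 for c in p])
    c01 = correlate([1 if c == "1" else 0 for c in t], [1 if c == "0" else 0 for c in p])
    return [k for k in range(len(text) - len(pattern) + 1)
            if c10[k * bits] + c01[k * bits] == 0]


found = exact_matches_binary("abraabra*adabracadabraabara", "abracada*ra")
```

A combined count of 0 at an alignment means that no encoded bit disagrees there, so the original strings match at that position.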

 Slide22

Counting mismatches with wildcards

Count mismatches of the binary strings as before (2 convolutions).

A result of 0 corresponds to a match.

Complexity drops to $O(n \log m \log |\Sigma|)$.

Can we get rid of the dependence on $|\Sigma|$?

Slide23

$L_2$-matching [Lipsky-Porat (2011)]

Suppose that each “character” is a real number. We want to find approximate matches.

Standard string matching uses the Hamming distance. Two characters either match or they do not. No character is closer to one character than to another.

For each $i$ we want to compute the (squared) $L_2$-distance: $\sum_{j=0}^{m-1} (p_j - t_{i+j})^2$

 Slide24

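$L_2$-matching [Lipsky-Porat (2011)]

$\sum_{j=0}^{m-1} (p_j - t_{i+j})^2 = \sum_{j=0}^{m-1} p_j^2 - 2\sum_{j=0}^{m-1} p_j\, t_{i+j} + \sum_{j=0}^{m-1} t_{i+j}^2$

$\sum_j p_j^2$: Constant. $O(m)$ time.

$\sum_j p_j\, t_{i+j}$: Correlation. $O(n \log m)$ time.

$\sum_j t_{i+j}^2$: Easy in $O(n)$ time.

$L_2$-matching can be computed in $O(n \log m)$ time.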
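The three-term decomposition can be sketched as follows, assuming the squared distance $\sum_j (p_j - t_{i+j})^2$ is the quantity wanted at each alignment $i$. The name `l2_matching` is hypothetical, and `correlate` is a direct quadratic stand-in for an FFT-based correlation; the sum of squared text values is maintained with a sliding window.

```python
def correlate(a, b):
    """Direct correlation: c[k] = sum_j a[j + k] * b[j], full overlaps only."""
    n, m = len(a), len(b)
    return [sum(a[j + k] * b[j] for j in range(m)) for k in range(n - m + 1)]


def l2_matching(text, pattern):
    """Squared L2 distance between pattern and text[i:i+m] for every i."""
    n, m = len(text), len(pattern)
    if m > n:
        return []
    pattern_sq = sum(p * p for p in pattern)   # constant term, O(m)
    cross = correlate(text, pattern)           # sum_j p[j] * t[i + j]
    window = sum(t * t for t in text[:m])      # sum_j t[i + j]^2 for i = 0
    result = []
    for i in range(n - m + 1):
        result.append(pattern_sq - 2 * cross[i] + window)
        if i + m < n:
            # Slide the window one step: add the new square, drop the old one.
            window += text[i + m] ** 2 - text[i] ** 2
    return result


distances = l2_matching([1, 2, 3, 4, 5], [2, 3])
```

For this small input the distances are `[2, 0, 2, 8]`; the 0 marks the alignment where the pattern occurs exactly.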

 Slide25

Exact matches with wildcards [Clifford-Clifford (2007)]

Replace each character by a positive integer. Replace the wildcard by 0.

For each $i$ compute $A_i = \sum_{j=0}^{m-1} p_j\, t_{i+j}\, (p_j - t_{i+j})^2$.

There is an exact match at position $i$ iff $A_i = 0$.

Slide26

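Exact matches with wildcards [Clifford-Clifford (2007)]

$\sum_j p_j\, t_{i+j}\, (p_j - t_{i+j})^2 = \sum_j p_j^3\, t_{i+j} - 2\sum_j p_j^2\, t_{i+j}^2 + \sum_j p_j\, t_{i+j}^3$

Compute three correlations of appropriate sequences in $O(n \log m)$ time.

Running time is independent of $|\Sigma|$!

Assuming that each character fits in an $O(\log n)$-bit word and that operations on such words take constant time.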
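A minimal sketch of the three-correlation method follows, assuming `*` as the wildcard and an arbitrary assignment of positive integers to symbols. The name `wildcard_matches` is hypothetical, and `correlate` is a direct quadratic stand-in for an FFT-based correlation, which keeps the arithmetic exact for this illustration.

```python
def correlate(a, b):
    """Direct correlation: c[k] = sum_j a[j + k] * b[j], full overlaps only."""
    n, m = len(a), len(b)
    return [sum(a[j + k] * b[j] for j in range(m)) for k in range(n - m + 1)]


def wildcard_matches(text, pattern, wildcard="*"):
    """Positions where pattern matches text, wildcards allowed on both sides."""
    symbols = sorted((set(text) | set(pattern)) - {wildcard})
    code = {ch: i + 1 for i, ch in enumerate(symbols)}  # positive integers
    code[wildcard] = 0                                  # wildcard becomes 0
    t = [code[ch] for ch in text]
    p = [code[ch] for ch in pattern]
    c1 = correlate(t, [x ** 3 for x in p])                    # sum p^3 * t
    c2 = correlate([x ** 2 for x in t], [x ** 2 for x in p])  # sum p^2 * t^2
    c3 = correlate([x ** 3 for x in t], p)                    # sum p * t^3
    # Each term p*t*(p-t)^2 is non-negative, so the sum is 0 only if all are 0.
    return [i for i in range(len(t) - len(p) + 1)
            if c1[i] - 2 * c2[i] + c3[i] == 0]


positions = wildcard_matches("abraabra*adabracadabraabara", "abracada*ra")
```

With a floating-point FFT in place of the direct helper, the sums would need to be rounded, and their magnitude would have to stay within the available precision.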