LING/C SC/PSYC 438/538 - PowerPoint Presentation

Uploaded On 2016-07-01

Presentation Transcript

LING/C SC/PSYC 438/538

Lecture 8

Sandiway Fong

Administrivia

Homework 4 not yet graded …

Today's Topics

Homework 4 review

Perl regex

Homework 2 Review

Sample data file:

First try: just try to detect a repeated word

Homework 2 Review

Sample data file:

Sample output:

Homework 2 Review

Key:

think algorithmically

think of a specific example first

\(w_1\) \(w_2\) \(w_3\) \(w_4\) \(w_5\)

Compare \(w_1\) with \(w_2\)

Compare \(w_2\) with \(w_3\)

Compare \(w_3\) with \(w_4\)

Compare \(w_4\) with \(w_5\)

Homework 2 Review

Generalize specific example, then code it up

Compare \(w_1\) with \(w_{1+1}\)

Compare \(w_2\) with \(w_{2+1}\)

…

Compare \(w_{n-2}\) with \(w_{n-2+1}\)

Compare \(w_{n-1}\) with \(w_n\)

"for" loop implementation

array @words: $words[0], $words[1], … $words[n-1]

Array indices start from 0…

$#words is the last index (n-1), so the loop index ends just before $#words: each $words[$i] is compared with $words[$i+1]
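A minimal Perl sketch of the "for" loop described above, assuming the line has already been split on whitespace and that a repeat means an exact, case-sensitive eq match. The sub name find_repeats and the sample sentence are hypothetical, not taken from the homework's data file.

```perl
use strict;
use warnings;

# Return the indices $i where $words[$i] eq $words[$i+1]
sub find_repeats {
    my @words = @_;
    my @hits;
    # $#words is the last index; $i stops just before it,
    # so $words[$i + 1] always exists
    for (my $i = 0; $i < $#words; $i++) {
        push @hits, $i if $words[$i] eq $words[$i + 1];
    }
    return @hits;
}

# Hypothetical sample sentence
my @words = split /\s+/, "Paris in the the spring";
for my $i (find_repeats(@words)) {
    print "repeated word '$words[$i]' at position $i\n";
}
```

For an empty or one-word array, $#words is -1 or 0, so the loop body never runs and nothing is reported.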

Homework 2 Review

a decent first pass …

Homework 2 Review

Sample data file:

Output:

Homework 2 Review

Second try: merging multiple occurrences

Homework 2 Review

Second try: merging multiple occurrences

Sample data file:

Output:
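One possible way to merge multiple occurrences, sketched in Perl under the same assumptions as a plain repeat check (whitespace-split words, exact eq comparison): a run such as "we we we" is reported once, with its length, instead of as two overlapping repeats. The sub name find_runs and the sample sentence are hypothetical.

```perl
use strict;
use warnings;

# Return [start index, word, run length] for each run of an immediately repeated word
sub find_runs {
    my @words = @_;
    my @runs;
    my $i = 0;
    while ($i < $#words) {
        my $j = $i;
        # extend the run while the next word is the same
        $j++ while $j < $#words && $words[$j] eq $words[$j + 1];
        if ($j > $i) {
            push @runs, [$i, $words[$i], $j - $i + 1];
            $i = $j + 1;                 # continue after the whole run
        } else {
            $i++;
        }
    }
    return @runs;
}

# Hypothetical sample sentence
my @words = split /\s+/, "we we we saw the the cat";
for my $run (find_runs(@words)) {
    my ($pos, $word, $n) = @$run;
    print "'$word' occurs $n times in a row starting at position $pos\n";
}
```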

Homework 2 Review

Third try: implementing a simple table of exceptions

Homework 2 Review

Third try: table of exceptions

Sample data file:

Output:
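A sketch of a simple table of exceptions as a Perl hash. The entries "had" and "that" are assumed examples of words that can legitimately appear twice in a row (as in "he had had enough"), not the homework's actual list, and find_repeats_with_exceptions is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical exception table: words allowed to repeat ("had had", "that that")
my %exceptions;
$exceptions{$_} = 1 for qw(had that);

# Like a plain repeat check, but skips words listed in %exceptions
sub find_repeats_with_exceptions {
    my @words = @_;
    my @hits;
    for my $i (0 .. $#words - 1) {
        next unless $words[$i] eq $words[$i + 1];
        next if $exceptions{ $words[$i] };    # allowed repeat: not reported
        push @hits, $i;
    }
    return @hits;
}

# Hypothetical sample sentence
my @words = split /\s+/, "he had had the the flu";
for my $i (find_repeats_with_exceptions(@words)) {
    print "repeated word '$words[$i]' at position $i\n";
}
```

A hash is a natural fit here because the exception check is a single constant-time lookup per candidate repeat.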

Perl regex

more powerful than simple wildcard matching of filenames, e.g. rm *.jpg, rm PIC000?.JPG

Regular expression pattern matching:

regular expressions are patterns using the operators * (zero or more occurrences), + (one or more occurrences), ? (optional), | (disjunction)

widely used in many areas

theoretically equivalent to Type-3 (regular) languages in the Chomsky hierarchy, hence less powerful than context-free languages, etc.

Perl regex

Perl regular expression (re) matching:

$a =~ /foo/

/…/ contains a regular expression

will evaluate to true/false depending on what's contained in $a

Perl regular expression (re) match and substitute:

$a =~ s/foo/bar/

s/…match…/…substitute…/ will modify $a by looking for the first (leftmost) occurrence of match and replacing it with substitute

s/…match…/…substitute…/g

g = global flag: match and substitute every occurrence
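A small runnable illustration of matching, single substitution and global substitution. It uses $str rather than the slide's $a, since $a and $b are special variables reserved for sort comparisons in Perl; the sample string is made up.

```perl
use strict;
use warnings;

my $str = "foo food foo";     # made-up sample string

# Match: true if "foo" occurs anywhere in $str
print "matched\n" if $str =~ /foo/;

# s/// without g replaces only the first (leftmost) occurrence
my $once = $str;
$once =~ s/foo/bar/;
print "$once\n";              # bar food foo

# With g, every occurrence is replaced; s///g returns the number of substitutions
my $all = $str;
my $count = ($all =~ s/foo/bar/g);
print "$all ($count substitutions)\n";   # bar bard bar (3 substitutions)
```

Note that "food" also changes to "bard": the pattern matches substrings, not whole words.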

Perl regex

Typically useful with the standard code template for reading in a file line-by-line:

open(my $txtfile, '<', $ARGV[0]) or die "$ARGV[0] not found!\n";
while (my $line = <$txtfile>) {
    if ($line =~ /..regex../) { do stuff… }
}
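The template above, filled in with a concrete regex so it runs on its own. To avoid needing a file on disk, this sketch opens an in-memory string as the filehandle (Perl's open(my $fh, '<', \$text) form); a real script would open $ARGV[0] instead. The regex /\bthe\b/, the sample text and the sub name matching_lines are assumptions for illustration.

```perl
use strict;
use warnings;

# Made-up file contents; a real script would open $ARGV[0] instead
my $text = "the cat sat\nno match here\nthe the end\n";

# Collect every line that matches $re, reading line-by-line as in the template
sub matching_lines {
    my ($fh, $re) = @_;
    my @found;
    while (my $line = <$fh>) {
        chomp $line;                     # drop the trailing newline
        push @found, $line if $line =~ $re;
    }
    return @found;
}

# Open the string as an in-memory file so the sketch runs without a file on disk
open(my $txtfile, '<', \$text) or die "cannot open in-memory file: $!\n";
my @hits = matching_lines($txtfile, qr/\bthe\b/);
close($txtfile);

print "$_\n" for @hits;    # the cat sat / the the end
```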

Chapter 2: JM (Jurafsky & Martin)

character class: Perl lingo

Chapter 2: JM

Backslash + lowercase letter for a class, e.g. \d (digit), \w (word character), \s (whitespace)

Uppercase variant for all but that class, e.g. \D, \W, \S
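A quick check of the lowercase/uppercase class pairs on a made-up string, using //g in list context to collect every match:

```perl
use strict;
use warnings;

my $s = "Room 101, floor 3";        # made-up sample

my @digits    = $s =~ /(\d+)/g;     # \d: digits                ("101", "3")
my @nondigits = $s =~ /(\D+)/g;     # \D: anything but a digit  ("Room ", ", floor ")
my @spaces    = $s =~ /(\s)/g;      # \s: whitespace            (three spaces)
my @words     = $s =~ /(\w+)/g;     # \w: word characters       ("Room", "101", "floor", "3")
my @nonwords  = $s =~ /(\W+)/g;     # \W: anything but \w       (" ", ", ", " ")

print join("|", @digits), "\n";
print join("|", @nondigits), "\n";
print join("|", @words), "\n";
```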

Unicode and \w

\w is [0-9A-Za-z_] for plain ASCII text

Definition is expanded for Unicode:

use utf8;
use open qw(:std :utf8);
my $str = "school école École šola trường स्कूल škole โรงเรียน";
my @words = ($str =~ /(\w+)/g);
foreach my $word (@words) { print "$word\n" }

list context

Chapter 2: JM

Sheeptalk
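Sheeptalk is Jurafsky & Martin's toy language baa!, baaa!, baaaa!, and so on, described by the regex /baa+!/. The Perl sketch below anchors that pattern with ^ and $ so the whole string has to be sheeptalk; is_sheeptalk is a hypothetical helper name and the test strings are made up.

```perl
use strict;
use warnings;

# Sheeptalk: "b", then "a", then one or more further "a"s, then "!"
sub is_sheeptalk {
    my ($s) = @_;
    return $s =~ /^baa+!$/ ? 1 : 0;    # ^ and $ anchor the pattern to the whole string
}

for my $s ("baa!", "baaaaa!", "ba!", "baa", "abaa!") {
    printf "%-8s %s\n", $s, is_sheeptalk($s) ? "yes" : "no";
}
```

Without the anchors, /baa+!/ would also accept "abaa!", because an unanchored match may start anywhere in the string.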

Chapter 2: JM

Precedence of operators

Example: Column 1 Column 2 Column 3 …

/Column [0-9]+ */

/(Column [0-9]+ *)*/

/house(cat(s|)|)/

Perl: in a regular expression, the substring matched by the pattern within a pair of parentheses is stored in the designated variables $1 (and $2, and so on)

Precedence Hierarchy:

space
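The precedence examples above can be checked directly in Perl. This sketch (sample strings assumed) shows that without parentheses the * in /Column [0-9]+ */ applies only to the space, so a single "Column N " is matched, while grouping lets * repeat the whole unit; a repeated group keeps only its last iteration. It also records what $1 and $2 hold for /house(cat(s|)|)/, where an inner group that did not take part in the match leaves $2 undefined.

```perl
use strict;
use warnings;

my $header = "Column 1 Column 2 Column 3 ";   # made-up sample

# Without grouping, * applies only to the space: one "Column N " is matched
my ($one) = $header =~ /(Column [0-9]+ *)/;
# Grouping makes * repeat the whole unit; the inner group keeps its last iteration
my ($all, $last) = $header =~ /((Column [0-9]+ *)*)/;

# $1 is the outer group, $2 the inner (s|); an unused group leaves $2 undef
my %caps;
for my $w ("house", "housecat", "housecats") {
    $caps{$w} = [$1, $2] if $w =~ /house(cat(s|)|)/;
}
for my $w (sort keys %caps) {
    my ($c1, $c2) = @{ $caps{$w} };
    printf "%-10s \$1=<%s> \$2=<%s>\n", $w, $c1, $c2 // 'undef';
}
```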

Chapter 2: JM

A shortcut: list context for matching

http://perldoc.perl.org/perlretut.html

in list context, a match returns a list (e.g. the substrings captured by parentheses)

in scalar context, it returns 1 (true) or "" (empty string, if false)
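A sketch contrasting the two contexts on made-up strings: assigning the match to a list of variables collects the captures, while assigning the same match to a single scalar yields only the true/false value.

```perl
use strict;
use warnings;

my $time = "Meeting at 10:45 today";     # made-up sample

# List context: the match returns the captured substrings
my ($hours, $minutes) = $time =~ /(\d+):(\d+)/;

# Scalar context: the same match returns 1 on success, "" on failure
my $ok   = $time =~ /(\d+):(\d+)/;
my $fail = "no time here" =~ /(\d+):(\d+)/;

print "hours=$hours minutes=$minutes\n";  # hours=10 minutes=45
print "ok=<$ok> fail=<$fail>\n";          # ok=<1> fail=<>
```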

Chapter 2: JM

s/([0-9]+)/<\1>/

what does this do?

Backreferences give Perl regexps more expressive power than finite state automata (FSA)
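A sketch of both halves of this slide, with made-up sample strings. For the substitution it writes $1 on the replacement side, which perlre recommends over \1 there (use warnings complains about \1 in a replacement); the effect is to wrap the first run of digits in angle brackets. The second part uses \1 inside the pattern, a genuine backreference, to find a doubled word: matching "the same string again" requires remembering an arbitrarily long string, which a finite state automaton cannot do.

```perl
use strict;
use warnings;

# The slide's substitution, written with $1 on the replacement side
my $s = "item 42 costs 7";
(my $t = $s) =~ s/([0-9]+)/<$1>/;      # wraps the first number only
print "$t\n";                          # item <42> costs 7

# A true backreference: \1 must repeat exactly what (\w+) matched
my @doubled;
for my $line ("Paris in the the spring", "no repeats here") {
    push @doubled, $1 if $line =~ /\b(\w+)\s+\1\b/;
}
print "doubled: @doubled\n";           # doubled: the
```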

Shortest vs. Greedy Matching

default behavior in Perl RE match: take the longest possible matching string, aka greedy matching

This behavior can be changed, see next slide

Shortest vs. Greedy Matching

from http://www.perl.com/doc/manual/html/pod/perlre.html

Example:

$_ = "The food is under the bar in the barn.";
if ( /foo(.*?)bar/ ) { print "matched <$1>\n"; }

Output:

matched <d is under the >

Notes: ? immediately following a repetition operator like * (or +) makes the operator work in non-greedy mode

Shortest vs. Greedy Matching

from http://www.perl.com/doc/manual/html/pod/perlre.html

Example:

$_ = "The food is under the bar in the barn.";
if ( /foo(.*?)bar/ ) { print "matched <$1>\n"; }

Output:

greedy, with (.*): matched <d is under the bar in the >

shortest, with (.*?): matched <d is under the >
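The two outputs on this slide can be reproduced side by side; this sketch runs both quantifiers on the same string and captures $1 in list context:

```perl
use strict;
use warnings;

my $str = "The food is under the bar in the barn.";

my ($greedy)   = $str =~ /foo(.*)bar/;    # .*  grabs as much as possible, then backs off to the last "bar"
my ($shortest) = $str =~ /foo(.*?)bar/;   # .*? stops at the first "bar"

print "greedy:   matched <$greedy>\n";    # <d is under the bar in the >
print "shortest: matched <$shortest>\n";  # <d is under the >
```

The greedy version ends at the "bar" inside "barn", since that is the last place where "bar" can still follow.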

Shortest vs. Greedy Matching

RE search is supposed to be fast

but search time is not necessarily proportional to the length of the input being searched

in fact, Perl RE matching can take exponential time (in the length of the input)

non-deterministic: it may need to backtrack (revisit) if it matches incorrectly part of the way through

[Figure: time vs. length, linear vs. exponential growth]

Global Matching: scalar context

g flag in the condition of a while-loop
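A minimal sketch of //g in scalar context on made-up input: each pass through the while condition resumes from where the previous match ended, and pos() reports the offset just past the current match.

```perl
use strict;
use warnings;

my $text = "cat bat rat";     # made-up sample
my @found;

# Each //g match in scalar context continues after the previous one
while ($text =~ /(\w)at/g) {
    push @found, "$1 ends at " . pos($text);
}
print "$_\n" for @found;      # c ends at 3 / b ends at 7 / r ends at 11
```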

Global Matching: list context

g flag in list context
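A sketch of //g in list context on made-up input: every match is returned at once, as the captured groups if the pattern has any, otherwise as the whole matches.

```perl
use strict;
use warnings;

my $text = "cat bat rat";     # made-up sample

# With a capture group: every captured substring
my @initials = $text =~ /(\w)at/g;          # ("c", "b", "r")
# Without captures: every whole match
my @words    = $text =~ /\wat/g;            # ("cat", "bat", "rat")
# With two groups, captures are returned pair by pair, so they fill a hash
my %pairs    = "a=1 b=2" =~ /(\w)=(\d)/g;   # (a => 1, b => 2)

print "@initials\n@words\n";
```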

Split

@array = split /re/, string

splits string into a list of substrings separated by matches of re. Each substring is stored as an element of @array.

Examples (from perlrequick tutorial):
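The slide's perlrequick examples are not in this transcript. As a stand-in, the illustrative calls below (inputs made up, not taken from the tutorial) show a few split behaviors: whitespace- and comma-separated fields, a leading empty field when the string starts with the separator, captured separators being returned too, and the special ' ' pattern that skips leading whitespace.

```perl
use strict;
use warnings;

# Illustrative inputs (made up)
my @words  = split /\s+/, "Tom and   Jerry";      # ("Tom", "and", "Jerry")
my @fields = split /,\s*/, "3.14, 2.72,1.41";     # ("3.14", "2.72", "1.41")
my @parts  = split m{/}, "/usr/bin/perl";         # ("", "usr", "bin", "perl"): leading empty field kept
my @keep   = split m{(/)}, "a/b";                 # ("a", "/", "b"): captured separators returned too
my @lead   = split /\s+/, "  two words";          # ("", "two", "words")
my @trim   = split ' ', "  two words";            # ("two", "words"): ' ' skips leading whitespace

print join("|", @parts), "\n";                    # |usr|bin|perl
```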

Matched Positions
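The content of the Matched Positions slides is not in this transcript. One way Perl exposes match positions is through the special arrays @- and @+, which hold the start and end offsets of the whole match (index 0) and of each capture group (index 1, 2, …); end offsets point just past the last matched character. This sketch applies them to the earlier "food … bar" example; whether the slides used these arrays or pos() is not known.

```perl
use strict;
use warnings;

my $str = "The food is under the bar in the barn.";
my ($m_start, $m_end, $g_start, $g_end);

if ($str =~ /foo(.*?)bar/) {
    # $-[0] / $+[0]: start and end offsets of the whole match
    # $-[1] / $+[1]: start and end offsets of capture group 1
    ($m_start, $m_end) = ($-[0], $+[0]);
    ($g_start, $g_end) = ($-[1], $+[1]);
    printf "whole match: %d..%d\n", $m_start, $m_end;          # 4..25
    printf "group 1:     %d..%d <%s>\n", $g_start, $g_end,
        substr($str, $g_start, $g_end - $g_start);             # 7..22 <d is under the >
}
```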