Slide1
LING/C SC/PSYC 438/538
Lecture 8
Sandiway Fong
Slide2
Administrivia
Homework 4 not yet graded …
Slide3
Today's Topics
Homework 4 review
Perl regex
Slide4
Homework 2 Review
Sample data file:
First try..
just try to detect a repeated word
Slide5
Homework 2 Review
Sample data file:
Sample output:
Slide6
Homework 2 Review
Key:
think algorithmically …
think of a specific example first
w_1 w_2 w_3 w_4 w_5
Compare w_1 with w_2
Compare w_2 with w_3
Compare w_3 with w_4
Compare w_4 with w_5
Slide7
Homework 2 Review
Generalize specific example, then code it up
Compare w_1 with w_{1+1}
Compare w_2 with w_{2+1}
…
Compare w_{n-2} with w_{n-2+1}
Compare w_{n-1} with w_n
“for” loop implementation
array @words: words_0, words_1, … words_{n-1}
Array indices start from 0…
Loop indices end just before $#words (the last index), so that the next word always exists…
Slide8
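A minimal sketch of the "for" loop described on the previous slide, not the homework solution itself. The sub name repeated_words, the whitespace splitting, and the case-insensitive comparison are assumptions made for illustration.

```perl
use strict;
use warnings;

# Return every word that is immediately followed by an identical word.
# Assumption: words are whitespace-separated and compared ignoring case.
sub repeated_words {
    my ($line) = @_;
    my @words = split ' ', $line;
    my @found;
    # $i runs from 0 to just before $#words, so $words[$i+1] always exists
    for (my $i = 0; $i < $#words; $i++) {
        push @found, $words[$i] if lc($words[$i]) eq lc($words[$i+1]);
    }
    return @found;
}

print join(",", repeated_words("this is is a test of the the code")), "\n";
```

On an empty line @words is empty, $#words is -1, and the loop body never runs.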
Homework 2 Review
Slide9
Homework 2 Review
Slide10
Homework 2 Review
Slide11
Homework 2 Review
Slide12
Homework 2 Review
Slide13
Homework 2 Review
Slide14
Homework 2 Review
a decent first pass …
Slide15
Homework 2 Review
Sample data file:
Output:
Slide16
Homework 2 Review
Second try..
merging multiple occurrences
Slide17
Homework 2 Review
Second try..
merging multiple occurrences
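One possible reading of "merging multiple occurrences", sketched under an assumption: a run such as "the the the" should be reported once rather than once per adjacent pair. The sub name repeated_runs and the sample sentence are hypothetical.

```perl
use strict;
use warnings;

# Report each run of identical adjacent words once, however long the run is.
# Assumption: this is what "merging" means; the homework spec may differ.
sub repeated_runs {
    my ($line) = @_;
    my @words = split ' ', $line;
    my @found;
    my $i = 0;
    while ($i < $#words) {
        my $j = $i;
        # advance $j to the last word of the run that starts at $i
        $j++ while $j < $#words && lc($words[$j]) eq lc($words[$j+1]);
        push @found, $words[$i] if $j > $i;
        $i = $j + 1;
    }
    return @found;
}

print join(",", repeated_runs("the the the cat sat sat")), "\n";
```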
Sample data file:
Output:
Slide18
Homework 2 Review
Third try..
implementing a simple table of exceptions
Slide19
Homework 2 Review
Third try..
table of exceptions
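A sketch of how a table of exceptions could be held in a hash. The entries "had" and "that" are hypothetical examples of words that may legitimately repeat; the actual table used in the homework is not shown here.

```perl
use strict;
use warnings;

# Hypothetical exception table: words whose repetition is not flagged.
my %exception = map { $_ => 1 } qw(had that);

sub repeated_words_except {
    my ($line) = @_;
    my @words = split ' ', $line;
    my @found;
    for (my $i = 0; $i < $#words; $i++) {
        my $w = lc($words[$i]);
        next if $exception{$w};                       # allowed to repeat
        push @found, $words[$i] if $w eq lc($words[$i+1]);
    }
    return @found;
}

print join(",", repeated_words_except("he said that that was was fine")), "\n";
```

A hash gives a constant-time lookup per word, so the table can grow without slowing the loop.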
Sample data file:
Output:
Slide20
Perl regex
more powerful than simple wildcard matching, e.g. files: rm *.jpg, rm PIC000?.JPG
Regular expression pattern matching:
regular expressions are patterns using operators: * (zero or more occurrences), + (one or more occurrences), ? (optional), | (disjunction)
widely used in many areas
theoretically equivalent to Type-3 languages in the Chomsky hierarchy
less powerful than Context-free languages etc.
Slide21
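A small sketch of the four operators *, +, ? and |. The anchors ^ and $, the helper sub matches, and the sample strings are assumptions added for illustration; the anchors force the whole string to match.

```perl
use strict;
use warnings;

my %pattern = (
    star => qr/^ab*$/,        # a followed by zero or more b
    plus => qr/^ab+$/,        # a followed by one or more b
    opt  => qr/^colou?r$/,    # the u is optional
    disj => qr/^(cat|dog)$/,  # either alternative
);

# Return 1 if string $s matches the named pattern, else 0.
sub matches {
    my ($name, $s) = @_;
    return $s =~ $pattern{$name} ? 1 : 0;
}

for my $t (["star", "a"], ["plus", "a"], ["opt", "color"], ["disj", "cow"]) {
    printf "%-4s %-5s %d\n", @$t, matches(@$t);
}
```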
Perl regex
Perl regular expression (re) matching:
$a =~ /foo/
/…/ contains a regular expression
will evaluate to true/false depending on what’s contained in $a
Perl regular expression (re) match and substitute:
$a =~ s/foo/bar/
s/…match…/…substitute…/
will modify $a by looking for a single occurrence of match and replacing that with substitute
s/…match…/…substitute…/g
g = flag: global match and substitute
Slide22
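A minimal sketch contrasting match, single substitute, and global substitute. The variable names $x and $y and the sample string are illustrative (the slides use $a).

```perl
use strict;
use warnings;

my $x = "foo foo foo";
my $y = $x;

my $found = ($x =~ /foo/) ? "yes" : "no";   # match only: $x is unchanged
$x =~ s/foo/bar/;                           # replaces the first occurrence only
$y =~ s/foo/bar/g;                          # g flag: replaces every occurrence

print "$found\n$x\n$y\n";
```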
Perl regex
Typically useful with the standard code template for reading in a file line-by-line:
open($txtfile, $ARGV[0]) or die "$ARGV[0] not found!\n";
while ($line = <$txtfile>) {
    if ($line =~ /..regex../) {
        # do stuff…
    }
}
Slide23
Chapter 2: JM
character class: Perl lingo
Slide24
Chapter 2: JM
Slide25
Chapter 2: JM
Backslash + lowercase letter for a class (e.g. \d for digits)
Uppercase variant for everything but that class (e.g. \D for non-digits)
Slide26
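A short sketch of the backslash shortcuts \d, \D, \w and \s. The sample string is made up, and the g flag is used here only to collect every match into a list.

```perl
use strict;
use warnings;

my $s = "Room 42, Floor 7";

my @digits    = $s =~ /\d+/g;   # runs of digits
my @nondigits = $s =~ /\D+/g;   # runs of anything but digits
my @words     = $s =~ /\w+/g;   # runs of word characters
my @spaces    = $s =~ /\s/g;    # single whitespace characters

print join("|", @digits), "\n";
print join("|", @nondigits), "\n";
print join("|", @words), "\n";
print scalar(@spaces), "\n";
```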
Unicode and \w
\w is [0-9A-Za-z_]
Definition is expanded for Unicode:
use utf8;
use open qw(:std :utf8);
my $str = "school école École šola trường स्कूल škole โรงเรียน";
@words = ($str =~ /(\w+)/g);    # list context
foreach $word (@words) { print "$word\n" }
Slide27
Chapter 2: JM
Slide28
Chapter 2: JM
Sheeptalk
Slide29
Chapter 2: JM
Slide30
Chapter 2: JM
Precedence of operators
Example: Column 1 Column 2 Column 3 …
/Column [0-9]+ */ (the character before * is a space)
/(Column [0-9]+ *)*/
/house(cat(s|)|)/
Perl:
in a regular expression the pattern matched within a pair of parentheses is stored in designated variables $1 (and $2 and so on)
Precedence Hierarchy:
Slide31
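A sketch of how parentheses change what * applies to, using the Column example. The variable names and the test string "housecats" are illustrative; $& (the text matched by the whole pattern) is used only to make the difference visible.

```perl
use strict;
use warnings;

my $header = "Column 1 Column 2 Column 3";

# Without parentheses, * applies only to the preceding space.
$header =~ /Column [0-9]+ */;
my $single = $&;

# With parentheses, * repeats the whole group.
$header =~ /(Column [0-9]+ *)*/;
my ($all, $last) = ($&, $1);    # $1 holds the last repetition of the group

# Nested groups are numbered by their opening parenthesis: ($1, $2).
my @groups = "housecats" =~ /house(cat(s|)|)/;

print "<$single>\n<$all>\n<$last>\n<@groups>\n";
```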
Chapter 2: JM
A shortcut: list context for matching
http://perldoc.perl.org/perlretut.html
in list context, a match returns a list (the captured groups)
in scalar context, a match returns 1 (true) or "" (empty, if false)
Slide32
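A minimal sketch of the same match evaluated in list and in scalar context. The time string and variable names are illustrative.

```perl
use strict;
use warnings;

my $time = "12:34:56";

# List context: the match returns the captured groups ($1, $2, $3).
my ($h, $m, $s) = $time =~ /(\d+):(\d+):(\d+)/;

# Scalar context: the match returns 1 on success, "" on failure.
my $ok  = $time  =~ /(\d+):(\d+):(\d+)/;
my $bad = "noon" =~ /(\d+):(\d+):(\d+)/;

print "$h $m $s\n";
print "ok=<$ok> bad=<$bad>\n";
```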
Chapter 2: JM
s/([0-9]+)/<\1>/
what does this do?
Backreferences give Perl regexps more expressive power than finite state automata (fsa)
Slide33
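A sketch of the digit-wrapping substitution, plus a backreference used inside a pattern. In the replacement part Perl prefers $1 to \1 (with warnings enabled, \1 there draws a warning), so $1 is used below. The sample strings are made up.

```perl
use strict;
use warnings;

my $s = "call 555 or 1234";
(my $first = $s) =~ s/([0-9]+)/<$1>/;    # wraps the first run of digits
(my $every = $s) =~ s/([0-9]+)/<$1>/g;   # wraps every run of digits
print "$first\n$every\n";

# Inside the pattern itself, \1 must repeat exactly what group 1 matched,
# which is one way to find a doubled word.
my @doubled = "this is is a test of the the code" =~ /\b(\w+) \1\b/g;
print "@doubled\n";
```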
Shortest vs. Greedy Matching
default behavior in Perl RE match:
take the longest possible matching string
aka greedy matching
This behavior can be changed, see next slide
Slide34
Shortest vs. Greedy Matching
from http://www.perl.com/doc/manual/html/pod/perlre.html
Example:
$_ = "The food is under the bar in the barn.";
if ( /foo(.*?)bar/ ) { print "matched <$1>\n"; }
Output:
matched <d is under the >
Notes:
? immediately following a repetition operator like * (or +) makes the operator work in non-greedy mode
Slide35
Shortest vs. Greedy Matching
from http://www.perl.com/doc/manual/html/pod/perlre.html
Example:
$_ = "The food is under the bar in the barn.";
if ( /foo(.*?)bar/ ) { print "matched <$1>\n"; }
Output:
greedy (.*): matched <d is under the bar in the >
shortest (.*?): matched <d is under the >
Slide36
Shortest vs. Greedy Matching
RE search is supposed to be fast
but searching is not necessarily proportional to the length of the input being searched
in fact, Perl RE matching can take exponential time (in length)
non-deterministic
may need to backtrack (revisit) if it matches incorrectly part of the way through
[Figure: time plotted against length, linear vs. exponential]
Slide37
Global Matching: scalar context
g flag in the condition of a while-loop
Slide38
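A minimal sketch of a global match in the condition of a while-loop. The text and the pattern are made up; pos() reports the offset just after the most recent match.

```perl
use strict;
use warnings;

my $text = "cat dog cat bird cat";
my @starts;

# In scalar context, each /g match resumes where the previous one ended,
# so the loop visits every occurrence once and stops when none is left.
while ($text =~ /cat/g) {
    push @starts, pos($text) - length("cat");   # start offset of this match
}

print "@starts\n";
```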
Global Matching: list context
g flag in list context
Slide39
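A sketch of the g flag in list context: all matches come back at once. The sample string is illustrative.

```perl
use strict;
use warnings;

my $text = "a1 b22 c333";

my @nums  = $text =~ /\d+/g;            # no groups: list of whole matches
my @pairs = $text =~ /([a-z])(\d+)/g;   # groups: all captures, flattened
my $count = () = $text =~ /\d+/g;       # count matches via an empty list

print "@nums\n@pairs\n$count\n";
```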
Split
@array = split /re/, string
splits string into a list of substrings split by re. Each substring is stored as an element of @array.
Examples (from perlrequick tutorial):
Slide40
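A minimal sketch of split with three different separators. The input strings below are illustrative and are not taken from the tutorial.

```perl
use strict;
use warnings;

my @words  = split /\s+/, "the quick  brown fox";   # one or more whitespace
my @fields = split /,\s*/, "1.5,2.5,   3.5";        # comma plus optional spaces
my @chars  = split //, "abc";                       # empty pattern: single characters

print scalar(@words), " words: @words\n";
print "@fields\n@chars\n";
```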
Split
Slide41
Matched Positions
Slide42
Matched Positions
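A sketch of one way to read off matched positions in Perl, assuming the topic here is the built-in arrays @- and @+ (start and end offsets of the last successful match). The sample string and variable names are illustrative.

```perl
use strict;
use warnings;

my $s = "The food is under the bar";
my ($m_start, $m_end, $g_start, $g_end);

if ($s =~ /foo(d)/) {
    ($m_start, $m_end) = ($-[0], $+[0]);   # offsets of the whole match
    ($g_start, $g_end) = ($-[1], $+[1]);   # offsets of group 1
    print "match $m_start to $m_end, group 1 $g_start to $g_end\n";
}
```

Offsets count from 0, and the end offset points just past the last matched character.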