Slide1
Strings and Files
Fall 20151 Week 4
CSCI-141
Scott C. Johnson
Slide2
Computers can process text as well as numbers
Example: a news agency might want to find all the articles on Hurricane Katrina as part of the tenth anniversary of this disaster. They would have to search for the words “hurricane” and “Katrina”.
String
Slide3
How do we represent text such as:
Words
Sentences
Characters
In programming we call these sequences Strings.
A sequence is a collection or composition of elements in a specific order
For strings, the elements are characters
This includes:
  Punctuation
  Spaces
  Numeric characters
The order is a spelling of a word or structure of a sentence. Not always true.
String
Slide4
A string can be:
  Empty
  Non-empty
    Must start with a character and be followed by a string
Definition 1: A string is one of the following:
  An empty string
  A non-empty string, which has the following parts:
    A head, which is a single character, followed by
    A tail, which is a string
Strings
Slide5
When processing strings we must be able to check for:
  The empty string
    def strIsEmpty(str)
  The head
    def strHead(str)
  The tail
    def strTail(str)
We must also be able to construct strings from other strings
    def strConcat(str1, str2)
We will implement these later…
String Operations
Slide6
A Python string is:
  A sequence of characters
  The sequence can be explicitly written between two quotation marks
  Python allows single or double quotes
Examples:
  'Hello World'
  "Hello World"
  '' (the empty string is a valid string)
  'a'
  "abc"
Python Strings
Slide7
Like numbers, strings can be assigned to variables
  str = 'Hello World'
You can access parts of strings via indexing
str[n] means give me the character at index n of the string str. Indexing starts at 0, so this is the (n+1)th character.
  str[0] == 'H'
  str[1] == 'e'
  str[2] == 'l'
  str[5] == ' '
Python Strings
Slide8
Another way to access parts of a string is via slicing
str[m:n] means give me the part of the string from index m up to but not including index n
Both m and n are optional
  If m is omitted, the slice starts from the beginning of the string
  If n is omitted, the slice ends at the end of the string
Examples:
  str = 'Hello World'
  str[1:4] == 'ell'
  str[:5] == 'Hello'
  str[1:] == 'ello World'
Python Strings
Slide9
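The indexing and slicing rules above can be checked directly in Python. The following is a minimal sketch; the variable name text is an assumption made here so that the built-in str type is not shadowed.

```python
# Illustrative check of indexing and slicing; `text` is a name chosen for this sketch.
text = 'Hello World'

first = text[0]      # index 0 is the first character
sixth = text[5]      # index 5 is the sixth character, here a space
middle = text[1:4]   # indices 1, 2 and 3 (index 4 is not included)
prefix = text[:5]    # m omitted: start at the beginning
suffix = text[1:]    # n omitted: run to the end of the string

print(first, repr(sixth), middle, prefix, suffix)
```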
You can concatenate strings, or put two strings together
  Use the plus sign (+)
Examples:
  'Hello' + 'World' == 'HelloWorld' (concatenation does not insert a space)
  'a' + 'ab' == 'aab'
  'ab' + 'c' == 'abc'
  'a' + 'b' + 'c' == 'abc'
Python Strings
Slide10
Python strings are immutable
  This means parts of a string cannot be changed using the assignment operator
  If we assign a new value to a string variable, it replaces the old value
    Basically a new string
Example:
  str = 'abc'
  str[0] == 'a'
  str[0] = 'z'
We get:
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: 'str' object does not support item assignment
Python Strings
Slide11
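Because strings are immutable, "changing" a character really means building a new string and assigning it to the variable. The following sketch shows one possible way to do that with slicing and concatenation; the variable names are chosen here for illustration only.

```python
# Sketch: a string cannot be changed in place, but a new string can be built.
word = 'abc'   # `word` is a name chosen for this sketch

try:
    word[0] = 'z'          # item assignment is not supported for strings
    failed = False
except TypeError:
    failed = True

# Build a new string from 'z' plus the rest of the old one, then rebind the name.
word = 'z' + word[1:]

print(failed, word)
```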
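def strIsEmpty(str):
    # True only for the empty string
    if str == '':
        return True
    else:
        return False

def strHead(str):
    # The first character (or '' if the string is empty)
    return str[:1]

def strTail(str):
    # Everything after the first character
    return str[1:]

def strConcat(str1, str2):
    return str1 + str2
Python String Operations
Slide12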
Computing the length of strings
  Python has a function to do this
  But we will write our own version so we can:
    Learn about strings
    And how to process them
Think about this: lengthRec('abc') == 3
Python String Operations
Slide13
Let's break down string length as a recursive function
  '': the empty string, length is 0
  'a': a string with one character, length = 1
    Head length = 1
    Tail is the empty string, length = 0
  'ab': a string with two characters, length = 2
    Head length = 1
    Tail is a string with one character, length = 1
  'abc': a string with three characters, length = 3
    Head length = 1
    Tail is a string with two characters, length = 2
Notice the pattern?
Python String Operations
Slide14
The pattern leads us to:
def lengthRec(str):
    if str == '':
        return 0
    else:
        return 1 + lengthRec(strTail(str))
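To try the recursive length function on its own, a self-contained sketch follows. It repeats strTail from the earlier slide and names the parameter s rather than str, which is an assumption made here so that the built-in str type is not shadowed.

```python
def strTail(s):
    # Everything after the first character; '' for an empty or one-character string.
    return s[1:]

def lengthRec(s):
    # Empty string: length 0. Otherwise: 1 for the head plus the length of the tail.
    if s == '':
        return 0
    else:
        return 1 + lengthRec(strTail(s))

print(lengthRec(''), lengthRec('a'), lengthRec('abc'))
```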
Python String Operations
Slide15
Computing the reversal of a string
  reverseRec('abc') == 'cba'
Let's solve this like any other recursive function
Python String Operations
Slide16
String reversal cases:
  '': the empty string reversed is the empty string
  'a': a single character reversed is the same string
  'ab': two character string
    'b' + 'a' == 'ba'
    strTail('ab') + strHead('ab')
  'abc': three character string
    'c' + 'b' + 'a' == 'cba'
    strTail(strTail('abc')) + strHead(strTail('abc')) + strHead('abc')
    strTail('bc') + strHead('bc') + strHead('abc')
    Reversal of string 'bc' + strHead('abc')
Python String Operations
Slide17
From this we get:
def reverseRec(str):
    if str == '':
        return ''
    else:
        return reverseRec(strTail(str)) + strHead(str)
Python String Operations
Slide18
Substitution Traces
Slide19
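A substitution trace replaces each function call with the expression it evaluates to, one step at a time. The following is a minimal sketch of such a trace for reverseRec('abc'), written as comments next to a runnable copy of the function. The parameter name s is an assumption used in place of str, and the trace is an illustration rather than the exact one presented in the lecture.

```python
def strHead(s):
    return s[:1]

def strTail(s):
    return s[1:]

def reverseRec(s):
    if s == '':
        return ''
    else:
        return reverseRec(strTail(s)) + strHead(s)

# Substitution trace for reverseRec('abc'):
#   reverseRec('abc')
# = reverseRec('bc') + 'a'
# = (reverseRec('c') + 'b') + 'a'
# = ((reverseRec('') + 'c') + 'b') + 'a'
# = (('' + 'c') + 'b') + 'a'
# = 'cba'
result = reverseRec('abc')
print(result)
```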
Substitution Traces
Slide20
Accumulative Length
Recall the recursive form we did earlier:
def lengthRec(str):
    if str == '':
        return 0
    else:
        return 1 + lengthRec(strTail(str))
How can we change this to use an accumulator variable?
Notice we are returning zero for the empty case and 1 + the recursive call for the other case
Accumulative Recursion
Slide21
Accumulative Length
We can add the accumulator variable, add 1 to it for each recursive call, and return it for the empty string:
def lengthAccum(str, ac):
    if str == '':
        return ac
    else:
        return lengthAccum(strTail(str), ac + 1)
Basically:
  If we have a head, add one to the accumulator and check the tail
  Else return the accumulator variable since we have nothing else to count
Accumulative Recursion
Slide22
Accumulative Reverse
Recall the recursive form we did earlier:
def reverseRec(str):
    if str == '':
        return ''
    else:
        return reverseRec(strTail(str)) + strHead(str)
How can we change this to use an accumulator variable?
Notice we are returning the empty string for the empty case and the recursive call + the strHead for the other case
Accumulative Recursion
Slide23
Accumulative Reverse
We can add the accumulator variable and add the strHead to the front of the accumulator for each recursive call:
def reverseAccum(str, ac):
    if str == '':
        return ac
    else:
        return reverseAccum(strTail(str), strHead(str) + ac)
Accumulative Recursion
Slide24
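The accumulative functions need a starting value for the accumulator when they are first called. A reasonable choice, sketched below, is 0 for the length and the empty string for the reversal. The parameter name s is an assumption used here in place of str.

```python
def strHead(s):
    return s[:1]

def strTail(s):
    return s[1:]

def lengthAccum(s, ac):
    if s == '':
        return ac
    else:
        return lengthAccum(strTail(s), ac + 1)

def reverseAccum(s, ac):
    if s == '':
        return ac
    else:
        return reverseAccum(strTail(s), strHead(s) + ac)

# Start counting from 0 and start building from the empty string.
n = lengthAccum('abc', 0)
r = reverseAccum('abc', '')

# reverseAccum('abc', '') -> reverseAccum('bc', 'a') -> reverseAccum('c', 'ba')
#   -> reverseAccum('', 'cba') -> 'cba'
print(n, r)
```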
We often can replace recursion with iteration
This requires the use of a new type of Python statement
  The for loop
String Iteration
Slide25
for loop example
for ch in 'abc':
    print(ch)
This will print:
a
b
c
String Iteration
Slide26
With the for loop we can convert our recursive string operations to iterative forms
Recall the accumulative form:
def lengthAccum(str, ac):
    if str == '':
        return ac
    else:
        return lengthAccum(strTail(str), ac + 1)
Basically, we ran the function over and over again, adding one to ac until we hit the empty string…
How could we make that into a for loop?
String Iteration
Slide27
With a for loop we can avoid recursion
def lengthIter(str):
    ac = 0
    for ch in str:
        ac = 1 + ac
    return ac
String Iteration
Slide28
This can work for reverse too!
def reverseAccum(str, ac):
    if str == '':
        return ac
    else:
        return reverseAccum(strTail(str), strHead(str) + ac)
Becomes:
def reverseIter(str):
    ac = ''
    for ch in str:
        ac = ch + ac    # put the current character in front of the accumulator
    return ac
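The two iterative functions can be run side by side on a few sample strings. The sketch below uses the parameter name s instead of str, and the sample strings are made up for illustration.

```python
def lengthIter(s):
    ac = 0
    for ch in s:
        ac = ac + 1      # one more character seen
    return ac

def reverseIter(s):
    ac = ''
    for ch in s:
        ac = ch + ac     # newest character goes in front
    return ac

samples = ['', 'a', 'abc', 'Hello World']
for s in samples:
    print(repr(s), lengthIter(s), repr(reverseIter(s)))
```

One practical reason to prefer the loop: Python limits how deeply functions may call themselves (about 1000 calls by default), so the recursive versions fail on very long strings while the iterative ones do not.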
String Iteration
Slide29
Sometimes we want to access parts of a string by index
This means iterating over the range of values 0, 1, 2, …, len(str) - 1
  len(str) is a built-in Python function for string length
To do this we have a special for loop in Python
  for i in range(0, len(str))
  This says: for every index i of str, starting at index 0 and ending at the last index of the string, do something
It does not have to be the whole string
  for i in range(2, 5)… this will do all i's 2, 3, 4
Index Values and Range
Slide30
example
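As one possible illustration of looping by index, the sketch below visits every index of a string and then only the indices 2, 3 and 4. The variable names text and part are assumptions chosen for this example.

```python
text = 'Hello World'   # `text` is a name chosen for this sketch

# Visit every index from 0 up to len(text) - 1.
for i in range(0, len(text)):
    print(i, text[i])

# Visit only indices 2, 3 and 4 and collect those characters.
part = ''
for i in range(2, 5):
    part = part + text[i]

print(part)
```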
Index Values and Range
Slide31
Example:
Index Values and Range
Slide32
Say we do not want to type in a long string every time…
For instance we want to find and remove all of the instances of a word in a report…
How can we do that without using an input statement and entering the entire text manually?
Files
Slide33
Python can read files!
Let's look at a basic function to hide a word in a string
def hide(textFileString, hiddenWord):
    # split() breaks the string into words; looping over the string itself would visit single characters
    for currentWord in textFileString.split():
        if currentWord == hiddenWord:
            print('---')
        else:
            print(currentWord)
Files
Slide34
How do we do this from a text file and not a string?
To make this problem simple we will assume only one word per line in the file
So the file reads as follows (spaces are shown on purpose as _):
word1
__word2
___word3
__word4
_word5
word6
word7
Files
Slide35
How can we read this file?
It is actually pretty easy in Python
for line in open('text1.txt'):
    print(line)
This gives us:
word1

__word2

___word3

__word4

_word5

word6

word7

Notice the spaces are still there and it appears to have more space between lines!
Files
Slide36
The extra space between lines is due to:
The print function adds a new line
  The original new line from the file is still in the string
If we were to make a single concatenated string we would see the original file contents
str = ''
for line in open('text1.txt'):
    str = str + line
print(str)
word1
__word2
___word3
__word4
_word5
word6
word7
Files
Slide37
We can make printing better
  print(str, end='')
  print will not add its own newline at the end
We still have an issue for our problem
  Newlines are still there from the file
for line in open('text1.txt'):
    print(line == 'word1')
We would see False for all lines
  Even though word1 looks like it exists
  Due to the new line from the file!
Files
Slide38
We can use a Python feature called strip
This removes whitespace from the beginning and end of a string
  Whitespace includes:
    Newlines
    Spaces
We call it a bit differently than other string functions
  str.strip()
  It returns the 'stripped' string
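A short sketch of what strip does and does not remove. The sample strings are made up for illustration; the first one mimics a line as it might be read from a file.

```python
line = '  word2\n'         # hypothetical line: leading spaces and a trailing newline

stripped = line.strip()    # removes whitespace at the beginning and end
inner = ' a b '.strip()    # whitespace between characters is kept

print(repr(line), repr(stripped), repr(inner))
print(line == 'word2', stripped == 'word2')
```

Note that strip returns a new string; the original line is left unchanged because strings are immutable.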
Files
Slide39
Using strip we get
for line in open('text1.txt'):
    print(line.strip() == 'word1')
Which results in:
True
False
False
False
False
False
False
False
Files
Slide40
Using all these ideas we can make the hide and a helper function
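One way the pieces could fit together is sketched below. This is an assumption about the shape of the solution, not the lecture's own code: the helper name hideWord, the file name sample_words.txt and the file contents are all made up for this example.

```python
def hideWord(currentWord, hiddenWord):
    # Hypothetical helper: '---' replaces the hidden word, other words pass through.
    if currentWord == hiddenWord:
        return '---'
    else:
        return currentWord

def hide(fileName, hiddenWord):
    # Assumes one word per line; strip() removes the newline and any spaces.
    with open(fileName) as textFile:
        for line in textFile:
            print(hideWord(line.strip(), hiddenWord))

# Usage: write a small sample file, then hide one word in it.
with open('sample_words.txt', 'w') as sampleFile:
    sampleFile.write('word1\n  word2\n word3\n')

hide('sample_words.txt', 'word2')
```

Splitting the comparison into a helper keeps the file-reading loop short and lets the comparison be tested without a file.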
Files
Slide41
We originally made this function to hide a word…
But it can be used to find a word too
  We look for, or search for, the word to replace it
  We look for all occurrences of the word…
  Sometimes we only want to find the first time a word occurs
Files
Slide42
Linear Search
  The process of visiting each element in a sequence in order, stopping when either:
    We find what we are looking for
    We have visited every element and not found a match
Often we wish to characterize how long an algorithm takes
  Measuring in seconds often depends on the hardware, operating system, etc.
  We want to avoid such details
  How do we do this?
Files
Slide43
We avoid this by focusing on the size of the input, N…
And characterize the time spent as a function of N
  This function is called the time complexity
One way to measure performance is to count the number of operations or statements the code executes
Files
Slide44
For the hide function we:
  Strip the current word
  Compare it to the hidden word
  And then print the result
Since this is a loop, this occurs for each word
Even though the operations can take different times individually, as the number of words grows these time differences are very small
  They are considered a constant amount
Files
Slide45
For a linear search we can see that different sizes of N can lead to different behaviors and times…
Say the element we are looking for is the first element… it's very fast…
What if it is the last element?
  We must then look at every element
  If N is 10 then the slow case is no problem…
  What if it was 10 billion?
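The difference between the fast case and the slow case can be made visible by counting comparisons. The following is a minimal sketch of a linear search that stops at the first match. The function name linearSearch, the use of a Python list for the words, and the returned pair (index, comparisons) are assumptions made for this example.

```python
def linearSearch(words, target):
    # Returns (index, comparisons); index is -1 when target is not found.
    comparisons = 0
    index = 0
    for word in words:
        comparisons = comparisons + 1
        if word == target:
            return index, comparisons   # stop at the first match
        index = index + 1
    return -1, comparisons

words = ['word1', 'word2', 'word3', 'word4', 'word5']

best = linearSearch(words, 'word1')      # target is the first element
worst = linearSearch(words, 'word5')     # target is the last element
missing = linearSearch(words, 'word9')   # target is not present

print(best, worst, missing)
```

With N = 5 the worst case needs 5 comparisons; the count grows in step with N.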
Files
Slide46
Typically we are interested in how bad it can get…
  Or the worst-case analysis
Consider the worst-case analysis of the linear search…
  For each element we spend some constant time
  Plus some fixed time to start up and end the search
If processing an element takes constant time k
  Then searching all N elements should take k * N
Similarly the startup and end time is some constant c
So the time to run the linear search is (k * N) + c
Files
Slide47
So the time to run the linear search is (k * N) + c
What are k and c?
  Most of the time we simply do not care…
  It is often good enough to know that the time complexity is a linear function with a non-zero slope!
The mathematical way to ignore these constants is to say the time complexity of linear search is O(N)
  It is pronounced “Order N”
  This is known as “Big-O” notation
Files
Slide48
“Big-O” notation
  Makes it easy to compare algorithms…
  Constant time, like comparing two characters, is O(1)
  There are some algorithms that are O(N^2) and O(N^3)
  We prefer O(1) to O(N), O(N) to O(N^2), …
  There can also be time complexities in between these…
Files
Slide49
O(N^2) example:
def counter(n):
    for i in range(0, n):
        for j in range(0, n):
            print(i * j)
O(N^3) example:
def counter(n):
    for i in range(0, n):
        for j in range(0, n):
            for k in range(0, n):
                print(i * j * k)
Files
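To see the growth rates without printing thousands of lines, the innermost statement can be replaced by a counter. This is a sketch; the function names countSquare and countCube are made up for this example.

```python
def countSquare(n):
    # Counts how often the innermost statement of the doubly nested loop runs.
    count = 0
    for i in range(0, n):
        for j in range(0, n):
            count = count + 1
    return count

def countCube(n):
    # Same idea for the triply nested loop.
    count = 0
    for i in range(0, n):
        for j in range(0, n):
            for k in range(0, n):
                count = count + 1
    return count

for n in [1, 10, 20]:
    print(n, countSquare(n), countCube(n))
```

Doubling n from 10 to 20 multiplies the first count by 4 and the second by 8, which is the behavior O(N^2) and O(N^3) describe.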