
Strings and Files - PowerPoint Presentation

Uploaded On 2015-11-21


Presentation Transcript

Slide1

Strings and Files

Fall 20151 Week 4

CSCI-141

Scott C. Johnson

Slide2

Computers can process text as well as numbers

Example: a news agency might want to find all the articles on Hurricane Katrina as part of the tenth anniversary of this disaster. They would have to search for the words “hurricane” and “Katrina”.

String

Slide3

How do we represent text such as:
  Words
  Sentences
  Characters
In programming we call these sequences strings.
  A sequence is a collection or composition of elements in a specific order
  For strings, the elements are characters
    This includes:
      Punctuation
      Spaces
      Numeric characters
  The order is the spelling of a word or the structure of a sentence, though this is not always true.

String

Slide4

A string can be:
  Empty
  Non-empty
    Must start with a character and be followed by a string
Definition 1: A string is one of the following:
  An empty string
  A non-empty string, which has the following parts:
    A head, which is a single character, followed by
    A tail, which is a string

Strings

Slide5

When processing strings we must be able to check for:
  The empty string
    def strIsEmpty(str)
  The head
    def strHead(str)
  The tail
    def strTail(str)
To construct strings from other strings:
    def strConcat(str1, str2)
We will implement these later.

String Operations

Slide6

A Python string is:
  A sequence of characters
  The sequence can be explicitly written between two quotation marks
    Python allows single or double quotes
Examples:
  'Hello World'
  "Hello World"
  '' (the empty string is a valid string)
  'a'
  "abc"

Python Strings

Slide7

Like numbers, strings can be assigned to variables
  str = 'Hello World'
You can access parts of strings via indexing
  str[n] means give me the character at index n of the string str
  Indexing starts at 0, so str[n] is the (n+1)th character
  str[0] == 'H'
  str[1] == 'e'
  str[2] == 'l'
  str[5] == ' '

Python Strings

Slide8

Another way to access parts of a string is via slicing
  str[m:n] means give me the part of the string from the character at index m up to but not including the character at index n
  Both m and n are optional
    If m is omitted, the slice starts at the beginning of the string
    If n is omitted, the slice ends at the end of the string
Examples:
  str = 'Hello World'
  str[1:4] == 'ell'
  str[:5] == 'Hello'
  str[1:] == 'ello World'

Python Strings

Slide9
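The indexing and slicing rules from the preceding slides can be checked directly in Python. This is a minimal sketch; the variable name `text` is an assumption used here in place of the slides' `str`, so that the built-in type `str` is not shadowed.

```python
# `text` is a hypothetical name; the slides use `str` for the same value.
text = 'Hello World'

# Indexing: positions start at 0.
first = text[0]      # the 1st character
space = text[5]      # the 6th character, a space

# Slicing: text[m:n] runs from index m up to, but not including, index n.
middle = text[1:4]   # indices 1, 2, 3
prefix = text[:5]    # m omitted: start at the beginning
suffix = text[1:]    # n omitted: run to the end

print(repr(first), repr(space))
print(repr(middle), repr(prefix), repr(suffix))
```

Printing with `repr` shows the quotes, which makes the single-space result of `text[5]` visible.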

You can concatenate strings, or put two strings together
  Use the plus sign (+)
Examples:
  'Hello ' + 'World' == 'Hello World'
  'a' + 'ab' == 'aab'
  'ab' + 'c' == 'abc'
  'a' + 'b' + 'c' == 'abc'

Python Strings

Slide10

Python strings are immutable
  This means parts of strings cannot be changed using the assignment operator
  If we assign a new value to a string variable, it replaces the old value
    Basically a new string
Example:
  str = 'abc'
  str[0] == 'a'
  str[0] = 'z'
We get:
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: 'str' object does not support item assignment

Python Strings

Slide11

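# True if the string has no characters
def strIsEmpty(str):
    if str == '':
        return True
    else:
        return False

# The first character (or '' for the empty string)
def strHead(str):
    return str[:1]

# Everything after the first character
def strTail(str):
    return str[1:]

# The two strings joined together
def strConcat(str1, str2):
    return str1 + str2

Python String Operations

Slide12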

Computing the length of strings
  Python has a function to do this
  But we will write our own version so we can:
    Learn about strings
    And how to process them
Think about this: lengthRec('abc') == 3

Python String Operations

Slide13

Let's break down string length as a recursive function
  '': the empty string, length is 0
  'a': string with one character, length = 1
    Head length = 1
    Tail is the empty string, length = 0
  'ab': string with two characters, length = 2
    Head length = 1
    Tail is a string with one character, length = 1
  'abc': string with three characters, length = 3
    Head length = 1
    Tail is a string with two characters, length = 2
Notice the pattern?

Python String Operations

Slide14

The pattern leads us to:
  def lengthRec(str):
      if str == '':
          return 0
      else:
          return 1 + lengthRec(strTail(str))
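To see the recursion run end to end, here is a self-contained sketch of the same function together with the `strTail` helper it depends on. The parameter name `s` is an assumption, chosen to avoid shadowing the built-in `str`. The comment block walks through the calls for 'abc'.

```python
def strTail(s):
    """Everything after the first character."""
    return s[1:]

def lengthRec(s):
    """Recursive length: 0 for the empty string, else 1 + length of the tail."""
    if s == '':
        return 0
    else:
        return 1 + lengthRec(strTail(s))

# How lengthRec('abc') unfolds:
#   lengthRec('abc')
#   = 1 + lengthRec('bc')
#   = 1 + (1 + lengthRec('c'))
#   = 1 + (1 + (1 + lengthRec('')))
#   = 1 + (1 + (1 + 0))
#   = 3
print(lengthRec('abc'))
```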

Python String Operations

Slide15

Computing the reversal of a string
  reverseRec('abc') == 'cba'
Let's solve this like any other recursive function

Python String Operations

Slide16

String reversal cases:
  '': the empty string reversed is the empty string
  'a': a single character reversed is the same string
  'ab': two-character string
    'b' + 'a' == 'ba'
    strTail('ab') + strHead('ab')
  'abc': three-character string
    'c' + 'b' + 'a' == 'cba'
    strTail(strTail('abc')) + strHead(strTail('abc')) + strHead('abc')
    strTail('bc') + strHead('bc') + strHead('abc')
    Reversal of string 'bc' + strHead('abc')

Python String Operations

Slide17

From this we get:
  def reverseRec(str):
      if str == '':
          return ''
      else:
          return reverseRec(strTail(str)) + strHead(str)

Python String Operations

Slide18
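A substitution trace expands each recursive call in place until only plain values remain. The following is a minimal sketch of such a trace for the reverseRec function from the preceding slide, written as comments next to a runnable copy of the function. The parameter name `s` is an assumption, and the layout of the trace is one possible way to write it down.

```python
def strHead(s):
    """The first character (or '' for the empty string)."""
    return s[:1]

def strTail(s):
    """Everything after the first character."""
    return s[1:]

def reverseRec(s):
    """Reverse of the tail, followed by the head."""
    if s == '':
        return ''
    else:
        return reverseRec(strTail(s)) + strHead(s)

# Substitution trace for reverseRec('abc'):
#   reverseRec('abc')
#   = reverseRec('bc') + 'a'
#   = (reverseRec('c') + 'b') + 'a'
#   = ((reverseRec('') + 'c') + 'b') + 'a'
#   = (('' + 'c') + 'b') + 'a'
#   = 'cba'
print(reverseRec('abc'))
```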

Substitution Traces

Slide19

Substitution Traces

Slide20

Accumulative Length
Recall the recursive form we did earlier:
  def lengthRec(str):
      if str == '':
          return 0
      else:
          return 1 + lengthRec(strTail(str))
How can we change this to use an accumulator variable?
Notice we are returning zero for the empty case and 1 + the recursive call for the other case

Accumulative Recursion

Slide21

Accumulative Length
We can add the accumulator variable, add 1 to it for each recursive call, and return it for the empty string:
  def lengthAccum(str, ac):
      if str == '':
          return ac
      else:
          return lengthAccum(strTail(str), ac + 1)
Basically:
  If we have a head, add one to the accumulator and check the tail
  Else return the accumulator variable since we have nothing else to count

Accumulative Recursion

Slide22

Accumulative Reverse
Recall the recursive form we did earlier:
  def reverseRec(str):
      if str == '':
          return ''
      else:
          return reverseRec(strTail(str)) + strHead(str)
How can we change this to use an accumulator variable?
Notice we are returning the empty string for the empty case and the recursive call + the strHead for the other case

Accumulative Recursion

Slide23

Accumulative Reverse
We can add the accumulator variable and add the strHead to the accumulator for each recursive call:
  def reverseAccum(str, ac):
      if str == '':
          return ac
      else:
          return reverseAccum(strTail(str), strHead(str) + ac)

Accumulative Recursion

Slide24
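Both accumulative functions from the preceding slides need a starting value for the accumulator: 0 when counting and the empty string when reversing. The sketch below is a runnable copy of the two functions with those starting values passed in explicitly. The parameter name `s` is an assumption; the function names come from the slides.

```python
def strHead(s):
    """The first character (or '' for the empty string)."""
    return s[:1]

def strTail(s):
    """Everything after the first character."""
    return s[1:]

def lengthAccum(s, ac):
    """Count characters, carrying the running total in ac."""
    if s == '':
        return ac
    else:
        return lengthAccum(strTail(s), ac + 1)

def reverseAccum(s, ac):
    """Reverse s, carrying the part reversed so far in ac."""
    if s == '':
        return ac
    else:
        return reverseAccum(strTail(s), strHead(s) + ac)

# Trace for the reversal:
#   reverseAccum('abc', '')
#   = reverseAccum('bc', 'a')
#   = reverseAccum('c', 'ba')
#   = reverseAccum('', 'cba')
#   = 'cba'
print(lengthAccum('abc', 0))
print(reverseAccum('abc', ''))
```

Unlike the plain recursive forms, nothing is left to do after each recursive call returns; the answer is already complete in the accumulator.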

We often can replace recursion with iteration
  This requires the use of a new type of Python statement
    The for loop

String Iteration

Slide25

for loop example
  for ch in 'abc':
      print(ch)
This will print:
  a
  b
  c

String Iteration

Slide26

With the for loop we can convert our recursive string operations to iterative forms
Recall the accumulative form:
  def lengthAccum(str, ac):
      if str == '':
          return ac
      else:
          return lengthAccum(strTail(str), ac + 1)
Basically we ran the function over and over again, adding one to ac until we hit the empty string…
How could we make that into a for loop?

String Iteration

Slide27

With a for loop we can avoid recursion
  def lengthIter(str):
      ac = 0
      for ch in str:
          ac = 1 + ac
      return ac

String Iteration

Slide28

This can work for reverse too!
  def reverseAccum(str, ac):
      if str == '':
          return ac
      else:
          return reverseAccum(strTail(str), strHead(str) + ac)
Becomes:
  def reverseIter(str):
      ac = ''
      for ch in str:
          ac = ch + ac
      return ac
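As a quick check, the iterative versions can be compared against what Python already provides. This is a sketch: the parameter name `s` and the variable `sample` are assumptions, and the slice `sample[::-1]` (a reversed copy) is standard Python but is not something the slides introduce.

```python
def lengthIter(s):
    """Count characters with a for loop."""
    ac = 0
    for ch in s:
        ac = 1 + ac
    return ac

def reverseIter(s):
    """Build the reversal by putting each character in front of the accumulator."""
    ac = ''
    for ch in s:
        ac = ch + ac   # prepend, so later characters end up first
    return ac

sample = 'Hello World'
print(lengthIter(sample) == len(sample))      # compare with the built-in len
print(reverseIter(sample) == sample[::-1])    # compare with a reversing slice
```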

String Iteration

Slide29

Sometimes we want to access parts of a string by index
  This means iterating over the range of values 0, 1, 2, …, len(str) - 1
  len(str) is a built-in Python function for string length
To do this we have a special for loop in Python
  for i in range(0, len(str))
  This says: for each index i of str, starting at index 0 and ending at the last index of the string, do x
It does not have to be the whole string
  for i in range(2, 5)… this will do all i's 2, 3, 4

Index Values and Range

Slide30

Example:

Index Values and Range

Slide31

Example:
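One way index-based iteration with range might look is sketched below. The function name `charsWithIndex` and the variable `text` are hypothetical and do not come from the slides; the function collects each index in the requested range together with the character stored there.

```python
def charsWithIndex(s, start, stop):
    """Collect (index, character) pairs for the indices start .. stop - 1."""
    pairs = []
    for i in range(start, stop):
        pairs.append((i, s[i]))
    return pairs

text = 'Hello World'

# The whole string: indices 0 .. len(text) - 1
print(charsWithIndex(text, 0, len(text)))

# Only part of the string: indices 2, 3, 4
print(charsWithIndex(text, 2, 5))
```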

Index Values and Range

Slide32

Say we do not want to type in a long string every time…
  For instance, we want to find and remove all of the instances of a word in a report…
How can we do that without using an input statement and entering the entire text manually?

Files

Slide33

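Python can read files!
Let's look at a basic function to hide a word in a string
  def hide(textFileString, hiddenWord):
      # split() breaks the string into words; looping over the
      # string itself would visit one character at a time
      for currentWord in textFileString.split():
          if currentWord == hiddenWord:
              print('---')
          else:
              print(currentWord)

Files

Slide34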

How do we do this from a text file and not a string?
To make this problem simple we will assume only one word per line in the file
The file reads as follows (the spaces are shown on purpose as _):
  word1
  __word2
  ___word3
  __word4
  _word5
  word6
  word7

Files

Slide35

How can we read this file?
It is actually pretty easy in Python
  for line in open('text1.txt'):
      print(line)
This gives us:
  word1

  __word2

  ___word3

  __word4

  _word5

  word6

  word7

Notice the spaces are still there and it appears to have more space between lines!

Files

Slide36

The extra space between lines is due to:
  The print function adds a new line
  The original new line from the file is still in the string
If we were to make a single concatenated string we would see the original file contents
  str = ''
  for line in open('text1.txt'):
      str = str + line
  print(str)

  word1
  __word2
  ___word3
  __word4
  _word5
  word6
  word7

Files

Slide37

We can make printing better
  print(str, end='')
  print will not generate newlines
We still have an issue for our problem
  Newlines are still there from the file
  for line in open('text1.txt'):
      print(line == 'word1')
  We would see False for all lines
    Even though word1 looks like it exists
    Due to the newline from the file!
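The hidden newline becomes visible when each line is printed with `repr`. The sketch below is an illustration under assumptions: it writes its own small stand-in for text1.txt into a temporary directory (the three-line file contents are made up for the example), then reads it back the same way the slides do.

```python
import os
import tempfile

# Hypothetical stand-in for the slides' text1.txt, created in a temp directory.
path = os.path.join(tempfile.mkdtemp(), 'text1.txt')
with open(path, 'w') as f:
    f.write('word1\n  word2\nword3\n')

# Read the lines exactly as they are stored, newline included.
with open(path) as f:
    lines = [line for line in f]

for line in lines:
    # repr() shows the trailing '\n' that makes the comparison fail.
    print(repr(line), line == 'word1')
```

Every comparison prints False, including the first line, because the string read from the file is 'word1\n' rather than 'word1'.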

Files

Slide38

We can use a Python feature called strip
  This removes whitespace from the beginning and end of a string
  Whitespace:
    Newlines
    Spaces
  We call it a bit differently than other string functions
    str.strip()
  It returns the 'stripped' string
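A short sketch of what strip does and does not remove. The sample strings and variable names are assumptions made for the example. The last line uses `rstrip('\n')`, a related standard string method that the slides do not cover, for the case where only the trailing newline should go.

```python
raw = '  word2\n'                    # a line as it might come from a file
stripped = raw.strip()               # leading and trailing whitespace removed
inner = ' two words \n'.strip()      # the space between the words is kept
only_newline = raw.rstrip('\n')      # removes just the trailing newline

print(repr(raw))            # the original string is unchanged (strings are immutable)
print(repr(stripped))
print(repr(inner))
print(repr(only_newline))
```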

Files

Slide39

Using strip we get
  for line in open('text1.txt'):
      print(line.strip() == 'word1')
Which results in:
  True
  False
  False
  False
  False
  False
  False
  False

Files

Slide40

Using all these ideas we can write the hide function and a helper function
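The code for this slide is not part of the transcript, so the following is only a sketch of how the pieces could fit together, not the course's own solution. The helper name `hideWord`, the parameter names, and the temporary test file are all assumptions. It keeps the slides' simplification of one word per line.

```python
import os
import tempfile

def hideWord(currentWord, hiddenWord):
    """Hypothetical helper: '---' for the hidden word, otherwise the word itself."""
    if currentWord == hiddenWord:
        return '---'
    else:
        return currentWord

def hide(textFileName, hiddenWord):
    """Print every word in the file, one per line, hiding hiddenWord."""
    with open(textFileName) as textFile:
        for line in textFile:
            # strip() removes the newline and any surrounding spaces
            print(hideWord(line.strip(), hiddenWord))

# Usage with a made-up stand-in for text1.txt
path = os.path.join(tempfile.mkdtemp(), 'text1.txt')
with open(path, 'w') as f:
    f.write('word1\n  word2\nword1\n')

hide(path, 'word1')
```

Splitting the comparison into a helper keeps the file handling and the word test separate, so the helper can be checked on plain strings without touching a file.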

Files

Slide41

We originally made this function to hide a word…
  But it can be used to find a word too
    We look for, or search for, the word to replace it
    We look for all occurrences of the word…
  Sometimes we only want to find the first time a word occurs

Files

Slide42

Linear Search
  The process of visiting each element in a sequence in order, stopping when either:
    We find what we are looking for
    We have visited every element and not found a match
Often we wish to characterize how long an algorithm takes
  Measuring in seconds often depends on the hardware, operating system, etc.
  We want to avoid such details
  How do we do this?

Files

Slide43

We avoid this by focusing on the size of the input, N…
  And characterize the time spent as a function of N
  This function is called the time complexity
One way to measure performance is to count the number of operations or statements the code executes

Files

Slide44

For the hide function we:
  Strip the current word
  Compare it to the hidden word
  And then print the result
Since this is a loop, this occurs for each word
Even though the operations can take different times individually, as the number of words grows these time differences become very small
  They are considered a constant amount

Files

Slide45

For a linear search we can see that different sizes of N can lead to different behaviors and times…
  Say the element we are looking for is the first element… it's very fast
  What if it is the last element?
    We must then look at every element
  If N is 10 then the slow case is no problem
    What if it was 10 billion?
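The fast and slow cases can be made concrete by counting comparisons. This is a minimal sketch of a linear search over a list of words; the function name `linearSearch`, its return convention (index and comparison count, with -1 for "not found"), and the sample data are assumptions made for the illustration.

```python
def linearSearch(words, target):
    """Return (index, comparisons); index is -1 if target is not found."""
    comparisons = 0
    for i in range(0, len(words)):
        comparisons = comparisons + 1
        if words[i] == target:
            return i, comparisons     # stop at the first match
    return -1, comparisons            # every element was visited

words = ['word' + str(n) for n in range(1, 11)]   # word1 .. word10, so N = 10

print(linearSearch(words, 'word1'))     # first element: one comparison
print(linearSearch(words, 'word10'))    # last element: N comparisons
print(linearSearch(words, 'missing'))   # no match: also N comparisons
```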

Files

Slide46

Typically we are interested in how bad it can get…
  Or the worst-case analysis
Consider the worst-case analysis of the linear search…
  For each element we spend some constant time
  Plus some fixed time to start up and end the search
  If processing an element takes constant time k
    Then searching all N elements should take k * N
  Similarly the startup and end time is some constant c
  So the time to run the linear search is (k * N) + c

Files

Slide47

So the time to run the linear search is (k * N) + c
  What are k and c?
    Most of the time we simply do not care…
    It is often good enough to know that the time complexity is a linear function with a non-zero slope!
  The mathematical way to ignore this is to say the time complexity of linear search is O(N)
    It is pronounced “Order N”
    This is known as “Big-O” notation
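One way to see why the constants can be dropped, written as a short derivation. It assumes k and c are positive constants and writes T(N) for the worst-case running time; the symbol T is introduced here and is not from the slides.

```latex
T(N) = kN + c \;\le\; kN + cN = (k + c)\,N \quad \text{for all } N \ge 1,
\qquad \text{so} \qquad T(N) = O(N).
```

The running time is bounded by a constant multiple of N, which is exactly what O(N) expresses.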

Files

Slide48

“Big-O” notation
  Makes it easy to compare algorithms…
    Constant time, like comparing two characters, is O(1)
    There are some algorithms that are O(N^2) and O(N^3)
    We prefer O(1) to O(N), and O(N) to O(N^2)
  There are possibly time complexities in between these…

Files

Slide49

O(N^2) example:
  def counter(n):
      for i in range(0, n):
          for j in range(0, n):
              print(i * j)
O(N^3) example:
  def counter(n):
      for i in range(0, n):
          for j in range(0, n):
              for k in range(0, n):
                  print(i * j * k)

Files
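To see the growth rates directly, the print calls in the two counter examples can be swapped for a step count. This is a sketch with hypothetical function names `countQuadratic` and `countCubic`; it counts how many times the innermost statement would run.

```python
def countQuadratic(n):
    """Number of inner-loop steps for two nested loops over range(0, n)."""
    steps = 0
    for i in range(0, n):
        for j in range(0, n):
            steps = steps + 1
    return steps

def countCubic(n):
    """Number of inner-loop steps for three nested loops over range(0, n)."""
    steps = 0
    for i in range(0, n):
        for j in range(0, n):
            for k in range(0, n):
                steps = steps + 1
    return steps

for n in (1, 10, 100):
    print(n, countQuadratic(n), countCubic(n))
```

Multiplying n by 10 multiplies the first count by 100 and the second by 1000, which is the N^2 and N^3 behavior the notation describes.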