Slide1

Introduction to Python: Day four

Stephanie Spielman

Big Data in Biology Summer School, 2018

Center for Computational Biology and Bioinformatics

University of Texas at Austin

Slide2

Working with strings, round 2

File/text manipulation often uses some more advanced string methods:

stringvariable.split()
stringvariable.join()
stringvariable.strip(), .rstrip(), .lstrip()
stringvariable.startswith()
stringvariable.endswith()

Slide3

The .split() method

## Usage:
## <string>.split(<string/character to split on>)

mystring = "Hello this is a string"

## Split string into a list on a space
## The split argument is *removed* from the output
print(mystring.split(" "))
["Hello", "this", "is", "a", "string"]

## Split string into a list on the lowercase letter 's'
print(mystring.split("s"))
["Hello thi", " i", " a ", "tring"]
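As a small illustration of .split() on file-like text, the sketch below pulls fields out of a tab-delimited line and shows how .split(" ") differs from .split() with no argument when the text has repeated spaces. The line contents and variable names are invented for this example.

```python
# Hypothetical tab-delimited record (invented for illustration)
line = "Mus musculus\t85.34\tcm"

# Split on the tab character to get a list of fields
fields = line.split("\t")

# Each field is still a string, so convert before doing arithmetic
measurement = float(fields[1])

# With repeated spaces, splitting on " " leaves empty strings behind,
# while .split() with no argument splits on any run of whitespace
messy = "Hello   this is"
on_space = messy.split(" ")
on_any_whitespace = messy.split()

print(fields)
print(measurement)
print(on_space)
print(on_any_whitespace)
```

Calling .split() with no argument is often the more forgiving choice for text whose spacing is not perfectly regular.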

Slide4

The .join() method is the opposite of .split()

## Usage:
## <string to join with>.join(<list to join>)

mystring = "Hello this is a string"

## Split string into a list on a space
## The split argument is *removed* from the output
print(mystring.split(" "))
["Hello", "this", "is", "a", "string"]

## Join the list back into one string, with a space between items
x = ["Hello", "this", "is", "a", "string"]
print(" ".join(x))
"Hello this is a string"

# Useful for creating comma-separated values, IMO
x = ["col1", "col2", "col3"]
print(",".join(x))
"col1,col2,col3"

Slide5

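The .strip() family removes leading and trailing whitespace (or other characters)

mystring = " Hello this is a string"

## .strip() removes whitespace from both ends
print(mystring.strip())
"Hello this is a string"

## .lstrip() removes whitespace from the left end only
print(mystring.lstrip())
"Hello this is a string"

## .rstrip() removes whitespace from the right end only
print(mystring.rstrip())
" Hello this is a string"

## Pass an argument to strip characters other than whitespace
newstring = "abcdefa"
print(newstring.strip("a"))
"bcdef"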

Slide6

A note on whitespace

Symbol    Meaning
\s        Any single whitespace character: space, tab, newline, etc. (regular expressions only)
\t        Tab
\n        Newline
\r        Carriage return (mostly on Windows)
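Whitespace characters such as \n and \r usually turn up at the ends of lines read from a file. The sketch below, using made-up lines, strips them off and then combines .split() and .join() to turn tab-delimited text into comma-separated text.

```python
# Hypothetical lines as they might come from a file (invented for illustration)
raw_lines = ["gene1\t12\n", "  gene2\t7  \r\n"]

csv_lines = []
for line in raw_lines:
    # .strip() removes spaces, tabs, \n, and \r from both ends
    clean = line.strip()
    # Split on the tab that separates the columns
    fields = clean.split("\t")
    # Rejoin the columns with commas
    csv_lines.append(",".join(fields))

print(csv_lines)
```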

Slide7

.startswith() and .endswith()

mystring = "Hello this is a string"

print(mystring.startswith("H"))
True
print(mystring.endswith("g"))
True
print(mystring.startswith("Hello"))
True
print(mystring.startswith("badgers"))
False

# Useful for file parsing!!
for line in file_lines:
    if line.startswith("some important thing"):
        ## do something with these lines only
        pass
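To make the file-parsing idea concrete, here is a minimal sketch that keeps only the header lines of a FASTA-style listing, assuming headers begin with ">". The sequence names and the file_lines contents are invented for this example.

```python
# Hypothetical file contents, already read into a list of lines
file_lines = [
    ">seq1 Mus musculus",
    "ATGCGT",
    ">seq2 Homo sapiens",
    "TTGACA",
]

headers = []
for line in file_lines:
    # Keep only the lines that begin with ">"
    if line.startswith(">"):
        # Drop the leading ">" before storing the header
        headers.append(line.lstrip(">"))

print(headers)
```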

Slide8

breathing break

Slide9

Regular expressions

Pattern-based search and replace

Extremely powerful beyond all reason

Excellent for text (file) manipulation!

Slide10

Regular expressions

String: Mus musculus
Regex: Mus
Match: "Mus" (the capitalized first word only; regexes are case-sensitive)

Slide11

Regular expressions

String: Mus musculus
Regex: Mus musculus
Match: "Mus musculus" (the whole string)

Slide12

Regular expressions

String: Mus musculus
Regex: [mM]us
Match: "Mus" and the "mus" at the start of "musculus"

Slide13

Regular expressions

String: Mus musculus
Regex: [A-Za-z]us
Match: "Mus", "mus", and "lus" (any letter followed by "us")

Slide14

Regular expressions

String: Mus musculus
Regex: \wus
Match: "Mus", "mus", and "lus" (\w is any letter, digit, or underscore)

Slide15

Regular expressions

String: Mus musculus
Regex: \w+
Match: "Mus" and "musculus" (+ means one or more)

Slide16

Regular expressions

String: Mus musculus
Regex: [A-Z]\w+ \w+
Match: "Mus musculus" (the whole string)

Slide17

Regular expressions

String: Mus musculus
Regex: ([A-Z])\w+ (\w+)
Replace: \1. \2
New string: M. musculus
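The patterns on these slides can be checked in Python with the re module, which the slides introduce further on. The following is only a sketch; the variable names are chosen for this example.

```python
import re

name = "Mus musculus"

# [mM]us matches "Mus" and the "mus" inside "musculus"
either_case = re.findall(r"[mM]us", name)

# \wus matches any word character followed by "us"
any_letter = re.findall(r"\wus", name)

# \w+ matches each run of word characters, i.e. each word
words = re.findall(r"\w+", name)

# Capture the first letter and the second word, then rebuild the name
short_name = re.sub(r"([A-Z])\w+ (\w+)", r"\1. \2", name)

print(either_case)
print(any_letter)
print(words)
print(short_name)
```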

Slide18

Regular expressions

String: 85.34 cm
Regex: \d+
Match: "85" and "34" (each run of digits)

Slide19

Regular expressions

String: 85.34 cm
Regex: \d+\.\d+
Match: "85.34" (\. is a literal period)

Slide20

Regular expressions

String: 85.34 cm
Regex: \d+\.\d+ \w+
Match: "85.34 cm" (the whole string)

Slide21

Regular expressions

String: 85 cm
Regex: \d+\.\d+ \w+
Match: none (the pattern requires a period followed by at least one digit)

Slide22

Regular expressions

String: 85 cm
Regex: \d+\.*\d* \w+
Match: "85 cm" (* means zero or more)

Slide23

Regular expressions

String: 85 cm
Regex: ^\d
Match: "8" (^ anchors the match to the start of the string)

Slide24

Regular expressions

String: 85 cm
Regex: \w$
Match: "m" ($ anchors the match to the end of the string)

Slide25

Regular expressions

String: 85.341234 cm
Regex: (\d+\.\d{3})\d+ cm
Replace: \1
New string: 85.341

Slide26

Regular expressions

String: 85.34 cm
Regex: (\d+\.\d{3})\d+ cm
Replace: \1
New string: ?????
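Several of the number patterns above can be tried out with Python's re module, which the slides introduce further on. This is a sketch with variable names chosen for the example; it deliberately leaves the question on the last slide open.

```python
import re

# Keep three decimal places when there are at least four, and drop the unit
pattern = r"(\d+\.\d{3})\d+ cm"
truncated = re.sub(pattern, r"\1", "85.341234 cm")

# \d+ finds each run of digits separately
digit_runs = re.findall(r"\d+", "85.34 cm")

# ^ and $ anchor a match to the start or end of the string
first_char = re.search(r"^\d", "85 cm").group(0)
last_char = re.search(r"\w$", "85 cm").group(0)

# A pattern that requires a decimal point finds nothing in "85 cm"
no_match = re.search(r"\d+\.\d+ \w+", "85 cm")

print(truncated)
print(digit_runs)
print(first_char, last_char)
print(no_match)
```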

Slide27

Group Exercise

Come up with a regular expression to convert the following text:

Original       Converted
85.34 cm       85.3 cm
85.678 cm      85.6 cm
923.1115 cm    923.1 cm
1.95 cm        1.9 cm
6 cm           6 cm

Slide28

exercise break

Slide29

The re module

Full documentation: https://docs.python.org/3/library/re.html

Greatest hits of the re module:

re.split()      splits text on a regex
re.search()     searches for a single regex occurrence
re.findall()    searches for all occurrences of a regex
re.sub()        replaces a regex pattern

Generally, re.functionname(regex, string)

Slide30

re.split()

import re

## Recall regular .split():
mystring = "stephaniespielman"
mystring.split("e")
["st", "phani", "spi", "lman"]

## re.split(regex, string) splits on a regex pattern
## Raw strings (r"...") keep the backslashes intact for the regex
mynewstring = "100,000,000.000"
re.split(r"[,\.]", mynewstring)
['100', '000', '000', '000']

## Extra useful for splitting on *arbitrary whitespace*
otherstring = "hello\tgoodbye    seeya\nimback"
re.split(r"\s+", otherstring)
['hello', 'goodbye', 'seeya', 'imback']
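One detail to watch when splitting on whitespace: if the string begins or ends with whitespace, re.split() returns empty strings at the edges. A minimal sketch, with a made-up row of data, that strips the string first:

```python
import re

# Hypothetical row with messy spacing (invented for illustration)
row = "  gene1 \t 12.5   high\n"

# Leading/trailing whitespace produces empty strings at the ends
fields_raw = re.split(r"\s+", row)

# Stripping first gives only the real fields
fields = re.split(r"\s+", row.strip())

print(fields_raw)
print(fields)
```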

Slide31

re.search()

import re

## Search for occurrence of a number, for example
mystring = "Stephanie was born 10/11/88 at 10:21 am"
searches = re.search(r"\d+\/\d+\/\d+", mystring)
print(searches)
<re.Match object; span=(19, 27), match='10/11/88'>
print(searches.group(0))
'10/11/88'

## Use parentheses to search for several patterns
searches = re.search(r"(\d+\/\d+\/\d+).+(\d+:\d+)", mystring)
print(searches.group(0))  ## The full match
'10/11/88 at 10:21'
print(searches.group(1))  ## First captured group
'10/11/88'
print(searches.group(2))  ## Second captured group
'0:21'
## The greedy .+ swallowed the leading "1" of the time

## Be as explicit as possible!
searches = re.search(r"(\d+\/\d+\/\d+).+\s(\d+:\d+)", mystring)
print(searches.group(2))  ## Second captured group, fixed
'10:21'
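When the pattern is not found, re.search() returns None, and calling .group() on None raises an error. The sketch below wraps the date search in a small helper that checks for this; the function name find_date is made up for the example.

```python
import re

def find_date(text):
    """Return the first date-like pattern in text, or None if there is none."""
    match = re.search(r"\d+/\d+/\d+", text)
    # re.search() returns None when nothing matches
    if match is None:
        return None
    return match.group(0)

print(find_date("Stephanie was born 10/11/88 at 10:21 am"))
print(find_date("No dates in this sentence"))
```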

Slide32

re.findall()

import re

## Returns a list of all detected patterns
mystring = "Stephanie was born 10/11/88, and Basil was born on 5/9/16"
finds = re.findall(r"\d+\/\d+\/\d+", mystring)
print(finds)
['10/11/88', '5/9/16']
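If the pattern contains capturing groups, re.findall() returns a list of tuples, one tuple per match, holding the captured pieces instead of the full match. A short sketch using the same sentence:

```python
import re

mystring = "Stephanie was born 10/11/88, and Basil was born on 5/9/16"

# Three groups in the pattern, so each match becomes a 3-item tuple
parts = re.findall(r"(\d+)/(\d+)/(\d+)", mystring)

# Collect the last item of each tuple (the two-digit year)
years = []
for month, day, year in parts:
    years.append(year)

print(parts)
print(years)
```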

Slide33

re.sub()

import re

## The regex version of .replace()
## Usage: re.sub(regex to find, replacement, string)
mystring = "Stephanie was born 10/11/88, and Basil was born on 5/9/16. But I like this slash /."

## We want to achieve this new string:
## "Stephanie was born 10-11-88, and Basil was born on 5-9-16. But I like this slash /."
print(re.sub(r"(\d+)\/(\d+)\/(\d+)", "\\1-\\2-\\3", mystring))
'Stephanie was born 10-11-88, and Basil was born on 5-9-16. But I like this slash /.'

## As usual, must redefine to save!
new = re.sub(r"(\d+)\/(\d+)\/(\d+)", "\\1-\\2-\\3", mystring)
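Two practical notes on re.sub(), shown as a sketch: a raw string lets the replacement be written as \1 rather than \\1, and the optional count argument limits how many matches are replaced. The variable names are chosen for this example.

```python
import re

mystring = "Stephanie was born 10/11/88, and Basil was born on 5/9/16. But I like this slash /."
date_regex = r"(\d+)/(\d+)/(\d+)"

# Doubled backslashes in a normal string...
escaped = re.sub(date_regex, "\\1-\\2-\\3", mystring)

# ...mean the same thing as single backslashes in a raw string
raw = re.sub(date_regex, r"\1-\2-\3", mystring)

# count=1 replaces only the first match
first_only = re.sub(date_regex, r"\1-\2-\3", mystring, count=1)

print(raw)
print(first_only)
```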

Slide34

exercise break

Slide35

python modules

Separate libraries of code that provide specific functionality for a certain set of tasks

Some are part of base Python and some are not

Slide36

a few base-python modules

os and shutil    Useful for interacting with the operating system
sys              Useful for interacting with the Python interpreter
subprocess       Useful for calling external software from your Python script
re               Regular expressions

Slide37

loading modules in a script

Use the import command at the *top* of your script:

import os
import os as opsys
from os import *
from os import <function/submodule>

Slide38

loading modules in a script

Use the import command at the *top* of your script:

import os                              ## use as os.function_name()
import os as opsys                     ## use as opsys.function_name()
from os import *                       ## use as function_name()
from os import <function/submodule>    ## use as function_name()
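The import styles differ only in the name used to reach the function afterwards. A brief sketch using os.getcwd(), which returns the current working directory:

```python
import os
import os as opsys
from os import getcwd

# Three ways of calling the very same function
via_module = os.getcwd()
via_alias = opsys.getcwd()
via_direct_import = getcwd()

print(via_module == via_alias == via_direct_import)
```

The from os import * form is left out here on purpose: it copies every public name from the module into your script, which can silently overwrite names you have already defined.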


Slide40

the os/shutil modules

Functions that mirror UNIX commands

os/shutil function                   UNIX equivalent
os.remove("filename")                rm filename
os.rmdir("directory")                rmdir directory
os.chdir("directory")                cd directory
os.listdir("directory")              ls directory
os.mkdir("directory")                mkdir directory
shutil.copy("oldfile", "newfile")    cp oldfile newfile
shutil.move("oldfile", "newfile")    mv oldfile newfile

Note: os.rmdir() only removes empty directories; the equivalent of rm -r directory is shutil.rmtree("directory")
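The functions in the table can be tried safely inside a temporary directory. The sketch below uses the standard tempfile module to create one; the file and directory names are made up for this example.

```python
import os
import shutil
import tempfile

# Work inside a throwaway directory so nothing important is touched
workdir = tempfile.mkdtemp()

# mkdir
subdir = os.path.join(workdir, "results")
os.mkdir(subdir)

# Create a small file, then copy it (cp) and move the copy (mv)
original = os.path.join(workdir, "notes.txt")
with open(original, "w") as f:
    f.write("hello\n")
copy_path = os.path.join(workdir, "notes_copy.txt")
shutil.copy(original, copy_path)
moved_path = os.path.join(subdir, "notes_copy.txt")
shutil.move(copy_path, moved_path)

# ls
top_level = sorted(os.listdir(workdir))
in_subdir = os.listdir(subdir)
print(top_level, in_subdir)

# rm, then rmdir (the directory must be empty for os.rmdir to succeed)
os.remove(moved_path)
os.rmdir(subdir)
remaining = os.listdir(workdir)
print(remaining)

# Remove the whole temporary directory and its contents (like rm -r)
shutil.rmtree(workdir)
```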

Slide41

looping over files with os.listdir

import os

directory = "my/directory/with/tons/of/files/"

# Obtain list of files in directory
files = os.listdir(directory)

# Loop over files that end with .txt
for file in files:
    if file.endswith(".txt"):
        f = open(directory + file, "r")
        # do something with file
        f.close()
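A variation on the same loop, offered as a sketch: os.path.join() builds the path whether or not the directory name ends in a slash, and a with block closes the file automatically. The temporary directory and file names below are invented so the example can run anywhere.

```python
import os
import tempfile

# Set up a hypothetical directory holding two .txt files and one .csv file
directory = tempfile.mkdtemp()
for name in ["a.txt", "b.txt", "c.csv"]:
    with open(os.path.join(directory, name), "w") as handle:
        handle.write("contents of " + name + "\n")

# Loop over files that end with .txt and read each one
contents = {}
for file in sorted(os.listdir(directory)):
    if file.endswith(".txt"):
        path = os.path.join(directory, file)
        # The file is closed automatically when the with block ends
        with open(path, "r") as f:
            contents[file] = f.read().strip()

print(contents)
```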

Slide42

the sys module

A few variables/functions I find useful:

sys.path
sys.exit()
sys.argv

Slide43

using sys.path

sys.path is the list of directories Python searches for modules (it includes the directories in your PYTHONPATH)

import sys

# Add directories as usual, with append!
sys.path.append("directory/I/want/to/access")

Slide44

using sys.exit()

sys.exit() will immediately stop the interpreter and exit out of the script

Slide45

using sys.exit()

sys.exit() will immediately stop the interpreter and exit out of the script

import sys

if something_important == False:
    print("Oh no, something is wrong!!!")
    sys.exit()
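Under the hood, sys.exit() works by raising a SystemExit exception, and a string passed to it is printed as the script exits. The sketch below catches that exception only so the behaviour can be shown without actually ending the program; the check_input function and its rule are made up for this example.

```python
import sys

def check_input(value):
    """Hypothetical check: stop the script if the value is negative."""
    if value < 0:
        # Passing a string makes it the exit message
        sys.exit("Oh no, something is wrong!!!")
    return value

# Normally the script would simply end here. Catching SystemExit is done
# only to demonstrate what sys.exit() raises.
try:
    check_input(-1)
    message = None
except SystemExit as err:
    message = str(err)

print(message)
```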

Slide46

using sys.argv

sys.argv is a list of command-line input arguments

Always read as strings

sys.argv[0]    ## The name of the script
sys.argv[1]    ## The value of the first command line arg
sys.argv[2]    ## The value of the second command line arg

Slide47

sys.argv script

################ This is the script ##############
import sys
value = sys.argv[1]
print("You provided", value)
###################################################

### Calling script from console with an argument ####
python myscript.py 75
You provided 75

### You'll get an error if no argument is provided ###
python myscript.py
Traceback (most recent call last):
  File "myscript.py", line 3, in <module>
    value = sys.argv[1]
IndexError: list index out of range

Slide48

sys.argv script fancified

################ This is the script ##############
import sys
assert len(sys.argv) == 2, "Expected an argument"
value = sys.argv[1]
print("You provided", value)
###################################################

### You'll get an error if no argument is provided ###
python myscript.py
AssertionError: Expected an argument

Slide49

sys.argv script

################ This is the script ##############
import sys
assert len(sys.argv) == 2, "Expected an argument"
value = sys.argv[1]
## sys.argv values are strings, so adding a number fails
print(value + 25)
###################################################

### Calling script from console ####
python myscript.py 75
Traceback (most recent call last):
  File "myscript.py", line 4, in <module>
    print(value + 25)
TypeError: can only concatenate str (not "int") to str

Slide50

sys.argv script, slightly fancy

################ This is the script ##############
import sys
assert len(sys.argv) == 2, "Expected an argument"
value = float(sys.argv[1])
print(value + 25)
###################################################

### Calling script from console ####
python myscript.py 75
100.0

Slide51

A bit fancier

################ This is the script ##############
import sys
assert len(sys.argv) == 2, "Usage: python myscript.py <value>"
value = float(sys.argv[1])
print(value + 25)
###################################################

### Calling script from console ####
python myscript.py 75
100.0

Slide52

Fanciest: Try/except

################ This is the script ##############
import sys
assert len(sys.argv) == 2, "Usage: python myscript.py <value>"
value = float(sys.argv[1])
print(value + 25)
###################################################

### Calling script from console ####
python myscript.py 75
100.0

python myscript.py Stephanie
Traceback (most recent call last):
  File "myscript.py", line 4, in <module>
    value = float(sys.argv[1])
ValueError: could not convert string to float: 'Stephanie'

Slide53

################ This is the script ##############
import sys
assert len(sys.argv) == 2, "Usage: python myscript.py <value>"
value = sys.argv[1]
try:
    value = float(value)
except:
    raise AssertionError("Couldn't make the input a float!")
print(value + 25)
###################################################

### Calling script from console ####
python myscript.py 75
100.0

python myscript.py Stephanie
AssertionError: Couldn't make the input a float!
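One way to make the argument handling easier to test is to put it in a function that takes the argument list as input. The sketch below is one possible arrangement, not the only one: parse_value is a made-up name, it catches ValueError specifically, and it uses sys.exit() with a message in place of the assert statements.

```python
import sys

def parse_value(argv):
    """Return the single command-line value as a float, or exit with a message."""
    if len(argv) != 2:
        sys.exit("Usage: python myscript.py <value>")
    try:
        return float(argv[1])
    except ValueError:
        sys.exit("Couldn't make the input a float!")

# In a real script this would be called as parse_value(sys.argv).
# Here a hand-built list stands in for the command "python myscript.py 75".
value = parse_value(["myscript.py", "75"])
print(value + 25)
```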

Slide54

Try/except, more generally

...
... Python code
...
try:
    ...
    ... Attempt code which might raise an error
    ...
except:
    ...
    ... Code to run if an error of any kind occurred
    ...
...
... Python code
...

Slide55

Try/except, more generally

...
... Python code
...
try:
    ...
    ... Attempt code which might raise an error
    ...
except TypeError:
    ...
    ... Run only if a TypeError *specifically* occurred
    ...
...
... Python code
...
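A single try block can have several except clauses, each handling one specific kind of error. The following sketch uses a made-up function, safe_divide, to show two of them side by side.

```python
def safe_divide(numerator_text, denominator_text):
    """Divide two numbers given as strings, reporting problems as text."""
    try:
        result = float(numerator_text) / float(denominator_text)
    except ValueError:
        # Raised by float() when the text is not a number
        return "not a number"
    except ZeroDivisionError:
        # Raised by the division when the denominator is zero
        return "cannot divide by zero"
    return result

print(safe_divide("10", "4"))
print(safe_divide("ten", "4"))
print(safe_divide("10", "0"))
```

Naming the error type keeps unrelated problems, such as a typo in a variable name, from being hidden by the except clause.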

Slide56

Heavy duty science libraries

scipy and numpy
    Work with matrices
    Fundamental scientific computing
    Matlab in Python
    https://www.scipy.org/
    http://www.numpy.org/

pandas
    Data structures (R for Python, ish)
    https://pandas.pydata.org/

scikit-learn
    Machine learning
    http://scikit-learn.org/stable/

Slide57

creating your own modules

Any python script can be imported into another!

# Import a script named useful_functions.py
import sys
sys.path.append("/path/to/the/script")  # the directory that contains the script
import useful_functions
# OR:
from useful_functions import *
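To see the whole round trip in one place, the sketch below writes a tiny useful_functions.py into a temporary directory, adds that directory to sys.path, and imports it. The add_25 function and the temporary location are invented for this example; in practice the module would be a script you had already written.

```python
import os
import sys
import tempfile

# Create a hypothetical module file in a temporary directory
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "useful_functions.py")
with open(module_path, "w") as f:
    f.write("def add_25(value):\n")
    f.write("    return value + 25\n")

# Tell Python where to look, then import the script by name (no .py)
sys.path.append(module_dir)
import useful_functions

result = useful_functions.add_25(75)
print(result)
```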

Slide58

install external modules

Use the program pip from a bash terminal

Linux users (Debian/Ubuntu) can obtain pip with:
sudo apt-get install python3-pip

Mac users w/ homebrew have it already (comes with Python)

Install package named XXX with:
pip install XXX