Slide1
Introduction to Python: Day four
Stephanie Spielman
Big data in biology summer school, 2018
Center for computational biology and bioinformatics
University of Texas at Austin
Slide2
Working with strings, round 2
File/text manipulation often uses some more advanced string methods:
stringvariable.split()
stringvariable.join()
stringvariable.strip(), .rstrip(), .lstrip()
stringvariable.startswith()
stringvariable.endswith()
Slide3
The .split() method
## Usage:
## <string>.split(<string/character to split on>)
mystring = "Hello this is a string"
## Split string into a list on a space
## The split argument is *removed* from the output
print(mystring.split(" "))
["Hello", "this", "is", "a", "string"]
## Split string into a list on the lowercase letter 's'
print(mystring.split("s"))
["Hello thi", " i", " a ", "tring"]
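A related detail, sketched here with a made-up string: calling .split() with no argument splits on any run of whitespace, whereas .split(" ") splits on every individual space. The variable names below are illustrative only.

```python
# Hypothetical example string containing a double space and a tab
messy = "Hello  this\tis a string"

# Splitting on a single space leaves an empty string where two spaces meet,
# and the tab stays inside one of the items
on_space = messy.split(" ")

# Splitting with no argument treats any run of whitespace as one separator
on_whitespace = messy.split()

print(on_space)
print(on_whitespace)
```

For text copied out of spreadsheets or aligned columns, the no-argument form is usually the safer choice.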
Slide4
The .join() method is the opposite of .split()
## Usage:
## <string to join with>.join(<list to join>)
mystring = "Hello this is a string"
## Split string into a list on a space
## The split argument is *removed* from the output
print(mystring.split(" "))
["Hello", "this", "is", "a", "string"]
x = ["Hello", "this", "is", "a", "string"]
print(" ".join(x))
"Hello this is a string"
# Useful for creating comma-separated values, IMO
x = ["col1", "col2", "col3"]
print(",".join(x))
"col1,col2,col3"
Slide5
The .strip() family removes leading and trailing whitespace, etc.
mystring = " Hello this is a string"
print(mystring.strip())
"Hello this is a string"
print(mystring.lstrip())
"Hello this is a string"
print(mystring.rstrip())
" Hello this is a string"
newstring = "abcdefa"
print(newstring.strip("a"))
"bcdef"
Slide6
A note on whitespace
Symbol    Meaning
\s        Any single whitespace character (regular expressions only; not a Python string escape)
\t        Tab
\n        Newline
\r        Return character (mostly on Windows)
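These whitespace characters mostly turn up at the ends of lines read from files, which is where the .strip() family earns its keep. A minimal sketch, assuming some made-up lines standing in for file contents:

```python
# Hypothetical lines as they might come out of a file, line endings included
raw_lines = ["gene1\t12.5\n", "gene2\t7.0\r\n", "  gene3\t3.2  \n"]

cleaned = []
for line in raw_lines:
    # rstrip("\r\n") removes only trailing return/newline characters;
    # leading and trailing spaces are left alone
    cleaned.append(line.rstrip("\r\n"))

print(cleaned)

# strip() with no argument removes all leading and trailing whitespace
fully_stripped = raw_lines[2].strip()
print(fully_stripped)
```

Passing the characters explicitly is a reasonable choice when leading or trailing spaces are meaningful and only the line ending should go.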
Slide7
.startswith() and .endswith()
mystring = "Hello this is a string"
print(mystring.startswith("H"))
True
print(mystring.endswith("g"))
True
print(mystring.startswith("Hello"))
True
print(mystring.startswith("badgers"))
False
# Useful for file parsing!!
for line in file_lines:
    if line.startswith("some important thing"):
        ## do something with these lines only
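As one concrete instance of the parsing loop above, here is a small sketch that pulls header lines out of FASTA-style text. The file contents are invented for illustration, and file_lines is just a plain list rather than a real file.

```python
# Hypothetical file contents, already split into lines
file_lines = [
    ">seq1 Mus musculus",
    "ACGTACGT",
    ">seq2 Homo sapiens",
    "TTGACA",
]

headers = []
for line in file_lines:
    # Keep only the header lines, dropping the leading ">"
    if line.startswith(">"):
        headers.append(line[1:])

print(headers)
```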
Slide8
Breathing break
Slide9
Regular expressions
Pattern-based search and replace
Extremely powerful beyond all reason
Excellent for text (file) manipulation!
Slide10
Regular expressions
String: Mus musculus
Regex: Mus
Match: Mus musculus (matched text: "Mus")
Slide11
Regular expressions
String: Mus musculus
Regex: Mus musculus
Match: Mus musculus (matched text: "Mus musculus")
Slide12
Regular expressions
String: Mus musculus
Regex: [mM]us
Match: Mus musculus (matched text: "Mus", "mus")
Slide13
Regular expressions
String: Mus musculus
Regex: [A-Za-z]us
Match: Mus musculus (matched text: "Mus", "mus", "lus")
Slide14
Regular expressions
String: Mus musculus
Regex: \wus
Match: Mus musculus (matched text: "Mus", "mus", "lus")
Slide15
Regular expressions
String: Mus musculus
Regex: \w+
Match: Mus musculus (matched text: "Mus", "musculus")
Slide16
Regular expressions
String: Mus musculus
Regex: [A-Z]\w+ \w+
Match: Mus musculus (matched text: "Mus musculus")
Slide17
Regular expressions
String: Mus musculus
Regex: ([A-Z])\w+ (\w+)
Replace: \1. \2
New string: M. musculus
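The capture-and-replace on this slide can be tried directly in Python with re.sub() from the re module, which these slides cover later. A minimal sketch; the raw-string prefix r is my addition so that the backslashes reach the regex engine untouched.

```python
import re

species = "Mus musculus"

# Group 1 captures the capital letter, group 2 captures the second word
abbreviated = re.sub(r"([A-Z])\w+ (\w+)", r"\1. \2", species)
print(abbreviated)
```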
Slide18
Regular expressions
String: 85.34 cm
Regex: \d+
Match: 85.34 cm (matched text: "85", "34")
Slide19
Regular expressions
String: 85.34 cm
Regex: \d+\.\d+
Match: 85.34 cm (matched text: "85.34")
Slide20
Regular expressions
String: 85.34 cm
Regex: \d+\.\d+ \w+
Match: 85.34 cm (matched text: "85.34 cm")
Slide21
Regular expressions
String: 85 cm
Regex: \d+\.\d+ \w+
Match: 85 cm (no match: the regex requires a decimal point)
Slide22
Regular expressions
String: 85 cm
Regex: \d+\.*\d* \w+
Match: 85 cm (matched text: "85 cm")
Slide23
Regular expressions
String: 85 cm
Regex: ^\d
Match: 85 cm (matched text: "8")
Slide24
Regular expressions
String: 85 cm
Regex: \w$
Match: 85 cm (matched text: "m")
Slide25
Regular expressions
String: 85.341234 cm
Regex: (\d+\.\d{3})\d+ cm
Replace: \1
New string: 85.341
Slide26
Regular expressions
String: 85.34 cm
Regex: (\d+\.\d{3})\d+ cm
Replace: \1
New string: ?????
Slide27
Group Exercise
Come up with a regular expression to convert the following text:
85.34 cm       ->  85.3 cm
85.678 cm      ->  85.6 cm
923.1115 cm    ->  923.1 cm
1.95 cm        ->  1.9 cm
6 cm           ->  6 cm
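One possible approach to the exercise, offered as a sketch rather than the intended answer (there are several patterns that work): capture everything up to the first decimal digit and discard the rest. The list name and the use of re.sub() are my own choices here.

```python
import re

measurements = ["85.34 cm", "85.678 cm", "923.1115 cm", "1.95 cm", "6 cm"]

# Capture the digits up to and including the first decimal place,
# then match (and discard) any remaining decimal digits
pattern = r"(\d+\.\d)\d*"

converted = [re.sub(pattern, r"\1", m) for m in measurements]
print(converted)
```

Because "6 cm" contains no decimal point, the pattern does not match it and the text comes back unchanged.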
Slide28
Exercise break
Slide29
The re module
Full documentation: https://docs.python.org/3/library/re.html
Greatest hits of the re module:
re.split() splits text on a regex
re.search() searches for a single regex occurrence
re.findall() searches for all occurrences of a regex
re.sub() replaces a regex pattern
Generally, re.functionname(regex, string)
Slide30
re.split()
import re
## Recall regular .split():
mystring = "stephaniespielman"
mystring.split("e")
["st", "phani", "spi", "lman"]
## re.split(regex, string) splits on a regex pattern
mynewstring = "100,000,000.000"
re.split("[,\.]", mynewstring)
['100', '000', '000', '000']
## Extra useful for splitting on *arbitrary whitespace*
otherstring = "hello\tgoodbye seeya\nimback"
re.split("\s+", otherstring)
['hello', 'goodbye', 'seeya', 'imback']
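A side note on writing patterns: regexes are commonly written as raw strings (r"...") so that Python does not try to interpret the backslashes itself, and recent Python versions emit a warning for unrecognized escapes such as "\s" in ordinary strings. A small sketch of the same whitespace split using a raw string; the extra spaces in the example string are my addition.

```python
import re

otherstring = "hello\tgoodbye   seeya\nimback"

# The r prefix makes this a raw string, so \s reaches the re module unchanged
pieces = re.split(r"\s+", otherstring)
print(pieces)

# A raw string keeps the backslash as a literal character:
# r"\n" is two characters, "\n" is a single newline character
raw_length = len(r"\n")
plain_length = len("\n")
print(raw_length, plain_length)
```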
Slide31
re.search()
## Search for occurrence of a number, for example
mystring = "Stephanie was born 10/11/88 at 10:21 am"
searches = re.search("\d+\/\d+\/\d+", mystring)
print(searches)
<_sre.SRE_Match object; span=(19, 27), match='10/11/88'>
print(searches.group(0))
'10/11/88'
## Use parentheses to search for several patterns
searches = re.search("(\d+\/\d+\/\d+).+(\d+:\d+)", mystring)
print(searches.group(0)) ## The full match
'10/11/88 at 10:21'
print(searches.group(1)) ## First captured group
'10/11/88'
print(searches.group(2)) ## Second captured group
'0:21'
## Be as explicit as possible!
searches = re.search("(\d+\/\d+\/\d+).+\s(\d+:\d+)", mystring)
print(searches.group(2)) ## Second captured group, fixed
'10:21'
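One thing worth guarding against: re.search() returns None when nothing matches, and calling .group() on None raises an AttributeError. A minimal sketch of a check, where find_date is a hypothetical helper name, not something from the re module:

```python
import re

def find_date(text):
    """Return the first date-like pattern (digits/digits/digits) in text, or None."""
    match = re.search(r"\d+/\d+/\d+", text)
    if match is None:
        # No match: there is no .group() to call
        return None
    return match.group(0)

print(find_date("Stephanie was born 10/11/88 at 10:21 am"))
print(find_date("No dates in this sentence"))
```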
Slide32
re.findall()
## Returns a list of all detected patterns
mystring = "Stephanie was born 10/11/88, and Basil was born on 5/9/16"
finds = re.findall("\d+\/\d+\/\d+", mystring)
print(finds)
['10/11/88', '5/9/16']
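When the pattern contains capture groups, re.findall() behaves a little differently: it returns one tuple of captured groups per match instead of the full matched text. A short sketch on the same string, with the grouping added by me for illustration:

```python
import re

mystring = "Stephanie was born 10/11/88, and Basil was born on 5/9/16"

# With capture groups, findall returns one tuple of groups per match
parts = re.findall(r"(\d+)/(\d+)/(\d+)", mystring)
print(parts)

# The captured pieces are strings; convert them if numbers are needed
as_numbers = [tuple(int(piece) for piece in date) for date in parts]
print(as_numbers)
```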
Slide33
re.sub()
## The regex version of .replace()
## Usage: re.sub(regex to find, replacement text, string)
mystring = "Stephanie was born 10/11/88, and Basil was born on 5/9/16. But I like this slash /."
## We want to achieve this new string:
## "Stephanie was born 10-11-88, and Basil was born on 5-9-16. But I like this slash /."
print(re.sub("(\d+)\/(\d+)\/(\d+)", "\\1-\\2-\\3", mystring))
'Stephanie was born 10-11-88, and Basil was born on 5-9-16. But I like this slash /.'
## As usual, must redefine to save!
new = re.sub("(\d+)\/(\d+)\/(\d+)", "\\1-\\2-\\3", mystring)
Slide34
Exercise break
Slide35
Python modules
Separate libraries of code that provide specific functionality for a certain set of tasks
Some are part of base Python and some are not
Slide36
A few base-Python modules
os and shutil: useful for interacting with the operating system
sys: useful for interacting with the Python interpreter
subprocess: useful for calling external software from your Python script
re: regular expressions
Slide37
Loading modules in a script
Use the import command at the *top* of your script:
import os                              use as os.function_name()
import os as opsys                     use as opsys.function_name()
from os import *                       use as function_name()
from os import <function/submodule>    use as function_name()
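To make the import styles concrete, here is a small sketch using os.getcwd(), a real function in the os module that returns the current working directory. The choice of getcwd is mine; any function in os would do.

```python
import os
import os as opsys
from os import getcwd

# All three names refer to the same function
print(os.getcwd())
print(opsys.getcwd())
print(getcwd())
```

The "from os import *" form is left out of the sketch on purpose: it pulls many names into your script at once and can silently overwrite names you already use, so the explicit forms are usually easier to debug.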
Slide40
The os/shutil modules
Functions provide UNIX commands
os/shutil function                    UNIX equivalent
os.remove("filename")                 rm filename
os.rmdir("directory")                 rmdir directory (empty directories only; use shutil.rmtree("directory") for rm -r directory)
os.chdir("directory")                 cd directory
os.listdir("directory")               ls directory
os.mkdir("directory")                 mkdir directory
shutil.copy("oldfile", "newfile")     cp oldfile newfile
shutil.move("oldfile", "newfile")     mv oldfile newfile
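A minimal sketch that exercises several of these functions inside a throwaway directory, so nothing real is touched. The file and directory names are invented, and tempfile is used only to get a safe place to work.

```python
import os
import shutil
import tempfile

# Work inside a temporary directory so nothing real is touched
workdir = tempfile.mkdtemp()

# mkdir results
subdir = os.path.join(workdir, "results")
os.mkdir(subdir)

# Create a small file to copy
oldfile = os.path.join(workdir, "oldfile.txt")
with open(oldfile, "w") as handle:
    handle.write("some text\n")

# cp oldfile results/newfile
newfile = os.path.join(subdir, "newfile.txt")
shutil.copy(oldfile, newfile)

# ls
listing_before = sorted(os.listdir(workdir))
print(listing_before)

# rm oldfile
os.remove(oldfile)

# rm -r results: shutil.rmtree removes a directory and everything in it
shutil.rmtree(subdir)
listing_after = os.listdir(workdir)
print(listing_after)

# rmdir: os.rmdir only succeeds once the directory is empty
os.rmdir(workdir)
```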
Slide41
Looping over files with os.listdir
import os
directory = "my/directory/with/tons/of/files/"
# Obtain list of files in directory
files = os.listdir(directory)
# Loop over files that end with .txt
for file in files:
    if file.endswith(".txt"):
        f = open(directory + file, "r")
        # do something with file
        f.close()
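A variation on the loop above, sketched under my own assumptions: os.path.join() builds the path so the trailing slash on the directory name no longer matters, and a "with" block closes each file automatically. The helper name count_lines_in_txt_files and the demo files are hypothetical.

```python
import os
import tempfile

def count_lines_in_txt_files(directory):
    """Return {filename: number of lines} for every .txt file in directory."""
    counts = {}
    for filename in os.listdir(directory):
        if filename.endswith(".txt"):
            # os.path.join adds the separator, so a trailing "/" is not needed
            path = os.path.join(directory, filename)
            # "with" closes the file automatically, even if an error occurs
            with open(path, "r") as handle:
                counts[filename] = len(handle.readlines())
    return counts

# Small demonstration in a temporary directory
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "a.txt"), "w") as handle:
        handle.write("line 1\nline 2\n")
    with open(os.path.join(demo_dir, "notes.md"), "w") as handle:
        handle.write("ignored\n")
    result = count_lines_in_txt_files(demo_dir)

print(result)
```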
Slide42
The sys module
A few variables/functions I find useful:
sys.path
sys.exit()
sys.argv
Slide43
Using sys.path
sys.path is the list of directories Python searches for modules (it includes the directories in your PYTHONPATH)
import sys
# Add directories as usual, with append!
sys.path.append("directory/I/want/to/access")
Slide44
Using sys.exit()
sys.exit() will immediately stop the interpreter and exit out of the script
import sys
if something_important == False:
    print("Oh no, something is wrong!!!")
    sys.exit()
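A small extension, offered as a sketch: sys.exit() also accepts a string, which is printed to stderr before the script stops with exit status 1. Under the hood it raises SystemExit, which is why the demonstration below can catch it instead of actually quitting. The function name check_input is hypothetical.

```python
import sys

def check_input(something_important):
    """Stop the script with an error message if the check fails."""
    if not something_important:
        # A string argument is printed to stderr and the exit status is 1
        sys.exit("Oh no, something is wrong!!!")
    return "all good"

print(check_input(True))

# sys.exit() works by raising SystemExit, so it can be observed here
# without ending the demonstration
try:
    check_input(False)
except SystemExit as error:
    caught = error.code
    print("Script would have stopped with:", caught)
```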
Slide46
Using sys.argv
sys.argv is a list of command-line input arguments
Always read as strings
sys.argv[0] ## The name of the script
sys.argv[1] ## The value of the first command line arg
sys.argv[2] ## The value of the second command line arg
Slide47
sys.argv script
################ This is the script ##############
import sys
value = sys.argv[1]
print("You provided", value)
###################################################
### Calling script from console with an argument ####
python myscript.py 75
You provided 75
### You'll get an error if no argument is provided ###
python myscript.py
Traceback (most recent call last):
  File "myscript.py", line 3, in <module>
    value = sys.argv[1]
IndexError: list index out of range
Slide48
sys.argv script, fancified
################ This is the script ##############
import sys
assert(len(sys.argv) == 2), "Expected an argument"
value = sys.argv[1]
print("You provided", value)
###################################################
### You'll get an error if no argument is provided ###
python myscript.py
AssertionError: Expected an argument
Slide49
sys.argv script
################ This is the script ##############
import sys
assert(len(sys.argv) == 2), "Expected an argument"
value = sys.argv[1]
print(value + 25)
###################################################
### Calling script from console ####
python myscript.py 75
Traceback (most recent call last):
  File "myscript.py", line 4, in <module>
    print(value + 25)
TypeError: can only concatenate str (not "int") to str
Slide50
sys.argv script, slightly fancy
################ This is the script ##############
import sys
assert(len(sys.argv) == 2), "Expected an argument"
value = float(sys.argv[1])
print(value + 25)
###################################################
### Calling script from console ####
python myscript.py 75
100.0
Slide51
A bit fancier
################ This is the script ##############
import sys
assert(len(sys.argv) == 2), "Usage: python myscript.py <value>"
value = float(sys.argv[1])
print(value + 25)
###################################################
### Calling script from console ####
python myscript.py 75
100.0
Slide52
Fanciest: Try/except
################ This is the script ##############
import sys
assert(len(sys.argv) == 2), "Usage: python myscript.py <value>"
value = float(sys.argv[1])
print(value + 25)
###################################################
### Calling script from console ####
python myscript.py 75
100.0
python myscript.py Stephanie
Traceback (most recent call last):
  File "myscript.py", line 4, in <module>
    value = float(sys.argv[1])
ValueError: could not convert string to float: 'Stephanie'
Slide53
################ This is the script ##############
import sys
assert(len(sys.argv) == 2), "Usage: python myscript.py <value>"
value = sys.argv[1]
try:
    value = float(value)
except:
    raise AssertionError("Couldn't make the input a float!")
print(value + 25)
###################################################
### Calling script from console ####
python myscript.py 75
100.0
python myscript.py Stephanie
AssertionError: Couldn't make the input a float!
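A variation on the script above, sketched under my own assumptions rather than taken from the slides: it catches ValueError specifically and uses sys.exit() for the messages, since assert statements are skipped when Python is run with the -O flag. The helper name parse_value is hypothetical, and the argument list is passed in explicitly so the example runs without a command line.

```python
import sys

def parse_value(arguments):
    """Return the single command-line value as a float, or exit with a message."""
    if len(arguments) != 2:
        sys.exit("Usage: python myscript.py <value>")
    try:
        return float(arguments[1])
    except ValueError:
        # Only the "not a number" case is handled; other errors still surface
        sys.exit("Couldn't make the input a float!")

# In a real script this would be parse_value(sys.argv)
value = parse_value(["myscript.py", "75"])
print(value + 25)
```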
Slide54
Try/except, more generally
...
... Python code
...
try:
    ...
    ... Attempt code which might raise an error
    ...
except:
    ...
    ... Code to run if an error of any kind occurred
    ...
...
... Python code
...
Slide55
Try/except, more generally
...
... Python code
...
try:
    ...
    ... Attempt code which might raise an error
    ...
except TypeError:
    ...
    ... Run only if a TypeError *specifically* occurred
    ...
...
... Python code
...
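To make the specific-exception form concrete, here is a minimal sketch with two except clauses, each handling a different kind of error. The function name add_25 and the returned messages are invented for illustration.

```python
def add_25(value):
    """Try to add 25 to value, reporting which kind of error occurred."""
    try:
        return float(value) + 25
    except ValueError:
        # Raised when a string does not look like a number
        return "ValueError: not a number"
    except TypeError:
        # Raised when the object cannot be converted to a float at all
        return "TypeError: wrong kind of object"

print(add_25("75"))
print(add_25("Stephanie"))
print(add_25(None))
```

Naming the exception keeps unrelated bugs, such as a misspelled variable name, from being silently swallowed by the except block.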
Slide56
Heavy duty science libraries
scipy and numpy
    Work with matrices
    Fundamental scientific computing
    Matlab in Python
    https://www.scipy.org/
    http://www.numpy.org/
pandas
    Data structures (R for Python, ish)
    https://pandas.pydata.org/
scikit-learn
    Machine learning
    http://scikit-learn.org/stable/
Slide57
Creating your own modules
Any Python script can be imported into another!
# Import a script named useful_functions.py
import sys
sys.path.append("/path/to/the/script")
import useful_functions
# OR:
from useful_functions import *
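A self-contained sketch of the same idea: it writes a tiny useful_functions.py into a temporary directory (standing in for a script you wrote yourself), adds that directory to sys.path, and imports it. The function double inside the module is hypothetical.

```python
import os
import sys
import tempfile

# Write a tiny module into a temporary directory (stand-in for your own script)
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "useful_functions.py"), "w") as handle:
    handle.write("def double(x):\n    return 2 * x\n")

# Make the directory visible to import, then import the script by its name
# (without the .py extension)
sys.path.append(module_dir)
import useful_functions

print(useful_functions.double(21))
```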
Slide58
Install external modules
Use the program pip from a bash terminal
Linux users can obtain pip with: sudo apt-get install python3-pip (on Debian/Ubuntu)
Mac users w/ homebrew have it already (comes with Python)
Install package named XXX with: pip install XXX