Introduction to Python for Biologists, Part 1
This Lecture
Learning Objectives
Install Python
Data & Variables
Strings
String slicing
String methods
Lists
List methods & list slicing
Math
Arrays
Why do biologists need to learn programming?
http://archive.oreilly.com/pub/a/oreilly//news/perlbio_1001.html
http://www.nature.com/nbt/journal/v31/n10/box/nbt.2721_BX1.html
Biology is becoming a data-driven field.
New technology enables scientists to generate large data sets in semi-automated experiments. Analysis of your own data is challenging.
Automation saves time.
Many interesting questions remain unanalyzed in huge amounts of publicly available data.
Integration of new experimental results with public data is a challenging computational problem.
Scientists who can pursue innovative data analysis methods have an advantage over those limited to existing software (or those who require the assistance of other people with programming and data analysis skills).
Python
Python* is a programming language
Free, open source
Runs on all types of computers
“User friendly and easy to learn”
“Clean, readable code”
Very popular among bioinformaticians
Good documentation available: https://wiki.python.org/moin/BeginnersGuide/Overview
Powerful “object oriented” features
Many add-on toolkits (“modules”) available for scientific computing, visualization, statistics, etc.
*Python is named after a 1970s British comedy TV show, not a large snake.
[Cartoon: "Grad School" / "Python"]
Thanks to xkcd: https://xkcd.com/519/
Python.org
Online Tutorials
You can’t learn an entire programming language from a couple of classroom lectures.
There are many online tutorials for Python, which allow self-learning at your own pace. We recommend:
Codecademy.com
TryPython.org
LearnPython.org
LearnPythontheHardWay.org/book
Software Carpentry
For biologists:
Python for Biologists
Rosalind Python Village (learn by solving problems)
Reading for this week:
Python for Biologists, chapters 1-3
The anatomy of successful computational biology software. Altschul S, Demchak B, Durbin R, Gentleman R, Krzywinski M, Li H, Nekrutenko A, Robinson J, Rasband W, Taylor J, Trapnell C. Nature Biotechnology 2013 Oct;31(10):894-7. DOI: 10.1038/nbt.2721
Install Python
Assignment: Install Python on your computer.
Be sure to include the NumPy and SciPy modules.
One easy way to set up a GUI for Python (on Mac and Windows) is to download the free version of Anaconda: http://continuum.io/downloads
Or you can run the command-line version on Linux or in the Macintosh Terminal (for Mac you will need Xcode, a free software developer toolkit from Apple that is not installed by default in OS X).
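One way to confirm the installation worked is to ask Python which modules it can find. This is a minimal sketch assuming Python 3; it uses the standard-library function `importlib.util.find_spec`, and the helper name `check_modules` is hypothetical (it is not part of Anaconda or NumPy). It reports a missing module instead of crashing.

```python
import importlib.util
import sys

def check_modules(names):
    """Hypothetical helper: map each module name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

print("Python", sys.version.split()[0])        # e.g. the installed version number
for name, found in check_modules(["numpy", "scipy"]).items():
    print(name, "is installed" if found else "is NOT installed")
```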
Anaconda
Your life (in this course) will probably be easier if you install the free Anaconda distribution, which includes numerical, scientific, statistical, and graphics modules: http://continuum.io/downloads
Programming Concepts
All programming languages are built from the same basic elements:
data
operators
flow control
These concepts are expressed in a specific syntax for each programming language.
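A minimal sketch (assuming Python 3) of how the three basic elements fit together in one small program: the sequence and the counter are data, `==`, `or`, and `+` are operators, and the `for` loop and `if` statement are flow control. The sequence is an arbitrary example.

```python
dna = "GATCCATGCG"      # data: a short DNA sequence (arbitrary example)
gc_count = 0            # data: a counter that starts at zero

for base in dna:                        # flow control: repeat for each letter
    if base == "G" or base == "C":      # operators: == compares, or combines tests
        gc_count = gc_count + 1         # operator: + adds one to the counter

print(gc_count)         # 6
```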
Data types
Basic:
Strings = 'GATCCATGCGAGACCCTTGA'
Numbers = 7, 123.455, 4.2e-14
Boolean = True, False
Every data object has a type (try these examples on your own):
>>> type(1)
>>> type("GATCCT")
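To check the examples above all at once, the sketch below (assuming Python 3) loops over one value of each kind and prints the name of its type; `values` is just an illustrative variable name.

```python
values = [
    "GATCCATGCGAGACCCTTGA",  # a string
    7,                       # an integer
    123.455,                 # a floating-point number
    4.2e-14,                 # a float written in scientific notation
    True,                    # a Boolean
]
for value in values:
    print(repr(value), "has type", type(value).__name__)
# prints str, int, float, float, bool in that order
```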
Variables
A variable is a named container for data (think of it as a box or a shelf that has a name).
In Python, a variable can hold any type of data and does not need to be declared in advance.
The data in a variable can be changed at any time (and can even change to a different type).
Python variable names must start with a letter or the underscore _ character, and can contain only letters, numbers, and underscores. Names are case sensitive.
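A small sketch (assuming Python 3) of the rules above: one name can be reassigned to a different type, and names that differ only in capitalization are separate variables. The variable names are arbitrary examples.

```python
gene_count = 12          # gene_count holds an integer
gene_count = "twelve"    # the same name can later hold a string
Gene_count = 99          # a different variable: names are case sensitive
# 2nd_gene = 5           # would be a SyntaxError: a name cannot start with a digit

print(gene_count, Gene_count)      # twelve 99
print(type(gene_count).__name__)   # str
```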
Comments
Comments are bits of text added by the programmer into the code that explain what is going on. They are not executed by the computer. Python uses the hash symbol # to mark a comment: anything on a line after the # is ignored.
Use lots of clear comments in your code: for a good grade, so others can understand your code, and so you can understand your own code from the past (days, weeks, years… ago).
Examples of Variables
A value is assigned to a variable with the = sign. The value to the right of the = is put into the variable named on the left.
my_DNA = "ATGCGTA"
gene_length = 467
Dog_Text = "my Dog has Fleas"  #spaces are part of a string
counter = 6
pi_short = 3.14
my_list = ['a', 'b', 'c', 'd']
HBB_human = "MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH"
#this string is one line that wraps on the screen
Strings
Strings are text. They must always be in quotes.
You can use single or double quotes, but the opening and closing quotes must match.
A string can contain space characters and also newline characters.
Biology data involves a lot of strings: sequences, names (taxonomy, gene names), etc.
A string is usually assigned to a variable:
>>> my_string = "gi|45478711|ref|NC_005816.1|Yersinia pestis biovar Microtus"
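A quick sketch (assuming Python 3) of the quoting rules: single and double quotes produce the same string, double quotes let you put an apostrophe inside, and `\n` inside a string is a newline character. The example strings are arbitrary.

```python
single = 'GATTACA'
double = "GATTACA"
print(single == double)                # True: both quote styles make the same string

five_prime = "the 5' end of the gene"  # double quotes allow a ' inside the string
two_lines = "line one\nline two"       # \n is a newline character
print(two_lines)                       # prints on two separate lines
```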
String Methods
In Python, data objects of type 'string' have built-in functions called 'methods'. Methods use a 'dot' syntax as follows:
>>> my_DNA = "ATGCGTA"
>>> my_DNA.count("G")
2
>>> my_DNA.lower()
'atgcgta'
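As an illustration (not from the original slides), `count()` can be combined with the built-in `len()` function to compute the GC content of a sequence, i.e. the fraction of letters that are G or C. This sketch assumes Python 3, where `/` always returns a float.

```python
my_DNA = "ATGCGTA"
gc = my_DNA.count("G") + my_DNA.count("C")   # 2 G's + 1 C = 3
gc_fraction = gc / len(my_DNA)               # len() gives the number of letters (7)
print(gc, round(gc_fraction, 3))             # 3 0.429
print(my_DNA.upper(), my_DNA.lower())        # ATGCGTA atgcgta
```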
String Concatenation
Two strings can be joined with the + operator:
c = 'cat'
h = 'hat'
print('cat' + 'hat')
ch = c + h
print(ch)
print(c + ' in the ' + h)
Numbers must be converted to strings using the str() function before using the string concatenation operator:
A = 5
print(A + c)  #note the error message
print('We have' + ' ' + str(A) + ' ' + c + 's')
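The sketch below (assuming Python 3) shows the same two steps without stopping the program: the `try`/`except` block catches the `TypeError` raised by adding a number to a string, and `str()` then fixes the problem.

```python
c = 'cat'
A = 5
try:
    message = A + c            # int + str is not allowed
except TypeError as error:
    print("TypeError:", error)

message = 'We have' + ' ' + str(A) + ' ' + c + 's'
print(message)                 # We have 5 cats
```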
More String methods
upper() and lower() return a new string with the case changed. The original string is unchanged, so if you want to keep the result you need to put it into a variable.
>>> my_DNA = "TATGCGTA"
>>> my_DNA.lower()
'tatgcgta'
>>> my_DNA
'TATGCGTA'
len() is a built-in function (not a method) that gives the length of a string:
>>> len(my_DNA)
8
Find & Replace
find() is another handy string method. (Note: it only finds exact matches.)
>>> my_DNA = "ATGCGTA"
>>> my_DNA.find("GC")
2
#returns the position index of the first occurrence of the search string in the target
replace() finds and replaces letters in a string:
>>> my_DNA.replace('T', 'X')
'AXGCGXA'
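A short sketch (assuming Python 3) with two details worth knowing: `find()` returns -1 when there is no exact match, and `replace()` returns a new string, which makes it a one-line way to turn a DNA sequence into its RNA equivalent (T replaced by U).

```python
my_DNA = "ATGCGTA"
print(my_DNA.find("GC"))         # 2  (index of the first match, counting from 0)
print(my_DNA.find("TTT"))        # -1 (no exact match anywhere in the string)

rna = my_DNA.replace("T", "U")   # replace() returns a new string
print(rna)                       # AUGCGUA
print(my_DNA)                    # ATGCGTA (the original is unchanged)
```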
Lists
Lists contain a group of things, in square brackets, separated by commas:
List1 = ['a', 'b', 'c', 'd']
List2 = ["XP_008199794", "PF03769", "gi|54037254"]
List_mix = ["fish", "hat", "box", 17, 4935.45, True]
The elements of a list do not all have to be of the same type.
Lists are used for many tasks in Python that involve a lot of data.
List Elements
The elements in a list are ordered. They can be accessed by their index number in the list.
Python starts counting list elements at zero.
The list index is indicated by a number in square brackets following the name of the list.
List slicing uses this format: [begin:end:step]
You can do fancy things with list slicing, but the intervals follow rules you need to study: the begin index is included, the end index is excluded, and negative numbers count back from the end of the list.
>>> my_list = ['G', 'A', 'hat', 'cat']
>>> my_list[1]
'A'
>>> my_list[1:3]
['A', 'hat']
>>> my_list[:-2]
['G', 'A']
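The sketch below (assuming Python 3) walks through the slicing rules one at a time on the same list, including the optional step value.

```python
my_list = ['G', 'A', 'hat', 'cat']
print(my_list[0])      # G    (the first element is index 0)
print(my_list[-1])     # cat  (negative indices count back from the end)
print(my_list[1:3])    # ['A', 'hat']  (begin is included, end is excluded)
print(my_list[:-2])    # ['G', 'A']    (everything except the last two)
print(my_list[::2])    # ['G', 'hat']  (a step of 2 takes every other element)
print(my_list[::-1])   # ['cat', 'hat', 'A', 'G']  (a step of -1 reverses)
```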
List Methods
You can assign a value to a specific position in a list:
>>> my_list = ['G', 'A', 'hat', 'cat']
>>> my_list[1] = "X"
>>> my_list
['G', 'X', 'hat', 'cat']
List methods are functions built into the list data type. They use the 'dot' syntax just like string methods.
>>> my_list.count('G')
1
list.append() is a commonly used list method. It adds its argument to the end of a list. It is frequently used to collect results as a program steps through a loop.
>>> my_list.append('T')
>>> my_list
['G', 'X', 'hat', 'cat', 'T']
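A minimal sketch (assuming Python 3) of the pattern described above: start with an empty list and `append()` one result on each pass through a loop. The sequences are arbitrary examples.

```python
sequences = ["ATGCGTA", "GGCC", "TTA"]   # arbitrary example sequences
lengths = []                             # start with an empty list
for seq in sequences:
    lengths.append(len(seq))             # add one result to the end on each pass
print(lengths)                           # [7, 4, 3]
```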
String Slicing
Strings can be treated as a list of letters, and sliced with the exact same methods as lists:
>>> my_DNA = "ATGCGTA"
>>> my_DNA[1]
'T'
>>> my_DNA[1:4]
'TGC'
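One difference from lists is worth knowing: strings can be indexed and sliced like lists, but they cannot be changed in place. This sketch (assuming Python 3) shows the `TypeError` you get from item assignment and one way to build a modified copy with slicing and `+` instead.

```python
my_DNA = "ATGCGTA"
print(my_DNA[1])       # T
print(my_DNA[1:4])     # TGC

try:
    my_DNA[1] = "X"    # strings cannot be changed in place, unlike lists
except TypeError as error:
    print("TypeError:", error)

fixed = my_DNA[:1] + "X" + my_DNA[2:]   # build a new string instead
print(fixed)           # AXGCGTA
```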
Split a string into a List
Sometimes it is helpful to turn a string into a list of words or numbers. The split() method does this.
By default, it splits on whitespace, but any character (or string) given in the parentheses can be used as the delimiter.
This is useful when working with tab-delimited or comma-delimited (CSV) data.
>>> names = "melanogaster,simulans,yakuba,ananassae"
>>> species = names.split(",")
>>> print(names[1] + ' ' + species[2])
e yakuba
#names[1] is a letter of the string; species[2] is an element of the list
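A sketch (assuming Python 3) of the two ways of calling `split()`: with `"\t"` for one line of tab-delimited data, and with no argument, which splits on any run of whitespace and ignores leading and trailing spaces. The column names are hypothetical.

```python
line = "gene_id\tchrom\tstart\tend"        # hypothetical tab-delimited header line
fields = line.split("\t")                  # split on tab characters
print(fields)                              # ['gene_id', 'chrom', 'start', 'end']

words = "  my Dog has   Fleas ".split()    # no argument: split on any whitespace
print(words)                               # ['my', 'Dog', 'has', 'Fleas']
```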
The list() function
The list() function splits a string into a list of characters:
>>> hi = "Hello world"
>>> list(hi)
['H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']
Join
join() turns a list of strings into a single string. You can add a spacer such as a comma or space character. It has a backwards syntax, where the spacer is the thing being acted upon by the method:
>>> my_list = ['G', 'A', 'hat', 'cat']
>>> spacer = ':'
>>> newstring = spacer.join(my_list)
>>> newstring
'G:A:hat:cat'
>>> '#'.join(my_list)
'G#A#hat#cat'
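`split()`, `list()`, and `join()` work well together. This sketch (assuming Python 3) converts comma-separated names into a tab-delimited line, glues single letters back into a sequence with an empty spacer, and shows that `join()` undoes `split()` when the same delimiter is used.

```python
names = "melanogaster,simulans,yakuba,ananassae"
species = names.split(",")          # string -> list
tab_line = "\t".join(species)       # list -> tab-delimited string
print(tab_line)

letters = list("GATC")              # ['G', 'A', 'T', 'C']
print("".join(letters))             # GATC: an empty spacer glues the pieces together
print(",".join(species) == names)   # True: join undoes split with the same delimiter
```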
String Slicing: Exercises (do these yourself in the Python shell)
>>> dna = 'CGGTTAATAGGGACTCTC'
>>> dna[0]
>>> dna[0:3]
>>> dna[-1]
>>> dna[-1:-3]
>>> dna[-3:-1]
>>> dna[0:5]
>>> dna[0:5:2]
>>> dna[0:5][::-1]
>>> dna[0:5][::-2]
>>> dna
Math
Python can do simple math like a calculator.
Type the following expressions into an interactive Python session (or the IDE editor), hit the Enter/Return key (or Run button), and observe the results:
2 + 2
6 - 3
8 / 3.0
9 * 3
6 ** 2
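The same expressions as a script, with their results in the comments. This sketch assumes Python 3, where `/` always returns a float; the `3.0` on the slide matters in Python 2, where `8 / 3` would give the integer 2. The `//` line is an extra illustration of floor division.

```python
print(2 + 2)      # 4
print(6 - 3)      # 3  (type the minus key, not a typographic dash)
print(8 / 3.0)    # 2.6666666666666665
print(8 / 3)      # the same result in Python 3: / always returns a float
print(8 // 3)     # 2  (// is floor division, which discards the remainder)
print(9 * 3)      # 27
print(6 ** 2)     # 36 (** means "to the power of")
```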
Math module
Python does not load all of its available modules when you start it up.
You use the “import” command to add modules.
Type “import math” to get more advanced mathematics functions. math.sqrt() is a function in the math module. Try this:
>>> import math
>>> math.sqrt(36)
6.0
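A few more functions and constants from the standard `math` module, as a sketch assuming Python 3 (`math.log2` needs Python 3.3 or later).

```python
import math

print(math.sqrt(36))    # 6.0
print(math.pi)          # 3.141592653589793
print(math.log2(8))     # 3.0 (handy for log2 fold changes)
print(math.exp(0))      # 1.0
```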
Simple Navigation
Doing some simple file system navigation in Python is unreasonably difficult (it uses a module called os).
Where am I?
>>> import os
>>> os.getcwd()
'C:\\Python27'
What files are in this directory (folder)?
>>> os.listdir('.')
['at.py', 'hello.py', 'JASPAR-pfm_all.txt', 'JasparClient.py', 'MA0024.1.pfm', 'my_blast.xml', 'ros4.py', 'rosalind_ini5.txt', 'SRR020192.fastq', 'Test_100.fasta']
Change directory:
>>> os.chdir('/Users/stu/Python')
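The three `os` functions above can be tried safely in a throwaway folder. This is a sketch assuming Python 3; the folder comes from the standard-library `tempfile` module and the file name `reads.fastq` is hypothetical. It changes back to the starting directory before the temporary folder is deleted.

```python
import os
import tempfile

print(os.getcwd())                                   # where am I?
with tempfile.TemporaryDirectory() as folder:        # a throwaway folder
    open(os.path.join(folder, "reads.fastq"), "w").close()  # create an empty file
    print(os.listdir(folder))                        # ['reads.fastq']
    start = os.getcwd()
    os.chdir(folder)                                 # change directory
    print(os.listdir("."))                           # '.' means the current directory
    os.chdir(start)                                  # go back before cleanup
```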
NumPy and Arrays
Arrays are like lists, but all of their elements have the same type (usually numbers), and they have dimensions.
NumPy is a Python module that enables array operations. Here is a simple one-dimensional array of integers (just like a list):
>>> import numpy as np
>>> x = np.array([42, 47, 11], int)
>>> x
array([42, 47, 11])
Software Carpentry has a nice introduction to NumPy arrays: http://swcarpentry.github.io/python-novice-inflammation/01-numpy.html
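One practical difference between arrays and lists, sketched below assuming Python 3 and NumPy: arithmetic on an array applies to every element, while the same operator on a list does something else (`* 2` repeats a list).

```python
import numpy as np

x = np.array([42, 47, 11], int)
print(x * 2)              # [84 94 22]  arithmetic applies to every element
print(x.sum(), x.mean())  # 100 and about 33.33
print([42, 47, 11] * 2)   # a list is repeated instead: [42, 47, 11, 42, 47, 11]
```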
2-Dimensional Array
A two-dimensional array is like a list of lists, but each row must have the same number of elements.
>>> x = np.array(((11, 12, 13), (21, 22, 23), (31, 32, 33)))
>>> print(x)
[[11 12 13]
 [21 22 23]
 [31 32 33]]
Note the nested square brackets.
NumPy has no problem with 3, 4, or more dimensions, but they are awkward to represent as text.
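A sketch (assuming Python 3 and NumPy) of how to inspect and index a 2-D array: `shape` and `ndim` describe its dimensions, and `x[row, column]` picks elements, with both indices counted from zero.

```python
import numpy as np

x = np.array(((11, 12, 13), (21, 22, 23), (31, 32, 33)))
print(x.shape)     # (3, 3): 3 rows and 3 columns
print(x.ndim)      # 2
print(x[1, 2])     # 23: row 1, column 2
print(x[:, 0])     # [11 21 31]: every row, column 0
```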
Matrix Math
Matrices are 2-dimensional arrays.
NumPy has linear algebra methods for operations on matrices. Element-wise operations such as addition and subtraction generally require the two arrays to have the same shape; matrix multiplication requires the number of columns in the first matrix to equal the number of rows in the second.
Vector addition
Matrix subtraction
Matrix multiplication
Scalar product (dot product)
Cross product
>>> x = np.array([3, 2])
>>> y = np.array([5, 1])
>>> z = x + y
>>> z
array([8, 3])
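The operations listed above, as a sketch assuming Python 3.5+ (for the `@` matrix-multiplication operator) and NumPy. The 2x2 matrices are arbitrary examples; note that `*` on two arrays multiplies element by element and is not matrix multiplication.

```python
import numpy as np

x = np.array([3, 2])
y = np.array([5, 1])
print(x + y)                # [8 3]  vector addition
print(x - y)                # values -2 and 1: subtraction
print(np.dot(x, y))         # 17: dot product, 3*5 + 2*1

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print(a @ b)                # matrix multiplication: values [[19, 22], [43, 50]]
print(a * b)                # element-wise product: values [[5, 12], [21, 32]]
print(np.cross([1, 0, 0], [0, 1, 0]))  # [0 0 1]: cross product of 3-D vectors
```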
Assignment:
Rosalind Python Village
All 6 problems (should take you 1-2 hours)
Rosalind Python Village: http://rosalind.info/problems/list-view/?location=python-village
Summary
Install Python
Data & Variables
Strings
String slicing
String methods
Lists
List methods & list slicing
Math
Arrays