
Introduction to - PowerPoint Presentation

pamella-moone
Uploaded On 2016-02-20

Presentation Transcript

Slide1

Introduction to Python for Biologists

Part 1

This Lecture

Slide2

Learning Objectives

Install Python
Data & Variables
Strings
String slicing
String methods
Lists
List methods & list slicing
Math
Arrays

Slide3

Why Do Biologists Need to Learn Programming?

http://archive.oreilly.com/pub/a/oreilly//news/perlbio_1001.html
http://www.nature.com/nbt/journal/v31/n10/box/nbt.2721_BX1.html

Biology is becoming a data-driven field
New technology enables scientists to generate large data sets in semi-automated experiments.
Analysis of your own data is challenging
Automation saves time
Many interesting questions remain unanalyzed in huge amounts of publicly available data
Integration of new experimental results with public data is a challenging computational problem
Scientists who can pursue innovative data analysis methods have an advantage over those limited to existing software (or those who require the assistance of other people with programming and data analysis skills)

Slide4

Python*

Is a programming language
Free, open source
Runs on all types of computers
“User friendly and easy to learn”
“clean readable code”
Very popular among bioinformaticians
Good documentation available: https://wiki.python.org/moin/BeginnersGuide/Overview
Powerful “object oriented” features
Many add-on toolkits (“modules”) available for scientific computing, visualization, statistics, etc.

*Python is named after a 1970s British comedy TV show, not a large snake

Slide5

Grad School

Python

Thanks to xkcd: https://xkcd.com/519/

Slide6

Python.org

Slide7

Online Tutorials

You can’t learn an entire programming language from a couple of classroom lectures.
There are many online tutorials for Python that let you learn at your own pace.

We recommend:
Codecademy.com
TryPython.org
LearnPython.org
LearnPythontheHardWay.org/book
Software Carpentry

For Biologists:
Python for Biologists
Rosalind Python Village (learn by solving problems)

Slide8

Reading for this week:

Python for Biologists, chapters 1-3
The anatomy of successful computational biology software. Altschul S, Demchak B, Durbin R, Gentleman R, Krzywinski M, Li H, Nekrutenko A, Robinson J, Rasband W, Taylor J, Trapnell C. Nature Biotechnology 2013 Oct;31(10):894-7. doi:10.1038/nbt.2721

Slide9

Install Python

Assignment: Install Python on your computer
Be sure to include the NumPy and SciPy modules
One easy way to set up a GUI for Python (on Mac and Windows) is to download the free version of Anaconda: http://continuum.io/downloads
Or you can run the command line version on Linux or in the Macintosh Terminal (on a Mac you will need Xcode, a free software developer's toolkit from Apple that is not installed by default in OS X)

Slide10

Anaconda

Your life (in this course) will probably be easier if you install the (free) Anaconda distribution, which includes numerical, scientific, statistical, and graphics modules: http://continuum.io/downloads

Slide11
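Once Python is installed (with or without Anaconda), a quick way to confirm the setup is to ask Python for its version and whether the modules named in the assignment can be found. The following is a minimal sketch that is not part of the original slides. It assumes Python 3, and the helper name `check_modules` is hypothetical. It relies on the standard library's `importlib.util.find_spec`, which looks a module up without running it.

```python
import sys
import importlib.util


def check_modules(names):
    """Return a dict mapping each module name to True if Python can find it."""
    found = {}
    for name in names:
        # find_spec returns None when a top-level module is not installed
        found[name] = importlib.util.find_spec(name) is not None
    return found


print("Python version:", sys.version.split()[0])
for module, ok in check_modules(["numpy", "scipy"]).items():
    print(module, "is installed" if ok else "is MISSING")
```

If either module is reported as missing, installing Anaconda as described on the Install Python and Anaconda slides is one way to get both.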

Programming Concepts

All programming languages are built from the same basic elements:
data
operators
flow control
These concepts are expressed in a specific syntax for each programming language

Slide12
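To make the three elements from the Programming Concepts slide concrete, here is a small illustrative sketch in Python 3 syntax. It is not from the original slides, and the variable names are made up for the example. It uses a loop and an if statement, which this lecture does not cover in detail.

```python
# data: a DNA sequence stored in a variable
dna = "ATGCGTA"

# operators: + joins two strings, == compares two values
doubled = dna + dna

# flow control: a loop repeats steps, an if statement chooses between them
g_positions = []
for position in range(len(dna)):
    if dna[position] == "G":
        g_positions.append(position)

print(doubled)
print(g_positions)
```

The same three ingredients appear in every later example in this lecture, just in different proportions.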

Data types

Basic:
Strings = 'GATCCATGCGAGACCCTTGA'
Numbers = 7, 123.455, 4.2e-14
Boolean = True, False

Every data object has a type (try these examples on your own)
>>> type(1)
>>> type("GATCCT")

Slide13
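As a follow-up to the Data types slide, the sketch below runs `type()` on one example of each basic type listed there. It is an illustration rather than part of the original slides, and it assumes Python 3. The list name `examples` is made up for this example.

```python
# One example value for each basic type named on the Data types slide
examples = ["GATCCT", 7, 123.455, 4.2e-14, True]

type_names = []
for value in examples:
    # type(value).__name__ gives the short name of the type, such as 'str'
    type_names.append(type(value).__name__)
    print(repr(value), "is of type", type(value).__name__)
```

Note that both `123.455` and `4.2e-14` are reported as the same type: scientific notation is just another way of writing a floating point number.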

Variables

A variable is a named container for data (think of it as a box or a shelf that has a name)
In Python, a variable can hold any type of data and does not need to be pre-defined
The data in the variable can be changed at any time (and can change to a different type)
Python variable names must start with a letter or an underscore, and can contain only letters, numbers, and the underscore _ character. Names are case sensitive

Slide14
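The rules on the Variables slide can be tried out directly. The following sketch is an illustration added here, not part of the original slides, and it assumes Python 3. The variable names are hypothetical, and `str.isidentifier()` is a built-in string method that reports whether a piece of text is a legal name.

```python
# A variable can be re-assigned at any time, even to a different type
sample = "ATGCGTA"                  # starts out holding a string
first_type = type(sample).__name__
sample = 467                        # now the same name holds an integer
second_type = type(sample).__name__
print(first_type, "->", second_type)

# Names are case sensitive: these are two different variables
gene = "first value"
Gene = "second value"
print(gene, "/", Gene)

# isidentifier() reports whether a piece of text is a legal variable name
name_checks = {}
for candidate in ["my_DNA", "gene2", "2gene", "my-DNA"]:
    name_checks[candidate] = candidate.isidentifier()
    print(candidate, name_checks[candidate])
```

The last two candidates fail because one starts with a digit and the other contains a hyphen, which Python would read as a minus sign.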

Comments

Comments are bits of text added by the programmer into the code that explain what is going on. They are not executed by the computer.
Python uses the hash symbol # to mark a comment; anything on a line after the # is ignored
Use lots of clear comments in your code: for a good grade, so others can understand your code, and so you can understand your own code from the past (days, weeks, years… ago).

Slide15

Examples of Variables

A value is assigned to a variable by the = sign. The value to the right of the = is put into the variable name on the left.

my_DNA = "ATGCGTA"
gene_length = 467
Dog_Text = "my Dog has Fleas"   #spaces are part of a string
counter = 6
pi_short = 3.14
my_list = ['a', 'b', 'c', 'd']
HBB_human = "MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH"   #this string is one line that wraps on the screen

Slide16

Strings

Strings are text. Must always be in quotes.
Can use single or double quotes, but the opening and closing quotes must match
A string can contain space characters and also newline characters.
Biology data involves a lot of strings: sequences, names (taxonomy, gene names), etc.
A string is usually assigned to a variable
>>> my_string = "gi|45478711|ref|NC_005816.1|Yersinia pestis biovar Microtus"

Slide17

String Methods

In Python, data objects of type 'string' have built-in operators called 'methods'
Methods use a 'dot' syntax as follows:
>>> my_DNA = "ATGCGTA"
>>> my_DNA.count("G")
2
>>> my_DNA.lower()
'atgcgta'

Slide18

String Concatenation

Two strings can be joined with the + operator
c = 'cat'
h = 'hat'
print('cat' + 'hat')
ch = c + h
print(ch)
print(c + ' in the ' + h)

Numbers must be converted to strings using the str() function before using the string concatenation operator
A = 5
print(A + c)   #note the error message
print('We have' + ' ' + str(A) + ' ' + c + 's')
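The error mentioned above stops a script at the offending line, so the sketch below catches it in order to show both the failing and the working version in one run. This is an added illustration, not part of the original slides. It assumes Python 3 and uses `try`/`except`, which this lecture does not otherwise cover.

```python
c = 'cat'
A = 5

# Adding a number to a string raises a TypeError
try:
    print(A + c)
except TypeError as err:
    error_text = str(err)
    print("TypeError:", error_text)

# Converting the number with str() first makes the concatenation work
sentence = 'We have' + ' ' + str(A) + ' ' + c + 's'
print(sentence)
```

The error text names the two types that could not be combined, which is usually the fastest clue to a missing `str()` call.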

Slide19

More String methods

upper() and lower() return a new string with the case changed. You usually need to put this value into a variable; otherwise the result is lost and the original string is unchanged.
>>> my_DNA = "TATGCGTA"
>>> my_DNA.lower()
'tatgcgta'
>>> my_DNA
'TATGCGTA'

len() gives the length of a string
>>> len(my_DNA)
8

Slide20
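To go with the More String methods slide, this sketch shows the difference between calling `lower()` and keeping its result, and then combines `count()` and `len()` into one small calculation. It is an added illustration that assumes Python 3. The GC fraction calculation is an example chosen here and does not come from the slides.

```python
my_DNA = "TATGCGTA"

# lower() returns a new string; my_DNA itself is untouched
my_DNA.lower()
unchanged = my_DNA

# To keep the lower-case version, store the result in a variable
lower_DNA = my_DNA.lower()

# count() and len() together give the fraction of G and C bases
gc_count = my_DNA.count("G") + my_DNA.count("C")
gc_fraction = gc_count / len(my_DNA)

print(unchanged, lower_DNA, gc_fraction)
```

Here the sequence has two G and one C out of eight bases, so the fraction is 3/8.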

Find & Replace

find() is another handy string method. (Note: It only works for exact matches)
>>> my_DNA = "ATGCGTA"
>>> my_DNA.find("GC")
2
#returns the position index of the first occurrence of the search string in the target

replace() finds and replaces letters in a string
>>> my_DNA.replace('T', 'X')
'AXGCGXA'

Slide21
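Two details of the Find & Replace slide are easy to miss: what `find()` returns when there is no match, and the fact that `replace()` leaves the original string alone. The sketch below is an added illustration that assumes Python 3. The T to U substitution is an example chosen here.

```python
my_DNA = "ATGCGTA"

# find() returns -1 (not an error) when the search string is absent
missing = my_DNA.find("GGG")

# find() is case sensitive because it only matches exactly
lower_hit = my_DNA.find("gc")

# replace() swaps every occurrence; T -> U turns DNA letters into RNA letters
my_RNA = my_DNA.replace("T", "U")

# The original string is unchanged, just as with upper() and lower()
print(missing, lower_hit, my_RNA, my_DNA)
```

Because 0 is a valid position, test for "not found" by comparing the result with -1 rather than treating the result as true or false.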

Lists

Lists contain a group of things, in square brackets, separated by commas
List1 = ['a', 'b', 'c', 'd']
List2 = ["XP_008199794", "PF03769", "gi|54037254"]
List_mix = ["fish", "hat", "box", 17, 4935.45, True]

The elements of a list do not all have to be of the same type
Lists are used for many tasks in Python that involve a lot of data.

Slide22

List Elements

The elements in a list are ordered. They can be accessed by their index number in the list.
Python starts counting list elements at zero
The list index is indicated by a number in square brackets following the name of the list
List slicing uses this format: [begin:end:step]
You can do fancy things with list slicing, but the intervals follow rules that take some study: the begin position is included, the end position is not, and negative numbers count back from the end of the list.
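The slicing rules described above are easier to absorb with each case written out. The following sketch is an added illustration that assumes Python 3, and the result variable names are made up for the example.

```python
my_list = ['G', 'A', 'hat', 'cat']

# begin is included, end is excluded: positions 1 and 2 only
middle = my_list[1:3]            # ['A', 'hat']

# Leaving out begin or end means "from the start" or "to the end"
first_two = my_list[:2]          # ['G', 'A']
from_two = my_list[2:]           # ['hat', 'cat']

# Negative numbers count back from the end of the list
last_item = my_list[-1]          # 'cat'
all_but_last_two = my_list[:-2]  # ['G', 'A']

# step picks every Nth element; a step of -1 walks backwards
every_other = my_list[::2]       # ['G', 'hat']
reversed_list = my_list[::-1]    # ['cat', 'hat', 'A', 'G']

print(middle, first_two, from_two, last_item)
print(all_but_last_two, every_other, reversed_list)
```

One way to remember the end-exclusive rule: `my_list[:2]` and `my_list[2:]` together cover the whole list with no overlap.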

>>> my_list = ['G', 'A', 'hat', 'cat']
>>> my_list[1]
'A'
>>> my_list[1:3]
['A', 'hat']
>>> my_list[:-2]
['G', 'A']

Slide23

List Methods

You can assign a value to a specific position in a list:
>>> my_list = ['G', 'A', 'hat', 'cat']
>>> my_list[1] = "X"
>>> my_list
['G', 'X', 'hat', 'cat']

List methods are functions built into the list data type. They use the 'dot' syntax just like string methods.
>>> my_list.count('G')
1

list.append() is a commonly used list method. It adds its argument to the end of a list. It is frequently used to collect results as a program steps through a loop
>>> my_list.append('T')
>>> my_list
['G', 'X', 'hat', 'cat', 'T']

Slide24
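The List Methods slide mentions that `append()` is often used to collect results inside a loop. The sketch below shows that pattern. It is an added illustration that assumes Python 3, and the list of sequences is made up for the example.

```python
sequences = ["ATGCGTA", "GGGCCC", "ATAT"]

# Start with empty lists, then append one result per pass through the loop
lengths = []
g_counts = []
for seq in sequences:
    lengths.append(len(seq))
    g_counts.append(seq.count("G"))

print(lengths)    # [7, 6, 4]
print(g_counts)   # [2, 3, 0]
```

Because results are appended in loop order, position 0 in `lengths` and `g_counts` always refers to position 0 in `sequences`.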

String Slicing

Strings can be treated as a list of letters, and sliced with the same syntax as lists
>>> my_DNA = "ATGCGTA"
>>> my_DNA[1]
'T'
>>> my_DNA[1:4]
'TGC'

Slide25

Split a string into a List

Sometimes it is helpful to turn a string into a list of words or numbers. The split() method does this.
By default, it splits on whitespace, but any character specified in the parentheses can be used as the delimiter.
This is useful when working with tab-delimited or comma-delimited (csv) data.
>>> names = "melanogaster,simulans,yakuba,ananassae"
>>> species = names.split(",")
>>> print(names[1] + ' ' + species[2])   #names is still a string, so names[1] is a single letter
e yakuba

Slide26
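For the tab-delimited case mentioned on the Split a string into a List slide, a record can be taken apart as sketched below. This is an added illustration that assumes Python 3. The record contents and the variable names are invented for the example.

```python
# A made-up tab-delimited record: gene name, chromosome, length
line = "geneA\tchr1\t467"

fields = line.split("\t")
gene_name = fields[0]
chromosome = fields[1]
# split() always returns strings, so convert numbers explicitly
gene_length = int(fields[2])

# With no argument, split() breaks on any run of whitespace
words = "  my Dog   has Fleas ".split()

print(gene_name, chromosome, gene_length + 1)
print(words)
```

Forgetting the `int()` conversion is a common slip: `fields[2] + 1` would raise the same TypeError seen on the String Concatenation slide.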

The list() function

The list() function splits a string into a list of characters
>>> hi = "Hello world"
>>> list(hi)
['H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']

Slide27

Join

join() turns a list of strings into a single string. You can add a spacer such as a comma or space character. It has a backwards syntax, where the spacer is the thing being acted upon by the method:
>>> my_list = ['G', 'A', 'hat', 'cat']
>>> spacer = ':'
>>> newstring = spacer.join(my_list)
>>> newstring
'G:A:hat:cat'
>>> '#'.join(my_list)
'G#A#hat#cat'

Slide28
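As a companion to the Join slide, the sketch below shows that `join()` undoes `split()`, and that a list of numbers has to be converted to strings before it can be joined. It is an added illustration that assumes Python 3, and the numbers are arbitrary example values.

```python
species = ["melanogaster", "simulans", "yakuba", "ananassae"]

# join() is the reverse of split(): list of strings -> one string
tab_line = "\t".join(species)
round_trip = tab_line.split("\t")

# join() only accepts strings, so numbers must be converted first
lengths = [12, 8, 6, 9]
length_strings = []
for n in lengths:
    length_strings.append(str(n))
length_text = ",".join(length_strings)

print(round_trip == species)
print(length_text)
```

Passing `lengths` straight to `join()` would raise a TypeError, because the method expects every element to be a string.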

String Slicing: Exercises (do these yourself in Python shell)

>>> dna = 'CGGTTAATAGGGACTCTC'
>>> dna[0]
>>> dna[0:3]
>>> dna[-1]
>>> dna[-1:-3]
>>> dna[-3:-1]
>>> dna[0:5]
>>> dna[0:5:2]
>>> dna[0:5][::-1]
>>> dna[0:5][::-2]
>>> dna

Slide29

Math

Python can do simple math like a calculator.
Type the following expressions into an interactive Python session (or the IDE editor), hit the enter/return key (or Run button) and observe the results:
2 + 2
6 - 3
8 / 3.0
9 * 3
6 ** 2

Slide30
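The expressions on the Math slide can also be run as a script. The sketch below is an added illustration that assumes Python 3, where `/` always keeps the fractional part. In Python 2, dividing two integers with `/` drops the fraction, which is why the slide writes `3.0` rather than `3`. The `//` and `%` operators are additions that are not on the slide.

```python
print(2 + 2)     # addition
print(6 - 3)     # subtraction
print(9 * 3)     # multiplication
print(6 ** 2)    # exponent: 6 squared

# Division: / keeps the fraction, // drops it, % gives the remainder
quotient = 8 / 3.0
whole = 8 // 3
remainder = 8 % 3
print(quotient, whole, remainder)
```

Together `//` and `%` split a division into its whole part and what is left over: 8 = 3 * 2 + 2.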

Math module

Python does not load all of its available functions when you start it up
You use the "import" command to add modules.
Type "import math" to get more advanced mathematics functions.
math.sqrt() is a function in the math module. Try this:
>>> import math
>>> math.sqrt(36)
6.0

Slide31
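Beyond `math.sqrt()`, the math module introduced on the Math module slide holds other functions and constants that follow the same dot syntax. The sketch below is an added illustration that assumes Python 3. The particular functions shown are a selection made here, not one from the slides.

```python
import math

root = math.sqrt(36)            # square root, returned as a float
log_value = math.log10(1000)    # base-10 logarithm
area = math.pi * 2 ** 2         # math.pi is a constant, so no parentheses

print(root, round(log_value, 6), round(area, 2))
```

Without the `import math` line, any of these names raises a NameError, because Python does not know about the module until it is imported.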

Simple Navigation

Doing some simple file system navigation in Python is unreasonably difficult (uses a module called os)

Where am I?
>>> import os
>>> os.getcwd()
'C:\\Python27'

What files are in this directory (folder)?
>>> os.listdir('.')
['at.py', 'hello.py', 'JASPAR-pfm_all.txt', 'JasparClient.py', 'MA0024.1.pfm', 'my_blast.xml', 'ros4.py', 'rosalind_ini5.txt', 'SRR020192.fastq', 'Test_100.fasta']

Change directory
>>> os.chdir('/Users/stu/Python')

Slide32
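The three os calls from the Simple Navigation slide can be tried safely in a scratch folder, so that nothing depends on the files on your own computer. The sketch below is an added illustration that assumes Python 3. It also uses `tempfile`, `shutil`, and `os.path.join`, which are standard library tools not mentioned on the slide, and the file name is borrowed from the slide's listing.

```python
import os
import shutil
import tempfile

start_dir = os.getcwd()                 # remember where we began

# Make a scratch folder so the example does not depend on your own files
scratch = tempfile.mkdtemp()
os.chdir(scratch)

# Create an empty file, then list the folder contents
open("Test_100.fasta", "w").close()
files_here = os.listdir(".")

# os.path.join builds a path with the right separator for the current system
fasta_path = os.path.join(scratch, "Test_100.fasta")
found = os.path.exists(fasta_path)

os.chdir(start_dir)                     # go back to the starting folder
shutil.rmtree(scratch)                  # remove the scratch folder again
print(files_here, found)
```

Building paths with `os.path.join` avoids typing `\\` or `/` by hand, which is what makes the Windows and Mac paths on the slide look so different.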

NumPy and Arrays

Arrays are like lists, but all of their elements have the same type (usually numbers), and they have dimensions.
NumPy is a Python module that enables array operations.
Here is a simple one dimensional array of integers (just like a list):
>>> import numpy as np
>>> x = np.array([42,47,11], int)
>>> x
array([42, 47, 11])

Software Carpentry has a nice introduction to NumPy arrays: http://swcarpentry.github.io/python-novice-inflammation/01-numpy.html

Slide33

2-Dimensional Array

A two dimensional array is like a list of lists, but each row must have the same number of elements.
>>> x = np.array( ((11,12,13), (21,22,23), (31,32,33)) )
>>> print(x)
[[11 12 13]
 [21 22 23]
 [31 32 33]]

Note the nested square brackets
NumPy has no problem with 3, 4, or more dimensions, but they are awkward to represent as text.

Slide34

Matrix Math

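Matrices are 2-dimensional arrays.
NumPy has linear algebra methods for operations on matrices. Element-by-element operations such as addition and subtraction require that the two arrays have the same shape; matrix multiplication requires that the number of columns in the first matrix equals the number of rows in the second.
Vector addition
Matrix subtraction
Matrix multiplication
Scalar product (dot product)
Cross product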
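The operations listed above can each be written in a line of NumPy. The sketch below is an added illustration, not part of the original slides. It assumes Python 3 with NumPy installed, and the small example arrays are made up. `np.dot` and `np.cross` are standard NumPy functions.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

difference = a - b            # element-by-element matrix subtraction
product = np.dot(a, b)        # matrix multiplication (rows times columns)

x = np.array([3, 2])
y = np.array([5, 1])
scalar = np.dot(x, y)         # scalar (dot) product: 3*5 + 2*1

u = np.array([1, 0, 0])
v = np.array([0, 1, 0])
cross = np.cross(u, v)        # cross product of two 3-element vectors

print(difference)
print(product)
print(scalar, cross)
```

Note that `a * b` would multiply element by element, which is not the same as the matrix multiplication done by `np.dot(a, b)`.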

>>> x = np.array([3,2])
>>> y = np.array([5,1])
>>> z = x + y
>>> z
array([8, 3])

Slide35

Assignment:

Rosalind Python Village
All 6 problems (should take you 1-2 hours)
Rosalind Python Village: http://rosalind.info/problems/list-view/?location=python-village

Slide36

Summary

Install Python

Data & Variables

Strings

String slicing

String methods

Lists

List methods & list slicing

Math

Arrays