Bienvenido Vélez, UPR Mayaguez

Uploaded On 2022-08-04

Presentation Transcript

Slide1

Bienvenido Vélez, UPR Mayaguez

Using Molecular Biology to Teach Computer Science

These materials were developed with funding from the US National Institutes of Health grant #2T36 GM008789 to the Pittsburgh Supercomputing Center

BING 6004: Intro to Computational BioEngineering
Spring 2016

Lecture 3: Container Objects

Slide2

Essential Computing for Bioinformatics

Slide3

Outline
- Top-Down Design
- Lists and Other Sequences
- Dictionaries and Sequence Translation
- Finding ORFs in Sequences

Slide4

Finding Patterns Within Sequences

from string import *

def searchPattern(dna, pattern):
    'print all start positions of a pattern string inside a target string'
    site = find(dna, pattern)
    while site != -1:
        print 'pattern %s found at position %d' % (pattern, site)
        site = find(dna, pattern, site + 1)

Example from: Pasteur Institute Bioinformatics Using Python

>>> searchPattern("acgctaggct", "gc")
pattern gc found at position 2
pattern gc found at position 7
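The slide's code is Python 2: the `print` statement and the module-level `string.find` function were both removed in Python 3. A minimal Python 3 sketch of the same loop uses the `str.find` method and collects the positions in a list instead of printing them; the name `findPatternPositions` is hypothetical.

```python
def findPatternPositions(dna, pattern):
    """Return every start position of pattern inside dna (overlaps included)."""
    positions = []
    site = dna.find(pattern)                 # -1 when there is no match
    while site != -1:
        positions.append(site)
        site = dna.find(pattern, site + 1)   # resume one past the last hit
    return positions

print(findPatternPositions("acgctaggct", "gc"))   # [2, 7]
```

Restarting at `site + 1` rather than `site + len(pattern)` is what lets overlapping matches through, as in the original loop.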

Slide5

Homework
Extend searchPattern to handle unknown residues

Slide6

Lecture 2 Homework: Finding Patterns Within Sequences

from string import *

def searchPattern(dna, pattern):
    'print all start positions of a pattern string inside a target string'
    site = findDNAPattern(dna, pattern)
    while site != -1:
        print 'pattern %s found at position %d' % (pattern, site)
        site = findDNAPattern(dna, pattern, site + 1)

Example from Pasteur Institute Bioinformatics Using Python

>>> searchPattern('acgctaggct', 'gc')

Slide7

Lecture 2 Homework: One Approach

def findDNAPattern(dna, pattern, startPosition, endPosition):
    'Finds the index of the first occurrence of DNA pattern within DNA sequence'
    dna = dna.lower()          # Force sequence and pattern to lower case
    pattern = pattern.lower()
    for i in xrange(startPosition, endPosition):
        # Attempt to match pattern starting at position i
        if matchDNAPattern(dna[i:], pattern):
            return i
    return -1

Write your own find function:
Top-Down Design: From BIG functions to small helper functions

Slide8

Lecture 2 Homework: One Approach

def matchDNAPattern(sequence, pattern):
    'Determines if DNA pattern is a prefix of DNA sequence'
    i = 0
    while (i < len(pattern)) and (i < len(sequence)):
        if not matchDNANucleotides(sequence[i], pattern[i]):
            return False
        i = i + 1
    return i == len(pattern)

Write your own find function:
Top-Down Design: From BIG functions to small helper functions

Slide9

Lecture 2 Homework: One Approach

def matchDNANucleotides(base1, base2):
    'Returns True if nucleotide bases are equal or one of them is unknown'
    return (base1 == 'x' or base2 == 'x' or
            (isDNANucleotide(base1) and (base1 == base2)))

Write your own find function:
Top-Down Design: From BIG functions to small helper functions

Slide10

Lecture 2 Homework: One Approach

def findDNAPattern(dna, pattern, startPosition=0, endPosition=None):
    'Finds the index of the first occurrence of DNA pattern within DNA sequence'
    if endPosition is None:
        endPosition = len(dna)
    dna = dna.lower()          # Force sequence and pattern to lower case
    pattern = pattern.lower()
    for i in xrange(startPosition, endPosition):
        # Attempt to match pattern starting at position i
        if matchDNAPattern(dna[i:], pattern):
            return i
    return -1

Using default parameters: startPosition defaults to 0, and endPosition defaults to None, which the function replaces with len(dna), so callers such as searchPattern can omit either argument.
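The homework pieces from the previous slides can be assembled into one runnable module. This is a minimal Python 3 sketch, not the course's reference solution: `range` stands in for Python 2's `xrange`, `isDNANucleotide` gets an assumed body (membership in 'acgt'; the utilities slide shows only its signature), and `searchPattern` returns the positions as a list instead of printing them.

```python
DNANucleotides = 'acgt'

def isDNANucleotide(nucleotide):
    # Assumed body: a valid base is a single character from DNANucleotides.
    return len(nucleotide) == 1 and nucleotide in DNANucleotides

def matchDNANucleotides(base1, base2):
    'Returns True if bases are equal or one of them is unknown (x)'
    return (base1 == 'x' or base2 == 'x' or
            (isDNANucleotide(base1) and base1 == base2))

def matchDNAPattern(sequence, pattern):
    'Determines if DNA pattern is a prefix of DNA sequence'
    i = 0
    while i < len(pattern) and i < len(sequence):
        if not matchDNANucleotides(sequence[i], pattern[i]):
            return False
        i += 1
    return i == len(pattern)   # False if the sequence ran out first

def findDNAPattern(dna, pattern, startPosition=0, endPosition=None):
    'Finds the index of the first occurrence of DNA pattern within DNA sequence'
    if endPosition is None:
        endPosition = len(dna)
    dna = dna.lower()          # force sequence and pattern to lower case
    pattern = pattern.lower()
    for i in range(startPosition, endPosition):
        if matchDNAPattern(dna[i:], pattern):
            return i
    return -1

def searchPattern(dna, pattern):
    'Returns all start positions of pattern in dna (list instead of printing)'
    positions = []
    site = findDNAPattern(dna, pattern)
    while site != -1:
        positions.append(site)
        site = findDNAPattern(dna, pattern, site + 1)
    return positions

print(searchPattern('acgctaggct', 'gc'))   # [2, 7]
print(searchPattern('acgctaggct', 'gx'))   # [2, 6, 7]: 'x' matches any base
```

Each function only relies on the one below it, which is the top-down decomposition the slides describe: the search loop never needs to know how unknown bases are handled.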

Slide11

Top-Down Design: A Recursive Process
- Start with a high-level problem
- Design a high-level function, assuming the existence of the ideal lower-level functions it needs
- Recursively design each lower-level function top-down

Slide12

List Values

Homogeneous lists:
[10, 20, 30, 40]
['spam', 'bungee', 'swallow']

Lists can be heterogeneous and nested:
['hello', 2.0, 5, [10, 20]]

The empty list:
[]

Slide13

Generating Integer Lists

>>> range(1, 5)
[1, 2, 3, 4]
>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> range(1, 10, 2)
[1, 3, 5, 7, 9]

In general, range(first, last + 1, step) produces the integers from first up to and including last, counting by step: the stop argument itself is never included.
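The slide's outputs reflect Python 2, where `range` returns a list. In Python 3 it returns a lazy `range` object, so the same values appear only after `list()`. The hypothetical `inclusiveRange` helper wraps the slide's `range(first, last + 1, step)` rule and assumes a positive step.

```python
# In Python 3, range() returns a lazy range object; list() materializes it.
print(list(range(1, 5)))       # [1, 2, 3, 4]
print(list(range(10)))         # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(range(1, 10, 2)))   # [1, 3, 5, 7, 9]

def inclusiveRange(first, last, step=1):
    # Hypothetical helper: integers from first up to and including last
    # (the "last + 1" rule only holds for a positive step).
    return list(range(first, last + 1, step))

print(inclusiveRange(1, 9, 2))   # [1, 3, 5, 7, 9]
```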

Slide14

Accessing List Elements

>>> words = ['hello', 'my', 'friend']
>>> words[1]                 # single element
'my'
>>> words[1:3]               # slices
['my', 'friend']
>>> words[-1]                # negative index
'friend'
>>> 'friend' in words        # testing list membership
True
>>> words[0] = 'goodbye'     # lists are mutable
>>> print words
['goodbye', 'my', 'friend']

Slide15

More List Slices

The slicing operator always returns a NEW list.

>>> numbers = range(1, 5)
>>> numbers[1:]
[2, 3, 4]
>>> numbers[:3]
[1, 2, 3]
>>> numbers[:]
[1, 2, 3, 4]
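To check that a slice really is a new list, a short Python 3 sketch can modify a full-slice copy and compare it with plain assignment, which only binds a second name to the same list. The variable names are illustrative.

```python
numbers = list(range(1, 5))   # [1, 2, 3, 4]; list() is needed in Python 3
numbersCopy = numbers[:]      # a full slice builds a new list object
numbersCopy[0] = 99           # changing the copy...
print(numbers)                # ...leaves the original alone: [1, 2, 3, 4]
print(numbersCopy)            # [99, 2, 3, 4]

alias = numbers               # plain assignment: a second name, same list
alias.append(5)
print(numbers)                # [1, 2, 3, 4, 5]
print(alias is numbers, numbersCopy is numbers)   # True False
```

The copy is shallow: if the list holds other lists, the original and the copy still share those inner lists.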

Slide16

Modifying Slices of Lists

>>> letters = ['a', 'b', 'c', 'd', 'e', 'f']
>>> letters[1:3] = ['x', 'y']        # replacing slices
>>> print letters
['a', 'x', 'y', 'd', 'e', 'f']
>>> letters[1:3] = []                # deleting slices
>>> print letters
['a', 'd', 'e', 'f']
>>> letters = ['a', 'd', 'f']
>>> letters[1:1] = ['b', 'c']        # inserting slices
>>> print letters
['a', 'b', 'c', 'd', 'f']
>>> letters[4:4] = ['e']
>>> print letters
['a', 'b', 'c', 'd', 'e', 'f']

Slide17

Traversing Lists (2 Ways)

codons = ['cac', 'caa', 'ggg']

for codon in codons:
    print codon

i = 0
while i < len(codons):
    codon = codons[i]
    print codon
    i = i + 1

Which one do you prefer? Why?
Why does Python provide both for and while?

Slide18

def stringToList(theString)
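Only the signature of `stringToList` survives in this transcript. Assuming its contract is to return a list of the string's characters, a loop-based Python 3 sketch in the lecture's style looks like this; the built-in `list(theString)` gives the same result.

```python
def stringToList(theString):
    # Assumed contract: return a new list holding each character of theString.
    result = []
    for character in theString:
        result.append(character)
    return result

print(stringToList('acgt'))   # ['a', 'c', 'g', 't']
```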

Slide19

Complementing Sequences: Utilities

DNANucleotides = 'acgt'
DNAComplements = 'tgca'

def isDNANucleotide(nucleotide)

Slide20

Complementing Sequences

def getComplementDNANucleotide(n)

Slide21

Complementing a List of Sequences

def getComplementDNASequences(sequences)
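The utilities slide pairs each base in `DNANucleotides` with the base at the same index in `DNAComplements`. Only function signatures survive for these slides, so the following Python 3 sketch fills in assumed bodies: `isDNANucleotide` checks membership in 'acgt', `getComplementDNANucleotide` looks up the partner base by index, and `getComplementDNASequences` complements each base of each sequence without reversing it (a reverse complement would need an extra reversal step).

```python
DNANucleotides = 'acgt'
DNAComplements = 'tgca'   # DNAComplements[i] pairs with DNANucleotides[i]

def isDNANucleotide(nucleotide):
    # Assumed body: a valid base is a single character from DNANucleotides.
    return len(nucleotide) == 1 and nucleotide in DNANucleotides

def getComplementDNANucleotide(n):
    # Find n's index in DNANucleotides and return the base at the same
    # index in DNAComplements (a <-> t, c <-> g).
    if not isDNANucleotide(n):
        raise ValueError('getComplementDNANucleotide: invalid DNA base %r' % n)
    return DNAComplements[DNANucleotides.index(n)]

def getComplementDNASequences(sequences):
    # Assumed contract: return a new list with every sequence complemented
    # base by base (not reversed).
    result = []
    for sequence in sequences:
        result.append(''.join(getComplementDNANucleotide(b) for b in sequence))
    return result

print(getComplementDNASequences(['cac', 'caa', 'ggg']))   # ['gtg', 'gtt', 'ccc']
```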

Slide22

Python Sequence Types

Type          Description                  Elements                       Mutable
StringType    Character string             Characters only                no
UnicodeType   Unicode character string     Unicode characters only        no
ListType      List                         Arbitrary objects              yes
TupleType     Immutable list               Arbitrary objects              no
XRangeType    Returned by xrange()         Integers                       no
BufferType    Buffer returned by buffer()  Arbitrary objects of one type  yes/no
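The Mutable column can be checked directly. The table uses Python 2 type names; in Python 3 the corresponding types are `str`, `list`, `tuple` and `range`, and `buffer` was removed. This sketch uses a hypothetical `isMutable` helper that attempts an item assignment and assumes a non-empty sequence.

```python
def isMutable(sequence):
    # Hypothetical probe: try to overwrite the first element with itself.
    # Assumes a non-empty sequence (an empty one would raise IndexError).
    try:
        sequence[0] = sequence[0]
        return True
    except TypeError:        # str, tuple and range reject item assignment
        return False

print(isMutable('acgt'))         # False
print(isMutable(['a', 'c']))     # True
print(isMutable(('a', 'c')))     # False
print(isMutable(range(3)))       # False
```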

Slide23

Operations on Sequences

Operator/Function   Action                      Action on Numbers
[ ], ( ), ' '       creation
s + t               concatenation               addition
s * n               repetition n times          multiplication
s[i]                indexation
s[i:k]              slice
x in s              membership
x not in s          absence
for a in s          traversal
len(s)              length
min(s)              return smallest element
max(s)              return greatest element
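Because strings, lists and tuples are all sequences, the operators in the table apply to each of them. A short Python 3 sketch with illustrative values:

```python
dna = 'acgt'
codons = ['cac', 'caa', 'ggg']
frame = ('atg', 'taa')

# Concatenation and repetition build new sequences of the same type.
print(dna + 'tt')            # acgttt
print(codons * 2)            # ['cac', 'caa', 'ggg', 'cac', 'caa', 'ggg']
print(frame + ('tga',))      # ('atg', 'taa', 'tga')

# Indexing, slicing, membership and length work the same way everywhere.
print(dna[1], dna[1:3])      # c cg
print('caa' in codons)       # True
print('atg' not in frame)    # False
print(len(codons))           # 3

# min and max compare elements (alphabetical order for strings).
print(min(dna), max(dna))    # a t
print(min(codons))           # caa
```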

Slide24

Exercises

Design and implement Python functions to satisfy the following contracts:
- Return the list of codons in a DNA sequence for a given reading frame
- Return the list of restriction sites for an enzyme in a DNA sequence
- Return the list of restriction sites for a list of enzymes in a DNA sequence
- Find all the ORFs of length >= n in a sequence
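For the first exercise, one possible Python 3 sketch assumes reading frames are numbered 0, 1 and 2 by the offset of the first codon and that a trailing partial codon is dropped; `getCodons` is a hypothetical name, not part of the course materials.

```python
def getCodons(dna, frame=0):
    # Assumptions: frame is the 0-based offset (0, 1 or 2) of the first
    # codon, and an incomplete codon at the end is dropped.
    codons = []
    for i in range(frame, len(dna) - 2, 3):   # i + 3 never passes the end
        codons.append(dna[i:i + 3])
    return codons

print(getCodons('atgagtgaacg'))      # ['atg', 'agt', 'gaa']
print(getCodons('atgagtgaacg', 1))   # ['tga', 'gtg', 'aac']
```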

Slide25

Dictionaries

Dictionaries are mutable, unordered collections that may contain objects of different types. Each object is accessed using a key.

Slide26

Molecular Masses As Python Dictionary

# Molecular mass of each DNA nucleotide in g/mol
MolecularMass = {'a': 491.2, 'c': 467.2, 'g': 507.2, 't': 482.2}

def molecularMass(s):
    'Returns the molecular mass of sequence s'
    if isDNASequence(s):
        totalMass = 0
        for base in s:
            totalMass = totalMass + MolecularMass[base]
        return totalMass
    else:
        raise Exception('molecularMass: Invalid DNA base')
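`isDNASequence` is called on the slide but not defined in this lecture. A self-contained Python 3 sketch supplies a hypothetical version (every character must be a key of `MolecularMass`) and raises `ValueError` rather than a bare `Exception`; the mass values are the slide's.

```python
# Molecular mass of each DNA nucleotide in g/mol (values from the slide)
MolecularMass = {'a': 491.2, 'c': 467.2, 'g': 507.2, 't': 482.2}

def isDNASequence(s):
    # Hypothetical helper: every character must be a key of MolecularMass.
    return all(base in MolecularMass for base in s)

def molecularMass(s):
    'Returns the molecular mass of sequence s'
    if not isDNASequence(s):
        raise ValueError('molecularMass: Invalid DNA base')
    totalMass = 0
    for base in s:
        totalMass = totalMass + MolecularMass[base]   # dictionary lookup
    return totalMass

print(round(molecularMass('acgt'), 1))   # 1947.8
```

The accumulation loop could equally be written as `sum(MolecularMass[base] for base in s)`.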

Slide27

Genetic Code As Python Dictionary

GeneticCode = {
    'ttt': 'F', 'tct': 'S', 'tat': 'Y', 'tgt': 'C',
    'ttc': 'F', 'tcc': 'S', 'tac': 'Y', 'tgc': 'C',
    'tta': 'L', 'tca': 'S', 'taa': '*', 'tga': '*',
    'ttg': 'L', 'tcg': 'S', 'tag': '*', 'tgg': 'W',
    'ctt': 'L', 'cct': 'P', 'cat': 'H', 'cgt': 'R',
    'ctc': 'L', 'ccc': 'P', 'cac': 'H', 'cgc': 'R',
    'cta': 'L', 'cca': 'P', 'caa': 'Q', 'cga': 'R',
    'ctg': 'L', 'ccg': 'P', 'cag': 'Q', 'cgg': 'R',
    'att': 'I', 'act': 'T', 'aat': 'N', 'agt': 'S',
    'atc': 'I', 'acc': 'T', 'aac': 'N', 'agc': 'S',
    'ata': 'I', 'aca': 'T', 'aaa': 'K', 'aga': 'R',
    'atg': 'M', 'acg': 'T', 'aag': 'K', 'agg': 'R',
    'gtt': 'V', 'gct': 'A', 'gat': 'D', 'ggt': 'G',
    'gtc': 'V', 'gcc': 'A', 'gac': 'D', 'ggc': 'G',
    'gta': 'V', 'gca': 'A', 'gaa': 'E', 'gga': 'G',
    'gtg': 'V', 'gcg': 'A', 'gag': 'E', 'ggg': 'G'
}

Slide28

A Test DNA Sequence

cds = '''atgagtgaacgtctgagcattaccccgctggggccgtatatcggcgcacaaa

Slide29

CDS Sequence -> Protein Sequence

def translateDNASequence(dna)
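The slide shows only the signature of `translateDNASequence`. A minimal Python 3 sketch, assuming translation of reading frame 0, builds the same 64-codon table as `GeneticCode` from a compact string (bases ordered t, c, a, g, as in the standard codon table) so the block is self-contained. Keeping stop codons as '*' rather than ending translation, and mapping unknown codons to 'X', are both assumptions.

```python
from itertools import product

BASES = 'tcag'
AMINO_ACIDS = ('FFLLSSSSYY**CC*W'    # first base t
               'LLLLPPPPHHQQRRRR'    # first base c
               'IIIMTTTTNNKKSSRR'    # first base a
               'VVVVAAAADDEEGGGG')   # first base g

# product() varies the last base fastest, matching the string's order,
# so this yields the same 64 entries as the GeneticCode slide.
GeneticCode = {''.join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translateDNASequence(dna):
    # Translate reading frame 0; a trailing partial codon is ignored and any
    # codon missing from the table (e.g. containing 'n') becomes 'X'.
    dna = dna.lower()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(GeneticCode.get(dna[i:i + 3], 'X'))
    return ''.join(protein)

print(translateDNASequence('atgagtgaacgtctgagcattacc'))   # MSERLSIT
```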

Slide30

Dictionary Methods and Operations

Method or Operation          Action
d[key]                       Get the value of the entry with key key in d
d[key] = val                 Set the value of the entry with key key to val
del d[key]                   Delete the entry with key key
d.clear()                    Remove all entries
len(d)                       Number of items
d.copy()                     Make a shallow copy
d.has_key(key)               Return True if key exists, False otherwise
d.keys()                     Give a list of all keys
d.values()                   Give a list of all values
d.items()                    Return a list of all items as (key, value) tuples
d.update(new)                Add all entries of dictionary new to d
d.get(key[, otherwise])      Return the value of the entry with key key if it exists; otherwise return otherwise (None by default)
d.setdefault(key[, val])     Like d.get(key, val), but if key does not exist, also set d[key] to val
d.popitem()                  Remove an arbitrary item and return it as a (key, value) tuple
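Several of these methods combine naturally when counting codons. A Python 3 sketch with illustrative data; note that `has_key` was removed in Python 3, where `key in d` is the membership test.

```python
codons = ['atg', 'gaa', 'atg', 'taa']

# Count occurrences: get() supplies 0 the first time a codon is seen.
counts = {}
for codon in codons:
    counts[codon] = counts.get(codon, 0) + 1
print(counts)               # {'atg': 2, 'gaa': 1, 'taa': 1}

# Group positions: setdefault() inserts an empty list on first use
# and returns the list stored under that key.
positions = {}
for i, codon in enumerate(codons):
    positions.setdefault(codon, []).append(i)
print(positions['atg'])     # [0, 2]

# Membership test (Python 3 replacement for d.has_key(key)).
print('tga' in counts)      # False

# items() yields (key, value) pairs; sorted() gives a predictable order.
for codon, n in sorted(counts.items()):
    print(codon, n)
```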

Slide31

Finding ORFs

def findDNAORFPos(sequence, minLen, startCodon, stopCodon, startPos, endPos)

Slide32

Extracting the ORF

def extractDNAORF(sequence, minLen, startCodon, stopCodon, startPos, endPos)
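Only the signatures of `findDNAORFPos` and `extractDNAORF` appear in this transcript. The Python 3 sketch below assumes a contract: an ORF starts with `startCodon`, ends with the first in-frame `stopCodon` (included), must span at least `minLen` nucleotides, and is searched for within `sequence[startPos:endPos]`. Positions come back as a `(start, end)` pair with `end` exclusive, or `(-1, -1)` if nothing is found. The course's own definitions may differ.

```python
def findDNAORFPos(sequence, minLen, startCodon, stopCodon, startPos, endPos):
    # Assumed contract: (start, end) of the first ORF of at least minLen
    # nucleotides, end exclusive and stop codon included; (-1, -1) if none.
    sequence = sequence.lower()
    startCodon = startCodon.lower()
    stopCodon = stopCodon.lower()
    endPos = min(endPos, len(sequence))
    start = sequence.find(startCodon, startPos, endPos)
    while start != -1:
        # Scan the following codons, staying in the start codon's frame.
        for i in range(start + 3, endPos - 2, 3):
            if sequence[i:i + 3] == stopCodon:
                if i + 3 - start >= minLen:
                    return (start, i + 3)
                break   # first in-frame stop is too close; try the next start
        start = sequence.find(startCodon, start + 1, endPos)
    return (-1, -1)

def extractDNAORF(sequence, minLen, startCodon, stopCodon, startPos, endPos):
    # Return the ORF located by findDNAORFPos as a string, or '' if none.
    start, end = findDNAORFPos(sequence, minLen, startCodon, stopCodon,
                               startPos, endPos)
    if start == -1:
        return ''
    return sequence[start:end]

seq = 'ccatgaaatagcc'
print(findDNAORFPos(seq, 6, 'atg', 'tag', 0, len(seq)))   # (2, 11)
print(extractDNAORF(seq, 6, 'atg', 'tag', 0, len(seq)))   # atgaaatag
```

Splitting the work this way mirrors the top-down style of the earlier slides: the extractor only slices, and all the searching lives in the position finder.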

Slide33

Homework
Design an ORF extractor that returns the list of all ORFs within a sequence, together with their positions

Slide34

Next Time
Handling files containing sequences and alignments