Karsten Hokamp, PhD, TCD Bioinformatics Support Team. TCD, 26/08/2015. Overview: Programming; First Python script/program; Why Python?; Bioinformatics examples; Additional resources; Outlook. What is programming and why bother?
Slide1
A Brief Introduction to Scientific Programming with Python
Karsten Hokamp, PhD
TCD Bioinformatics Support Team
TCD, 26/08/2015
Slide2
Overview
Programming
First Python script/program
Why Python?
Bioinformatics examples
Additional resources
Outlook
Slide3
What is programming and why bother?
Data processing
Automation
Combination of programs for analysis pipelines
More control and flexibility
Better understanding of how programs work
Slide4
Programming Concepts
Turn into a very meticulous problem solver
Break problems into small details
Keep it variable
Give very precise instructions
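As a rough illustration of these concepts, a recipe can be written as code: the problem is broken into small steps, the quantities are kept variable, and every instruction is spelled out. The pancake recipe, its quantities, and the names `scale_recipe` and `instructions` are invented for this sketch; they are not taken from the slides.

```python
# Hypothetical pancake recipe: all quantities are invented for illustration.
INGREDIENTS_PER_SERVING = {"flour_g": 50, "milk_ml": 100, "eggs": 0.5}


def scale_recipe(servings):
    """Return the amount of each ingredient for the requested number of servings."""
    return {name: amount * servings
            for name, amount in INGREDIENTS_PER_SERVING.items()}


def instructions(servings):
    """Return precise, ordered steps with the scaled amounts filled in."""
    amounts = scale_recipe(servings)
    return [
        f"Weigh {amounts['flour_g']} g of flour into a bowl.",
        f"Add {amounts['milk_ml']} ml of milk and {amounts['eggs']} eggs.",
        "Whisk until no lumps remain.",
        "Fry one ladle of batter at a time, 2 minutes per side.",
    ]


# Print the numbered steps for four servings.
for number, step in enumerate(instructions(4), start=1):
    print(number, step)
```

Because the number of servings is a parameter rather than a fixed value, the same code works for any group size.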
Slide5
Programming Concepts
"human" recipe
Slide6
Programming Concepts
"computerised" recipe
Slide7
Mac for Windows users
The main differences:
cmd instead of ctrl (e.g. cmd-C for copying)
right-click mouse: ctrl-click
# character: alt-3
switch between applications: cmd-tab
Spotlight (top right) for finding files/programs
Apple symbol (top left) for logging out
Slide8
IDLE: Integrated DeveLopment Environment
open through Spotlight
Slide9
IDLE: Integrated DeveLopment Environment
Slide10
IDLE: Integrated DeveLopment Environment
Alternatively: open through Finder
Slide11
IDLE: Integrated DeveLopment Environment
interactive Python console
Slide12
IDLE: Integrated DeveLopment Environment
simple Python statement
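The screenshot for this slide is not preserved. A simple statement typed at the console prompt might look like the following; the message text is an assumption.

```python
# Store a string in a variable, then call the built-in print function on it.
message = "Hello, Python!"
print(message)
```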
Slide13
IDLE: Integrated DeveLopment Environment
user input
output
Slide14
IDLE: Integrated DeveLopment Environment
try a few simple numeric operations
user input
output
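The operations shown on the slide are not preserved. The following sketch lists a few numeric operations that could be tried at the console, assuming Python 3; the specific numbers are arbitrary.

```python
# Basic arithmetic, one operation per line.
total = 2 + 3            # addition
product = 4 * 7          # multiplication
quotient = 7 / 2         # true division gives a float in Python 3
floor_quotient = 7 // 2  # floor division discards the fractional part
remainder = 7 % 2        # modulo: what is left over after division
power = 2 ** 10          # exponentiation

print(total, product, quotient, floor_quotient, remainder, power)
```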
Slide15
IDLE: Integrated DeveLopment Environment
repeat/combine previous commands by clicking into them and hitting return
(use left/right arrows and delete to edit them)
Slide16
IDLE: Integrated DeveLopment Environment
Console vs Editor
Console:
interactive
great for trying out code
not suited for long scripts
no saving of code
Editor:
requires extra click for running
additional IDLE functionality
suited for long scripts
allows saving of code
Slide17
IDLE: Writing Python Scripts
open a new file
Slide18
IDLE: Writing Python Scripts
write some code
Slide19
IDLE: Writing Python Scripts
run your code (shortcut: F5)
Slide20
IDLE: Writing Python Scripts
save file first
Slide21
IDLE: Writing Python Scripts
specify a file name
Slide22
IDLE: Writing Python Scripts
write more code
IDLE provides help
Slide23
IDLE: Writing Python Scripts
save and run: cmd-S then F5
Slide24
IDLE: Writing Python Scripts
make it personal
Slide25
IDLE: Writing Python Scripts
keep going
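The code in these screenshots is not preserved. As a hedged sketch, a small script of the kind that could be typed into the editor, saved, and run with F5 might look like this; the function name `greet`, the greeting text, and the example name are all assumptions.

```python
# A first script: define a function, then call it with a value stored in a variable.

def greet(name):
    """Build a personalised greeting for the given name."""
    return "Hello, " + name + "! Welcome to Python."


# "Make it personal": keep the name in a variable instead of hard-coding it
# inside the message. Replace the example value with your own name.
name = "Ada"
print(greet(name))
```

To ask the user for the name at run time instead, the assignment could be replaced with `name = input("What is your name? ")`.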
Slide26
Python vs Perl
the equivalent in Perl
Slide27
Python vs Perl
the equivalent in Perl
Slide28
Python vs Perl
fewer special characters
indentation enforced
more user-friendly functions
Python
Perl
Slide29
Why Python?
easy to learn: great for beginners
enforces clean coding: great for teachers
comes with IDE: avoids command-line usage
object-orientated: code reuse and recycling
very popular: many peers
BioPython: many bioinformatics modules
Slide30
Simple Bioinformatics Example
built-in function 'len'
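The slide's screenshot is not preserved. A minimal sketch of `len` applied to a sequence, using a made-up DNA string:

```python
# Toy DNA sequence, invented for illustration.
dna = "ATGCGTACGTTAGC"

# len returns the number of characters in the string, i.e. the number of bases.
length = len(dna)
print(length)
```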
Slide31
Simple Bioinformatics Example
built-in function 'set'
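A minimal sketch of `set` on a made-up DNA string (the slide's own example is not preserved). It reduces the sequence to its distinct characters:

```python
# Toy DNA sequence, invented for illustration.
dna = "ATGCGTACGTTAGC"

# set keeps each character once; note that sets have no guaranteed order.
bases = set(dna)
print(bases)
```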
Slide32
Simple Bioinformatics Example
built-in functions 'sorted' and 'set'
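Combining the two functions gives the distinct characters in alphabetical order, which makes unexpected symbols easy to spot. The sequence below is made up and deliberately contains the ambiguity code N:

```python
# Toy DNA sequence with two ambiguous bases (N), invented for illustration.
dna = "ATGCGTNNACGT"

# set removes duplicates, sorted turns the result into an ordered list.
alphabet = sorted(set(dna))
print(alphabet)
```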
Slide33
Simple Bioinformatics Example
string method 'count'
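A minimal sketch of `str.count` on a made-up DNA string (the slide's own example is not preserved). The method counts non-overlapping occurrences of a substring, so it works for single bases and for short motifs:

```python
# Toy DNA sequence, invented for illustration.
dna = "ATGCGTACGTTAGC"

g_count = dna.count("G")    # occurrences of a single base
cg_count = dna.count("CG")  # non-overlapping occurrences of a dinucleotide
print(g_count, cg_count)
```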
Slide34
Simple Bioinformatics Example
string method 'upper'
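A minimal sketch of `str.upper` on a made-up mixed-case sequence (the slide's own example is not preserved). Normalising the case before counting or comparing avoids missing lower-case bases:

```python
# Toy sequence in mixed case, invented for illustration.
raw = "atgcGTacgt"

# Strings are immutable: upper returns a new string and leaves raw unchanged.
clean = raw.upper()
print(clean)
```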
Slide35
Basic sequence manipulation
Fetch records from databases
Multiple sequence alignment (Clustal, Muscle)
Sequence similarity search (Blast)
Working with motifs: MEME, Jaspar, Transfac
Phylogenetics
Clustering
Visualisation
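To give a flavour of the first item, basic sequence manipulation, here is a plain-Python sketch of a reverse complement. It is a stand-in that needs no extra installation, not Biopython's own code; Biopython's `Seq` objects provide a `reverse_complement()` method for the same purpose. The function name and the test sequence are assumptions.

```python
# Translation table mapping each base to its complement (upper case only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    # translate swaps each base for its complement, [::-1] reverses the string.
    return dna.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGCCG"))
```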
Slide36
Parsing GenBank records:
from Bio import SeqIO
record = SeqIO.read("AE014613.1.gb", "genbank")
record.description
'Salmonella enterica subsp. enterica serovar Typhi Ty2, complete genome.'
len(record.features)
9086
Slide37
Parsing sequence records:
from Bio import SeqIO
for entry in SeqIO.parse("tlr4_protein.fa", "fasta"):
    print(entry.description)
    print(len(entry), 'bp')
gi|765368240|gb|AJR32867.1| TLR4 [Gallus gallus]
843 bp
gi|111414439|gb|ABH09759.1| toll-like receptor 4 [Bos taurus]
841 bp
gi|6175873|gb|AAF05316.1|AF177765_1 toll-like receptor 4 [Homo sapiens]
839 bp
…
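For readers without Biopython installed, the same loop can be approximated in plain Python. This is a simplified stand-in, not how `SeqIO.parse` is implemented; the function name `parse_fasta` and the two toy records are invented for this sketch.

```python
import io

# Two made-up protein records in FASTA format, invented for this sketch.
FASTA_TEXT = """>seq1 toy protein one
MKTLLVAG
AVLS
>seq2 toy protein two
MSPQ
"""


def parse_fasta(handle):
    """Yield (description, sequence) tuples from an open FASTA file handle."""
    description, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if description is not None:
                yield description, "".join(chunks)
            description, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the last record once the input is exhausted.
    if description is not None:
        yield description, "".join(chunks)


# io.StringIO lets the string be read like a file; a real file opened
# with open(...) could be passed to parse_fasta in the same way.
for description, sequence in parse_fasta(io.StringIO(FASTA_TEXT)):
    print(description)
    print(len(sequence), "residues")
```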
Slide38
Graphics:
Chromosomes colour-coded by GC content (Bioinformatics with Python Cookbook)
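The figure itself is not preserved, and the Cookbook's own method is not shown here. As a sketch of the underlying quantity, GC content is commonly defined as the fraction of bases in a sequence that are G or C; the function name and the test sequence below are assumptions.

```python
def gc_content(dna):
    """Return the fraction of G and C bases in a DNA string (0.0 if empty)."""
    dna = dna.upper()  # treat lower-case bases the same as upper-case ones
    if not dna:
        return 0.0     # avoid dividing by zero for an empty sequence
    return (dna.count("G") + dna.count("C")) / len(dna)


# Toy DNA sequence, invented for illustration.
print(gc_content("ATGCGTACGTTAGC"))
```

A plot like the one described would compute this value over consecutive windows along each chromosome and map the result to a colour scale.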
Slide39
Graphics:
Coloured phylogenetic tree from Ebola sequences (Bioinformatics with Python Cookbook)
Slide40
Additional Resources
https://store.continuum.io/cshop/anaconda/
Slide41
Visualisations with Matplotlib
http://matplotlib.org/gallery.html
Slide42
Examples
http://scikit-learn.org
Slide43
Scikit-learn – Machine Learning in Python
Machine Learning: PCA of Iris data set
http://scikit-learn.org/stable/auto_examples/decomposition/plot_pca_iris.html
Slide44
Python Help
Slide45
Online courses
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://dowell.colorado.edu/education-python.html
http://www.pasteur.fr/formation/infobio/python
https://www.codecademy.com/tracks/python
http://anh.cs.luc.edu/python/hands-on/
https://www.coursera.org
Slide46
Books
Slide47
Conclusions
You have been briefly introduced to Python and IDLE.
You have learnt about programming concepts.
You have seen examples of what can be accomplished through Python.
Topics of an extensive Python course:
Coding in Python – variables, scope, functions…
Bioinformatics with BioPython
Automated biological data analysis – your interests!
Slide48
Thank You!
http://bioinf.gen.tcd.ie/workshops/python
Slide49
Don't forget to log out!