Karthik Jagadeesh, Bo Yoo. 28 September 2016.
Slide1
A Zero-Knowledge Based Introduction to Biology
Bo Yoo
January 09, 2020
Slide2
Announcements
Please sign up for Piazza
CA Office Hours (starting Monday 1/13):
Mondays 2PM-4PM in Beckman B383 (third floor)
Beckman center access through main gate only
Slide3
Announcements
Attendance will be taken starting next Tuesday (1/14) lecture
There will be a sign-up sheet by the door
You get two free absences
Attendance is 5% of your final grade
Slide4
Announcements
Homework 1 will be released next Thursday (1/16)
Due 11:59PM 2/4 (via email)
You have 3 late days (can use on homework only)
Read the instructions carefully (what files to submit etc.)
Post questions on Piazza instead of emailing us
Include question number on the subject line
3 questions
All three questions will require UCSC Genome Browser (tutorial 1/16)
Slide5
Q: What is your genome?
Slide6
Q: What is your genome?
A: The sum of your hereditary information.
Slide7
DNA: “Blueprints” for a cell
Genetic information encoded in long strings
Deoxyribonucleic acid (DNA) comes in four bases: adenine (A), thymine (T), guanine (G), and cytosine (C)
Slide8
Nucleobase Complementary Pairing
Purines: adenine (A), guanine (G)
Pyrimidines: cytosine (C), thymine (T)
A pairs with T; G pairs with C
Slide9
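The pairing rule on this slide (A with T, G with C) can be written as a simple lookup table. This is a minimal sketch that assumes plain uppercase or lowercase A/C/G/T strings; the names `PAIRS` and `complement` are illustrative and do not come from the lecture.

```python
# Watson-Crick pairing for DNA: A<->T and G<->C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand.

    The result is read in the same left-to-right order as the input,
    so it corresponds to the opposite strand read 3'->5'.
    """
    return "".join(PAIRS[base] for base in strand.upper())


print(complement("GATTACA"))  # CTAATGT
```

The example strand is the one used on the later transcription slides, where G A T T A C A sits opposite C T A A T G T.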
DNA Double Helix
Slide10
DNA Packaging
Slide11
From DNA to Organism
You are composed of ~10 trillion cells
Slide12
From DNA to Organism
Cell
Slide13
From DNA to Organism
Cell
Protein
Proteins do most of the work in biology
Slide14
Human Genome
3 billion base pairs: A, T, G, C
Full DNA sequence in virtually all cells
DNA is the blueprint for life:
Cookbook with many “recipes” for proteins (genes)
Proteins do most of the work in biology
Slide15
Protein coding genes
In humans: a set of 20-25K genes that are eventually translated into proteins
The number of genes differs by species!
Seemingly less complex organisms may have a larger number of genes
E.g. human (20-25K genes) vs. rice (51K genes)
How are proteins made from DNA?
Slide16
Central Dogma of Biology
Slide17
Gene Transcription
DNA -> RNA
Slide18
DNA (deoxyribonucleic acid) vs RNA (ribonucleic acid)
Deoxyribose in DNA
Ribose in RNA
Slide19
RNA Nucleobases
Purines: adenine (A), guanine (G)
Pyrimidines: cytosine (C), uracil (U)
Slide20
5 prime and 3 prime
Slide21
Genes are transcribed from the template strand
Slide22
Gene Transcription (DNA -> RNA)
5’ G A T T A C A . . . 3’
3’ C T A A T G T . . . 5’
Slide23
Gene Transcription
5’ G A T T A C A . . . 3’
3’ C T A A T G T . . . 5’
Slide24
Gene Transcription
5’ G A T T A C A . . . 3’
3’ C T A A T G T . . . 5’
Strands are separated (DNA helicase)
Slide25
Gene Transcription
5’ G A T T A C A . . . 3’
3’ C T A A T G T . . . 5’
RNA: G A U U A C A
An RNA copy of the 5’→3’ sequence is created from the 3’→5’ template
Slide26
Gene Transcription
pre-mRNA: 5’ G A U U A C A . . . 3’
5’ G A T T A C A . . . 3’
3’ C T A A T G T . . . 5’
Slide27
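The transcription step on these slides can be sketched in two equivalent ways: copy the coding (5’→3’) strand with T replaced by U, or pair each base of the template (3’→5’) strand with its RNA partner. Both function names below are hypothetical, and the sketch ignores everything about where transcription starts and stops.

```python
def transcribe_from_coding(coding_5to3: str) -> str:
    """RNA has the same sequence as the coding strand, with U in place of T."""
    return coding_5to3.upper().replace("T", "U")


# RNA base placed opposite each DNA template base (U pairs with A).
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_from_template(template_3to5: str) -> str:
    """Build the RNA by pairing against the template strand, read 3'->5'.

    The RNA produced this way is written 5'->3'.
    """
    return "".join(RNA_PARTNER[base] for base in template_3to5.upper())


print(transcribe_from_coding("GATTACA"))    # GAUUACA
print(transcribe_from_template("CTAATGT"))  # GAUUACA
```

Both calls give G A U U A C A, the pre-mRNA shown on the slide.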
Genes can be found on both strands
Coding and template strands are relative to the gene
A gene can be on the minus strand
In general, genomic sequences are written 5’->3’
5’ G A T T A C A 3’
3’ C T A A T G T 5’
Slide28
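Because sequences are conventionally written 5’->3’, a gene on the minus strand is read as the reverse complement of the plus-strand text. A minimal sketch, assuming uppercase or lowercase A/C/G/T input; `reverse_complement` is an illustrative name.

```python
# Translation table mapping each base to its pairing partner.
COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def reverse_complement(plus_strand_5to3: str) -> str:
    """Return the minus strand written in the conventional 5'->3' direction."""
    complemented = plus_strand_5to3.upper().translate(COMPLEMENT_TABLE)
    return complemented[::-1]  # reverse so the result also reads 5'->3'


print(reverse_complement("GATTACA"))  # TGTAATC
```

Here the bottom strand of the slide, shown as C T A A T G T in 3’->5’ order, becomes T G T A A T C once it is written 5’->3’.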
RNA Processing
Figure labels: 5’ cap, 5’ UTR, exon, intron, 3’ UTR, poly(A) tail, mRNA
Slide29
Gene Structure
Figure labels (5’ to 3’): promoter, 5’ UTR, exons, introns, 3’ UTR
Regions are marked as coding or non-coding
Slide30
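One part of RNA processing, removing introns and joining exons, can be sketched as string slicing. The toy sequences and the exon coordinates below are invented for illustration only, and the 5’ cap and poly(A) tail are not modeled.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon intervals (0-based, end-exclusive) in order, dropping introns."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# Hypothetical toy transcript: exon, intron, exon.
EXON_1 = "GGAUGGCC"
INTRON = "GUAAGUUUCAG"
EXON_2 = "UGGACUUGAAA"
PRE_MRNA = EXON_1 + INTRON + EXON_2

exon_coords = [
    (0, len(EXON_1)),
    (len(EXON_1) + len(INTRON), len(PRE_MRNA)),
]

mature_mrna = splice(PRE_MRNA, exon_coords)
print(mature_mrna)  # GGAUGGCCUGGACUUGAAA
```

Storing exons as coordinate pairs mirrors how gene structure is usually annotated against a reference sequence, but the interval convention used here is a choice made for this sketch.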
Gene Translation
RNA -> Protein
Slide31
From RNA to Protein
Proteins are long strings of amino acids joined by peptide bonds
Translation from RNA sequence to amino acid sequence is performed by ribosomes
20 amino acids → 3 RNA letters are required to specify a single amino acid (codons)
Slide32
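The step from "20 amino acids" to "3 RNA letters" is a counting argument: with 4 bases, words of length 1 or 2 give only 4 or 16 combinations, while length 3 gives 64. A small sketch of that arithmetic, with an illustrative function name:

```python
def min_word_length(alphabet_size: int, items_to_encode: int) -> int:
    """Smallest word length whose number of combinations covers all items."""
    length = 1
    while alphabet_size ** length < items_to_encode:
        length += 1
    return length


for length in (1, 2, 3):
    print(length, 4 ** length)  # 1 4, then 2 16, then 3 64

print(min_word_length(4, 20))  # 3
```

Since 64 codons cover only 20 amino acids plus stop signals, several codons map to the same amino acid.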
Amino Acid
Alanine
Arginine
Asparagine
Aspartate
Cysteine
Glutamate
Glutamine
Glycine
Histidine
Isoleucine
Leucine
Lysine
Methionine
Phenylalanine
Proline
Serine
Threonine
Tryptophan
Tyrosine
Valine
There are 20 standard amino acids
Slide33
Translation
Slide34
Translation
5’ . . . A U U A U G G C C U G G A C U U G A . . . 3’
A U U: UTR
A U G: Met (start codon)
G C C: Ala
U G G: Trp
A C U: Thr
U G A: stop codon
Slide35
Translation
The ribosome (a complex of protein and RNA) synthesizes a protein by reading the mRNA in triplets (codons). Each codon is translated to an amino acid.
Slide36
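Reading an mRNA in triplets can be sketched with a codon lookup table. The sketch below builds the standard genetic code from its usual compact 64-letter form (bases ordered U, C, A, G; `*` marks a stop codon), scans for the first AUG, and reads codons until a stop. The function name and the simplified start rule are assumptions made for illustration; real translation initiation is more involved.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG order (UUU, UUC, UUA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUUAUGGCCUGGACUUGA"))  # MAWT
```

The output uses one-letter codes: M, A, W, T are Met, Ala, Trp, Thr, matching the example on the earlier translation slide.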
Central Dogma of Biology
Slide37
Most of Our Genome Does Not Code for Proteins!
Slide38
What does the rest of the genome do?
3 billion base pairs in our genome
1-2% coding (codes for proteins)
10-20% regulatory
These regulatory elements give rise to differentiation
1 million regulatory elements (switches) enable:
Precise control for turning genes on/off
Diverse cell types (lung, heart, skin)
Analogy: Making specific recipes (genes) for a full meal from a large cookbook (genome) at a given time
Slide39
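The percentages on this slide translate into rough base-pair counts. The sketch below only restates the slide's own figures (3 billion bp, 1-2% coding, 10-20% regulatory, 1 million regulatory elements); the helper name is illustrative, and the "one element per N bp" number is a plain average, not a claim about how elements are actually spaced.

```python
GENOME_BP = 3_000_000_000
REGULATORY_ELEMENTS = 1_000_000


def bp_range(low_percent: int, high_percent: int) -> tuple[int, int]:
    """Convert a percentage range of the genome into base-pair counts."""
    return (GENOME_BP * low_percent // 100, GENOME_BP * high_percent // 100)


coding_bp = bp_range(1, 2)         # (30000000, 60000000)
regulatory_bp = bp_range(10, 20)   # (300000000, 600000000)
bp_per_element = GENOME_BP // REGULATORY_ELEMENTS  # 3000

print(coding_bp, regulatory_bp, bp_per_element)
```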
Gene Expression Regulation
Determines when each gene should be expressed
Why? Every cell has the same DNA, but each cell expresses different proteins.
Signal transduction: one signal is converted to another. The cascade has “master regulators” that turn on many proteins, each of which in turn turns on many more proteins
Slide40
Different Cell Types
Subsets of the DNA sequence determine the identity and function of different cells
Slide41
Regulatory Elements
Expression is modulated by regulatory elements
Enhancers, promoters, silencers
They regulate transcription (DNA -> RNA) of a gene
CS analogy:
Genes are like variable assignments (a = 7)
Regulatory elements are control flow, complex logic
Slide42
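The CS analogy can be made literal in a few lines: the gene table below plays the role of variable assignments, and the `if` statements play the role of regulatory elements. Every name here (the genes, cell types, and the signal flag) is made up for illustration; this is a toy model of the analogy, not of real regulation.

```python
# "Genes": fixed content, the same in every cell (like a = 7).
GENES = {
    "gene_a": "protein A",
    "gene_b": "protein B",
    "gene_c": "protein C",
}


def expressed_proteins(cell_type: str, signal_present: bool) -> list[str]:
    """'Regulatory elements': control flow deciding which genes are turned on."""
    expressed = [GENES["gene_a"]]  # on in every cell type in this toy model
    if cell_type == "heart":
        expressed.append(GENES["gene_b"])
    if cell_type == "lung" and signal_present:
        expressed.append(GENES["gene_c"])
    return expressed


print(expressed_proteins("heart", signal_present=False))  # ['protein A', 'protein B']
print(expressed_proteins("lung", signal_present=True))    # ['protein A', 'protein C']
```

The same `GENES` table yields different outputs depending on the branch taken, which is the point of the analogy: identical DNA, different expressed proteins.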
Regulatory Elements
Transcription factors (TFs):
Proteins that recognize sequence motifs in enhancers, promoters
Combinatorial switches that turn genes on/off
Complex assists or inhibits formation of the RNA polymerase machinery
Slide43
Transcription Factor Binding Sites
Short, degenerate DNA sequences recognized by particular transcription factors
For complex organisms, cooperative binding of multiple transcription factors is required to initiate transcription
Binding Sequence Logo
Slide44
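"Short, degenerate" means that some positions of a binding site tolerate more than one base. One simple way to sketch this is to write the motif with IUPAC ambiguity codes (for example S = C or G) and turn it into a regular expression. The motif and test sequence below are illustrative only, the scan covers just the given strand, and real binding-site models are usually weighted (as in a sequence logo) rather than yes/no patterns.

```python
import re

# A subset of IUPAC nucleotide codes, mapped to regex fragments.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def find_sites(motif: str, sequence: str) -> list[tuple[int, str]]:
    """Return (position, matched text) for every motif hit, overlaps included."""
    pattern = "".join(IUPAC[symbol] for symbol in motif.upper())
    # A lookahead lets overlapping occurrences be reported as well.
    lookahead = re.compile("(?=(" + pattern + "))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence.upper())]


hits = find_sites("TGASTCA", "CCTGACTCAGGTGAGTCATT")
print(hits)  # [(2, 'TGACTCA'), (11, 'TGAGTCA')]
```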
Signal Transduction
Transcription Factor A
TF A
Binding Site
Gene B
Slide45
Repeats
Sequences that repeat many times in the genome
About 50% of the genome
Slide46
Repeats
Interspersed repeats (transposable elements)
Use some unknown mechanism to multiply themselves and move around in the genome
Slide47
Repeats
Most repeat events are neutral
Most copies are inactive (e.g. 5’ truncation) when they arrive at a new location
They make genome sizes grow
To make it through generations, the repeats must be in the germline cells (eggs and sperm)
Slide48
Repeats
Simple repeats
Every possible motif of mono-, di-, tri- and tetranucleotide repeats is vastly overrepresented in the human genome.
These are called microsatellites.
Longer repeating units are called minisatellites.
The really long ones are called satellites.
AAAAAAAAA
CACACACAC
CAACAACAA
Slide49
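Tandem repeats of a short unit, like the three examples on this slide, can be found with a back-referencing regular expression. This is a rough sketch: the thresholds (`max_unit=4`, `min_copies=3`) and the function name are arbitrary choices for illustration, not standard definitions of a microsatellite.

```python
import re


def find_simple_repeats(sequence: str, max_unit: int = 4, min_copies: int = 3):
    """Return (start, unit, copies) for runs of a 1..max_unit bp unit.

    The unit is matched lazily, so the shortest repeating unit is preferred.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    results = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        results.append((match.start(), unit, copies))
    return results


for example in ("AAAAAAAAA", "CACACACAC", "CAACAACAA"):
    print(example, find_simple_repeats(example))
# AAAAAAAAA [(0, 'A', 9)]
# CACACACAC [(0, 'CA', 4)]
# CAACAACAA [(0, 'CAA', 3)]
```

Only whole copies are counted, so the trailing C in CACACACAC is not part of the reported run.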
Still a lot that we don’t know
Slide50
Mutations in the Genome
Over our lifetime, our DNA replicates trillions of times with the help of DNA polymerase
But even polymerase is “imperfect”: every now and then (roughly 1 in every 100,000 bp), DNA polymerase makes a mistake in replication, resulting in “mutations”
There are other sources of mutation, including smoking, sunlight and radiation
Slide51
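Taking the slide's figure of roughly one polymerase error per 100,000 bp at face value, the expected number of raw errors in one copy of a 3 billion bp genome is a single division. The variable names are illustrative. Note that this is the raw polymerase rate quoted above; proofreading and repair correct most such errors, so far fewer become lasting mutations.

```python
GENOME_BP = 3_000_000_000
BP_PER_ERROR = 100_000  # the slide's rough figure: 1 mistake per 100,000 bp

expected_raw_errors = GENOME_BP / BP_PER_ERROR
print(expected_raw_errors)  # 30000.0
```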
Single Nucleotide Changes
Slide52
Single Nucleotide Changes
Slide53
Mutation: Structural Abnormalities
Slide54
How does the genome influence human disease?
Slide55
Bejerano Lab
Disease Implications
Figure labels: SHH, MUTATIONS, Brain, Limb, Other
Slide56
Limb Enhancer 1Mb away from Gene
Figure labels: SHH, limb, on, off
Slide57
Enhancer Deletion
Figure labels: SHH, limb, DELETE, Limb, on, off
Slide58
Enhancer 1bp Substitution
Figure labels: SHH, limb, MUTATIONS, Limb, on, off
Lettice et al. HMG 2003 12: 1725-35
Slide59
Genome Wide Association Study (GWAS):
80% of GWAS SNPs are noncoding (hard to interpret)
Active area of research
Slide60
Evolution = Mutation + Selection
Slide61
Human Mutation Rate
Recent sequencing analysis suggests ~40-60 new mutations in a child that were not present in either parent.
Mutations range from the smallest possible (single base pair change) to the largest: whole genome duplication (to be discussed).
Selection does not tolerate all of these mutations, but it sure does tolerate some.
Slide62
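The ~40-60 new mutations per child can be turned into a rough per-base rate. This back-of-the-envelope sketch assumes the 3 billion bp genome size quoted earlier and that a child carries two copies of it (one from each parent); the function name is illustrative.

```python
GENOME_BP = 3_000_000_000
COPIES_PER_CHILD = 2  # one genome copy inherited from each parent


def per_bp_rate(new_mutations: int) -> float:
    """New mutations per base pair per generation."""
    return new_mutations / (COPIES_PER_CHILD * GENOME_BP)


low, high = per_bp_rate(40), per_bp_rate(60)
print(f"{low:.1e} to {high:.1e} per bp per generation")
```

Under these assumptions the rate comes out on the order of one new mutation per hundred million base pairs per generation.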
Selection
Figure labels: time; harmful mutation, beneficial mutation, neutral mutation
Slide63
Summary
All hereditary information is encoded in double-stranded DNA
Each cell in an organism has the same DNA
DNA → RNA → protein
Proteins have many diverse roles in the cell
Gene regulation diversifies protein products within different cells
Slide64
Summary
Only a very small portion of the genome actually codes for proteins; much of it is repeats and regulatory elements
Mutations and repeats that make it into germline cells get passed down through the generations
Evolution happens through mutation and selection