Slide1
Molecular Tools
Slide2
Knowing how many genes determine a phenotype, and where the genes are located, is a first step in understanding the genetic basis of a phenotype
A second step is determining the sequence of the gene, or genes, determining the phenotype and understanding how the expression of the genes is regulated at the transcriptional level
Subsequent steps involve analysis of post-transcriptional events, understanding how the genes fit into metabolic pathways and how these pathways interact with the environment
Steps in Genetic Analysis
Slide3
Barley DNA Sequence
Total sequence is 5,300,000,000 base pairs
165% of human genome
Enough characters for 11,000 large novels
Expressed Genes - 60,000,000 base pairs
~1% of total sequence, like humans
125 large novels
Slide4
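The barley figures above can be checked with a few lines of arithmetic. This is a minimal sketch that takes the quoted numbers at face value and assumes one character per base; the variable names are illustrative only.

```python
# Figures quoted on the barley slide
BARLEY_GENOME_BP = 5_300_000_000
EXPRESSED_BP = 60_000_000
NOVELS_FOR_GENOME = 11_000
RATIO_TO_HUMAN = 1.65  # "165% of human genome"

# Implied size of one "large novel", assuming one character per base
chars_per_novel = BARLEY_GENOME_BP / NOVELS_FOR_GENOME

# Share of the genome that is expressed, and its size in novels
expressed_fraction = EXPRESSED_BP / BARLEY_GENOME_BP
expressed_novels = EXPRESSED_BP / chars_per_novel

# Human genome size implied by the 165% comparison
implied_human_genome_bp = BARLEY_GENOME_BP / RATIO_TO_HUMAN

print(f"characters per novel: {chars_per_novel:,.0f}")
print(f"expressed fraction:   {expressed_fraction:.1%}")
print(f"expressed novels:     {expressed_novels:.0f}")
print(f"implied human genome: {implied_human_genome_bp:,.0f} bp")
```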
Complete genome sequences are coming, but aren't yet available for many plants
The trend is sequencing with multiple applications - e.g. whole genomes, specific targets within genomes, or genotyping by sequencing (GBS)
Even when complete genome sequence information is available for every plant, there will always be reason to study allelic diversity and interactions at specific loci and to compare genome sequences of multiple individuals
Step 2 & Sequencing
Slide5
Getting DNA
Can be a rate limiting step, unless automated
Cutting the DNA with restriction enzymes
Reducing complexity
Managing the pieces of DNA in vectors (or alternatives); collections of pieces are maintained in libraries
Selecting DNA targets via amplification and/or hybridization
Determining nucleotide sequence of the targeted DNA
Molecular Tools for Step 2
Slide6
Genomic DNA: Order your kit today!
One-by-one (artisanal) to high-throughput (DNA from seed chips + robotics)
Leaf segments to cheek swabs
Key considerations are
Concentration
Purity
Fragment size
Extracting DNA
Slide7
cDNA: From mRNA to DNA
Making DNA – the cDNA Way
Slide8
Primers, adapters, and more…
~$0.010 per bp ...
Making synthetic DNA: oligonucleotides
Synthetic organisms??? ~1 million bp synthetic so far…
Slide9
Restriction enzymes make cuts at defined recognition sites in DNA
A defense system for bacteria, where they attack and degrade the DNA of attacking bacteriophages
The restriction enzymes are named for the organism from which they were isolated
Harnessed for the task of systematically breaking up DNA into fragments of tractable size and for various polymorphism detection assays
Each enzyme recognizes a particular DNA sequence and cuts in a specified fashion at the sequence
Cutting the DNA – Restriction Enzymes
Slide10
Cutting the DNA – Restriction Enzymes
Slide11
Cutting the DNA – Restriction Enzymes
Slide12
An enzyme that has a four-base recognition site will cut approximately every 256 bp (4^4) and more frequently than one with a six-base recognition site, which in turn will cut more often than one with an eight-base recognition site
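The 4^n rule above can be sketched as follows. This assumes the four bases are equally frequent and independent, which real genomes only approximate. The `digest` helper and its `cut_offset` parameter are hypothetical names for illustration; the EcoRI site (G^AATTC) is used as the example.

```python
def expected_cut_spacing(site_length: int) -> int:
    """Expected distance in bp between recognition sites, assuming the four
    bases are equally frequent and independent (a simplification)."""
    return 4 ** site_length


def digest(sequence: str, site: str, cut_offset: int) -> list[str]:
    """Cut `sequence` at every occurrence of `site` on the top strand.

    `cut_offset` is the position within the site where the cut falls,
    e.g. 1 for EcoRI (G^AATTC).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # piece after the last cut
    return fragments


for n in (4, 6, 8):
    print(f"{n}-base site: one cut per ~{expected_cut_spacing(n):,} bp")

print(digest("AAGAATTCTTGAATTCGG", "GAATTC", cut_offset=1))
```

Only the top strand is modeled here; the staggered cut on the complementary strand, which produces sticky ends, is left out.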
Methylation sensitivity:
Avoid repetitive DNA in order to focus on coding regions
Target the epigenome
Cutting the DNA – Restriction Enzymes
Slide13
Palindrome recognition sites – the same sequence is specified when each strand of the double helix is read in the opposite direction
Sit on a potato pan, Otis
Cigar? Toss it in a can, it is so tragic
UFO tofu
Golf? No sir, prefer prison flog
Flee to me remote elf
Gnu dung
Lager, Sir, is regal
Tuna nut
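Unlike the word palindromes above, a DNA palindrome reads the same on the two opposite strands, not backwards on one strand. A minimal sketch of that check, using a site's reverse complement (function names are illustrative):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic_site(seq: str) -> bool:
    """True if both strands read the same in the 5'->3' direction."""
    return seq.upper() == reverse_complement(seq)


for site in ("GAATTC", "GGATCC", "GATTACA"):
    print(site, is_palindromic_site(site))
```

GAATTC (EcoRI) and GGATCC (BamHI) pass the test. GATTACA does not, even though checking it only requires the same reverse-complement operation.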
Cutting the DNA – Restriction Enzymes
Slide14
Vectors: The role of the vector is to propagate and maintain the DNA fragments generated by the restriction digestion
Efficiency and simplicity of inserting and retrieving the inserted DNA fragments
A key feature of the cloning vector is the size of the DNA fragment insert that it can efficiently and reliably handle
Example: the principle of cloning a DNA fragment in a plasmid vector
DNA: Vectors and libraries
Slide15
Vector                                   Insert size (kb)
Plasmid                                  ~1
Lambda phage                             ~20
Bacterial Artificial Chromosomes (BAC)   ~200
Common vectors and approximate insert sizes
DNA Libraries – Vectors
Slide16
Libraries are repositories of DNA fragments cloned in their vectors, or attached to platform-specific oligonucleotide adapters, for subsequent use.
Libraries can be classified
Based on the cloning vector – e.g. plasmid, BAC
In terms of the source of the cloned DNA fragments – e.g. genomic, cDNA
In terms of intended use: next generation sequencing (NGS), genotyping by sequencing (GBS)
DNA Libraries
Slide17
Total genomic DNA digested and the fragments cloned into an appropriate vector or system
In principle, this library should consist of samples of all the genomic DNA present in the organism, including both coding and non-coding sequences
Ideally, every copy of every gene (or a portion of every sequence) should be represented somewhere in the genomic library
There are strategies for enriching genomic libraries for specific types of sequences and removing specific types of sequences – e.g. favoring unique vs. highly repetitive sequences
DNA Libraries – Genomic
Slide18
A cDNA (complementary DNA) library is generated from mRNA transcripts, using the enzyme reverse transcriptase, which creates a DNA complement to an mRNA template
The cDNA library is based on mRNA: therefore the library will represent only the genes that are expressed in the tissue and/or developmental stage that was sampled
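At the sequence level, the "DNA complement to an mRNA template" can be sketched as below. This models base pairing only (A-T, U-A, G-C, C-G) and ignores priming, second-strand synthesis, and enzyme behaviour; the function name is illustrative.

```python
# Base pairing between an RNA template base and the DNA base it specifies
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """First-strand cDNA written 5'->3'.

    The new strand is antiparallel to the mRNA, so the template is read
    in reverse while each base is complemented.
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


print(first_strand_cdna("AUGGCC"))
```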
DNA Libraries – cDNA
Slide19
Invented by K.B. Mullis in 1983
Allows in vitro amplification of ANY DNA sequence in large numbers
DNA Amplification: Polymerase Chain Reaction (PCR)
https://www.youtube.com/watch?v=2KoLnIwoZKU
Slide20
Design of two single stranded oligonucleotide primers complementary to motifs on the template DNA.
DNA Amplification: Polymerase Chain Reaction (PCR)
Slide21
A polymerase extends the 3’ end of the primer sequence using the DNA strand as a template.
DNA Amplification - PCR
Slide22
The PCR reaction has the following steps:
Denaturing: raising the temperature to 94 °C to make DNA single stranded
Annealing: lowering the temperature to 35 – 65 °C so that the primers bind to the target sequences on the template DNA
Elongation: DNA polymerase extends the 3’ ends of the primer sequence. Temperature must be optimal for DNA polymerase activity.
DNA Amplification - PCR Principles
Slide23
The cycle can be repeated multiple times, provided the 3’ end of each primer faces the target amplicon. The reaction is typically repeated for 25-50 cycles.
Each cycle roughly doubles the number of copies, so repeated cycles yield an exponential number of DNA fragments that are identical copies of the original DNA between the two primer binding sites.
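The exponential growth described above can be put in numbers. This is an idealized sketch: `efficiency` is a hypothetical parameter (1.0 means perfect doubling every cycle), and real reactions plateau as primers and dNTPs run out.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` rounds of PCR.

    Each cycle multiplies the copy number by (1 + efficiency), so an
    efficiency of 1.0 is perfect doubling.
    """
    return initial_copies * (1.0 + efficiency) ** cycles


for cycles in (25, 30, 50):
    print(f"{cycles} cycles from one template: {pcr_copies(1, cycles):.3g} copies")

# A less efficient reaction falls behind quickly
print(f"30 cycles at 90% efficiency: {pcr_copies(1, 30, efficiency=0.9):.3g} copies")
```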
The PCR reaction consists of:
A buffer
DNA polymerase (thermostable)
Deoxyribonucleotide triphosphates (dNTPs)
Two primers (oligonucleotides)
Template DNA
Labelling as required
DNA Amplification - PCR Principles
Slide24
DNA Amplification - PCR Principles
Slide25
The choice of what DNA will be amplified by the polymerase is determined by the primers (short pieces of synthesized DNA - oligonucleotides) that prime the polymerase reaction
The DNA between the primers is amplified by the polymerase: in subsequent reactions the original template, plus the newly amplified fragments, serve as templates
Steps in the reaction include denaturing the target DNA to make it single-stranded, addition of the single stranded oligonucleotides, hybridization of the primers to the template, and primer extension
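How the primers delimit the amplified region can be sketched as a simple "in silico PCR". The assumptions: exact primer matches only, a single binding site per primer, and a template given as its top strand written 5'->3'. The function names and the example sequences are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, forward_primer: str, reverse_primer: str):
    """Return the amplicon (top strand, 5'->3'), or None if a primer has no site.

    The forward primer matches the top strand directly. The reverse primer
    anneals to the top strand, so its reverse complement is what appears
    there, downstream of the forward primer.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


template = "TTTACGTACCCGGGAAATTTGGCCATTT"
print(in_silico_pcr(template, "ACGTAC", "GGCCAA"))
```

The amplicon includes both primer sequences, which matches the point above that the product is the DNA between (and including) the two binding sites.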
DNA Amplification - PCR Priming
Slide26
The PCR process is repeated as necessary until the target fragment is sufficiently amplified that it can be isolated, visualized, and/or manipulated
A key component of PCR is a thermostable polymerase, such as Taq polymerase
PCR can be used to amplify rare fragments from a pool of DNA, generate an abundance of a particular fragment from a single copy from a small sample (e.g. fossil DNA), generate samples of all DNA in a genome, and it is the foundation for many types of molecular markers
DNA Amplification - PCR Application
https://www.youtube.com/watch?v=2KoLnIwoZKU
Slide27
Single strand nucleic acids have a natural tendency to find and pair with other single strand nucleic acids with a complementary sequence
An application of this affinity is to label one single strand with a tag – radioactivity and fluorescent dyes are often used - and then to use this probe to find complementary sequences in a population of single stranded nucleic acids
For example, if you have a cloned gene – either a cDNA or a genomic clone - you could use this as a probe to look for a homologous sequence in another DNA sample
DNA Hybridization
Slide28
By denaturing the DNA in the sample, and using your labeled single stranded probe you can search the sample for the complementary sequence
Pairing of probe and sample can be visualized by the label – e.g. on X-ray film or by measuring fluorescence
The principle of hybridization can be applied to pairing events involving DNA: DNA; DNA: RNA; and protein: antibody
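Searching a denatured sample with a single-stranded probe can be sketched as a string search for the probe's reverse complement. The `max_mismatches` parameter is a hypothetical stand-in for hybridization stringency; real hybridization depends on temperature, salt, and probe length, none of which is modeled here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hybridization_sites(target: str, probe: str, max_mismatches: int = 0) -> list[int]:
    """Start positions on a single-stranded target (5'->3') where the probe can pair.

    The probe pairs antiparallel to the target, so the target must contain
    the probe's reverse complement, allowing up to `max_mismatches`.
    """
    footprint = probe.translate(COMPLEMENT)[::-1]
    sites = []
    for i in range(len(target) - len(footprint) + 1):
        window = target[i:i + len(footprint)]
        mismatches = sum(a != b for a, b in zip(window, footprint))
        if mismatches <= max_mismatches:
            sites.append(i)
    return sites


sample = "CCGATTACAGGGATTTCACC"
probe = "TGTAATC"  # reverse complement of GATTACA
print(hybridization_sites(sample, probe))                    # exact pairing only
print(hybridization_sites(sample, probe, max_mismatches=1))  # lower stringency
```

Relaxing the mismatch limit is a rough analogue of looking for a homologous, not identical, sequence in another sample.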
DNA Hybridization
Southern blot
Northern blot
Western blot
Slide29
Advances in technology have removed the technical obstacles to determining the nucleotide sequence of a gene, a chromosome region, or a whole genome
The starting point for any sequencing project – be it of a single cloned fragment or of an entire genome - is a defined fragment of DNA
Sequencing the DNA
Slide30
Slide31
Start with a defined fragment of DNA
Based on this template, generate a population of molecules differing in size by one base of known composition
Fractionate the population of molecules based on size
The base at the truncated end of each of the fractionated molecules is determined and used to establish the nucleotide sequence
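The four steps above can be sketched as a toy simulation. It assumes every possible termination product is present exactly once and that size fractionation is perfect. The four "lanes" correspond to separate ddATP, ddCTP, ddGTP, and ddTTP reactions, and the function names are illustrative.

```python
def ddntp_reactions(new_strand: str) -> dict[str, list[int]]:
    """For each ddNTP, list the lengths of the terminated products.

    A product of length n ends in the base found at position n of the
    newly synthesized strand.
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Order all bands by size and read off the terminal base of each."""
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    return "".join(base for _, base in sorted(bands))


lanes = ddntp_reactions("GATTACA")
print(lanes)
print(read_gel(lanes))
```

The sequence read from the bands is that of the newly synthesized strand, which is complementary to the template.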
Sanger DNA Sequencing (old but still relevant)
Slide32
A dideoxy nucleotide lacks a 3' OH and, once incorporated, it will terminate strand synthesis. No free 3' OH
Sanger Sequencing - ddNTPs
Slide33
deoxynucleotide (dNTP)
dideoxynucleotide (ddNTP)
Buffer
DNA polymerase
dNTPs
Labeled primer
Target DNA
ddGTP
ddATP
ddTTP
ddCTP
Decoding DNA – Sanger Sequencing
https://www.dnalc.org/view/15923-Cycle-sequencing.html
Slide34
Next Generation Sequencing - Illumina
https://www.youtube.com/watch?v=womKfikWlxM
https://www.illumina.com/technology/next-generation-sequencing.html
Slide35
PacBio
https://www.youtube.com/watch?v=v8p4ph2MAvI
Single Molecule Real Time Sequencing
Slide36
Sequencing considerations

Method: Single-molecule real-time sequencing (Pacific Bio)
Read length: 5,500 bp to 8,500 bp avg (10,000 bp); maximum read length >30,000 bases
Accuracy: 99.999% consensus accuracy; 87% single-read accuracy
Reads per run: 50,000 per SMRT cell, or ~400 megabases
Time per run: 30 minutes to 2 hours
Cost per 1 million bases (in US$): $0.33–$1.00
Advantages: Longest read length. Fast.
Disadvantages: Moderate throughput. Equipment can be very expensive.

Method: Sequencing by synthesis (Illumina)
Read length: 50 to 300 bp
Accuracy: 98%
Reads per run: up to 3 billion
Time per run: 1 to 10 days, depending upon sequencer and specified read length
Cost per 1 million bases (in US$): $0.05 to $0.15
Advantages: Potential for high sequence yield, depending upon sequencer model and desired application.
Disadvantages: Equipment can be very expensive. Requires high concentrations of DNA.

Method: Chain termination (Sanger sequencing)
Read length: 400 to 900 bp
Accuracy: 99.9%
Reads per run: N/A
Time per run: 20 minutes to 3 hours
Cost per 1 million bases (in US$): $2400
Advantages: Long individual reads.
Disadvantages: More expensive and impractical for larger sequencing projects.
Slide37
Arabidopsis thaliana has one of the smallest genomes known in the plant kingdom (135 Mb) and for this reason has become a favorite of plant molecular biologists
Psilotum nudum (the "whisk fern") is a far simpler plant than Arabidopsis (it has no true leaves, flowers, or fruit) and has a genome size of 2.5 x 10^11 bp
Dealing with the C value paradox and whole genome sequencing…technology, time, and $
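To get a rough feel for the "$" part, the per-million-base costs from the sequencing comparison earlier in this deck can be multiplied by genome size. This is a back-of-the-envelope sketch only: the `coverage` parameter is an assumption (real projects read each base many times), and library preparation, assembly, and labor are ignored.

```python
# Genome sizes quoted in this deck (bp)
GENOME_SIZES_BP = {
    "Arabidopsis thaliana": 135e6,
    "Barley": 5.3e9,
}

# Cost per 1 million bases (US$), low and high ends from the comparison table
COST_PER_MB_USD = {
    "Illumina": (0.05, 0.15),
    "Pacific Bio": (0.33, 1.00),
    "Sanger": (2400.0, 2400.0),
}


def sequencing_cost_usd(genome_bp: float, cost_per_mb: float, coverage: float = 1.0) -> float:
    """Raw read cost for sequencing `genome_bp` at the given fold coverage."""
    return genome_bp / 1e6 * coverage * cost_per_mb


for species, size in GENOME_SIZES_BP.items():
    for method, (low, high) in COST_PER_MB_USD.items():
        low_cost = sequencing_cost_usd(size, low)
        high_cost = sequencing_cost_usd(size, high)
        print(f"{species:22s} {method:12s} ${low_cost:>14,.2f} to ${high_cost:>14,.2f}")
```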
Genome Size and whole genome sequencing
Slide38
Technologies for whole genome sequencing are evolving very rapidly and too fast for us to compare and contrast in this class
Key considerations
Cost
Speed
Read length
Assembly
Sequencing Developments
https://nanoporetech.com/
Slide39
RNA-seq
Target the transcriptome rather than the genome
http://rnaseq.uoregon.edu/
https://www.illumina.com/techniques/sequencing/rna-sequencing.html
Slide40
OSU Sequencing Resources
Slide41
Sequencing a plant genome
Slide42
OSU in the lead with plant genome sequencing
And the Beaver too
Slide43
Fragaria vesca
Herbaceous, perennial
2n=2x=14
240 Mb
Reference species for Rosaceae
Genetic resources
Forward and reverse genetics
Fragaria x ananassa: 2n=8x=56. The youngest crop? 250 years.
Why sequence the “weed” when you could sequence the crop???
Slide44
Short reads
No physical reference
De novo assembly
Open source
Would they follow the same path in 2017?
Slide45
Genome Sequencing and Assembly
3 next-gen platforms (new in the 20-10’s – old school in the 20-teens)
39X coverage (average number of reads including a given nucleotide)
Contigs (overlapping reads) assembled into scaffolds (contigs + gaps)
Slide46
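The coverage figure quoted for the strawberry assembly follows from a simple ratio: total bases read divided by genome size. In the sketch below, the 100 bp read length is a hypothetical value chosen for illustration and is not taken from the slides; the 240 Mb genome size is the flow cytometry estimate quoted in this deck.

```python
import math


def mean_coverage(read_count: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average number of reads covering each nucleotide."""
    return read_count * read_length_bp / genome_size_bp


def reads_needed(target_coverage: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Number of reads required to reach the target mean coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


GENOME_SIZE_BP = 240_000_000  # F. vesca estimate quoted in this deck
READ_LENGTH_BP = 100          # hypothetical read length, for illustration

n_reads = reads_needed(39, READ_LENGTH_BP, GENOME_SIZE_BP)
print(f"reads needed for 39X: {n_reads:,}")
print(f"check: {mean_coverage(n_reads, READ_LENGTH_BP, GENOME_SIZE_BP):.1f}X")
```

Mean coverage says nothing about evenness: some regions will be covered far more or far less than the average.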
Genome Sequencing and Assembly
~3,200 scaffolds
N50 of 1.3 Mb (length-weighted median: scaffolds of this length or longer hold half of the assembled sequence)
Over 95% (209.8 Mb) of total sequence is represented in 272 scaffolds
Flow cytometry to measure genome size: ~240 Mb
Why is this > than the number of Mb sequenced (209)?
Slide47
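The N50 statistic quoted for the scaffolds can be computed as below. This is a sketch of the usual definition: the length L such that scaffolds of length L or longer contain at least half of the assembled bases. The example lengths are made up and are not the strawberry scaffold lengths.

```python
def n50(lengths: list[int]) -> int:
    """Length of the scaffold at which the running total, counted from the
    longest scaffold down, first reaches half of the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy assembly: total is 32, and the two longest pieces already hold 16
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

A few long scaffolds dominate N50, which is why an assembly with thousands of small scaffolds can still report a large value.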
Anchoring the genome sequence to the genetic map
94% of scaffolds anchored to the diploid Fragaria reference linkage map using 390 genetic markers
Pseudochromosomes ~ linkage groups ~ karyotype
Of what use is a linkage map when you can have a whole genome sequence?
Slide48
Synteny
Prunus and F. vesca
Homologs
Orthologs
Paralogs
Nice picture, but of what use?
Slide49
The small genome size (240 Mb)
Absence of large genome duplications
Limited numbers of transposable elements, compared to other angiosperms - the driver of small genome size?
Slide50
Transcriptome sequence (cDNAs)
Organ specificity
Fruits and roots – different types of genes
“host gene deserts”
Slide51
Gene prediction
34,809 nuclear genes
flavor, nutritional value and flowering time
1,616 transcription factors
RNA genes
569 tRNA, 177 rRNA, 111 spliceosomal RNAs, 168 small nuclear RNAs, 76 micro RNA and 24 other RNAs
Are these results expected?
Chloroplast genome
155,691 bp encodes 78 proteins, 30 tRNAs and 4 rRNA genes
Evidence of DNA transfer from plastid genome to the nuclear genome
Genetics accounting
Slide52
Strawberry unique gene clusters
Utility in practical horticulture?
Acknowledgements: Thanks to Merve Sekerli for the slide preparation