Molecular Tools: Knowing how many genes determine a phenotype, and where the genes are

tatyana-admore
Uploaded On 2018-10-08



Presentation Transcript

Slide1

Molecular Tools

Slide2

Knowing how many genes determine a phenotype, and where the genes are located, is a first step in understanding the genetic basis of a phenotype

A second step is determining the sequence of the gene, or genes, determining the phenotype and understanding how the expression of the genes is regulated at the transcriptional level 

Subsequent steps involve analysis of post-transcriptional events, understanding how the genes fit into metabolic pathways and how these pathways interact with the environment

Steps in Genetic Analysis

Slide3

Barley DNA Sequence

Total sequence is 5,300,000,000 base pairs

165% of human genome

Enough characters for 11,000 large novels

Expressed genes - 60,000,000 base pairs

~1% of total sequence, like humans

125 large novels

Slide4
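The barley figures quoted above (5.3 billion bp in total, 60 million bp expressed, 11,000 and 125 large novels) can be checked with a few lines of arithmetic. This is only a sketch: the human genome size of about 3.2 billion bp is an assumption brought in for the comparison and is not stated on the slide.

```python
# Figures quoted on the barley slide.
BARLEY_GENOME_BP = 5_300_000_000
BARLEY_EXPRESSED_BP = 60_000_000
NOVELS_TOTAL = 11_000
NOVELS_EXPRESSED = 125

# Assumption (not from the slide): haploid human genome of ~3.2 billion bp.
HUMAN_GENOME_BP = 3_200_000_000


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


barley_vs_human = percent(BARLEY_GENOME_BP, HUMAN_GENOME_BP)
expressed_share = percent(BARLEY_EXPRESSED_BP, BARLEY_GENOME_BP)
chars_per_novel_total = BARLEY_GENOME_BP / NOVELS_TOTAL
chars_per_novel_expressed = BARLEY_EXPRESSED_BP / NOVELS_EXPRESSED

print(f"Barley genome as % of human genome: {barley_vs_human:.1f}")
print(f"Expressed genes as % of barley genome: {expressed_share:.2f}")
print(f"Characters per novel (whole genome): {chars_per_novel_total:,.0f}")
print(f"Characters per novel (expressed genes): {chars_per_novel_expressed:,.0f}")
```

Both novel counts imply roughly 480,000 characters per "large novel", so the two analogies on the slide are consistent with each other.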

Complete genome sequences are coming, but aren't yet available for many plants  

The trend is sequencing with multiple applications - e.g. whole genomes, specific targets within genomes, or genotyping by sequencing (GBS)

Even when complete genome sequence information is available for every plant, there will always be reason to study allelic diversity and interactions at specific loci and to compare genome sequences of multiple individuals

Step 2 & Sequencing

Slide5

Getting DNA

Can be a rate-limiting step, unless automated

Cutting the DNA with restriction enzymes

Reducing complexity

Managing the pieces of DNA in vectors (or alternatives); collections of pieces are maintained in libraries

Selecting DNA targets via amplification and/or hybridization

Determining nucleotide sequence of the targeted DNA

Molecular Tools for Step 2

Slide6

Genomic DNA: Order your kit today!

One-by-one (artisanal) to high-throughput (DNA from seed chips + robotics)

Leaf segments to cheek swabs

Key considerations are

Concentration

Purity

Fragment size

Extracting DNA

Slide7

 

cDNA: From mRNA to DNA

Making DNA – the cDNA Way

Slide8

Primers, adapters, and more

~$0.010 per bp

...

Making synthetic DNA: oligonucleotides

Synthetic organisms??? ~1 million bp synthetic so far

….

Slide9

Restriction enzymes make cuts at defined recognition sites in DNA

A defense system for bacteria, where they attack and degrade the DNA of attacking bacteriophages

The restriction enzymes are named for the organism from which they were isolated

Harnessed for the task of systematically breaking up DNA into fragments of tractable size and for various polymorphism detection assays

Each enzyme recognizes a particular DNA sequence and cuts in a specified fashion at the sequence

Cutting the DNA – Restriction Enzymes

Slide10

Cutting the DNA – Restriction Enzymes

Slide11

Cutting the DNA – Restriction Enzymes

Slide12

An enzyme that has a four-base recognition site will cut approximately every 256 bp (4^4) and more frequently than one with a six-base recognition site, which in turn will cut more often than one with an eight-base recognition site
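The 4^n rule above can be sketched in a few lines. The estimate assumes the four bases are equally frequent and independent along the sequence, which real genomes only approximate. The three sites used below (AGCT, GAATTC, GCGGCCGC) are illustrative choices of four-, six- and eight-base sites and the random "genome" is simulated, not real data.

```python
import random


def expected_spacing(site_length: int) -> int:
    """Expected bp between cuts for a site of this length,
    assuming equal base frequencies and independent positions."""
    return 4 ** site_length


def count_sites(sequence: str, site: str) -> int:
    """Count occurrences of site in sequence (overlaps allowed)."""
    count = 0
    start = sequence.find(site)
    while start != -1:
        count += 1
        start = sequence.find(site, start + 1)
    return count


for n in (4, 6, 8):
    print(f"{n}-base site: one cut per ~{expected_spacing(n):,} bp")

# Simulated random sequence (assumption: uniform base composition).
rng = random.Random(0)
genome = "".join(rng.choices("ACGT", k=200_000))

for site in ("AGCT", "GAATTC", "GCGGCCGC"):
    observed = count_sites(genome, site)
    expected = len(genome) / expected_spacing(len(site))
    print(f"{site}: observed {observed} sites, expected about {expected:.1f}")
```

On a real genome the observed counts can deviate strongly from this expectation because base composition is skewed and repeats are common.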

Methylation sensitivity:

Avoid repetitive DNA in order to focus on coding regions

Target the epigenome

Cutting the DNA – Restriction Enzymes

Slide13

Palindromic recognition sites – the same sequence is specified when each strand of the double helix is read in its own 5' to 3' direction (the two strands run in opposite directions)

Sit on a potato pan, Otis

Cigar? Toss it in a can, it is so tragic

UFO tofu

Golf? No sir, prefer prison flog

Flee to me remote elf

Gnu dung

Lager, Sir, is regal

Tuna nut
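Unlike the word palindromes above, a DNA palindrome reads the same on the two complementary strands, so the test is whether a sequence equals its own reverse complement. The sketch below shows that check; the example sites are illustrative, and the function names are made up for this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic_site(seq: str) -> bool:
    """True if the sequence reads the same 5'->3' on both strands."""
    return seq.upper() == reverse_complement(seq)


for site in ("GAATTC", "GGATCC", "AAGCTT", "GATTACA"):
    print(site, reverse_complement(site), is_palindromic_site(site))
```

GAATTC, GGATCC and AAGCTT pass the test; GATTACA does not, because its reverse complement is TGTAATC.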

Cutting the DNA – Restriction Enzymes

Slide14

Vectors: The role of the vector is to propagate and maintain the DNA fragments generated by the restriction digestion

Efficiency and simplicity of inserting and retrieving the inserted DNA fragments

A key feature of the cloning vector is the size of the DNA fragment insert that it can efficiently and reliably handle

Example: the principle of cloning a DNA fragment in a plasmid vector

DNA: Vectors and libraries

Slide15

Common vectors and approximate insert sizes

Vector | Insert size (kb)
Plasmid | ~1
Lambda phage | ~20
Bacterial Artificial Chromosomes (BAC) | ~200

DNA Libraries – Vectors

Slide16

Libraries are repositories of DNA fragments, cloned in their vectors or attached to platform-specific oligonucleotide adapters, for subsequent use.

Libraries can be classified

Based on the cloning vector – e.g. plasmid, BAC

In terms of the source of the cloned DNA fragments – e.g. genomic, cDNA

In terms of intended use: next generation sequencing (NGS), genotyping by sequencing (GBS)

DNA Libraries

Slide17

Total genomic DNA digested and the fragments cloned into an appropriate vector or system

In principle, this library should consist of samples of all the genomic DNA present in the organism, including both coding and non-coding sequences

Ideally, every copy of every gene (or a portion of every sequence) should be represented somewhere in the genomic library

There are strategies for enriching genomic libraries for specific types of sequences and removing specific types of sequences – e.g. favoring unique vs. highly repetitive sequences

DNA Libraries – Genomic

Slide18

A cDNA (complementary DNA) library is generated from mRNA transcripts, using the enzyme reverse transcriptase, which creates a DNA complement to an mRNA template

The cDNA library is based on mRNA: therefore the library will represent only the genes that are expressed in the tissue and/or developmental stage that was sampled  
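What "a DNA complement to an mRNA template" means at the sequence level can be sketched as follows. The short mRNA is a made-up example, and the functions model only base pairing, not the enzymology (priming, RNA removal, second-strand synthesis chemistry).

```python
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def first_strand_cdna(mrna: str) -> str:
    """First-strand cDNA, written 5'->3', complementary to an mRNA written 5'->3'."""
    return mrna.upper().translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def second_strand_cdna(first_strand: str) -> str:
    """Second strand, written 5'->3': the DNA version of the original mRNA."""
    return first_strand.upper().translate(DNA_COMPLEMENT)[::-1]


mrna = "AUGGCCUAA"  # hypothetical transcript fragment
first = first_strand_cdna(mrna)
second = second_strand_cdna(first)
print("mRNA         5'", mrna, "3'")
print("first strand 5'", first, "3'")
print("second strand 5'", second, "3'")
```

The second strand carries the same sequence as the mRNA with T in place of U, which is why a cDNA clone represents the expressed transcript.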

DNA Libraries – cDNA

Slide19

Invented by K.B. Mullis in 1983

Allows in vitro amplification of ANY DNA sequence in large numbers

DNA Amplification: Polymerase Chain Reaction (PCR)

https://www.youtube.com/watch?v=2KoLnIwoZKU

Slide20

Design of two single stranded oligonucleotide primers complementary to motifs on the template DNA.

DNA Amplification: Polymerase Chain Reaction (PCR)

Slide21

A polymerase extends the 3’ end of the primer sequence using the DNA strand as a template.

DNA Amplification - PCR

Slide22

The PCR reaction has the following steps:

Denaturing: raising the temperature to 94 °C to make the DNA single stranded

Annealing: lowering the temperature to 35 – 65 °C so that the primers bind to the target sequences on the template DNA

Elongation: DNA polymerase extends the 3’ end of each primer. Temperature must be optimal for DNA polymerase activity.

DNA Amplification - PCR Principles

Slide23

The cycle can be repeated multiple times, provided the 3’ end of each primer faces the target amplicon. The reaction is typically run for 25-50 cycles.

Each cycle can at most double the number of target fragments, so repeated cycles generate exponentially increasing numbers of identical copies of the original DNA between the two primer binding sites.
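The exponential growth described above can be illustrated with an idealized model in which every cycle multiplies the copy number by (1 + efficiency). This is a sketch under that assumption only; real reactions fall below perfect doubling and plateau as primers and dNTPs run out, and the 90% efficiency used below is an arbitrary illustrative value.

```python
def copies_after(cycles: int, starting_copies: float = 1.0,
                 efficiency: float = 1.0) -> float:
    """Idealized PCR yield: starting_copies * (1 + efficiency) ** cycles.

    efficiency = 1.0 means perfect doubling in every cycle.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return starting_copies * (1.0 + efficiency) ** cycles


for cycles in (25, 30, 50):
    perfect = copies_after(cycles)
    reduced = copies_after(cycles, efficiency=0.9)
    print(f"{cycles} cycles: {perfect:.3e} copies at 100% efficiency, "
          f"{reduced:.3e} at 90%")
```

Even a single starting molecule reaches tens of millions of copies after 25 ideal cycles, which is why the typical 25-50 cycle range is sufficient.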

The PCR reaction consists of:

A buffer

DNA polymerase (thermostable)

Deoxyribonucleotide triphosphates (dNTPs)

Two primers (oligonucleotides)

Template DNA

Labelling as required

DNA Amplification - PCR Principles

Slide24

Denaturing: raising the temperature to 94 °C to make the DNA single stranded

Annealing: lowering the temperature to 35 – 65 °C so that the primers bind to the target sequences on the template DNA

Elongation: DNA polymerase extends the 3’ end of each primer. Temperature must be optimal for DNA polymerase activity.

DNA Amplification - PCR Principles

Slide25

The choice of what DNA will be amplified by the polymerase is determined by the primers (short pieces of synthesized DNA - oligonucleotides) that prime the polymerase reaction

The DNA between the primers is amplified by the polymerase: in subsequent reactions the original template, plus the newly amplified fragments, serve as templates

Steps in the reaction include denaturing the target DNA to make it single-stranded, addition of the single stranded oligonucleotides, hybridization of the primers to the template, and primer extension
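How the two primers define the amplified region can be sketched as a simple "in silico PCR": find where the forward primer matches the top strand, find where the reverse primer anneals (its reverse complement appears in the top strand), and return everything in between, primers included. The template and the 6-base primers are made up for illustration (real primers are usually around 18-25 bases), and only perfect matches are considered.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon(template: str, forward_primer: str,
             reverse_primer: str) -> Optional[str]:
    """Return the top-strand region bounded by the two primers, or None.

    template is the top strand, 5'->3'. Both primers are written 5'->3'.
    """
    template = template.upper()
    start = template.find(forward_primer.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the top strand, so the top strand
    # contains the reverse complement of the reverse primer.
    reverse_site = reverse_complement(reverse_primer)
    site_start = template.find(reverse_site, start + len(forward_primer))
    if site_start == -1:
        return None
    return template[start:site_start + len(reverse_site)]


TEMPLATE = "TTAAGACCTGAAACCCGGATCATTAA"  # hypothetical template
product = amplicon(TEMPLATE, "GACCTG", "TGATCC")
print(product)
```

Flanking sequence outside the primer sites (the TTAA at each end here) is not part of the product, which is why primer choice alone decides what is amplified.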

DNA Amplification - PCR Priming

Slide26

The PCR process is repeated as necessary until the target fragment is sufficiently amplified that it can be isolated, visualized, and/or manipulated

A key component of PCR is a thermostable polymerase, such as Taq polymerase

PCR can be used to amplify rare fragments from a pool of DNA, generate an abundance of a particular fragment from a single copy from a small sample (e.g. fossil DNA), generate samples of all DNA in a genome, and it is the foundation for many types of molecular markers

DNA Amplification - PCR Application

https://www.youtube.com/watch?v=2KoLnIwoZKU

Slide27

Single strand nucleic acids have a natural tendency to find and pair with other single strand nucleic acids with a complementary sequence

An application of this affinity is to label one single strand with a tag – radioactivity and fluorescent dyes are often used - and then to use this probe to find complementary sequences in a population of single stranded nucleic acids

For example, if you have a cloned gene – either a cDNA or a genomic clone - you could use this as a probe to look for a homologous sequence in another DNA sample

DNA Hybridization

Slide28

By denaturing the DNA in the sample, and using your labeled single stranded probe you can search the sample for the complementary sequence
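The search described above can be mimicked on sequence strings: a single-stranded probe pairs wherever the denatured sample strand contains the probe's reverse complement. The sketch below reports those positions. The sample and probe are invented, and only perfect matches are found, whereas real hybridization tolerates mismatches depending on stringency.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hybridization_sites(sample_strand: str, probe: str) -> List[int]:
    """0-based start positions on sample_strand where probe pairs perfectly."""
    target = reverse_complement(probe)
    sample = sample_strand.upper()
    positions = []
    start = sample.find(target)
    while start != -1:
        positions.append(start)
        start = sample.find(target, start + 1)
    return positions


SAMPLE = "ACGTTGCAACGGATCCTTGCA"  # hypothetical denatured sample strand
PROBE = "TGCAA"                   # hypothetical labeled probe
print(hybridization_sites(SAMPLE, PROBE))
```

Here the probe finds two complementary sites, at positions 3 and 16 of the sample strand.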

Pairing of probe and sample can be visualized by the label – e.g. on X-ray film or by measuring fluorescence

The principle of hybridization can be applied to pairing events involving DNA:DNA, DNA:RNA, and protein:antibody

DNA Hybridization

Southern blot

Northern blot

Western blot

Slide29

Advances in technology have removed the technical obstacles to determining the nucleotide sequence of a gene, a chromosome region, or a whole genome

The starting point for any sequencing project – be it of a single cloned fragment or of an entire genome - is a defined fragment of DNA

Sequencing the DNA

Slide30
Slide31

Start with a defined fragment of DNA

Based on this template, generate a population of molecules differing in size by one base of known composition

Fractionate the population of molecules based on size

The base at the truncated end of each of the fractionated molecules is determined and used to establish the nucleotide sequence
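The four steps above can be sketched as a toy simulation: make one fragment for every possible termination point, mix them, sort by size, and read the last base of each fragment in order. This models only the logic of the method. The input strand is hypothetical, and the simulation assumes every fragment length is present and perfectly resolved.

```python
import random
from typing import List


def chain_terminated_fragments(new_strand: str) -> List[str]:
    """One fragment per termination point: every prefix of the new strand.

    Each fragment ends in the (labeled) base where synthesis stopped.
    """
    return [new_strand[:end] for end in range(1, len(new_strand) + 1)]


def read_sequence(fragments: List[str]) -> str:
    """Fractionate by size, then read the terminal base of each fragment."""
    by_size = sorted(fragments, key=len)  # stands in for electrophoresis
    return "".join(fragment[-1] for fragment in by_size)


new_strand = "GATTACAGGC"  # hypothetical newly synthesized strand
fragments = chain_terminated_fragments(new_strand)
random.Random(1).shuffle(fragments)  # in the tube the fragments are mixed
recovered = read_sequence(fragments)
print(recovered)
```

Because the fragments differ in length by exactly one base, sorting by size is enough to put the terminal bases back in sequence order.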

Sanger DNA Sequencing (old but still relevant)

Slide32

A dideoxynucleotide lacks a 3' OH and, once incorporated, it will terminate strand synthesis: there is no free 3' OH for the next nucleotide to join

Sanger Sequencing - ddNTPs

Slide33

Deoxynucleotide (dNTP)

Dideoxynucleotide (ddNTP)

Buffer

DNA polymerase

dNTPs

Labeled primer

Target DNA

ddGTP

ddATP

ddTTP

ddCTP

Decoding DNA – Sanger Sequencing

https://www.dnalc.org/view/15923-Cycle-sequencing.html

Slide34

Next Generation Sequencing - Illumina

https://www.youtube.com/watch?v=womKfikWlxM

https://www.illumina.com/technology/next-generation-sequencing.html

Slide35

PacBio

https://www.youtube.com/watch?v=v8p4ph2MAvI

Single Molecule Real Time Sequencing

Slide36

Sequencing considerations

Method | Read length | Accuracy | Reads per run | Time per run | Cost per 1 million bases (in US$) | Advantages | Disadvantages
Single-molecule real-time sequencing (Pacific Bio) | 5,500 bp to 8,500 bp avg (10,000 bp); maximum read length >30,000 bases | 99.999% consensus accuracy; 87% single-read accuracy | 50,000 per SMRT cell, or ~400 megabases | 30 minutes to 2 hours | $0.33–$1.00 | Longest read length. Fast. | Moderate throughput. Equipment can be very expensive.
Sequencing by synthesis (Illumina) | 50 to 300 bp | 98% | up to 3 billion | 1 to 10 days, depending upon sequencer and specified read length | $0.05 to $0.15 | Potential for high sequence yield, depending upon sequencer model and desired application. | Equipment can be very expensive. Requires high concentrations of DNA.
Chain termination (Sanger sequencing) | 400 to 900 bp | 99.9% | N/A | 20 minutes to 3 hours | $2400 | Long individual reads. | More expensive and impractical for larger sequencing projects.

Slide37

Arabidopsis thaliana has one of the smallest genomes known in the plant kingdom (135 Mb) and for this reason has become a favorite of plant molecular biologists

Psilotum nudum (the "whisk fern") is a far simpler plant than Arabidopsis (it has no true leaves, flowers, or fruit) and has a genome size of 2.5 x 10^11 bp

Dealing with the C value paradox and whole genome sequencing…technology, time, and $

Genome Size and whole genome sequencing

Slide38

Technologies for whole genome sequencing are evolving very rapidly and too fast for us to compare and contrast in this class

Key considerations

Cost

Speed

Read length

Assembly

Sequencing Developments

https://nanoporetech.com/

Slide39

RNA-seq: Target the transcriptome rather than the genome

http://rnaseq.uoregon.edu/

https://www.illumina.com/techniques/sequencing/rna-sequencing.html

Slide40

OSU Sequencing Resources

Slide41

Sequencing a plant genome

Slide42

OSU in the lead with plant genome sequencing

And the Beaver too

Slide43

Fragaria vesca

Herbaceous, perennial

2n=2x=14

240 Mb

Reference species for Rosaceae

Genetic resources

Forward and reverse genetics

Fragaria x ananassa: 2n=8x=56. The youngest crop? 250 years. Why sequence the “weed” when you could sequence the crop???

Slide44

Short reads

No physical reference

De novo assembly

Open source

Would they follow the same path in 2017?

Slide45

Genome Sequencing and Assembly

3 next-gen platforms (new in the 20-10’s – old school in the 20-teens)

39x coverage (average number of reads including a given nucleotide)

Contigs (overlapping reads) assembled into scaffolds (contigs + gaps)

Slide46
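The coverage figure quoted for the strawberry assembly (39x) relates total sequenced bases to genome size: mean coverage = total read bases / genome size. The sketch below uses the ~240 Mb genome size given in these slides. The 100 bp read length is a hypothetical value chosen only to turn bases into a read count, since the slides do not state the actual read totals.

```python
def mean_coverage(total_read_bases: float, genome_size_bp: float) -> float:
    """Average number of reads covering each nucleotide."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_read_bases / genome_size_bp


def read_bases_needed(target_coverage: float, genome_size_bp: float) -> float:
    """Total bases that must be sequenced to reach the target mean coverage."""
    return target_coverage * genome_size_bp


GENOME_SIZE_BP = 240_000_000      # ~240 Mb, from the slides
TARGET_COVERAGE = 39              # 39x, from the slides
HYPOTHETICAL_READ_LENGTH = 100    # assumption, not from the slides

bases = read_bases_needed(TARGET_COVERAGE, GENOME_SIZE_BP)
reads = bases / HYPOTHETICAL_READ_LENGTH
print(f"Bases needed for {TARGET_COVERAGE}x: {bases:,.0f}")
print(f"Reads needed at {HYPOTHETICAL_READ_LENGTH} bp each: {reads:,.0f}")
```

Coverage is an average: individual positions are covered by more or fewer reads, which is one reason assemblies still contain gaps.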

Genome Sequencing and Assembly

~3,200 scaffolds

N50 of 1.3 Mb (half of the assembled sequence is in scaffolds at least this long)

Over 95% (209.8 Mb) of total sequence is represented in 272 scaffolds

Flow cytometry to measure genome size: ~240 Mb

Why is this greater than the number of Mb sequenced (209)?

Slide47
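The N50 statistic quoted for the strawberry scaffolds can be computed as follows: sort the scaffolds from longest to shortest and add up their lengths until half of the total assembly is reached; the length of the scaffold that crosses that point is the N50. The scaffold lengths below are made-up numbers for illustration, not values from the strawberry assembly.

```python
from typing import List


def n50(lengths: List[int]) -> int:
    """Length L such that scaffolds of length >= L hold at least half the assembly."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running total always reaches half")


example_scaffolds = [80, 70, 50, 40, 30, 20, 10]  # hypothetical lengths (kb)
print(n50(example_scaffolds))
```

In this example the total is 300 kb and the two longest scaffolds (80 + 70) already reach 150 kb, so the N50 is 70 kb. A few long scaffolds dominate the statistic, which is consistent with 272 of roughly 3,200 scaffolds holding over 95% of the strawberry sequence.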

Anchoring the genome sequence to the genetic map

94% of scaffolds anchored to the diploid Fragaria reference linkage map using 390 genetic markers

Pseudochromosomes ~ linkage groups ~ karyotype

Of what use is a linkage map when you can have a whole genome sequence?

Slide48

Synteny

Prunus and F. vesca

Homologs

Orthologs

Paralogs

Nice picture, but of what use?

Slide49

The small genome size (240 Mb)

Absence of large genome duplications

Limited numbers of transposable elements, compared to other angiosperms - the driver of small genome size?

Slide50

Transcriptome sequence (cDNAs)

Organ specificity

Fruits and roots – different types of genes

“host gene deserts”

Slide51

Gene prediction

34,809 nuclear genes

flavor, nutritional value and flowering time

1,616 transcription factors

RNA genes: 569 tRNA, 177 rRNA, 111 spliceosomal RNAs, 168 small nuclear RNAs, 76 micro RNA and 24 other RNAs

Are these results expected?

Chloroplast genome

155,691 bp encodes 78 proteins, 30 tRNAs and 4 rRNA genes

Evidence of DNA transfer from plastid genome to the nuclear genome

Genetics accounting

Slide52

Strawberry unique gene clusters

Utility in practical horticulture?

Acknowledgements: Thanks to Merve Sekerli for the slide preparation