
## Presentation on theme: "BASIC of GENETICS"— Presentation transcript

Slide1

BASIC of GENETICS: What you need to know

Ahmed Rebai

Ahmed.rebai@cbs.rnrt.tn

Slide2

DNA: the code of life

DNA is a molecule made of four bricks (the nucleotides A, C, G and T).

Living cells and organisms have DNA within them.

DNA contains the ‘text’ of life.

Slide3

DNA

Slide4

From DNA to protein

Slide5

DNA

Parts of DNA are CODING (they give proteins): this is only 3% of the human genome but 95% of the yeast genome.

Parts of DNA are NON-CODING:

Introns

Regulatory regions of genes

Other (‘junk’ DNA!)

Slide6

GENE

Gene: a section of DNA that codes for a protein; the protein contributes to a trait.

A chromosome is a ‘chunk’ of DNA, and genes are parts of chromosomes.

Slide7

Genes … alleles

Because we have a pair of each chromosome, we have two copies of each gene.

These two copies can be identical in sequence or different: the alternative forms are called ALLELES.

Alleles can yield different phenotypes.

Slide8

ALLELE

Allele: one of the different ‘options’ for a gene.

Example: attached and unattached earlobes are the alleles of the gene for earlobe shape.

Slide9

DOMINANT/RECESSIVE

Dominant: an allele that blocks or hides a recessive allele.

Recessive: an allele that is blocked or hidden by a dominant allele.

Slide10

GENOTYPE

Genotype: a person’s set of alleles (gene options).

Genotypes can be noted by:

Two letters denoting the alleles: AA, AB, BB or, for single-nucleotide variations, for example AA, AG, GG

A digit: 1, 2, 3 or 0, 1, 2 (the number of copies of a chosen reference allele, e.g. 2, 1, 0 for AA, AG, GG with A as the reference)

Slide11

HOMOZYGOUS/HETEROZYGOUS

Homozygous: when a person’s two alleles for a gene are the same.

Heterozygous: when a person’s two alleles for a gene are different.

You get one allele from your mom and one from your dad.

If you get the same allele from your mom and your dad, you are homozygous for that gene.

If your mom gave you a different allele than your dad, you are heterozygous for that gene.

Slide12

PHENOTYPE

Phenotype: a person’s physical features, resulting from their genotype.

What you look like (your phenotype) is based on what your genotype is (your genes).

Slide13

Segregation: lessons from peas

Mendel (1822-1884) worked in the monastery of St. Thomas in the town of Brno (Brünn), in what is now the Czech Republic. Through a series of experiments on garden peas in 1856-1863, he discovered the laws of inheritance.

Slide14

Sexual reproduction

Slide15

Mendelian genetics: the laws

Slide16

Segregation

Slide17

Slide18

Segregation rules

1. Genes come in pairs, which means that a cell or individual has two copies (alleles) of each gene.

2. For each pair of genes, the alleles may be identical (homozygous WW or homozygous ww), or they may be different (heterozygous Ww).

3. Each reproductive cell (gamete) produced by an individual contains only one allele of each gene (that is, either W or w).

4. In the formation of gametes, any particular gamete is equally likely to include either allele (hence, from a heterozygous Ww genotype, half the gametes contain W and the other half contain w).

5. The union of male and female reproductive cells is a random process that reunites the alleles in pairs.

Slide19

Mendel’s first law

The Principle of Segregation: In the formation of gametes, the paired hereditary determinants separate (segregate) in such a way that each gamete is equally likely to contain either member of the pair.

Slide20

Recombination

Mendel studied the co-segregation of two genes by crossing: Wrinkled and Green x Round and Yellow.

Slide21

Slide22

Mendel’s second law

The Principle of Independent Assortment: Segregation of the members of any pair of alleles is independent of the segregation of other pairs in the formation of reproductive cells.

This is of course valid only for unlinked genes.

Slide23

Recombination

When two genes are linked (close together on the same chromosome) they do not segregate independently; the frequencies of genotypes in the progeny depend on the distance between the genes.

Slide24

Multiple genes for a phenotype: polygenic traits

Slide25

Continuous scale for a phenotype

Slide26

Let us exercise

What are the genotypes produced by the following matings, and their frequencies?

AA x AA

AA x Aa

AA x aa

Aa x Aa

Aa x aa

aa x aa

What are the frequencies of the two-gene genotypes from this mating: AABb x AaBB?

Slide27
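The mating exercise above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: the function names `gametes`, `cross` and `cross_two_loci` are hypothetical, genotypes are written as two-letter strings per gene, and the two genes are assumed to be unlinked so that Mendel's second law applies.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """Segregation: each allele of a one-gene genotype ends up in half of the gametes."""
    return {allele: Fraction(n, 2) for allele, n in Counter(genotype).items()}


def cross(parent1, parent2):
    """Offspring genotype frequencies for a one-gene mating, e.g. cross("Aa", "Aa")."""
    offspring = {}
    for (a1, f1), (a2, f2) in product(gametes(parent1).items(), gametes(parent2).items()):
        # Sort the two alleles so that "aA" and "Aa" count as the same genotype.
        genotype = "".join(sorted(a1 + a2))
        offspring[genotype] = offspring.get(genotype, 0) + f1 * f2
    return offspring


def cross_two_loci(parent1, parent2):
    """Two unlinked genes (assumption): multiply the per-gene frequencies.

    Genotypes are written gene by gene, e.g. "AABb".
    """
    gene1 = cross(parent1[:2], parent2[:2])
    gene2 = cross(parent1[2:], parent2[2:])
    return {g1 + g2: f1 * f2
            for (g1, f1), (g2, f2) in product(gene1.items(), gene2.items())}


for mating in [("AA", "AA"), ("AA", "Aa"), ("AA", "aa"),
               ("Aa", "Aa"), ("Aa", "aa"), ("aa", "aa")]:
    print(mating, cross(*mating))
print(cross_two_loci("AABb", "AaBB"))
```

Exact fractions are used instead of floats so that the results can be compared directly with the hand-computed answers.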

Population Genetics

Basic concepts and theories

Slide28

Probability in population genetics

Consider the offspring of the mating Aa x Aa.

The addition rule:

Pr(an offspring has at least one A allele) = Pr(A-) = Pr(AA or Aa) = Pr(AA) + Pr(Aa) = 1/4 + 1/2 = 3/4

For any two mutually exclusive events A and B: Pr(A or B) = Pr(A) + Pr(B)

The multiplication rule:

Pr(two offspring each having at least one A allele) = Pr(A- and A-) = Pr(A-) x Pr(A-) = 3/4 x 3/4 = 9/16

For any two independent events A and B: Pr(A and B) = Pr(A) x Pr(B)

Slide29

Exercise

Two individuals with genotypes Aa and Aa married and had three children; what is the probability that exactly one of their children has the genotype aa?

Pr(aa and (AA or Aa) and (AA or Aa)) = Pr(aa) x Pr(A-) x Pr(A-) = 1/4 x 3/4 x 3/4 = 9/64

But since the aa child has three possible birth orders, we should multiply by 3, which gives 27/64.

Compute the same probability for the case of two children. (Answer: 6/16; for 4 children it is also 27/64.)

Slide30
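The birth-order argument in the exercise above is the binomial formula: the number of possible orders is a binomial coefficient. A small Python sketch (the function name `prob_exactly` is hypothetical, not from the slides) reproduces the three answers with exact fractions:

```python
from fractions import Fraction
from math import comb


def prob_exactly(k, n, p):
    """Probability that exactly k of n independent children have a genotype of probability p."""
    # comb(n, k) counts the possible birth orders of the k affected children.
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_aa = Fraction(1, 4)  # Pr(aa) for an Aa x Aa mating
for n_children in (2, 3, 4):
    print(n_children, prob_exactly(1, n_children, p_aa))
```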

Organization of genetic variation

A population is a group of organisms of the same species living within a sufficiently restricted geographical area that any member can potentially mate with any other member (of the opposite sex).

Population subdivision can be due to geographic constraints as well as to social behaviour.

Local populations (by country, town, ...): groups of individuals that can interbreed, also called subpopulations or Mendelian populations.

Slide31

Genetic variation

Phenotypic diversity in natural populations is impressive and is due to genetic variation: multiple alleles for many genes affecting the phenotype.

Population genetics is concerned with describing how alleles are organized into genotypes and with determining whether alleles of the same or of different genes are associated at random.

Slide32

Allele frequencies in populations

Allele frequency is the proportion, among all alleles of the gene in the population, that are of the specified type.

Since populations are large, allele frequencies are estimated from a population sample.

Consider a gene with genotypes AA, Aa and aa, and a sample of N individuals.

We count the number of individuals that have the AA, Aa and aa genotypes (denoted $N_{AA}$, $N_{Aa}$ and $N_{aa}$, respectively) and we estimate the frequency of allele A by the proportion of A alleles among all alleles segregating in the population, that is:

$$p_A = \frac{2N_{AA} + N_{Aa}}{2N}$$

and then $p_a = 1 - p_A$.

Slide33

Example

In a sample of 1000 individuals, 298 were of genotype MM, 489 MN and 213 NN, so the frequency of allele M is

$$p_M = \frac{2 \times 298 + 489}{2 \times 1000} = 0.54$$

We can compute a 95% confidence interval for the frequency based on the binomial law and the normal approximation:

$$p \pm 1.96 \sqrt{\frac{p(1-p)}{2N}}$$

This approximation is only valid for frequencies that are neither small (> 0.1) nor high (< 0.9).

In the example we get [0.52 ; 0.56].

Slide34
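The allele-frequency estimate and its confidence interval from the example above can be sketched in a few lines of Python. The function names are hypothetical, and the interval is assumed to be the usual normal-approximation (Wald) interval on the 2N sampled alleles, which reproduces the interval quoted in the example.

```python
from math import sqrt


def allele_frequency(n_AA, n_Aa, n_aa):
    """Frequency of allele A: count of A alleles over all 2N sampled alleles."""
    n = n_AA + n_Aa + n_aa
    return (2 * n_AA + n_Aa) / (2 * n)


def wald_interval(p, n_individuals, z=1.96):
    """Normal-approximation confidence interval (95% for z = 1.96) on 2N alleles."""
    half_width = z * sqrt(p * (1 - p) / (2 * n_individuals))
    return p - half_width, p + half_width


p_M = allele_frequency(298, 489, 213)
low, high = wald_interval(p_M, 1000)
print(round(p_M, 4), round(low, 2), round(high, 2))
```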

For rare alleles

For rare alleles (frequency less than 1%) there is a chance that a sample does not contain any carrier of the allele, so the frequency estimate will be 0.

An alternative is to use empirical Bayes estimation, which gives p = (k+2)/(n+4), where k is the observed number of copies of the allele in the sample and n the total number of alleles. (This is the posterior mean under a Beta(2,2) prior; a strictly uniform Beta(1,1) prior gives (k+1)/(n+2).)

Slide35
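A small Python sketch of the shrinkage estimate for rare alleles discussed above. It assumes a Beta(a, b) prior on the allele frequency, so that the estimate is the posterior mean (k + a)/(n + a + b); the function name and the parameters `a` and `b` are hypothetical, with a = b = 2 giving the (k+2)/(n+4) form.

```python
def shrunk_frequency(k, n, a=2, b=2):
    """Posterior-mean allele frequency under an assumed Beta(a, b) prior.

    k: observed copies of the allele, n: total number of alleles sampled.
    a = b = 2 gives (k + 2) / (n + 4); a = b = 1 (uniform prior) gives (k + 1) / (n + 2).
    """
    return (k + a) / (n + a + b)


# An allele never seen among 200 sampled alleles no longer gets an estimate of exactly 0.
print(shrunk_frequency(0, 200))
print(shrunk_frequency(0, 200, a=1, b=1))
```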

Random mating

Random mating means that any two individuals (of opposite sex) have the same probability of mating.

This means that genotypes meet each other with the same probability as if they were formed by random collision of genotypes.

Random mating can apply to some genes, like those controlling blood groups or neutral polymorphisms, but not to others, like those controlling skin color or height.

Slide36

Slide37

Non-overlapping generations

Formally, this means that the cycle of birth, maturation and death includes the death of all individuals present in each generation before the members of the next generation mature.

This is only an approximation (simplistic in humans) but works well as far as genotype frequencies are concerned.

Slide38

The Hardy-Weinberg principle

If we assume that:

The organism is diploid

Reproduction is sexual

Generations are non-overlapping

Allele frequencies are identical in males and females

The population is of large size

Mating is random

Migration and mutation are negligible

Natural selection does not affect the alleles

Slide39

Then...

Genotype frequencies can be deduced from allele frequencies (p is the frequency of allele A, q = 1 - p that of allele a):

AA: p²   Aa: 2pq   aa: q²

These frequencies (allelic and genotypic) remain the same over generations: we say that the population is in Hardy-Weinberg Equilibrium (HWE).

Slide40
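The equilibrium claim above can be illustrated numerically. This is a minimal Python sketch under the Hardy-Weinberg assumptions (random mating, no selection, mutation or migration); the function names and the starting genotype frequencies are made up for illustration.

```python
def hwe_genotypes(p):
    """Genotype frequencies p^2, 2pq, q^2 for allele frequency p of A."""
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def next_generation(genotypes):
    """One round of random mating: pool all gametes, then pair them at random."""
    # Every AA individual transmits A; heterozygotes transmit A half of the time.
    p = genotypes["AA"] + genotypes["Aa"] / 2
    return hwe_genotypes(p)


start = {"AA": 0.5, "Aa": 0.2, "aa": 0.3}  # hypothetical population, not in HWE
gen1 = next_generation(start)
gen2 = next_generation(gen1)
print(gen1)
print(gen2)
```

Under these assumptions the genotype frequencies reach p², 2pq, q² after a single generation and do not change afterwards, while the allele frequency p itself never changes.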

Why?

Slide41

Implications of HWE

Despite its very restrictive (and strictly incorrect) assumptions, HWE offers a reference model in which there are no evolutionary forces at work other than those imposed by the process of reproduction itself (like a mechanical model of a falling object with no force acting other than gravity).

The HW model separates the life cycle into two phases: gametes -> zygote and zygote -> adult.

Even if the assumption of non-overlapping generations is not true, HWE will be attained gradually.

It also applies to multiallelic genes.

Slide42

Implications of HWE

Slide43

Application of HWE

We can calculate the number of carriers of a rare mutation in the population.

Example: the frequency of cystic fibrosis patients in the European population is known to be 1 in 1700 (so q² = 1/1700 and q = 0.024), so the frequency of heterozygotes is (by HWE) about 5%.

So when an allele is very rare, most of the genotypes containing it are heterozygous.

Exercise: show that for a rare allele of frequency 1/1000 there are 2000 times more heterozygotes than recessive homozygotes.

Slide44
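The carrier calculation above follows directly from the HWE genotype frequencies. A short Python sketch, with hypothetical function names and assuming the disease is recessive so that the patient frequency equals q²:

```python
from math import sqrt


def carrier_frequency(patient_frequency):
    """Heterozygote frequency 2pq, taking the patient frequency as q^2 (HWE assumed)."""
    q = sqrt(patient_frequency)
    return 2 * (1 - q) * q


def het_to_hom_ratio(q):
    """Ratio of heterozygotes (2pq) to recessive homozygotes (q^2), i.e. 2p/q."""
    return 2 * (1 - q) / q


print(round(carrier_frequency(1 / 1700), 3))
print(round(het_to_hom_ratio(1 / 1000)))
```

The ratio 2p/q grows as q shrinks, which is why almost all copies of a rare recessive allele sit in heterozygous carriers.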

HWE deviation

Deviation from HWE can be due to inbreeding, population stratification, selection, gender-dependent allele frequencies, or non-random (assortative) mating.

The principle does not apply directly to X-linked or Y-linked genes.

Slide45

Tests of HWE

Compare observed to expected genotype counts using the Pearson chi-square test of goodness of fit: with 3 genotypes and 1 parameter estimated (p), we have a test with 1 df.

This test is inappropriate for rare variants (low genotype counts): use the Fisher Exact Test (FET) instead.

Other exact tests are available in the R language (e.g. the genetics package, ...).

Slide46

Pearson chi-square through D

Let $D_A = P_{AA} - p^2$

Testing HWE is testing $D_A = 0$

The test statistic is $\chi^2 = N D_A^2 / (p(1-p))^2$

Compute p-value = $\Pr(\chi^2_{1\,df} > \chi^2_{obs})$

If p-value < 0.05 (or 0.0001), then there is deviation from HWE.

Slide47

Tests of HWE: let’s do it!

Example: In a sample of 1000 individuals, 298 were of genotype MM, 489 MN and 213 NN.

| Genotypes | MM | MN | NN |
|---|---|---|---|
| Observed counts | 298 | 489 | 213 |
| Expected counts | 294.3 | 496.4 | 209.3 |

$p_M = 0.54$, $p_M^2 = 0.294$ and observed $P_{MM} = 0.298$, so $D = 0.298 - 0.294 = 0.004$

$\chi^2 = N D^2 / (p(1-p))^2 = 1000 \times (0.004 / (0.54 \times 0.46))^2$

$\chi^2 = 0.25 < 3.84$; p-value = 0.61

Slide48
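The worked example above can be reproduced with the Python standard library. This is a sketch with a hypothetical function name; it uses the Pearson goodness-of-fit form of the statistic and the identity that, for 1 df, the chi-square tail probability equals `erfc(sqrt(x / 2))`. Because it works with unrounded intermediate values, the statistic comes out slightly different from the rounded hand calculation.

```python
from math import erfc, sqrt


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Pearson chi-square test of Hardy-Weinberg proportions (1 df)."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # estimated frequency of allele A
    q = 1 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Tail probability of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return expected, chi2, p_value


expected, chi2, p_value = hwe_chi_square(298, 489, 213)
print([round(e, 1) for e in expected], round(chi2, 2), round(p_value, 2))
```

A p-value far above 0.05, as here, gives no evidence of deviation from HWE.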

Haplotypes from genotypes

If we study many genes, they can be linked, and one can use haplotypes.

A haplotype (haploid genotype) is the set of alleles carried by one chromosome for several genes.

Consider two genes (A,a) and (B,b) with allele frequencies ($p_A$, $p_a$) and ($p_B$, $p_b$).

If the gametic frequencies are the products of the allele frequencies:

AB: $p_A \times p_B$, Ab: $p_A \times p_b$, aB: $p_a \times p_B$, ab: $p_a \times p_b$

we say that the genes are in random association, or in linkage equilibrium.

Slide49

Slide50

Linkage disequilibrium

If the observed frequency of a gamete (e.g. $P_{AB}$) differs from that expected under linkage equilibrium ($p_A \times p_B$), we say that the genes are in Linkage Disequilibrium (LD).

To measure and test LD, we need to know the haplotype frequencies.

Slide51

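Linkage Disequilibrium

[Figure: two SNPs (SNP1 and SNP2, alleles A/a and B/b) with allele frequencies 40%/60% and 30%/70%. With no LD, the four haplotype frequencies are the products 12%, 28%, 18% and 42%. Under linkage disequilibrium (LD), only three haplotypes are observed, with frequencies 60%, 30% and 10%.]

Slide52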

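LD measures: D

D is the difference between the observed and the expected haplotype frequency:

$$D = P_{AB} - p_A p_B$$

It is also equal to

$$D = P_{AB} P_{ab} - P_{Ab} P_{aB}$$

D is bounded between $D_{min}$ and $D_{max}$:

$$D_{min} = \max(-p_A p_B, -p_a p_b), \qquad D_{max} = \min(p_A p_b, p_a p_B)$$

Slide53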

D’: standardized D

In practice, choose alleles A and B such that D > 0 and $p_A > p_B$, so that $D_{max} = (1 - p_A) p_B$.

A standardized measure of LD is thus:

$$D' = \frac{D}{D_{max}}$$

D’ = 1 denotes complete LD.

Slide54

The r² measure: more practical

This is the squared correlation from the 2x2 contingency table of haplotype counts or, in terms of D:

$$r^2 = \frac{D^2}{p_A p_a p_B p_b}$$

Slide55

Testing LD

We can show that N r² is a chi-square test of LD (1 df), where N is the number of haplotypes.

Exercise: two blood group systems, M/N and S/s, gave the following haplotype counts (1000 individuals, i.e. 2000 haplotypes):

MS: 474   Ms: 611   NS: 142   Ns: 773

Allele frequencies are M: 0.54, S: 0.31

Compute D, D’ and r², and test LD.

Solution: D = 0.07, D’ = 0.50, r² = 0.09, X² = 185, p < 10^-40

Slide56
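The LD exercise above can be checked with a short Python sketch. The function name is hypothetical and the formulas are the standard definitions: D as observed minus expected haplotype frequency, D' as |D| over its maximum possible value, and r² as D² over the product of the four allele frequencies. The Ns count is taken as 773 here, an assumption made so that the four counts sum to the 2000 haplotypes carried by 1000 individuals and reproduce the stated allele frequencies.

```python
def ld_measures(n_AB, n_Ab, n_aB, n_ab):
    """D, D', r^2 and the chi-square statistic from four haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab  # number of haplotypes
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    d = n_AB / n - p_A * p_B       # observed minus expected frequency of AB
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # Assumes both loci are polymorphic, otherwise the denominator is zero.
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return {"p_A": p_A, "p_B": p_B, "D": d, "D_prime": d_prime,
            "r2": r2, "chi2": n * r2}


result = ld_measures(n_AB=474, n_Ab=611, n_aB=142, n_ab=773)  # MS, Ms, NS, Ns
print({name: round(value, 3) for name, value in result.items()})
```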

Causes of LD

LD is ‘created by linkage’.

If r is the recombination rate between two genes, then we can show that LD at generation t is given by

$$D_t = (1-r)^t D_0$$

If r is small (genes very close on the chromosome), the decay is very slow and LD can persist for hundreds of generations.

Slide57
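The decay formula above, LD at generation t equal to (1-r)^t times its initial value, is easy to explore numerically. A minimal Python sketch with hypothetical function names; the recombination rates used are illustrative values only.

```python
from math import log


def ld_after(d0, r, t):
    """LD remaining after t generations of random mating with recombination rate r."""
    return (1 - r) ** t * d0


def generations_to_halve(r):
    """Number of generations needed for D to fall to half of its initial value."""
    return log(0.5) / log(1 - r)


for r in (0.5, 0.1, 0.01, 0.001):
    print(r, round(generations_to_halve(r), 1), round(ld_after(0.25, r, 100), 4))
```

For unlinked genes (r = 0.5) the disequilibrium halves every generation, whereas for tightly linked genes it takes hundreds of generations.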

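Recombination and LD

[Figure: gamete frequencies produced by a double heterozygote: (1-r)/2 for each non-recombinant gamete and r/2 for each recombinant gamete.]

Slide58

Decay of LD over generations

Slide59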

Slide60

Admixture of populations

LD can be created by the merging of populations having different gametic frequencies.

Consider two populations and two genes in linkage equilibrium in both, where alleles A and B both have frequency 0.05 in the first population and 0.95 in the second population.

A new population is formed by an equal mixture of the two populations. Show that LD is high in that population (D = 0.2 and D’ = 0.81).

Slide61

Admixture

Slide62

Natural (Darwinian) selection

Individuals differ in their ability to survive and reproduce, owing in part to their genotype.

The selective advantage/disadvantage is measured by fitness.

Selection results in a change of allele frequencies over generations and in deviation from HWE.

Slide63

Effect of selection

Slide64

Random Genetic Drift

In each generation there is an element of chance in the drawing of the gametes that will unite to form the next generation.

This chance can result in a random change in allele frequency and may ultimately lead to the fixation or elimination of some alleles.

Slide65

Simply saying

Slide66

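Mathematical models of Drift

Wright-Fisher model (1930): in a population of N diploid individuals, the probability of obtaining k copies of an allele that had frequency p in the last generation is:

$$\Pr(k) = \binom{2N}{k} p^k (1-p)^{2N-k}$$

The expected time (in generations) before a neutral allele becomes fixed through genetic drift is given by:

$$\bar{T}_{fixed} = \frac{-4 N_e (1-p) \ln(1-p)}{p}$$

where $N_e$ is the effective population size.

Slide67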
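The two drift quantities mentioned above can be sketched in Python. This assumes the standard binomial form of the Wright-Fisher transition probability (2N allele copies drawn with replacement) and the usual diffusion approximation for the expected fixation time of a neutral allele; the function names and the population sizes are hypothetical.

```python
from math import comb, log


def wright_fisher_pmf(k, n_diploid, p):
    """Probability of k copies in the next generation, given current frequency p."""
    two_n = 2 * n_diploid  # a diploid population of size N carries 2N allele copies
    return comb(two_n, k) * p**k * (1 - p) ** (two_n - k)


def expected_fixation_time(p, n_e):
    """Expected generations until fixation of a neutral allele (given it fixes)."""
    return -4 * n_e * (1 - p) * log(1 - p) / p


# Chance that an allele at frequency 0.1 is lost in one generation when N = 10.
print(round(wright_fisher_pmf(0, 10, 0.1), 3))
print(round(expected_fixation_time(0.5, 100), 1))
```

Smaller populations give a wider spread of k around 2Np, which is why drift is strongest after a bottleneck or a founder event.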

Slide68

Population bottleneck

Slide69

Founder effect

Slide70

Population substructure

A population has substructure when it is organized into several subpopulations having different genetic compositions (allele frequencies).

Substructure generally results in a reduction of the heterozygote frequency relative to that expected under random mating (Wahlund principle).

Several measures are available to assess population substructure: the F-statistics.

Slide71

F-statistics

Defined by Wright (1921):

$$(1 - F_{IT}) = (1 - F_{IS})(1 - F_{ST})$$

Slide72

Another formulation

The most useful statistic to test substructure is $F_{ST}$, an index that measures the level of genetic divergence among subpopulations:

$$F_{ST} = \frac{H_T - H_S}{H_T}$$

$H_S$: average heterozygosity among individuals within subpopulations

$H_T$: average heterozygosity among individuals within the total population

It can also be written in terms of the variance of allele frequencies among subpopulations: $F_{ST} = \sigma_p^2 / (\bar{p}(1-\bar{p}))$

It can be calculated with the R package hierfstat.

$F_{ST}$ is not a genetic distance.

Slide73

How to use it?

$F_{ST} = 1$ means total divergence by fixation of alternative alleles in subpopulations.

$F_{ST} < 0.05$: little differentiation

$0.05 < F_{ST} < 0.15$: moderate

$0.15 < F_{ST} < 0.25$: high

$F_{ST} > 0.25$: very high

Chi-square test with 1 df: $X^2 = (k-1) N F_{ST}$

Examples: between European and sub-Saharan African: 0.15; Japanese-African: 0.19; Europeans: 0.11

Slide74

Example

Two populations where the allele frequency is 0.5 and 0.3

Slide75
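The two-population example above can be worked through in Python. This is a sketch under stated assumptions: the subpopulations are of equal size, each is in HWE for a gene with two alleles (so its expected heterozygosity is 2p(1-p)), and the function name `fst` is hypothetical.

```python
def fst(allele_freqs):
    """F_ST = (H_T - H_S) / H_T for a two-allele gene in equal-sized subpopulations."""
    k = len(allele_freqs)
    # H_S: mean expected heterozygosity within subpopulations.
    h_s = sum(2 * p * (1 - p) for p in allele_freqs) / k
    # H_T: expected heterozygosity of the pooled population.
    p_bar = sum(allele_freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)
    return h_s, h_t, (h_t - h_s) / h_t


h_s, h_t, f_st = fst([0.5, 0.3])
print(round(h_s, 2), round(h_t, 2), round(f_st, 3))
```

The deficit of heterozygotes in the pooled population (H_S below H_T) is the Wahlund effect.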

Admixture

Genetic admixture occurs when individuals from two or more previously separated populations begin interbreeding. Admixture results in the introduction of new genetic lineages into a population.

Most human populations are a product of the mixture of genetically distinct groups that intermixed within the last 4,000 years.

Slide76

Admixture detection

By testing HWE

Standard statistical methods applied to data on genotype, allele or haplotype frequencies: Principal Component Analysis (PCA); clustering (K-means, hierarchical, ...)

Advanced methods:

Maximum likelihood (psmix R package)

Bayesian methods

Wavelet analysis (adwave R package)

STRUCTURE

Slide77

Principal component analysis

Slide78

Clustering

Slide79

STRUCTURE

Uses include inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed.

http://pritchardlab.stanford.edu/structure.html

Slide80

Admixture

https://www.genetics.ucla.edu/software/admixture/

Slide81

R packages

genetics: classes and methods for handling genetic data. Includes classes to represent genotypes and haplotypes at single markers up to multiple markers on multiple chromosomes. Functions include allele frequencies, flagging homo/heterozygotes, flagging carriers of certain alleles, estimating and testing for Hardy-Weinberg disequilibrium, estimating and testing for linkage disequilibrium, ...

adegenet: classes and functions for genetic data analysis within the multivariate framework.

hierfstat: estimation of hierarchical F-statistics from haploid or diploid genetic data with any number of levels in the hierarchy, following the algorithm. Functions are also given to test via randomisation the significance of each F and variance component.

Slide82

Recommended readings