
Part1: Large-scale gene expression ( - PowerPoint Presentation

Uploaded On 2022-06-18

Presentation Transcript

Slide1

Part1: Large-scale gene expression (transcriptomic) data analysis

Ståle Nygård, Bioinformatics core facility, OUS/UiO

staaln@ifi.uio.no

Slide2

Gene expression

Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product.

Slide3

Transcriptomic data

- “Genome-wide” measurements of gene expression (several thousand gene transcripts)
- Are often used to find differentially expressed genes
  - Between groups of individuals (with different phenotypes, e.g. disease/healthy, long/short survival, etc.)
  - Over time (e.g. as disease develops, as tissue develops)

Slide4

Development of transcriptomics

- 1977: Multiple Northern blots
- 1987: Macroarrays
- 1995: cDNA microarrays
- 1996: Oligonucleotide microarrays
- 2003: High density arrays
- 2005: High throughput sequencing (RNA sequencing)
- Future: Next-next generation sequencing: true single molecule sequencing, e.g. NanoPore technology (http://www.nanoporetech.com)

Slide5

Alternative splicing (example)

Slide6

Microarrays vs RNA-Seq

Slide7

Microarray pipeline (simplified)

- Sample
- Nucleic acid purification
- RNA/DNA
- Amplification and labelling
- Labeled RNA/DNA
- Hybridisation, washing
- Scan, quantitate
- Raw data
- Pre-processing
- Bioinformatic analysis

Slide8

RNA sequencing

Slide9

RNA sequencing

Slide10

RNA sequencing

Slide11

Bioinformatic analysis of RNA sequencing data – main steps

- Alignment to transcriptome
- Assembly (finding isoforms)
- Count reads (per isoform or gene)
- Normalization
- Differential expression (per isoform or gene)
- Functional analysis

Slide12

Normalization

Goal: remove technical artifacts, which can be due to

- Different amounts of input material
- Different degrees of degradation
- Dust, scratches etc. on the arrays
- and more

Most normalization methods assume that the overall intensity is the same for different samples (e.g. quantile normalization).

Slide13

Quantile normalization

Enforce equal distribution between the microarrays. Procedure:

- Sort the expression values for each microarray from highest to lowest
- Calculate the mean value for each rank
- For every array
  - Let the highest ranked gene have the mean value of the highest ranked genes (of all arrays)
  - Let the second highest ranked gene have the mean value of the second highest ranked genes (of all arrays)
  - And so on for all ranks
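The quantile normalization procedure can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `quantile_normalize` and the toy data are assumptions made for the example, and ties are simply broken by input order rather than averaged.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equally long expression vectors (one per array)."""
    n_genes = len(arrays[0])
    # Sort the expression values of each array from highest to lowest.
    sorted_arrays = [sorted(a, reverse=True) for a in arrays]
    # Mean value for each rank, taken across all arrays.
    rank_means = [sum(col) / len(arrays) for col in zip(*sorted_arrays)]
    normalized = []
    for a in arrays:
        # Gene indices ordered from highest to lowest expression in this array.
        order = sorted(range(n_genes), key=lambda i: a[i], reverse=True)
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            # The gene at this rank receives the mean of that rank over all arrays.
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy data (hypothetical): two arrays, three genes each.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(arrays)
print(normalized)
```

After normalization every array contains exactly the same set of values, only assigned to different genes.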

Slide14

Normalization using TMM (Trimmed Mean of M-values)

- Highly expressed genes have a big influence on library size
- In TMM the genes with the smallest and largest ratios (i.e. 40% of the genes) are not used in the normalization.
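The idea behind TMM can be illustrated with a deliberately simplified sketch. It is not the published TMM algorithm, which also trims on average expression and uses a weighted mean. Here the M-values (log ratios of library-size-scaled counts) are sorted, a fraction is dropped from each tail, and the rest are averaged without weights. The function name, the `trim_per_tail` parameter (0.2 per tail, i.e. 40% in total as on the slide) and the counts are assumptions for illustration.

```python
import math


def simple_tmm_factor(sample, reference, trim_per_tail=0.2):
    """Unweighted trimmed mean of M-values, returned as a scaling factor."""
    n_s, n_r = sum(sample), sum(reference)
    # M-value per gene: log2 ratio of library-size-scaled counts.
    m_values = sorted(
        math.log2((s / n_s) / (r / n_r))
        for s, r in zip(sample, reference)
        if s > 0 and r > 0  # genes with zero counts have no defined log ratio
    )
    k = int(len(m_values) * trim_per_tail)
    kept = m_values[k:len(m_values) - k]  # drop the k smallest and k largest ratios
    return 2 ** (sum(kept) / len(kept))


# Hypothetical counts: the sample equals the reference except for one
# very highly expressed gene that inflates its library size.
reference = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
sample = reference[:-1] + [10000]
factor = simple_tmm_factor(sample, reference)
effective_library_size = sum(sample) * factor
print(factor, effective_library_size)
```

Because the single extreme gene is trimmed away, the effective library size of the sample comes out close to that of the reference, which is the behaviour the slide describes.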

Slide15

Testing for differential expression (microarrays)

- Ordinary t-test
- Variance estimates can be improved by ”borrowing strength” across genes in a technique called variance shrinkage. Many methods use this technique, e.g. SAM.
- Non-parametric methods (e.g. rank product)

NB! The ordinary t-test works well for large sample sizes.
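As a rough illustration of the difference between an ordinary t-statistic and a variance-stabilised one, the sketch below computes both for a single gene. The second statistic follows the SAM-style form d = difference / (s + s0), where s0 is a small positive constant. In SAM s0 is estimated from all genes; here it is simply passed in as a hypothetical value, and the function name and data are assumptions for the example.

```python
import math
from statistics import mean, variance


def t_and_moderated(group_a, group_b, s0=0.0):
    """Return (ordinary two-sample t statistic, SAM-style statistic) for one gene."""
    n_a, n_b = len(group_a), len(group_b)
    diff = mean(group_a) - mean(group_b)
    # Pooled variance of the two groups (equal-variance t-test).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    # Adding s0 to the denominator damps statistics that are large only
    # because the gene's estimated variance happens to be very small.
    return diff / se, diff / (se + s0)


# Hypothetical log-expression values for one gene in two groups.
t_stat, d_stat = t_and_moderated([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], s0=0.5)
print(t_stat, d_stat)
```

With s0 = 0 the two statistics coincide; a gene with zero variance in both groups would need separate handling, since the ordinary statistic is then undefined.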

Slide16

Testing for differential expression (RNA-seq data)

Slide17

Slide18

Transcriptomic data analysis - summary

Slide19

Microarray vs RNA-Seq

Advantages RNA-Seq

- Can handle alternative splicing
- Claimed to be more robust to degradation
- Now also cheaper

Advantages microarrays

- Claimed higher accuracy for lowly expressed genes
- Analysis tools are more mature

From: Differential analysis of gene regulation at transcript resolution with RNA-seq (Trapnell et al., Nature Biotechnology, 2013).

Slide20

Correction for multiple testing

In ordinary microarray studies (looking at all genes), use false discovery rates instead of ordinary p-values.
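The slide does not say which false discovery rate procedure is meant. A common choice is the Benjamini-Hochberg adjustment, sketched here as an assumption; the function name and p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original gene order."""
    m = len(p_values)
    # Gene indices ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four genes.
p_values = [0.01, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(p_values)
print(q_values)
```

Calling genes significant when the adjusted value is below, say, 0.05 controls the expected proportion of false positives among the called genes, rather than the per-gene error rate.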

Slide21

Hierarchical clustering

Genes and samples can be clustered at the same time

- Agglomerative: start with each element as its own cluster (bottom-up). Most common
- Divisive: start with all elements in one large cluster (top-down)
- Dendrogram: a cluster tree

Why cluster genes?

- Reduce complexity
- Generate hypotheses, e.g. hypothesize that a group of genes with similar expression profiles interact or are involved in the same process

Why cluster samples?

- Identify known subgroups
- Find new or more detailed subgroups
- Quality check (detect outliers)
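The agglomerative (bottom-up) strategy can be sketched as follows. This is a naive illustration, assuming Euclidean distance and average linkage, neither of which is specified on the slide. The function name and the toy expression profiles are hypothetical, and the sequence of recorded merges is what a dendrogram would draw.

```python
import math


def agglomerative(profiles, n_clusters):
    """Naive average-linkage agglomerative clustering; returns (clusters, merges)."""
    # Start with every element as its own cluster.
    clusters = [[i] for i in range(len(profiles))]
    merges = []

    def linkage(a, b):
        # Average pairwise Euclidean distance between two clusters.
        total = sum(math.dist(profiles[i], profiles[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the two closest clusters and merge them.
        d, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return clusters, merges


# Hypothetical expression profiles: two obvious groups of two genes.
profiles = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0)]
clusters, merges = agglomerative(profiles, n_clusters=2)
print(clusters)
```

The same function clusters samples instead of genes if the profiles are given per sample.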

Slide22

Functional analysis

- Over-representation analysis (ORA): finding pre-defined gene sets in which regulated genes are over-represented. The gene sets can be
  - Gene Ontology categories (molecular functions, biological processes, cellular components)
  - Pathways (signalling, metabolic)
- Map (pair-wise) molecular interactions onto the set of regulated genes using e.g.
  - Protein-protein interactions
  - Transcription factor binding information

Slide23

GO structure

- Terms are related within a hierarchy
- Describes multiple levels of detail of gene function
- Terms can have more than one parent or child

Slide24

Pathway analysis - example

Slide25

Fisher's exact test

Background population: 500 black genes (differentially expressed genes), 5000 red genes (not differentially expressed genes)

Gene group (GO term, pathway): Gene A, Gene B, Gene C, Gene D, Gene E

P-value (from the null distribution)

Answer = 4.6 x 10^-4
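A one-sided Fisher's exact test for over-representation is the upper tail of a hypergeometric distribution, which can be sketched with the standard library alone. The function and argument names are assumptions for illustration. The transcript does not say how many of the five genes in the group are differentially expressed, so the value 4 used below is hypothetical, and the result is not claimed to reproduce the number on the slide.

```python
from math import comb


def enrichment_p_value(n_background, n_regulated, n_group, n_group_regulated):
    """P(at least n_group_regulated regulated genes in a random group of n_group genes)."""
    upper = min(n_group, n_regulated)
    # Number of ways to draw the gene group from the background population.
    total = comb(n_background, n_group)
    # Sum the hypergeometric probabilities for outcomes at least as extreme.
    tail = sum(
        comb(n_regulated, k) * comb(n_background - n_regulated, n_group - k)
        for k in range(n_group_regulated, upper + 1)
    )
    return tail / total


# 500 regulated + 5000 non-regulated genes in the background, a gene group of
# 5 genes, of which 4 (hypothetical) are regulated.
p = enrichment_p_value(5500, 500, 5, 4)
print(p)
```

In an over-representation analysis this test is repeated for every gene set, so the resulting p-values also need correction for multiple testing.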

Slide26

Network construction based on microarray data

- Network construction from genomic data is difficult. Many possible combinations of interactions.
- Network construction could be guided by including external information about interactions.

Examples

- Seeded Bayesian networks (Djebbari et al., 2008)
- Bioconductor package BioNet

BioNet example