Uploaded On 2019-11-26


Transcriptomics: towards RNASeq
Federico M. Giorgi, federico.giorgi@gmail.com
Genome Analysis and Bioinformatics, Master's Degree Course in Plant and Animal Biotechnology

Overview of the course
• Transcription and Transcriptomics
• Transcriptomics methods: Microarrays
• Exercises on Microarray analysis
• RNASeq
• RNASeq exercises
• Real cases, further applications
Schedule: Day 1, 23/04/2012, Room β3; Day 2, 02/05/2012, Room 35; Day 3, 07/05/2012, Room β3

Background: Transcriptomics
The transcriptome is the complete set of transcripts in a cell, and their quantities, for a specific developmental stage or physiological condition.

Background: Transcriptomics
Transcriptomics is the study of the transcriptome.
Key aims of transcriptomics:
• To catalogue all species of transcript
• To determine the transcriptional structure(s) of genes
• To quantify the changing expression levels of each transcript during development and under different conditions
Final goals:
• To monitor molecular responses to treatments
• To understand how the cell functions

Transcriptomics network
Transcripts interact with each other, and with proteins, lipids, metabolites...
Some mRNAs get translated into transcription factors, which control the expression of other transcripts...
A representation of these relationships is usually a complex NETWORK.

Transcriptomics techniques: hybridization-based
Northern blot
• The first method for specific RNA detection
• Based initially on radioactive probes
• Not fully quantitative
• Low yield
• Still sometimes required as validation for more modern techniques
Microarrays
• Can cover almost an entire transcriptome
• Exon arrays can detect specific splice variants
• Tiling arrays can detect intergenic regions
• Still less expensive than high-throughput sequencing (NGS)
• Problems with cross-hybridizing probes
• Require prior knowledge of the organism, for probe design

Transcriptomics techniques: Real Time Quantitative PCR
• Calculates the Ct (the PCR cycle at which the specific transcript appears above a detection threshold)
• Normalizes the Ct against two or more housekeeping genes, whose expression is usually constant across tissues (e.g. Tubulin or EF1α in Arabidopsis thaliana)
• Can be used efficiently for up to 100 transcripts, and is considered the gold standard for single-gene papers
• Requires prior knowledge of the organism, for primer design (but degenerate primers can be used)
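The Ct normalization described above can be sketched in a few lines. This is a minimal illustration in plain Python (not the R used in the exercises), assuming the common 2^-ΔΔCt calculation and a perfect PCR efficiency of one doubling per cycle; the function name and the Ct values are hypothetical and do not come from the course material.

```python
def relative_expression(ct_target_treated, ct_refs_treated,
                        ct_target_control, ct_refs_control):
    """Fold change of a target transcript by the 2^-ddCt calculation.

    The reference Ct of each sample is the mean Ct of its housekeeping genes.
    Assumes one doubling of product per PCR cycle.
    """
    ref_treated = sum(ct_refs_treated) / len(ct_refs_treated)
    ref_control = sum(ct_refs_control) / len(ct_refs_control)
    delta_treated = ct_target_treated - ref_treated   # dCt in the treated sample
    delta_control = ct_target_control - ref_control   # dCt in the control sample
    ddct = delta_treated - delta_control
    return 2.0 ** (-ddct)


# Hypothetical Ct values: one target gene and two housekeeping genes per sample
fold = relative_expression(22.0, [18.0, 20.0], 24.0, [18.0, 20.0])
print(fold)
```

A lower Ct means the transcript crossed the threshold earlier, so a negative ΔΔCt corresponds to induction in the treated sample.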

Transcriptomics techniques: sequence-based
Sanger
• Long sequences (1500 nt)
• Low throughput
• Expensive
• Very rarely used for transcriptomes
Tag-based (SAGE, CAGE, MPSS)
• Higher throughput
• Expensive
• Mapping problems
RNASeq
• High throughput
• Wide transcriptome coverage
• Fully quantitative
• High information/cost ratio
• Doesn't require prior knowledge of the organism: everything is sequenced... even contaminants
• Requires a consistent bioinformatics infrastructure
• Requires specific and fast software to handle the data

Microarray classes
Double channel (non-Affymetrix: Agilent, Nimblegen, custom spotting)
• Two samples are tagged with a green (Cy3) or a red (Cy5) fluorophore, and then compared at the same time on the chip
Single channel (Affymetrix)
• Normalization is achieved within the array itself, e.g. (Affymetrix) by the presence of Mismatch probes paired with Perfect Match probes

Affymetrix Microarrays
Workflow: total mRNA → reverse transcription → cDNA → in vitro transcription → biotin-labeled cRNA → fragmentation → hybridization on the GeneChip expression array → wash and stain → scan and quantitate → data processing

Affymetrix Microarrays
• A chip consists of a number of probesets
• Probesets are intended to measure expression for a specific mRNA/gene (or family of genes)
• Probesets consist of a collection of 25-mer probes selected from the target sequence

Microarray Probes
• Gene / mRNA
• "Perfect match" probeset (~11 probes per gene)
• "Mismatch" probeset (to detect unspecific signal)
Affymetrix ATH1 Arabidopsis GeneChip: 22810 genes (probesets)


Microarray: data processing
• Raw optical image (DAT file)
• Raw intensity values (CEL file)

Microarray: data processing
• Raw intensity values (CEL file)
• Background subtraction
• Normalization: makes arrays comparable
• Summarization: summarizes the 11 probes' values into 1 gene value
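To make the "makes arrays comparable" step concrete, here is a minimal sketch of quantile normalization, the normalization used by RMA. It is written in plain Python for illustration (not the R used in the exercises); the function name and the toy intensities are hypothetical, and ties are broken arbitrarily rather than averaged as production implementations usually do.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length arrays (one list per chip).

    Every array ends up with the same distribution: the k-th smallest value
    of each array is replaced by the mean of the k-th smallest values
    across all arrays.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices by intensity
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two toy arrays with three probes each
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
print(normalized)
```

After this step both arrays contain exactly the same set of values, only assigned to different probes, which is what makes between-array comparisons meaningful.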

Preprocessing: RMA and MAS5
• Background correction. RMA: RMA background model. MAS5: zone effect, subtraction of mismatch probes.
• Normalization. RMA: quantile. MAS5: scale.
• Summarization. RMA: median polish (multi-array). MAS5: Tukey biweight robust average (single-array).
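The median polish summarization can be sketched as follows. This is a simplified illustration in plain Python, assuming a small probes-by-arrays matrix of log2 intensities for one probeset; the function names, the fixed number of sweeps, and the toy matrix are assumptions, and the real RMA implementation differs in detail.

```python
from statistics import median


def median_polish(matrix, n_iter=10):
    """Fit value = overall + probe effect + array effect + residual.

    Rows are probes, columns are arrays. Row and column medians are swept
    out alternately for a fixed number of iterations.
    """
    rows, cols = len(matrix), len(matrix[0])
    resid = [list(row) for row in matrix]
    overall = 0.0
    row_eff = [0.0] * rows
    col_eff = [0.0] * cols
    for _ in range(n_iter):
        for i in range(rows):                      # sweep out probe (row) medians
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        for j in range(cols):                      # sweep out array (column) medians
            m = median(resid[i][j] for i in range(rows))
            col_eff[j] += m
            for i in range(rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return overall, row_eff, col_eff


def summarize_probeset(matrix):
    """One expression value per array: overall level plus array effect."""
    overall, _, col_eff = median_polish(matrix)
    return [overall + c for c in col_eff]


# Three probes measured on two arrays (log2 scale); probe affinities differ
summary = summarize_probeset([[8, 10], [9, 11], [7, 9]])
print(summary)
```

Because the fit uses all arrays together, the probe effects are shared across chips, which is why the slide labels this summarization "multi-array".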

Which one works better?
• RMA/GCRMA is excellent for differential expression (see the Cope et al. 2004 benchmark)
• MAS5, by keeping arrays separated, is good for other tasks that require independence of measurements, e.g. Gene Network Reconstruction (Lim et al., 2007)

Other less known (and never used) microarray preprocessing methods
• dChip (Li-Wong, 2001)
• FARMS (Hochreiter, 2006; performs better than all methods in benchmark studies; it's also very fast, but nobody uses it)
• PLIER (new Affymetrix standard, recent winner in a comparison with RT-PCR measurements; Gyorffy et al., 2009)
• tRMA (Giorgi et al., 2010; fixes problems in multi-matching probesets, a common issue with microarray data)


Probesets mapping to Arabidopsis genes: 0 mismatches
BLAST bitscore >= 50.1, 25/25 alignment, 0 mismatches

Probesets mapping to Arabidopsis genes: 1 mismatch
BLAST bitscore >= 42.1, 21/25 alignment, 1 mismatch

Probesets mapping to Arabidopsis genes: 2 mismatches
BLAST bitscore >= 34.2, 17/25 alignment, 2 mismatches
Probes with 2 mismatches detect an mRNA with a signal only 20% smaller than that of a probe with 0 mismatches (Hooyberghs et al., 2009)
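As a toy illustration of what "0, 1 or 2 mismatches" means for a probe, the sketch below counts mismatches between a probe and every ungapped window of a transcript. It is plain Python and deliberately much simpler than BLAST (no bitscores, no gaps, no strand handling); the function names and the short sequences are hypothetical, and real Affymetrix probes are 25-mers.

```python
def count_mismatches(probe, window):
    """Number of positions at which two equal-length sequences differ."""
    if len(probe) != len(window):
        raise ValueError("sequences must have the same length")
    return sum(1 for a, b in zip(probe, window) if a != b)


def best_ungapped_match(probe, transcript):
    """Slide the probe along the transcript.

    Returns (offset, mismatches) of the first window with the fewest
    mismatches, or None if the transcript is shorter than the probe.
    """
    best = None
    for offset in range(len(transcript) - len(probe) + 1):
        window = transcript[offset:offset + len(probe)]
        mm = count_mismatches(probe, window)
        if best is None or mm < best[1]:
            best = (offset, mm)
    return best


# Toy 4-mer probe against a toy transcript
hit = best_ungapped_match("ACGT", "TTACGATT")
print(hit)
```

A probe that matches several genes within one or two mismatches is the multi-matching situation that the slides flag as a common issue with microarray data.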

Microarrays vs. RNASeq
Count data (RNASeq) vs. continuous signal (microarrays)

Microarrays vs. RNASeq
Signal vs. quantity plots
[Figure: microarray signal plotted against the real quantity of transcript, with the limit of detection and the saturation region marked]

Microarrays vs. RNASeq
Gene length bias (Tarazona 2011)

Exercises
"I see and I forget. I read and I remember. I do and I understand."
Confucius, 551 B.C. to 479 B.C.

Exercise: get some microarrays
Google for: Gene Expression Omnibus
Browse at your leisure, e.g. by looking for Arabidopsis thaliana experiments.
How many samples are available for this organism? (When I started my Master's thesis, in 2007, there were 5500.)

Exercise: get the Norflurazon dataset
1. Look for this
2. Click on the series data

Exercise: get the Norflurazon dataset
3. Scroll down (information on the data, etc.)
4. Download the CEL files via ftp

Exercise: fast microarray analysis 1
Download the Norflurazon data from Gene Expression Omnibus:
mkdir -p /home/ngs/transcriptomics
cd /home/ngs/transcriptomics
wget http://giorgilab.org/GSE12887.RAW.tar
tar xvf GSE12887.RAW.tar
gunzip *gz

Exercise: fast microarray analysis 2
Set up your R:
cd /home/ngs/transcriptomics
R
setRepositories()             # tells R where to find Bioinformatics tools (select «BioC software») (and an Italian mirror)
install.packages("affyPLM")   # installs a library for Affymetrix microarray processing
install.packages("limma")     # installs a library for differential gene expression analysis

Exercise: fast microarray analysis 3
Normalize your microarrays:
library(affyPLM)                                      # loads the library for Affymetrix data
list.celfiles()                                       # check that your CEL files are in the directory
abatch <- ReadAffy(filenames=list.celfiles()[1:6])    # loads the first 6 CEL files into an AffyBatch object
eset <- rma(abatch)                                   # normalization; alternatives: rma, mas5, gcrma...

Exercise: fast microarray analysis 4
Differential expression for genes:
library(limma)                                    # loads a library for differentially expressed genes (DEG)
groups <- c("Control", "Norflurazon")
samples <- as.factor(c(1,1,1,2,2,2))              # define the replicate structure
design <- model.matrix(~ -1+samples)
colnames(design) <- groups
contrast.matrix <- makeContrasts(Norflurazon-Control, levels=design)   # tell R what to compare (easy here, only two groups)
fit <- lmFit(eset, design)                        # fit a linear model to the data
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)

Exercise: fast microarray analysis 5
Print your data:
output <- cbind(fit2$coef, fit2$p.val)            # get the fold change (M) and associated p-value for every gene
rownames(output) <- rownames(fit2)
write.table(output, file="norflurazon.txt", sep="\t", quote=FALSE)                          # save your genes into a tab-separated file
write.table(output[fit2$p.val<0.05,], file="norflurazon.sig.txt", sep="\t", quote=FALSE)    # save only the significantly changed genes
q()                                               # quit R (press y to confirm)
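To see what the fold change (M) in the output file means for a single gene, the following plain-Python sketch computes it for three control and three treated arrays, together with an ordinary two-sample t statistic. This is a simplification: limma's eBayes computes a moderated t statistic that borrows information across all genes, which is not reproduced here. The function names and the expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def log2_fold_change(control, treated):
    """M value: difference of mean log2 expression (treated minus control)."""
    return mean(treated) - mean(control)


def t_statistic(control, treated):
    """Ordinary pooled-variance two-sample t statistic (not limma's moderated t)."""
    n1, n2 = len(control), len(treated)
    pooled = ((n1 - 1) * variance(control) + (n2 - 1) * variance(treated)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled * (1 / n1 + 1 / n2))


# Hypothetical log2 expression values of one probeset on six arrays
control = [7.0, 7.5, 8.0]
treated = [9.0, 9.5, 10.0]
m = log2_fold_change(control, treated)
t = t_statistic(control, treated)
print(m, 2 ** m, t)
```

An M of 2 on the log2 scale corresponds to a fourfold induction, which is the threshold used later in the MapMan exercise.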

Exercise: fast microarray analysis 6
Read your file lists:
• You can open them with LibreOffice Calc (Excel) or gedit, or any text editor
• You can sort them by p-value
• The «strange» identifiers are Affymetrix ids. You can convert them into gene names/functions on TAIR (www.arabidopsis.org): Search → microarray element

Exercise: visualize your output
Download the following program: Mapman, a graphical tool for transcript and metabolite systems biology
http://mapman.gabipd.org/web/guest/mapman
cd /home/ngs/transcriptomics
wget http://giorgilab.org/MapManInst-3_1_0.jar
java -jar MapManInst-3_1_0.jar
If it doesn't find Java, indicate /usr/bin. Install Java... Try without, it should work.
If for some reason R didn't work:
cd /home/ngs/transcriptomics
wget http://giorgilab.org/norflurazon.txt
wget http://giorgilab.org/norflurazon.sig.txt

Exercise: MapMan
You must open MapMan. The left menu shows Mapping, Pathway and Data.
Right click, «add data», and add «norflurazon.txt»

Exercise: MapMan
Click on «show pathway»



Exercise: MapMan
Play with «scale», e.g. if you want to see only the genes with a Log2FC of at least 2 (induced or repressed at least 4 times)


Exercise: MapMan
You can change pathway on the left menu

Exercise: MapMan
You can see which GROUPS of genes are repressed. What is the effect of this Norflurazon?

Exercise: get the programs
Download the following program: Robin, a graphical tool for RNASeq and Microarray data analysis
http://mapman.gabipd.org/web/guest/robin

Exercise: start Affymetrix analysis
Select «Affymetrix GeneChip microarray experiment»

Exercise: start Affymetrix analysis
• Start a new project. Any folder is good; you can pick for example /home/ngs/transcriptomics/microarray1
• Add new data
• Pick the CEL files

Exercise: quality checks
• Select only the first six CEL files: GSM323075-GSM323080 (to save time)
• Check in the expert options: how many normalization methods do you see?
• Click Next

Exercise: quality checks
You should now see an activity icon in the lower right corner. Running all calculations might take some time.

Exercise: quality checks
When finished, you are presented with an overview of the results. Let's go through them by clicking on individual results (click to enlarge).

Exercise: investigating probe intensity
Each Affymetrix array has several probes per probeset (i.e. most often a gene).
The intensity distribution should be similar across arrays. This can be seen in boxplots and in the "hist" plot, which shows a smoothed histogram.

Exercise: investigating probe intensity
In the histogram the distribution should be unimodal, i.e. we should see one peak. Ideally it should be Gaussian.
On most chips we see two peaks: a very sharp one towards lower signals and a broader one for stronger signals. Usually, the left one indicates a mixture of noise signals and non-hybridizing probes. The right one indicates true hybridization.
The following example shows S. pombe material hybridized on a S. cerevisiae microarray.

Scatter plots of one array against another one
Scatter plots simply plot the log intensity of one array against that of another. Ideally most points should lie around the blue 45 degree line, because most genes should be unchanged.
Red lines indicate a deviation of more than one log unit from this unchanged state. In these plots we can again clearly see two populations of points.
Click through some of the scatter plots and eyeball them.

Exercise: RNA degradation plot
As several probes detect one gene, one can order them from 5' to 3' and calculate averages.
One would expect the 3'-most probes to be the strongest.
Most important is that the lines are more or less parallel, indicating a similar degradation.
[Figure: probes along an mRNA from 5' to the 3' poly(A) tail, with the degradation direction marked]

Exercise: quality checks, MA plots
One classical way to look at arrays is by looking at MA plots.
• M is array1 (channel 1) minus array2 (channel 2)
• A is the average signal, (array1 + array2)/2
The red line in the plots represents the moving average; it should be close to the zero line.
Deviations that are too strong are flagged, but there are none here.
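The M and A definitions above translate directly into code. A minimal sketch in plain Python, assuming both arrays are already on the log2 scale; the function name and the toy values are hypothetical.

```python
def ma_values(array1, array2):
    """Return the M (difference) and A (average) values for two log2 arrays."""
    m = [x - y for x, y in zip(array1, array2)]
    a = [(x + y) / 2 for x, y in zip(array1, array2)]
    return m, a


# Two toy arrays with three probesets each (log2 intensities)
m, a = ma_values([8.0, 10.0, 6.0], [8.0, 7.0, 6.5])
print(m)
print(a)
```

Plotting M against A shows whether the differences between two arrays depend on the overall intensity, which is what the moving-average line in Robin's plots checks.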

Exercise: Probe Level Model (PLM) plots
PLM plots might show hybridization artifacts. These usually show up as regular patterns or as extensive greenish areas on a chip.
The upper left corner and the middle are always regularly white; this is fine and part of the chip layout.

Exercise: PCA
PCA: Principal Component Analysis
Clustering and projecting the data should reflect the experimental structure.
PCA projects the data into e.g. two dimensions, where each axis is orthogonal to the others and the axes are linear combinations of the genes chosen to maximize the observed variance.
[Figure: first axis explains the most variance; second axis explains the second most variance]

Exercise: Hierarchical Clustering
Sometimes the experiment structure is easier to visualize when clustering the data.
In our example clustering reflects the experimental groups.
The arrays -29 and -32 don't cluster by biological origin.

Exercise: Differential Expression
Now we just distribute the files into groups (use the groups from the GEO webpage)

Exercise: Experimental Design
We can have a look at the contrast between pretreatments and/or conditions by control-clicking on one and then dragging the mouse to generate an arrow.
For more complex designs one can create meta-groups of contrasts to ask for differences of differences (interaction terms in ANOVA).

Exercise: Experimental Design
Complex designs: Treated Mutant, Untreated Mutant, Treated Wild type, Untreated Wild type


Exercise: Differentially expressed genes
• Robin normalized the arrays using Robust Multichip Average (RMA). Other methods are available via the Expert options.
• Robin then uses the BioConductor package limma (linear models). This is very similar to ANOVA-type analyses.
• P-values are calculated using moderated tests, which take into account the expression of all probes.
• P-values should be corrected for multiple testing (FDR).
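The multiple-testing correction mentioned in the last point can be sketched with the Benjamini-Hochberg procedure, a widely used way to control the FDR. This plain-Python version is only an illustration; that Robin uses exactly this procedure by default is an assumption, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four probesets
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(adjusted)
```

With about 22 thousand probesets on the ATH1 chip, a raw threshold of p < 0.05 would let through on the order of a thousand false positives even if nothing changed, which is why the adjusted values should be used for filtering.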

Exercise: MA plots again
In the resulting MA plots the average in the control group is compared to the average in the treatment group (M = treatment minus control).
• Positive M: upregulated in treatment
• Negative M: downregulated in treatment
• Red circles surround significantly changed genes (with high variation AND concordance of replicates)

Conclusions
• Microarrays were the first platform to achieve transcriptome coverage
• They have issues compared to RNASeq, but they are well established in diagnostics and control experiments
• More than 9000 samples for Arabidopsis thaliana alone
• In brief: if you do Bioinformatics, you will have to work with them for years to come
• You have thousands of genes (probesets) measured in a single microarray experiment: interpreting the data is not trivial
Next episode: Next Generation Sequencing

Final slide

Transcriptomic effects of CPV treatment
Dissecting the response of Arabidopsis thaliana to the administration of Plant Protein Concentrate (Concentrato Proteico Vegetale, CPV)
Federico M. Giorgi, fgiorgi@appliedgenomics.org

Experimental design
Organism: Arabidopsis thaliana
Time points: hour 0, hour 4, hour 12, hour 24
Treatments: CPV, YE, H80, NHL, AK, F100, MC, CSL
Control: Ctrl

Treatments: clustering
[Figure: clustering of the treatments, showing a YE cluster, circadian clusters (4h, 12h, 0/24h) and a CPV/MC cluster]

Contrasts: treatment vs. control
The treatment vs. control contrast is also called LogFC («Log Fold Change»).
[Figure legend: genes induced by CPV; genes repressed by CPV; unaltered genes; each point is a measured gene (about 22 thousand on our microarray)]

CPV treatment: over-represented functional groups
Significantly over-represented Mapman ontological groups, Bonferroni q<0.05 (Mefisto tool)
Groups labeled in the figure: Lipid transfer proteins; Seed storage/lipid transfer proteins; Lipid transfer proteins; Abscisic acid activation; GDSL lipases; Terpenoid metabolism; Unknown; Storage proteins; Peroxiredoxins; Peroxidases; Biotic stress receptors; DC1 domain-containing proteins; Biotic stress receptors; Misc signalling receptor kinases; DC1 domain-containing proteins; Peroxidases; Peroxidases; DC1 domain-containing proteins; Metal binding, chelation and storage; Biotic stress
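One common way to test whether a functional group is over-represented among the altered genes is a hypergeometric (one-sided Fisher) test followed by a Bonferroni correction over all groups tested. The plain-Python sketch below illustrates that idea only; whether the Mefisto tool uses exactly this test is an assumption, and the function names and counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, group_size, n_selected, n_total):
    """P(X >= k): probability of seeing at least k group members among
    n_selected genes drawn without replacement from n_total genes,
    of which group_size belong to the functional group."""
    upper = min(group_size, n_selected)
    favourable = sum(
        comb(group_size, i) * comb(n_total - group_size, n_selected - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(n_total, n_selected)


def bonferroni(pvalues):
    """Bonferroni correction: multiply by the number of tests, cap at 1."""
    n_tests = len(pvalues)
    return [min(1.0, p * n_tests) for p in pvalues]


# Toy example: 10 genes in total, 3 in the group, 2 altered genes, both in the group
p = hypergeom_upper_tail(2, 3, 2, 10)
print(p)
print(bonferroni([0.01, 0.5]))
```

Bonferroni is the most conservative of the usual corrections, so groups that pass it, like those listed above, are unlikely to be chance findings.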

The response to CPV is similar across the three time points

But in reality, as often happens, the bulk of the transcriptional machinery remains unaltered


The response to CPV is similar across the three time points
• Transcripts induced by CPV at all time points: 9 genes, including 3 LEA genes (chaperones, antifreeze and antidrought proteins, regulated by the abscisic acid, ABA, pathway; Hundertmark & Hincha, 2008)
• Transcripts repressed by CPV: 14 genes, including 3 peroxidases (general stress response peroxidases; the same ones are repressed by 10 µM ABA treatment of seedlings; Kim et al., 2011)

Timing of the transcriptional response to CPV
Genes altered (Log2FC > 0.5) in at least one of the CPV treatments

Immediate-response genes, specific to the early phase
Panels: Early Induction genes; Early Repression genes
• Hsp20 chaperonin; invertase/pectin methylesterase inhibitor; metacaspase 2; unknown genes
• Transcription factors, laccases, several pectin degradation genes, a LEA protein

Genes specific to the late phase
Panels: Late Induction genes; Late Repression genes
• Cell wall rearrangement
• ABC transporter; peroxidases

Dissecting the effect of CPV
Response to CPV vs. all published treatments and known effects in Arabidopsis thaliana
Federico, Antonietta
From now on we will use all three time points as replicates

Conditions similar to the CPV treatment
The CPV treatment is very similar (RED lines) to ABA treatment (Goda et al., AtGenExpress consortium) and to the physiological response of the plant to osmotic stress. It has the opposite effect (BLUE line) to Norflurazon treatment and to the ARR22 overexpressor.
ARR22 participates in a His-Asp phosphorelay pathway. Transgenic lines overexpressing ARR22 (referred to as ARR22-ox) showed the characteristic dwarf phenotypes with poorly developed root systems. The results of Northern blot hybridization with selected sets of hormone-responsive genes suggested that cytokinin responses are selectively attenuated in ARR22-ox, while other hormone responses (auxin, ABA and ethylene) occur normally (Kiba et al., 2004)

Conditions similar to the CPV treatment
Score | Experiment | Genotype | Treatment / Tissue | Control Experiment | Experimental Category | Spearman correlation between significant fold changes
0 | CPV treatment | Col-0 | CPV / whole plants | Ctrl | Perata Group | 1.00
1 | ABA 3 h | Col-0 | 10 µM ABA / seedling | Mock 3 h | Basic hormone treatment of seedlings | 0.77
2 | ARR22-ox (t-zeatin 3 h) | ARR22-ox | 20 µM t-zeatin / seedling | Col-0 (t-zeatin 3 h) | Cytokinin treatment of seedlings | -0.70
3 | ABA 1 h | Col-0 | 10 µM ABA / seedling | Mock 1 h | Basic hormone treatment of seedlings | 0.65
4 | Norflurazon | Col-0 | 5 µM Norflurazon / seedling | Col-0 | GEO bulk | -0.65
5 | Osmotic Stress 24 h (root) | Col-0 | osmotic stress (300 mM mannitol) / roots | Stress mock 24 h (root) | Stress treatments | 0.63
6 | UV Stress 24 h (shoot) | Col-0 | UV-B stress (15 min 1.8 W/m2 Philips TL40W/12; thereafter recovery) / shoots | Stress mock 24 h (shoot) | Stress treatments | -0.60
7 | Salt Stress 24 h (root) | Col-0 | salt stress (150 mM NaCl) / roots | Stress mock 24 h (root) | Stress treatments | 0.57
8 | Red/FR Light 45 min | Col-0 | Red Light / seedling | | Light treatments | 0.56
9 | Fe starvation 24 h | Col-0 | Fe starvation / whole roots | Control | GEO bulk | 0.56
10 | Mutant cry1 | cry1 | Continuous white light / seedling | Col-0 | GEO bulk | 0.55
AtCAST experiment similarity method (Sasaki et al., 2011). Note: the FARO tool gives similar results (we used AtCAST because it contains more experiments).
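The similarity measure in the last column is a Spearman correlation between fold changes. The plain-Python sketch below shows how such a rank correlation can be computed for two lists of Log2FC values over the same genes; it is an illustration only, not the AtCAST implementation, and the function names and fold changes are hypothetical.

```python
from math import sqrt


def ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical Log2FC values for four genes in two experiments
same_direction = spearman([0.6, 1.2, 2.5, 3.1], [0.1, 0.4, 1.9, 5.0])
opposite = spearman([0.6, 1.2, 2.5], [2.0, 0.3, -1.5])
print(same_direction, opposite)
```

A value near +1 means the two treatments move the same genes in the same direction (as for ABA in the table), while a value near -1 means opposite responses (as for Norflurazon and ARR22-ox).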

Pathway analysis
Similar response across the time points: CPV 4 hours

Pathway analysis
Similar response across the time points: CPV 12 hours

Pathway analysis
Similar response across the time points: CPV 24 hours

Pathway analysis
• Induction: ABA metabolic pathway
• Cell wall pectin degradation: glycosyl hydrolases, xyloglucan hydrolases, polygalacturonases
• Repression: peroxidases
[Figure color scale: Log2FC]

Induction of anthocyanin synthesis
Anthocyanins have long been known as metabolites that accumulate during cold stress, helping among other things to lower the freezing temperature (Christie et al., 1994).
Transcription factors involved (CPV 24 hours): At2g43140, At1g10585
• Putative bHLH transcription factors, with a robust response at all three time points
• No specific paper describes them
• Possible new candidate genes for the transcriptional response

Although some transcriptional responses are shared (LEA proteins), the effect of CPV is only vaguely similar to that of cold stress (AtGenExpress consortium).
The response to CPV probably amounts to a mild cold acclimation (ABA-dependent).

All the data
http://www.giorgilab.org/biostimolanti/cpv.html
Note: the page is not indexed by Google, but it is also not encrypted.

Conclusions
CPV induces a marked increase in the transcriptional pathways of response to ABA and to osmotic stress, bringing the plants to a «late» state of preparation for the stress itself, probably similar to acclimation.
The combined use of
1) monitoring the effects across different time points,
2) locating transcripts and assigning them to pathways,
3) comparing our experiments with public databases
allowed us to describe the effect of CPV despite the lack of biological replicates for the treatment, and the resulting intrinsic statistical «weakness».
This not only helps explain, at least in part, the temperature-protective effects of CPV on plants, but also points research towards new potential detectors in the plant response to stress conditions, such as the not yet characterized bHLH transcription factors.

Overview of the course
Day 1 (23/04/2012, Room β3): Transcription and Transcriptomics

Exercise: set up your R
Open the terminal:
wget http://www.usadellab.org/cms/uploads/supplementary/trma/trma.tar.gz
R CMD INSTALL trma.tar.gz
Type R (plus ENTER). You're now in the R tool for statistics.
Install the microarray analysis tools:
setRepositories()             # tells R where to find Bioinformatics tools (select «BioC software»)
install.packages("affyPLM")   # installs a library for Affymetrix microarray processing
install.packages("limma")     # installs a library for differential gene expression analysis
install.packages("plier")     # installs the optional normalization method
And exit:
q()
Always type y to confirm.
