Slide1
RIP – Transcript Expression Levels
Slide2
Outline
RNA Immuno-Precipitation (RIP)
NGS on RIP & its alternatives
Alternate splicing
Transcription as a graph
Distribution of tags in exons
Pipeline on RIP-seq dataset
Slide3
RNA Immuno-Precipitation (RIP)
Global identification of multiple RNA targets of RNA-Binding Proteins (RBPs)
Identify proteins associated with RNAs in RNP complexes
Identify subsets of RNAs that are functionally related and potentially co-regulated
Slide4
How is RIP performed?
Slide5
Sequencing on RIP
RIP-Chip
Noisy
May miss rare transcripts
RIP-RT-PCR
PCR introduces mutations
RIP tiling arrays
Very expensive
Too sensitive to ‘transcriptional noise’
Slide6
NGS on RIP
RIP-Seq
A more complete and unbiased assessment of the global population of RNAs associated with an RNP complex
Minimizes the sequencing bias and high background known to affect the previously mentioned methods
Slide7
Alternate Splicing
A simple example
Regions with the numbers of reads
Exon1: chr1:13113087-13113138(5,1);
Exon2: chr1:13113270-13113299(2,0);
Exon3: chr1:13113312-13113343(3,0);
Splice reads
chr1,13113107,13113138,chr1,13113312,13113343,3.0;
chr1,13113087,13113116,chr1,13113270,13113299,2.0;
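The region and splice-read lines above can be turned into graph edges by checking which exon each half of a splice read falls into. The following is a minimal sketch, assuming the first number in parentheses is the tag count and that a splice read belongs to an exon when it lies entirely inside it; the meaning of the second number in parentheses is not stated on the slide, so it is ignored. All function names are hypothetical.

```python
import re

# Lines copied from the slide.
EXON_LINES = [
    "Exon1: chr1:13113087-13113138(5,1);",
    "Exon2: chr1:13113270-13113299(2,0);",
    "Exon3: chr1:13113312-13113343(3,0);",
]
SPLICE_LINES = [
    "chr1,13113107,13113138,chr1,13113312,13113343,3.0;",
    "chr1,13113087,13113116,chr1,13113270,13113299,2.0;",
]

EXON_RE = re.compile(r"(\w+): (\w+):(\d+)-(\d+)\((\d+),(\d+)\);")


def parse_exon(line):
    """Return (name, (chrom, start, end, tags)) for one region line."""
    name, chrom, start, end, tags, _unknown = EXON_RE.match(line).groups()
    return name, (chrom, int(start), int(end), int(tags))


def parse_splice(line):
    """Return (left_segment, right_segment, count) for one splice-read line."""
    c1, s1, e1, c2, s2, e2, count = line.rstrip(";").split(",")
    return (c1, int(s1), int(e1)), (c2, int(s2), int(e2)), float(count)


def containing_exon(exons, segment):
    """Name of the exon that fully contains the segment, or None."""
    chrom, start, end = segment
    for name, (e_chrom, e_start, e_end, _tags) in exons.items():
        if chrom == e_chrom and e_start <= start and end <= e_end:
            return name
    return None


def build_edges(exon_lines, splice_lines):
    """Map each splice read to an (exon, exon) edge weighted by its count."""
    exons = dict(parse_exon(line) for line in exon_lines)
    edges = {}
    for line in splice_lines:
        left, right, count = parse_splice(line)
        key = (containing_exon(exons, left), containing_exon(exons, right))
        edges[key] = count
    return exons, edges


exons, edges = build_edges(EXON_LINES, SPLICE_LINES)
print(edges)
```

On this data the two splice reads become the edges Exon1 to Exon3 (3 reads) and Exon1 to Exon2 (2 reads).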
[Figure: exons labelled as Exon_Num(Tags): Exon1(5), Exon2(2), Exon3(3)]
Slide8
Alternate Splicing
A less ideal example
Regions with the numbers of reads
Exon1: chr4:145149018-145149181(29,0);
Exon2: chr4:145149265-145149402(8,0);
Exon3: chr4:146893298-146895275(116,1);
Splice reads
chr4,145149059,145149088,chr4,146894246,146894276,3.0;
chr4,145149374,145149402,chr4,146894470,146894498,2.0;
[Figure: exons labelled as Exon_Num(Tags): Exon1(29), Exon2(8), Exon3(116)]
Slide9
Slide10
Transcription as a Graph
From RNA-seq data, check the overlap of the tags
If a region has more than one tag, we call it an enriched region; the enriched regions are the nodes
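The node definition can be sketched as an interval merge: overlapping tags are collapsed into one region, and regions holding more than one tag are kept. This is only an illustration under assumptions; the tag coordinates are made up, coordinates are treated as inclusive, and the function name and `min_tags` parameter are hypothetical.

```python
def enriched_regions(tags, min_tags=2):
    """Merge overlapping (start, end) tags on one chromosome.

    Returns (start, end, tag_count) for every merged region that
    holds at least min_tags tags.
    """
    regions = []
    for start, end in sorted(tags):
        if regions and start <= regions[-1][1]:
            # Tag overlaps the current region: extend it and count the tag.
            prev_start, prev_end, count = regions[-1]
            regions[-1] = (prev_start, max(prev_end, end), count + 1)
        else:
            # Tag starts a new region.
            regions.append((start, end, 1))
    return [region for region in regions if region[2] >= min_tags]


# Hypothetical tags: three overlapping, one isolated, two overlapping.
tags = [(100, 129), (110, 139), (125, 154), (300, 329), (500, 529), (520, 549)]
regions = enriched_regions(tags)
print(regions)
```

The isolated tag at 300 to 329 is dropped because its region holds a single tag.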
Using the splice reads, we connect the enriched regions; these connections are the edges
Slide11
Transcription as a Graph
Represent the transcriptome as a topologically sorted acyclic graph
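A topological order can be obtained with Kahn's algorithm. The sketch below is one possible way to do it, not necessarily what was used for RME005. It also ignores the two error types listed on this slide before sorting, under the assumption that an "out-of-range" edge is one whose endpoint is not a node of the graph. The example graph and the function name are hypothetical.

```python
from collections import deque


def topological_order(nodes, edges):
    """Return nodes in topological order, ignoring bad edges.

    Self-loops and edges with an endpoint outside `nodes` are dropped
    (the latter is an assumed reading of "out-of-range").
    """
    node_set = set(nodes)
    kept = [(u, v) for u, v in edges
            if u != v and u in node_set and v in node_set]
    successors = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for u, v in kept:
        successors[u].append(v)
        indegree[v] += 1
    # Kahn's algorithm: repeatedly emit nodes with no remaining predecessors.
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


nodes = ["Exon1", "Exon2", "Exon3", "Exon4"]
edges = [
    ("Exon1", "Exon2"), ("Exon1", "Exon3"),
    ("Exon2", "Exon4"), ("Exon3", "Exon4"),
    ("Exon2", "Exon2"),   # self-loop, ignored
    ("Exon4", "Exon9"),   # endpoint not in the graph, ignored
]
order = topological_order(nodes, edges)
print(order)
```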
Some Observed Errors (RME005)
Out-of-range edges in graphs
Self-looping nodes
Default action: Ignore them
Slide12
Distribution of Tags in Exons
rQuant: courtesy of Regina Bohnert (FML, Tübingen)
Slide13
RNA-seq RIP-seq
The previous results are from RNA-seq
Will we have similar observations on RIP-seq datasets?
And can we link the observations to transcript expression levels in the transcriptome?
Slide14
Pipeline on RIP-seq dataset
Dataset RME005 is used
Use TopHat / Eland to map the RNA reads back to the genome
Generate a transcription graph for each transcript with alternate splicing
Express the paths of all transcripts in the graph as a set of linear equations
Use R to solve the linear equations
Slide15
An example from RME005
There are two transcripts
Path1: Exon1 -> Exon2 -> Exon4
Path2: Exon1 -> Exon3 -> Exon4
Exon1 to Exon4 have lengths L1 to L4 and read counts N1 to N4
S1-S4 are the numbers of splice reads
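The two paths can be recovered by walking the graph from the first exon to the last. The sketch below assumes the edge labelling S1: Exon1 to Exon2, S2: Exon2 to Exon4, S3: Exon1 to Exon3, S4: Exon3 to Exon4; this assignment is inferred from which transcript each splice count is paired with in the constraints and is not stated explicitly in the slides. The function name is hypothetical.

```python
def enumerate_paths(edges, source, sink):
    """List every path from source to sink in an acyclic graph."""
    successors = {}
    for u, v in edges:
        successors.setdefault(u, []).append(v)
    paths = []

    def walk(node, path):
        if node == sink:
            paths.append(path)
            return
        for nxt in successors.get(node, []):
            walk(nxt, path + [nxt])

    walk(source, [source])
    return paths


# Assumed mapping of splice-read counts to edges (see the note above).
EDGES = {
    ("Exon1", "Exon2"): "S1",
    ("Exon2", "Exon4"): "S2",
    ("Exon1", "Exon3"): "S3",
    ("Exon3", "Exon4"): "S4",
}
paths = enumerate_paths(EDGES, "Exon1", "Exon4")
for path in paths:
    print(" -> ".join(path))
```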
[Figure: transcript graph with nodes Exon1 to Exon4, labelled with read counts N1 to N4 and splice-read counts S1 to S4]
Slide16
Assumptions
The transcript expression levels are:
Path1: x1
Path2: x2
The read length is constant
The reads are uniformly sampled from the transcripts
Use density of reads instead of read_coverage
This differentiates reads on both long and short exons
Slide17
Equations for linear programming
Objective function: minimize the sum of d_i
Constraints
N1/L1 = x1 + x2 + d1 - d2
S1/R = x1 + d3 - d4
N2/L2 = x1 + d5 - d6
S2/R = x1 + d7 - d8
S3/R = x2 + d9 - d10
N3/L3 = x2 + d11 - d12
S4/R = x2 + d13 - d14
N4/L4 = x1 + x2 + d15 - d16
x1, x2 >= 0
d_i >= 0
The solution should be the values of x1, x2 and all d_i
[Figure: transcript graph labelled with read counts N1 to N4 and splice-read counts S1 to S4]
Slide18
Another problem
An implicit assumption on enriched regions in RME005
RIP is known to be ~10% efficient
Noise will overwhelm true RNP-targets
Should use total-RNA as control dataset
True-positive regions from RIP should be relatively enriched with tags than
Slide19
Handling the assumption
Obtain RNA-seq from the same source of transcriptome
Directly compare both RNA-seq and RIP-seq data
RIP-chip discriminates enriched regions by >4-fold enrichment over RNA-chip data
Maybe 4-fold is the magic number?
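A direct comparison of the two datasets could look like the sketch below, which flags a region when its RIP-seq tag rate exceeds the RNA-seq tag rate by more than the chosen fold. Scaling by the total tag count of each dataset and adding a pseudocount are assumptions made here so that datasets of different depth and regions with zero control tags can be compared; neither is described in the slides. The region names, counts and function name are hypothetical.

```python
def enriched_in_rip(rip_counts, control_counts, fold=4.0, pseudocount=1.0):
    """Return {region: (ratio, is_enriched)} for every RIP region.

    ratio is the RIP tag rate divided by the control tag rate, where a
    rate is (tags + pseudocount) / total tags in that dataset.
    """
    rip_total = sum(rip_counts.values())
    control_total = sum(control_counts.values())
    calls = {}
    for region, rip in rip_counts.items():
        control = control_counts.get(region, 0)
        rip_rate = (rip + pseudocount) / rip_total
        control_rate = (control + pseudocount) / control_total
        ratio = rip_rate / control_rate
        calls[region] = (ratio, ratio > fold)
    return calls


# Hypothetical tag counts per enriched region.
rip = {"regionA": 90, "regionB": 10}
total_rna = {"regionA": 10, "regionB": 90}
calls = enriched_in_rip(rip, total_rna)
for region, (ratio, enriched) in calls.items():
    print(region, round(ratio, 2), enriched)
```

Whether 4-fold is the right threshold for sequencing data is exactly the open question on this slide, so `fold` is left as a parameter.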
Current tag distribution observed by Dr Li Guoliang
Non-uniform as opposed to what rQuant has observed on RNA-seq
Slide20
Q&A