Challenges in Single C - PowerPoint Presentation

Uploaded On 2024-01-29

Presentation Transcript

1. Challenges in Single Cell Sequencing Data Analysis
   Ion Măndoiu
   Computer Science & Engineering Department
   University of Connecticut
   ion@engr.uconn.edu

2. Challenges
   Allelic dropouts
   Szulwach et al., http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0135007

3. Challenges
   Allelic dropouts
   Szulwach et al., http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0135007

4. Challenges
   Allelic dropouts
   Szulwach et al., http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0135007

5. Challenges
   Low RT efficiency & sequencing depth
   Hicks et al. 2015, http://biorxiv.org/content/early/2017/05/08/025528

6. Challenges
   PCR amplification bias
   Ziegenhain et al. 2017, Mol. Cell 65(4), pp. 631–643.e4

7. Challenges
   Cell "quality"
   Live/dead
   Stress response
   Multiplets

8. Challenges
   Many more:
   Stochastic effects
   Cells captured in different cell cycle phases
   Transcriptional bursting hard to distinguish from technical artifacts
   Cell capture bias
   Capture rates may not be representative of population frequencies
   Scalability
   Million cell datasets…

9. Imputation for scRNA-Seq Data
   [Figure labels: CD45-, CD45+, CD45 UMI count = 0]
   Can drop-outs be recovered by imputation?

10. Existing Imputation Methods
   BISCUIT (Azizi et al., GCB 2017)
   CIDR (Lin, Troup, & Ho, Genome Biol. 2017)
   DRImpute (Kwak et al., bioRxiv 2017)
   LSImpute (Moussa & Mandoiu, ISBRA 2018)
   MAGIC (van Dijk et al., bioRxiv 2017)
   netSmooth (Ronen & Akalin, F1000Res. 2018)
   scImpute (Li & Li, Nat. Comm. 2018)

11. LSImpute
   Step 1: Select a small number (m) of cell pairs with highest similarity (O(n) using Locality Sensitive Hashing)
   Step 2: Group selected cells into clusters using spherical k-means
   Step 3: For each cluster, replace zeros with the median/mean expression of the gene within the cluster
   Step 4: Collapse selected cells into cluster centroids and repeat until the highest pair similarity drops below a given threshold
   http://cnv1.engr.uconn.edu:3838/LSImpute/
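
As a rough illustration of Step 3 only, the sketch below fills zeros with the per-gene median or mean of a cluster. It is not the LSImpute implementation: the function name, the list-of-lists layout, and the choice to compute the median/mean over all cells in the cluster (zeros included) are assumptions, and the cluster labels are taken as given rather than derived via Locality Sensitive Hashing and spherical k-means.

```python
from statistics import mean, median


def impute_zeros_within_clusters(expr, labels, stat="median"):
    """Replace zeros with the per-gene cluster median or mean.

    expr   : list of cells, each a list of per-gene expression values
    labels : cluster label for each cell (assumed to be given)
    stat   : "median" or "mean"
    Hypothetical helper for illustration; not the LSImpute code.
    """
    agg = median if stat == "median" else mean
    imputed = [row[:] for row in expr]  # leave the input untouched
    n_genes = len(expr[0]) if expr else 0
    for label in set(labels):
        members = [i for i, lab in enumerate(labels) if lab == label]
        for g in range(n_genes):
            # Assumption: the statistic is taken over all cells in the cluster.
            fill = agg([expr[i][g] for i in members])
            for i in members:
                if imputed[i][g] == 0:
                    imputed[i][g] = fill
    return imputed


expr = [
    [5, 0, 2],
    [4, 3, 0],
    [6, 3, 0],
    [0, 0, 9],
]
labels = [0, 0, 0, 1]
result = impute_zeros_within_clusters(expr, labels, stat="median")
```

In this toy input, the zero for the second gene in the first cell is filled because most of its cluster expresses that gene, while zeros shared by most of a cluster stay zero under the median.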

12. Toy example

13. Imputation Experimental Setup
   209 somatosensory neurons isolated from the mouse dorsal root ganglion (Li et al., Cell Research 2016)
   31.5M reads/cell
   10,950 +/- 1,218 genes/cell
   Read subsampling: 50k-20M reads
   Ground truth: TPM values determined by running IsoEM2 (Mandric et al., Bioinformatics 2017) on the full set of reads

14. [Figure: Gene Detection Fraction at 100k, 1M, and 10M reads; panels: Raw Data, DrImpute, scImpute, KNNImpute, LSImputeMed, LSImputeMean]

15. Imputation effect on Clusterings
   K-means, top TF-IDF

16. TF-IDF Transformation
   Borrowed from information retrieval
   Product of two factors:
      Term frequency: How frequently does a term occur in a document?
      Inverse document frequency: How uncommon is the term in the document collection?
   For scRNA-Seq data:
      For gene i in cell j with count f_ij: term frequency TF_ij [equation not preserved in transcript]
      If gene i is detected in n_i out of N cells: inverse document frequency IDF_i [equation not preserved in transcript]
      TF-IDF score: TF_ij × IDF_i
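
The slide's equations did not survive in the transcript, so the sketch below uses one common TF-IDF variant as an assumption: term frequency as the count divided by the largest count in the cell, and inverse document frequency as log2(N / n_i). The function name and the cells-by-genes list layout are also illustrative choices, not taken from the presentation.

```python
import math


def tfidf(counts):
    """Compute TF-IDF scores for a cells-by-genes count table.

    counts[j][i] is the count f_ij of gene i in cell j.
    Assumed variant: TF_ij = f_ij / max_k f_kj, IDF_i = log2(N / n_i).
    """
    n_cells = len(counts)
    n_genes = len(counts[0])
    # n_i: number of cells in which gene i is detected (count > 0)
    detected = [sum(1 for cell in counts if cell[i] > 0) for i in range(n_genes)]
    # Genes never detected get IDF 0 so they cannot contribute a score
    idf = [math.log2(n_cells / d) if d > 0 else 0.0 for d in detected]
    scores = []
    for cell in counts:
        top = max(cell)
        scores.append([(c / top if top > 0 else 0.0) * w for c, w in zip(cell, idf)])
    return scores


counts = [
    [10, 0, 5],
    [8, 4, 0],
    [6, 0, 0],
    [4, 0, 2],
]
scores = tfidf(counts)
```

Under this variant, a gene detected in every cell (the first column) scores zero everywhere, while a gene seen in a single cell (the second column) gets the largest weight.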

17. TF-IDF Based Feature Selection

18. TF-IDF Based Clustering
   Cells QC, Genes QC, Gap-Statistics Analysis
   Data Transformation: Log2(x+1) or none
      Feature Selection: PCA, tSNE, highly variable genes* or none
      Clustering: Seurat (K-means)*, Seurat (SNN)*, GMM, K-means, Sph. K-means, HC (E/P), Louvain (E)
   Data Transformation: TF-IDF
      Feature Selection: High avg. TF-IDF score (Top) or Highly variable TF-IDF (Var)
      Clustering: GMM, K-means, Sph. K-means, HC (E/P/C)
   Data Binarization: Cutoff threshold per cell based on cell avg. TF-IDF (Bin)
      Clustering: HC (E/P/C/J), Greedy (E/P/C/J), Louvain (E/P/C/J)
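
The "Top" feature selection and "Bin" binarization steps can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, a score counts as "on" when it is strictly above the cell's own average TF-IDF, and "J" in the similarity options is read here as Jaccard similarity, which the transcript does not spell out.

```python
def top_genes_by_mean_score(scores, k):
    """Return indices of the k genes with the highest average TF-IDF score."""
    n_genes = len(scores[0])
    means = [sum(cell[i] for cell in scores) / len(scores) for i in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda i: means[i], reverse=True)
    return sorted(ranked[:k])


def binarize_per_cell(scores):
    """Binarize each cell using its own average TF-IDF as the cutoff (assumed rule)."""
    binary = []
    for cell in scores:
        cutoff = sum(cell) / len(cell)
        binary.append([1 if s > cutoff else 0 for s in cell])
    return binary


def jaccard(a, b):
    """Jaccard similarity between two binary gene signatures."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


# Toy TF-IDF scores: rows are cells, columns are genes
scores = [
    [0.0, 1.0, 0.5, 0.1],
    [0.0, 0.9, 0.0, 0.1],
    [0.1, 0.0, 0.6, 0.0],
]
top2 = top_genes_by_mean_score(scores, 2)
binary = binarize_per_cell(scores)
similarity = jaccard(binary[0], binary[1])
```

A per-cell cutoff keeps the binarization insensitive to differences in overall score magnitude between cells, which is presumably why the threshold is set per cell rather than globally.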

19. Experimental Setup: 10x PBMC
   FACS sorted blood cells of 7 types [Zheng et al., Nat. Comm. 2017]
   7:1, 3:1, 1:1, 1:3, and 1:7 simulated mixtures of cell type pairs of varying dissimilarity (1000 cells/pair)
   7-way mixture, equal proportions (7000 cells/mix)
   All datasets available at http://cnv1.engr.uconn.edu:3838/SCA/
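
For a sense of scale, the arithmetic below works out how many cells of each type a pairwise mixture would contain, assuming that "1000 cells/pair" is the total per mixture and that it is split exactly in proportion to the stated ratio. The helper name is hypothetical and the actual sampling procedure is not described in the transcript.

```python
def split_by_ratio(total_cells, ratio_a, ratio_b):
    """Split total_cells between two cell types in proportion ratio_a:ratio_b."""
    cells_a = round(total_cells * ratio_a / (ratio_a + ratio_b))
    return cells_a, total_cells - cells_a


ratios = [(7, 1), (3, 1), (1, 1), (1, 3), (1, 7)]
# Assumption: each pairwise mixture totals 1000 cells
splits = {f"{a}:{b}": split_by_ratio(1000, a, b) for a, b in ratios}
```

Under these assumptions, the rarer type in a 7:1 mixture contributes only 125 of the 1000 cells.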

20. Experimental Setup: 10x PBMC

21. Experimental Setup: Pancreatic Cells
   2045 pancreatic cells of 7 types [Segerstolpe et al. 2016]
   Annotated based on known markers (removed for clustering)
   Capture proportions: 185 acinar cells, 886 alpha cells, 270 beta cells, 197 gamma cells, 114 delta cells, 386 ductal cells, and 7 epsilon cells

22. Pairs: 1:1 mixtures

23. Pairs: 1:3/3:1 mixtures

24. Pairs: 1:7/7:1 mixtures

25. 7-Way PBMC Mixture

26. Pancreatic Cells

27. Joint analysis of bulk and scRNA-Seq
   Needed to get unbiased population frequencies of cell types
   Potential to identify cell types missed by capture protocols

28. Linear model
   [Figure labels: cell type 1, cell type 2, cell type 3; gene 1 to gene 6; heterogeneous mixture; cell concentrations; cell signatures]

29. Estimation of mixture proportions
   [Equation labels: S, c, x]
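
The slide's own formulation is not preserved, so the following is only a sketch of the usual linear mixture idea: the bulk profile x is modeled as the signature matrix S applied to the concentration vector c, and for two cell types the least-squares proportion has a closed form. The symbol names, function names, and the restriction to two cell types are assumptions made for illustration.

```python
def mix(signatures, concentrations):
    """Forward model: x[g] = sum over cell types t of S[g][t] * c[t]."""
    return [sum(s * c for s, c in zip(row, concentrations)) for row in signatures]


def estimate_two_type_proportion(x, sig_a, sig_b):
    """Least-squares p in [0, 1] such that x is close to p*sig_a + (1-p)*sig_b."""
    diff = [a - b for a, b in zip(sig_a, sig_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("signatures are identical; proportion is not identifiable")
    p = sum((xi - b) * d for xi, b, d in zip(x, sig_b, diff)) / denom
    # Clip to the valid range for a proportion
    return min(1.0, max(0.0, p))


# Rows are genes, columns are cell types (toy signature values)
signatures = [[10.0, 0.0], [0.0, 8.0], [4.0, 4.0]]
x = mix(signatures, [0.25, 0.75])
sig_a = [row[0] for row in signatures]
sig_b = [row[1] for row in signatures]
p_hat = estimate_two_type_proportion(x, sig_a, sig_b)
```

With more than two cell types, the same objective would typically be solved with a constrained least-squares routine rather than this closed form.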

30. Simultaneous Estimation of Mixture Proportions and Missing Signature
   [Equation labels: S, C, X]

31. Conclusions
   Is there a role for imputation?
   scRNA-Seq clustering based on TF-IDF yields promising results
   Ongoing work:
      Web-based workflow for analysis and interactive visualization
      Integration of cell-cycle phase prediction
      Clustering based on protein/pathway activity

32. Acknowledgements