Imputation Algorithms for Data Mining: Categorization and New Ideas

2016-11-04

Slide1

Imputation Algorithms for Data Mining: Categorization and New Ideas

Aleksandar R. Mihajlovic
Technische Universität München
mihajlovic@mytum.de
+49 176 673 41387
+381 63 183 0081

Slide2

Overview

Explain input data based imputation algorithm categorization scheme

Introduce a new categorization scheme of imputation algorithms

Introduce some new ideas for re-categorization and improvement of existing algorithms and creation of new ones

Slide3

Digitization of Microarray Data and the Missing Value Problem

Missing SNPs in individual DNA
These missing values statistically blur the association between SNP alleles and the disease gene allele

Slide4

Earlier Input Data Based Classification of Imputation Algorithms [2]

Categorized according to the input data

Slide5

Global Approach

Slide6

Local Approach

Slide7

Hybrid Approach

Global + Local

Slide8

Knowledge Based Approach

Slide9

Earlier Input Data Based Classification of Imputation Algorithms

Classification example: Imputation Algorithms (briefly describe each)
  Global: SVDimpute
  Local: KNNimpute
  Hybrid: LinCmb
  Knowledge: GOimpute
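As an illustration of the Local category, here is a minimal sketch in the spirit of KNNimpute. It is not the published algorithm: the unweighted mean over neighbours, the RMS distance over shared columns, and the names `distance` and `knn_impute` are all assumptions made for this example. Missing values are written as `None`.

```python
from math import sqrt

def distance(a, b):
    """RMS difference over the columns that are observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def knn_impute(rows, k=2):
    """Return a copy of rows where each None is replaced by the mean of that
    column over the k nearest rows that have the column observed.
    (Assumption: plain mean; a similarity-weighted mean is also common.)"""
    result = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: every other row with column j observed.
            candidates = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in candidates[:k] if d != float("inf")]
            if nearest:
                result[i][j] = sum(nearest) / len(nearest)
    return result

# Hypothetical toy data: rows are subjects, columns are measurements.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [0.9, 1.9, None],
    [8.0, 9.0, 10.0],
]
completed = knn_impute(data, k=2)
print(completed[2][2])  # close to 3.1, the mean of the two nearest rows
```

The distant fourth row never contributes here, which is the point of a local method: only rows that resemble the incomplete one are used.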

Slide10

Ideas for Algorithmic Improvement [3]

Ideas for a new categorization model of algorithms based on the methods they use
  Link between the method used and the input data
  Room for subcategories based on methods
Revising the categorization model
  Mendeleyevization
  Hybridization
  Transdisciplinarization
  Retrajectorization

Slide11

Mendeleyevization

Catalyst
  Probability based algorithms
  EM (expectation maximization) algorithms have not yet been classified
Accelerator
  Algebraic based algorithms
  With more memory and better processing power we can increase the number of subjects to be examined. This would improve the precision of Principal Component Analysis algorithms such as BPCA and of Singular Value Decomposition algorithms such as SVDimpute.
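To make the algebra based idea concrete, the following is a rough sketch of iterative low-rank imputation. It is a simplification and not SVDimpute itself: it keeps only a rank-1 approximation, obtains it by power iteration instead of a full SVD, and starts from row means. The names `rank1_approximation` and `low_rank_impute` are hypothetical. A practical version would call a linear algebra library and keep several components.

```python
from math import sqrt

def rank1_approximation(matrix, power_iters=50):
    """Rank-1 approximation via power iteration.
    Assumes the all-ones start vector is not orthogonal to the dominant
    right singular vector."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0] * n_cols
    u = [0.0] * n_rows
    for _ in range(power_iters):
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        norm = sqrt(sum(x * x for x in u)) or 1.0
        u = [x / norm for x in u]
        # With u of unit length, v carries the singular value as its scale.
        v = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
    return [[u[i] * v[j] for j in range(n_cols)] for i in range(n_rows)]

def low_rank_impute(rows, max_iter=500, tol=1e-9):
    """Fill None entries by repeatedly replacing them with the values of a
    rank-1 reconstruction until the filled-in values stop changing."""
    missing = [(i, j) for i, r in enumerate(rows) for j, x in enumerate(r) if x is None]
    filled = [list(r) for r in rows]
    for i, j in missing:
        observed = [x for x in rows[i] if x is not None]
        filled[i][j] = sum(observed) / len(observed) if observed else 0.0
    for _ in range(max_iter):
        approx = rank1_approximation(filled)
        change = max((abs(approx[i][j] - filled[i][j]) for i, j in missing), default=0.0)
        for i, j in missing:
            filled[i][j] = approx[i][j]  # observed entries are never overwritten
        if change < tol:
            break
    return filled

# Hypothetical toy matrix whose rows are multiples of [1, 2, 3].
matrix = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 6.0, None],
]
print(low_rank_impute(matrix)[2][2])  # approaches 9.0
```

Because the whole matrix enters every reconstruction, adding more subjects (rows) changes every imputed value, which is why global methods stand to gain from more memory and processing power.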

Slide12

Mendeleyevization

Imputation Algorithms
  Global: Probability Based, Algebra Based

Slide13

Hybridization

Symbiosis
  NN Based and Regression Based: The Local algorithms can be classified as both symbiotic and synergic. The difference lies in the varying data types available for the imputation process. Based on the data set, the proper algorithm from the statistical closeness category can be selected.
Synergy
  Statistical Closeness: Nearest Neighbor based and Regression based algorithms can be made to work together; neither is too computationally expensive, so both can be run. Regression based algorithms can presumably be used to correct NN based algorithms by averaging the regression based result with the NN based result.
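The synergy idea, averaging an NN based estimate with a regression based one, could look roughly like the sketch below. The equal weighting follows the slide; everything else is an assumption for illustration: the function names, the single `predictor` column, and the requirement that the reference rows are complete.

```python
def nn_estimate(reference, target, j, k=2):
    """Mean of column j over the k reference rows closest to the target,
    measured on all columns except j."""
    def dist(row):
        return sum((a - b) ** 2
                   for c, (a, b) in enumerate(zip(target, row)) if c != j)
    nearest = sorted(reference, key=dist)[:k]
    return sum(row[j] for row in nearest) / len(nearest)

def regression_estimate(reference, target, j, predictor):
    """Least-squares line of column j on one predictor column (assumption:
    a single predictor keeps the sketch short)."""
    xs = [row[predictor] for row in reference]
    ys = [row[j] for row in reference]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return mean_y + slope * (target[predictor] - mean_x)

def hybrid_estimate(reference, target, j, predictor, k=2):
    """Equal-weight average of the NN based and regression based estimates."""
    return 0.5 * (nn_estimate(reference, target, j, k)
                  + regression_estimate(reference, target, j, predictor))

# Hypothetical complete reference rows and one incomplete target row.
reference = [[1.0, 1.0], [2.0, 4.0], [3.0, 9.0], [4.0, 16.0]]
target = [2.5, None]

print(nn_estimate(reference, target, j=1))                       # 6.5
print(regression_estimate(reference, target, j=1, predictor=0))  # 7.5
print(hybrid_estimate(reference, target, j=1, predictor=0))      # 7.0
```

On this curved toy data the two estimates disagree, and the average sits between them; whether that is a correction or a compromise depends on the data set.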

Slide14

Hybridization

Imputation Algorithms
  Local: NN Based, Regression Based, Statistical Closeness

Slide15

Transdisciplinarization

Modification
  Modified NN: Modify KNN to include additional parameters
  Compare large K to small K, or find the average over all plausible K values
  Use different numbers of flanking markers
  Average out all possible outcomes
Mutation
  Modified probability
  Compare the probabilities of the flanking markers in the sequence of the i’th subject’s j’th SNP allele with the rest. The value, along with the sequence, with the highest probability wins.
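One possible reading of "find the average over all plausible K values" is sketched below. It assumes genotypes are coded numerically (for example as minor-allele counts 0, 1, 2) and that the reference rows are complete; `knn_value` and `multi_k_estimate` are hypothetical names.

```python
def knn_value(reference, target, j, k):
    """Mean of column j over the k reference rows nearest to the target,
    comparing all columns except j."""
    def dist(row):
        return sum((a - b) ** 2
                   for c, (a, b) in enumerate(zip(target, row)) if c != j)
    nearest = sorted(reference, key=dist)[:k]
    return sum(row[j] for row in nearest) / len(nearest)

def multi_k_estimate(reference, target, j, k_values):
    """Run KNN once per K and average the outcomes.
    Returns (average, per-K estimates) so large and small K can be compared."""
    estimates = {k: knn_value(reference, target, j, k) for k in k_values}
    average = sum(estimates.values()) / len(estimates)
    return average, estimates

# Hypothetical genotype rows; columns 0 and 2 flank the missing SNP in column 1.
reference = [
    [0, 0, 0],
    [0, 1, 1],
    [1, 1, 1],
    [2, 2, 2],
]
target = [0, None, 0]

average, per_k = multi_k_estimate(reference, target, j=1, k_values=[1, 2, 3])
print(per_k)    # K=1 gives 0.0, K=2 gives 0.5, K=3 gives about 0.667
print(average)  # about 0.389
```

Returning the per-K estimates alongside the average makes it easy to see how sensitive the imputed value is to the choice of K.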

Slide16

Transdisciplinarization (1)

Imputation Algorithms
  Local: NN Based, Regression Based, Statistical Closeness, Modified NN

Slide17

Transdisciplinarization (2)

Imputation Algorithms
  Global: Probability Based, Algebra Based, Modified Probability
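The Modified Probability branch could be sketched as follows. This is an interpretation, not a method stated in the slides: each candidate allele is placed at the missing position, the resulting window of flanking markers is looked up in the other subjects' sequences, and the candidate whose window is most frequent wins. The function name, the `flank` parameter, and the use of `?` for the missing allele are assumptions.

```python
from collections import Counter

def impute_by_flanking_probability(reference, target, j, flank=1):
    """Pick the allele at position j whose flanking window is most frequent
    among the reference sequences. Assumes reference is non-empty and all
    sequences have the same length. Returns (best_allele, scores)."""
    lo, hi = max(0, j - flank), min(len(target), j + flank + 1)
    window_counts = Counter(tuple(seq[lo:hi]) for seq in reference)
    total = sum(window_counts.values())
    candidates = {seq[j] for seq in reference}
    scores = {}
    for allele in candidates:
        window = list(target[lo:hi])
        window[j - lo] = allele  # substitute the candidate at the gap
        scores[allele] = window_counts[tuple(window)] / total
    # Sorting first makes ties resolve deterministically.
    best = max(sorted(scores), key=scores.get)
    return best, scores

# Hypothetical sequences from other subjects; '?' marks the missing allele.
reference = ["ACGT", "ACGT", "ATGT", "GCGA", "ACGA"]
target = "A?GT"

best, scores = impute_by_flanking_probability(reference, target, j=1, flank=1)
print(best)    # C
print(scores)  # C scores 0.6, T scores 0.2
```

Widening `flank` makes matches rarer but more specific, which is the trade-off behind trying different numbers of flanking markers.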

Slide18

Retrajectorization

Reparametrization
  Proteome Based and Gene Based Algorithms
  How protein/amino acid/codon databases can be utilized in gene imputation is being researched
Regranularization
  Process Based: Data Set Partitioning
  Check whether there is Linkage Disequilibrium (LD) between the i’th subject with missing values and other sets of diseased patients. Sets are organized by the geographic origin of the subjects
  Find the frequencies of the j’th SNP alleles (the missing SNP allele under scrutiny in one subject) in the other sets
  If LD exists between another set and the subject, take that set’s alleles into account; if not, don’t
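The data set partitioning steps might be sketched as below. The slides do not say how LD between a subject and a set is measured, so this example assumes one reading: within each set, compute the standard r² statistic between the missing SNP j and a flanking SNP the subject does have, and only let sets above a threshold contribute allele counts. The threshold of 0.8, the biallelic haplotype strings, and all names are hypothetical.

```python
from collections import Counter

def r_squared(haplotypes, a, b):
    """LD r^2 between biallelic loci a and b within one set of haplotypes."""
    if not haplotypes:
        return 0.0
    n = len(haplotypes)
    ref_a, ref_b = haplotypes[0][a], haplotypes[0][b]
    p_a = sum(h[a] == ref_a for h in haplotypes) / n
    p_b = sum(h[b] == ref_b for h in haplotypes) / n
    p_ab = sum(h[a] == ref_a and h[b] == ref_b for h in haplotypes) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else (p_ab - p_a * p_b) ** 2 / denom

def impute_from_sets(sets_by_origin, subject, j, flank, threshold=0.8):
    """Count alleles at SNP j only in sets where j is in LD with the flanking
    SNP, restricted to haplotypes that match the subject at the flank.
    Returns (most frequent allele or None, origins that were used)."""
    counts = Counter()
    used = []
    for origin, haplotypes in sets_by_origin.items():
        if r_squared(haplotypes, j, flank) < threshold:
            continue  # no usable LD in this set: ignore its alleles
        used.append(origin)
        counts.update(h[j] for h in haplotypes if h[flank] == subject[flank])
    if not counts:
        return None, used
    return counts.most_common(1)[0][0], used

# Hypothetical two-SNP haplotypes grouped by geographic origin.
sets_by_origin = {
    "north": ["AG", "AG", "CT", "CT", "AG", "CT"],  # SNPs perfectly linked
    "south": ["AG", "AT", "CG", "CT"],              # SNPs independent
}
subject = "A?"  # SNP 1 is missing, SNP 0 is the flanking marker

allele, used = impute_from_sets(sets_by_origin, subject, j=1, flank=0)
print(allele, used)  # G ['north']
```

Here the "south" set is skipped because its two SNPs vary independently, so its allele frequencies carry no information about the subject's missing allele.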

Slide19

Retrajectorization

Imputation Algorithms
  Knowledge: Gene Based, Proteome Based, Process Based

Slide20

The Whole Categorization Tree

Imputation Algorithms
  Knowledge: Process Based, Gene Based, Proteome Based
  Global: Probability Based, Algebra Based, Modified Probability
  Hybrid
  Local: Regression Based, NN Based, Statistical Closeness, Modified NN

Slide21

References

[1] Frey M., Gierl A., De Angelis, Beckers J., Kieser A., Genomics Lecture; Fakultät für Biowissenschaft, TUM, Weihenstephan, Freising bei München; Winter Semester 2011
[2] Liew A.W., Law N., Yan H., Missing value imputation for gene expression data: computational techniques to recover missing data from available information, Briefings in Bioinformatics, December 14, 2010, pp.3
[3] Milutinovic V., Korolija N., A Short Course for PhD Students in Science and Engineering: How to Write Papers for JCR Journals

Slide22

Questions

Aleksandar R. Mihajlovic
Technische Universität München
mihajlovic@mytum.de
+49 176 673 41387
+381 63 183 0081

The End
