# Imputation Algorithms for Data Mining: Categorization and New Ideas

Imputation Algorithms for Data Mining: Categorization and New Ideas

Aleksandar R. Mihajlovic, Technische Universität München, mihajlovic@mytum.de, +49 176 673 41387, +381 63 183 0081

Slide 2: Overview

Explain the input data based categorization scheme for imputation algorithms

Introduce a new categorization scheme of imputation algorithms

Introduce some new ideas for re-categorization and improvement of existing algorithms and creation of new ones

Slide 3: Digitization of Microarray Data and the Missing Value Problem

Missing SNPs in individual DNA

These missing values statistically blur the association between SNP alleles and the disease gene allele

Slide 4: Earlier Input Data Based Classification of Imputation Algorithms [2]

Categorized according to the input data

Slide 5: Global Approach

Slide 6: Local Approach

Slide 7: Hybrid Approach

Global approach + local approach

Slide 8: Knowledge Based Approach

Slide 9: Earlier Input Data Based Classification of Imputation Algorithms

Classification example: imputation algorithms (briefly describe each)

Global: SVDimpute

Local: KNNimpute

Hybrid: LinCmb

Knowledge: GOimpute

Slide 10: Ideas for Algorithmic Improvement [3]

Ideas for a new categorization model of algorithms based on the methods they use

Link between the method used and the input data

Room for subcategories based on methods

Revising the categorization model: Mendeleyevization, Hybridization, Transdisciplinarization, Retrajectorization

Slide 11: Mendeleyevization

Catalyst: Probability based algorithms

EM: expectation maximization algorithms have not yet been classified

Accelerator: Algebra based algorithms

With more memory and better processing power, we can increase the number of subjects examined. This would improve the precision of Principal Component Analysis algorithms such as BPCA and of Singular Value Decomposition algorithms such as SVDimpute.

Slide 12: Mendeleyevization

Imputation Algorithms

Global: Probability Based, Algebra Based

Slide 13: Hybridization

Symbiosis: NN Based and Regression Based

The local algorithms can be classified as both symbiotic and synergic. The difference lies in the varying data types available to the imputation process. Depending on the data set, the appropriate algorithm from the statistical closeness category can be selected.

Synergy: Statistical Closeness

Nearest neighbor based and regression based algorithms can be made to work together, since neither is too computationally expensive. It can be assumed that the regression based result can correct the NN based result by taking the average of the two.
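The averaging idea under Synergy can be sketched as follows. This is a minimal illustration only, not the author's implementation: the function names (`knn_estimate`, `regression_estimate`, `hybrid_estimate`), the use of `None` for a missing value, the root-mean-square distance, and the single-predictor least-squares fit are all assumptions made for the example.

```python
from math import sqrt

def knn_estimate(rows, i, j, k=2):
    """Average column j over the k rows closest to row i.

    Distance is root-mean-square difference over columns observed in both rows
    (an assumed choice; the slides do not specify a metric).
    """
    target = rows[i]
    scored = []
    for r, row in enumerate(rows):
        if r == i or row[j] is None:
            continue
        shared = [c for c in range(len(row))
                  if c != j and row[c] is not None and target[c] is not None]
        if not shared:
            continue
        dist = sqrt(sum((row[c] - target[c]) ** 2 for c in shared) / len(shared))
        scored.append((dist, row[j]))
    if not scored:
        raise ValueError("no usable neighbours")
    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]
    return sum(value for _, value in nearest) / len(nearest)

def regression_estimate(rows, i, j, predictor):
    """Fit y = a + b*x by least squares (x: column `predictor`, y: column j)
    on the other rows, then predict the missing value of row i."""
    pairs = [(row[predictor], row[j]) for r, row in enumerate(rows)
             if r != i and row[predictor] is not None and row[j] is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("predictor column is constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * rows[i][predictor]

def hybrid_estimate(rows, i, j, predictor, k=2):
    """Average of the NN based and regression based estimates."""
    return (knn_estimate(rows, i, j, k)
            + regression_estimate(rows, i, j, predictor)) / 2

# Toy data (invented for illustration): column 1 is twice column 0.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, None]]
print(knn_estimate(rows, 3, 1))            # 5.0
print(regression_estimate(rows, 3, 1, 0))  # 8.0
print(hybrid_estimate(rows, 3, 1, 0))      # 6.5
```

In this toy case the neighbour average underestimates the trend and the regression result pulls the combined estimate toward it, which is the kind of correction the slide suggests.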

Slide 14: Hybridization

Imputation Algorithms

Local: NN Based, Regression Based, Statistical Closeness

Slide 15: Transdisciplinarization

Modification: Modified NN

Modify KNN to include additional parameters

Compare large K to small K, or find the average over all plausible K values

Use different numbers of flanking markers

Average out all possible outcomes

Mutation: Modified probability

Compare the probabilities of the flanking marker sequences around the j’th SNP allele of the i’th subject with those of the remaining subjects. The value belonging to the sequence with the highest probability wins.
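One possible reading of the Modified NN idea, averaging the KNN result over several values of K with an adjustable flanking window, is sketched below. The genotype coding (0/1/2 with `None` for missing), the mismatch-fraction distance, and the names `neighbor_values` and `knn_over_k` are assumptions made for illustration; the slides do not fix any of them.

```python
def neighbor_values(genotypes, i, j, flank):
    """Values at marker j from the other subjects, ordered from the most to
    the least similar subject on the `flank` markers either side of j."""
    n_markers = len(genotypes[i])
    window = [c for c in range(max(0, j - flank), min(n_markers, j + flank + 1))
              if c != j]
    scored = []
    for s, row in enumerate(genotypes):
        if s == i or row[j] is None:
            continue
        shared = [c for c in window
                  if row[c] is not None and genotypes[i][c] is not None]
        if not shared:
            continue
        # Fraction of flanking markers on which the two subjects disagree.
        mismatch = sum(row[c] != genotypes[i][c] for c in shared) / len(shared)
        scored.append((mismatch, s, row[j]))
    scored.sort()  # ties are broken by subject index, so the order is deterministic
    return [value for _, _, value in scored]

def knn_over_k(genotypes, i, j, ks, flank=2):
    """Average the KNN estimate over every K in `ks` (each K >= 1)."""
    ordered = neighbor_values(genotypes, i, j, flank)
    if not ordered:
        raise ValueError("no usable neighbours")
    estimates = []
    for k in ks:
        nearest = ordered[:k]
        estimates.append(sum(nearest) / len(nearest))
    return sum(estimates) / len(estimates)

# Toy genotypes (invented): subject 0 is missing marker 2.
genotypes = [
    [0, 1, None, 1, 0],
    [0, 1, 2, 1, 0],
    [0, 1, 2, 1, 1],
    [1, 0, 0, 0, 1],
]
print(knn_over_k(genotypes, 0, 2, ks=[1]))        # 2.0
print(knn_over_k(genotypes, 0, 2, ks=[1, 2, 3]))  # about 1.78
```

Varying `flank` changes which markers define similarity, and the averaged estimate can be rounded to the nearest genotype code if a discrete call is needed.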

Slide 16: Transdisciplinarization (1)

Imputation Algorithms

Local: NN Based, Regression Based, Statistical Closeness, Modified NN

Slide 17: Transdisciplinarization (2)

Imputation Algorithms

Global: Probability Based, Algebra Based, Modified Probability

Slide 18: Retrajectorization

Reparametrization: Proteome Based and Gene Based Algorithms

How protein, amino acid, and codon databases can be utilized in gene imputation is being researched

Regranularization: Process Based (Data Set Partitioning)

Check whether there is linkage disequilibrium (LD) between the i’th subject with missing values and other sets of diseased patients. Sets are organized by the geographic origin of the subjects

Find the frequencies of the j’th SNP alleles (the missing SNP allele under scrutiny in one subject) in the other sets

If LD exists between another set and the subject, take that set's allele into account; if not, leave it out
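The data set partitioning steps can be sketched roughly as below. The slide does not say how LD between a subject and a set is measured, so this example assumes one interpretation: within each geographic set, compute the standard r² statistic between a flanking marker and the target SNP, and let only sets above a threshold contribute allele counts. The names `r_squared` and `impute_from_sets`, the 0/1 haplotype coding, and the threshold of 0.5 are all hypothetical.

```python
def r_squared(haplotypes):
    """LD r^2 between two biallelic loci, from (flank, target) haplotype
    pairs coded 0/1: D = p_AB - p_A * p_B, r^2 = D^2 / (p_A q_A p_B q_B)."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic locus carries no LD information
    d = p_ab - p_a * p_b
    return d * d / denom

def impute_from_sets(population_sets, flank_allele, threshold=0.5):
    """Pick the target allele most often seen alongside the subject's flanking
    allele, counting only sets whose r^2 reaches `threshold`.
    Returns None when no set qualifies or the counts are tied."""
    counts = {0: 0, 1: 0}
    for haplotypes in population_sets.values():
        if r_squared(haplotypes) < threshold:
            continue  # no LD in this set: leave its alleles out
        for a, b in haplotypes:
            if a == flank_allele:
                counts[b] += 1
    if counts[0] == counts[1]:
        return None
    return 0 if counts[0] > counts[1] else 1

# Toy sets keyed by an invented region label.
population_sets = {
    "region_A": [(0, 0), (0, 0), (1, 1), (1, 1)],  # perfect LD, r^2 = 1
    "region_B": [(0, 0), (0, 1), (1, 0), (1, 1)],  # no LD, r^2 = 0
}
print(impute_from_sets(population_sets, flank_allele=1))  # 1
```

Only `region_A` contributes here, because `region_B` shows no association between the two loci.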

Slide 19: Retrajectorization

Imputation Algorithms

Knowledge: Gene Based, Proteome Based, Process Based

Slide 20: The Whole Categorization Tree

Imputation Algorithms

Global: Probability Based, Algebra Based, Modified Probability

Local: NN Based, Regression Based, Statistical Closeness, Modified NN

Hybrid

Knowledge: Gene Based, Proteome Based, Process Based

Slide 21: References

[1] Frey M., Gierl A., De Angelis, Beckers J., Kieser A., Genomics Lecture; Fakultät für Biowissenschaft, TUM, Weihenstephan, Freising bei München; Winter Semester 2011

[2] Liew A.W., Law N., Yan H., Missing value imputation for gene expression data: computational techniques to recover missing data from available information, Briefings in Bioinformatics, December 14, 2010, pp.3

[3] Milutinovic V., Korolija N., A Short Course for PhD Students in Science and Engineering: How to Write Papers for JCR Journals

Slide 22: Questions

The End
