Imputation Algorithms for Data Mining: Categorization and New Ideas
Aleksandar R. Mihajlovic
Technische Universität München
mihajlovic@mytum.de
+49 176 673 41387
+381 63 183 0081
Explain the input-data-based categorization scheme of imputation algorithms
Introduce a new categorization scheme for imputation algorithms
Introduce new ideas for re-categorizing and improving existing algorithms and for creating new ones
Digitization of Microarray Data and the Missing Value Problem
Missing SNPs in individual DNA
These missing values statistically blur the association between the SNP allele and the disease-gene allele.
Earlier Input Data Based Classification of Imputation Algorithms 
Categorized according to the input data
Knowledge-based approach
Earlier Input Data Based Classification of Imputation Algorithms
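Classification example: imputation algorithms by input-data category
Global: SVDimpute, which estimates missing values from the dominant expression patterns (eigengenes) of the whole data matrix
Local: KNNimpute, which estimates a missing value from the k most similar genes
Hybrid: LinCmb, which combines the estimates of several imputation methods
Knowledge: GOimpute, which uses Gene Ontology annotations to select the genes used for imputation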
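A minimal sketch of the local (KNNimpute-style) idea, assuming a genes × samples matrix stored as Python lists with `None` for missing entries. The function name `knn_impute`, the root-mean-square distance over co-observed columns, and the inverse-distance weights are illustrative choices, not details taken from the slides.

```python
import math


def knn_impute(matrix, k=3):
    """KNNimpute-style sketch: fill each None with a weighted mean of the k nearest rows.

    matrix: list of rows (genes) of equal length (samples); None marks a missing value.
    Returns a new matrix; the input is not modified.
    """
    result = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for r, other in enumerate(matrix):
                # A neighbour must be a different row that has column j observed.
                if r == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                # Root-mean-square distance over the co-observed columns.
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            if not candidates:
                continue  # nothing to borrow from: leave the value missing
            nearest = sorted(candidates, key=lambda c: c[0])[:k]
            # Inverse-distance weights; the small constant avoids division by zero.
            weights = [1.0 / (d + 1e-9) for d, _ in nearest]
            result[i][j] = sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
    return result


data = [[1.0, 2.0, None],
        [1.0, 2.0, 3.0],
        [10.0, 20.0, 30.0],
        [1.1, 2.1, 3.2]]
print(knn_impute(data, k=2)[0][2])  # close to 3.0: the identical second row dominates
```

Only rows with the target column observed can serve as neighbours, which is why a gene with no usable neighbour is left missing rather than guessed.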
Ideas for Algorithmic Improvement 
Ideas for a new categorization model of algorithms, based on the methods they use
  Link between the method used and the input data
  Room for subcategories based on methods
Revising the categorization model
  Mendeleyevization
  Hybridization
  Transdisciplinarization
  Retrajectorization
Catalyst
  Probability-based algorithms: EM (expectation maximization) algorithms have not yet been classified.
Accelerator
  Algebra-based algorithms: with more memory and better processing power, the number of subjects examined can be increased. This would improve the precision of principal component analysis algorithms such as BPCA (Bayesian PCA) and of singular value decomposition algorithms such as SVDimpute.
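A minimal sketch of an SVDimpute-style algebraic method, assuming NumPy and a genes × samples array with `NaN` for missing values. Missing entries start at their row means; each incomplete gene is then regressed on the top `rank` eigengenes (right singular vectors), and the fill is repeated until it stops changing. The function name and the `rank`, `n_iter`, and `tol` parameters are hypothetical, and every row is assumed to have at least one observed value.

```python
import numpy as np


def svd_impute(X, rank=2, n_iter=100, tol=1e-6):
    """SVDimpute-style sketch for a genes x samples array with NaN for missing values.

    Assumes every row has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()
    # Start from row (gene) means as a simple initial guess.
    row_means = np.nanmean(X, axis=1)
    filled[missing] = np.take(row_means, np.nonzero(missing)[0])
    for _ in range(n_iter):
        # Rows of vt are the "eigengenes": expression patterns across samples.
        _, _, vt = np.linalg.svd(filled, full_matrices=False)
        eigengenes = vt[:rank]
        updated = filled.copy()
        for i in np.nonzero(missing.any(axis=1))[0]:
            obs = ~missing[i]
            # Least-squares fit of the gene's observed values on the eigengenes...
            coef, *_ = np.linalg.lstsq(eigengenes[:, obs].T, X[i, obs], rcond=None)
            # ...then predict its missing values from the same linear combination.
            updated[i, missing[i]] = coef @ eigengenes[:, missing[i]]
        change = np.linalg.norm(updated - filled) / max(np.linalg.norm(filled), 1e-12)
        filled = updated
        if change < tol:
            break
    return filled


true = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0, 5.0])
X = true.copy()
X[0, 4] = np.nan
print(svd_impute(X, rank=1)[0, 4])  # should approach the true value 5.0
```

The eigengenes are estimated from the whole matrix, so their quality depends on how much data the SVD sees; observed entries are never overwritten.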
Symbiosis
  NN-based and regression-based: local algorithms can be classified as both symbiotic and synergic; which label applies depends on the types of data available for imputation. Based on the data set, the appropriate algorithm from the statistical-closeness category can be selected.
Synergy
  Statistical closeness: nearest-neighbor-based and regression-based algorithms can be made to work together. Neither is computationally expensive, so running both is practical. One working assumption is that a regression-based result can correct an NN-based result when the two results are averaged.
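One possible reading of the synergy idea, sketched under stated assumptions: rows are Python lists with `None` for missing values, the NN estimate is the unweighted mean of the k nearest rows, and the regression estimate comes from a least-squares line fitted against whichever other row fits the target row best. The function names and the equal 50/50 weighting are hypothetical choices for illustration.

```python
import math


def _co_observed(target, other):
    """(target, other) value pairs for columns observed in both rows."""
    return [(t, o) for t, o in zip(target, other) if t is not None and o is not None]


def knn_estimate(matrix, i, j, k=2):
    """Unweighted mean of column j over the k rows nearest to row i."""
    scored = []
    for r, other in enumerate(matrix):
        if r == i or other[j] is None:
            continue
        pairs = _co_observed(matrix[i], other)
        if pairs:
            dist = math.sqrt(sum((t - o) ** 2 for t, o in pairs) / len(pairs))
            scored.append((dist, other[j]))
    if not scored:
        raise ValueError("no neighbour has column j observed")
    nearest = sorted(scored, key=lambda s: s[0])[:k]
    return sum(v for _, v in nearest) / len(nearest)


def regression_estimate(matrix, i, j):
    """Predict matrix[i][j] from the other row whose least-squares line fits row i best."""
    best = None  # (sum of squared errors, prediction)
    for r, other in enumerate(matrix):
        if r == i or other[j] is None:
            continue
        pairs = _co_observed(matrix[i], other)
        if len(pairs) < 2:
            continue
        ys = [t for t, _ in pairs]
        xs = [o for _, o in pairs]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        if sxx == 0:
            continue  # a constant predictor row cannot be regressed on
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        intercept = my - slope * mx
        sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, intercept + slope * other[j])
    if best is None:
        raise ValueError("no row is usable as a regression predictor")
    return best[1]


def synergy_estimate(matrix, i, j, k=2):
    """Equal-weight average of the NN-based and regression-based estimates."""
    return 0.5 * (knn_estimate(matrix, i, j, k) + regression_estimate(matrix, i, j))


rows = [[1.0, 2.0, 3.0, None],
        [2.0, 4.0, 6.0, 8.0],
        [1.0, 2.0, 3.1, 5.0],
        [9.0, 9.0, 9.0, 9.0]]
print(synergy_estimate(rows, 0, 3, k=1))  # 4.5 (KNN gives 5.0, regression gives 4.0)
```

In the example the nearest row and the best linear predictor disagree, which is exactly the case where averaging lets one method temper the other.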
Modification
  Modified NN: modify KNN to include additional parameters
    Compare a large K with a small K, or average the estimates over all plausible K values
    Use different numbers of flanking markers
    Average over all possible outcomes
Mutation
  Modified probability: compare the probability of the flanking-marker sequence around the j’th SNP allele of the i’th subject with that of the other subjects. The allele value whose sequence has the highest probability wins.
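The modification and mutation ideas can be sketched as follows; this is a hypothetical reading, not the author's implementation. `multi_k_estimate` averages KNN estimates over several K values. `impute_by_flanking_markers` treats each subject as a string of alleles with `?` marking a missing call, estimates the probability of each allele at SNP j from reference sequences whose flanking markers match, averages those probabilities over several window sizes, and returns the most probable allele. The names, the `?` convention, and the default windows are all assumptions.

```python
from collections import Counter

MISSING = "?"  # assumed marker for a missing allele call


def multi_k_estimate(neighbours, k_values):
    """Modified NN: average the KNN estimate over several plausible K values.

    neighbours: (distance, value) pairs for rows that have the target column observed.
    """
    ranked = sorted(neighbours, key=lambda n: n[0])
    estimates = [sum(v for _, v in ranked[:k]) / k
                 for k in k_values if 0 < k <= len(ranked)]
    if not estimates:
        raise ValueError("no K value fits the number of neighbours")
    return sum(estimates) / len(estimates)


def flanking_allele_probabilities(target, references, j, window):
    """Estimate P(allele at j) from references whose flanking markers match the target.

    A reference is used only if it agrees with the target at every observed marker
    within `window` positions on either side of SNP j.
    """
    lo, hi = max(0, j - window), min(len(target), j + window + 1)
    flank = [p for p in range(lo, hi) if p != j and target[p] != MISSING]
    counts = Counter(ref[j] for ref in references
                     if ref[j] != MISSING and all(ref[p] == target[p] for p in flank))
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()} if total else {}


def impute_by_flanking_markers(target, references, j, windows=(1, 2, 3)):
    """Mutation: average probabilities over several window sizes; the most probable allele wins."""
    pooled = Counter()
    for w in windows:
        for allele, p in flanking_allele_probabilities(target, references, j, w).items():
            pooled[allele] += p / len(windows)
    return max(pooled, key=pooled.get) if pooled else None


print(multi_k_estimate([(0.5, 3.0), (0.1, 1.0), (0.9, 5.0)], k_values=[1, 2, 3]))  # 2.0
subject = "AC?TG"
panel = ["ACATG", "ACATG", "ACGTG", "TCGTA", "GGGGG"]
print(impute_by_flanking_markers(subject, panel, j=2))  # A
```

With one flanking marker per side, A and G are equally likely; widening the window excludes the mismatching `TCGTA` reference and tips the average toward A.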
Reparametrization
  Proteome-based and gene-based algorithms: how protein, amino acid, and codon databases can be used in gene imputation is under investigation.
Regranularization
  Process-based: data set partitioning
    Check whether there is linkage disequilibrium (LD) between the i’th subject with missing values and other sets of diseased patients; the sets are organized by the geographic origin of the subjects.
    Find the frequencies of the j’th SNP alleles (the missing SNP allele under scrutiny in one subject) in the other sets.
    If LD exists between a set and the subject, take that set's alleles into account; otherwise, ignore them.
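The partitioning step might look like the sketch below. The slide does not specify how LD between a subject and a set is tested, so the sketch takes that decision as a hypothetical, precomputed `in_ld` mapping from each geographic origin to yes/no; only sets marked as in LD contribute allele counts at SNP j. The function names and the example origins are illustrative.

```python
from collections import Counter


def pooled_allele_frequencies(groups, in_ld):
    """Allele frequencies at SNP j, pooled only over sets judged to be in LD with the subject.

    groups: {origin: alleles observed at SNP j in that set (None = missing)}
    in_ld:  {origin: True/False}, a hypothetical, precomputed LD decision per set
    """
    counts = Counter()
    for origin, alleles in groups.items():
        if in_ld.get(origin, False):  # sets without LD are ignored
            counts.update(a for a in alleles if a is not None)
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()} if total else {}


def impute_from_partitions(groups, in_ld):
    """Return the most frequent allele across the sets that are in LD, or None."""
    freqs = pooled_allele_frequencies(groups, in_ld)
    return max(freqs, key=freqs.get) if freqs else None


sets_by_origin = {
    "north": ["A", "A", "G"],
    "south": ["G", "G", "G", None],
    "east": ["A"],
}
ld_decision = {"north": True, "south": False, "east": True}
print(pooled_allele_frequencies(sets_by_origin, ld_decision))  # {'A': 0.75, 'G': 0.25}
print(impute_from_partitions(sets_by_origin, ld_decision))     # A
```

Including the "south" set would flip the answer to G, which illustrates why the LD filter matters.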
The Whole Categorization Tree
References
Frey M., Gierl A., De Angelis, Beckers J., Kieser A., Genomics Lecture, Fakultät für Biowissenschaft, TUM, Weihenstephan, Freising bei München, Winter Semester 2011.
Liew A.W., Law N., Yan H., Missing value imputation for gene expression data: computational techniques to recover missing data from available information, Briefings in Bioinformatics, December 14, 2010, p. 3.
Milutinovic V., Korolija N., A Short Course for PhD Students in Science and Engineering: How to Write Papers for JCR Journals.