Ph.D. started in: 2017
Expected year of graduation: 2020
COINS consortium member: Norwegian University of Science and Technology
Supervised by: Sule Yildirim-Yayilgan
Research area: Cryptography
Project title: Intelligent techniques to improve data pre-processing
Project description: Pre-processing large-scale datasets to ensure data quality is a very important task in data mining. One of the serious threats to data quality is missing data. Missingness can significantly affect many real-life pattern classification scenarios, especially when it leads to biased parameter estimates, and it can also make data unusable for analysis. The process of replacing the missing values of an observation based on the valid values of other variables is known as imputation. Imputation is becoming crucial for pattern classification because an inappropriate technique for handling missing data can distort the classification results. Different supervised and unsupervised learning methods can be used to impute missing data, such as decision trees (C5.4), clustering algorithms that handle mixed attributes, and artificial neural networks. Statistical imputation methods such as Mean/Mode and Hot-Deck have also been applied, and their limitations on large-scale datasets compared with supervised and unsupervised machine learning methods have been highlighted. Which methods are best suited to preserving the overall quality of the data? A novel hybrid method will be introduced that combines unsupervised and supervised learning methods to impute missing data and thereby improve data quality.
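As a minimal illustration of the two statistical baselines named above (Mean/Mode and Hot-Deck), the Python sketch below assumes each attribute is stored as a list in which None marks a missing value. The function names are hypothetical, the donor pool for hot-deck is simply all observed values of the same attribute (a random hot-deck variant), and none of this is the project's proposed hybrid method.

```python
import random
import statistics
from typing import List, Optional, TypeVar

T = TypeVar("T")


def mean_impute(column: List[Optional[float]]) -> List[float]:
    """Hypothetical helper: replace each missing numeric value (None)
    with the mean of the observed values in the same attribute."""
    observed = [v for v in column if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in column]


def mode_impute(column: List[Optional[T]]) -> List[T]:
    """Hypothetical helper: replace each missing categorical value (None)
    with the most frequent observed value in the same attribute."""
    observed = [v for v in column if v is not None]
    fill = statistics.mode(observed)
    return [fill if v is None else v for v in column]


def hot_deck_impute(column: List[Optional[T]], seed: int = 0) -> List[T]:
    """Hypothetical helper (random hot-deck): replace each missing value with
    a value copied from a randomly chosen observed record (a 'donor')."""
    rng = random.Random(seed)  # seeded so results are reproducible
    donors = [v for v in column if v is not None]
    return [rng.choice(donors) if v is None else v for v in column]


if __name__ == "__main__":
    ages = [20.0, None, 30.0, 40.0]
    colors = ["red", None, "blue", "red"]
    print(mean_impute(ages))       # [20.0, 30.0, 30.0, 40.0]
    print(mode_impute(colors))     # ['red', 'red', 'blue', 'red']
    print(hot_deck_impute([1, None, 2, None, 3]))
```

One limitation is visible even on this toy data: mean imputation piles values onto the mean, so the population variance of the filled column (50.0) is smaller than that of the observed values alone (about 66.7). Shrinkage of this kind is one way simple imputation can bias parameter estimates.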