Missing Features Reconstruction and Its Impact on Classification Accuracy

Year
2019
Published in
Computational Science – ICCS 2019. Springer, Cham, 2019. p. 207-220. vol. 11538. ISBN 978-3-030-22744-9.
Type
Proceedings paper
Abstract
In real-world applications, we can encounter situations where a well-trained model has to be used to predict from a damaged dataset. The damage caused by missing or corrupted values can occur either at the level of individual instances or at the level of entire features. Both situations reduce the usability of the model on such a dataset. This paper focuses on the scenario where entire features are missing, which can be understood as a specific case of transfer learning. Our aim is to experimentally investigate the influence of various imputation methods on the performance of several classification models. We compare traditional imputation methods, such as k-NN, linear regression, and MICE, with modern ones, such as the multi-layer perceptron (MLP) and gradient boosted trees (XGBT). For linear regression, MLP, and XGBT we also propose two approaches to using them for the imputation of multiple features. The experiments were performed on both real-world and artificial datasets with continuous features, where the number of missing features varied from one feature to 50%. The results show that MICE and linear regression are generally good imputers regardless of the conditions. The performance of MLP and XGBT, on the other hand, is strongly dataset dependent: in some cases they perform best, but more often they perform worse than MICE or linear regression.
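The scenario in this abstract, where an entire feature is absent at prediction time, can be illustrated with a minimal sketch of linear-regression imputation. This is not the paper's implementation: the function names `fit_linear_imputer` and `impute_feature`, the synthetic data, and the choice of a plain least-squares fit with NumPy are all assumptions made for illustration.

```python
import numpy as np

def fit_linear_imputer(X_train, missing_idx):
    """Fit a least-squares model predicting column `missing_idx` from the other columns.

    Hypothetical helper: trained on complete data where the feature is still present.
    """
    observed = np.delete(X_train, missing_idx, axis=1)
    # Append a column of ones so the model includes an intercept.
    design = np.hstack([observed, np.ones((observed.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, X_train[:, missing_idx], rcond=None)
    return coef

def impute_feature(X_damaged, missing_idx, coef):
    """Predict the missing column and insert it back at position `missing_idx`."""
    design = np.hstack([X_damaged, np.ones((X_damaged.shape[0], 1))])
    predicted = design @ coef
    return np.insert(X_damaged, missing_idx, predicted, axis=1)

# Synthetic data (an assumption): the last feature is an exact linear
# function of the first three, so the reconstruction can be checked.
rng = np.random.default_rng(0)
weights = np.array([1.0, -2.0, 0.5])

base_train = rng.normal(size=(200, 3))
X_train = np.hstack([base_train, (base_train @ weights + 3.0)[:, None]])

base_test = rng.normal(size=(50, 3))
X_test_full = np.hstack([base_test, (base_test @ weights + 3.0)[:, None]])

missing_idx = 3
X_test_damaged = np.delete(X_test_full, missing_idx, axis=1)  # whole feature lost

coef = fit_linear_imputer(X_train, missing_idx)
X_test_restored = impute_feature(X_test_damaged, missing_idx, coef)
```

The restored matrix has the shape the classifier was trained on, so the existing model can be applied without retraining. On real data the relation between features is rarely exactly linear, which is one reason the abstract compares several imputers.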

An Overview of Transfer Learning Focused on Asymmetric Heterogeneous Approaches

Year
2018
Published in
Data Management Technologies and Applications. Cham: Springer International Publishing, 2018. p. 3-26. vol. 814. ISSN 1865-0929. ISBN 978-3-319-94809-6.
Type
Proceedings paper
Abstract
In practice, we often encounter classification tasks. To solve them, we need a sufficient amount of quality data to construct an accurate classification model. However, in some cases, collecting quality data is demanding in terms of both time and money. In the medical domain, for example, data about patients are often scarce. Transfer learning introduces the idea that a possible solution is to combine data from different domains, represented by different feature spaces, that relate to the same task. We can also transfer knowledge from a different but related task that has already been learned. This overview focuses on the current progress in the novel area of asymmetric heterogeneous transfer learning. We discuss approaches and methods for solving these types of transfer learning tasks. Furthermore, we mention the most commonly used metrics and the possibility of using metric or similarity learning.

Asymmetric Heterogeneous Transfer Learning: A Survey

Year
2017
Published in
Proceedings of the 6th International Conference on Data Science, Technology and Applications. Porto: SciTePress - Science and Technology Publications, 2017. p. 17-27. vol. 1. ISBN 978-989-758-255-4.
Type
Proceedings paper
Abstract
One of the main prerequisites in most machine learning and data mining tasks is that all available data originate from the same domain. In practice, we often cannot meet this requirement because of poor data quality, unavailable data, or missing data attributes (a new task, e.g. the cold-start problem). A possible solution is to combine data from different domains, represented by different feature spaces, that relate to the same task. We can also transfer knowledge from a different but related task that has already been learned. Such a solution is called transfer learning, and it is very helpful in cases where collecting data is expensive, difficult, or impossible. This overview focuses on the current progress in a new and unique area of transfer learning: asymmetric heterogeneous transfer learning. This type of transfer learning considers the same task solved using data from different feature spaces. Through suitable mappings between these different feature spaces, we can obtain more data for solving data mining tasks. We discuss approaches and methods for solving this type of transfer learning task. Furthermore, we mention the most commonly used metrics and the possibility of using metric or similarity learning.
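The idea of mapping one feature space onto another to obtain more data can be sketched as follows. This is only an illustration under strong assumptions that the abstract does not state: a small set of paired instances observed in both spaces is available, the mapping is linear, and it is estimated by ordinary least squares. The name `fit_feature_mapping` and all data are hypothetical.

```python
import numpy as np

def fit_feature_mapping(X_source, X_target):
    """Estimate a linear map W so that X_source @ W approximates X_target.

    Hypothetical helper: rows of X_source and X_target are assumed to
    describe the same instances in two different feature spaces.
    """
    W, *_ = np.linalg.lstsq(X_source, X_target, rcond=None)
    return W

rng = np.random.default_rng(1)

# Synthetic ground truth (an assumption): a 5-dimensional source space
# related to a 3-dimensional target space by an exact linear map.
true_map = rng.normal(size=(5, 3))
paired_source = rng.normal(size=(50, 5))
paired_target = paired_source @ true_map

W = fit_feature_mapping(paired_source, paired_target)

# Source-only instances are projected into the target space and
# added to the target data (the direction is one-way, hence asymmetric).
extra_source = rng.normal(size=(20, 5))
mapped_extra = extra_source @ W
augmented_target = np.vstack([paired_target, mapped_extra])
```

The mapping goes in one direction only, from the source space to the target space, which is what makes the setting asymmetric. The surveyed methods may use quite different mappings; the linear fit here is merely the simplest instance.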

Person responsible for the page content: doc. Ing. Štěpán Starosta, Ph.D.