Ing. Daniel Vašata, Ph.D.

+420224359888
daniel.vasata@fit.cvut.cz
TH:A-1431

Theses

Sample theses

Bachelor theses

Web application supporting timetabling for part-time students

Author

Jiří Hanuš

Year

2013

Type

Bachelor thesis

Supervisor

Ing. Daniel Vašata, Ph.D.

Reviewers

Ing. Karel Klouda, Ph.D.

Department

Department of Software Engineering

Web application for ordinal encoding of string variables in data files

Author

Miroslav Duka

Year

2014

Type

Bachelor thesis

Supervisor

Ing. Daniel Vašata, Ph.D.

Reviewers

Ing. Daniel Dombek, Ph.D.

Department

Department of Software Engineering

Web demonstration of basic statistical calculations based on mathematical software R and SAGE

Author

Jana Ernekerová

Year

2015

Type

Bachelor thesis

Supervisor

Ing. Daniel Vašata, Ph.D.

Reviewers

Mgr. Rudolf Bohumil Blažek, Ph.D.

Department

Department of Software Engineering

Summary

This thesis explores options for integrating the open-source computational systems R and Sage into a web application. R was connected to the web application through the API provided by the OpenCPU project, and Sage through the Sage Cell Server service. Both systems were successfully used in a web application built in PHP. The result is a simple web application for basic statistical calculations. The main contribution of this thesis is an analysis of the possibilities of using R and Sage in web applications and a comparison of the two in terms of ease of integration, efficiency, and practical applicability.

Thesis on DSpace

Media articles tracking and evaluation

Author

Peter Kanoš

Year

2018

Type

Bachelor thesis

Supervisor

Ing. Daniel Vašata, Ph.D.

Reviewers

Mgr. Jan Starý, Ph.D.

Department

Department of Applied Mathematics

Summary

This thesis deals with the implementation of an application that collects articles, together with their versions over time, from the Czech news servers iDnes.cz and Aktuality.cz. The articles are subsequently analyzed using Doc2Vec. The analysis focuses on changes over time and on comparing similarities between article sections. The tracked changes concern article titles, perexes (lead paragraphs), and article text. The thesis mainly examines relations between different factors, such as the time of publication and the article's main topics. The result of the thesis is an application written in Python.

Thesis on DSpace

Discussion comments analysis on Czech news websites

Author

Martin Vastl

Year

2019

Type

Bachelor thesis

Supervisor

Ing. Daniel Vašata, Ph.D.

Reviewers

Ing. Karel Klouda, Ph.D.

Department

Department of Applied Mathematics

Summary

This thesis focuses on the possibilities of using natural language processing methods to analyze comments on a news portal. The main goal is to compare the ability of BERT, Doc2vec, and Doc2vec with pretrained word vectors from BERT to assess the relevance of comments to the content of the corresponding article. Another goal is to use the text vector representations to detect anomalies via the Local outlier factor method. Experiments showed that BERT is the best model for text representation and that the pretrained word vectors bring no improvement over Doc2vec without pretrained vectors. Moreover, the Local outlier factor can detect anomalous comments and users when using BERT vectors, whereas Doc2vec text representations are not good enough for anomaly detection and therefore often lead to incorrect results.

Thesis on DSpace

Product review sentiment analysis in the Czech language

Author

Lukáš Langr

Year

2019

Type

Bachelor thesis

Supervisor

Ing. Daniel Vašata, Ph.D.

Reviewers

Mgr. Petr Novák, Ph.D.

Department

Department of Applied Mathematics

Summary

This thesis takes a closer look at state-of-the-art methods of representing documents for sentiment analysis tasks. As many recent articles focus only on English or Chinese, this thesis provides a unique evaluation of those methods from the perspective of the Czech language. We apply various representations to reviews written in Czech and perform multiclass sentiment classification with machine learning models. The accuracy we achieve exceeds our expectations as well as the results of similar research articles on Czech that use the same dataset. We believe this thesis can serve as a basis for more extensive research into the possibilities of these representations.

Thesis on DSpace

Unsupervised machine translation between Czech and German language

Author

Ivana Kvapilíková

Year

2020

Type

Bachelor thesis

Supervisor

Ing. Daniel Vašata, Ph.D.

Reviewers

Ing. Karel Klouda, Ph.D.

Department

Department of Applied Mathematics

Summary

Recent research has shown that it is possible to design a model that learns to translate entirely from monolingual texts. Even though the translation quality still lags behind the state-of-the-art models trained on texts translated by humans, this line of research opens new doors for low-resource language pairs. This thesis provides an overview of unsupervised techniques for machine translation applicable in low-resource conditions. We apply the most promising approaches and compare their performance on the Czech-German language pair. Since the proposed methods depend on vector representations of words in a cross-lingual space, we experiment with these representations to show how much language-neutral information they carry.

Thesis on DSpace

Analysis of discussion comments and their authors in social media

Author

Martin Koucký

Year

2020

Type

Bachelor thesis

Supervisor

Ing. Daniel Vašata, Ph.D.

Reviewers

Ing. Karel Klouda, Ph.D.

Department

Department of Applied Mathematics

Summary

It is possible that among social media users there exist unknown clusters of users as well as anomalous users. This work explores that possibility by analyzing users represented by their comments. We find suitable sources of data on social media sites and download them. Then, we propose vector representations of users based on their comments. Finally, we try to explain the clusters of users and the anomalous users using various attributes available on social media sites and manual analysis. Our results did not confirm the existence of clusters or anomalies among social media users, because there was no clear distinction between normal and anomalous users or between users of different clusters. This may have been caused by insufficient methods of representing users or by limitations of the manual analysis. However, it may also mean that there are no such clusters of users, or anomalous users commenting in a similar way, to be found.

Thesis on DSpace

Analysis and prediction of blood glucose dynamics using Machine learning techniques

Author

Ladislav Floriš

Year

2022

Type

Bachelor thesis

Supervisor

Ing. Daniel Vašata, Ph.D.

Reviewers

Ing. Ivo Petr, Ph.D.

Department

Department of Applied Mathematics

Summary

This bachelor thesis addresses the problem of predicting blood glucose (BG) levels of patients with type 1 diabetes (T1D). We first analyze BG dynamics and then research and evaluate suitable models for its prediction. We focused on models based on artificial neural networks and support vector machines. These models were experimentally evaluated on 30-minute, 1-hour, and 2-hour prediction horizons. The data used in this thesis was collected by a single patient over 128 days in free-living conditions and contains BG levels, insulin doses, carbohydrate intake, and physical activity. Model performance was assessed using the Root Mean Square Error (RMSE), and Clarke error grid analysis was used to measure clinical accuracy. The best RMSEs achieved were 17.06 mg/dl, 24.32 mg/dl, and 27.11 mg/dl for the 30-minute, 1-hour, and 2-hour prediction horizons, respectively. Our results show that it is possible to develop models for BG prediction that perform well in free-living conditions. Unlike most other papers in the academic literature on BG prediction, we used a longer dataset containing over 4 months' worth of data for a single patient. Lastly, we made this dataset publicly available for further research in this area.
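RMSE, the metric quoted above, is the square root of the mean squared difference between predicted and measured BG values, so it is expressed in the same unit (mg/dl). The minimal Python sketch below shows the computation; the function name `rmse` and the sample readings are hypothetical and not taken from the thesis.

```python
import math
from typing import Sequence


def rmse(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between two equally long series (e.g. BG in mg/dl)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("series must be non-empty and of equal length")
    # Squared error for every (measured, predicted) pair.
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    # Mean of the squared errors, then the square root to return to mg/dl.
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical readings in mg/dl, not data from the thesis.
print(rmse([120.0, 140.0, 160.0], [110.0, 150.0, 160.0]))
```

Because errors are squared before averaging, RMSE penalizes a few large misses more heavily than many small ones.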

Thesis on DSpace

Normalization and smoothing of RSSI values of Bluetooth connection

Author

Filip Špaček

Year

2023

Type

Bachelor thesis

Supervisor

Ing. Daniel Vašata, Ph.D.

Reviewers

Ing. Magda Friedjungová, Ph.D.

Department

Department of Applied Mathematics

Summary

This thesis explores the effects of normalizing and smoothing RSSI time series. It implements several smoothing methods, such as exponential smoothing, moving average smoothing, and Savitzky-Golay smoothing. It also proposes a normalization technique that compensates for differences between the RSSI values of distinct packet types. The proposed methods were tested on an existing approach detection model, and the results were compared.
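Two of the smoothing methods named above can be sketched in a few lines of Python. This is a minimal illustration under assumed conventions (a trailing window for the moving average, initialization of exponential smoothing with the first sample, hypothetical RSSI values in dBm); it is not the thesis implementation, and the proposed packet-type normalization is not reproduced here.

```python
from typing import List, Sequence


def exponential_smoothing(rssi: Sequence[float], alpha: float) -> List[float]:
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[float] = []
    for x in rssi:
        # The first output is initialized with the first raw sample.
        smoothed.append(x if not smoothed else alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


def moving_average(rssi: Sequence[float], window: int) -> List[float]:
    """Trailing moving average; the first samples average over as many values as exist."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out: List[float] = []
    running = 0.0
    for i, x in enumerate(rssi):
        running += x
        if i >= window:
            # Drop the sample that just left the window.
            running -= rssi[i - window]
        out.append(running / min(i + 1, window))
    return out


# Hypothetical RSSI readings in dBm.
readings = [-60, -70, -65, -80, -62]
print(exponential_smoothing(readings, alpha=0.5))
print(moving_average(readings, window=3))
```

A smaller `alpha` or a larger `window` suppresses more noise but makes the smoothed signal react more slowly to real changes in distance. For Savitzky-Golay smoothing, SciPy provides `scipy.signal.savgol_filter(x, window_length, polyorder)`, which fits a local polynomial in each window instead of averaging.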

Thesis on DSpace

Semantic Textual Similarity in Czech

Author

Jiří Bednář

Year

2023

Type

Bachelor thesis

Supervisor

Ing. Daniel Vašata, Ph.D.

Reviewers

Mgr. Petr Novák, Ph.D.

Department

Department of Applied Mathematics

Summary

Recent significant advancements in semantic textual similarity (STS) have primarily been driven by the availability of annotated data for English, a luxury that Czech and other low-resource languages often lack. In this thesis, we investigate the challenges and potential improvements in solving the STS problem for the Czech language. Our research explores advancements in neural networks, including the Transformer architecture and pre-trained language models such as BERT, RoBERTa, and ELECTRA. We provide an extensive study of techniques and models for STS, as well as methods for generating sentence embeddings. Additionally, we discuss Cross-encoder and Bi-encoder architectures, along with advanced training methods like SimCSE, TSDAE, Trans-Encoder, and Multilingual distillation. We present our STS models trained using these techniques and evaluate their performance on STS and two downstream tasks. Through our analysis, we highlight our best STS model, which sets multiple state-of-the-art results, demonstrating the potential for future advancements in STS for low-resource languages.

Thesis on DSpace

Automated exploratory data analysis for binary classification using pandas profiling library

Author

Jan Čáp

Year

2023

Type

Bachelor thesis

Supervisor

Ing. Daniel Vašata, Ph.D.

Reviewers

Ing. Magda Friedjungová, Ph.D.

Department

Department of Applied Mathematics

Summary

This work deals with automated data exploration for binary classification. A survey of existing solutions for automated data exploration is performed. Furthermore, statistical tests and methods suitable for testing the dependence of two variables are investigated, and suitable options for visualizing data distributions are explored. Next, an extension of the Pandas Profiling library, selected in the survey, is proposed. The extension specializes in binary classification. It includes graphs and statistics representing the dependency of columns on the target variable, a visualization of the dependency of missing values on the target variable, proposed column transformations, and the training of a default model for classifying the target variable. Based on this design, an extension of the Pandas Profiling library was implemented to speed up data exploration for binary classification.

Thesis on DSpace

Using Monte Carlo Tree Search to play chess

Author

Jakub Král

Year

2024

Type

Bachelor thesis

Supervisor

Ing. Daniel Vašata, Ph.D.

Reviewers

prof. RNDr. Pavel Surynek, Ph.D.

Department

Department of Applied Mathematics

Summary

This thesis deals with using the Monte Carlo tree search algorithm, combined with neural networks and deep reinforcement learning, to play chess. The theoretical part acquaints the reader with the methods and algorithms of reinforcement learning. In the practical part, a model was created that can be trained and then play on a standard personal computer. This is achieved using convolutional neural networks, initial supervised learning, and subsequent reinforcement learning via self-play. A model fulfilling these requirements was created and works, but it plays at a much lower level than was aimed for at the beginning of this work.
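For readers unfamiliar with Monte Carlo tree search, the classic UCT selection rule below illustrates how the search balances exploiting moves that have scored well against exploring rarely visited ones when choosing a move $a$ in state $s$. It is given as an assumption for illustration only: the abstract does not state which selection rule the thesis used, and AlphaZero-style systems that pair MCTS with a neural network typically use a modified rule that also weights each move by the network's prior probability.

```latex
a^{*} = \arg\max_{a} \left[ Q(s,a) + c \sqrt{\frac{\ln N(s)}{N(s,a)}} \right]
```

Here $Q(s,a)$ is the mean result of the simulations that passed through move $a$, $N(s)$ and $N(s,a)$ are the visit counts of the state and of the move, and $c > 0$ is an exploration constant.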

Thesis on DSpace

Predicting Aptamer Binding Strength in In Vitro Sequence Selection Using Deep Neural Networks

Author

Linda Beková

Year

2024

Type

Bachelor thesis

Supervisor

Ing. Daniel Vašata, Ph.D.

Reviewers

Mgr. Petr Šimánek

Department

Department of Applied Mathematics

Summary

This thesis addresses the problem of processing SELEX experiments using deep learning. The work employs a Feed-Forward Neural Network, a Convolutional Neural Network, a Bidirectional Long Short-Term Memory network, and a Random Forest, implemented in Python, and compares their ability to predict the results of SELEX experiments. The thesis expands on previous research on the binding ability of aptamers using Restricted Boltzmann Machines and offers multiple approaches to handling this problem. The selected models achieved high prediction accuracy on a dataset presented in previous research. When tested on additionally generated data, however, the models had difficulty distinguishing binders from non-binders and were therefore concluded to be insufficient for use in the medical field. The results of the individual models and approaches are compared. Of all the algorithms, Restricted Boltzmann Machines showed the best performance, followed by Random Forests.

Thesis on DSpace

Master theses

Learning methods for continuous-time hidden Markov models

Author

Lukáš Lopatovský

Year

2017

Type

Master thesis

Supervisor

Ing. Daniel Vašata, Ph.D.

Reviewers

Ing. Tomáš Šabata

Department

Department of Theoretical Computer Science

Summary

The continuous-time hidden Markov model is promising not only for biomedical research. The lack of efficient learning algorithms has limited its use in the past; however, new efficient EM approaches have recently been presented. In this thesis, we examine and compare current state-of-the-art methods that are able to train models containing hundreds of hidden states. As part of this work, we developed a general-purpose library for continuous-time and discrete-time hidden Markov models that efficiently implements the best-performing learning methods, is easy to use, and is available to everyone under an open-source license.

Thesis on DSpace

Suspicion of corruption rating of contracts published in the government contracts registry

Author

Jan Staněk

Year

2018

Type

Master thesis

Supervisor

Ing. Daniel Vašata, Ph.D.

Reviewers

Ing. Marek Sušický

Department

Department of Applied Mathematics

Summary

This master's thesis describes the design of metrics for identifying suspicious contracts published in the register of contracts. It describes public data sources suitable for supplementing the data from the register of contracts, as well as data integration and feature selection for anomaly detection. The designed metrics simplify the selection of contracts suitable for manual review.

Thesis on DSpace

Curriculum Learning of Neural Networks

Author

Gary Fibiger

Year

2020

Type

Master thesis

Supervisor

Ing. Daniel Vašata, Ph.D.

Reviewers

Ing. Magda Friedjungová

Department

Department of Applied Mathematics

Summary

Artificial neural networks are usually trained by observing samples from a training set in a random order. Biological organisms, by contrast, hardly ever learn in a random order: human supervised learning follows a curriculum that guides the learning process. In recent years, many approaches have been proposed for introducing a curriculum into the training of artificial neural networks. This thesis provides an overview of those approaches. Many of them were implemented and evaluated experimentally. The results show that different approaches are favorable under different circumstances.

Thesis on DSpace

Sentiment Analysis using Domain Specific Adapters

Author

Lukáš Langr

Year

2022

Type

Master thesis

Supervisor

Ing. Daniel Vašata, Ph.D.

Reviewers

doc. Ing. Štěpán Starosta, Ph.D.

Department

Department of Applied Mathematics

Summary

Natural language processing has become a domain of large pre-trained models that require a great deal of computing power to adapt to a custom task. In this work, domain-specific adapters, an alternative transfer learning method, are proposed for the task of sentiment analysis. The adapted models are compared to a fine-tuning baseline in multiple experimental scenarios; their performance is comparable to that of considerably larger models while being much less computationally intensive. This approach appears to be a viable alternative to large models in environments with limited computing power.

Thesis on DSpace

Recurrent Memory Models with Optimal Polynomial Projections

Author

Ondřej Naňka

Year

2021

Type

Master thesis

Supervisor

Ing. Daniel Vašata, Ph.D.

Reviewers

Ing. Karel Klouda, Ph.D.

Department

Department of Applied Mathematics

Summary

The aim of this thesis is to research the practical usability of high-order polynomial projection operators, which compress signals by projecting them onto polynomial bases, for implementing recurrent neural networks. Experiments in sound classification and natural language processing are performed using the TensorFlow framework, and also with a spiking neural network using the NengoDL simulator.

Thesis on DSpace

The use of Relative Goodness of Fit Tests for training Generative Adversarial Networks

Author

Martin Scheubrein

Year

2021

Type

Master thesis

Supervisor

Ing. Daniel Vašata, Ph.D.

Reviewers

Ing. Magda Friedjungová, Ph.D.

Department

Department of Applied Mathematics

Summary

Generative adversarial networks (GANs) are a class of deep learning methods usually applied to images or other high-dimensional data. With such data, it is difficult to decide whether the distribution learnt by a model matches the distribution of the source data, or to locate the differences. To measure those discrepancies, the maximum mean discrepancy (MMD) or unnormalized mean embedding (UME) measures may be used. This thesis verifies that, with proper parametrization, both measures reliably detect both global and local discrepancies in image data. The choice of kernel, its parameters, and, in the case of UME, the selection of test locations are studied in detail. The interpretability of optimized test locations in the context of local difference discovery is verified. Finally, a novel early stopping method based on MMD and UME measured between the network's output and testing data is proposed.
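To make the MMD measure above concrete, the following pure-Python sketch computes the standard unbiased estimate of the squared MMD between two samples using a Gaussian kernel. The kernel choice, the `bandwidth` parameter, and the toy one-dimensional samples are assumptions for illustration; the thesis studies kernel selection and parametrization in detail, and UME is not shown here.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_kernel(a: Vector, b: Vector, bandwidth: float) -> float:
    """k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs: Sequence[Vector], ys: Sequence[Vector], bandwidth: float = 1.0) -> float:
    """Unbiased estimate of the squared MMD between samples xs and ys."""
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("need at least two samples from each distribution")
    # Within-sample kernel sums exclude the diagonal (i == j) to remove bias.
    k_xx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j)
    k_yy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j)
    # The cross term uses all pairs between the two samples.
    k_xy = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)


# Toy one-dimensional samples (hypothetical).
source = [(0.0,), (1.0,), (2.0,)]
shifted = [(10.0,), (11.0,), (12.0,)]
print(mmd2_unbiased(source, source), mmd2_unbiased(source, shifted))
```

A clearly positive value indicates that the two samples likely come from different distributions; because the estimator is unbiased, it can be slightly negative when the distributions match.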

Thesis on DSpace

Deep Reinforcement Learning for Super Mario Bros

Author

Ondřej Schejbal

Year

2022

Type

Master thesis

Supervisor

Ing. Daniel Vašata, Ph.D.

Reviewers

Mgr. Petr Novák, Ph.D.

Department

Department of Applied Mathematics

Summary

Within this master's thesis, a fine-tuned reinforcement learning model capable of training an intelligent agent to play the Super Mario Bros. game has been created. Its architecture is based on research into current state-of-the-art reinforcement learning techniques, in which the models most relevant to this type of task were compared with each other. To compare the models, tools that allow a model to interact with the game were researched and described. Based on the comparison results, the most suitable approach was selected. Experiments applying various modifications to the selected model were carried out in order to find the modifications best suited to the Super Mario Bros. game. The fine-tuned model was used to train an intelligent agent, whose performance was tested on the level it was trained on and also on two levels it had never seen before. The agent performed very well and showed sensible behavioral patterns, mainly on the level it was trained on; its performance on the unseen levels was understandably worse.

Thesis on DSpace

Improving blood glucose level prediction models

Author

Ladislav Floriš

Year

2024

Type

Master thesis

Supervisor

Ing. Daniel Vašata, Ph.D.

Reviewers

doc. Ing. Štěpán Starosta, Ph.D.

Department

Department of Applied Mathematics

Summary

This work addresses the task of predicting blood glucose levels in patients with type 1 diabetes. Models based on the Transformer architecture and Legendre Memory Units (LMU) were explored. The application of LMUs in this work represents their first use for blood glucose level prediction. Using multivariate time series, predictions are made with 30-minute and 60-minute horizons. Models were trained and evaluated on the OhioT1DM dataset, which includes eight weeks of data from 12 distinct patients. The dataset consists of two editions, released in 2018 and 2020. Performance was measured using the Root Mean Square Error (RMSE), and Clarke Error Grid Analysis was used to evaluate clinical accuracy. On the 2018 edition, LMUs achieved an RMSE of 18.17 mg/dl for the 30-minute horizon and 30.33 mg/dl for the 60-minute horizon. On the 2020 edition, the RMSEs were 18.56 mg/dl and 32.57 mg/dl for the 30-minute and 60-minute horizons, respectively. LMUs were shown to match state-of-the-art models and, on the smaller dataset (the 2018 edition of OhioT1DM), even to outperform them.

Thesis on DSpace