Bachelor theses
Web application supporting timetabling for part-time students
Author
Jiří Hanuš
Year
2013
Type
Bachelor thesis
Supervisor
Ing. Daniel Vašata, Ph.D.
Reviewers
Ing. Karel Klouda, Ph.D.
Department
Web application for ordinal encoding of string variables in data files
Author
Miroslav Duka
Year
2014
Type
Bachelor thesis
Supervisor
Ing. Daniel Vašata, Ph.D.
Reviewers
Ing. Daniel Dombek, Ph.D.
Department
Web demonstration of basic statistical calculations based on mathematical software R and SAGE
Author
Jana Ernekerová
Year
2015
Type
Bachelor thesis
Supervisor
Ing. Daniel Vašata, Ph.D.
Reviewers
Mgr. Rudolf Bohumil Blažek, Ph.D.
Department
Summary
This thesis deals with options for integrating the open-source computational algebra systems R and Sage into a web application. R was connected to the web application using the API provided by the OpenCPU project; Sage was connected through the Sage Cell Server service. Both selected systems were successfully used in a web application built in PHP. The result is a simple web application for basic statistical calculations. The main contribution of this thesis is an analysis of the possibility of using R and Sage in web applications and their comparison in terms of ease of integration, efficiency, and practical applicability.
Media articles tracking and evaluation
Author
Peter Kanoš
Year
2018
Type
Bachelor thesis
Supervisor
Ing. Daniel Vašata, Ph.D.
Reviewers
Mgr. Jan Starý, Ph.D.
Department
Summary
This thesis deals with the implementation of an application that collects articles and their versions over time from the Czech news servers iDnes.cz and Aktuality.cz. The articles are subsequently analyzed using Doc2Vec. The analysis focuses on changes over time and on comparing similarities between article sections. The changes concern the titles, the leads (perexes), and the body text of the articles. The thesis mainly examines relations between different factors such as the time of publication of an article, its main topics, etc. The result of the thesis is an application written in Python.
Discussion comments analysis on Czech news websites
Author
Martin Vastl
Year
2019
Type
Bachelor thesis
Supervisor
Ing. Daniel Vašata, Ph.D.
Reviewers
Ing. Karel Klouda, Ph.D.
Department
Summary
This thesis focuses on the possibilities of using natural language processing methods to analyze comments on a news portal. The main goal is to compare the ability of BERT, Doc2vec, and Doc2vec with pretrained word vectors from BERT to assess the relevance of comments to the content of an article from a news portal. Another goal is to use the text vector representations to detect anomalies via the Local outlier factor method.
Experiments showed that the best model for text representation is BERT and that the pretrained word vectors bring no improvement over Doc2vec without pretrained vectors. Moreover, the Local outlier factor can detect anomalous comments and users when using vectors from BERT, in contrast to Doc2vec text representations, which are not good enough for anomaly detection and therefore often return incorrect results.
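As an illustration of the Local outlier factor method named above, the following is a minimal sketch in plain Python. It is not the thesis implementation: the function name lof_scores, the toy one-dimensional points, and the choice k=2 are assumptions made for this example, and real text vectors from BERT or Doc2vec would have hundreds of dimensions. The sketch assumes there are no duplicate points, since those would give a zero reachability distance.

```python
import math

def lof_scores(points, k):
    """Local outlier factor of every point, using Euclidean distance.

    Scores near 1 mean the point is about as dense as its neighbours;
    scores well above 1 mark likely outliers.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbours = []   # indices of the k nearest neighbours of each point
    k_distance = []   # distance from each point to its k-th nearest neighbour
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    def reachability(i, j):
        # Reachability distance of point i with respect to neighbour j.
        return max(k_distance[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reachability(i, j) for j in neighbours[i]) for i in range(n)]

    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]

# Hypothetical toy data: four evenly spaced points and one distant point.
toy_points = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
scores = lof_scores(toy_points, k=2)
```

In this toy data the distant point receives a score well above 1, while the evenly spaced points score close to 1.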
Product review sentiment analysis in the Czech language
Author
Lukáš Langr
Year
2019
Type
Bachelor thesis
Supervisor
Ing. Daniel Vašata, Ph.D.
Reviewers
Mgr. Petr Novák, Ph.D.
Department
Summary
This thesis provides a closer look at the state-of-the-art methods of representing documents for sentiment analysis tasks. As many of the recent articles focus only on either the English or the Chinese language, this thesis provides a unique evaluation of those methods from the perspective of the Czech language. We use various representations on reviews in the Czech language and perform a multiclass sentiment classification via machine learning models. Our achieved accuracy exceeds expectations and the results of similar research articles using the same dataset in the Czech field. We believe this thesis will be a base upon which more extensive research of the possibilities of these representations will be conducted.
Unsupervised machine translation between Czech and German language
Author
Ivana Kvapilíková
Year
2020
Type
Bachelor thesis
Supervisor
Ing. Daniel Vašata, Ph.D.
Reviewers
Ing. Karel Klouda, Ph.D.
Department
Summary
Recent research has shown that it is possible to design a model that learns to translate entirely from monolingual texts. Even though the translation quality still lags behind the state-of-the-art models trained on texts translated by humans, this line of research opens new doors for low-resource language pairs. This thesis provides an overview of unsupervised techniques for machine translation applicable in low-resource conditions. We apply the most promising approaches and compare their performance on the Czech-German language pair. Since the proposed methods depend on vector representations of words in a cross-lingual space, we experiment with these representations to show how much language-neutral information they carry.
Analysis of discussion comments and their authors in social media
Author
Martin Koucký
Year
2020
Type
Bachelor thesis
Supervisor
Ing. Daniel Vašata, Ph.D.
Reviewers
Ing. Karel Klouda, Ph.D.
Department
Summary
It is possible that among social media users there exist unknown clusters of users and anomalous users.
This work explores that possibility by analyzing users represented by their comments.
We find suitable sources of data on social media sites and download them. Then, we propose vector representations of users based on their comments.
Finally, we try to explain the clusters of users and anomalous users using various attributes on social media sites and with manual analysis.
Our results did not prove the existence of clusters or anomalies among social media users, because there was no clear distinction between normal and anomalous users or between users of different clusters.
This may have been caused by insufficient methods of representing users or of manual analysis. However, it may also mean that there are no such clusters of users or anomalies commenting in a similar way to be found.
Analysis and prediction of blood glucose dynamics using Machine learning techniques
Author
Ladislav Floriš
Year
2022
Type
Bachelor thesis
Supervisor
Ing. Daniel Vašata, Ph.D.
Reviewers
Ing. Ivo Petr, Ph.D.
Department
Summary
This bachelor thesis addresses the problem of predicting blood glucose (BG) levels of type 1 diabetes (T1D) patients. In our work, we first analyze BG dynamics and then research and evaluate suitable models for its prediction.
We focused on models based on artificial neural networks and support vector machines. These models were experimentally evaluated on 30-minute, 1-hour, and 2-hour prediction horizons. The data used in this thesis was collected by one patient over 128 days in free-living conditions and contains BG levels, insulin doses, carbohydrate intake, and physical activity.
Model performance was assessed using Root Mean Square Error (RMSE). Clarke error grid analysis was used to measure clinical accuracy. The best RMSE achieved was 17.06 mg/dl, 24.32 mg/dl, and 27.11 mg/dl for the 30-minute, 1-hour, and 2-hour prediction horizons, respectively.
Our results show that it is possible to develop models for BG prediction which perform well in free-living conditions. Unlike most of the other papers in the academic literature on BG prediction, we used a longer dataset containing over 4 months' worth of data for a single patient. Lastly, we made this dataset publicly available for further research in this area.
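For readers unfamiliar with the RMSE metric used above, a minimal sketch of its computation follows. The function name rmse and the sample readings are assumptions made for illustration; the numbers are invented and are not taken from the thesis dataset.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical BG readings in mg/dl (made up, not thesis data).
measured = [110.0, 126.0, 140.0, 133.0]
forecast = [114.0, 120.0, 150.0, 130.0]
error = rmse(measured, forecast)  # result is in the same unit, mg/dl
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, which is why the metric is usually reported together with a clinical measure such as the Clarke error grid.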
Normalization and smoothing of RSSI values of Bluetooth connection
Author
Filip Špaček
Year
2023
Type
Bachelor thesis
Supervisor
Ing. Daniel Vašata, Ph.D.
Reviewers
Ing. Magda Friedjungová, Ph.D.
Department
Summary
This thesis explores the effects of RSSI time series normalization and smoothing. It implements several different methods such as exponential smoothing, moving average smoothing, and Savitzky-Golay smoothing. It also proposes a normalization technique for compensating for differences between RSSI values of distinct packet types. The proposed methods were tested on an existing approach detection model, and the results were compared.
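Two of the smoothing methods named above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the smoothing factor alpha, the window size, and the sample RSSI values are hypothetical and do not come from the thesis, and the moving average shown here is a trailing one, which may differ from the variant the thesis uses.

```python
def exponential_smoothing(values, alpha):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for value in values:
        if not smoothed:
            smoothed.append(float(value))  # the first sample seeds the series
        else:
            smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed

def moving_average(values, window):
    """Trailing moving average; early samples average over what is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averaged = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged

# Hypothetical RSSI samples in dBm, with one spike at -60.
rssi = [-70, -72, -60, -71]
smoothed_exp = exponential_smoothing(rssi, alpha=0.5)
smoothed_avg = moving_average(rssi, window=2)
```

A smaller alpha or a larger window suppresses spikes more strongly but makes the smoothed series react more slowly to real changes in signal strength.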
Semantic Textual Similarity in Czech
Author
Jiří Bednář
Year
2023
Type
Bachelor thesis
Supervisor
Ing. Daniel Vašata, Ph.D.
Reviewers
Mgr. Petr Novák, Ph.D.
Department
Summary
Recent significant advancements in semantic textual similarity (STS) have primarily been driven by the availability of annotated data for English, a luxury that Czech and other low-resource languages often lack. In this thesis, we investigate the challenges and potential improvements in solving the STS problem for the Czech language. Our research explores advancements in neural networks, including the Transformer architecture and pre-trained language models such as BERT, RoBERTa, and ELECTRA. We provide an extensive study of techniques and models for STS, as well as methods for generating sentence embeddings. Additionally, we discuss Cross-encoder and Bi-encoder architectures, along with advanced training methods like SimCSE, TSDAE, Trans-Encoder, and Multilingual distillation. We present our STS models trained using these techniques and evaluate their performance on STS and two downstream tasks. Through our analysis, we highlight our best STS model, which sets multiple state-of-the-art results, demonstrating the potential for future advancements in STS for low-resource languages.
Automated exploratory data analysis for binary classification using pandas profiling library
Author
Jan Čáp
Year
2023
Type
Bachelor thesis
Supervisor
Ing. Daniel Vašata, Ph.D.
Reviewers
Ing. Magda Friedjungová, Ph.D.
Department
Summary
This work deals with automatic data exploration for binary classification. A search of existing solutions for automatic data exploration is performed. Furthermore, statistical tests and methods suitable for testing the dependence of two variables are investigated. Suitable options for visualizing data distributions are also explored.
Next, an extension to the Pandas Profiling library, which was selected in the search, is proposed. The extension specializes in binary classification. It includes graphs and statistics representing the dependency of columns on the target variable, visualization of the dependency of missing values on the target variable, proposed column transformations, and training of a default model for target variable classification.
Based on the design, an extension to the Pandas Profiling library was implemented to speed up data exploration for binary classification.
Using Monte Carlo Tree Search to play chess
Author
Jakub Král
Year
2024
Type
Bachelor thesis
Supervisor
Ing. Daniel Vašata, Ph.D.
Reviewers
prof. RNDr. Pavel Surynek, Ph.D.
Department
Summary
This thesis deals with the use of the Monte Carlo tree search algorithm and its combination with neural networks and deep reinforcement learning to play chess. The theoretical part of this thesis acquaints the reader with the methods and algorithms of reinforcement learning. In the practical part, a model was created that can be trained and then play on a standard personal computer. This is achieved by using convolutional neural networks, initial supervised learning, and then reinforcement learning via self-play. A model that fulfills these requirements was created and runs, but it plays at a much lower level than was aimed for at the beginning of this work.
Predicting Aptamer Binding Strength in In Vitro Sequence Selection Using Deep Neural Networks
Author
Linda Beková
Year
2024
Type
Bachelor thesis
Supervisor
Ing. Daniel Vašata, Ph.D.
Reviewers
Mgr. Petr Šimánek
Department
Summary
This thesis addresses the problem of processing SELEX experiments using deep learning. The work includes employing a Feed-Forward Neural Network, a Convolutional Neural Network, a Bidirectional Long Short-Term Memory, and a Random Forest using the Python programming language and comparing their ability to predict the results of SELEX experiments. The thesis expands on previous research on aptamers' binding ability using Restricted Boltzmann Machines and offers multiple approaches to handling this problem. The selected models' predictions achieved a high accuracy on a dataset presented in previous research. When tested on additionally generated data, the models had difficulty differentiating between binders and non-binders and were therefore deemed insufficient for use in the medical field. The results of individual models and approaches are compared. Of all the algorithms, the Restricted Boltzmann Machines performed best, followed by Random Forests.
Master theses
Learning methods for continuous-time hidden Markov models
Author
Lukáš Lopatovský
Year
2017
Type
Master thesis
Supervisor
Ing. Daniel Vašata, Ph.D.
Reviewers
Ing. Tomáš Šabata
Department
Summary
The continuous-time hidden Markov model is promising, and not only for biomedical research. The lack of efficient learning algorithms has limited its use in the past. Recently, however, new efficient EM approaches have been presented. In this thesis we examine and compare current state-of-the-art methods that are able to train models containing hundreds of hidden states. As part of the work, we have developed a general-purpose continuous-time and discrete-time hidden Markov model library that efficiently implements the best-performing learning methods, is easy to use, and is available to everyone under an open-source license.
Suspicion of corruption rating of contracts published in the government contracts registry
Author
Jan Staněk
Year
2018
Type
Master thesis
Supervisor
Ing. Daniel Vašata, Ph.D.
Reviewers
Ing. Marek Sušický
Department
Summary
This master's thesis describes the design of metrics for identifying suspicious contracts published in the register of contracts. It describes public data sources suitable for supplementing data from the register of contracts, data integration, and feature selection for anomaly detection. The designed metrics simplify the selection of contracts suitable for manual review.
Curriculum Learning of Neural Networks
Author
Gary Fibiger
Year
2020
Type
Master thesis
Supervisor
Ing. Daniel Vašata, Ph.D.
Reviewers
Ing. Magda Friedjungová
Department
Summary
Artificial neural networks are usually trained by observing samples from a training set in a random order. This approach is similar to how biological organisms learn, but their learning process is hardly ever random. Human supervised learning utilizes a curriculum that guides the learning process. In recent years, many approaches have been proposed to introduce a curriculum to the training of artificial neural networks. This thesis provides an overview of those approaches. Many of the approaches were implemented and experimentally evaluated. The results show that different approaches are favorable under different circumstances.
Sentiment Analysis using Domain Specific Adapters
Author
Lukáš Langr
Year
2022
Type
Master thesis
Supervisor
Ing. Daniel Vašata, Ph.D.
Reviewers
doc. Ing. Štěpán Starosta, Ph.D.
Department
Summary
Natural language processing has become a domain of large pre-trained models requiring a great deal of computing power to adjust to a custom task. In this work a different transfer learning method of domain specific adapters is proposed for the task of sentiment analysis. The adapted models are compared to a fine-tuning baseline in multiple experimental scenarios and their performance is comparable to considerably larger models while being much less computationally intensive. This approach looks to be a viable alternative to large models in lower computing power environments.
Recurrent Memory Models with Optimal Polynomial Projections
Author
Ondřej Naňka
Year
2021
Type
Master thesis
Supervisor
Ing. Daniel Vašata, Ph.D.
Reviewers
Ing. Karel Klouda, Ph.D.
Department
Summary
The aim of this thesis is to research the practical usability of high-order polynomial projection operators, which compress signals by projecting them onto polynomial bases, for the implementation of recurrent neural networks. Experiments in the fields of sound classification and natural language processing are performed using the TensorFlow framework and also as a spiking neural network using the NengoDL simulator.
The use of Relative Goodness of Fit Tests for training Generative Adversarial Networks
Author
Martin Scheubrein
Year
2021
Type
Master thesis
Supervisor
Ing. Daniel Vašata, Ph.D.
Reviewers
Ing. Magda Friedjungová, Ph.D.
Department
Summary
Generative adversarial networks (GAN) are a class of deep learning methods which are usually applied to images or other high-dimensional data. With such data, it is difficult to decide if the distribution learnt by a model matches the distribution of source data, or to locate the differences. To measure those discrepancies, maximum mean discrepancy (MMD) or unnormalized mean embedding (UME) measures may be used.
This thesis verifies that with proper parametrization, both measures reliably detect both global and local discrepancies in image data. Choice of kernel, its parameters, and in the case of UME the selection of test locations, are studied in detail. Interpretability of optimized test locations in the context of local difference discovery is verified.
Finally, a novel method of early stopping based on MMD and UME measured between the network's output and testing data is proposed.
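To make the MMD measure mentioned above more concrete, here is a minimal sketch of the biased estimate of squared MMD between two samples. The Gaussian kernel, the bandwidth value, the function names, and the toy two-dimensional points are assumptions for illustration; the thesis studies the choice of kernel and its parameters in detail, and its actual setup may differ.

```python
import math

def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two points given as tuples of floats."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

# Hypothetical toy samples: one compared with itself, one with a shifted copy.
sample_a = [(0.0, 0.0), (1.0, 0.0)]
sample_b = [(5.0, 5.0), (6.0, 5.0)]
same = mmd_squared(sample_a, sample_a)
different = mmd_squared(sample_a, sample_b)
```

The estimate is zero when a sample is compared with itself and grows as the two samples move apart, which is the property an early-stopping criterion based on MMD would rely on.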
Deep Reinforcement Learning for Super Mario Bros
Author
Ondřej Schejbal
Year
2022
Type
Master thesis
Supervisor
Ing. Daniel Vašata, Ph.D.
Reviewers
Mgr. Petr Novák, Ph.D.
Department
Summary
In this master's thesis, a fine-tuned reinforcement learning model was created for training an intelligent agent able to play the Super Mario Bros. game. Its architecture is based on research into current state-of-the-art reinforcement learning techniques, in which the most relevant models for this type of task were compared with each other. To make the comparison possible, the tools that allow a model to interact with the game were researched and described. Based on the comparison results, the most suitable approach was selected. Experiments with various modifications of the selected model were carried out to find the modifications best suited to the Super Mario Bros. game. The fine-tuned model was used to train an intelligent agent, whose performance was tested on the level it was trained on and on two levels it had never seen before. The agent performed very well and showed good behavioral patterns, mainly on the level it was trained on; its performance on the unseen levels was understandably worse.
Improving blood glucose level prediction models
Author
Ladislav Floriš
Year
2024
Type
Master thesis
Supervisor
Ing. Daniel Vašata, Ph.D.
Reviewers
doc. Ing. Štěpán Starosta, Ph.D.
Department
Summary
This work addresses the task of predicting blood glucose levels in patients with type 1 diabetes. Models based on Transformer architecture and Legendre Memory Units (LMU) were explored. The application of LMUs in this work represents their first use for blood glucose level prediction. Employing multivariate time series, predictions are made with 30-minute and 60-minute horizons. Models were trained and evaluated using the OhioT1DM dataset, which includes eight weeks of data from 12 distinct patients. The dataset consists of two editions, released in 2018 and 2020.
Performance was measured using Root Mean Square Error (RMSE), and Clarke Error Grid Analysis was utilized to evaluate clinical accuracy. LMUs achieved an RMSE of 18.17 mg/dl for the 30-minute horizon and 30.33 mg/dl for the 60-minute horizon, in the 2018 edition. In the 2020 edition, the RMSEs were 18.56 mg/dl and 32.57 mg/dl for the 30-minute and 60-minute horizons, respectively.
LMUs were proven to match and, in smaller datasets (2018 edition of OhioT1DM), even outperform the state-of-the-art models.