Bachelor theses
System supporting research in combinatorics on words
Author
Radek Jireš
Year
2013
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
doc. Ing. Štěpán Starosta, Ph.D.
Department
Corpus of comments below news articles
Author
Jakub Bartel
Year
2013
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
doc. Ing. Štěpán Starosta, Ph.D.
Department
Application for data analysis of study results
Author
Martin Konečný
Year
2015
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Jitka Hrabáková, Ph.D.
Department
Czech e-Library - Poetry of the 19th and 20th Century
Author
Jaromír Dalecký
Year
2014
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
doc. Ing. Štěpán Starosta, Ph.D.
Department
Online system to support writing pages on wikipedia.org
Author
Václav Makeš
Year
2016
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Tomáš Kalvoda, Ph.D.
Department
Summary
The thesis addresses the detection of erroneous and missing data in the Internet encyclopedia Wikipedia and the proposal of corrections. The result is an automated system that downloads, stores and analyzes articles from the Czech edition of Wikipedia. Three methods are proposed for the analysis to identify articles that need improvements or additions. The work shows that it is possible to propose improvements to the electronic encyclopedia.
Administration system for written tests and exams
Author
Kryštof Slavík
Year
2016
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
doc. Ing. Štěpán Starosta, Ph.D.
Department
Summary
This bachelor thesis focuses on the design and implementation of an extension for a web database of mathematical problems. The aim of the extension is the automated generation of written tests for students of mathematical courses at FIT CTU in Prague.
The application allows teachers of the courses to easily create multiple variants of tests without having to assign mathematical problems manually. The thesis introduces an algorithm that automatically creates the required number of tests from the database based on specified parameters. The system allows these parameters to be specified comfortably. The created tests can be exported in a printable format.
The thesis describes in detail the analysis of the required functionality of the application and its design. It focuses on the implementation, which uses Ruby on Rails, and describes the use of the system in practice. The source code of the application is available on the attached DVD.
Application for visualisation of linear algebra notions and methods
Author
Martin Chvátal
Year
2016
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Petr Špaček, Ph.D.
Department
Summary
The topic of this thesis is the creation of educational software for linear algebra. It allows a teacher to supplement a lecture with examples of how the currently studied topic can be used in informatics. A set of programming tasks is prepared for students to help them practice what they have learned.
Management system for digitalized literary works
Author
Martin Melichar
Year
2018
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Mgr. Jan Starý, Ph.D.
Department
Summary
This thesis describes the development of an application for converting literary works into electronic form. The literature review compares technologies for web application development and text formats for storing electronic works. The supplied input data and the method of importing them are also described. The practical part builds on the evaluation of the review and describes the process of developing the application. The primary contribution of this thesis is to facilitate the conversion of literary works into electronic form for the UČL AV ČR employees.
Web application for presentation of research institutions ranking data
Author
Pavel Švagr
Year
2018
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
doc. Ing. Štěpán Starosta, Ph.D.
Department
Summary
The subject of this bachelor thesis is the processing of an open data set derived from the ratings of research results published by the Research, Development and Innovation Council. The thesis analyzes the source files, implements a parsing module and reveals inconsistencies and errors in the data. It then covers the analysis, design and implementation of a web application that enables searching in the ratings and displays an overview of the scientific activity of research organizations, their units and authors.
NHL match results prediction
Author
Filip Kojan
Year
2019
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Magda Friedjungová
Department
Summary
The goal of this thesis is to explore data sources about players and matches in the NHL, as well as modern statistical methods used to evaluate the quality of teams and players, and to examine how this information can be used to predict the results of NHL matches. Various machine learning classification models are used and their predictive ability is compared. The predictions are also compared with bookmaker predictions.
Detecting problems in outdoor cypher games
Author
Barbora Eliášová
Year
2019
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Tomáš Kalvoda, Ph.D.
Department
Summary
The thesis deals with the analysis of data created during puzzlehunts organized by Cryptomania. It describes big cipher games and the ciphers used in them. Furthermore, it covers different possibilities of cipher classification. It presents the data analysis made by Tomáš Kuča and its benefit to cipher game players; his webpage, statek.seslost.cz, classifies data from big cipher games. The thesis defines the terms difficulty, complexity and time intensity. The data analysis itself examines the Cryptomania puzzlehunts Avraham Hrashalom, Fantom Brna and Ztracené židovské město.
The duration of cipher solving was combined with the number of hints taken. This information was used to create a value that defines the difficulty of a cipher. Every team was given a rating as well. Cluster analysis uses this information to identify groups of similar teams.
Word sense representation for the Czech language
Author
Vojtěch Paukner
Year
2019
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Magda Friedjungová
Department
Summary
The thesis surveys traditional and state-of-the-art methods of natural language processing. Particular importance is placed on languages with rich morphology. The state-of-the-art methods are then applied in various ways to the Czech language in order to differentiate between distinct word senses based on their usage in a sentence. Evaluation of these experiments is an important part of the thesis.
Probabilistic algorithms for computing the LTS estimate
Author
Martin Jenč
Year
2019
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Mgr. Petr Novák, Ph.D.
Department
Summary
The least trimmed squares method is a robust version of the method of least squares, which is an essential tool of regression analysis used to find an estimate of the coefficients in the linear regression model. Computing the least trimmed squares estimate is known to be NP-hard, hence only suboptimal probabilistic algorithms are usually used in practice. Besides describing those algorithms, we propose a few ways of combining them to obtain better performance.
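As an illustration of what a probabilistic algorithm for this estimate can look like, the following is a minimal sketch for the single-predictor case y = a + b*x. It is not taken from the thesis: it assumes a generic random-restart scheme in which each start fits a line to two randomly chosen points and then repeatedly refits on the h points with the smallest squared residuals. The function names, the default parameters and the restriction to one predictor are all assumptions made for the example.

```python
import random


def fit_line(points):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return mean_y, 0.0  # degenerate subset: fall back to a horizontal line
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def trimmed_objective(points, coef, h):
    """LTS objective: the sum of the h smallest squared residuals."""
    a, b = coef
    squared = sorted((y - a - b * x) ** 2 for x, y in points)
    return sum(squared[:h])


def lts_random_restarts(points, h, n_starts=50, max_steps=20, seed=0):
    """Hypothetical helper: random starts followed by refitting steps."""
    rng = random.Random(seed)
    best_coef, best_obj = None, float("inf")
    for _ in range(n_starts):
        # Random start: a line through two randomly chosen observations.
        coef = fit_line(rng.sample(points, 2))
        for _ in range(max_steps):
            a, b = coef
            # Keep the h observations that the current line fits best.
            kept = sorted(points, key=lambda p: (p[1] - a - b * p[0]) ** 2)[:h]
            new_coef = fit_line(kept)
            if new_coef == coef:
                break  # the kept subset no longer changes the fit
            coef = new_coef
        obj = trimmed_objective(points, coef, h)
        if obj < best_obj:
            best_coef, best_obj = coef, obj
    return best_coef, best_obj
```

Calling lts_random_restarts(points, h) on a list of (x, y) pairs returns the best coefficients found and their trimmed objective value. The result is only the best of the sampled starts, so it is a suboptimal estimate in general, which matches the trade-off described in the summary.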
Football player value prediction
Author
Jan Garček
Year
2020
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Mgr. Pavla Vozárová, Ph.D., M.A.
Department
Summary
The aim of this thesis is to explore freely available data about football players. It explains the differences between transfer value and market value and seeks attributes that have a major influence on a player's transfer value. The thesis visualizes these attributes with a special focus on seasons and nationality. Moreover, it evaluates results from other similar projects, and various regression models for predicting transfer value are experimentally applied to the collected data. The results of the individual models are compared and the most accurate model is determined.
The main purpose of this work is to make a prediction model for transfer value freely available to the general public.
Predicting selected basketball match events
Author
Ondřej Schejbal
Year
2020
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
RNDr. Petr Olšák
Department
Summary
This bachelor's thesis presents a model that predicts the total number of points scored in the remainder of an ongoing NBA basketball match. Predictions are based on data from previous games and on statistics already published for the ongoing match. To obtain the data, existing sources were studied and then used to build a sufficient training set for the prediction model. Earlier theses on similar topics were also reviewed. Based on the gathered data, a linear regression model was chosen, and additional attributes intended to improve its predictions were added to the data. The model was trained successfully, and its results on the test set appeared favourable. However, a full assessment of the results would require testing the model on matches being played live, which was not possible because of the COVID-19 pandemic that was ongoing while the thesis was written.
Automated detection of text translations
Author
Jan Peřina
Year
2021
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Mgr. Petr Novák, Ph.D.
Department
Summary
This bachelor thesis explores the possibilities of detecting translated portions of a text, together with ways of searching for the origin of such text on the internet. An experiment with a chosen method for machine translation detection is reproduced. The method is then improved by using a different text similarity metric and lemmatisation, and its applicability to human-produced translations is tested. Finally, several ways of transforming the detected texts into search engine queries are considered in order to find their sources on the internet effectively.
Expected goals in ice hockey
Author
Michal Seibert
Year
2022
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Mgr. Petr Novák, Ph.D.
Department
Summary
The subject of this thesis is to find and examine data sources about actions taken in NHL matches and then to use these data to build models for predicting expected goals. Several classification models are used for prediction. The models and their success rates are then compared with each other and with existing models. They are also used to gather additional information about players’ and teams’ performance.
Predicting selected basketball match events
Author
Radim Křesťan
Year
2023
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Mgr. Petr Novák, Ph.D.
Department
Summary
This thesis focuses on live predictions in basketball, specifically in the NBA. It briefly describes the domain and includes an analysis of experiments that have been conducted in the past. It also describes the process and the possibilities of data mining. In the practical part of the thesis, several models were used, including linear regression and random forests. The most successful method was linear regression, which had the lowest error in the majority of predictions. Player stats at the end of the game were predicted from known mid-game data.
Automatic poetic metre detection
Author
Kristýna Klesnilová
Year
2022
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Magda Friedjungová, Ph.D.
Department
Summary
This work is devoted to automatic metrical analysis of Czech syllabotonic verse metrically tagged inside a large poetic corpus - the Corpus of Czech Verse. First, it reimplements the existing data-driven approach used by a program called KVĚTA. Later, it models the problem as a sequence tagging task and solves it using machine learning. The BiLSTM-CRF model is used, representing the current state of the art for many sequence tagging tasks. Many different input configurations are tested. In all experiments, the inputted syllables or word tokens are represented by Word2Vec word embeddings trained on training data. The results are evaluated by computing three different accuracies of the predictions: syllable-level accuracy, line-level accuracy, and poem-level accuracy. It is shown that using BiLSTM-CRF represents a great success. With the best input configurations, it produces better results than the KVĚTA reimplementation, with predictions achieving 99.61% syllable accuracy, 98.86% line accuracy, and 90.40% poem accuracy. The most interesting finding is that the best results are obtained by inputting sequences representing whole poems instead of individual poem lines.
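The three evaluation levels named above can be sketched as follows. The summary does not define them precisely, so this example assumes that a line counts as correct only when every syllable tag in it matches, and that a poem counts as correct only when every line does. The function name and the nested-list data layout are hypothetical.

```python
def metre_accuracies(gold, pred):
    """Return (syllable, line, poem) accuracy.

    gold and pred are lists of poems; a poem is a list of lines and a line
    is a list of per-syllable tags. Both are assumed to have the same shape.
    """
    syl_ok = syl_total = 0
    line_ok = line_total = 0
    poem_ok = 0
    for gold_poem, pred_poem in zip(gold, pred):
        poem_correct = True
        for gold_line, pred_line in zip(gold_poem, pred_poem):
            matches = sum(g == p for g, p in zip(gold_line, pred_line))
            syl_ok += matches
            syl_total += len(gold_line)
            # Assumption: a line is correct only if all of its tags match.
            line_correct = (
                len(pred_line) == len(gold_line) and matches == len(gold_line)
            )
            line_ok += line_correct
            line_total += 1
            poem_correct = poem_correct and line_correct
        # Assumption: a poem is correct only if all of its lines are correct.
        poem_ok += poem_correct
    return syl_ok / syl_total, line_ok / line_total, poem_ok / len(gold)
```

Under these assumptions a single wrong syllable makes its whole line and its whole poem incorrect, which is consistent with poem accuracy being the lowest of the three reported figures.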
Named entity recognition for poetic texts
Author
Ondřej Černý
Year
2023
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Magda Friedjungová, Ph.D.
Department
Summary
The result of this work is a program that uses Natural Language Processing (NLP) techniques to identify named entities in the Corpus of Czech Verse (CCV). It is part of a cooperation with the Institute of Czech Literature (ICL). Since the CCV is not even partially labeled for entity recognition, we first create a set of rules and use them to select entities from the poems. These entities are then sorted into entity categories using data from Wikipedia. After that, the categorized entities are used as training data for a BiLSTM-CRF neural network that is trained and fine-tuned for NER on the CCV. The resulting model can find and distinguish entities of Place, Person, Mystic Person, and Other. Since the text in the CCV is not labeled for NER, we cannot know the exact accuracy of the final BiLSTM-CRF model. If we considered the data used for training this model to be 100% accurate, the final model would achieve an accuracy of 0.99904 and an F1 score of 0.9532.
Automatic categorization of job ads
Author
Patricie Petriľáková
Year
2023
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Rodrigo Augusto da Silva Alves, Ph.D.
Department
Summary
This thesis presents the development of a classification model for Information Technology job advertisements on the website up2staff.com. The objective is to create a reliable classification system that reduces the time and costs associated with manual categorization of job ads. The process involves analyzing and preprocessing a dataset of job ads, researching appropriate algorithms, and experimenting with combinations of feature engineering techniques and supervised machine learning classification algorithms. The model makes the final decision based on weighted decisions from two classification algorithms: one created for the content and the other for the job ads' title. Both classifiers achieve their highest F1-score with the Support Vector Machines algorithm applied to TF-IDF features. The classification model achieves an F1-score of 0.909.
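The idea of a final decision built from two weighted classifier outputs can be sketched as below. The summary does not give the weighting scheme, so the function name, the per-category score dictionaries and the default weight of 0.4 for the title classifier are all assumptions made for illustration.

```python
def combine_decisions(title_scores, content_scores, title_weight=0.4):
    """Blend per-category scores from a title and a content classifier.

    Both arguments map category names to scores (for example probabilities).
    Returns the winning category and the blended scores.
    """
    content_weight = 1.0 - title_weight
    categories = set(title_scores) | set(content_scores)
    combined = {
        category: title_weight * title_scores.get(category, 0.0)
        + content_weight * content_scores.get(category, 0.0)
        for category in categories
    }
    # The final decision is the category with the highest blended score.
    return max(combined, key=combined.get), combined
```

A call such as combine_decisions({"backend": 0.7, "frontend": 0.3}, {"backend": 0.2, "frontend": 0.8}) lets the content classifier outvote the title classifier, because the content score carries the larger weight in this sketch.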
Computer vision model for table tennis player detection
Author
Yannick Daniel Gibson
Year
2024
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Magda Friedjungová, Ph.D.
Department
Summary
This bachelor's thesis focuses on identifying specific targets in a ping-pong match. Among these targets are ping-pong paddles and players. Furthermore, we also decided to detect a ball and a scorekeeper in matches. We applied a computer vision system in the ubiquitous Python programming language for object detection with the YOLOv8 (You Only Look Once version 8) architecture, based on the YOLOv5 paper. The project takes a video input, draws bounding boxes around the objects of interest, and displays the video with predictions. We acquire unlabeled data and annotate it manually while also utilizing the pre-annotation method with a pre-trained model. In addition, we supply a plethora of data manipulation techniques and an analysis of our results. We end with a robust model detecting all four defined classes at an inference speed of 72 Frames Per Second (FPS).
Visualization of statistics of Czech hockey players in the NHL
Author
Adam Lesch
Year
2024
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Magda Friedjungová, Ph.D.
Department
Summary
This bachelor thesis deals with visualization of hockey statistics with emphasis on Czech players in the form of a web application. First, research was conducted focusing on existing web pages presenting visualizations of hockey statistics in the NHL. Subsequently, data sources for creating visualizations were explored. The primary analysis informed the development of a web application, which presents performances of Czech hockey players in the NHL and offers an overview consisting of automatically updated plots. The plots explore three aspects: the performance of players in individual games, an overview of player statistics during the current season and the evolution of the role of Czech players as a whole in the NHL. The application was developed in R, using the Shiny package.
Master theses
Web portal for testing algorithms computing least trimmed squares estimate
Author
Jan Švehla
Year
2013
Type
Master thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Jitka Hrabáková, Ph.D.
Department
Online doctor reservation system
Author
Martin Jelínek
Year
2014
Type
Master thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
doc. Ing. Štěpán Starosta, Ph.D.
Department
Internet Traffic Classification
Author
Jana Mašková
Year
2020
Type
Master thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Simona Buchovecká
Department
Summary
This thesis delves into the topic of machine learning for the classification of internet traffic and the detection of harmful traffic. All steps of the machine learning process are considered, such as data collection and data preprocessing. Suitable classification algorithms and anomaly detection algorithms were chosen to accomplish the main task of the thesis. For the classification of internet traffic, a high success rate was achieved on all selected datasets using supervised algorithms based on decision trees. For harmful traffic detection, only two of the seven datasets achieved a satisfactory score with the anomaly detection algorithms used.
Algorithms for verifying properties of D0L systems
Author
Anežka Štěpánková
Year
2021
Type
Master thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
doc. Ing. Jan Janoušek, Ph.D.
Department
Summary
The aim of this work is to present combinatorics on words and the theory of D0L-systems, and to study and understand algorithms for determining selected properties of D0L-systems, namely whether a system is pushy, injective, repetitive or circular. The selected algorithms are then implemented in Python and used to determine these properties for binary morphisms, and the results are evaluated by creating an overview of the properties of the tested binary morphisms.
Estimating webpage content in secure communication
Author
Marek Mařík
Year
2021
Type
Master thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Karel Hynek, Ph.D.
Department
Summary
This master thesis examines whether it is possible to determine from network traffic which websites a user visited even though the communication is encrypted. It also examines whether the content of a web page can be determined at least approximately from encrypted network traffic. Both questions are addressed using only the characteristics of network flows, i.e. without decrypting the traffic.
As part of this work, a data set generator was designed and implemented that makes it possible to create data sets containing captured network flows for visits to individual websites. Two data sets were created using this generator. A diverse set of features was designed. Based on the feature vectors, experiments were performed using multiple different models to identify websites and estimate their content. Furthermore, novelty detection models were created to detect unknown web pages.
Experiments show that, based on encrypted traffic, websites can be identified relatively accurately and some attributes of their content can be estimated as well.
Extracting structured data from textual car selling advertisement data
Author
Filip Kojan
Year
2021
Type
Master thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
doc. Ing. Pavel Kordík, Ph.D.
Department
Summary
The aim of this work is to explore, design and test methods for extracting structured data from unstructured texts of car ads. It also examines methods for preprocessing text into a format suitable for machine learning models and applies these methods in combination with various machine learning models. The most successful models are compared and the results they achieved are evaluated.
A Tool for Digitalizing Handwritten Chess Notation Sheets
Author
Jana Maříková
Year
2021
Type
Master thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Jiří Kašpar
Department
Summary
This thesis aims to create a tool that automates the conversion of a photo of a chess score sheet into digital form with the help of OCR and machine learning techniques. The score sheet is a paper document on which players write down their own and their opponent's moves.
First, chess terminology and existing solutions are introduced. Then a general OCR system is described, and finally the implementation of the system and its evaluation are presented.
Automatic detection of topics in poetic texts
Author
Martin Bendík
Year
2023
Type
Master thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Magda Friedjungová, Ph.D.
Department
Summary
This thesis studies the detection of topics in the Corpus of Czech Verse, which contains tens of thousands of poems from the 19th and early 20th centuries. It uses machine learning methods to efficiently process the large amount of data. The output of these algorithms is a set of detected topics and the classification of individual poems into these topics. This can help in further analysis of the artworks, summarizing and exploring what each poem addresses. This thesis presents current research in the area of detecting topics in poetic texts in different languages and using different technologies. The thesis also includes the development of several models that are used to assign topics to individual poems. Unsupervised, supervised and semi-supervised algorithms have been used for this purpose. We evaluate all the created models in detail, visualize them, point out their strengths and weaknesses, specific features and last but not least compare the models with each other. Since the Corpus of Czech Verse does not contain annotations of poem topics, for the purpose of supervised learning, an annotated dataset was created, which consists of a subset of poems from the original dataset.
Behavioral segmentation of clients based on transaction history
Author
Tomáš Jungman
Year
2025
Type
Master thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Tomáš Tax
Department
Summary
This thesis focuses on the analysis and improvement of transaction monitoring processes, specifically on client segmentation based on their behavioral patterns. Segmentation is a key tool for detecting suspicious activities that may indicate money laundering or other forms of fraudulent behaviour.
The thesis consists of a theoretical part, which provides a broader context of financial crime and the methods used to detect it, and a practical part, where the existing segmentation process at the author's employer is analyzed. Based on identified shortcomings, improvements are proposed, including adjustments to internal processes, code optimization, new algorithms, innovative data visualizations, and enhanced documentation. Selected proposals were tested on synthetic data, with experiments demonstrating that expanding parameters and deploying advanced algorithms, such as Mini Batch KMeans, have the potential to make the existing segmentation process more efficient.
The results of this work highlight the importance of linking technical innovations with process improvements and pave the way for further development. The conclusions also show the potential for transforming outdated processes and offer directions for future advancements.