Ing. Karel Klouda, Ph.D.

+420224359796
karel.klouda@fit.cvut.cz
TH:A-1422

Theses

Sample theses

Bachelor theses

System supporting research in combinatorics on words

Author

Radek Jireš

Year

2013

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

doc. Ing. Štěpán Starosta, Ph.D.

Department

Department of Software Engineering

Corpus of comments below news articles

Author

Jakub Bartel

Year

2013

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

doc. Ing. Štěpán Starosta, Ph.D.

Department

Department of Theoretical Computer Science

Application for data analysis of study results

Author

Martin Konečný

Year

2015

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Ing. Jitka Hrabáková, Ph.D.

Department

Department of Computer Systems

Czech e-Library - Poetry of the 19th and 20th Century

Author

Jaromír Dalecký

Year

2014

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

doc. Ing. Štěpán Starosta, Ph.D.

Department

Department of Software Engineering

Online system to support writing pages on wikipedia.org

Author

Václav Makeš

Year

2016

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Ing. Tomáš Kalvoda, Ph.D.

Department

Department of Software Engineering

Summary

The thesis addresses the detection of erroneous and missing data in the Internet encyclopedia Wikipedia and the design of corrections for them. The result is an automated system that downloads, stores and analyzes articles from the Czech edition of Wikipedia. For the analysis, three methods are proposed to identify articles in need of improvements and additions. The work demonstrates that it is possible to propose improvements to the electronic encyclopedia.

Thesis on DSpace

Administration system for written tests and exams

Author

Kryštof Slavík

Year

2016

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

doc. Ing. Štěpán Starosta, Ph.D.

Department

Department of Software Engineering

Summary

This bachelor thesis focuses on the design and implementation of an extension for a web database of mathematical problems. The aim of this extension is the automated generation of written tests for students of mathematical courses at FIT CTU in Prague. The application allows teachers of the courses to easily create multiple variants of tests without the need for manual assignment of mathematical problems. The thesis introduces an algorithm that automatically creates the required number of tests from the database based on the specified parameters. The system allows comfortable specification of these parameters. The created tests can be exported in a printable format. The thesis describes in detail the analysis of the required functionality of the application and its design. It focuses on the implementation, which uses Ruby on Rails, and describes the use of the system in practice. The source code of the application is available on the attached DVD.

Application for visualisation of linear algebra notions and methods

Author

Martin Chvátal

Year

2016

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Ing. Petr Špaček, Ph.D.

Department

Department of Theoretical Computer Science

Summary

The topic of this thesis is the creation of educational software for linear algebra. It allows a teacher to supplement a lecture with examples of how the currently studied topic can be used in informatics. A set of programming tasks is prepared for students to help them practice what they have learned.

Thesis on DSpace

Management system for digitalized literary works

Author

Martin Melichar

Year

2018

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Mgr. Jan Starý, Ph.D.

Department

Department of Software Engineering

Summary

This thesis describes the development of an application for the conversion of literary works into electronic form. The research part compares technologies for web application development and text formats for storing electronic works. Furthermore, the provided input data and the way they are imported are described. The practical part builds on the evaluation of the research and describes the process of the application development. The primary contribution of this thesis is to facilitate the conversion of literary works into electronic form for the UČL AV ČR employees.

Thesis on DSpace

Web application for presentation of research institutions ranking data

Author

Pavel Švagr

Year

2018

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

doc. Ing. Štěpán Starosta, Ph.D.

Department

Department of Software Engineering

Summary

The subject of this bachelor thesis is the processing of an open data set derived from ratings of research results published by the Research, Development and Innovation Council. The thesis analyzes the files, implements a parsing module, and reveals inconsistencies and errors in the data. Subsequently, it focuses on the analysis, design and implementation of a web application that enables searching in the ratings and displays an overview of the scientific activity of research organizations, their units and authors.

Thesis on DSpace

NHL match results prediction

Author

Filip Kojan

Year

2019

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Ing. Magda Friedjungová

Department

Department of Applied Mathematics

Summary

The goal of this thesis is to explore data sources on players and matches in the NHL, the modern statistical methods used to evaluate the quality of teams and players, and the possibilities of using this information to predict the results of NHL matches. Various machine learning classification models are used and their predictive ability is compared. The prediction results are compared to bookmakers' predictions.

Thesis on DSpace

Detecting problems in outdoor cypher games

Author

Barbora Eliášová

Year

2019

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Ing. Tomáš Kalvoda, Ph.D.

Department

Department of Applied Mathematics

Summary

The thesis deals with the analysis of data collected during puzzlehunts organized by Cryptomania. It describes large cipher games and the ciphers used in them. It then covers different possibilities of cipher classification. It presents the data analysis made by Tomáš Kuča and its benefit to cipher game players; his website statek.seslost.cz classifies data from large cipher games. The thesis defines the terms difficulty, complexity and time intensity. The data analysis itself examines the Cryptomania puzzlehunts Avraham Hrashalom, Fantom Brna and Ztracené židovské město. The duration of cipher solving was combined with the number of hints taken, and this information was used to create a value that defines the difficulty of a cipher. Every team was given a rating as well. Cluster analysis uses this information to identify groups of similar teams.

Thesis on DSpace

Word sense representation for the Czech language

Author

Vojtěch Paukner

Year

2019

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Ing. Magda Friedjungová

Department

Department of Applied Mathematics

Summary

The thesis surveys traditional and state-of-the-art methods of natural language processing. Particular importance is placed on languages with rich morphology. The state-of-the-art methods are then applied in various ways to the Czech language in order to differentiate between distinct word senses based on their usage in a sentence. Evaluation of these experiments is an important part of the thesis.

Thesis on DSpace

Probabilistic algorithms for computing the LTS estimate

Author

Martin Jenč

Year

2019

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Mgr. Petr Novák, Ph.D.

Department

Department of Applied Mathematics

Summary

The least trimmed squares method is a robust version of the method of least squares, which is an essential tool of regression analysis used to find an estimate of coefficients in the linear regression model. Computing the least trimmed squares estimate is known to be NP-hard, hence only suboptimal probabilistic algorithms are usually used in practice. Besides describing those algorithms, we propose a few ways of combining them to obtain better performance.
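
To illustrate the idea of a suboptimal probabilistic algorithm for this estimate, the following is a minimal sketch for a single predictor, assuming the common definition of the LTS objective as the sum of the h smallest squared residuals. It combines random two-point starts with concentration steps (the refinement known from the FAST-LTS algorithm); it is not taken from the thesis, and the function names `fit_line`, `trimmed_objective` and `lts_line` are hypothetical.

```python
import random


def fit_line(points):
    """Ordinary least squares fit of y = a + b*x to a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0.0:
        # All x values coincide: fall back to a horizontal line.
        return my, 0.0
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b


def trimmed_objective(points, a, b, h):
    """LTS objective (assumed definition): sum of the h smallest squared residuals."""
    squared = sorted((y - a - b * x) ** 2 for x, y in points)
    return sum(squared[:h])


def lts_line(points, h, n_starts=100, max_csteps=50, seed=0):
    """Approximate the LTS fit by random restarts followed by concentration steps."""
    rng = random.Random(seed)
    best = (float("inf"), 0.0, 0.0)
    for _ in range(n_starts):
        # Random start: a line through two randomly chosen observations.
        a, b = fit_line(rng.sample(points, 2))
        prev = float("inf")
        for _ in range(max_csteps):
            # Concentration step: keep the h best-fitting points and refit.
            ranked = sorted(points, key=lambda p: (p[1] - a - b * p[0]) ** 2)
            a, b = fit_line(ranked[:h])
            obj = trimmed_objective(points, a, b, h)
            if obj >= prev:
                break  # no further improvement, a local optimum was reached
            prev = obj
        final = trimmed_objective(points, a, b, h)
        if final < best[0]:
            best = (final, a, b)
    return best  # (objective value, intercept, slope)
```

The result is only a local optimum found from the sampled starts, which is why such methods are called suboptimal; increasing `n_starts` raises the chance that at least one start contains no outliers.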

Thesis on DSpace

Football player value prediction

Author

Jan Garček

Year

2020

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Ing. Mgr. Pavla Vozárová, Ph.D., M.A.

Department

Department of Applied Mathematics

Summary

The aim of this thesis is to explore freely available data about football players. It explains the differences between transfer value and market value and seeks attributes that have a major influence on a player's transfer value. The thesis visualizes these attributes with a special focus on seasons and nationality. Moreover, it evaluates results from other similar projects, and various regression models for predicting transfer value are experimentally applied to the collected data. The results of the individual models are compared and the most accurate model is determined. The main purpose of this work is to make a prediction model for transfer value available to the general public for free.

Thesis on DSpace

Predicting selected basketball match events

Author

Ondřej Schejbal

Year

2020

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

RNDr. Petr Olšák

Department

Department of Applied Mathematics

Summary

In this bachelor's thesis, a model was created that predicts the total number of points scored in the remainder of an NBA basketball match. Predictions are based on data from previous games and on statistics already published during the ongoing match. To obtain the data, existing sources were studied and then used to build a sufficient dataset for training the prediction model. Previously completed theses on similar topics were also reviewed. Based on the gathered data, a linear regression prediction model was chosen, and additional attributes intended to improve the model's predictions were added to the data. The model was trained successfully, and its results on the test set appeared favourable. However, the quality of the results could be fully assessed only by testing the model on matches currently being played. Unfortunately, this was not possible due to the COVID-19 pandemic, which was ongoing while this bachelor's thesis was being written.

Thesis on DSpace

Automated detection of text translations

Author

Jan Peřina

Year

2021

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Mgr. Petr Novák, Ph.D.

Department

Department of Applied Mathematics

Summary

This bachelor thesis explores the possibilities of detecting translated portions of a text, together with ways of searching for the origin of such text on the internet. An experiment with a chosen method for machine translation detection is reproduced. The method was then improved by using a different text similarity metric and lemmatisation. The applicability of this method to human-produced translations was tested. The thesis also covers several ways of transforming the texts detected in this way into search engine queries in order to effectively find their sources on the internet.

Thesis on DSpace

Expected goals in ice hockey

Author

Michal Seibert

Year

2022

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Mgr. Petr Novák, Ph.D.

Department

Department of Applied Mathematics

Summary

The subject of this thesis is to find and examine data sources on actions taken in NHL matches and then to use these data to build models for predicting expected goals. Several classification models are used for prediction. The models and their success rates are then compared with each other and with existing models. They are also used to gather additional information about players’ and teams’ performance.

Thesis on DSpace

Predicting selected basketball match events

Author

Radim Křesťan

Year

2023

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Mgr. Petr Novák, Ph.D.

Department

Department of Applied Mathematics

Summary

This thesis is focused on live predictions in basketball, specifically in the NBA. The thesis briefly describes the domain and includes an analysis of experiments that have been conducted in the past. It also describes the process and the possibilities of data mining. In the practical part of this thesis, several models have been used, including but not limited to linear regression and random forests. The most successful method was linear regression, which had the lowest error in the majority of predictions. Player stats at the end of the game were predicted from data known mid-game.

Thesis on DSpace

Automatic poetic metre detection

Author

Kristýna Klesnilová

Year

2022

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Ing. Magda Friedjungová, Ph.D.

Department

Department of Applied Mathematics

Summary

This work is devoted to automatic metrical analysis of Czech syllabotonic verse metrically tagged inside a large poetic corpus - the Corpus of Czech Verse. First, it reimplements the existing data-driven approach used by a program called KVĚTA. Later, it models the problem as a sequence tagging task and solves it using machine learning. The BiLSTM-CRF model is used, representing the current state of the art for many sequence tagging tasks. Many different input configurations are tested. In all experiments, the inputted syllables or word tokens are represented by Word2Vec word embeddings trained on training data. The results are evaluated by computing three different accuracies of the predictions: syllable-level accuracy, line-level accuracy, and poem-level accuracy. It is shown that using BiLSTM-CRF represents a great success. With the best input configurations, it produces better results than the KVĚTA reimplementation, with predictions achieving 99.61% syllable accuracy, 98.86% line accuracy, and 90.40% poem accuracy. The most interesting finding is that the best results are obtained by inputting sequences representing whole poems instead of individual poem lines.

Thesis on DSpace

Named entity recognition for poetic texts

Author

Ondřej Černý

Year

2023

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Ing. Magda Friedjungová, Ph.D.

Department

Department of Applied Mathematics

Summary

The result of this work is a program that uses Natural Language Processing (NLP) techniques to identify named entities in the Corpus of Czech Verse (CCV). It is part of a cooperation with the Institute of Czech Literature (ICL). Since the CCV is not even partially labeled for entity recognition, we first create a set of rules and use them to select entities from the poems. These entities are then sorted into entity categories using data from Wikipedia. After that, the categorized entities are used as training data for a BiLSTM-CRF neural network that is trained and fine-tuned for NER on the CCV. The resulting model can find and distinguish entities of Place, Person, Mystic Person, and Other. Since the text in the CCV is not labeled for NER, we cannot know the exact accuracy of the final BiLSTM-CRF model. If we considered the data used for training this model to be 100% accurate, the final model would achieve an accuracy of 0.99904 and an F1 score of 0.9532.

Thesis on DSpace

Automatic categorization of job ads

Author

Patricie Petriľáková

Year

2023

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Rodrigo Augusto da Silva Alves, Ph.D.

Department

Department of Applied Mathematics

Summary

This thesis presents the development of a classification model for Information Technology job advertisements on the website up2staff.com. The objective is to create a reliable classification system that reduces the time and costs associated with manual categorization of job ads. The process involves analyzing and preprocessing a dataset of job ads, researching appropriate algorithms, and experimenting with combinations of feature engineering techniques and supervised machine learning classification algorithms. The model makes the final decision based on weighted decisions from two classification algorithms: one created for the content and the other for the job ads' titles. Both classifiers achieve their highest F1-score with the Support Vector Machines algorithm applied to TF-IDF features. The classification model achieves an F1-score of 0.909.

Thesis on DSpace

Computer vision model for table tennis player detection

Author

Yannick Daniel Gibson

Year

2024

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Ing. Magda Friedjungová, Ph.D.

Department

Department of Applied Mathematics

Summary

This bachelor's thesis focuses on identifying specific targets in a ping-pong match. Among these targets are ping-pong paddles and players. Furthermore, we also decided to detect a ball and a scorekeeper in matches. We applied a computer vision system in the ubiquitous Python programming language for object detection with the architecture YOLOv8 (You Only Look Once version 8) based on YOLOv5 paper. This project gets a video input, draws the enclosing bounding boxes around objects of interest, and displays the video with predictions. We acquire unlabeled data and annotate it manually while also utilizing the pre-annotation method with a pre-trained model. In addition, we supply a plethora of data manipulation techniques and analysis of our results. We end with a robust model detecting all four defined classes at the inference speed of 72 Frames Per Second (FPS).

Thesis on DSpace

Visualization of statistics of Czech hockey players in the NHL

Author

Adam Lesch

Year

2024

Type

Bachelor thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Ing. Magda Friedjungová, Ph.D.

Department

Department of Applied Mathematics

Summary

This bachelor thesis deals with visualization of hockey statistics with emphasis on Czech players in the form of a web application. First, research was conducted focusing on existing web pages presenting visualizations of hockey statistics in the NHL. Subsequently, data sources for creating visualizations were explored. The primary analysis informed the development of a web application, which presents performances of Czech hockey players in the NHL and offers an overview consisting of automatically updated plots. The plots explore three aspects: the performance of players in individual games, an overview of player statistics during the current season and the evolution of the role of Czech players as a whole in the NHL. The application was developed in R, using the Shiny package.

Thesis on DSpace

Master theses

Web portal for testing algorithms computing least trimmed squares estimate

Author

Jan Švehla

Year

2013

Type

Master thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Ing. Jitka Hrabáková, Ph.D.

Department

Department of Software Engineering

Online doctor reservation system

Author

Martin Jelínek

Year

2014

Type

Master thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

doc. Ing. Štěpán Starosta, Ph.D.

Department

Department of Software Engineering

Internet Traffic Classification

Author

Jana Mašková

Year

2020

Type

Master thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Ing. Simona Buchovecká

Department

Department of Information Security

Summary

This thesis delves into the topic of machine learning for the classification of internet traffic and the detection of harmful traffic. All steps of machine learning are considered, such as data collection and data preprocessing. Suitable classification algorithms and anomaly detection algorithms were chosen to accomplish the main task of the thesis. With regard to the classification of internet traffic, a high success rate was achieved for all selected datasets using supervised algorithms based on decision trees. For harmful traffic detection, only two of the seven datasets achieved a satisfactory score with the anomaly detection algorithms used.

Thesis on DSpace

Algorithms for verifying properties of D0L systems

Author

Anežka Štěpánková

Year

2021

Type

Master thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

doc. Ing. Jan Janoušek, Ph.D.

Department

Department of Theoretical Computer Science

Summary

The aim of this work is to present combinatorics on words and the theory of D0L-systems, and to study and understand algorithms for determining selected properties of D0L-systems, namely the pushy property, injectivity, repetitivity and circularity. A further aim is to implement the selected algorithms in Python, use them to determine these properties for binary morphisms, and evaluate the results by creating an overview of the properties of the tested binary morphisms.

Thesis on DSpace

Estimating webpage content in secure communication

Author

Marek Mařík

Year

2021

Type

Master thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Ing. Karel Hynek, Ph.D.

Department

Department of Applied Mathematics

Summary

This master thesis examines whether it is possible to determine from network traffic which websites a user visited even though the communication is encrypted, and whether it is possible to at least approximately determine the content of a web page from encrypted network traffic. All of this is based on the characteristics of network flows, i.e. without the traffic being decrypted. As part of this work, a data set generator was designed and implemented, which makes it possible to create data sets containing captured network flows for visits to individual websites. Two data sets were created using this generator. A diverse set of features was designed. Based on the feature vectors, experiments were performed using multiple different models to identify websites and estimate their content. Furthermore, novelty detection models were created to detect unknown web pages. Experiments show that, based on encrypted traffic, websites can be identified relatively accurately and some attributes of their content can be estimated as well.

Thesis on DSpace

Extracting structured data from textual car selling advertisement data

Author

Filip Kojan

Year

2021

Type

Master thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

doc. Ing. Pavel Kordík, Ph.D.

Department

Department of Applied Mathematics

Summary

The aim of this work is to explore, design and test methods for extracting structured data from unstructured texts of car ads. It also examines methods for preprocessing text into a format suitable for machine learning models and applies these methods in combination with various machine learning models. The most successful models are compared and the results they achieved are evaluated.

Thesis on DSpace

A Tool for Digitalizing Handwritten Chess Notation Sheets

Author

Jana Maříková

Year

2021

Type

Master thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Ing. Jiří Kašpar

Department

Department of Applied Mathematics

Summary

This thesis aims to create a tool that automates the conversion of a photo of a chess score sheet into digital form with the help of OCR and machine learning techniques. The score sheet is a paper document on which players write down their own and their opponents' moves. First, chess terminology and existing solutions are introduced. Then a general OCR system is described, and finally the implementation of the system and its evaluation are given.

Thesis on DSpace

Automatic detection of topics in poetic texts

Author

Martin Bendík

Year

2023

Type

Master thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Ing. Magda Friedjungová, Ph.D.

Department

Department of Applied Mathematics

Summary

This thesis studies the detection of topics in the Corpus of Czech Verse, which contains tens of thousands of poems from the 19th and early 20th centuries. It uses machine learning methods to efficiently process the large amount of data. The output of these algorithms is a set of detected topics and the classification of individual poems into these topics. This can help in further analysis of the artworks, summarizing and exploring what each poem addresses. This thesis presents current research in the area of detecting topics in poetic texts in different languages and using different technologies. The thesis also includes the development of several models that are used to assign topics to individual poems. Unsupervised, supervised and semi-supervised algorithms have been used for this purpose. We evaluate all the created models in detail, visualize them, point out their strengths and weaknesses, specific features and last but not least compare the models with each other. Since the Corpus of Czech Verse does not contain annotations of poem topics, for the purpose of supervised learning, an annotated dataset was created, which consists of a subset of poems from the original dataset.

Thesis on DSpace

Behavioral segmentation of clients based on transaction history

Author

Tomáš Jungman

Year

2025

Type

Master thesis

Supervisor

Ing. Karel Klouda, Ph.D.

Reviewers

Tomáš Tax

Department

Department of Applied Mathematics

Summary

This thesis focuses on the analysis and improvement of transaction monitoring processes, specifically on client segmentation based on their behavioral patterns. Segmentation is a key tool for detecting suspicious activities that may indicate money laundering or other forms of fraudulent behaviour. The thesis consists of a theoretical part, which provides a broader context of financial crime and the methods used to detect it, and a practical part, where the existing segmentation process at the author's employer is analyzed. Based on identified shortcomings, improvements are proposed, including adjustments to internal processes, code optimization, new algorithms, innovative data visualizations, and enhanced documentation. Selected proposals were tested on synthetic data, with experiments demonstrating that expanding parameters and deploying advanced algorithms, such as Mini Batch KMeans, have the potential to make the existing segmentation process more efficient. The results of this work highlight the importance of linking technical innovations with process improvements and pave the way for further development. The conclusions also show the potential for transforming outdated processes and offer directions for future advancements.

Thesis on DSpace