Ing. Miroslav Čepek, Ph.D.

Theses

Bachelor theses

Product Compatibility Detection from Product Description

Author
Tomáš Bánhegyi
Year
2022
Type
Bachelor thesis
Supervisor
Ing. Miroslav Čepek, Ph.D.
Reviewers
Ing. Luděk Kopáček, Ph.D.
Summary
This bachelor thesis focuses on processing product descriptions to extract product compatibility information. The solution uses established machine learning models from the field of natural language processing and its subtasks: named entity recognition, relationship extraction, and question answering. A suitable dataset must contain annotations specifying the named entities and their relationships. After the dataset is created, selected machine learning models are applied to it. I extracted product compatibility information with a score of 62.30%, using only named entity recognition and relationship extraction. I decided to skip the question answering task because it would be out of scope for this bachelor thesis. The thesis provides a solution that leverages pre-trained models to analyse product descriptions and extract their compatibility. The conclusion describes possibilities for further research. An appendix gives a detailed description of the configuration file used for the model.

Generating Tabular Data while Preserving Dependencies

Author
Jakub Renc
Year
2022
Type
Bachelor thesis
Supervisor
Ing. Miroslav Čepek, Ph.D.
Reviewers
Ing. Magda Friedjungová, Ph.D.
Summary
This thesis addresses the problem of generating tabular data while preserving dependencies. The main aim of the thesis is to create a generator that can produce data with the same statistical properties as the original data. In addition, the model must be able to generate both numerical and categorical data. The thesis contains a detailed implementation of two generative models, namely a variational autoencoder and a generative adversarial network. The experimental part compares these models with selected generators from the Synthetic Data Vault library. Model outputs are evaluated based on results from classification and regression tasks. Although the generative adversarial network repeatedly achieved the best results, it is impossible to determine which model is better. Both generators proved that they can generate tabular data while preserving the desired features.

Graph Neural Networks Exploration

Author
Barbara Bobeničová
Year
2023
Type
Bachelor thesis
Supervisor
Ing. Miroslav Čepek, Ph.D.
Reviewers
Mgr. Vojtěch Rybář
Summary
This thesis is dedicated to the analysis of graph neural network methods for node and graph classification. It explores current libraries for working with graph neural networks, such as StellarGraph, PyTorch Geometric and DGL. The graph algorithms Graph Convolutional Networks, GraphSAGE and Graph Attention Networks are tested and compared on selected datasets from the Open Graph Benchmark. The achieved results are compared with state-of-the-art results.

Detection and Reading Number Plates in Photos

Author
Patrik Vodila
Year
2022
Type
Bachelor thesis
Supervisor
Ing. Miroslav Čepek, Ph.D.
Reviewers
doc. Ing. Kamil Dedecius, Ph.D.
Summary
This work deals with the automatic recognition of vehicle number plates. The result is a program that can recognize text on a single-line number plate with dark text on a light background. The output of the program is an image in which all detected number plates are marked together with the recognized text. The program is written in Python and uses YOLO version 4, implemented in TensorFlow, for number plate localization and Tesseract for character recognition. Using the OpenCV library, my implementation improved on the initial results, which were obtained using only Tesseract OCR and YOLO; the number of correct recognitions increased more than tenfold. The rate of correct number plate recognition reached up to 31.5%. I have also found further use cases in which the implementation proved successful.

Exploration of Graph Generation Techniques

Author
Kirill Poligach
Year
2023
Type
Bachelor thesis
Supervisor
Ing. Miroslav Čepek, Ph.D.
Reviewers
doc. Ing. Ivan Šimeček, Ph.D.
Summary
For various reasons, such as privacy or security concerns, it is not always possible to work directly with an original graph, such as a bank transaction graph or a social network interaction graph. These graphs are necessary for ML projects such as bank security systems for the detection of abnormal transactions, social network recommendation systems, and many other similar projects. This thesis aims to investigate existing graph generation techniques and assess how closely the statistical properties of synthetic graphs correspond to those of the original ones. It also evaluates the feasibility of using structural embedding models for classification problems using synthetic graphs.

Driving AWS DeepRacer Cars on Unknown Tracks

Author
Vincent Jakl
Year
2023
Type
Bachelor thesis
Supervisor
Ing. Miroslav Čepek, Ph.D.
Reviewers
Ing. Zdeněk Buk, Ph.D.
Summary
AWS DeepRacer is a popular platform for developing autonomous racing cars using reinforcement learning. The aim of this thesis is to develop a model for AWS DeepRacer that can navigate tracks it has not seen before. The method used to achieve such a model combines several techniques, including data augmentation and hyperparameter tuning. The model was trained on one set of tracks, and its performance was evaluated on a separate set of tracks that were not used in training. The outcome of this thesis is two AWS DeepRacer models: a slower but more careful one, and a faster but less accurate one. Both models, however, are able to run fairly accurately on a wide variety of tracks, including ones unseen in training. The outcomes of this thesis allow future AWS DeepRacer developers who want to build a general model to start with some insights into the process, or to use the already trained models from this thesis as a base for their own models.

Collision avoiding model for autonomous driving

Author
Peter Kosorín
Year
2023
Type
Bachelor thesis
Supervisor
Ing. Miroslav Čepek, Ph.D.
Reviewers
Ing. Zdeněk Buk, Ph.D.
Summary
Within this thesis, a comprehensive literature survey of various autonomous driving methodologies and machine learning model architectures has been conducted, with a particular focus on object avoidance. The thesis goes on to explore the capabilities of the AWS DeepRacer autonomous racecar platform. This platform is used to investigate the feasibility of training end-to-end self-driving models focused on object avoidance using reinforcement learning. Two self-driving architectures were compared, namely a three-layer and a five-layer convolutional neural network. Furthermore, the impact of sensor choice on the autonomous object avoidance task was compared. Experiments in the simulated environment showed that the three-layer convolutional neural network architecture, equipped with a stereo camera and LiDAR sensors, performed the best. The model was subsequently deployed to the DeepRacer vehicle and demonstrated in the real world. The thesis successfully demonstrated the feasibility of training end-to-end autonomous models using the AWS DeepRacer platform and simulated environment.

Artificial Intelligence based Detection and Tracking of Sperm Cells

Author
Jakub Hořenín
Year
2024
Type
Bachelor thesis
Supervisor
Ing. Miroslav Čepek, Ph.D.
Reviewers
Mgr. Alexander Kovalenko, Ph.D.
Summary
This thesis introduces a novel machine learning-based methodology for automated detection and tracking of sperm cells within microscopic video recordings, aiming to elucidate the dynamics and motion patterns of both individual sperm cells and sperm cell bundles. First, the method identifies sperm cells across successive frames within a video sequence, facilitating the reconstruction of each cell's trajectory over time. Subsequently, I introduce a classification algorithm that distinguishes between single sperm cells, clusters of adjacent cells, sperm cell bundles, and clutter, addressing a gap in existing methodologies. Finally, I employ three conventional metrics for velocity assessment, Straight Line Velocity (VSL), Average Path Velocity (VAP), and Curvilinear Velocity (VCL), to quantify the movement speed of both individual sperm cells and bundles. The approach represents a significant advancement in the automated analysis of sperm motility and aggregation phenomena, providing a robust tool for researchers to study sperm behavior with enhanced accuracy and efficiency. A web-based user interface has been created, and the latest version of the program utilizing this methodology is publicly available at https://apps.datalab.fit.cvut.cz/sperm_tracking/ with source code publicly available on GitLab: https://gitlab.fit.cvut.cz/horenjak/sperm_cell_tracking_app/.
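
The three velocity metrics named in the summary can be illustrated with a minimal sketch. It is not taken from the thesis: the function names, the moving-average smoothing used for the average path, and the `window` parameter are assumptions made for illustration only. The track is assumed to be a list of (x, y) positions sampled once per video frame.

```python
import math


def path_length(points: list[tuple[float, float]]) -> float:
    """Sum of Euclidean distances between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def smooth(points: list[tuple[float, float]], window: int = 3) -> list[tuple[float, float]]:
    """Centered moving average; an assumed way to obtain the 'average path'."""
    half = window // 2
    smoothed = []
    for i in range(len(points)):
        neighbours = points[max(0, i - half): i + half + 1]
        smoothed.append((
            sum(p[0] for p in neighbours) / len(neighbours),
            sum(p[1] for p in neighbours) / len(neighbours),
        ))
    return smoothed


def velocities(points: list[tuple[float, float]], fps: float, window: int = 3) -> dict[str, float]:
    """Return VSL, VAP and VCL in position units per second."""
    if len(points) < 2:
        raise ValueError("a track needs at least two points")
    duration = (len(points) - 1) / fps  # seconds between first and last frame
    return {
        # VSL: straight line from the first to the last position
        "VSL": math.dist(points[0], points[-1]) / duration,
        # VAP: length of the smoothed (average) path
        "VAP": path_length(smooth(points, window)) / duration,
        # VCL: length of the actual frame-to-frame path
        "VCL": path_length(points) / duration,
    }


if __name__ == "__main__":
    zigzag = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0)]
    print(velocities(zigzag, fps=1.0))
```

For a zigzag track such as the one above, VCL is larger than VSL because the cell travels farther along its actual path than along the straight line between its endpoints.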

Video Recording based Sperm Cell Movement Prediction and Modes of Movement Detection

Author
Matej Kulháň
Year
2024
Type
Bachelor thesis
Supervisor
Ing. Miroslav Čepek, Ph.D.
Reviewers
Mgr. Alexander Kovalenko, Ph.D.
Summary
This thesis explores the use of machine learning techniques to predict the future path of sperm cells, classify their directional orientation, and predict their rotation from video data. The research is motivated by the need to better understand the behavior and motility of sperm cells, which play a crucial role in biomedical research and human reproduction studies.

AWS DeepRacer Controller Training Scenarios Exploration for Real-World Performance

Author
Yelizaveta Tskhe
Year
2024
Type
Bachelor thesis
Supervisor
Ing. Miroslav Čepek, Ph.D.
Reviewers
Ing. Zdeněk Buk, Ph.D.
Summary
This thesis aims to investigate the steps required to transfer a machine learning controller from a simulated environment to a real-world vehicle, allowing it to navigate a track in a safe and reliable way despite the visual differences. The main challenge lies in bridging the gap between the ideal conditions in the simulated environment and the dynamic, noisy real-world scenario. The noise manifests in the lighting conditions, light reflections, and the presence of foreign objects in view of the vehicle's camera, as well as in uncertainty in the speed and steering behaviour due to the battery level. I conducted an exhaustive review of the existing literature on techniques and strategies for autonomous driving in order to gain a deeper understanding of the domain. Afterwards, I trained and evaluated a reinforcement learning model that served as a benchmark for further experiments. Based on that, I proposed potential improvements: training with domain randomization and using a deeper 5-layer neural network. The improvements were applied to the models in order to enhance the vehicle's performance in the real-world setting. They were thoroughly tested and, as a result, domain randomization proved to be helpful, while using a 5-layer neural network did not seem to bring a significant improvement, for a number of reasons.

Master theses

Detection and removal of watermarks from image data

Author
Tomáš Halama
Year
2023
Type
Master thesis
Supervisor
Ing. Miroslav Čepek, Ph.D.
Reviewers
Ing. Magda Friedjungová, Ph.D.
Summary
Digital image watermarking is a widely used technique for protecting intellectual property or authenticating digital media, but it can negatively impact image quality and usability. This motivates the need for removing watermarks from images, and deep learning presents a potential solution. This thesis develops a deep learning method for watermark removal, including a survey of existing techniques and the proposal of a novel architecture. The method's performance is evaluated in terms of watermark detection accuracy and image reconstruction quality.

Design and Implementation of Machine Learning Operations

Author
Michal Bacigál
Year
2023
Type
Master thesis
Supervisor
Ing. Miroslav Čepek, Ph.D.
Reviewers
Rodrigo Augusto da Silva Alves, Ph.D.
Summary
The growing popularity and importance of machine learning adoption across industries have led to a gradual enrichment of DevOps principles with data- and model-related concepts, forming a paradigm known as MLOps. This diploma thesis explores its importance and describes the main principles and phases involved. We summarize MLOps tools and their features, use this summary to select appropriate tools for our academic setting, and design a proof-of-concept solution. The solution can serve as a basis for further research and for incorporating machine learning operations to simplify the model development process for students and researchers at our university.

Exploring Modifications of Fourier Transform and their Impact on Accuracy of Machine Learning Techniques for ECG Classification

Author
Bogdan Buliakov
Year
2024
Type
Master thesis
Supervisor
Ing. Miroslav Čepek, Ph.D.
Reviewers
Ing. Tomáš Kalvoda, Ph.D.
Summary
This thesis aims to explore machine learning techniques and options for processing time signals, with a focus on ECG. The thesis explores the impact of modifications of the Fourier transform on the performance of machine learning models and compares the accuracy of models working in the frequency domain with individual modifications against models working in the time domain. Individual steps: 1) Review machine learning models for time signal classification (partial and whole) and techniques for preprocessing time signals. 2) Describe and explore a suitable dataset, for example CODE-15 for ECG classification. 3) Review the Fourier transform and its modifications, and propose your own modifications. 4) Create a baseline model, experiment with modified models, and document the impact of the modifications on the accuracy of the model. 5) Compare the results with techniques that use the time domain data directly. Literature: Mateo, Carlos, and Juan Antonio Talavera. "Short-time Fourier transform with the window size fixed in the frequency domain." Digital Signal Processing 77 (2018). ISSN 1051-2004. https://doi.org/10.1016/j.dsp.2017.11.003. Ribeiro, Antônio H., et al. "Automatic diagnosis of the 12-lead ECG using a deep neural network." Nature Communications 11.1 (2020): 1760.
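
As background for the frequency-domain preprocessing mentioned in the summary, the following is a minimal sketch of the standard short-time Fourier transform with a fixed-length Hann window, that is, the usual baseline rather than any of the modifications studied in the thesis or in the cited literature. The function names, the window choice, and the naive O(n^2) DFT are assumptions chosen to keep the example dependency-free.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame: list[float]) -> list[float]:
    """Magnitudes of the non-negative frequency bins of a naive DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def stft(signal: list[float], window_size: int, hop: int) -> list[list[float]]:
    """Slide a Hann window over the signal and return one spectrum per frame."""
    window = hann(window_size)
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(window_size)]
        frames.append(dft_magnitudes(frame))
    return frames


if __name__ == "__main__":
    sample_rate = 64  # Hz, hypothetical
    tone = [math.sin(2 * math.pi * 8 * t / sample_rate) for t in range(128)]
    spectrogram = stft(tone, window_size=32, hop=16)
    # Each bin spans sample_rate / window_size = 2 Hz, so 8 Hz falls in bin 4.
    print([max(range(len(s)), key=s.__getitem__) for s in spectrogram])
```

Each row of the result is the spectrum of one time window, so the output can be fed to a classifier as a time-frequency image instead of the raw time-domain samples.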