Ing. Jan Trávníček, Ph.D.

Theses

Bachelor theses

Automata library - internal and comunication format

Author
Martin Žák
Year
2013
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Martin Poliak, Ph.D.

Knowledge game - Learn FIT

Author
Jaroslav Pikl
Year
2014
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
doc. RNDr. Alena Šolcová, Ph.D.

Implementation of eLibrary web application

Author
Dávid Ocetník
Year
2013
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.

Backward tree pattern matching

Author
David Kocík
Year
2013
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Jan Baier

Automata library - transformations between ragular expressions, regular grammars and finite automata

Author
Tomáš Pecka
Year
2014
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Jan Baier

Automatová knihovna - determinisation of finite and pushdown automata

Author
Jan Veselý
Year
2014
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Radomír Polách

Automata library - tree automata and algorithms on trees

Author
Štěpán Plachý
Year
2015
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Radomír Polách
Summary
The topic of this thesis is finite tree automata, which is a computational model for processing regular tree languages, and their implementation in project Automata library. In the thesis are proposed and implemented data structures of trees and tree automata, with which is necessary to extend the library. Algorithms proposed and implemented in this thesis are random ranked and unranked labeled ordered tree generation, bottom-up finite tree automaton determinization, acceptance of ranked tree by deterministic and nondeterministic bottom-up finite tree automaton and finding occurrences of language of deterministic bottom-up finite tree automaton in ranked tree.

GCC frontend for the GCC internal form

Author
Martin Senko
Year
2015
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Radomír Polách
Summary
This thesis describes and analyzes built-in intermediate representation dump format of GCC compiler and the informational value contained. It suggests extensions to the dump format, making it suitable for further front-end processing. This thesis also describes the front-end which was developed for the dump format processing and translating into GENERIC intermediate representation.

GCC and LLVM - Comparison of implementation of front-ends

Author
Ivan Ryšavý
Year
2015
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Radomír Polách
Summary
Implementations of compiler frontends for GCC and LLVM compiler toolkits are presented in the thesis. Frontends parse and analyse Mila source code, transforming it into an intermediate representation used by a respective compiler. Mila language is formally described and an implementation of a lexical analyser, a syntactic analyser and an abstract syntax tree is documented. In this thesis there is placed emphasis on the detailed description and comparison of used interface of GCC and LLVM. Based on this description, anyone can create a new frontend for these compilers. It was found that the implementation of the LLVM front-end is more difficult, because there is needed to do more transformations of source code in the front-end. Front-ends have been successfully implemented and tested to be correct by a set of sample programs covering all functionality. Source codes of front-ends, sample programs and complete LL(k) grammar of Mila language can be found in the appendix.

Vizualization of backward tree pattern matching

Author
Josef Rypáček
Year
2015
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Radomír Polách
Summary
This thesis deals with visualization of backward tree pattern matching algorithm. It describes design and features of required visualization. It implements visualization based on requirements and analysis. The thesis compares methods of implementation and its result is web application for algorithm visualization.

Detection of unspecified behaviour in C

Author
Martin Holoubek
Year
2016
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Ladislav Vagner, Ph.D.
Summary
This thesis deals with the issue of the unspecified behavior of the programming language C in the GCC compiler. Furthermore the detection algorithm of the unspecified behavior is introduced and discussed. The chosen approach is described and implemented in the GCC compiler. The results of this paper extend capabilities of the compiler and facilitate the use of it.

Measurements in the automata library

Author
Radovan Červený
Year
2016
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Jan Baier
Summary
This theses introduces new extension of the Automata library, which allows measuring running time of an algorithm, memory used by an algorithm, and general events for algorithm profiling. Next, applications for measurement automation and basic processing of measurement results are designed and implemented. Three new implementations of string matching algorithms are added to the Automata library. Finally, the results of an experimental measurement of the new algorithms as well as some already existing algorithms in the Automata library are presented.

Automata library - LR parser construction

Author
Martin Kočička
Year
2016
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Jan Baier
Summary
This thesis covers basic LR parsing algorithms. We describe bottom-up and shift-reduce parsing methods in general, and then we focus on LR parsing. We go into more detail with two algorithms - LR(0) and SLR(1). Suitable algorithms and data structures are presented and implemented into the Automata library. We also explore existing solutions and the Automata library itself.

Generator of parser with grammar transformations

Author
Matěj Uzel
Year
2016
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Petr Máj
Summary
The thesis analyzes principals of performing deterministic single-pass parsing. In particular it is focused on the top-down parsing method called recursive descent parser. The thesis also describes grammar transformations in order to get LL(1) grammar. A part of the thesis is a documentation of implemented parser generator. The generator implements semantic processing by using S-attributed and L-attributed grammars. The implemented parser is a part of attachment.

Support for creating semestral work of the Programming languages and compilers course

Author
Jan Dejdar
Year
2017
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Radomír Polách
Summary
This bachelor thesis deals with creating a documentation for a semestral work of Programming languages and compilers course at Faculty of Information Technologies at CTU, and updating supporting materials to a current stable release of a GCC compiler. It is mainly focused on an interface used for a communication between the front-end and the middle-end parts of the GCC and a creation of an abstract syntax tree.

Implementation of a repetition searching algorithm in trees

Author
Aleksandr Shatrovskii
Year
2017
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Ondřej Guth, Ph.D.
Summary
This thesis is concerned with implementation of the algorithm which computes all subtree repeats in a ranked labeled ordered tree. The efficient algorithm proposed by Michalis Christou et al. is analyzed in detail and is presented in this thesis. Internal representation of tree structures in the Automata library is explored. The algorithm and necessary data structures are implemented as part of the Automata library project. The implementation is tested with both predefined and randomly generated trees.

Design and implementation of language with simple, safe and effective memory management

Author
Miroslav Kravec
Year
2017
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Radomír Polách
Summary
Simplicity of use, efficient resources usage and safety against memory corruption are important characteristics of memory management. To achieve them, the thesis is focused on region based memory management. Existing languages were analysed. A new language was designed, and a prototype was implemented as a frontend for LLVM compiler.

Design and implementation of tree data structures for C++

Author
Vladimír Vojáček
Year
2017
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Ondřej Guth, Ph.D.
Summary
I focused on design and implementation of tree data structures in programming language C++ in my bachelor thesis. I designed structures according to analysis of reference algorithms Aho Corasick and Position heap. I implemented structures on the base of design and their proper function tested on mentioned algorithms. Reader can introduce with design of prepared data structures, which can be used in this form, or modified to special requests.

Implementation of client and server for control and interpretation of robot Karel

Author
Stefan Ćirić
Year
2017
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Radomír Polách
Summary
The content of this thesis is the implementation of client and server system for Karel robot. The system will be implemented on an Arduino ESP8266 WiFi chip, through which users will connnect over a WiFi connection and be able to interact with the robot through the web page provided. It describes in detail the WiFi chip on which the system is developed, as well as fully explaining the concepts of the Karel language, Karel robot, showing what the robot's task is, and what it can accomplish. The second part focuses on the design and implementation techniques chosen to accomplish the task, going over the internal structure and architecture of the project.

Design and implementation of a context-free grammar parsing library

Author
Patrik Valkovič
Year
2018
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Miroslav Hrončok
Summary
The goal of this thesis is to develop library that strictly separate syntactic and semantic part of the parsing proccess. Library is suppose to be simple and easy to use. Library parsing proccess uses context-free grammars and Cocke-Younger-Kasami algorithm, because of it's versatility. Library is developed in Python programming language. To simplify parsing proccess, the library implements transformations into Chomsky normal form. Moreover, it also implements backward transformations of the parsed tree. For that particular reasons, library provides complex parsing tool. The library was successfully implemented and published. The functionality of the library is demonstrated on lambda calculus interpreter, which functionality is to parse and interpret lambda calculus.

Quick search tree pattern matching

Author
Michal Cvach
Year
2018
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Tomáš Pecka
Summary
This thesis deals with the problem of tree pattern matching. Adaptation of the Quick Search algorithm for tree pattern matching is designed and then implemented in the work. The algorithm is implemented in the Algorithm library project, and it is also implemented in the Forest fire & Fire wood toolkit, where it was extensively tested and then compared with other algoritms which can solve the same problem, mainly the backward linearised tree pattern matching algorithm. The presented algorithm achieves results which are by 30 % better than the results of the backward linearised tree pattern matching algorithm.

String searching implementation using compact suffix automaton

Author
Josef Erik Sedláček
Year
2018
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Ondřej Guth, Ph.D.
Summary
The thesis is concerned with the problem of deciding, whether it is true, given input words w and u, that u is a substring of w, and eventually outputting a set of positions where u is found in w. Specifically, the content of this thesis deals with a direct construction of the compact suffix automaton, which can decide the problem in linear time and space in relation to the length of the word w. The result is an implementation in C++ of an algorithm for constructing the automaton as a part the Algorithms Library ALIB.

Graphs and graph algorithms for algorithms library

Author
Jan Uhlík
Year
2018
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Radomír Polách
Summary
The bachelor thesis improves the implementation of the graphs and graph algorithms in the library ALib. This library has been developed at the Department of Computer Science FIT CTU in Prague. A great deal of emphasis has been placed on the universality of the implementation design which leads to intuitive usage and easy future extension of the library. The thesis also extends the library with new graph traverse and graph pathfinding algorithms and makes experiments with the original ones as well as the new ones.

Design and implementation of backward tree pattern matching algorithm modifications

Author
Kamil Červený
Year
2018
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Tomáš Pecka
Summary
In this thesis is designed a new tree pattern matching algorithm. Its idea is based on a concept of good-suffix-shift heuristic of Boyer-Moore string searching algorithm and findings of an existing adaptation of Morris-Pratt algorithm for trees. The algorithm finds all occurences of a sought tree pattern in a given sought through tree using two auxiliary data structures. During the algorithm's run input trees are transformed into a linearised form, specifically postfix, ranked notation. Finally the implemented algorithm is tested with the best existing tree pattern matching algorithms and measurements outcomes show that it ranks among the fastest of them.

Improvement of GUI for algorithms library ALIB

Author
Martin Hanzík
Year
2018
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Ondřej Guth, Ph.D.
Summary
The subject of this bachelor's thesis is to analyse the flaws of an existing graphical user interface for the algorithm library ALT, suggest improvements and implement them. The text contains research of the design pattern "pipes and filters" and applications that use it. Furthermore, it focuses on analysing the means of algorithm discovery provided by the library, exploring the possibilities of parallel job scheduling using synchronisation primitives and scheduling algorithms. Finally, simple user testing is performed.

Exact and approximate matching automata construction and simulation

Author
Tomáš Čapek
Year
2018
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Radomír Polách
Summary
The objective of this thesis is to implement algorithms for the construction of search automata. The paper also deals with simulations of this type of automata. Methods using Hamming, Levenshtein and Generalized Levenshtein distance are used. This thesis is a part of the project Algorithm library.

Web-based finite automata editor

Author
Petr Svoboda
Year
2019
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. David Bernhauer
Summary
Bachelor thesis describes process of designing and implementing web application in the form of dynamic editor of finite automata. Existing solutions and their benefits are analyzed. It describes the selection and utilization of web technologies used in its creation. It also analyzes solutions for importing and exporting data of various file formats and automatic positioning of created state machines with the help of positioning algorithms. Lastly it also describes development of user-friendly environment.

Standard library container extension with notifications

Author
Michal Drabina
Year
2019
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Petr Máj
Summary
This thesis deals with design and implementation of extension of C++ standard libary containers with notification about attepts to alter their content. Based on results of evaluation by listeners the operation is performed or cancelled. Implementation is tested from the perspective of functionality and type requirements imposed on elements of containers.

C++ compile-time generated lexical analyzer

Author
Boris Rúra
Year
2019
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Ladislav Vagner, Ph.D.
Summary
Usefulness of lexical analysis is undeniable, be it in processing of natural language such as text messages or in processing of source code by compilers.As such, simplifying the creation and boosting the performance of lexical analyzers is always welcome. This thesis aims to show a different approach to generating lexical analyzers by tackling this issue as a meta-programming one,while leveraging the features of upcoming C++20 standard. It demonstrates that the meta-programming approach can yield at least as compact binaries as the standard one while being competitive in the speed of the lexical analysis.It concludes that this is a viable approach and removes the need for external programs generating lexical analyzers, albeit compilers are still evolving in this area and compilation times may be a concern.

Finite automata recognition from image

Author
Ondřej Hamák
Year
2019
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Jakub Novák
Summary
This thesis deals with the conversion of an image (photograph) of a finite automaton diagram to a digital text format. It describes problems of photography processing, image segmentation and identification of elements in hand-drawn diagrams. The thesis includes an algorithm design for the transformation of diagrams to their textual representation and a prototype implementation for the Android platform.

SAP Hana database metadata extraction tool

Author
Ondřej Hlaváč
Year
2020
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Summary
Static analysis of SQL scripts requires information about objects stored in the relation database instance the script is working with. These objects are called database metadata. This thesis focuses on metadata analysis in the SAP Hana database. It explores various database object types and discuses various ways to extract them. Then compares them to the metadata of Microsoft SQL Server and Oracle database. Results are used in the design and implementation of a metadata extraction tool. The tool itself is a part of the project Manta and integrated into its tool Manta Flow. Manta Flow is used to analyze information systems for the data flow that then creates a visual representation using data lineage. This includes databases and their scripts. The extraction tool is a fully working metadata extractor working with Manta Flow.

Implementation of string searching algorithms with constant extra space complexity

Author
Jan Jirák
Year
2020
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Mgr. Petr Matyáš
Summary
The goal of this thesis is a theoretical introduction to searching in string exact match with only constant space memory given. Next the thesis considers the implementation of algorithms, which solve the problem. Algorithms will be integrated into Algorithms Library Toolkit developed at FIT CTU in Prague, where will be implemented based on pseudocodes from referred literature.

Burrows-Wheeler transform and searching in strings

Author
Martin Prokopič
Year
2021
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Summary
This Bachelor’s thesis describes forward and reverse Burrows-Wheeler transform and researches its applications for solving the string exact pattern matching problem in transformed domain. Attached is our tested implementation of described data structures and algorithms in the Algorithms Library Toolkit project.

Finite automata editor on touch devices

Author
Marek Fořt
Year
2021
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Eliška Šestáková, Ph.D.
Summary
This thesis is concerned with designing and implementing a prototype of a finite automata editor for iPad and a subsequent usability testing of the prototype. It also contains analysis of the Algorithms Library Toolkit library's web interface, of the library itself, and of approaches to recognizing strokes on touch devices.

Dataflow analysis in Azure Data Factory

Author
Jan Chybík
Year
2022
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Petr Košvanec
Summary
This work focuses on data flow analysis of Azure data factory. It examines data flow representation in the Manta project. It explores and describes azure data factory source codes and designs a solution on how to analyze those source codes. In the end, it utilizes all findings to implement a proof-of-concept scanner, which will be part of the Manta project

Evolution of the web user interface of the algorithm library

Author
Hana Litavská
Year
2022
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Tomáš Pecka
Summary
This bachelor thesis is dedicated to providing a clear view of improvements related to a web interface of a library named ALT which is dedicated to data structures, algorithms related to stringology, arbology, and partly graphs. The original version of the web is analyzed with a strong focus on use cases, user interface, used technology, and implementation. Content of the practical part of this bachelor thesis includes analysis, solution proposals of improvements related to existing bugs, new features, and limitations of existing implementation and implementation in general. All the changes are completely aligned with the used technology using React and Redux. The final version is tested using usability testing methodology to receive valuable feedback related to implemented changes and feedback in general related to the web interface of the ALT library.

Generic database metadata extractor

Author
Victor Petukhov
Year
2023
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Mgr. Jiří Toušek
Summary
This bachelor's thesis aims to demonstrate the feasibility of implementing a generic metadata extractor for DBMSs using JDBC API. The study implemented the JDBC metadata extractor, which can be used for all relational database management systems that provide the JDBC interface. The motivation behind implementing a generic metadata extractor was to address the absence of a suitable generic extractor for all databases in Manta software solutions for data lineage analysis. The generic metadata extractor which uses JDBC API for gathering the metadata is implemented. The extractor is capable of retrieving various details such as the name, catalog, and schema of procedures, functions, tables, and views. Additionally, it provides information on the parameters and return types in these routines. The extractor can gather information about columns and their data types of tables, and views. The extractor was tested, and its ability to retrieve details about the database structure was demonstrated. The extracted entries are stored in in-memory or H2 databases using dictionary that was implemented using Manta dictionary abstractions. The findings of the study demonstrated that implementing a generic metadata extractor for RDBMS using JDBC API is feasible. The implemented extractor can be used for database management systems supported by the JDBC interface. It can provide information about the database structure required for data analysis. Further development and optimization of the extractor could lead to even more efficient and powerful database analysis.

Generated parser for the console language of the Algorithms library

Author
Ondřej Štorc
Year
2023
Type
Bachelor thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Tomáš Pecka
Summary
This thesis explores the transition from a handwritten parser in the Algorithms Library Toolkit to a new parser generated with ANother Tool for Language Recognition 4 (ANTLR4). The thesis begins by familiarizing the reader with fundamental concepts of parsing and parsers and providing an overview of ANTLR4 and the Algorithms Library Toolkit. The thesis details the integra- tion process of the ANTLR4-generated parser into the existing codebase of the Algorithms Library Toolkit. The approaches used to verify the correctness of the new parser are also described.

Master theses

Implementation of application licencing tool in Java

Author
Jan Vybíral
Year
2012
Type
Master thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Radomír Polách

Dead zone tree pattern matching in trees

Author
Robin Obůrka
Year
2016
Type
Master thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
doc. Ing. Jan Janoušek, Ph.D.
Summary
A new Forward (Morris-Pratt-like) and a new Dead-zone tree pattern matching algorithms for ordered trees are presented. The algorithms find all occurrences of a single given tree pattern which match an input tree. They make use of linearisations of both the given pattern and the input tree. The algorithms use modified but similar approaches to their string equivalents. The size of the data structure constructed for the Forward tree pattern matching algorithm is linear in the size of the pattern tree. The Dead-zone tree pattern matching algorithm is using two data structures of sizes linear in the size of the alphabet and pattern tree, respectively. Algorithms were compared with best performing previously existing algorithms based on a (non-linearised) tree pattern matching using finite tree automata, stringpath matchers, and a Backward tree pattern matching algorithm. Measurements show that the Forward tree pattern matching algorithm outperforms these for single pattern matching and the Dead-zone tree pattern matching algorithm is comparable. Their time complexity properties are from the theoretical point of view decreased in comparison to their string equivalents but it is expected to improve. During matching, the number of symbol comparisons can be even linear in the size of the input tree in the best case in case of the Forward tree pattern matching algorithm and even sub-linear in case of the Dead-zone pattern matching algorithm.

Construction of a Pushdown Automaton Accepting a Language Given by Regular Tree Expression

Author
Tomáš Pecka
Year
2016
Type
Master thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Radomír Polách
Summary
This thesis studies regular tree expressions, a formalism for describing regular tree languages. Main topic of this thesis is a new algorithm for converting a regular tree expression to a pushdown automaton recognizing trees in their linear postfix notation. Resulting pushdown automaton is real-time height-deterministic, i.e., it can always be determinised. The algorithm for conversion is an adaptation of Glushkov's algorithm for converting a regular expression to a nondeterministic finite automaton. C++ implementation of the algorithm is attached as an extension of Automata Library.

Tree and pushdown automata minimisation

Author
Štěpán Plachý
Year
2017
Type
Master thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Radomír Polách
Summary
The topic of this thesis is the relationship between finite tree automata and pushdown automata and their minimization. Finite tree automata are an extension of finite automata and they process regular tree languages. In the thesis are described algorithms for reduction and minimization of finite tree automata, their conversion to pushdown automata accepting equivalent language in postfix notation and minimization of converting automata. All algorithms are also implemented in the project Automata library.

Dataflow analysis of SAS code language

Author
Miroslav Špak
Year
2019
Type
Master thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Summary
This thesis is dealing with analysis of data flows in SAS language and the ways of its representation. At first it examines the ways of analysing source codes, its reprezentation and visualisation of its data flows via Manta system. Than it explores syntax and semantics of SAS language and mentions a few differences from other languages used for database systems. After that it focuses on SAS studio and explores the data stored by this tool, structure of this data and abality to manage the data using SAS language. Based on this analysis the prototype tool is designed, implemented and at the end tested for its flawless function.

Finite tree automaton to tree regular expression conversion

Author
Jakub Doupal
Year
2019
Type
Master thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Tomáš Pecka
Summary
This Master's thesis studies regular tree expressions and a method for conversion from a finite tree automaton to a regular tree expression using equations. The studied method is then implemented in Algorithms Library Toolkit, which is being developed at the Department of Theoretical Computer Science at Faculty of Information Technology, Czech Technical University in Prague. This thesis also studies axioms for regular tree expressions and proposes new ones. The proposed axioms are also implemented in the Algorithms Library Toolkit.

Finite tree automata to regular tree expressions conversion by removal of states

Author
Tomáš Dejmek
Year
2019
Type
Master thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Tomáš Pecka
Summary
This thesis studies tree languages. Tree language is composed from set of trees similarly as textual language is composed from set of textual strings. Since class of languages which are accepted by finite tree automata and class of languages which are denoted by regular tree expressions are proven to be equal, conversion can be performed. I focus on conversion from finite tree automata to regular tree expressions. During this thesis, one state-of-the-art solution is presented and two new approaches are discovered and implemented. A brief comparison between algorithm and measurements are included.

Enhanced suffix arrays implementation and its usage

Author
Minh Trieu Quang
Year
2020
Type
Master thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Ondřej Guth, Ph.D.
Summary
The suffix tree has a major drawback having a large space consumption. The more space efficient data structure than suffix tree is a suffix array, and recently it was shown that every algorithm using a suffix tree can be replaced with an algorithm based on a suffix array in the same time complexity if the suffix array is enhanced with additional information and structures. The result is a proposed data structure of the enhanced suffix array (ESA) in C++ and implementations of the chosen algorithms that simulates three different suffix tree traversals. This solution is thoroughly tested, experimented and compared with the algorithms using the suffix tree and its standard traversals.

Dataflow extraction tool for Cobol programming language

Author
Andrej Taňkoš
Year
2021
Type
Master thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Summary
This work deals with the data flow analysis of COBOL programming language, specifically IBM COBOL dialect. The work first examines various approaches to data flow analysis, data flow representation and visualization in Manta project. Subsequently, IBM COBOL is analyzed to identify important segments of the language in which data is transferred and used. The work contains research of existing solutions that could help with the syntax analysis of COBOL programming language. Based on existing solutions and results of the analysis, an extraction tool is designed and implemented for the Manta project. The functionality of the resulting solution is shown in a set of tests and examples.

Quantum leap pattern matching in linearized trees

Author
Josef Erik Sedláček
Year
2021
Type
Master thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Ing. Štěpán Plachý
Summary
In this thesis a tree pattern matching is studied with a focus on linearised trees in prefix notation. The idea of the string matching algorithm called Quantum Leap is adapted to search tree patterns in linearised trees. Several variants of the algorithm are implemented as a part of the Forest FIRE toolkit and the empirically best implementations are compared with other existing algorithms in the toolkit.

Metadata extraction, parsing, and dataflow detection in Snowflake sql dialect

Author
Marek Tornóci
Year
2020
Type
Master thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Summary
The thesis deals with the analysis of data flows in the Snowflake SQL dialect and their possible representations. The work first examines how to analyze source codes, their representations, and the visualization of its data flows via the Manta system. The work continues to describe Snowflake database objects, their metadata needed to extract, how the metadata can be extracted, and analyzes the Snowflake SQL dialect. Based on this analysis, the design of a prototype solution and its implementation is created. The implemented prototype is covered by tests verifying its functionality.

Dataflow analysis of Google BigQuery scripts

Author
Kyrylo Bulat
Year
2021
Type
Master thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Summary
This thesis is focused on the analysis of data flows for Google BigQuery scripts and their representation. Firstly, it describes the possible approaches to data lineage, source code analysis, and data flow visualization in the Manta system. It then examines the Google BigQuery technology, its database objects and SQL dialect syntax. It continues with design and architecture, which are used during the implementation of the prototype. The last chapters of this work are dedicated to testing and presenting the outputs of the created solution.

Data flow analysis of scripts in SAP Hana SQL dialect

Author
Ondřej Hlaváč
Year
2022
Type
Master thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Mgr. Jiří Toušek
Summary
This thesis deals with the analysis of data flows specified in the SQL statements used by the SAP Hana database. First, it describes what data lineage is and why it is useful, then the different data lineage retrieval approaches. After that, it shows selected SAP Hana's SQL statements and their data flow. It also describes the design and implementation of a prototype tool to automatise the analysis as a component of the Manta Flow. This prototype tool can generate graphs describing the data movements/flows of given SQL scripts.

Data flow analysis of scripts in Databricks SQL dialect

Author
Lucie Procházková
Year
2023
Type
Master thesis
Supervisor
Ing. Jan Trávníček, Ph.D.
Reviewers
Mgr. Radek Mácha
Summary
This thesis investigates the Databricks SQL dialect, proposing an automatic script analysis method and a prototype scanner unit for Manta, a data lineage tool. Data lineage is essential for data integrity and governance. The research outcomes include a comprehensive data flow analysis in Data- bricks SQL, the prototype scanner unit design and implementation, and thor- ough testing. Our contributions enhance Manta's ability to work with Data- bricks systems, providing a valuable analytic tool for organizations relying on Databricks SQL for data processing.