Ing. Stanislav Kuznetsov, Ph.D.

Theses

Bachelor theses

Metadata for faculty data warehouse

Author
Jakub Krejčí
Year
2015
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov
Reviewers
Ing. Michal Valenta, Ph.D.
Summary
This bachelor's thesis describes the design and implementation of a metadata solution for the new data warehouse of the Faculty of Information Technology, CTU. The thesis describes the theory of metadata and the three basic types of metadata in a data warehouse (business, technical, process execution). The implementation part describes a comprehensive design of the metadata solution for the faculty data warehouse. It also describes a pilot deployment of process metadata using Pentaho Data Integration and of business metadata using Pentaho Metadata Editor. The pilot deployment was successfully tested on a PostgreSQL database and by creating a report in Pentaho Report Designer.

Data historization for purposes of faculty warehouse

Author
Robert Kotlář
Year
2015
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov
Reviewers
Ing. Michal Valenta, Ph.D.
Summary
This bachelor thesis deals with the design and implementation of historization processes in a data warehouse. The thesis describes the most common approaches to historization and then all the implementation steps of the solution that I chose. All the historization processes were implemented in Pentaho Data Integration. The implementation was successful and the solution was deployed. In the last chapter I prepared an example of how the data warehouse and analytic queries can benefit from historization.

Data warehouse API

Author
Daniel Pršala
Year
2015
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov
Reviewers
Ing. Jiří Mlejnek
Summary
This work focuses on developing a prototype of a suitable application interface that allows third-party applications to access the faculty data warehouse. It covers an analysis of the current state of possible solutions, the design of the selected solution, and above all the implementation of a working prototype. A non-standard architecture was used to solve this problem. The architecture consists of two layers: a communication layer, which controls the connection with the client application, transforms the requested data into the required format, and contains several independent modules; and a business layer, which controls the permissions for each module and retrieves the data from the faculty database. The goal of this work is to create an application interface based on that architecture that can also manage user roles and permissions and is easy to extend and manage.

ETL server for the faculty data warehouse

Author
Radim Lenger
Year
2015
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov
Reviewers
Ing. Michal Valenta, Ph.D.
Summary
This bachelor thesis deals with the design and implementation of an ETL server for the school data warehouse. Currently, ETL transformations are started manually by an administrator, with no specific process control. The designed server runs ETL transformations regularly, monitors their activity, and records information about running ETL scripts. I also designed the process workflow of the entire procedure. The server also serves as a testing machine for the pilot deployment of historization and metadata within the ETL processes. I chose a Linux server and the open-source tool Pentaho Kettle. The first, theoretical part introduces the reader to basic terms and includes a survey of available solutions for ETL processes. It also describes the measurement and testing of ETL jobs and their running on the server in the section on key process indicators. In the second, practical part I analyzed the current solution and then designed and implemented the ETL server workflow. I prepared a daemon written in C++ and wrote a few shell scripts. In conclusion, I tested the server with respect to the theoretical part and added some tests of my daemon and scripts. In a real deployment, the administrator will need to adapt the shell scripts to the real data sets.

Data warehouse encyclopedia

Author
Martin Čejka
Year
2015
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov
Reviewers
Ing. Michal Valenta, Ph.D.
Summary
At the Czech Technical University, a data warehouse has been created to provide more effective access to the information and knowledge in the university systems. In order to unify and accelerate communication between the university management and the team providing the data warehouse, a portal called Data Warehouse Encyclopedia is needed. My bachelor thesis covers the analysis and design of the portal. The structure of the business and data dictionary is an important part of the design. Based on my research, I chose the most appropriate project management system and used it to create the Encyclopedia prototype, which was configured and tested. The proposed solution is deployed on the faculty server and is ready for further use.

Mobile application for portal for collaboration with industry based on Android platform

Author
Petr Lorenc
Year
2016
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov
Reviewers
Mgr. Martin Podloucký
Summary
The theoretical part of this thesis surveys the issues of Android application development, reviews tools for Android development, and describes secure communication between a mobile client and a server. It also describes how to ensure parallel development of the server and the client. The practical part contains the analysis, design, implementation, and testing of a prototype Android application that communicates with the portal Spolupráce s průmyslem. Emphasis is placed on the user interface and on testing the functional prototype.

Web application for the project "Successful first-grader"

Author
Lukáš Rod
Year
2018
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov
Reviewers
Ing. Jiří Mlejnek
Summary
The goal of this thesis is to build a web application for the project "Successful first-grader", which offers extra education and courses for preschoolers. The application should allow the lecturer to store data about clients, their attendance, groups, and payments for lectures, and to view a client's entire history. The server side is written in Python with the Django web framework. The client side is built with React and communicates with the server through a REST API provided by Django REST framework. Acceptance testing was successfully performed at the end and all issues found were fixed. The application is deployed on Heroku hosting and is used daily by the lecturer.

Intelligent personal assistant for Windows OS

Author
Jindřich Kuzma
Year
2018
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov
Reviewers
Ing. David Šenkýř
Summary
The goal of this thesis is to design an application for the Windows operating system, which will serve as an Intelligent Personal Assistant. Based on message inputs in natural language, the application will launch corresponding programmed functions. This thesis is based on the proper use of a library for Natural Language Processing functions. It is integrated as a local Python server that communicates with the application written in C++ over HTTP. The implemented application contains 12 executable functions that make up the assistant and it is possible to run it on a local machine.

Market Sentiment Indicator

Author
David Lebl
Year
2019
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov
Reviewers
Ing. Jan Blizničenko
Summary
The goal of this work is to design, implement and deploy an application that analyzes the sentiment of Twitter posts focusing on cryptocurrency and trading. The theoretical part of the thesis deals with research of big data architecture for real-time stream analysis. The analytical and implementation part deals with the choice of selected technologies, implementation of partial applications of the microservices architecture and the way of deploying the application to the Kubernetes cloud environment. The result of the work is a monitoring system intended as a service for retail investors.

Market signal algorithm based on image recognition

Author
Andrey Babushkin
Year
2019
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov
Reviewers
doc. Ing. Štěpán Starosta, Ph.D.
Summary
Millions of transactions are processed in worldwide markets. Traders fight for profits by selling and buying different assets worldwide. In this endless war for money, tons of different techniques are being created, attempting to predict the price in advance and help traders make correct decisions. This thesis proposes a novel approach to analyse historical data of the price and generate market signals that tell traders what action should be taken right now. We make use of convolutional neural networks in combination with fully-connected ones to introduce a new model. Moreover, we discuss a technique to create a training dataset from a visual representation of a market indicator called the Relative Strength Index. The proposed model achieves 69% accuracy on data of the ETH/BTC cryptocurrency pair that, if taking into account the overall volatility of cryptomarkets, is a good baseline for future solutions.

Web application for online system at Fitness Power

Author
Patrik Kubec
Year
2019
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov
Reviewers
Ing. Jan Blizničenko
Summary
This bachelor thesis deals with the creation of a functional prototype of a web application for a fitness studio. In the future, the prototype will be extended and fully adapted for the fitness studio. The application will be used mainly by the company's employees as a tool for keeping records of goods, staff management, finance overview, and many other tasks. The purpose of the application is to improve the quality of work in the workplace. The application replaces paper records, which makes work faster and more enjoyable. Thanks to its easy and clear controls, it also simplifies recording customer movement in the fitness studio. The application was built with Spring Boot, Spring Data, Vaadin Framework, Vaadin Flow, Maven, and the H2 database.

Online Clearing center

Author
Mykyta Boiko
Year
2019
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov
Reviewers
Ing. Jiří Novák, Ph.D.
Summary
The main objective of this thesis is to create a prototype of a web application that acts as a marketplace connecting crypto-traders, investors, and developers of trading strategies. It should help developers of cryptocurrency trading strategies monetise their work, and help crypto-traders and investors maximise profit using the "Buy/Sell" signals generated by trading strategies. Another equally important part of the work is to provide a REST API to crypto-traders, investors, and developers of trading strategies, together with tools that help in choosing the required trading strategy. The tools include a candlestick chart of price movement onto which trading signals whose validity has already expired are mapped, and statistical data that analyse strategy performance by measuring the overall success or failure of signal predictions for every strategy presented on the marketplace. The result should be a prototype of a future ambitious platform capable of creating competition in today's crypto trading market.

Automated creation of experts profiles from public data

Author
Tomáš Lenoch
Year
2020
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov
Reviewers
doc. Ing. Pavel Kordík, Ph.D.
Summary
This bachelor thesis deals with the design and analysis of methods for the automated creation of expert profiles from public data. Trends and commonly used methods in this area are examined. Several methods are proposed, implemented, and evaluated. The proposed methods use different approaches to create profiles. Each method is evaluated by comparing its output profiles with profiles from reference data. Finally, an analysis of the evaluation results examines the positive features and problems of each proposed method. The output of the thesis is a web application that allows the user to create expert profiles using the proposed methods.

The system for automatically creation of experts profiles based on similarities of their R&D results

Author
Maxim Sachok
Year
2020
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov
Reviewers
doc. Ing. Pavel Kordík, Ph.D.
Summary
This thesis implements a web application for author identification. It covers the research, analysis, design, implementation, and testing of the software. The application follows a microservice architecture and can be accessed with a REST client.

Linked Data and Question Answering module for chatbot

Author
Vojtěch Kulovaný
Year
2020
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov
Reviewers
doc. Ing. Pavel Kordík, Ph.D.
Summary
The aim of this bachelor's thesis is to create a sample module that gathers data for a chatbot using Linked Data. After accepting a request, the module should be able to gather the data from the internet with a SPARQL query and return it.

AutoML approach in recommendation systems

Author
Daniil Pastukhov
Year
2021
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov, Ph.D.
Reviewers
Ing. Vojtěch Vančura
Summary
In all machine learning domains, including recommendation systems, it is not easy to decide which model should be used in a given context. Recommendation systems are software tools that try to predict which items a user would be interested in. This thesis aims to evaluate different AutoML approaches using state-of-the-art algorithms and metrics such as recall, catalogue coverage, and serendipity. AutoML is the process of automating the time-consuming and iterative training of an ML model. There are two relevant AutoML approaches that we chose to experiment with: hyperparameter optimization and meta-learning. In this thesis, we tested state-of-the-art algorithms on publicly available MovieLens datasets employing AutoML techniques, and also proposed an alternative definition of serendipity.

Predicting signals in financial markets using time series analysis

Author
Bohumil Miláček
Year
2021
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov, Ph.D.
Reviewers
doc. Ing. Kamil Dedecius, Ph.D.
Summary
This work analyzes which financial time series are most suitable for machine learning models that predict buy/sell signals. The data selected for this thesis were stocks, namely TSLA, AAPL, and MSFT. Their tick data were aggregated into time, tick, volume, dollar, and Renko bars. All bars were labeled using the CTL method, which marks the trend. A trivial strategy was used to simulate trading on all of the bars. A random forest and a neural network were used as classifiers; they were trained on the data, and the suitability of each bar type for machine learning models was evaluated based on the results of trend prediction and of simulated trading on the predicted buy/sell signals. The results showed that Renko bars are the most suitable.

Analyze and reconstruct order book to develop a model to predict cryptocurrency assets price.

Author
Mikhail Lyashenko
Year
2022
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov, Ph.D.
Reviewers
Mgr. Petr Šimánek
Summary
This thesis analyzes the information available in the cryptocurrency market, explores whether LSTM-RNN or GRU-RNN works better for time series prediction on market data, and attempts to predict price changes for three cryptocurrency pairs: BTC/USDT, ETH/USDT and DOGE/USDT. In particular, the main contributions of the thesis are as follows: 1. It analyzes the cryptocurrency market and identifies some strong features of limit order books. 2. It shows the potential of RNNs in time series prediction. 3. It emphasizes the importance of the data on which the models are trained.

Web application for tracking cryptocurrency prices.

Author
Jan Koten
Year
2022
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov, Ph.D.
Reviewers
Ing. Ivo Petr, Ph.D.
Summary
This work deals with the issue of predicting the development of cryptocurrency. Within it, a web application is created that provides users with information such as the beginning, end and value of the following local extremes. This data is predicted by the application within tens of minutes into the future. Along with the historical value of cryptocurrencies, we are also recording searches of cryptocurrencies on Google and data from the development of VIX, S&P 500 and gold values. Application predictions are made on the basis of this data. At the same time, the recorded data is displayed in the application in the form of a graph. Users are then provided with real-time recommendations on the actions they need to take to earn money. These actions are sales, purchases and inactivity within individual cryptocurrencies. In this way, the application is able to help users improve their orientation in the cryptocurrency market.

Predicting forex trading signals using methods of image recognition.

Author
Gleb Fedorov
Year
2022
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov, Ph.D.
Reviewers
Ing. Tomáš Kalvoda, Ph.D.
Summary
Algorithmic trading, since its inception, has been a really attractive field for lots of researchers, promising an immediate return for any ingenuity put into it. Greatly inspired by the success of image recognition and deep learning in recent years, we tried to utilize said success in Forex. We prepared a survey of current image recognition techniques and applied them to predict Forex signals. Just as people can infer upcoming trends by examining graphs, we trained an artificial neural network that can predict future price trends from graphs of extrapolations created via the Fast Fourier transform. The final model achieves a performance of 63% on test data.

Acoustic Noise Cancellation by Machine Learning

Author
Artem Yutukov
Year
2023
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov, Ph.D.
Reviewers
doc. Ing. Pavel Kordík, Ph.D.
Summary
This bachelor thesis contains general information about active noise cancellation systems, the history of the field, and a discussion of how sound can be represented for artificial intelligence processes. In particular, it discusses the use of machine learning in this area and demonstrates the success of such systems, taking as a basis the convolutional recurrent neural network (CRN) architecture introduced by Tan K. and Wang D. The demonstration model is designed to predict future Short Time Fourier Transform (STFT) windows, using previous and zero-padded STFT windows as inputs. The proposed system was trained using a supervised learning approach and evaluated based on its ability to accurately predict several future STFT windows. The performance of the system was measured using the normalized mean square error (NMSE) between the predicted and actual STFT windows. The results show that the CRN-based system achieved low NMSE values, indicating a high level of accuracy in predicting future STFT windows. The thesis also discusses the limitations and possible improvements of the proposed system as well as its potential application in real scenarios. Overall, the results obtained in this work provide valuable insights into audio pre-processing and processing using neural networks and offer a promising foundation for future research in this area.

Development of Reporting Tool in the Experts.ai Platform

Author
Lev Popov
Year
2024
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov, Ph.D.
Summary
The subject of this bachelor thesis is the design and implementation of the frontend of a reporting tool. This module extends the Experts.ai web platform with a visualization of user activity on widgets in the form of charts and tables. The module was implemented using the Angular framework and the TypeScript programming language, like the rest of the platform's frontend, in order to extend the web application with new components and web pages. The main contribution of this work is a carefully designed user interface that displays key statistics leading to valuable insights into user behavior, which supports data-driven decisions. The design was supported by modern frontend libraries used in the implementation of the reporting tool, which significantly improved the user experience and the maintainability of the platform's code.

Development of Reporting Tool in the Experts.ai Platform

Author
Roman Chertishchev
Year
2024
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov, Ph.D.
Summary
The subject of this bachelor's thesis is the design and implementation of a module that collects and aggregates user interactions in widgets provided by the Experts.ai application. For this purpose we analyzed the application in which the module will be developed as well as existing solutions. The software was developed using technologies that are already used in the application. The result is a module that collects three types of user interactions and provides an API for sending aggregated statistical data on request. In addition, it includes functionality that unifies the widgets of the application, making it possible to determine which widget an interaction is attached to.

Utilization of Transformer architecture for predicting financial time series in the forex market

Author
Radek Přibyl
Year
2024
Type
Bachelor thesis
Supervisor
Ing. Stanislav Kuznetsov, Ph.D.
Reviewers
Ing. Karel Klouda, Ph.D.
Summary
This thesis presents the implementation of two deep learning models, Transformer and Autoformer, for time series forecasting, specifically predicting foreign exchange (Forex) prices. Both models were developed from scratch using Python and TensorFlow. Part of the process was to utilize time series properties and signal processing techniques to improve the accuracy of the predictions. Data from the Forex market served as the input datasets, and technical indicators were added to provide more information for the models to learn from. In addition, data smoothing techniques from the signal processing field were applied to reduce noise in the time series. The models were trained to predict prices, and their outputs were transformed into the binary classification domain to determine how the price changed during the observed time period. The performance of the models was compared to each other and to two basic baseline models. All code with the techniques used in this work is publicly available on GitHub at https://github.com/pribylr/bp/.

Master theses

Data Warehouse for CTU Survey

Author
Jiří Grill
Year
2015
Type
Master thesis
Supervisor
Ing. Stanislav Kuznetsov
Reviewers
Ing. Michal Valenta, Ph.D.
Summary
This master thesis focuses on the reimplementation of an existing data warehouse that gathers and stores all the entries from the university-wide application Anketa. The main purpose of this data warehouse is to store all data from the individual surveys and the corresponding faculties at the Czech Technical University in Prague at the end of each semester. The main goal of this thesis is to create a solution that allows further integration into the newly established internal project at the Faculty of Information Technology. Furthermore, the thesis introduces the general issues of data warehouses and metadata. One standalone chapter is dedicated to reporting tools that are used to generate reports from the entries stored in the data warehouse. The resulting solution is an implementation of the data warehouse that allows in-depth analysis of all the entries from the Anketa application database using ETL scripts.

Integration of a recommendation system to the SSP portal

Author
Josef Dvořák
Year
2017
Type
Master thesis
Supervisor
Ing. Stanislav Kuznetsov
Reviewers
doc. Ing. Pavel Kordík, Ph.D.
Summary
This thesis is concerned with the problems and possibilities of using recommendation systems (RS), and with the problems of recommendation techniques and algorithms. It is also concerned with metrics for comparing and evaluating RS. The theoretical part of the thesis presents the domain of RS and its current problems. It then presents the metrics and evaluation techniques used for comparing different RS. Finally, it summarizes current RS and chooses the most appropriate one. The practical part describes the design and implementation of an application for comparing and evaluating RS. The testing part of the thesis includes the results of comparing the recommendation models of the selected RS.

Performance Prediction of the Bachelor programme Informatics at FIT CTU

Author
Magda Friedjungová
Year
2016
Type
Master thesis
Supervisor
Ing. Stanislav Kuznetsov
Reviewers
doc. Ing. Pavel Kordík, Ph.D.
Summary
This thesis is about the extraction of data from faculty systems which are being used to record the results of students at the Faculty of Information Technology at CTU. The collected data is preprocessed and predictive models which determine the success rate of students in the first semester of the first year of the bachelor programme Informatics in the academic year 2015/2016 are composed using suitable methods. An analysis of student performance is done based on the results of the prediction and improvements are proposed. The thesis further contains descriptions of the methods and evaluation of the models so that they could be reused in the next academic year.

Data Integration Methods for the CTU Data Warehouse

Author
Robert Kotlář
Year
2017
Type
Master thesis
Supervisor
Ing. Stanislav Kuznetsov
Reviewers
Ing. Magda Friedjungová
Summary
This master's thesis describes the design and implementation of data integration for the new data warehouse of CTU, which is implemented within development projects (DÚ no. 20 and no. 46). The first part of the thesis describes the theory of data warehouses, their architecture, and the theory of data integration. In the implementation part of this thesis, the author introduces the design and implementation of a Staging layer for the data warehouse of CTU. The last part defines the process of data integration into the data warehouse of CTU and its automation.

Integration of Cooperation with Industry System into CTU Data Warehouse

Author
Jan Mikeš
Year
2017
Type
Master thesis
Supervisor
Ing. Stanislav Kuznetsov
Reviewers
Ing. Magda Friedjungová
Summary
This thesis deals with the data integration of the Cooperation with Industry system into the CTU Data Warehouse. Firstly, an analysis was conducted for both systems. Secondly, a design was proposed for all the layers based on the architecture of the existing data warehouse. The layers are the stage layer, the integrated data layer, and the access layer, which consists of a semantic layer and data marts. Thirdly, the implementation of the ETL processes was described. Finally, analytic reports were created to demonstrate a functional solution.

Integration of V3S into the CTU Data Warehouse

Author
Michal Štádler
Year
2017
Type
Master thesis
Supervisor
Ing. Stanislav Kuznetsov
Reviewers
Ing. Michal Valenta, Ph.D.
Summary
This thesis contains an analysis of the V3S system for the purpose of its integration into the CTU Data Warehouse. The thesis also describes the solution concept and the realization of the stage, integrated, semantic, data access and data presentation layers for the newly integrated data from V3S in the CTU Data Warehouse. The conclusion of the thesis includes several examples of reports based on the data from V3S, which, besides their own analytical value, test the functionality of the V3S integration into the CTU Data Warehouse.

Extension of the web application for the project "Úspěšný prvňáček"

Author
Lukáš Rod
Year
2020
Type
Master thesis
Supervisor
Ing. Stanislav Kuznetsov
Reviewers
Ing. Josef Vogel, CSc.
Summary
The goal of this thesis is to extend the web application for the project "Úspěšný prvňáček", which offers extra education and courses for preschoolers. The original application was created in a bachelor's thesis and offers features for storing information about clients, their attendance, groups, and payments for the lectures, and for viewing a client's entire history. The server side of both the original and the new application is written in Python with the Django web framework; the client side is built with React and communicates via a REST API provided by Django REST Framework. The final extended application meets all the new requirements made by the lecturer. Also, thanks to the integration with advanced tools for easier development and maintenance, high code coverage by automated API and UI (E2E) tests, and configuration of multiple deployment environments, more reliable and faster delivery of new releases is possible. The application is deployed to several environments on Heroku (including the production one) and, thanks to the new features, offers much more efficient and comfortable everyday work covering more areas of functionality.

Module for intent detection in the internet banking domain for the Czech language

Author
Samuel Fabo
Year
2022
Type
Master thesis
Supervisor
Ing. Stanislav Kuznetsov, Ph.D.
Reviewers
Ing. Daniel Vašata, Ph.D.
Summary
In this thesis, we research and apply various techniques to solve the intent detection problem in the Czech internet banking domain. The intent detector is a fundamental part of each chatbot and keeps the user longer in contact with the machine if a high-quality, fine-tuned detector is used. We needed to gather the training data on our own because there are no publicly available datasets in the Czech language for this domain. Later on, we merged the gathered samples of intents with the publicly available dataset BANKING77, which we translated into the Czech language. We succeeded in fine-tuning a model that achieved good accuracy on the test set. We deployed the model to the production version of the demo application.

Automated extraction of personal profiles from a university domain using web scraping and NLP methods

Author
Tomáš Lenoch
Year
2023
Type
Master thesis
Supervisor
Ing. Stanislav Kuznetsov, Ph.D.
Reviewers
Ing. Milan Dojčinovski, Ph.D.
Summary
This thesis deals with the development of a software application that can automatically extract personal profiles of employees from university websites, using web scraping and natural language processing (NLP) techniques. The extracted profiles include affiliations of the employees towards organizational units within the university. In addition, a user-friendly graphical interface is provided in the application to verify and modify the extracted profiles. The component-based design of the application allows for future adjustments to handle a more specific set of universities. The performance of the application is evaluated on the set of manually scraped university websites. The evaluation results suggest that the application can perform the required tasks. Further testing is required in the future due to the limited size of the reference set.