Ing. Milan Dojčinovski, Ph.D.

Theses

Master theses

Use of the Crowdsourcing Model for Data Annotation and Categorization

Author
Tomáš Kouba
Year
2013
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Ivo Lašek, Ph.D.

Web application for online betting game

Author
Jaroslav Líbal
Year
2013
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Jaroslav Kuchař, Ph.D.

Link Discovery Framework for the Web of Data

Author
Karel Svoboda
Year
2012
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Ivo Lašek, Ph.D.

Use of Crowdsourcing to Improve the Quality of Web API Documentations

Author
Michal Majerník
Year
2014
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
doc. Ing. Tomáš Vitvar, Ph.D.

Automatic Generation of Web API Documentations

Author
Ondřej Karas
Year
2014
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Jaroslav Kuchař, Ph.D.

Evaluation framework for the NER systems

Author
Marek Kužel
Year
2014
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Josef Pavlíček, Ph.D.

Crawler for Collecting Web API Documentation

Author
Jiří Šmolík
Year
2015
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Petr Špaček, Ph.D.
Summary
The diploma thesis tackles the crawling and analysis of Web API documentation with a focused crawler, which searches the Internet for user-specified documents. Each document is then classified as either API documentation or Other. The classification part applies a number of supervised machine learning algorithms to the crawled documents to decide whether or not a document is Web API documentation.

Learning Domains for Named Entities

Author
Tomáš Benák
Year
2016
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Jaroslav Kuchař, Ph.D.
Summary
The master thesis deals with the domains of named entities and the possibilities of machine learning over them. First, the thesis analyses the problem of machine learning, the sources of data and the current solutions. Based on these analyses, an application that creates the training datasets and a REST API that automates the process of learning domains for entities are designed and implemented. Furthermore, the program Weka, which helps with creating models, and the project DBpedia, which is the main source of named entities, are described. Finally, experiments are carried out to evaluate the quality of the models created for learning domains for named entities.

Open-Source Crowdsourcing Application

Author
Tomáš Marek
Year
2016
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Petr Špaček, Ph.D.
Summary
This diploma thesis deals with the use of crowdsourcing for retrieving information and data. After studying this method, some existing tools are analyzed. On the basis of the pros and cons of these solutions, a new application is designed and implemented. The possible results are demonstrated in a few concrete use cases.

Summarizing Linked Open Data Datasets

Author
Jana Čabaiová
Year
2017
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
RNDr. Jakub Klímek, Ph.D.
Summary
This work studies the Linked Open Data project and its current state, and gives an overview of the relevant semantic technologies: the RDF model, the SPARQL query language, the various formats for RDF datasets and the different ways of accessing particular datasets. The work also covers the development of a web application, including its analysis, design, implementation and testing. The main method of this application computes a summarization of LOD datasets based on domain specification, that is, the proportion of domains and entities in a particular dataset. The main result of this work is a web application implementing this method, tested on the real datasets DBpedia and GeoNames, together with the processing and comparison of the results. The application should be useful mainly for those who need to find out the domain representation of their Linked Open Data dataset or to compare the domain representation of two different datasets.

Graph Based Recommendation Algorithms for Linked Data

Author
Martin Chouň
Year
2016
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
doc. Ing. Pavel Kordík, Ph.D.
Summary
The thesis deals with graph-based recommendation algorithms for Linked Data. The author focuses on the description of the Semantic Web and Linked Data principles, recommender systems, their functions and recommendation techniques, and mentions the existing solutions for recommendation over Linked Data. He analyses graph algorithms and presents the design and implementation of an application which uses them. Finally, the author runs experiments with the application on real data, discusses the knowledge gained and gives readers insight into the future development and expansion of this thesis and application.

Collection, Transformation, and Integration of Data from the Web Services Domain

Author
Radmir Usmanov
Year
2018
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
doc. Ing. Tomáš Vitvar, Ph.D.
Summary
Currently, there are several repositories and data models that provide descriptions for Web APIs. The diploma thesis tackles the problem of transforming descriptions of Web APIs from several data models into one unified data model. It analyzes existing datasets and data models for Web APIs, establishes mappings between the different data models, collects, transforms and integrates Web API data into the unified data model, and validates and evaluates the extraction results.

Method for summarization and importance assessment of information on the Web of Data

Author
Marek Filteš
Year
2017
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Jaroslav Kuchař, Ph.D.
Summary
This work deals with entity summarization on the Semantic Web. First, it discusses the issue of information, the evaluation of the importance of information and the general summarization of entities. It then turns to the summarization of Semantic Web entities. The practical part deals with the design of the model and the implementation of an entity summarization tool based on the DBpedia abstracts dataset. The generated knowledge base is integrated within the implementation of the web browser.

Blockchain Based RDF Management

Author
Remy Rojas
Year
2019
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Jaroslav Kuchař, Ph.D.
Summary
As structured open data grows in popularity, as evidenced by the size of networks such as the Linked Open Data (LOD) cloud, aspects of its lifecycle management and scalability have yet to be addressed. At the time of writing, implementations of change tracking and provenance do not guarantee integrity and availability, and depend upon individual domain owners to be deployed and maintained. This represents a threat to the stability of a system in which data is composed of cross-domain URI references, such as the Semantic Web's de facto model, RDF. In this paper we explore the advantages and capabilities that a solution based on Blockchain can provide when used as a support for RDF. We provide the design, implementation, testing and evaluation of a Proof of Concept Distributed Ledger which addresses the use cases of Create, Read, Update, Delete (CRUD) operations, Linked Data Notifications and the Publish/Subscribe Observer pattern. Our solution provides mutually distrusting parties with support for traceability and provenance of versioned RDF statements, leveraging integrity and availability with decentralization.

Domain-Specific NER Adaptation

Author
Bogoljub Jakovcheski
Year
2019
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Jaroslav Kuchař, Ph.D.
Summary
Named Entity Recognition (NER), a popular technology that is still under development, has seen significant use in both academia and industry, although coarse-grained usage remains more dominant than fine-grained usage. In this thesis, we use the DBpedia NIF dataset. We process it and prepare new datasets ready for training models with Stanford NER. Experiments with the trained models compare the results obtained with a global domain model against those obtained with domain-specific models. In addition, the experiments examine the impact of the number of articles used to train the models. The results show that domain-specific fine-grained models provide better results than domain-specific coarse-grained models and global models in both annotations. Furthermore, models trained on a higher number of articles give better results than models trained on a lower number of articles.

Extraction of linguistic information from Wikipedia

Author
Andriy Nazim
Year
2019
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
prof. Dr. Ing. Petr Kroha, CSc.
Summary
DBpedia is a crowd-sourced community effort which aims at the extraction of information from Wikipedia and providing this information in a machine-readable format. Currently, the information contained in DBpedia is primarily derived from semi-structured sources such as Wikipedia infoboxes. However, a vast amount of information is still hidden in the Wikipedia article texts. In this thesis, I present approaches for extracting linguistic information from DBpedia, which are based on combining and parsing DBpedia source datasets. The results of the master project are datasets of linguistic information: synonyms, homonyms, semantic relationships and inter-language synonyms. My project also pays special attention to cleaning and filtering the produced datasets, and the evaluation was supported by developing a simple web application for querying the results.

Enrichment of the DBpedia NIF dataset

Author
Pragalbha Lakshmanan
Year
2020
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
RNDr. Jakub Klímek, Ph.D.
Summary
DBpedia is a crowd-sourced community effort which aims at extracting information from Wikipedia articles and providing this information in a machine-readable format. DBpedia provides the extracted information as NIF datasets with the content of all Wikipedia articles in 128 languages. The aim of the thesis is to enrich this dataset with additional information: the results of sentence splitting, tokenization and part-of-speech tagging, as well as enhanced links, for the content of Wikipedia articles in English, French, German, Spanish and Japanese. The implementation performs these NLP tasks, namely sentence splitting, tokenization and part-of-speech tagging, on the pre-processed NIF datasets. The work also contributes to the DBpedia community by adding additional links to the Wikipedia articles. Finally, the runtime of the various NLP tasks is evaluated and the accuracy of the results is checked statistically. Enriching the NIF dataset with the results of the NLP tasks generated by the tool is useful for performing more complicated NLP tasks.

Fact extraction from Wikipedia article texts

Author
Jakub Trhlík
Year
2020
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Petr Špaček, Ph.D.
Summary
Wikipedia is a great source of information, but its textual information has not yet been extracted into a fully machine-readable format. In this thesis, we use the DBpedia NIF dataset, which represents the Wikipedia page structure, for targeted fact extraction. The dataset is parsed, enriched with links using several methods and then prepared for fact extraction. Multiple methods of fact extraction are researched, implemented and tested on selected relations. Experiments describe the accuracy and viability of the selected and implemented methods. The extracted relations are evaluated and submitted for addition to the DBpedia database.

Framework for Extraction of Wikipedia Articles Content

Author
Oleksandr Husiev
Year
2022
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Mgr. Ladislava Smítková Janků, Ph.D.
Summary
This thesis describes the development of a process for extracting the content of Wikipedia articles for DBpedia, a crowd-sourced community effort. The main goal of this thesis was to develop a framework for the extraction of Wikipedia article content, structure and annotations. The result is a framework that processes large Wikipedia XML dumps in several popular languages, with the possibility to dynamically add new languages, and produces clean text output, links and page structure in N-Triples format.

Personalised real estate search application using semantic web technologies

Author
Tomáš Dvořák
Year
2022
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Oldřich Malec
Summary
The COVID-19 pandemic led to increasing demand for real estate, mainly in cities rich in civic amenities. Finding the right real estate property without any domain insights is difficult. Creating a real estate portal with more than just a base of advertisement listings can require the use of proprietary technologies, which often do not allow storing information for later use and thus result in the state known as vendor lock-in. This thesis proposes an alternative way of creating a scalable web-based application using open-source technologies, powered by a triple-store database which unlocks the potential of linked data.

Archival tool for the Discord communications platform

Author
David Labský
Year
2022
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Jaroslav Kuchař, Ph.D.
Summary
Discord is a popular instant messaging platform which currently does not allow its users to export all the data they can access. The ability to create backups of online data is important for personal reasons as well as to enable long-term preservation. The goal of this thesis is to create an open-source tool for the archival of Discord chats to which one has gained access. We use a strategy of capturing the network traffic generated by a headless web browser. This method is broadly applicable to archiving single-page applications other than Discord, which current tools have difficulty working with. Functionality is demonstrated by performing an analysis of data downloaded from a chosen Discord server.

System for Management of Personal Music Libraries using Knowledge Graphs Technology

Author
Ondřej Viskup
Year
2024
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Jaroslav Kuchař, Ph.D.
Summary
With the rising popularity of multimedia streaming services, the challenge of user data storage becomes more prominent. The main objective of this thesis is to design and develop a system that allows users to manage music libraries and playlists outside streaming services and integrate them with various services that provide additional information through Knowledge Graphs technology and Linked Data. The system consists of a back-end server, a front-end client, an RDF store and a database for user credentials. The thesis outlines the design of the system, which includes the system architecture, the components of the back-end server and front-end client, and the RDF store. It also reviews current solutions and relevant technologies, including the Semantic Web, RDF, SPARQL and several music ontologies. Finally, software validation was performed and evaluated against the set requirements, and future directions were suggested.

Progressive Web Application based on Microservice Architecture for monitoring of Babyboxes

Author
Zbyněk Juřica
Year
2024
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Jaroslav Kuchař, Ph.D.
Summary
This thesis presents the design and implementation of a monitoring system based on microservice architecture for managing and analyzing data from babyboxes across the Czech Republic. The work involved transitioning from an outdated monolithic architecture to a more flexible and maintainable microservices architecture, aiming to empower staff working as operators and maintenance technicians. The system includes several microservices handling data ingestion, user management, notifications, and battery analysis. Built using Go, TypeScript, Python, MongoDB, InfluxDB, and RabbitMQ, the backend provides a scalable and modular structure. The front-end, developed with Next.js and React, offers comprehensive data visualization, aggregations, notifications, and analysis features. The application was continuously improved based on user feedback, laying a strong foundation for future enhancements and integrations.