Master theses
Use of the Crowdsourcing Model for Data Annotation and Categorization
Author
Tomáš Kouba
Year
2013
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Ivo Lašek, Ph.D.
Department
Web application for online betting game
Author
Jaroslav Líbal
Year
2013
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Jaroslav Kuchař, Ph.D.
Department
Link Discovery Framework for the Web of Data
Author
Karel Svoboda
Year
2012
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Ivo Lašek, Ph.D.
Department
Use of Crowdsourcing to Improve the Quality of Web API Documentations
Author
Michal Majerník
Year
2014
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
doc. Ing. Tomáš Vitvar, Ph.D.
Department
Automatic Generation of Web API Documentations
Author
Ondřej Karas
Year
2014
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Jaroslav Kuchař, Ph.D.
Department
Evaluation framework for the NER systems
Author
Marek Kužel
Year
2014
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Josef Pavlíček, Ph.D.
Department
Crawler for Collecting Web API Documentation
Author
Jiří Šmolík
Year
2015
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Petr Špaček, Ph.D.
Department
Summary
The diploma thesis tackles the crawling and analysis of Web API documentation with a focused crawler, which searches the Internet for user-specified documents. Each document is then classified as either API documentation or other. The classification step applies a number of supervised machine learning algorithms to the crawled documents to decide whether a document is a Web API documentation or not.
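The supervised classification step can be sketched compactly. The following is an illustrative, stdlib-only naive Bayes example; the training snippets, labels and token features are invented for demonstration and do not reflect the thesis's actual algorithms or data:

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

class NaiveBayes:
    """Minimal multinomial naive Bayes for two-class document labeling."""
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # per-class token counts
        self.class_counts = Counter()            # per-class document counts
        self.vocab = set()

    def train(self, docs):
        for text, label in docs:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.class_counts[label] += 1
            self.vocab.update(tokens)

    def predict(self, text):
        best, best_lp = None, float("-inf")
        total_docs = sum(self.class_counts.values())
        for label in self.class_counts:
            lp = math.log(self.class_counts[label] / total_docs)  # class prior
            total = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                count = self.word_counts[label][tok] + 1  # Laplace smoothing
                lp += math.log(count / (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = label, lp
        return best

# Invented toy corpus: API-documentation-like vs. other text.
docs = [
    ("GET /users returns JSON endpoint authentication token", "api"),
    ("POST /orders request body parameters response codes", "api"),
    ("breaking news weather sports today", "other"),
    ("recipe cook dinner ingredients", "other"),
]
clf = NaiveBayes()
clf.train(docs)
print(clf.predict("endpoint returns JSON response"))  # classified as "api"
```

In practice a crawler pipeline would feed the page text of each fetched document into such a trained classifier and keep only the pages labeled as API documentation.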
Learning Domains for Named Entities
Author
Tomáš Benák
Year
2016
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Jaroslav Kuchař, Ph.D.
Department
Summary
The master thesis deals with the domains of named entities and the possibilities of applying machine learning to them. First, the thesis analyses the machine learning problem, the available data sources and the existing solutions. Based on this analysis, an application that creates the training datasets and a REST API that automates the process of learning domains for entities are designed and implemented. Furthermore, the Weka toolkit, which assists with model creation, and the DBpedia project, the main source of named entities, are described. Finally, experiments are performed to evaluate the quality of the created models for learning domains for named entities.
Open-Source Crowdsourcing Application
Author
Tomáš Marek
Year
2016
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Petr Špaček, Ph.D.
Department
Summary
This diploma thesis deals with the use of crowdsourcing for gathering information and data. After a study of this method, several existing tools are analyzed. On the basis of the pros and cons of these solutions, a new application is designed and implemented. The possible results are demonstrated in a few concrete use cases.
Summarizing Linked Open Data Datasets
Author
Jana Čabaiová
Year
2017
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
RNDr. Jakub Klímek, Ph.D.
Department
Summary
The work studies the Linked Open Data project and its current state, and gives an overview of the relevant semantic technologies: the RDF model, the SPARQL query language, the various formats for RDF datasets and the different ways of accessing particular datasets. Part of the work is the development of a web application, covering its analysis, design, implementation and testing. The application's main feature is the computation of a summary of a LOD dataset based on a domain specification, i.e. the proportion of domains and entities in the dataset.
The main result of this work is a tested web application implementing the above method, applied to the real datasets DBpedia and GeoNames, together with the processing and comparison of the results. The application should be useful mainly to those who need to determine the domain representation of their Linked Open Data dataset, or who need to compare the domain representation of two different datasets.
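The core of a domain-proportion summary can be sketched as a simple frequency computation. This is an illustrative example only; the entity and class names below are invented placeholders in DBpedia-like notation, not the thesis's actual data or method:

```python
from collections import Counter

def domain_proportions(typed_entities):
    """Given (entity, type) pairs, return each type's share of all entities."""
    counts = Counter(entity_type for _, entity_type in typed_entities)
    total = sum(counts.values())
    return {entity_type: count / total for entity_type, count in counts.items()}

# Invented sample: entities with their rdf:type, as might come from a SPARQL query.
typed_entities = [
    ("dbr:Prague", "dbo:City"),
    ("dbr:Brno", "dbo:City"),
    ("dbr:Charles_IV", "dbo:Person"),
    ("dbr:Vltava", "dbo:River"),
]
print(domain_proportions(typed_entities))  # dbo:City accounts for 0.5 of the entities
```

A real implementation would obtain the (entity, type) pairs by querying the dataset's SPARQL endpoint and could then compare the resulting distributions of two datasets.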
Graph Based Recommendation Algorithms for Linked Data
Author
Martin Chouň
Year
2016
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
doc. Ing. Pavel Kordík, Ph.D.
Department
Summary
The thesis deals with graph-based recommendation algorithms for Linked Data. The author describes the principles of the Semantic Web and Linked Data, recommender systems, their functions and recommendation techniques, and surveys existing solutions for Linked Data recommendation. He analyses graph algorithms and presents the design and implementation of an application that uses them. Finally, the author performs experiments with the application on real data, discusses the acquired knowledge and gives readers insight into the future development and expansion of the thesis and the application.
Collection, Transformation, and Integration of Data from the Web Services Domain
Author
Radmir Usmanov
Year
2018
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
doc. Ing. Tomáš Vitvar, Ph.D.
Department
Summary
Currently, there are several repositories and data models that provide descriptions of Web APIs. The diploma thesis tackles the problem of transforming Web API descriptions from several data models into one unified data model. It analyzes the existing datasets and data models for Web APIs, establishes mappings between the different data models, collects, transforms and integrates the Web API descriptions into the unified data model, and validates and evaluates the extraction results.
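A mapping between data models of this kind can be sketched as a declarative field-renaming table. The source names, field names and record shapes below are invented for illustration and are not the thesis's actual mappings:

```python
# Hypothetical per-source mappings: source field name -> unified field name.
MAPPINGS = {
    "sourceA": {"title": "name", "endpoint": "base_url", "desc": "description"},
    "sourceB": {"name": "name", "baseURL": "base_url", "description": "description"},
}

def to_unified(record, source):
    """Transform one source record into the unified data model."""
    mapping = MAPPINGS[source]
    return {target: record[src] for src, target in mapping.items() if src in record}

record = {"title": "Example API", "endpoint": "https://api.example.com"}
print(to_unified(record, "sourceA"))  # {'name': 'Example API', 'base_url': 'https://api.example.com'}
```

Keeping the mappings as data rather than code makes it straightforward to add further source models and to validate that every source field lands in the unified model.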
Method for summarization and importance assessment of information on the Web of Data
Author
Marek Filteš
Year
2017
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Jaroslav Kuchař, Ph.D.
Department
Summary
This work deals with entity summarization on the Semantic Web. It first covers the notion of information, the assessment of the importance of information, and the general summarization of entities, before turning to the summarization of Semantic Web entities. The practical part deals with the design of a model and the implementation of an entity summarization tool based on the DBpedia abstracts dataset. The generated knowledge base is integrated within a web browser implementation.
Blockchain Based RDF Management
Author
Remy Rojas
Year
2019
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Jaroslav Kuchař, Ph.D.
Department
Summary
As structured open data grows in popularity, evidenced by the size of networks such as the Linked Open Data (LOD) cloud, aspects of its lifecycle management and scalability have yet to be addressed. At the time of writing, implementations of change tracking and provenance do not guarantee integrity and availability, and depend on individual domain owners to be deployed and maintained. This is a threat to the stability of a system in which data is composed of cross-domain URI references, as in the Semantic Web's de facto model, RDF. In this thesis we explore the advantages and capabilities that a Blockchain-based solution can provide when used as a support for RDF. We provide the design, implementation, testing, and evaluation of a proof-of-concept distributed ledger which addresses the use cases of Create, Read, Update, Delete (CRUD) operations, Linked Data Notifications, and the publish/subscribe observer pattern. Our solution offers mutually distrusting parties support for the traceability and provenance of versioned RDF statements, combining integrity and availability with decentralization.
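The integrity guarantee behind such a ledger rests on hash-chaining: each block commits to the hash of its predecessor, so any retroactive change to a stored RDF statement invalidates the rest of the chain. A toy illustration, assuming nothing about the thesis's actual ledger design (the class, triples and operations below are invented):

```python
import hashlib
import json

class RDFLedger:
    """Toy append-only ledger: each block stores RDF triples plus the hash
    of the previous block, so tampering with history breaks the chain."""

    def __init__(self):
        self.chain = [{"prev": "0" * 64, "op": "genesis", "triples": []}]

    def _hash(self, block):
        serialized = json.dumps(block, sort_keys=True).encode()
        return hashlib.sha256(serialized).hexdigest()

    def append(self, op, triples):
        block = {"prev": self._hash(self.chain[-1]), "op": op, "triples": triples}
        self.chain.append(block)
        return block

    def verify(self):
        # Recompute each predecessor's hash and compare with the stored link.
        return all(
            self.chain[i]["prev"] == self._hash(self.chain[i - 1])
            for i in range(1, len(self.chain))
        )

ledger = RDFLedger()
ledger.append("create", [["dbr:Prague", "rdf:type", "dbo:City"]])
ledger.append("update", [["dbr:Prague", "dbo:population", "1300000"]])
print(ledger.verify())  # True

# Retroactively altering a recorded statement breaks every later link.
ledger.chain[1]["triples"][0] = ["dbr:Prague", "rdf:type", "dbo:Village"]
print(ledger.verify())  # False
```

A real distributed ledger adds consensus among the distrusting parties on which block comes next; the hash chain alone only makes tampering detectable, not impossible.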
Domain-Specific NER Adaptation
Author
Bogoljub Jakovcheski
Year
2019
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Jaroslav Kuchař, Ph.D.
Department
Summary
The popular but still developing Named Entity Recognition (NER) technology has seen significant usage in both the academic and the industrial sphere, despite its coarse-grained usage being more dominant than its fine-grained usage. In this thesis, we use the DBpedia NIF dataset: we process it and prepare new datasets ready for training models with Stanford NER. Experiments with the trained models cover the impact on the results of using a global domain model versus a domain-specific model. In addition, the experiments examine the impact of the number of articles used to train the models. The results show that domain-specific fine-grained models provide better results than domain-specific coarse-grained models and global models in both annotations. Likewise, models trained on a higher number of articles give better results than models trained on a lower number of articles.
Extraction of linguistic information from Wikipedia
Author
Andriy Nazim
Year
2019
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
prof. Dr. Ing. Petr Kroha, CSc.
Department
Summary
DBpedia is a crowd-sourced community effort which aims at extracting information from Wikipedia and providing it in a machine-readable format. Currently, the information contained in DBpedia is primarily derived from semi-structured sources such as Wikipedia infoboxes. However, a vast amount of information is still hidden in the Wikipedia article texts.
In this thesis, I present approaches for extracting linguistic information from DBpedia, based on combining and parsing DBpedia source datasets. The results of the project are datasets of linguistic information: synonyms, homonyms, semantic relationships, and inter-language synonyms. Special attention is paid to cleaning and filtering the produced datasets; their evaluation was also supported by the development of a simple web application for querying the results.
Enrichment of the DBpedia NIF dataset
Author
Pragalbha Lakshmanan
Year
2020
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
RNDr. Jakub Klímek, Ph.D.
Department
Summary
DBpedia is a crowd-sourced community effort which aims at extracting information from Wikipedia articles and providing this information in a machine-readable format. DBpedia provides the extracted information as NIF datasets containing the content of all Wikipedia articles in 128 languages. The aim of the thesis is to enrich this dataset with additional information: the results of sentence splitting, tokenization, part-of-speech tagging and link enhancement for the content of Wikipedia articles in the English, French, German, Spanish and Japanese languages. The implementation performs these NLP tasks, namely sentence splitting, tokenization and part-of-speech tagging, on the pre-processed NIF datasets, and contributes the additional links for the Wikipedia articles back to the DBpedia community. Finally, the runtime of the various NLP tasks is evaluated and the accuracy of the results is checked statistically. Enriching the NIF dataset with the results of these NLP tasks is useful for performing more complicated NLP tasks.
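Two of the named NLP steps, sentence splitting and tokenization, can be illustrated with a deliberately naive regex sketch. The thesis uses proper NLP tooling; the rules below are simplified assumptions for demonstration only:

```python
import re

def split_sentences(text):
    # Naive split: sentence-final punctuation followed by whitespace.
    return re.split(r"(?<=[.!?])\s+", text.strip())

def tokenize(sentence):
    # Words as runs of word characters; punctuation as single tokens.
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "DBpedia extracts data from Wikipedia. It publishes RDF datasets."
for sentence in split_sentences(text):
    print(tokenize(sentence))
```

Real sentence splitters must also handle abbreviations, numbers and language-specific punctuation, which is why per-language NLP pipelines are needed for a multilingual corpus like this one.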
Fact extraction from Wikipedia article texts
Author
Jakub Trhlík
Year
2020
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Petr Špaček, Ph.D.
Department
Summary
Wikipedia is a great source of information, yet its text has not been extracted into a fully machine-readable format. In this thesis, we use the DBpedia NIF dataset, which represents Wikipedia page structure, for targeted fact extraction. The dataset is parsed, enriched with links using several methods, and then prepared for fact extraction. Multiple methods of fact extraction are researched, implemented and tested on selected relations. Experiments describe the accuracy and viability of the selected and implemented methods. The extracted relations are evaluated and submitted for addition to the DBpedia database.
Framework for Extraction of Wikipedia Articles Content
Author
Oleksandr Husiev
Year
2022
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Mgr. Ladislava Smítková Janků, Ph.D.
Department
Summary
This thesis describes the development of a framework for extracting the content of Wikipedia articles for DBpedia, a crowd-sourced community effort.
The main goal of this thesis was to develop a framework for the extraction of Wikipedia articles' content, structure, and annotations. The result is a framework that processes large Wikipedia XML dumps in several popular languages, with the possibility of dynamically adding new languages, and produces clean text output, links, and page structure in the N-Triples format.
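The N-Triples output format mentioned above is a line-oriented RDF serialization: one statement per line, IRIs in angle brackets, literals quoted and escaped. A minimal emitter sketch, with invented example IRIs and no claim about the framework's actual code:

```python
def escape_literal(text):
    """Escape backslashes, quotes and newlines per N-Triples literal rules."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))

def triple(subject_iri, predicate_iri, obj, literal=False):
    """Serialize one RDF statement as a single N-Triples line."""
    o = f'"{escape_literal(obj)}"' if literal else f"<{obj}>"
    return f"<{subject_iri}> <{predicate_iri}> {o} ."

line = triple(
    "http://dbpedia.org/resource/Prague",
    "http://www.w3.org/2000/01/rdf-schema#label",
    "Prague",
    literal=True,
)
print(line)
```

Because each line is an independent statement, N-Triples files of this kind can be produced in a streaming fashion, which suits very large XML dump processing.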
Personalised real estate search application using semantic web technologies
Author
Tomáš Dvořák
Year
2022
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Oldřich Malec
Department
Summary
The COVID-19 pandemic led to increased demand for real estate, mainly for properties in cities rich in civic amenities. Finding the right real estate property without any domain insight is difficult. Creating a real estate portal that offers more than just a base of advertisement listings can require proprietary technologies, which often do not allow storing information for later use and thus result in the state known as vendor lock-in. This thesis proposes an alternative way of creating a scalable web application using open-source technologies powered by a triple-store database, which unlocks the potential of linked data.
Archival tool for the Discord communications platform
Author
David Labský
Year
2022
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Jaroslav Kuchař, Ph.D.
Department
Summary
Discord is a popular instant messaging platform that currently does not allow its users to export all the data they can access. The ability to create backups of online data is important for personal reasons as well as for long-term preservation. The goal of this thesis is to create an open-source tool for archiving Discord chats to which one has gained access. We use a strategy of capturing the network traffic of a headless web browser. This method is broadly applicable to archiving single-page applications other than Discord, which current tools have difficulty working with. The functionality is demonstrated by analyzing data downloaded from a chosen Discord server.
System for Management of Personal Music Libraries using Knowledge Graphs Technology
Author
Ondřej Viskup
Year
2024
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Jaroslav Kuchař, Ph.D.
Department
Summary
With the rising popularity of multimedia streaming services, the challenge of user data storage becomes more prominent. The main objective of this thesis is to design and develop a system that allows users to manage music libraries and playlists outside streaming services and to integrate them with various services that provide additional information through Knowledge Graph technology and Linked Data. The system consists of a back-end server, a front-end client, an RDF store, and a database for user credentials. The thesis outlines the design of the system, including the system architecture and the components of the back-end server, front-end client, and RDF store. It also reviews current solutions and relevant technologies, including the Semantic Web, RDF, SPARQL, and several music ontologies. Finally, the software was validated and evaluated against the set requirements, and future directions were suggested.
Progressive Web Application based on Microservice Architecture for monitoring of Babyboxes
Author
Zbyněk Juřica
Year
2024
Type
Master thesis
Supervisor
Ing. Milan Dojčinovski, Ph.D.
Reviewers
Ing. Jaroslav Kuchař, Ph.D.
Department
Summary
This thesis presents the design and implementation of a monitoring system based on microservice architecture for managing and analyzing data from babyboxes across the Czech Republic. The work involved transitioning from an outdated monolithic architecture to a more flexible and maintainable microservices architecture, aiming to empower staff working as operators and maintenance technicians.
The system includes several microservices handling data ingestion, user management, notifications, and battery analysis. Built using Go, TypeScript, Python, MongoDB, InfluxDB, and RabbitMQ, the backend provides a scalable and modular structure. The front-end, developed with Next.js and React, offers comprehensive data visualization, aggregations, notifications, and analysis features. The application was continuously improved based on user feedback, laying a strong foundation for future enhancements and integrations.