Ing. Milan Dojčinovski, Ph.D.

Publications

A Survey of Guidelines and Best Practices for the Generation, Interlinking, Publication, and Validation of Linguistic Linked Data

Authors
Khan, A.F.; Chiarcos, C.; Declerck, T.; Buono, M.P.; Dojčinovski, M.; Gracia, J.; Oleskeviciene, G.V.; Gifu, D.
Year
2022
Published in
Proceedings of LDL 2022. Paris: ELRA, 2022. p. 69-77. 8. ISBN 979-10-95546-93-1.
Type
Proceedings paper
Abstract
This article discusses a survey carried out within the NexusLinguarum COST Action which aimed to give an overview of existing guidelines (GLs) and best practices (BPs) in linguistic linked data. In particular it focused on four core tasks in the production/publication of linked data: generation, interlinking, publication, and validation. We discuss the importance of GLs and BPs for LLD before describing the survey and its results in full. Finally we offer a number of directions for future work in order to address the findings of the survey.

Accepted Tutorials at The Web Conference 2022

Authors
Tommasini, R.; Roy, S.B.; Wang, X.; Dojčinovski, M.
Year
2022
Published in
Companion Proceedings of the Web Conference 2022. New York: Association for Computing Machinery, 2022. p. 391-399. ISBN 978-1-4503-9130-6.
Type
Proceedings paper
Abstract
This paper summarizes the content of the 20 tutorials that have been given at The Web Conference 2022: 85% of these tutorials are lecture style, and 15% of these are hands on.

Cross-Lingual Link Discovery for Under-Resourced Languages

Authors
Rosner, M.; Ahmadi, S.; Apostol, E.S.; Bosque-Gil, J.; Chiarcos, C.; Dojčinovski, M.; Gkirtzou, K.; Gracia, J.; Gromann, D.; Liebeskind, C.; Oleskeviciene, G.V.; Serasset, G.; Truica, C.O.
Year
2022
Published in
Proceedings of the 13th Language Resources and Evaluation Conference. European Language Resources Association (ELRA), 2022. p. 181-192. ISBN 979-10-95546-72-6.
Type
Proceedings paper
Abstract
In this paper, we provide an overview of current technologies for cross-lingual link discovery, and we discuss challenges, experiences and prospects of their application to under-resourced languages. We first introduce the goals of cross-lingual linking and associated technologies, and in particular, the role that the Linked Data paradigm (Bizer et al., 2011) applied to language data can play in this context. We define under-resourced languages with a specific focus on languages actively used on the internet, i.e., languages with a digitally versatile speaker community, but limited support in terms of language technology. We argue that for languages for which considerable amounts of textual data and (at least) a bilingual word list are available, techniques for cross-lingual linking can be readily applied, and that these enable the implementation of downstream applications for under-resourced languages via the localisation and adaptation of existing technologies and resources.

The DBpedia Technology Tutorial

Authors
Dojčinovski, M.; Forberg, J.; Frey, J.; Hofer, M.; Streitmatter, D.; Yankov, K.
Year
2022
Published in
LDK Workshops and Tutorials 2021. Aachen: CEUR Workshop Proceedings, 2022. vol. 3064. ISSN 1613-0073.
Type
Proceedings paper
Abstract
DBpedia (https://www.dbpedia.org) is a crowd-sourced community effort which aims at extracting and publishing structured information from various Wikimedia projects. This structured information resembles an open knowledge graph, the DBpedia Knowledge Graph, which is publicly available for everyone on the Web. The DBpedia Knowledge Graph has been under development for many years and is being improved to this day. In this tutorial, participants gained general information on the DBpedia Knowledge Graph and the DBpedia community. The tutorial also provided information on the complete DBpedia Knowledge Graph lifecycle, i.e. from extraction and modelling to publishing and maintenance of the DBpedia KG. A particular focus was put on the DBpedia infrastructure: DBpedia’s Databus publishing platform and the associated DBpedia services, namely DBpedia Spotlight, DBpedia Lookup, the DBpedia service endpoints and DBpedia Archivo.

EUROLAN 2021: Introduction to Linked Data for Linguistics Online Training School

Authors
Dojčinovski, M.; Bosque Gil, J.; Gracia, J.; Stanković, R.
Year
2021
Published in
Infotheca - Journal for Digital Humanities. 2021, 21(1), 113-120. ISSN 2217-9461.
Type
Journal article
Abstract
The first training school organized by the NexusLinguarum COST Action was held on February 8-12, 2021 and was aimed at students, academics, and practitioners wishing to learn the basics of Linguistic Data Science. During the training school, the participants were introduced to a wide range of topics: from Semantic Web, RDF and ontologies, to modeling and querying linguistic data with state-of-the-art ontology models and tools. The training school was organized under the umbrella of the EUROLAN series of summer schools and was hosted virtually (online) by several institutions: the Romanian Academy, the Research Institute for Artificial Intelligence in Bucharest and the Institute of Computer Science in Iași, as well as the “Alexandru Ioan Cuza” University of Iași, Romania. The training school was attended by 82 participants.

Towards a Systematic Approach to Sync Factual Data across Wikipedia, Wikidata and External Data Sources

Authors
Hellmann, S.; Frey, J.; Hofer, M.; Dojčinovski, M.; Węcel, K.; Lewoniewski, W.
Year
2021
Published in
Proceedings of the Conference on Digital Curation Technologies (Qurator 2021). Aachen: CEUR Workshop Proceedings, 2021. ISSN 1613-0073.
Type
Proceedings paper
Abstract
This paper addresses one of the largest and most complex data curation workflows in existence: Wikipedia and Wikidata, with a high number of users and curators adding factual information from external sources via a non-systematic Wiki workflow to Wikipedia’s infoboxes and Wikidata items. We present high-level analyses of the current state, the challenges and limitations in this workflow and supplement it with a quantitative and semantic analysis of the resulting data spaces by deploying DBpedia’s integration and extraction capabilities. Based on an analysis of millions of references from Wikipedia infoboxes in different languages, we can find the most important sources which can be used to enrich other knowledge bases with information of better quality. An initial tool is presented, the GlobalFactSync browser, as a prototype to discuss further measures to develop a more systematic approach for data curation in the WikiVerse.

The New DBpedia Release Cycle: Increasing Agility and Efficiency in Knowledge Extraction Workflows

Authors
Hofer, M.; Hellmann, S.; Dojčinovski, M.; Frey, J.
Year
2020
Published in
Semantic Systems. In the Era of Knowledge Graphs. Cham: Springer International Publishing, 2020. p. 1-18. ISSN 0302-9743. ISBN 978-3-030-59832-7.
Type
Proceedings paper
Abstract
Since its inception in 2007, DBpedia has been constantly releasing open data in RDF, extracted from various Wikimedia projects using a complex software system called the DBpedia Information Extraction Framework (DIEF). For the past 12 years, the software received a plethora of extensions by the community, which positively affected the size and data quality. Due to the increase in size and complexity, the release process was facing huge delays (from 12 to 17 months cycle), thus impacting the agility of the development. In this paper, we describe the new DBpedia release cycle including our innovative release workflow, which allows development teams (in particular those who publish large, open data) to implement agile, cost-efficient processes and scale up productivity. The DBpedia release workflow has been re-engineered, its new primary focus is on productivity and agility, to address the challenges of size and complexity. At the same time, quality is assured by implementing a comprehensive testing methodology. We run an experimental evaluation and argue that the implemented measures increase agility and allow for cost-effective quality-control and debugging and thus achieve a higher level of maintainability. As a result, DBpedia now publishes regular (i.e. monthly) releases with over 21 billion triples with minimal publishing effort.

Linked Web APIs Dataset

Year
2018
Published in
Semantic Web. 2018, 9(4), 381-391. ISSN 1570-0844.
Type
Journal article
Abstract
Web APIs enjoy a significant increase in popularity and usage in the last decade. They have become the core technology for exposing functionalities and data. Nevertheless, due to the lack of semantic Web API descriptions their discovery, sharing, integration, and assessment of their quality and consumption is limited. In this paper, we present the Linked Web APIs dataset, an RDF dataset with semantic descriptions about Web APIs. It provides semantic descriptions for 11,339 Web APIs, 7,415 mashups and 7,717 developer profiles, which make it the largest available dataset from the Web APIs domain. The dataset captures the provenance, temporal, technical, functional, and non-functional aspects. In addition, we describe the Linked Web APIs Ontology, a minimal model which builds on top of several well-known ontologies. The dataset has been interlinked and published according to the Linked Data principles. Finally, we describe several possible usage scenarios for the dataset and show its potential.

Chainable and Extendable Knowledge Integration Web Services

Authors
Sasaki, F.; Dojčinovski, M.; Nehring, J.
Year
2017
Published in
Knowledge Graphs and Language Technology. Springer, Cham, 2017. p. 89-101. ISSN 0302-9743. ISBN 978-3-319-68722-3.
Type
Proceedings paper
Abstract
This paper introduces the current state of the FREME framework. The paper puts FREME into the context of linguistic linked data and related approaches of multilingual and semantic processing. In addition, we focus on two specific aspects of FREME: the FREME NER e-Service, and chaining of FREME e-Services. We believe that the flexible and distributed combination of e-Services bears a potential for their mutual improvement. The FREME framework is an open source software available for free download (https://github.com/freme-project/).

Crowdsourced Corpus with Entity Salience Annotations

Authors
Dojčinovski, M.; Reddy, D.; Kliegr, T.; Vitvar, T.; Sack, H.
Year
2016
Published in
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Paris: European Language Resources Association (ELRA), 2016. p. 3307-3311. ISBN 978-2-9517408-9-1.
Type
Proceedings paper
Abstract
In this paper, we present a crowdsourced dataset which adds entity salience (importance) annotations to the Reuters-128 dataset, which is a subset of Reuters-21578. The dataset is distributed under a free license and published in the NLP Interchange Format, which fosters interoperability and re-use. We show the potential of the dataset on the task of learning an entity salience classifier and report on the results from several experiments.

DBpedia Abstracts: A Large-Scale, Open, Multilingual NLP Training Corpus

Authors
Brümmer, M.; Dojčinovski, M.; Hellmann, S.
Year
2016
Published in
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Paris: European Language Resources Association (ELRA), 2016. p. 3339-3343. ISBN 978-2-9517408-9-1.
Type
Proceedings paper
Abstract
The ever increasing importance of machine learning in Natural Language Processing is accompanied by an equally increasing need in large-scale training and evaluation corpora. Due to its size, its openness and relative quality, the Wikipedia has already been a source of such data, but on a limited scale. This paper introduces the DBpedia Abstract Corpus, a large-scale, open corpus of annotated Wikipedia texts in six languages, featuring over 11 million texts and over 97 million entity links. The properties of the Wikipedia texts are being described, as well as the corpus creation process, its format and interesting use-cases, like Named Entity Linking training and evaluation.

DBpedia Links: The Hub of Links for the Web of Data

Authors
Dojčinovski, M.; Kontokostas, D.; Rößling, R.; Knuth, M.; Hellmann, S.
Year
2016
Published in
Joint Proceedings of the Posters and Demos Track of the 12th International Conference on Semantic Systems - SEMANTiCS2016 and the 1st International Workshop on Semantic Change & Evolving Semantics (SuCCESS'16) co-located with the 12th International Conference on Semantic Systems (SEMANTiCS 2016). Aachen: CEUR Workshop Proceedings, 2016. 1695. ISSN 1613-0073.
Type
Proceedings paper
Abstract
Links are the key enabler for retrieval of related information on the Web of Data. Currently, DBpedia is one of the central interlinking hubs in the Linked Open Data (LOD) cloud. With over 28 million described and localized things, it is one of the largest open datasets. With the increasing number of linked datasets, there is a need for proper maintenance of these links. In this paper, we describe the DBpedia Links repository, which maintains linksets between DBpedia and other LOD datasets. We describe the system for maintenance, update and quality assurance of the linksets.

Exploiting Temporal Dimension in Tensor-Based Link Prediction

Year
2016
Published in
Web Information Systems and Technologies. Cham: Springer International Publishing, 2016. pp. 211-231. Lecture Notes in Business Information Processing. ISSN 1865-1348. ISBN 978-3-319-30995-8.
Type
Invited or awarded proceedings paper
Abstract
In recent years, there has been significant interest in link prediction, an important task for graph-based data structures. Although there exist many approaches based on graph theory and factorizations, there is still a lack of methods that can work with multiple types of links and temporal information. The creation time of a link is an important aspect: it reflects the age and credibility of the information. In this paper, we introduce a method that predicts missing links in RDF datasets. We model multiple relations of RDF as a tensor that also incorporates the creation time of links as a key component. We evaluate the proposed approach on real-world datasets: an RDF representation of the ProgrammableWeb directory and a subset of DBpedia focused on movies. The results show that the proposed method outperforms other link prediction approaches.

FREME: Multilingual Semantic Enrichment with Linked Data and Language Technologies

Authors
Dojčinovski, M.; Sasaki, F.; Gornostaja, T.; Hellmann, S.; Mannens, E.; Salliau, F.; Osella, M.; Ritchie, P.; Stoitsis, G.; Koidl, K.; Ackermann, M.; Chakraborty, N.
Year
2016
Published in
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Paris: European Language Resources Association (ELRA), 2016. p. 4180-4183. ISBN 978-2-9517408-9-1.
Type
Proceedings paper
Abstract
In the recent years, Linked Data and Language Technology solutions gained popularity. Nevertheless, their coupling in real-world business is limited due to several issues. Existing products and services are developed for a particular domain, can be used only in combination with already integrated datasets or their language coverage is limited. In this paper, we present an innovative solution FREME - an open framework of e-Services for multilingual and semantic enrichment of digital content. The framework integrates six interoperable e-Services. We describe the core features of each e-Service and illustrate their usage in the context of four business cases: i) authoring and publishing; ii) translation and localisation; iii) cross-lingual access to data; and iv) personalised Web content recommendations. Business cases drive the design and development of the framework.

Introducing FREME: Deploying Linguistic Linked Data

Authors
Sasaki, F.; Gornostay, T.; Dojčinovski, M.; Osella, M.; Mannens, E.; Stoitsis, G.; Ritchie, P.; Koidl, K.
Year
2015
Published in
Proceedings of the Fourth Workshop on the Multilingual Semantic Web (MSW4) co-located with 12th Extended Semantic Web Conference (ESWC 2015). Aachen: CEUR Workshop Proceedings, 2015. p. 59-66. ISSN 1613-0073.
Type
Proceedings paper
Abstract
This paper introduces the FREME project, a new Horizon 2020 innovation action. It aims at building an open framework of e-Services for multilingual and semantic enrichment of digital content, based on a reusable set of open Application Programme Interfaces and Graphical User Interfaces to FREME enrichment services. In addition, the paper discusses how the project deploys Linguistic Linked Data (LLD), especially existing LLD resources, LLD best practices and the LLD reference architecture.

Language Resources and Linked Data: A Practical Perspective

Authors
Dojčinovski, M.; Gracia, J.; Vila-Suero, D.; McCrae, J.P.; Flati, T.; Baron, C.
Year
2015
Published in
Knowledge Engineering and Knowledge Management. Cham: Springer International Publishing AG, 2015. pp. 3-17. Lecture Notes in Artificial Intelligence. ISSN 0302-9743. ISBN 978-3-319-17965-0.
Type
Proceedings paper
Abstract
Recently, experts and practitioners in language resources have started recognizing the benefits of the linked data (LD) paradigm for the representation and exploitation of linguistic data on the Web. The adoption of the LD principles is leading to an emerging ecosystem of multilingual open resources that conform to the Linguistic Linked Open Data Cloud, in which datasets of linguistic data are interconnected and represented following common vocabularies, which facilitates linguistic information discovery, integration and access. In order to contribute to this initiative, this paper summarizes several key aspects of the representation of linguistic information as linked data from a practical perspective. The main goal of this document is to provide the basic ideas and tools for migrating language resources (lexicons, corpora, etc.) as LD on the Web and to develop some useful NLP tasks with them (e.g., word sense disambiguation). Such material was the basis of a tutorial imparted at the EKAW’14 conference, which is also reported in the paper.

Multimodal Fusion: Combining Visual and Textual Cues for Concept Detection in Video

Authors
Galanopoulos, D.; Dojčinovski, M.; Chandramouli, K.; Kliegr, T.; Mezaris, V.
Year
2015
Published in
Multimedia Data Mining and Analytics. Cham: Springer International Publishing AG, 2015. p. 295-310. ISBN 978-3-319-14997-4.
Type
Book chapter
Abstract
Visual concept detection is one of the most active research areas in multimedia analysis. The goal of visual concept detection is to assign to each elementary temporal segment of a video, a confidence score for each target concept (e.g. forest, ocean, sky, etc.). The establishment of such associations between the video content and the concept labels is a key step toward semantics-based indexing, retrieval, and summarization of videos, as well as deeper analysis (e.g., video event detection). Due to its significance for the multimedia analysis community, concept detection is the topic of international benchmarking activities such as TRECVID. While video is typically a multi-modal signal composed of visual content, speech, audio, and possibly also subtitles, most research has so far focused on exploiting the visual modality. In this chapter we introduce fusion and text analysis techniques for harnessing automatic speech recognition (ASR) transcripts or subtitles for improving the results of visual concept detection. Since the emphasis is on late fusion, the introduced algorithms for handling text and the fusion can be used in conjunction with standard algorithms for visual concept detection. We test our techniques on the TRECVID 2012 Semantic indexing (SIN) task dataset, which is made of more than 800 h of heterogeneous videos collected from Internet archives.

Personalised, Serendipitous and Diverse Linked Data Resource Recommendations

Year
2015
Published in
Knowledge Engineering and Knowledge Management. Cham: Springer International Publishing AG, 2015. pp. 106-110. Lecture Notes in Artificial Intelligence. ISSN 0302-9743. ISBN 978-3-319-17965-0.
Type
Proceedings paper
Abstract
Due to the huge and diverse amount of information, the actual access to a piece of information in the Linked Open Data (LOD) cloud still demands a significant amount of effort. To overcome this problem, a number of Linked Data based recommender systems have been developed. However, they have been primarily developed for a particular domain, they require human intervention in the dataset pre-processing step, and they can hardly be adapted to new datasets. In this paper, we present our method for personalised access to Linked Data, in particular focusing on its applicability and its salient features.

Time-aware Link Prediction in RDF Graphs

Year
2015
Published in
WEBIST 2015 - Proceedings of the 11th International Conference on Web Information Systems and Technologies. Madeira: SciTePress, 2015. ISBN 978-989-758-106-9.
Type
Proceedings paper
Abstract
When a link is not explicitly present in an RDF dataset, it does not mean that the link could not exist in reality. Link prediction methods try to overcome this problem by finding new links in the dataset with the support of background knowledge about the already existing links in the dataset. In dynamic environments that change often and evolve over time, link prediction methods should also take into account the temporal aspects of data. In this paper, we present a novel time-aware link prediction method. We model RDF data as a tensor and take into account the time when the RDF data was created. We use an ageing function to model the retention of information over time: it lowers the significance of older information and promotes more recent information. Our evaluation shows that the proposed method improves the quality of predictions when compared with methods that do not consider the time information.

Knowledge Base Creation, Enrichment and Repair

Authors
Hellmann, S.; Bryl, V.; Bühmann, L.; Dojčinovski, M.; Kontokostas, D.; Lehmann, L.; Milošević, U.; Petrovski, P.; Svátek, V.; Stanojević, M.; Zamazal, O.
Year
2014
Published in
Linked Open Data -- Creating Knowledge Out of Interlinked Data. Cham: Springer International Publishing AG, 2014. p. 45-69. 8661. ISSN 0302-9743. ISBN 978-3-319-09845-6.
Type
Book chapter
Abstract
This chapter focuses on data transformation to RDF and Linked Data and furthermore on the improvement of existing or extracted data especially with respect to schema enrichment and ontology repair. Tasks concerning the triplification of data are mainly grounded on existing and well-proven techniques and were refined during the lifetime of the LOD2 project and integrated into the LOD2 Stack. Triplification of legacy data, i.e. data not yet in RDF, represents the entry point for legacy systems to participate in the LOD cloud. While existing systems are often very useful and successful, there are notable differences between the ways knowledge bases and Wikis or databases are created and used. One of the key differences in content is in the importance and use of schematic information in knowledge bases. This information is usually absent in the source system and therefore also in many LOD knowledge bases. However, schema information is needed for consistency checking and finding modelling problems. We will present a combination of enrichment and repair steps to tackle this problem based on previous research in machine learning and knowledge representation. Overall, the Chapter describes how to enable tool-supported creation and publishing of RDF as Linked Data (Sect. 1) and how to increase the quality and value of such large knowledge bases when published on the Web (Sect. 2).

Linked Hypernyms Dataset - Generation framework and Use Cases

Authors
Kliegr, T.; Zeman, V.; Dojčinovski, M.
Year
2014
Published in
3rd Workshop on Linked Data in Linguistics: Multilingual Knowledge Resources and Natural Language Processing. Paris: European Language Resources Association (ELRA), 2014. pp. 82-87. ISBN 978-2-9517408-8-4.
Type
Proceedings paper
Abstract
The Linked Hypernyms Dataset (LHD) provides entities described by Dutch, English and German Wikipedia articles with types taken from the DBpedia namespace. LHD contains 2.8 million entity-type assignments. Accuracy evaluation is provided for all languages. These types are generated based on a one-word hypernym extracted from the free text of Wikipedia articles; the dataset is thus to a large extent complementary to DBpedia 3.8 and YAGO 2s ontologies. LHD is available at http://ner.vse.cz/datasets/linkedhypernyms.

Personalised Access to Linked Data

Year
2014
Published in
Knowledge Engineering and Knowledge Management. Cham: Springer International Publishing AG, 2014. p. 121-136. Lecture Notes in Artificial Intelligence. ISSN 0302-9743. ISBN 978-3-319-13703-2.
Type
Proceedings paper
Abstract
Recent efforts in the Semantic Web community have been primarily focused on developing technical infrastructure and technologies for efficient Linked Data acquisition, publishing and interlinking. Nevertheless, due to the huge and diverse amount of information, the actual access to a piece of information in the LOD cloud still demands significant amount of effort. In this paper, we present a novel configurable method for personalised access to Linked Data. The method recommends resources of interest from users with similar tastes. To measure the similarity between the users we introduce a novel resource semantic similarity metric, which takes into account the commonalities and informativeness of the resources. We validate and evaluate the method on a real-world dataset from the Web services domain. The results show that our method outperforms the other baseline methods in terms of accuracy, serendipity and diversity.

Datasets, GATE Evaluation Framework for Benchmarking Wikipedia-Based NER Systems

Authors
Dojčinovski, M.; Kliegr, T.
Year
2013
Published in
Proceedings of the NLP & DBpedia workshop. Tilburg: CEUR Workshop Proceedings, 2013. ISSN 1613-0073.
Type
Proceedings paper
Abstract
We present a wikifier evaluation framework consisting of software support and two datasets (News and Tweets), which were derived from datasets previously published at WEKEX 2011 and MSM Challenge 2013. Entities recognized in the original datasets were enriched with new annotations – a link to Wikipedia and the most specific type from the DBpedia Ontology. The annotations were created by two annotators and a judge. The datasets are supplemented by plugins for their import to the GATE NLP framework and a DBpedia Ontology-aware plugin for aligning annotations created by a wikifier with the ground truth.

Entityclassifier.eu: Real-Time Classification of Entities in Text with Wikipedia

Authors
Dojčinovski, M.; Kliegr, T.
Year
2013
Published in
Machine Learning and Knowledge Discovery in Databases. Heidelberg: Springer-Verlag, GmbH, 2013. pp. 654-658. ISSN 0302-9743. ISBN 978-3-642-40993-6.
Type
Proceedings paper
Abstract
Targeted Hypernym Discovery (THD) performs unsupervised classification of entities appearing in text. A hypernym mined from the free text of the Wikipedia article describing the entity is used as a class. The type as well as the entity are cross-linked with their representation in DBpedia, and enriched with additional types from the DBpedia and YAGO knowledge bases, providing semantic web interoperability. The system, available as a web application and web service at entityclassifier.eu, currently supports English, German and Dutch.

Wikipedia Search as Effective Entity Linking Algorithm

Authors
Dojčinovski, M.; Kliegr, T.; Lašek, I.; Zamazal, O.
Year
2013
Published in
TAC 2013 Proceedings Papers. 2013.
Type
Proceedings paper
Abstract
This paper reports on the participation of the LKD team in the English entity linking task at the TAC KBP 2013. We evaluated various modifications and combinations of the Most-Frequent-Sense (MFS) based linking, the Entity Co-occurrence based linking (ECC), and the Explicit Semantic Analysis (ESA) based linking. We employed two of our Wikipedia-based NER systems, Entityclassifier.eu and SemiTags. Additionally, two Lucene-based entity linking systems were developed. For the competition we submitted 9 submissions in total, of which 5 used the textual context of the entities and 4 did not. Surprisingly, the MFS method based on the Wikipedia Search proved to be the most effective approach – it achieved the best B3+ F1 score of 0.555 among all our submissions and a high B3+ F1 score of 0.677 for Geo-Political (GPE) entities. In addition, the ESA based method achieved the best B3+ F1 of 0.483 for Organization (ORG) entities.

Personalised Graph-Based Selection of Web APIs

Year
2012
Published in
The Semantic Web -- ISWC 2012. Heidelberg: Springer-Verlag, GmbH, 2012. p. 34-48. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-642-35175-4.
Type
Proceedings paper
Abstract
Modelling and understanding various contexts of users is important to enable personalised selection of Web APIs in directories such as Programmable Web. Currently, relationships between users and Web APIs are not clearly understood and utilized by existing selection approaches. In this paper, we present a semantic model of a Web API directory graph that captures relationships such as Web APIs, mashups, developers, and categories. We describe a novel configurable graph-based method for selection of Web APIs with personalised and temporal aspects. The method allows users to get more control over their preferences and recommended Web APIs while they can exploit information about their social links and preferences. We evaluate the method on a real-world dataset from ProgrammableWeb.com, and show that it provides more contextualised results than currently available popularity-based rankings.

Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia

Authors
Dojčinovski, M.; Kliegr, T.
Year
2012
Published in
WIKT 2012: 7th Workshop on Intelligent and Knowledge Oriented Technologies. Slovenská technická univerzita v Bratislave, 2012. pp. 41-44. ISBN 978-80-227-3812-5.
Type
Proceedings paper
Abstract
In this paper we present a system for entity recognition and classification. Entity candidates are recognized in the input text with a JAPE grammar. A hypernym for the entity is discovered from the Wikipedia article describing the entity, also with a JAPE grammar; this hypernym is the classification result. Both the entity and its hypernym are cross-linked with their representation in DBpedia and the result is published in the machine-readable NIF format.