Web Intelligence (WIRG)

The Web Intelligence research group explores and develops novel methods and tools for advanced search for information and services on the World Wide Web. More specifically, we develop new algorithms for web data mining and social media analysis, and we design and implement new types of middleware infrastructures and technologies, web services, and cloud computing.

Publications

EasyMiner.eu: Web Framework for Interpretable Machine Learning based on Rules and Frequent Itemsets

Authors
Vojíř, S.; Zeman, V.; Kuchař, J.; Kliegr, T.
Year
2018
Published
Knowledge-Based Systems. 2018, 150 111-115. ISSN 0950-7051.
Type
Article
Annotation
EasyMiner (http://www.easyminer.eu) is a web-based system for interpretable machine learning based on frequent itemsets. The system currently offers association rule learning (Apriori, FP-Growth) and classification (CBA). For both tasks, EasyMiner offers a visual interface designed for interactivity, allowing the user to define a constraining pattern for the mining task. The CBA algorithm can also be used to prune the rule set, thus addressing the common problem of “too many rules” in the output, and the implementation supports automatic tuning of confidence and support thresholds. The development version additionally supports anomaly detection (FPI and its variations) and linked data mining (AMIE+). EasyMiner is dockerized, and some of its components are available as open source R packages.
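The support and confidence thresholds mentioned in the annotation can be illustrated with a minimal sketch. This is not EasyMiner code: the function names and the toy transactions below are hypothetical, and the sketch only shows the standard definitions of the two measures for a rule "antecedent -> consequent".

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0.0:
        return 0.0
    return support(antecedent | consequent, transactions) / base


def passes_thresholds(antecedent: FrozenSet[str], consequent: FrozenSet[str],
                      transactions: List[Transaction],
                      min_support: float, min_confidence: float) -> bool:
    """Keep a rule only if it meets both user-defined thresholds."""
    rule_support = support(antecedent | consequent, transactions)
    rule_confidence = confidence(antecedent, consequent, transactions)
    return rule_support >= min_support and rule_confidence >= min_confidence


# Hypothetical toy data, for illustration only.
TRANSACTIONS: List[Transaction] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
```

Raising either threshold shrinks the rule set, which is why tuning them automatically matters when the output contains too many rules.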

InBeat: JavaScript recommender system supporting sensor input and linked data

Authors
Kuchař, J.; Kliegr, T.
Year
2017
Published
Knowledge-Based Systems. 2017, 135 40-43. ISSN 0950-7051.
Type
Article
Annotation
Interest Beat (inbeat.eu) is an open source recommender framework that fulfills some of the demands raised by emerging applications that infer ratings from sensor input or use the linked open data cloud for feature expansion. As a recommender algorithm, InBeat uses association rules, which make it possible to explain why a specific recommendation was made. Thanks to its modular architecture, other algorithms can be easily plugged in. InBeat has a pure JavaScript version, which allows processing to be confined to a client-side device. There is also a performance-optimized server-side bundle, which successfully participated in two recent recommender competitions involving large volumes of streaming data. InBeat works on a number of platforms and is also available for Docker.

All publications

GAIN: web service for user tracking and preference learning - a smart TV use case

Authors
Kuchař, J.; Kliegr, T.
Year
2013
Published
RecSys '13 Proceedings of the 7th ACM conference on Recommender systems. New York: ACM, 2013. pp. 467-468. ISBN 978-1-4503-2409-0.
Type
Proceedings paper
Annotation
GAIN (inbeat.eu) is a web application and service for capturing and preprocessing user interactions with semantically described content. GAIN outputs a set of instances in tabular form suitable for further processing with generic machine-learning algorithms. GAIN is demoed as a component of a "SMART-TV" recommender system. Content is automatically described with DBpedia types using a Named Entity Recognition (NER) system. Interest is determined based on explicit user actions and on user attention computed by 3D head pose estimation. Preference rules are learnt with an association rule mining algorithm. These rules can be deployed, for example, to a business rules system that acts as a recommender.

Bag-of-Entities text representation for client-side (video) recommender systems

Authors
Kuchař, J.; Kliegr, T.
Year
2014
Published
RecSysTV 2014. 2014.
Type
Proceedings paper
Annotation
Client-side execution of a recommender system requires enrichment of the content delivered to the user with a list of potentially related content. A possible bottleneck for client-side recommendation is the data volume entailed by transferring the feature set describing each content item to the client, and the computational resources needed to process this feature set. This paper investigates whether representing textual content (e.g. of videos) as a Bag of Entities (BoE) vector generated by a wikifier can yield a classifier with the same accuracy at a smaller size than the standard bag-of-words (BoW) approach. Experimental evaluation performed on the Reuters-21578 text categorization collection shows that there is a small improvement for small term vector sizes.

Personalised Graph-Based Selection of Web APIs

Authors
Year
2012
Published
The Semantic Web -- ISWC 2012. Heidelberg: Springer-Verlag, GmbH, 2012. p. 34-48. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-642-35175-4.
Type
Proceedings paper
Annotation
Modelling and understanding various contexts of users is important to enable personalised selection of Web APIs in directories such as Programmable Web. Currently, relationships between users and Web APIs are not clearly understood or utilized by existing selection approaches. In this paper, we present a semantic model of a Web API directory graph that captures relationships among Web APIs, mashups, developers, and categories. We describe a novel configurable graph-based method for selection of Web APIs with personalised and temporal aspects. The method gives users more control over their preferences and recommended Web APIs, while letting them exploit information about their social links and preferences. We evaluate the method on a real-world dataset from ProgrammableWeb.com and show that it provides more contextualised results than currently available popularity-based rankings.

Exploiting Temporal Dimension in Tensor-Based Link Prediction

Year
2016
Published
Web Information Systems and Technologies. Cham: Springer International Publishing, 2016. pp. 211-231. Lecture Notes in Business Information Processing. ISSN 1865-1348. ISBN 978-3-319-30995-8.
Type
Invited/Awarded proceedings paper
Annotation
In recent years, there has been significant interest in link prediction, an important task for graph-based data structures. Although many approaches based on graph theory and factorization exist, there is still a lack of methods that can work with multiple types of links and temporal information. The creation time of a link is an important aspect: it reflects the age and credibility of the information. In this paper, we introduce a method that predicts missing links in RDF datasets. We model multiple relations of RDF as a tensor that also incorporates the creation time of links as a key component. We evaluate the proposed approach on real-world datasets: an RDF representation of the ProgrammableWeb directory and a subset of DBpedia focused on movies. The results show that the proposed method outperforms other link prediction approaches.
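One way to picture a tensor that combines multiple relations with link creation time is the sketch below. It is an illustration under stated assumptions, not the method from the paper: the exponential half-life weighting, the function name build_temporal_tensor, and the example triples are all hypothetical. Each relation gets its own entity-by-entity slice, and each link is stored with a weight that decreases with its age.

```python
from typing import List, Tuple

# (subject, predicate, object, age of the link in days)
TimedTriple = Tuple[str, str, str, float]


def build_temporal_tensor(triples: List[TimedTriple], half_life_days: float):
    """Build a relation x entity x entity tensor as nested lists.

    Assumption for illustration: a link's weight halves every
    `half_life_days`, so newer links count more than older ones.
    """
    entities = sorted({s for s, _, _, _ in triples} | {o for _, _, o, _ in triples})
    relations = sorted({p for _, p, _, _ in triples})
    entity_index = {e: i for i, e in enumerate(entities)}
    relation_index = {r: k for k, r in enumerate(relations)}

    n = len(entities)
    tensor = [[[0.0] * n for _ in range(n)] for _ in relations]

    for s, p, o, age_days in triples:
        weight = 0.5 ** (age_days / half_life_days)
        k, i, j = relation_index[p], entity_index[s], entity_index[o]
        # If the same link appears more than once, keep the freshest weight.
        tensor[k][i][j] = max(tensor[k][i][j], weight)

    return tensor, entities, relations


# Hypothetical example data.
EXAMPLE_TRIPLES: List[TimedTriple] = [
    ("alice", "uses", "maps_api", 0.0),
    ("bob", "uses", "maps_api", 30.0),
    ("alice", "follows", "bob", 60.0),
]
```

A factorization of such a tensor could then score missing (subject, predicate, object) cells as candidate links; that step is omitted here.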

Contact person

doc. Ing. Tomáš Vitvar, Ph.D.

Where to find us

Web Intelligence Research Group
Department of Software Engineering
Faculty of Information Technology
Czech Technical University in Prague

Room TH:A-922 (Building A, 9th floor)
Thákurova 7
Prague 6 – Dejvice
160 00

The person responsible for the content of this page: doc. Ing. Štěpán Starosta, Ph.D.