Ing. Jan Fesl, Ph.D.

Publications

A user DNS fingerprint dataset

Authors
Zápotocký, J.; Fesl, J.; Fiala, J.
Year
2024
Published in
Data in Brief. 2024, 54. ISSN 2352-3409.
Type
Article
Abstract
A user DNS fingerprint makes it possible to identify a specific network user without knowing their IP address. This method is useful, for example, when examining the behavior of a monitored network user in more depth. In contrast to other studies, this work introduces a dataset for user identification based only on the user's DNS fingerprint, which is created from previously sent DNS queries. We created a large dataset from the real network traffic of a metropolitan Internet service provider. The dataset was created from 2.3 billion DNS queries representing 6.2 million different domain names. The data collection took place over three months, from 12/2023 to 02/2024. The dataset describes user activity in detail through overall daily activity statistics and detailed 24 h activity statistics. Each dataset record contains a list of 1137 classification attributes. A unique feature of this dataset is the classification of user activity based on the categories of content accessed by a user. The new dataset can be used to create machine learning models that identify a specific user without direct knowledge of their IP address or additional network location information. The dataset can also serve as a reference dataset for the creation of DNS fingerprints of users.
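As a rough illustration of the kind of per-user features the abstract mentions (24 h activity statistics and content categories), the following minimal sketch builds a toy fingerprint from a list of DNS queries. It is not the authors' pipeline: the function name `dns_fingerprint`, the `CATEGORY_OF` map, and the example domains are all hypothetical, and the real dataset uses 1137 attributes whose definitions are not reproduced here.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical domain-to-category map (assumption for illustration only).
CATEGORY_OF = {
    "news.example": "news",
    "video.example": "streaming",
    "mail.example": "email",
}


def dns_fingerprint(queries):
    """Build a toy fingerprint from (unix_timestamp, domain) pairs.

    Returns a 24-element list with the share of queries per hour (UTC)
    and a dict with the share of queries per content category.
    """
    hourly = [0] * 24
    categories = Counter()
    for ts, domain in queries:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        hourly[hour] += 1
        # Domains missing from the map fall into a catch-all category.
        categories[CATEGORY_OF.get(domain, "other")] += 1
    total = len(queries)
    if total == 0:
        return hourly, {}
    hourly_share = [count / total for count in hourly]
    category_share = {name: count / total for name, count in categories.items()}
    return hourly_share, category_share


# Made-up queries: one at 00:00 UTC, two shortly after 01:00, one at 02:00.
queries = [
    (0, "news.example"),
    (3600, "video.example"),
    (3700, "video.example"),
    (7200, "unknown.example"),
]
hourly_share, category_share = dns_fingerprint(queries)
```

Two such feature vectors computed for different time periods could then be compared or fed to a classifier; which features and models work best is exactly what the dataset is meant to help evaluate.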

Data center network monitoring framework

Authors
Sedlák, D.; Polák, M.; Fesl, J.; Tvrdík, P.
Year
2024
Published in
Proceedings of 2024 IEEE International Conference on Cloud Engineering (IC2E). Piscataway: Institute of Electrical and Electronics Engineers, 2024. p. 256-257. ISBN 979-8-3315-2869-0.
Type
Conference paper
Abstract
A key responsibility of data center (DC) operators is monitoring, which involves analyzing logs, aggregating traffic statistics, and assessing hardware utilization. Monitoring is vital for troubleshooting and predicting traffic patterns. This paper introduces a new, highly scalable, easily configurable framework that uses sFlow technology to collect and process statistics on DC network communication, such as connection durations and detected traffic anomalies. We tested the proposed framework in production, thanks to a partnership with a privately owned DC that provides Infrastructure as a Service worldwide.

Data center TCP dataset

Authors
Fesl, J.; Čapková, T.; Konopa, M.
Year
2024
Published in
Data in Brief. 2024, 54. ISSN 2352-3409.
Type
Article
Abstract
In this paper, we introduce a unique dataset that covers thousands of network flow measurements carried out over TCP in a data center environment. TCP is widely used for reliable data transfers and has many different versions. The various versions of TCP differ in how they deal with link congestion through the congestion control algorithm (CCA). Our dataset represents a unique, comprehensive comparison of the 17 currently used versions of TCP with different CCAs. Each TCP flow was measured exactly 50 times to eliminate measurement instability. The comparison of the various TCP versions is based on 18 quantitative attributes representing the parameters of a TCP transmission. Our dataset is suitable for testing and comparing different versions of TCP, creating new CCAs based on machine learning models, or creating and testing machine learning models that identify and optimize existing versions of TCP.

Real Data Center Network Traffic Dataset and Analysis

Authors
Polák, M.; Sedlák, D.; Fesl, J.; Tvrdík, P.
Year
2024
Published in
Proceedings of 2024 IEEE 13th International Conference on Cloud Networking (CloudNet). Piscataway: Institute of Electrical and Electronics Engineers, 2024. p. 1-6. ISSN 2771-5663. ISBN 979-8-3503-7656-2.
Type
Conference paper
Abstract
Data centers form the backbone of the modern Internet. However, the characteristics of their internal network traffic are usually kept as proprietary know-how. We present an analysis of internal network traffic based on data from one of the world's major data centers. We analysed four internal network traffic characteristics: clustering of IP addresses, application traffic patterns, frequency of changes in cluster topologies, and histograms of communicating IP address pairs. The dataset has been published on GitHub.

A novel dataset for encrypted virtual private network traffic analysis

Authors
Fesl, J.; Naas, M.N.
Year
2023
Published in
Data in Brief. 2023, 47. ISSN 2352-3409.
Type
Article
Abstract
Encryption of network traffic should guarantee anonymity and prevent potential interception of information. Encrypted virtual private networks (VPNs) are designed to create special data tunnels that allow reliable transmission between networks and/or end users. However, as has been shown in a number of scientific papers, encryption alone may not be sufficient to secure data transmissions in the sense that certain information may be exposed. Our team has constructed a large dataset that contains generated encrypted network traffic data. This dataset contains a general network traffic model consisting of different types of network traffic such as web, emailing, video conferencing, video streaming, and terminal services. For the same network traffic model, data are measured for different scenarios, i.e., for data traffic through different types of VPNs and without VPNs. Additionally, the dataset contains the initial handshake of the VPN connections. The dataset can be used by various data scientists dealing with the classification of encrypted network traffic and encrypted VPNs.

An encrypted network video stream dataset

Authors
Fesl, J.; Sedlák, D.; Konopa, M.
Year
2023
Published in
Data in Brief. 2023, 49. ISSN 2352-3409.
Type
Article
Abstract
Most of the video content on the Internet today is distributed through online streaming platforms. To ensure user privacy, data transmissions are often encrypted using cryptographic protocols. In previous research, we first experimentally validated the idea that the amount of transmitted data belonging to a particular video stream is not constant over time, or that it changes periodically, and forms a specific fingerprint. Based on knowledge of the fingerprint of a specific video stream, this stream can subsequently be identified. Over several months of intensive work, our team has created a large dataset containing a large number of video streams that were captured by network traffic probes during their playback by end users. The video streams were deliberately chosen to fall thematically into pre-selected categories. We selected two primary streaming platforms, PeerTube and YouTube. The first platform was chosen because it allows any streaming parameter to be modified, while the second was chosen because it is used by many people worldwide. Our dataset can be used to create and train machine learning models or heuristic algorithms that identify encrypted video streams either by their content or type category, or individually.
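The fingerprint idea described in this abstract (the volume of transmitted data varying over time in a stream-specific way) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: the fixed one-second windows, the cosine similarity measure, the function names, and the packet values are all hypothetical choices made for the example.

```python
import math


def traffic_fingerprint(packets, window=1.0, n_windows=5):
    """Sum bytes per fixed-length time window.

    `packets` is a list of (timestamp_seconds, size_bytes) pairs observed
    for one encrypted stream; only sizes and timing are used, not payload.
    """
    bins = [0] * n_windows
    if not packets:
        return bins
    start = min(ts for ts, _ in packets)
    for ts, size in packets:
        idx = int((ts - start) // window)
        if idx < n_windows:  # packets beyond the last window are ignored
            bins[idx] += size
    return bins


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Made-up captures: a reference stream, a replay of it, and a different stream.
reference = traffic_fingerprint([(0.0, 1000), (0.5, 500), (2.1, 3000), (4.2, 1500)])
replay = traffic_fingerprint([(10.0, 900), (10.4, 600), (12.3, 3100), (14.0, 1400)])
other = traffic_fingerprint([(0.0, 3000), (1.2, 3000), (3.5, 100)])

same_score = cosine_similarity(reference, replay)
diff_score = cosine_similarity(reference, other)
```

In this toy example the replay scores much closer to the reference than the unrelated stream does. Real captures are noisier, which is why the abstract points to trained machine learning models or heuristics rather than a single similarity threshold.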

Cluster-oriented virtual machine low latency consolidation algorithm

Authors
Polák, M.; Fesl, J.
Year
2022
Published in
ACM International Conference Proceeding Series. New York: Association for Computing Machinery, 2022. p. 113-120. ISBN 978-1-4503-9622-6.
Type
Conference paper
Abstract
With the growing amount of data processed in virtual environments, many researchers focus their efforts on optimizing the load distribution in data centers according to various criteria. In this article, we propose optimizing the load on the data center's network infrastructure. The new heuristic algorithm, based on grouping virtual machines into clusters, was compared with heuristics based on a genetic algorithm. The measurements indicate that the clustering-based heuristic, although data-dependent, shows promising characteristics with significantly lower computational complexity. The algorithm was tested on a large number of instances, demonstrating its general usability.

Decentralized Evaluation of Trust in Ad Hoc Networks using Neural Networks

Year
2022
Published in
2022 18th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob). USA: IEEE Computer Society, 2022. p. 30-35. ISSN 2160-4894. ISBN 978-1-6654-6975-3.
Type
Conference paper
Abstract
Trust is an essential concept in ad hoc network security. Creating and maintaining trusted relationships between nodes is a challenging task. This paper proposes a decentralized method for evaluating trust in ad hoc networks. The method uses neural networks and local information to predict the trust of neighboring nodes. The method was compared with the original centralized version, showing that even without global information, it achieves, on average, 97% accuracy in the classification problem and 94% in the regression problem. An important contribution of this paper is overcoming the main limitation of the original method, which is the centralized evaluation of trust. Moreover, the output of the decentralized method is well suited as an input for enhancing routing in ad hoc networks.

Are Encrypted Protocols Really a Guarantee of Privacy?

Authors
Fesl, J.; Trofimova, Y.; Janeček, J.
Year
2021
Published in
Are Encrypted Protocols Really a Guarantee of Privacy? Academic Conferences International Limited Reading, 2021. p. 130-138. 20. vol. 1. ISBN 978-1-912764-99-0.
Type
Conference paper
Abstract
Most internet traffic is encrypted by application protocols that should guarantee users' privacy and the anonymity of data during transmission. Our team has developed a unique system that can create a specific pattern of traffic and further analyze it using machine learning methods. We investigated the possibility of identifying network video streams encrypted within the HTTPS protocol and found that it is possible to identify particular content with a certain probability. Our paper provides a methodology and results obtained from real measurements. As the testing dataset, we used streams from the popular platform YouTube. Our results confirm that it is possible to identify encrypted video streams via their specific traffic imprints, although the encryption used should prevent this.

Performance Analysis of Neural Network Approach for Evaluation of Trust in Ad-Hoc Networks

Year
2021
Published in
11th International Conference on Advanced Computer Information Technologies (ACIT). IEEE (Institute of Electrical and Electronics Engineers), 2021. p. 691-695. ISBN 978-1-6654-1854-6.
Type
Conference paper
Abstract
With the world becoming more mobile and dynamic each year, the application of ad-hoc networks has broadened. Ad-hoc networks do not have a predefined infrastructure; each node serves as a router, which brings security challenges. Trust and trustworthiness mechanisms are among the most common methods for ensuring security in an ad-hoc network. In [1], we proposed a method for the evaluation of trust in ad-hoc networks. This paper aims to describe the method formally and analyze its performance. The original paper showed that neural networks can estimate trust with an average accuracy of 98% in the classification problem and 94% in the regression problem. This paper also investigates the capabilities of our method under malicious conditions. The analysis could provide insight for tuning trust parameters, such as the threshold of trust. Furthermore, this paper presents a mathematical model behind the problem to show that the neural network approach is reasonable.

Promising new Techniques for Computer Network Traffic Classification: A Survey

Authors
Konopa, M.; Fesl, J.; Janeček, J.
Year
2020
Published in
2020 10th International Conference on Advanced Computer Information Technologies, ACIT 2020 - Proceedings. IEEE Xplore, 2020. p. 418-421. vol. 10. ISBN 978-1-7281-6760-2.
Type
Conference paper
Abstract
This paper aims to give an overview of the application of image processing to network traffic analysis, including a description of the essence of the most important works in the last 15 years. The importance of efficient, automated analysis of network traffic is growing especially today, when huge volumes of diverse data need to be quickly processed. With the rapid development of artificial intelligence in the field of image processing, it seems logical to use it for analyzing network traffic image data. Recent results on this topic are very promising.

Using Machine Learning for DNS over HTTPS Detection

Authors
Konopa, M.; Fesl, J.; Jelínek, J.; Feslová, M.; Cehák, J.; Janeček, J.; Drdák, F.
Year
2020
Published in
Proceedings of European Conference on Cyber Warfare and Security (ECCWS 2020). Academic Conferences and Publishing International Ltd., 2020. p. 205-211. ISBN 9781912764617.
Type
Conference paper
Abstract
DNS over HTTPS (DoH) is a new standard that is being adopted by most new versions of web browsers. This protocol allows a canonical domain name to be translated to an IP address through an HTTPS tunnel. Using such a protocol has many pros and cons. In our paper, we try to evaluate these aspects from different points of view. One of the most critical disadvantages is that network traffic logging becomes much more complicated. Our team has created a machine learning-based approach for automated DoH detection, which appears well suited for use in advanced firewalls.

Towards HPC-Based Autonomous Cyber Security System

Authors
Fesl, J.; Feslova, M.; Gokhale, V.; Lejtnar, M.; Cehak, J.; Janeček, J.
Year
2019
Published in
2019 9th International Conference on Advanced Computer Information Technologies, ACIT 2019 - Proceedings. Piscataway: IEEE, 2019. p. 435-438. ISBN 978-1-7281-0449-2.
Type
Conference paper
Abstract
Cyber security is one of the hottest topics today. Millions of devices that communicate daily via the Internet are permanently exposed to many kinds of network attacks. The majority of such attacks are caused by various automatic botnets or malicious cyber systems. Our research group has created an efficient autonomous architecture that is able to collect traffic information from network devices, analyse it, and detect an ongoing network attack. The automated AI-based network administrator module is able to mitigate the aftermath of such malicious activity by executing a specific blocking action. Given the volume of current network throughput, our proposed solution is based on modern big data processing technologies, which allow the data flows of very large network infrastructures to be analysed.

CloudEVBench – Virtualization Technology Efficiency Testing Tool for the Distributed Infrastructures

Authors
Fesl, J.; Cehák, J.; Doležalová, M.; Janeček, J.
Year
2016
Published in
International Journal of Grid and Distributed Computing. 2016, 9(8), 249-260. ISSN 2005-4262.
Type
Article
Abstract
Virtualized systems are a very popular topic today, and their use plays a major role in current data centers. Virtualization efficiency is a very important aspect of real system deployment. Some studies on this topic have been published [1]; they are mainly based on various benchmarking techniques and are integrated into specialized testing tools. Such a benchmark tool, which is able to simulate the behavior of a real computing system under the stress of different virtualization configurations, can answer, for example, how many virtual machines can be executed on it simultaneously and how large the virtualization overhead is [2]. We developed and applied a new benchmark tool that is able to measure virtualization efficiency and overhead in a virtualized environment.

New Approach for Virtual Machines Consolidation In Heterogeneous Computing Systems

Authors
Fesl, J.; Cehák, J.; Doležalová, M.; Janeček, J.
Year
2016
Published in
International Journal of Hybrid Information Technology. 2016, 9(12), 321-332. ISSN 1738-9968.
Type
Article
Abstract
Energy consumption is one of the most important factors in virtual machine deployment in current data centres. Various studies have shown that energy-aware management of virtual machines can reduce total energy consumption by tens of percent. We developed a new approach, based on a distributed algorithm, that is able to consolidate virtual machines across virtualization nodes without a central coordinator. The input data for this algorithm are collected online from electronic wattmeters placed at the power input of each virtualization node.

Virtual parallel infrastructures for big data in metabolomics (Virtuální paralelní infrastruktury pro velká data v metabolomice)

Authors
Fesl, J.; Doležalová, M.; Cehák, J.; Moos, M.; Janeček, J.; Šimek, P.
Year
2016
Published in
ENBIK 2016 conference proceedings. Prague: Centrální laboratoře, 2016. ISBN 978-80-7080-960-0.
Type
Conference paper
Abstract
Metabolomic data, their analysis, and their interpretation have been a hot topic in recent years. Ever larger and more complex data require sufficiently powerful computing infrastructures for their processing, in the form of dedicated parallel computers or distributed computer clusters, whose construction and management require considerable financial resources. Our team designed and built a prototype of a universal elastic distributed computing system (Centrální Mozek Univerzity, CMU, "Central Brain of the University") based on virtualization technology, which can be used to solve demanding bioinformatics problems. The system is elastic in that one specific computing infrastructure can be changed into another within a few minutes without the need for different hardware. The prototype contains a sophisticated module for computing resource allocation, load balancing, and system state monitoring. Another advantage of our solution is that computing capacity can be extended dynamically at runtime, including the possibility of hardware modifications or maintenance.

New Techniques of IEEE 802.11 Family Hotspots Attacks, Principles and Defense

Authors
Fesl, J.; Doležalová, M.; Drdák, F.; Janeček, J.
Year
2015
Published in
Proceedings of the 14th European Conference on Cyber Warfare and Security. Curtis Farm, Kidmore End, Nr Reading: Acad Conferences Ltd, 2015. p. 61-70. ISSN 2048-8602. ISBN 978-1-910810-28-6.
Type
Conference paper
Abstract
Today, many places in the world offer paid internet access via wireless hotspots. These solutions are available in places such as hotels, airports, and conference halls. Because a wireless hotspot must be accessible to any new potential user, common security techniques based on WPA/WPA2 encryption cannot be used. In the last few years, a new type of attack based on DNS tunneling has been described. We focus on a detailed analysis of this attack and propose a possible defense strategy. The DNS tunneling attack has been implemented in several applications and inspired us to look at another type of wireless hotspot attack. We describe in detail all parts of the solution necessary to understand the defense against this type of attack. At this time, there is no way to prevent this attack except by switching off all common unencrypted wireless hotspot infrastructures.