doc. Ing. Tomáš Čejka, Ph.D.

Publications

CESNET-TLS-Year22: A year-spanning TLS network traffic dataset from backbone lines

Authors
Hynek, K.; Luxemburk, J.; Pešek, J.; Čejka, T.; Šiška, P.
Year
2024
Published in
Scientific Data. 2024, 11(1), ISSN 2052-4463.
Type
Article
Abstract
The modern approach for network traffic classification (TC), which is an important part of operating and securing networks, is to use machine learning (ML) models that are able to learn intricate relationships between traffic characteristics and communicating applications. A crucial prerequisite is having representative datasets. However, datasets collected from real production networks are not being published in sufficient numbers. Thus, this paper presents a novel dataset, CESNET-TLS-Year22, that captures the evolution of TLS traffic in an ISP network over a year. The dataset contains 180 web service labels and standard TC features, such as packet sequences. The unique year-long time span enables comprehensive evaluation of TC models and assessment of their robustness in the face of the ever-changing environment of production networks.

Machine Learning Metrics for Network Datasets Evaluation

Authors
Soukup, D.; Uhříček, D.; Vašata, D.; Čejka, T.
Year
2024
Published in
ICT Systems Security and Privacy Protection. Cham: Springer, 2024. p. 307-320. vol. 679. ISSN 1868-422X. ISBN 978-3-031-56326-3.
Type
Proceedings paper
Abstract
High-quality datasets are an essential requirement for leveraging machine learning (ML) in data processing and recently in network security as well. However, the quality of datasets is overlooked or underestimated very often. Having reliable metrics to measure and describe the input dataset enables the feasibility assessment of a dataset. Imperfect datasets may require optimization or updating, e.g., by including more data and merging class labels. Applying ML algorithms will not bring practical value if a dataset does not contain enough information. This work addresses the neglected topics of dataset evaluation and missing metrics. We propose three novel metrics to estimate the quality of an input dataset and help with its improvement or building a new dataset. This paper describes experiments performed on public datasets to show the benefits of the proposed metrics and theoretical definitions for more straightforward interpretation. Additionally, we have implemented and published Python code so that the metrics can be adopted by the worldwide scientific community.

NetTiSA: Extended IP flow with time-series features for universal bandwidth-constrained high-speed network traffic classification

Authors
Year
2024
Published in
Computer Networks. 2024, 240 1-22. ISSN 1389-1286.
Type
Article
Abstract
Network traffic monitoring based on IP Flows is a standard monitoring approach that can be deployed to various network infrastructures, even the large ISP networks connecting millions of people. Since flow records traditionally contain only limited information (addresses, transport ports, and amount of exchanged data), they are also commonly extended by additional features that enable network traffic analysis with high accuracy. These flow extensions are, however, often too large or hard to compute, which then allows only offline analysis or limits their deployment only to smaller-sized networks. This paper proposes a novel extended IP flow called NetTiSA (Network Time Series Analysed) flow, based on analysing the time series of packet sizes. By thoroughly testing 25 different network traffic classification tasks, we show the broad applicability and high usability of NetTiSA flow. For practical deployment, we also consider the sizes of flows extended by NetTiSA features and evaluate the performance impacts of their computation in the flow exporter. The novel features proved to be computationally inexpensive and showed excellent discriminatory performance. The trained machine learning classifiers with proposed features mostly outperformed the state-of-the-art methods. NetTiSA finally bridges the gap and brings universal, small-sized, and computationally inexpensive features for traffic classification that can be scaled up to extensive monitoring infrastructures, bringing the machine learning traffic classification even to 100 Gbps backbone lines.

Active Learning Framework For Long-term Network Traffic Classification

Authors
Pešek, J.; Soukup, D.; Čejka, T.
Year
2023
Published in
IEEE Annual Computing and Communication Workshop and Conference (CCWC). New Jersey: IEEE, 2023. p. 893-899. ISBN 979-8-3503-3286-5.
Type
Invited or awarded proceedings paper
Abstract
Recent network traffic classification methods benefit from machine learning (ML) technology. However, there are many challenges due to the use of ML, such as lack of high-quality annotated datasets, data drifts and other effects causing aging of datasets and ML models, high volumes of network traffic, etc. This paper presents the benefits of augmenting traditional ML training and deployment workflows and adapting the Active Learning (AL) concept to network traffic analysis. The paper proposes a novel Active Learning Framework (ALF) to address this topic. ALF provides prepared software components that can be used to deploy an AL loop and maintain an ALF instance that continuously evolves a dataset and ML model automatically. Moreover, ALF includes monitoring, dataset quality evaluation, and optimization capabilities that enhance the current state of the art in the AL domain. The resulting solution is deployable for IP flow-based analysis of high-speed (100 Gb/s) networks, where it was evaluated for more than eight months. Additional use cases were evaluated on publicly available datasets.

Augmenting Monitoring Infrastructure For Dynamic Software-Defined Networks

Authors
Pešek, J.; Plný, R.; Koumar, J.; Jeřábek, K.; Čejka, T.
Year
2023
Published in
2023 8th International Conference on Smart and Sustainable Technologies (SpliTech). New Jersey: IEEE, 2023. ISBN 978-953-290-128-3.
Type
Proceedings paper
Abstract
Software-Defined Networking (SDN) and virtual environment raise new challenges for network monitoring tools. The dynamic and flexible nature of these network technologies requires adaptation of monitoring infrastructure to overcome challenges of analysis and interpretability of the monitored network traffic. This paper describes a concept of automatic on-demand deployment of monitoring probes and correlation of network data with infrastructure state and configuration in time. Such an approach to monitoring SDN virtual networks is usable in several use cases, such as IoT networks and anomaly detection. It increases visibility into complex and dynamic networks. Additionally, it can help with the creation of well-annotated datasets that are essential for any further research.

BOTA: Explainable IoT malware detection in large networks

Authors
Uhříček, D.; Hynek, K.; Čejka, T.; Kolář, D.
Year
2023
Published in
IEEE Internet of Things Journal. 2023, 10(10), 8416-8431. ISSN 2327-4662.
Type
Article
Abstract
Explainability and alert reasoning are essential but often neglected properties of intrusion detection systems. The lack of explainability reduces security personnel’s trust, limiting the overall impact of alerts. This paper proposes the BOTA (Botnet Analysis) system, which uses the concepts of weak indicators and heterogeneous meta-classifiers to maintain accuracy compared with state-of-the-art systems while also providing explainable results that are easy to understand. To evaluate the proposed system, we have implemented a demonstration of intrusion weak-indication detectors, each working on a different principle to ensure robustness. We tested the architecture with various real-world and lab-created datasets, and it correctly identified 94.3% of infected IoT devices without false positives. Furthermore, the implementation is designed to work on top of extended bidirectional flow data, making it deployable on large-scale 100 Gbps networks at the level of Internet Service Providers. Thus, a single instance of BOTA can protect millions of devices connected to end-users’ local networks and significantly reduce the threat arising from powerful IoT botnets.

CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines

Authors
Luxemburk, J.; Hynek, K.; Čejka, T.; Lukačovič, A.; Šiška, P.
Year
2023
Published in
Data in Brief. 2023, 2023(46), ISSN 2352-3409.
Type
Article
Abstract
The QUIC (Quick UDP Internet Connection) protocol has the potential to replace TLS over TCP, which is the standard choice for reliable and secure Internet communication. Due to its design that makes the inspection of QUIC handshakes challenging and its usage in HTTP/3, there is an increasing demand for research in QUIC traffic analysis. This dataset contains one month of QUIC traffic collected in an ISP backbone network, which connects 500 large institutions and serves around half a million people. The data are delivered as enriched flows that can be useful for various network monitoring tasks. The provided server names and packet-level information allow research in the encrypted traffic classification area. Moreover, included QUIC versions and user agents (smartphone, web browser, and operating system identifiers) provide information for large-scale QUIC deployment studies.

Encrypted traffic classification: the QUIC case

Authors
Luxemburk, J.; Hynek, K.; Čejka, T.
Year
2023
Published in
Proceedings of the 7th Network Traffic Measurement and Analysis Conference. Piscataway: IEEE, 2023. ISBN 978-3-903176-58-4.
Type
Proceedings paper
Abstract
The QUIC protocol is a new reliable and secure transport protocol that is an alternative to TLS over TCP. However, compared to TLS, QUIC obfuscates the connection handshake and the server name indication domain, making a simple inspection more challenging. The classification of QUIC traffic has also received less attention than that of TLS. In this work, we present a comprehensive study aiming to explore the challenges of QUIC traffic classification. We selected three models: 1) multi-modal CNN, 2) LightGBM, and 3) IP-based classifier, and evaluated their properties using a large one-month CESNET-QUIC22 dataset with 102 web service labels. The developed classifiers reached up to 88% accuracy and set the new baseline in fine-grained QUIC service classification. Moreover, the real nature of the dataset and its long time span allowed us to collect a number of insights and measure the classifiers' performance in the presence of data drift.

Enhancing DeCrypto: Finding Cryptocurrency Miners Based on Periodic Behavior

Year
2023
Published in
2023 19th International Conference on Network and Service Management (CNSM). New York: IEEE, 2023. International Conference on Network and Service Management. vol. 19. ISSN 2165-9605. ISBN 978-3-903176-59-1.
Type
Proceedings paper
Abstract
While the popularity of cryptocurrencies and the whole industry's value are rising, the number of threat actors who use illegal “coin miner malware” is increasing as well. The threat actors commonly use computational resources of companies, research and educational institutions, or end users. In this paper, we analyzed the long-term periodic behavior of the cryptocurrency miners communicating in computer networks. We propose a novel method for cryptominer detection using specially designed periodicity features. The detection algorithm is based on the mathematical detection of periodic Flow Time Series (FTS) and feature mining. Together with a machine learning technique, the resulting system achieves high-precision performance. Furthermore, our approach enhances DeCrypto, a flow-based cryptominer detection system, to further improve its reliability and feasibility for high-speed networks.

Evaluation of passive OS fingerprinting methods using TCP/IP fields

Authors
Hulák, M.; Bartoš, V.; Čejka, T.
Year
2023
Published in
2023 8th International Conference on Smart and Sustainable Technologies (SpliTech). New Jersey: IEEE, 2023. ISBN 978-953-290-128-3.
Type
Proceedings paper
Abstract
An important part of network management is to keep knowledge about the connected devices. One of the tools that can provide such information in real time is passive OS fingerprinting, in particular the method based on analyzing values of specific TCP/IP header fields. The state-of-the-art approach is to use machine learning to create such an OS classifier. In this paper, we focus on the evaluation of this approach from several perspectives. We took two existing public datasets, created a new one from our network, and trained machine learning models to classify the 4 most common operating system families based on selected TCP/IP fields. We compare different models, discuss the need to round TTL values to avoid over-fitting, and test the transferability of models trained on data from different networks. Although TCP/IP-related characteristics of individual operating systems should be independent of where the device is located, our experiments show that a model trained in one network performs much worse in another one, making model creation and deployment more difficult in practice. A good solution may be to combine data from multiple networks. A model trained on a combination of all three datasets exhibited the best results on average across the datasets.
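The TTL rounding mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact rounding rule, so the version below assumes one common convention: mapping an observed TTL up to the nearest typical initial TTL (32, 64, 128, or 255). The function name `round_ttl` and the constant are hypothetical, chosen only for this illustration.

```python
# Typical initial TTL values (an assumption for illustration; the paper's
# exact rounding rule is not stated in the abstract).
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def round_ttl(observed_ttl: int) -> int:
    """Map an observed TTL to the smallest typical initial TTL that is >= it.

    Rounding removes the hop-count component of the TTL, so a classifier
    cannot memorize network distances that are specific to one capture point.
    """
    if not 0 <= observed_ttl <= 255:
        raise ValueError("TTL must fit in one byte (0-255)")
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial
    return 255  # not reachable for valid input, kept for completeness
```

For example, observed TTLs of 57 and 61 both map to 64, so two hosts running the same system at different hop distances yield the same feature value.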

Evaluation of the Limit of Detection in Network Dataset Quality Assessment with PerQoDA

Authors
Wasielewska, K.; Soukup, D.; Čejka, T.; Camacho, J.
Year
2023
Published in
ECML PKDD 2022: Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Cham: Springer, 2023. p. 170-185. ISSN 1865-0929. ISBN 978-3-031-23632-7.
Type
Proceedings paper
Abstract
Machine learning is recognised as a relevant approach to detect attacks and other anomalies in network traffic. However, there are still no suitable network datasets that would enable effective detection. On the other hand, the preparation of a network dataset is not easy due to privacy reasons but also due to the lack of tools for assessing their quality. In a previous paper, we proposed a new method for data quality assessment based on permutation testing. This paper presents a parallel study on the limits of detection of such an approach. We focus on the problem of network flow classification and use well-known machine learning techniques. The experiments were performed using publicly available network datasets.

Fine-grained TLS services classification with reject option

Authors
Luxemburk, J.; Čejka, T.
Year
2023
Published in
Computer Networks. 2023, 220 ISSN 1389-1286.
Type
Article
Abstract
The recent success and proliferation of machine learning and deep learning have provided powerful tools, which are also utilized for encrypted traffic analysis, classification, and threat detection in computer networks. These methods, neural networks in particular, are often complex and require a huge corpus of training data. Therefore, this paper focuses on collecting a large up-to-date dataset with almost 200 fine-grained service labels and 140 million network flows extended with packet-level metadata. The number of flows is three orders of magnitude higher than in other existing public labeled datasets of encrypted traffic. The number of service labels, which is important to make the problem hard and realistic, is four times higher than in the public dataset with the most class labels. The published dataset is intended as a benchmark for identifying services in encrypted traffic. Service identification can be further extended with the task of “rejecting” unknown services, i.e., the traffic not seen during the training phase. Neural networks offer superior performance for tackling this more challenging problem. To showcase the dataset’s usefulness, we implemented a neural network with a multi-modal architecture, which is the state-of-the-art approach, and achieved 97.04% classification accuracy and detected 91.94% of unknown services with 5% false positive rate.
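The idea of "rejecting" unknown services can be sketched with simple confidence thresholding. This is a generic technique substituted here for illustration, not necessarily the method used in the paper; the threshold value, the `UNKNOWN` marker, and the function names are all assumptions.

```python
import math

UNKNOWN = -1  # hypothetical marker for "service not seen during training"


def softmax(logits):
    """Convert raw classifier scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_reject(logits, threshold=0.9):
    """Return the most probable class index, or UNKNOWN if confidence is low.

    The threshold is an illustrative value; in practice it would be tuned
    on validation data to reach a target false positive rate.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else UNKNOWN
```

A flow with scores `[5.0, 0.0, 0.0]` is assigned to class 0, while a flow with equal scores for all classes is rejected as unknown.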

Look at my Network: An Insight into the ISP Backbone Traffic

Authors
Beneš, T.; Pešek, J.; Čejka, T.
Year
2023
Published in
2023 19th International Conference on Network and Service Management (CNSM). New York: IEEE, 2023. International Conference on Network and Service Management. vol. 19. ISSN 2165-9605. ISBN 978-3-903176-59-1.
Type
Proceedings paper
Abstract
High-speed ISP networks pose several challenges that prevent the creation of long-term datasets for giving insight into the traffic. Currently, there are no publicly available long-term datasets capturing the entirety of high-speed ISP networks. Such networks are traditionally monitored using IP flows, which provide enough high-level information about the situation in the network and support various use cases, such as the detection of outages or security threats. Even with this type of aggregation, long-term datasets are impractical due to their size. The other problem is that flow monitoring comes with significant aggregation, and common traffic statistics are brief, lack useful details, and require further processing. This paper addresses these problems and presents a new long-term aggregated dataset, a detailed analysis of public network traffic measured on the ISP backbone, and a monitoring architecture composed of open-source tools capable of using an existing flow exporter infrastructure. Such insight into traffic helps to design and develop hardware optimizations, tune the performance of monitoring systems, and adapt security detection algorithms.

Network Traffic Classification Based on Single Flow Time Series Analysis

Year
2023
Published in
2023 19th International Conference on Network and Service Management (CNSM). New York: IEEE, 2023. International Conference on Network and Service Management. vol. 19. ISSN 2165-9605. ISBN 978-3-903176-59-1.
Type
Proceedings paper
Abstract
Network traffic monitoring using IP flows is used to handle the current challenge of analyzing encrypted network communication. Nevertheless, the packet aggregation into flow records naturally causes information loss; therefore, this paper proposes a novel flow extension for traffic features based on the time series analysis of the Single Flow Time Series, i.e., a time series created by the number of bytes in each packet and its timestamp. We propose 69 universal features based on the statistical analysis of data points, time domain analysis, packet distribution within the flow timespan, time series behavior, and frequency domain analysis. We have demonstrated the usability and universality of the proposed feature vector for various network traffic classification tasks using 15 well-known publicly available datasets. Our evaluation shows that the novel feature vector achieves classification performance similar to or better than related works on both binary and multiclass classification tasks. In more than half of the evaluated tasks, the classification performance increased by up to 5 %.
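To make the notion of a Single Flow Time Series concrete, the sketch below computes a handful of simple statistics from a list of (timestamp, packet size) pairs. These five values are illustrative choices only and are not claimed to be among the 69 features proposed in the paper; the function name and the dictionary keys are hypothetical.

```python
import statistics


def sfts_features(packets):
    """Compute a few illustrative statistics of a single-flow time series.

    `packets` is a list of (timestamp_seconds, size_bytes) tuples ordered
    by time. The selected features are assumptions made for this sketch.
    """
    if len(packets) < 2:
        raise ValueError("at least two packets are needed")
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    # Inter-packet gaps capture the uneven spacing of the series.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    duration = times[-1] - times[0]
    return {
        "mean_size": statistics.fmean(sizes),
        "std_size": statistics.pstdev(sizes),
        "mean_gap": statistics.fmean(gaps),
        "duration": duration,
        "bytes_per_second": sum(sizes) / duration if duration > 0 else 0.0,
    }
```

Because only running sums and counts are needed for such statistics, features of this kind can in principle be updated packet by packet without storing the whole series.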

Unevenly Spaced Time Series from Network Traffic

Year
2023
Published in
Proceedings of the 7th Network Traffic Measurement and Analysis Conference. Piscataway: IEEE, 2023. ISBN 978-3-903176-58-4.
Type
Proceedings paper
Abstract
Reliable detection of security events is essential for network security. Therefore, a suitable traffic representation and model are required. Contrary to the currently used approaches, this paper presents Unevenly Spaced Time Series (USTS) as a feasible representation of network traffic with several notable benefits for analysis. The article considers several types of USTS. A dataset captured on a real ISP network was created to evaluate the properties of USTS. The dataset contains over 35 million time series. We experimentally showed that USTS is suitable for network traffic analysis and allows automatic processing, e.g., to classify network traffic.

Classification of network traffic

Year
2022
Published in
Proceedings of the 10th Prague Embedded Systems Workshop. Praha: CTU. Faculty of Information Technology, 2022. p. 52-58. ISBN 978-80-01-07015-4.
Type
Proceedings paper
Abstract
This paper describes the context of existing approaches to real-time network flow classification and focuses on the contributions of the author's bachelor's and master's theses. The paper also proposes several research questions that are planned for the future Ph.D. study.

Collection of datasets with DNS over HTTPS traffic

Authors
Jeřábek, K.; Hynek, K.; Čejka, T.; Ryšavý, O.
Year
2022
Published in
Data in Brief. 2022, 2022(42), ISSN 2352-3409.
Type
Article
Abstract
Recently, the Internet has adopted the DNS over HTTPS (DoH) resolution mechanism for privacy-aware network applications. As DoH becomes more disseminated, it has also become a network monitoring research topic. For comprehensive evaluation and comparison of developed classifiers, real-world datasets are needed, motivating this contribution. We created a new large-scale collection of datasets consisting of two classes of traffic: i) DoH HTTPS communication and ii) non-DoH HTTPS connections. The DoH traffic is captured for multiple DoH providers and clients to include nuances of various DoH implementations and configurations. The non-DoH HTTPS connections complement the DoH communication aiming to include a wide range of existing network applications. The dataset collection consists of network traffic generated in a controlled environment and traffic captured from a real ISP network. The resulting datasets thus provide real-world network traffic data suitable for evaluating existing classifiers and the development of new methods.

DeCrypto: Finding Cryptocurrency Miners on ISP networks

Year
2022
Published in
Secure IT Systems. Cham: Springer, 2022. p. 139-158. ISSN 0302-9743. ISBN 978-3-031-22294-8.
Type
Proceedings paper
Abstract
With the rising popularity of cryptocurrencies and the increasing value of the whole industry, people are incentivized to join and earn revenues by cryptomining — using computational resources for cryptocurrency transaction verification. Nevertheless, there is an increasing number of abusive cryptomining cases, and it is reported that “coin miner malware” grew by more than 4000% in 2018. In this work, we analyzed the cryptominer network communication and proposed the DeCrypto system that can detect and report mining on high-speed 100 Gbps backbone Internet lines with millions of users. The detector uses the concept of heterogeneous weak-indication detectors (Machine-Learning-based, domain-based, and payload-based) that work together and create a robust and accurate detector with an extremely low false-positive rate. The detector was implemented and evaluated on a real nationwide high-speed network and proved efficient in a real-world deployment.

Discovering Coordinated Groups of IP Addresses Through Temporal Correlation of Alerts

Authors
Žádník, M.; Wrona, J.; Hynek, K.; Čejka, T.; Husák, M.
Year
2022
Published in
IEEE Access. 2022, 10(2022), 82799-82813. ISSN 2169-3536.
Type
Article
Abstract
Network-based monitoring and intrusion detection systems generate a high number of alerts reporting the suspicious activity of IP addresses. The majority of alerts are dropped due to their low relevance, low priority, or due to high number of alerts themselves. We assume that these alerts still contain valuable information, namely, about the coordination of IP addresses. Knowledge of the coordinated IP addresses improves situational awareness and reflects the requirement of security analysts as well as automated reasoning tools to have as much contextual information as possible to make an informed decision. To validate our assumption, we introduce a novel method to discover the groups of coordinated IP addresses that exhibit a temporal correlation of their alerts. We evaluate our method on data from a real sharing platform reporting approximately 1.5 million alerts per day. The results show that our method can indeed discover groups of truly coordinated IP addresses.

Large Scale Analysis of DoH Deployment on the Internet

Authors
García, S.; Bogado Garcia, J.; Hynek, K.; Vekshin, D.; Čejka, T.; Wasicek, A.
Year
2022
Published in
Computer Security - ESORICS 2022. Cham: Springer International Publishing, 2022. p. 145-165. vol. 13556. ISSN 0302-9743. ISBN 978-3-031-17142-0.
Type
Proceedings paper
Abstract
DNS over HTTPS (DoH) is one of the standards to protect the security and privacy of users. The choice of DoH provider has controversial consequences, from monopolisation of surveillance to lost visibility by network administrators and security providers. More importantly, it is a novel security business. Software products and organisations depend on users choosing well-known and trusted DoH resolvers. However, there is no comprehensive study on the number of DoH resolvers on the Internet, its growth, and the trustworthiness of the organisations behind them. This paper studies the deployment of DoH resolvers by (i) scanning the whole Internet for DoH resolvers in 2021 and 2022; (ii) creating lists of well-known DoH resolvers by the community; (iii) characterising what those resolvers are, (iv) comparing the growth and differences. Results show that (i) the number of DoH resolvers increased 4.8 times in the period 2021-2022, (ii) the number of organisations providing DoH services has doubled, and (iii) the number of DoH resolvers in 2022 is 28 times larger than the number of well-known DoH resolvers by the community. Moreover, 94% of the public DoH resolvers on the Internet are unknown to the community, 77% use certificates from free services, and 57% belong to unknown organisations or personal servers. We conclude that the number of DoH resolvers is growing at a fast rate; also that at least 30% of them are not completely trustworthy and users should be very careful when choosing a DoH resolver.

Network traffic classification based on periodic behavior detection

Year
2022
Published in
Proceedings of 2022 18th International Conference on Network and Service Management (CNSM). New York: IEEE, 2022. p. 359-363. ISSN 2165-9605. ISBN 978-3-903176-51-5.
Type
Proceedings paper
Abstract
Even though encryption hides the content of communication from network monitoring and security systems, this paper shows a feasible way to retrieve useful information about the observed traffic. The paper deals with periodic behavioral patterns of communication, which can be detected in time series created from network traffic using the autocorrelation function and the Lomb-Scargle periodogram. The revealed characteristics of the periodic behavior can be further exploited to recognize particular applications. We have experimented with the created dataset of 61 classes and trained a machine learning classifier based on XGBoost, which performed the best in our experiments, reaching a 90% F1-score.
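The autocorrelation part of this approach can be sketched in a few lines. The example assumes an evenly binned series (for instance, packet counts per second) and picks the lag with the highest sample autocorrelation; the Lomb-Scargle periodogram, which handles unevenly spaced samples, is not shown. The function names and the lag-selection rule are assumptions made for illustration, not the paper's algorithm.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of an evenly spaced series at a given lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # a constant series carries no periodic information
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


def dominant_period(series, max_lag):
    """Return (lag, autocorrelation) for the lag with the strongest peak.

    Picking the single best lag is a simplification for this sketch.
    """
    best_lag = max(
        range(1, max_lag + 1), key=lambda lag: autocorrelation(series, lag)
    )
    return best_lag, autocorrelation(series, best_lag)
```

For a series that repeats every four bins, such as `[5, 0, 0, 0]` repeated six times, `dominant_period(series, 10)` selects lag 4.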

Summary of DNS Over HTTPS Abuse

Authors
Hynek, K.; Vekshin, D.; Luxemburk, J.; Čejka, T.; Wasicek, A.
Year
2022
Published in
IEEE Access. 2022, 10(2022), 54668-54680. ISSN 2169-3536.
Type
Article
Abstract
The Internet Engineering Task Force adopted the DNS over HTTPS protocol in 2018 to remediate privacy issues regarding the plain text transmission of the DNS protocol. According to our observations and the analysis described in this paper, protecting DNS queries using HTTPS entails security threats. This paper surveys DoH related research works and analyzes malicious and unwanted activities that leverage DNS over HTTPS and can be currently observed in the wild. Additionally, we describe three real-world abuse scenarios observed in the web environment that reveal how service providers intentionally use DNS over HTTPS to violate policies. Last but not least, we identified several research challenges that we consider important for future security research.

Tunneling through DNS over TLS providers

Authors
Melcher, L.; Hynek, K.; Čejka, T.
Year
2022
Published in
Proceedings of 2022 18th International Conference on Network and Service Management (CNSM). New York: IEEE, 2022. p. 359-363. ISSN 2165-9605. ISBN 978-3-903176-51-5.
Type
Invited or awarded proceedings paper
Abstract
DNS over TLS (DoT) is one of the approaches for private DNS resolution, which has already gained support by open resolvers. Moreover, DoT is used by default in Android operating systems. This study investigates the possibility of creating DNS covert channels using DoT, which is a security threat that benefits from the increased privacy of encrypted communication. We evaluated the performance and usability of DoT tunnels created via commonly used resolvers. Our results show that the performance characteristics of DoT tunnels differ vastly depending on the used DoT resolver; however, the creation of a DoT tunnel is possible, reaching speeds up to 232 Kbps. Moreover, we successfully transferred data via DoT servers claiming Anti-Virus protection and family-friendly content.

Vision of Active Learning Framework Approach to Network Traffic Analysis Research

Authors
Pešek, J.; Soukup, D.; Čejka, T.
Year
2022
Published in
Proceedings of the 10th Prague Embedded Systems Workshop. Praha: CTU. Faculty of Information Technology, 2022. p. 68-72. ISBN 978-80-01-07015-4.
Type
Proceedings paper
Abstract
Current research in the network security domain intensively uses machine learning (ML) and artificial intelligence to automate processes and reveal hidden patterns in data. These technologies, however, require many training datasets, ideally of high quality. Additionally, network infrastructures continuously evolve, and thus network traffic changes dynamically over time as well. There is an urgent need to adapt machine learning models, update datasets with the latest samples of annotated network traffic, and retrain the models regularly to sustain feasible performance. Active Learning Framework (ALF) directly targets these demands and aims to provide a modular platform for scientific experiments and deployment in practice as well as to support research activities regarding the quality of datasets. This paper describes the ALF software in particular and proposes its possible use cases in research and practice.

Detection of HTTPS Brute-Force Attacks with Packet-Level Feature Set

Authors
Luxemburk, J.; Hynek, K.; Čejka, T.
Year
2021
Published in
11th Annual Computing and Communication Workshop and Conference (CCWC2021). Piscataway (New Jersey): IEEE, 2021. p. 0115-0123. ISBN 978-0-7381-4394-1.
Type
Proceedings paper
Abstract
This paper presents a novel approach to detect brute-force attacks against web services in high-speed networks. The prevalence of brute-force attacks is so high that service providers, such as ISPs or web-hosting providers, cannot depend on their customers' host-based defenses. Moreover, the rising usage of encryption makes it more difficult to detect attacks on the network level. In our research, we created a dataset, which consists of 1.8 million extended IP flows from a backbone network combined with IP flows generated with three popular open-source brute-forcing tools. We identified a distinctive packet-level feature set and trained a machine-learning classifier with a false positive rate of 10^-4 and a true positive rate (the ratio of discovered attacks) of 0.938. The achieved results surpass the state-of-the-art solutions and show that the developed HTTPS brute-force detection algorithm is viable for production deployment.
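The two metrics quoted in this abstract, the false positive rate and the true positive rate, follow directly from the confusion-matrix counts. The sketch below shows the standard definitions on binary labels (1 = attack, 0 = benign); the function name and the label encoding are assumptions for this example, and the sample data in the usage note are invented, not taken from the paper.

```python
def detection_rates(y_true, y_pred):
    """Return (true_positive_rate, false_positive_rate) for binary labels.

    Labels are assumed to be 1 for attack flows and 0 for benign flows.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr
```

With four attack flows of which three are detected, and four benign flows of which one is flagged, the function returns a true positive rate of 0.75 and a false positive rate of 0.25.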

Novel HTTPS classifier driven by packet bursts, flows, and machine learning

Authors
Tropková, Z.; Hynek, K.; Čejka, T.
Year
2021
Published in
Proceedings of the 2021 17th International Conference on Network and Service Management. New York: IEEE, 2021. p. 345-349. ISSN 2165-963X. ISBN 978-3-903176-36-2.
Type
Proceedings paper
Abstract
Encryption of network traffic has recently started to cover the remaining readable information, which is heavily used by current monitoring systems; thus, it is time to focus on novel methods of encrypted traffic analysis and classification. The aim of this paper is to define a new network traffic characteristic called Sequence of packet Burst Length and Time (SBLT), which was inspired by existing approaches and definitions. Contrary to other works, SBLT is feasible even for high-speed backbone networks as a part of IP flow data. The advantage of SBLT features is shown using a machine learning classification model for HTTPS traffic types as an example. This paper presents the definition of SBLT, proposes a new annotated public dataset of HTTPS traffic with 5 categories, and evaluates the developed classifier, which reaches accuracy over 99 %. This classifier can help analysts to deal with a huge amount of encrypted traffic and maintain situational awareness.

Towards Evaluating Quality of Datasets for Network Traffic Domain

Autoři
Soukup, D.; Tisovčík, P.; Hynek, K.; Čejka, T.
Rok
2021
Publikováno
Proceedings of the 2021 17th International Conference on Network and Service Management. New York: IEEE, 2021. p. 264-268. ISSN 2165-963X. ISBN 978-3-903176-36-2.
Typ
Stať ve sborníku
Anotace
This paper deals with the quality of network traffic datasets created to train and validate machine learning classification and detection methods. Naturally, there is a long history of research targeted at data quality; however, it is focused mainly on data consistency, validity, precision, and other metrics, which are insufficient for network traffic use cases. The rise of machine learning usage in network monitoring applications requires a new methodology for evaluating datasets. There is a need to evaluate and compare traffic samples captured under different conditions and to decide on the usability of already captured and annotated data. This paper aims to explain a use case of dataset creation, propose definitions regarding the quality of network traffic datasets, and, finally, describe a framework for dataset analysis.

Behavior Anomaly Detection in IoT Networks

Rok
2020
Publikováno
Proceeding of the International Conference on Computer Networks, Big Data and IoT (ICCBI - 2019). Cham: Springer International Publishing, 2020. p. 465-473. Lecture Notes on Data Engineering and Communications Technologies. vol. 49. ISSN 2367-4520. ISBN 978-3-030-43192-1.
Typ
Kapitola v knize
Anotace
Data encryption makes deep packet inspection less suitable nowadays, and the need for analyzing encrypted traffic is growing. Machine learning brings new options to recognize a type of communication despite the heterogeneity of encrypted IoT traffic right at the network edge. We propose the design of a scalable architecture and a method for behavior anomaly detection in IoT networks. The combination of the two existing semi-supervised techniques that we used ensures higher reliability of anomaly detection and improves the results achieved by a single method. We describe the classification and anomaly detection experiments that were made possible by existing datasets and our own training datasets. The presented satisfactory results provide a basis for further work and allow us to elaborate on this idea.

Classification of Network Traffic using Traffic Features

Rok
2020
Publikováno
Proceedings of the 8th Prague Embedded Systems Workshop. Praha: Czech Technical University in Prague, 2020. p. 17-18. ISBN 978-80-01-06772-7.
Typ
Stať ve sborníku
Anotace
Computer networks are gradually becoming an essential need for people. The amount of network traffic and the number of network devices are increasing every day due to improvements and expansion of network infrastructure. The new trend of smart phones, watches, fridges and, in general, smart homes connects a large number of new devices to the network infrastructure. Therefore, the overall volume of network traffic grows, and networks are also getting more complex, which means they are harder to monitor. The main focus of our presentation is the monitoring technology for high-speed networks that is able to analyze and classify network traffic automatically. Traffic classification is an essential functionality for various purposes, such as network security. Identification of types of network traffic is a part of the process of, e.g., forensic analysis. Therefore, an accurate and fast classification algorithm provides valuable information for network operators and security analysts. As a software prototype for our experiments, we use the NEMEA system. We have developed NEMEA modules that contain the classification algorithms. These prototypes allow us to compare different algorithms in an experimental environment with offline data, and the same software module (with the best performance) can also be deployed in production for online analysis.

DoH detection: Discovering hidden DNS

Autoři
Hynek, K.; Čejka, T.; Vekshin, D.
Rok
2020
Publikováno
Proceedings of the 8th Prague Embedded Systems Workshop. Praha: Czech Technical University in Prague, 2020. p. 14-16. ISBN 978-80-01-06772-7.
Typ
Stať ve sborníku
Anotace
The necessity of securing users’ privacy on the internet has given rise to a new protocol called DNS over HTTPS (DoH). It aims to replace traditional DNS for domain name translation, with encryption as a benefit. Unfortunately, the laudable attempt to increase the privacy of users also brings some security threats. Readable information from DNS is one of the most essential data sources in computer security, especially for security forensic analysis. The DNS queries in the network can reveal malicious activity such as the presence of malware, botnet communication, and data exfiltration. Thus, network administrators might want to block encrypted DoH in their network; however, the currently available approaches are based on lists of IP addresses of well-known DoH providers/resolvers. This way of detection can be easily bypassed by using a private or not generally known DoH resolver. Since the presence of DoH communication might also indicate some malicious activity or at least a policy violation, we decided to find a possible way to detect DoH based on the traffic behavior. This research aims to recognize DoH from extended IP flow data by machine learning regardless of IP addresses.

DoH Insight: Detecting DNS over HTTPS by Machine Learning

Autoři
Vekshin, D.; Hynek, K.; Čejka, T.
Rok
2020
Publikováno
ARES '20: Proceedings of the 15th International Conference on Availability, Reliability and Security. New York: ACM, 2020. p. 1-8. ISBN 978-1-4503-8833-7.
Typ
Stať ve sborníku
Anotace
Over the past few years, a new protocol, DNS over HTTPS (DoH), has been created to improve users' privacy on the internet. DoH can be used instead of traditional DNS for domain name translation, with encryption as a benefit. This new feature also brings some threats because various security tools depend on readable information from DNS to identify, e.g., malware, botnet communication, and data exfiltration. Therefore, this paper focuses on the possibilities of encrypted traffic analysis, especially on the accurate recognition of DoH. The aim is to evaluate what information (if any) can be gained from HTTPS extended IP flow data using machine learning. We evaluated five popular ML methods to find the best DoH classifiers. The experiments show that the accuracy of DoH recognition is over 99.9 %. Additionally, it is also possible to identify the application that was used for DoH communication, since we have discovered (using created datasets) significant differences in the behavior of Firefox, Chrome, and cloudflared. Our trained classifier can distinguish between DoH clients with 99.9 % accuracy.

Evaluating Bad Hosts Using Adaptive Blacklist Filter

Autoři
Rok
2020
Publikováno
Proceedings of the 9th Mediterranean Conference on Embedded Computing - MECO'2020. Institute of Electrical and Electronics Engineers, Inc., 2020. p. 306-310. ISSN 2637-9511. ISBN 978-1-7281-6949-1.
Typ
Stať ve sborníku
Anotace
Publicly available blacklists are popular tools to capture and spread information about misbehaving entities on the Internet. In some cases, their straight-forward utilization leads to many false positives. In this work, we propose a system that combines blacklists with network flow data while introducing automated evaluation techniques to avoid reporting unreliable alerts. The core of the system is formed by an Adaptive Filter together with an Evaluator module. The assessment of the system was performed on data obtained from a national backbone network. The results show the contribution of such a system to the reduction of unreliable alerts.

Pipelined ALU for effective external memory access in FPGA

Autoři
Beneš, T.; Kekely, M.; Hynek, K.; Čejka, T.
Rok
2020
Publikováno
Proceedings of the 23rd Euromicro Conference on Digital Systems Design. Los Alamitos, CA: IEEE Computer Soc., 2020. p. 97-100. ISBN 978-1-7281-9535-3.
Typ
Stať ve sborníku
Anotace
External memories in digital designs are associated with high response times. The most common approach to mitigate latency is adding a caching mechanism to the memory subsystem. This solution might be sufficient in CPU architectures, where we can reschedule operations when a cache miss occurs. However, FPGA architectures are usually accelerators with simple functionality, where it is not possible to postpone work. A cache miss often leads to a whole pipeline stall or even to data loss. The architecture we present in this paper reduces this problem by aggregating arithmetic operations into the memory subsystem itself. Our architecture reaches a speed of 200 Mp/s (operations carried out). It is designed to be used in systems with link speeds of 100 Gb/s. It outperforms other implementations by a factor of at least 3. The additional benefit of our architecture is reducing the number of memory transactions by a factor of two on real-world datasets.

Privacy Illusion: Beware of Unpadded DoH

Rok
2020
Publikováno
2020 11th IEEE Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON). Montreal: IEEE, 2020. p. 621-628. ISSN 2644-3163. ISBN 978-1-7281-8416-6.
Typ
Stať ve sborníku vyzvaná či oceněná
Anotace
DNS over HTTPS (DoH) has been created with ambitions to improve the privacy of users on the internet. Domain names that are being resolved by DoH are transferred via an encrypted channel, which ensures that nobody should be able to read the content. However, even though the communication is encrypted, we show that it still leaks some private information, which can be misused. Therefore, this paper studies the behavior of the DoH protocol implementation in the Firefox and Chrome web browsers, and the level of detail that can be revealed by observing and analyzing packet-level information. The aim of this paper is to evaluate and highlight discovered privacy weaknesses hidden in DoH. Using a trained machine learning classifier, it is possible to infer individual domain names only from the captured encrypted DoH connection. The resulting trained classifier can infer domain names from encrypted DNS traffic with surprisingly high accuracy: up to 90% on HTTP 1.1 and up to 70% on the HTTP 2 protocol.

QoD: Ideas about Evaluating Quality of Datasets

Rok
2020
Publikováno
Proceedings of the 8th Prague Embedded Systems Workshop. Praha: Czech Technical University in Prague, 2020. p. 8-9. ISBN 978-80-01-06772-7.
Typ
Stať ve sborníku
Anotace
The importance of computer networks is rising every year. The reason is that we are connecting more and more devices and applications, and our daily routines depend on connectivity. On the other hand, this creates great potential for attackers. They can hide their activities in a complex network environment and steal valuable data. Without a solid dataset, the evaluation score misrepresents the real score in a production environment, and, therefore, proper datasets play an essential role in the research and development of any ML-based classifier or detector. The main motivation for this paper is to find a way to evaluate the quality of any dataset to estimate whether it is good enough for ML experiments. To the best of our knowledge, there are only a few studies focused on the quality evaluation of datasets with network traffic. For experiments, we selected datasets about DNS over HTTPS (DoH) detection and URL classification problems that are already being elaborated. All metrics are calculated at the dataset level. The impact of these metrics is evaluated on a Random Forest (RF) model. We show the results we have discovered in our datasets and ML detection modules. Finally, we discuss possible next steps in this research.

Refined detection of SSH brute-force attackers using machine learning

Autoři
Rok
2020
Publikováno
ICT Systems Security and Privacy Protection. Cham: Springer, 2020. p. 49-63. IFIP Advances in Information and Communication Technology. vol. 580. ISSN 1868-4238. ISBN 978-3-030-58200-5.
Typ
Stať ve sborníku
Anotace
This paper presents a novel approach to detect SSH brute-force (BF) attacks in high-speed networks. Contrary to host-based approaches, we focus on network traffic analysis to identify attackers. Recent papers describe how to detect BF attacks using pure NetFlow data. However, our evaluation shows that the current solution produces a significant number of false-positive (FP) results. To overcome the issue of the high FP rate, we propose a machine learning (ML) approach to detection using specially extended IP flows. The contributions of this paper are a new dataset from a real environment, an experimentally selected ML method that performs with high accuracy and a low FP rate, and an architecture of the detection system. The dataset for training was created using extensive evaluation of captured real traffic, manually prepared legitimate SSH traffic with characteristics similar to BF attacks, and, finally, a packet trace with SSH logs from real production servers.

The next step of P4 FPGA architectures: External Memories

Autoři
Beneš, T.; Čejka, T.; Kubátová, H.
Rok
2020
Publikováno
Proceedings of the 8th Prague Embedded Systems Workshop. Praha: Czech Technical University in Prague, 2020. p. 5-7. ISBN 978-80-01-06772-7.
Typ
Stať ve sborníku
Anotace
P4 is a recent feasible technology that helps to make a modern infrastructure flexible and ready for changes. Software solutions are available, but not efficient enough for high throughput and low latency applications. Therefore, hardware acceleration is used commonly. This paper discusses caveats of currently existing approaches, mainly focused on FPGAs, which are flexible but resource-limited. Our aim is to propose an extension of standard P4 architecture to support external memory and explain a possible approach to overcome the issues.

Future approaches to monitoring in high-speed backbone networks

Autoři
Rok
2019
Publikováno
Proceedings of the 7th Prague Embedded Systems Workshop. Praha: ČVUT FIT, Katedra číslicového návrhu, 2019. p. 27-28. ISBN 978-80-01-06607-2.
Typ
Stať ve sborníku
Anotace
Network monitoring features have always been a challenge in high-speed networks. Some of them, like detailed traffic analysis and packet inspection, are not suited or simply not feasible even on modern hardware. The challenges are becoming even greater with the rise of encrypted traffic. This leaves a large opportunity for threat actors to take advantage of. Therefore, it is necessary to develop a new generation of monitoring tools that can deal with the current issues for security purposes. This research aims to improve traffic analysis techniques to handle encrypted traffic, and also to adapt hardware-accelerated monitoring components for processing.

L7 capable flow exporter described in P4

Autoři
Havránek, J.; Čejka, T.; Benáček, P.
Rok
2019
Publikováno
Proceedings of the 7th Prague Embedded Systems Workshop. Praha: ČVUT FIT, Katedra číslicového návrhu, 2019. p. 29-32. ISBN 978-80-01-06607-2.
Typ
Stať ve sborníku
Anotace
Current flow exporters are the essential source of information for monitoring systems. They usually create aggregated information as flow data and, additionally, it is possible to extract headers from higher layer protocols (L7). Due to requirements on high throughput, the flow exporters use hardware acceleration to handle high packet rate at link speed (aiming at least 100 Gb/s). However, manually created design of such high-performance devices is very complex and complicated. Therefore, we propose to use a high-level P4 language for description of network traffic processing device that will be capable of handling L7 information. As our recent works show, it is possible to generate high-performance firmware design automatically based on P4 description. Since P4 is not primarily intended for processing L7 data, this paper proposes a feasible way to overcome limits of P4.

Augmented DDoS Mitigation with Reputation Scores

Autoři
Jánský, T.; Čejka, T.; Žádník, M.; Bartoš, V.
Rok
2018
Publikováno
Proceedings of the 13th International Conference on Availability, Reliability and Security. New York: ACM, 2018. ARES 2018. ISBN 978-1-4503-6448-5.
Typ
Stať ve sborníku
Anotace
Network attacks, especially DoS and DDoS attacks, are a significant threat for all providers of services or infrastructure. The biggest attacks can paralyze even large-scale infrastructures of worldwide companies. Attack mitigation is a complex issue studied by many researchers and security companies. While several approaches were proposed, there is still space for improvement. This paper proposes to augment existing mitigation heuristic with knowledge of reputation score of network entities. The aim is to find a way to mitigate malicious traffic present in DDoS amplification attacks with minimal disruption to communication of legitimate traffic.

Enhanced Flow Monitoring with P4 Generated Flexible Packet Parser

Autoři
Čejka, T.; Velan, P.; Havránek, J.; Benáček, P.
Rok
2018
Publikováno
Proceedings of the 12th International Conference on Autonomous Infrastructure, Management and Security. Laxenburg: International Federation for Information Processing, 2018. p. 21-32. ISBN 978-3-903176-12-6.
Typ
Stať ve sborníku
Anotace
Passive network flow monitoring provides visibility into network traffic. It is necessary for many applications such as accounting, network management, and security. As its origins are in packet switching and routing devices, the common flow exporter implementations process only necessary packet headers. Link layer protocols are often skipped, and only the first network and transport layer headers are used to construct flow records. However, the network traffic is gradually becoming much more complex as new protocols are being used in practice. We present a novel multi-layer flow monitoring approach that handles complex protocol encapsulation. To process packets with an arbitrary number of protocols, we have created a new packet parser based on the P4 language, which is easily extensible and widely used in SDN networks. We argue that the new multi-layer flow monitoring approach provides more precise and detailed statistics about the traffic of overlay networks at a backbone level.

P4-To-VHDL: Automatic generation of high-speed input and output network blocks

Autoři
Benáček, P.; Puš, V.P.; Kubátová, H.; Čejka, T.
Rok
2018
Publikováno
Microprocessors and Microsystems. 2018, 56, p. 22-33. ISSN 0141-9331.
Typ
Článek
Anotace
High-performance embedded architectures typically contain many stand-alone blocks which communicate and exchange data; additionally a high-speed network interface is usually needed at the boundary of the system. The software-based data processing is typically slow which leads to a need for hardware accelerated approaches. The problem is getting harder if the supported protocol stack is rapidly changing. Such problem can be effectively solved by the Field Programmable Gate Arrays and high-level synthesis which together provide a high degree of generality. This approach has several advantages like fast development or possibility to enable the area of packet-oriented communication to domain oriented experts. However, the typical disadvantage of this approach is the insufficient performance of generated system from a high-level description. This can be a serious problem in the case of a system which is required to process data at high packet rates. This work presents a generator of high-speed input (Parser) and output (Deparser) network blocks from the P4 language which is designed for the description of modern packet processing devices. The tool converts a P4 description to a synthesizable VHDL code suitable for the FPGA implementation. We present design, analysis and experimental results of our generator. Our results show that the generated circuits are able to process 100 Gbps traffic with fairly complex protocol structure at line rate on Xilinx Virtex-7 XCVH580T FPGA. The approach can be used not only in networking devices but also in other applications like packet processing engines in embedded cores because the P4 language is device and protocol independent.

Gateway for IoT Security

Autoři
Čejka, T.; Švepeš, M.; Viktorin, J.
Rok
2017
Publikováno
Proceedings of the 5th Prague Embedded Systems Workshop. Praha: katedra číslicového návrhu, 2017. ISBN 978-80-01-06178-7.
Typ
Stať ve sborníku
Anotace
In recent years, many devices and systems containing electronics have been equipped with communication interfaces, which allow people to read data from them and control the functionality of the devices remotely. Using the communication interfaces, it has become possible to let devices communicate with each other without human interaction. This phenomenon is currently called the Internet of Things (IoT). This kind of automation helps people to improve their lives, and therefore in many cases people can become dependent on the devices. In some cases, the security of the devices and their communication is crucial. Unfortunately, as some of the manufacturers focus on low price, many devices and technologies are not secured enough. There is a research project called Secure Gateway for Internet of Things (SIoT) with several participants from Czech academic institutions. The main goal of the project is a gateway based on open-source technologies for secure deployment and operation of IoT devices.

Hunting SIP Authentication Attacks Efficiently

Autoři
Jánský, T.; Čejka, T.; Bartoš, V.
Rok
2017
Publikováno
Security of Networks and Services in an All-Connected World. Basel: Springer, 2017. p. 125-130. ISSN 0302-9743. ISBN 978-3-319-60773-3.
Typ
Stať ve sborníku
Anotace
Extended flow records with application layer (L7) information allow for detection of various types of malicious traffic. Voice over IP (VoIP) is an example of technology that works on L7 and many attacks against it cannot be reliably detected using just basic flow information. Session Initiation Protocol (SIP), which is commonly used for VoIP signalling, is a frequent target of many types of attacks. This paper proposes and evaluates a novel algorithm for near real time detection of username scanning and password guessing attacks on SIP servers. The detection is based on analysis of L7 extended flow records.

Making Flow-Based Security Detection Parallel

Autoři
Švepeš, M.; Čejka, T.
Rok
2017
Publikováno
Security of Networks and Services in an All-Connected World. Basel: Springer, 2017. p. 3-15. ISSN 0302-9743. ISBN 978-3-319-60773-3.
Typ
Stať ve sborníku
Anotace
Flow based monitoring is currently a standard approach suitable for large networks of ISP size. The main advantage of flow processing is a smaller amount of data due to aggregation. There are many reasons (such as huge volume of transferred data, attacks represented by many flow records) to develop scalable systems that can process flow data in parallel. This paper deals with splitting a stream of flow data in order to perform parallel anomaly detection on distributed computational nodes. Flow data distribution is focused not only on uniformity but mainly on successful detection. The results of an experimental analysis show that the proposed approach does not break important semantic relations between individual flow records and therefore it preserves detection results. All experiments were performed using real data traces from Czech National Education and Research Network.

Preserving Relations in Parallel Flow Data Processing

Autoři
Čejka, T.; Žádník, M.
Rok
2017
Publikováno
Security of Networks and Services in an All-Connected World. Basel: Springer, 2017. p. 153-156. ISSN 0302-9743. ISBN 978-3-319-60773-3.
Typ
Stať ve sborníku
Anotace
Network monitoring produces high volume of data that must be analyzed ideally in near real-time to support network security operations. It is possible to process the data using Big Data frameworks, however, such approach requires adaptation or complete redesign of processing tools to get the same results. This paper elaborates on a parallel processing based on splitting a stream of flow records. The goal is to create subsets of traffic that contain enough information for parallel anomaly detection. The paper describes a methodology based on so called witnesses that helps to scale up without any need to modify existing algorithms.

Analysis of Vertical Scans Discovered by Naive Detection

Autoři
Čejka, T.; Švepeš, M.
Rok
2016
Publikováno
Management and Security in the Age of Hyperconnectivity. Cham: Springer International Publishing, 2016. p. 165-169. vol. 9701. ISSN 0302-9743. ISBN 978-3-319-39813-6.
Typ
Stať ve sborníku
Anotace
Network scans are very common and frequent events that appear in almost every network. Generally, the scans are quite harmless. Scanning can be useful for network operators, who need to know the state of their infrastructure. Conversely, scans can also be used by attackers for gathering sensitive information. This paper describes a simple detection method that was used to detect vertical scans. Our aim is to show the results of a long-term measurement on a backbone network and to show that it is possible to detect scans efficiently even with a simple method. The paper presents several interesting statistics that characterize network behavior and scanning frequency in a large high-speed national academic network.
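
The abstract above does not specify the detection method, so the following is only a minimal sketch of one common naive approach to vertical scan detection, not the paper's algorithm. It assumes flow records are reduced to hypothetical `(src_ip, dst_ip, dst_port)` tuples and flags a source that contacts many distinct ports on a single destination; the function name and the `port_threshold` value are illustrative assumptions.

```python
from collections import defaultdict


def detect_vertical_scans(flows, port_threshold=50):
    """Flag (src, dst) pairs where src touched many distinct ports on dst.

    flows: iterable of (src_ip, dst_ip, dst_port) tuples (assumed format).
    port_threshold: hypothetical cut-off, not taken from the paper.
    """
    ports_seen = defaultdict(set)
    for src_ip, dst_ip, dst_port in flows:
        # A vertical scan probes many ports of one target host.
        ports_seen[(src_ip, dst_ip)].add(dst_port)
    return sorted(
        pair for pair, ports in ports_seen.items()
        if len(ports) >= port_threshold
    )


# Example data: one scanner probing 100 ports, one ordinary HTTPS client.
example_flows = [("10.0.0.1", "10.0.0.2", p) for p in range(1, 101)]
example_flows += [("10.0.0.3", "10.0.0.2", 443)] * 5
scanners = detect_vertical_scans(example_flows, port_threshold=50)
```

In this example, only the pair with 100 distinct destination ports reaches the threshold; repeated connections to a single port do not count as a scan.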

Building a Feedback Loop to Capture Evidence of Network Incidents

Autoři
Rosa, Z.; Čejka, T.; Žádník, M.; Puš, V.
Rok
2016
Publikováno
12th International Conference on Network and Service Management. Montreal: IEEE, 2016. p. 292-296. ISSN 2165-963X. ISBN 978-3-901882-85-2.
Typ
Stať ve sborníku
Anotace
Flow measurement is extremely useful in network management; however, in some cases it is vital to observe the packets in full detail. To this end, we propose combining flow measurement, packet capture, and network behavioral analysis. The evaluation of the proposed system shows its feasibility even in a high-speed network environment.

Detecting Spoofed Time in NTP Traffic

Autoři
Čejka, T.; Robledo, A.
Rok
2016
Publikováno
Proceedings of the 4th Prague Embedded Systems Workshop. Praha: ČVUT FIT, Katedra číslicového návrhu, 2016. p. 49-52. ISBN 978-80-01-05984-5.
Typ
Stať ve sborníku
Anotace
Almost every device connected to a computer network uses its own system time. In order to maintain precise system time, various time synchronization protocols are used. Such protocols allow for automatic adaptation of system time to keep it as precise as possible. This paper deals with the detection of possible exploitation of a vulnerability in the most widely used of these protocols, the Network Time Protocol (NTP). Using spoofed NTP messages, an attacker is able to modify the system time of victims. Bad system time might lead to crucial security threats such as the usage of already-expired certificates, cache poisoning, or cache clearing.

NEMEA: A Framework for Network Traffic Analysis

Autoři
Čejka, T.; Bartoš, V.; Švepeš, M.; Rosa, Z.; Kubátová, H.
Rok
2016
Publikováno
12th International Conference on Network and Service Management. Montreal: IEEE, 2016. p. 195-201. ISSN 2165-963X. ISBN 978-3-901882-85-2.
Typ
Stať ve sborníku
Anotace
As network attacks become more sophisticated, it is difficult to discover them using traditional analysis tools. For some kinds of attacks, it is necessary to analyze Application Layer (L7) information in order to detect them. However, there is a lack of existing tools capable of L7 processing and manipulation. Therefore, we propose a flow-based modular Network Measurements Analysis (NEMEA) system to overcome the situation. NEMEA is designed with respect to a stream-wise concept, i.e., data are analyzed continuously in memory with minimal data storage. NEMEA is developed as an open-source project and is publicly available to the worldwide community. It is designed for both experimental and operational use. It is able to process off-line traffic traces as well as live network flows. The system is very flexible and can be easily extended by new modules. The modules are developed within a NEMEA framework that is a key component of the project. NEMEA thus represents a unified platform for research and development of new traffic analysis methods. It covers several important topics not limited to analysis and detection. Some of them are described in this paper. Originally, NEMEA was developed for the purposes of the Czech National Research and Education Network operator. Therefore, it is focused on handling high-speed network traffic with links working at 100 Gbps.

Overload-resistant Network Traffic Analysis

Autoři
Švepeš, M.; Čejka, T.
Rok
2016
Publikováno
Proceedings of the 4th Prague Embedded Systems Workshop. Praha: ČVUT FIT, Katedra číslicového návrhu, 2016. p. 53-58. ISBN 978-80-01-05984-5.
Typ
Stať ve sborníku
Anotace
Flow-based monitoring is currently a leading approach to network security analysis. A flow record is aggregated information about network traffic. Since various network attacks use just a few packets per flow, the advantage of aggregation is seriously limited. As a side effect, the monitoring infrastructure and the analysis system are affected. This paper proposes an overload-resistant architecture of the detection system that would cope with a high load of flow records during an attack.

Easy configuration of NETCONF devices

Autoři
Alexa, D.; Čejka, T.
Rok
2015
Publikováno
Proceedings of the 3rd Prague Embedded Systems Workshop. Praha: ČVUT FIT, Katedra číslicového návrhu, 2015. p. 3-9. ISBN 978-80-01-05776-6.
Typ
Stať ve sborníku
Anotace
It is necessary for developers of devices or systems to supply a user interface that can be used for control and monitoring. Visualisation of a device’s configuration and state data is a non-trivial task, as is the preparation of an easy configuration mechanism for end users. This paper is focused on NetopeerGUI, a universal graphical user interface for the NETCONF protocol that is developed as an open-source project. NetopeerGUI is based on the usage of standard technologies such as the configuration protocol NETCONF and the modeling language YANG, both standardized by the IETF. NetopeerGUI is a NETCONF client that can be easily used as a user interface for configuration and control of any device supporting the NETCONF protocol. NetopeerGUI provides a basic universal way of data presentation that helps developers to concentrate on device development. This paper proposes NetopeerGUI as an interface that can be deployed on a system to supply remote configuration and monitoring through the web browser and that can increase the speed of the development process.

Nemea: Searching for Botnet Footprints

Autoři
Rok
2015
Publikováno
Proceedings of the 3rd Prague Embedded Systems Workshop. Praha: ČVUT FIT, Katedra číslicového návrhu, 2015. p. 11-16. ISBN 978-80-01-05776-6.
Typ
Stať ve sborníku
Anotace
Malicious network traffic originating from malware poses a serious threat. Current malware is designed to hide itself from the eyes of victim users as well as network administrators. It is very difficult or impossible to discover such traffic using traditional ways of flow-based monitoring. This paper describes a network traffic analysis of a backbone network as an attempt to discover infected devices. Cooperation with a forensic laboratory and analysis of malware samples make it possible to gain information that can help find unwanted traffic. A specially tailored Nemea framework with a high-speed monitoring pipeline was used to discover infected devices on the network.

Using Application-Aware Flow Monitoring for SIP Fraud Detection

Autoři
Čejka, T.; Bartoš, V.; Truxa, L.; Kubátová, H.
Rok
2015
Publikováno
Intelligent Mechanisms for Network Configuration and Security. Cham: Springer International Publishing, 2015. p. 87-99. ISSN 0302-9743. ISBN 978-3-319-20033-0.
Typ
Stať ve sborníku
Anotace
Flow monitoring helps to discover many network security threats targeted to various applications or network protocols. In this paper, we show usage of the flow data for analysis of a Voice over IP (VoIP) traffic and a threat detection. A traditionally used flow record is insufficient for this purpose and therefore it was extended by application-layer information. In particular, we focus on the Session Initiation Protocol (SIP) and the type of a toll-fraud in which an attacker tries to exploit poor configuration of a private branch exchange (PBX). The attacker’s motivation is to make unauthorized calls to PSTN numbers that are usually charged at high rates and owned by the attacker. As a result, a successful attack can cause a significant financial loss to the owner of PBX. We propose a method for stream-wise and near real-time analysis of the SIP traffic and detection of the described threat. The method was implemented as a module of the Nemea system and deployed on a backbone network. It was evaluated using simulated as well as real attacks.

Change-point detection method on 100 Gb/s ethernet interface

Autoři
Benáček, P.; Blažek, R.; Čejka, T.; Kubátová, H.
Rok
2014
Publikováno
Architectures for Networking and Communications Systems (ANCS), 2014 ACM/IEEE Symposium on. New York: ACM, 2014. p. 245-246. ISBN 978-1-4503-2839-5.
Typ
Stať ve sborníku
Anotace
This paper deals with hardware acceleration of statistical methods for detection of anomalies on 100Gb/s Ethernet. The approach is demonstrated by implementing a sequential Non-Parametric Cumulative Sum (NP-CUSUM) procedure. We use high-level synthesis in combination with emerging software defined monitoring (SDM) methodology for rapid development of FPGA-based hardware-accelerated network monitoring applications. The implemented method offloads detection of network attacks and anomalies directly into an FPGA chip. The parallel nature of FPGA allows for simultaneous detection of various kinds of anomalies. Our results show that hardware acceleration of statistical methods using the SDM concept with high-level synthesis from C/C++ is possible and very promising for traffic analysis and anomaly detection in high-speed 100Gb/s networks.

FPGA Accelerated Change-Point Detection Method for 100 Gb/s Networks

Autoři
Čejka, T.; Kekely, L.; Benáček, P.; Blažek, R.; Kubátová, H.
Rok
2014
Publikováno
MEMICS proceedings. Brno: NOVPRESS, 2014. pp. 40-51. ISBN 978-80-214-5022-6.
Typ
Stať ve sborníku
Anotace
The aim of this paper is a hardware realization of a statistical anomaly detection method as part of a high-speed monitoring probe for computer networks. The sequential Non-Parametric Cumulative Sum (NP-CUSUM) procedure is the detection method of our choice, and we use an FPGA-based accelerator card as the target platform. For rapid development of the detection algorithm, a high-level synthesis (HLS) approach is applied. Furthermore, we combine HLS with the Software Defined Monitoring (SDM) framework on the monitoring probe, which enables easy deployment of various hardware-accelerated monitoring applications into high-speed networks. Our implementation of the NP-CUSUM algorithm serves as a hardware plug-in for SDM and realizes the detection of network attacks and anomalies directly in the FPGA. Additionally, the parallel nature of FPGA technology allows us to run multiple different detections simultaneously without any loss in throughput. Our experimental results show the feasibility of combining HLS and SDM for effective realization of traffic analysis and anomaly detection in networks with speeds up to 100 Gb/s.
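
For readers unfamiliar with the NP-CUSUM procedure named in the abstract above, the following is a minimal software sketch of the general one-sided cumulative sum recursion. It is not the authors' FPGA implementation. The function name `np_cusum` and the parameters `baseline_mean`, `drift`, and `threshold` are assumptions chosen for illustration, and the sample values are synthetic.

```python
from typing import Iterable, Optional


def np_cusum(samples: Iterable[float],
             baseline_mean: float,
             drift: float,
             threshold: float) -> Optional[int]:
    """Return the index of the first alarm, or None if no alarm is raised.

    Illustrative one-sided recursion (all parameter names are hypothetical):
        S_n = max(0, S_{n-1} + x_n - baseline_mean - drift)
    An alarm is raised as soon as S_n >= threshold.
    """
    stat = 0.0
    for index, x in enumerate(samples):
        # Accumulate only the excess over the expected level plus a drift
        # allowance; the statistic is clamped at zero so quiet periods
        # do not build up "credit".
        stat = max(0.0, stat + (x - baseline_mean - drift))
        if stat >= threshold:
            return index
    return None


# Synthetic example: the observed rate jumps from 10 to 20 at index 20.
traffic = [10.0] * 20 + [20.0] * 10
alarm_at = np_cusum(traffic, baseline_mean=10.0, drift=2.0, threshold=20.0)
print(alarm_at)
```

The drift term trades detection delay against false alarms: a larger value suppresses alarms caused by small fluctuations but delays detection of a real change.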

Stream-wise Detection of Surreptitious Traffic over DNS

Autoři
Rok
2014
Publikováno
2014 IEEE 19th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD) (CAMAD 2014). Pomona, California: IEEE Communications Society, 2014. p. 300-304. ISSN 2378-4865. ISBN 978-1-4799-5725-5.
Typ
Stať ve sborníku
Anotace
The Domain Name System (DNS) is one of the crucial services in a computer network. Because of its importance, DNS is usually allowed by security policies. This opens a way to circumvent policies and to transfer data from or to a restricted area by misusing the DNS infrastructure. This paper focuses on the detection of communication tunnels and other anomalies in DNS traffic. The proposed detection module is designed to process a huge volume of data and to detect anomalies in near real-time. It is based on a combination of statistical analyses of several observed features, including application-layer information. Our aim is stream-wise processing of a huge volume of DNS data from backbone networks. To achieve these objectives with minimal resource consumption, the detection module uses efficient extended data structures. The performance evaluation has shown that the detector is able to process approximately 511 thousand DNS flow records per second. In addition, according to experiments, a tunnel that lasts over 30 seconds can be detected within a minute. During on-line testing on real traffic from a production network, the module raised on average over 60 confirmed alerts per day, including DNS tunnels.

Systém pro detekci anomálií v počítačových sítích

Autoři
Rok
2013
Publikováno
Počítačové architektury a diagnostika - PAD 2013. Plzeň: Západočeská universita, Fakulta aplikovaných věd, 2013, pp. 51-56. ISBN 978-80-261-0270-0.
Typ
Stať ve sborníku
Anotace
This work deals with the NEMEA system for network flow analysis and anomaly detection in computer networks. It is a distributed modular system under development that can serve for comparing existing detection methods as well as for easier development and testing of new ones. In this context, an anomaly means a state in which the quality of service is degraded or a security incident occurs (or both). Such states need to be detected as early as possible and reported to the operators of the system or network. The proposed distributed system must provide timely detection using detection methods implemented as pluggable modules that pass data to one another. To achieve a minimal average detection delay and false alarm rate, the parameters of some methods need to be set appropriately. Designing a mechanism for automatic estimation of optimal parameter values is one of the goals of my dissertation.

Hardwarově akcelerovaná detekce anomálií v počítačových sítích s využitím FPGA

Autoři
Rok
2012
Publikováno
Počítačové architektury a diagnostika - PAD 2012. Praha: ČVUT v Praze, 2012, pp. 13-16. ISBN 978-80-01-05106-1.
Typ
Stať ve sborníku
Anotace
This contribution explains the topic of the dissertation and its motivation. The goal should be a study of a methodology that makes existing methods for anomaly detection in computer networks more efficient, including the author's own reference implementation using an FPGA on a COMBO card. This principle should be usable for deployment in large high-speed computer networks, where high reliability and availability are required. It is therefore necessary to monitor the state of such networks and to detect potential anomalies in real time with the lowest possible false alarm rate and a low average detection time.