Publication details

Stream4Flow: Software for mining and analysis of the large volumes of network traffic

Authors

JIRSÍK Tomáš ČERMÁK Milan TOVARŇÁK Daniel PAULOVIČ Jakub Samuel ŠTEFÁNIK Michal

Year of publication 2016
MU Faculty or unit

Institute of Computer Science

Description A framework for the real-time IP flow data analysis built on Apache Spark Streaming, a modern distributed stream processing system. The basis of the Stream4Flow framework is formed by the IPFIXCol collector, Kafka messaging system, Apache Spark, and Elastic Stack. IPFIXCol enables incoming IP flow records to be transformed into the JSON format provided to the Kafka messaging system. The selection of Kafka was based on its scalability and partitioning possibilities, which provide sufficient data throughput. Apache Spark was selected as the data stream processing framework for its quick IP flow data throughput, available programming languages (Scala, Java, or Python) and MapReduce programming model. The analysis results are stored in Elastic Stack containing Logstash, Elasticsearch, and Kibana, which enable storage, querying, and visualizing the results. The Stream4Flow framework also contains the additional web interface in order to make administration easier and visualize complex results of the analysis. Due to above-described architecture, the framework is suitable for host monitoring and long-term malicious behavior discovery, description of the behavior of individual entities in the network and building its reputation record. It is also suitable for real-time attack detection, network monitoring, and overall situational awareness.
Related projects: