Skip to content

A code repository about network intrusion detection using federated learning by utilizing UNR-IDD dataset

Notifications You must be signed in to change notification settings

Shekswess/Federated-Network-Intrusion-Detection

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Network-Intrusion-Detection

This is a code repository that includes all the code for the project Federated Network Intrusion Detection. The goal of this project is to compare XGBoost tabular classification model both in centralized and federated learning settings. The dataset used for this project is the UNR-IDD Intrusion Detection Dataset 2023. The dataset is used for binary and multi-class classification.

The dataset allows us to develop two kinds of models:

  • Binary Classification: The goal is to classify the network traffic as normal or attack.
  • Multi-class Classification: The goal is to classify the network traffic as normal or one of the 6 attack types.

Dataset

Features

The dataset has 30 features. The features are divided into 3 categories:

  • Port Statistics (9 features)
  • Delta Port Statistics (9 features)
  • Flow Entry and Flow Table Statistics (12 features)
Port Statistic Description
Received Packets Number of packets received by the port
Received Bytes Number of bytes received by the port
Sent Packets Number of packets sent by the port
Sent Bytes Number of bytes sent
Port alive Duration The time port has been alive in seconds
Packets Rx Dropped Number of packets dropped by the receiver
Packets Tx Dropped Number of packets dropped by the sender
Packets Rx Errors Number of transmit errors
Packets Tx Errors Number of receive errors
Delta Port Statistic Description
Delta Received Packets Number of packets received by the port
Delta Received Bytes Number of bytes received by the port
Delta Sent Packets Number of packets sent by the port
Delta Sent Bytes Number of bytes sent
Delta Port alive Duration The time port has been alive in seconds
Delta Packets Rx Dropped Number of packets dropped by the receiver
Delta Packets Tx Dropped Number of packets dropped by the sender
Delta Packets Rx Errors Number of transmit errors
Delta Packets Tx Errors Number of receive errors
Statistic Description
Connection Point Network connection point expressed as a pair of the network element identifier and port number.
Total Load/Rate Obtain the current observed total load/rate (in bytes/s) on a link.
Total Load/Latest Obtain the latest total load bytes counter viewed on that link.
Unknown Load/Rate Obtain the current observed unknown-sized load/rate (in bytes/s) on a link.
Unknown Load/Latest Obtain the latest unknown-sized load bytes counter viewed on that link.
Time seen When the above-mentioned values were last seen.
is_valid Indicates whether this load was built on valid values.
TableID Returns the Table ID values.
ActiveFlowEntries Returns the number of active flow entries in this table.
PacketsLookedUp Returns the number of packets looked up in the table.
PacketsMatched Returns the number of packets that successfully matched in the table.
MaxSize Returns the maximum size of this table.

Labels

The dataset has 2 kinds of labels:

  • For Binary Classification(Normal/Attack)
  • For Multi-class Classification(Normal/TCP-SYN Flood/Port Scan/Flow Table Overflow/Blackhole/Traffic Diversion)

Attacks:

  • TCP-SYN Flood: A Distributed Denial of Service (DDoS) attack where attackers target hosts by initiating many Transmission Control Protocol (TCP) handshake processes without waiting for the response from the target node. By doing so, the target device's resources are consumed as it has to keep allocating some memory space for every new TCP request.

  • Port scan: An attack in which attackers scan available ports on a host device to learn information about the services, versions, and even security mechanisms that are running on that host.

  • Flow Table Overflow: An attack that targets network switches/routers where attacks compromise the functionality of a switch/router by consuming the flow tables that forwards packets with illegitimate flow entries and rules so that legitimate flow entries and rules cannot be installed.

  • Blackhole: An attack that targets network switches/routers to discard the packets that pass through, instead of relaying them on to the next hop.

  • Traffic Diversion: A attack that targets network switches/routers to reroute the direction of packets away from their destination, intending to increase travel time and/or spying on network traffic through a man-in-the-middle scenario.\

These intrusion types were selected for this dataset as they are common cyber attacks that can occur in any networking environment. Also, these intrusion types cover attacks that can be launched on both network devices and end hosts.

Description of the labels:

Label Description
Normal Normal Network Functionality.
Attack Network Intrusion.
Label Description
Normal Normal Network Functionality.
TCP-SYN TCP-SYN Flood.
PortScan Port Scanning.
Overflow Flow Table Overflow.
Blackhole Blackhole Attack.
Diversion Traffic Diversion Attack.

More information about what is the idea behind the dataset, how it is collected, etc. can be found in the UNR-IDD Intrusion Detection Dataset 2023 website.

Data Exploration & Experimentation

The whole dataset has 37441 entries. The dataset is cleared and doesn't have any duplicates(it had only one duplicate in the original dataset). The dataset is imbalanced. There are some switches that don't have normal data. We run experiments with both all switches and only with switches that have balanced data with normal network data.

For centralized learning we are doing Leave-One-Subject-Out Cross Validation, mostly because the dataset is small and to have a compariable results to the federated learning experiments, because they are done on client level(switch). For the centralized learning also deep learning approaches like TabNet were tested, but the results were not as good as the XGBoost model, that's why the experimnets with federated learning are done only with XGBoost model.

For federated learning we are doing evaluation on the client level. Each client(switch) dataset is divided on train/test set(80:20 ratio) by keeping the same distribution of the classes. We tried different combination of rounds and number of clients for the federated learning experiments. The algorithm used for the federated learning is FedXgbBagging.

The main dataset which results are used has these clients(switches): ["of_000000000000000c", "of_000000000000000a", "of_000000000000000b", "of_0000000000000003","of_0000000000000004"]

Results

The results of the models can be seen by running the mlflow server in the pipeline folder. To run the mlflow server issue the following command in the terminal:

mlflow server

The results of the models can be seen in the mlflow UI. The mlflow UI can be accessed at http://localhost:5000.

Repository Structure

.
├── .vscode                                         # VS Code settings
├── src                                             # Source files
│   ├── dataset                                     # Dataset files
│   │   ├── initial_dataset                         # Initial dataset (raw)
│   │   └── processed_dataset                       # Processed dataset
│   ├── federated_pipeline                          # Federated learning pipeline files
│   │   ├── xgboost_federated_client.py             # XGBoost federated client
│   │   ├── xgboost_federated_server.py             # XGBoost federated server
│   │   └── xgboost_federated.py                    # XGBoost federated learning pipeline
│   ├── pipeline                                    # Centralized pipeline
│   │   ├── centralized_inference_pipeline.py       # Centralized inference pipeline
│   │   └── data_pipeline.py                        # Data preparation pipeline
│   └── utils                                       # Utility files
│       ├── centralized_inferences.py               # Centralized inferences class
│       ├── data_preparation.py                     # Data preparation functions
│       └── validation.py                           # Validation functions
├──.gitignore                                       # Gitignore file
├── README.md                                       # This file
└── requirements.txt                                # Requirements file

About

A code repository about network intrusion detection using federated learning by utilizing UNR-IDD dataset

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages