NetFlow V1 Datasets

Version 1 of the datasets is made up of 8 basic NetFlow features explained here.

Please click here to download the datasets in CSV format. The details of the datasets are published in:

Sarhan M., Layeghy S., Moustafa N., Portmann M. (2021) NetFlow Datasets for Machine Learning-Based Network Intrusion Detection Systems. In: Big Data Technologies and Applications. BDTA 2020, WiCON 2020. Springer, Cham. https://doi.org/10.1007/978-3-030-72802-1_9


NetFlow V2 Datasets

Version 2 of the datasets is made up of 43 extended NetFlow features explained here.

Please click here to download the datasets in CSV format. The details of the datasets are published in:

Mohanad Sarhan, Siamak Layeghy, and Marius Portmann, Towards a Standard Feature Set for Network Intrusion Detection System Datasets, Mobile Networks and Applications, 103, 108379, 2022.
https://doi.org/10.1007/s11036-021-01843-0


CICFlowMeter Datasets

The CICFlowMeter format of the datasets is made up of 83 network features explained here.

Please click here to download the datasets in CSV format. The details of the datasets are published in:

Mohanad Sarhan, Siamak Layeghy, and Marius Portmann, Evaluating Standard Feature Sets Towards Increased Generalisability and Explainability of ML-based Network Intrusion Detection, Big Data Research, 30, 100359, 2022 https://doi.org/10.1016/j.bdr.2022.100359


License

The use of the datasets for academic research purposes is granted in perpetuity after citing the above papers. For commercial purposes, it should be agreed upon by the authors.

Please get in touch with the author Mohanad Sarhan for more details.


1. NF-UNSW-NB15

Please click here to download the dataset.

The NetFlow-based format of the UNSW-NB15 dataset, named NF-UNSW-NB15, has been developed and labelled with its respective attack categories. The total number of data flows is 1,623,118, of which 72,406 (4.46%) are attack samples and 1,550,712 (95.54%) are benign. The attack samples are further classified into nine subcategories. The table below shows the distribution of all flows in the NF-UNSW-NB15 dataset.

Class Count Description
Benign 1550712 Normal unmalicious flows
Fuzzers 19463 An attack in which the attacker sends large amounts of random data that cause a system to crash, and that also aims to discover security vulnerabilities in the system.
Analysis 1995 A group that presents a variety of threats that target web applications through ports, emails and scripts.
Backdoor 1782 A technique that aims to bypass security mechanisms by replying to specific constructed client applications.
DoS 5051 Denial of Service is an attempt to overload a computer system's resources with the aim of preventing access to or availability of its data.
Exploits 24736 Sequences of commands that control the behaviour of a host through a known vulnerability.
Generic 5570 A method that targets cryptography and causes a collision with each block-cipher.
Reconnaissance 12291 A technique for gathering information about a network host, also known as a probe.
Shellcode 1365 Malware that penetrates code to control a victim's host.
Worms 153 Attacks that replicate themselves and spread to other computers.
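
The totals and percentages quoted for NF-UNSW-NB15 can be reproduced from the table counts. The short Python sketch below is only an illustrative check and is not part of the dataset tooling; the dictionary simply copies the counts from the table above.

```python
# Class counts copied from the NF-UNSW-NB15 table above.
counts = {
    "Benign": 1550712,
    "Fuzzers": 19463,
    "Analysis": 1995,
    "Backdoor": 1782,
    "DoS": 5051,
    "Exploits": 24736,
    "Generic": 5570,
    "Reconnaissance": 12291,
    "Shellcode": 1365,
    "Worms": 153,
}

total = sum(counts.values())          # all flows
attacks = total - counts["Benign"]    # every non-benign flow
attack_pct = round(100 * attacks / total, 2)
benign_pct = round(100 * counts["Benign"] / total, 2)

print(total, attacks, attack_pct, benign_pct)
```

The same check can be repeated for any of the other tables by swapping in their counts.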

2. NF-ToN-IoT

Please click here to download the dataset.

We utilised the publicly available pcaps of the ToN-IoT dataset to generate its NetFlow records, leading to a NetFlow-based IoT network dataset called NF-ToN-IoT. The total number of data flows is 1,379,274, of which 1,108,995 (80.4%) are attack samples and 270,279 (19.6%) are benign. The table below lists and defines the class distribution of the NF-ToN-IoT dataset.

Class Count Description
Benign 270279 Normal unmalicious flows
Backdoor 17247 A technique that aims to attack remote-access computers by replying to specific constructed client applications.
DoS 17717 An attempt to overload a computer system's resources with the aim of preventing access to or availability of its data.
DDoS 326345 An attempt similar to DoS but launched from multiple distributed sources.
Injection 468539 A variety of attacks that supply untrusted inputs to alter the course of execution, with SQL and code injection being two of the main ones.
MITM 1295 Man In The Middle is a method that places an attacker between a victim and the host with which the victim is trying to communicate, with the aim of intercepting traffic and communications.
Password 156299 Covers a variety of attacks aimed at retrieving passwords by either brute force or sniffing.
Ransomware 142 An attack that encrypts the files stored on a host and asks for compensation in exchange for the decryption technique/key.
Scanning 21467 A group of techniques that aim to discover information about networks and hosts, also known as probing.
XSS 99944 Cross-site Scripting is a type of injection in which an attacker uses web applications to send malicious scripts to end-users.

3. NF-BoT-IoT

Please click here to download the dataset.

An IoT NetFlow-based dataset, named NF-BoT-IoT, was generated from the BoT-IoT dataset. The features were extracted from the publicly available pcap files and the flows were labelled with their respective attack categories. The total number of data flows is 600,100, of which 586,241 (97.69%) are attack samples and 13,859 (2.31%) are benign. There are four attack categories in the dataset. The table below shows the distribution of all flows in NF-BoT-IoT.

Class Count Description
Benign 13859 Normal unmalicious flows
Reconnaissance 470655 A technique for gathering information about a network host, also known as a probe.
DDoS 56844 Distributed Denial of Service is an attempt similar to DoS but launched from multiple distributed sources.
DoS 56833 An attempt to overload a computer system's resources with the aim of preventing access to or availability of its data.
Theft 1909 A group of attacks that aim to obtain sensitive data, such as data theft and keylogging.

4. NF-CSE-CIC-IDS2018

Please click here to download the dataset.

We utilised the original pcap files of the CSE-CIC-IDS2018 dataset to generate a NetFlow-based dataset called NF-CSE-CIC-IDS2018. The total number of flows is 8,392,401, of which 1,019,203 (12.14%) are attack samples and 7,373,198 (87.86%) are benign. The table below shows the dataset's class distribution.

Class Count Description
Benign 7373198 Normal unmalicious flows
BruteForce 287597 A technique that aims to obtain usernames and password credentials by trying a list of predefined possibilities.
Bot 15683 An attack that enables an attacker to remotely control several hijacked computers to perform malicious activities.
DoS 269361 An attempt to overload a computer system's resources with the aim of preventing access to or availability of its data.
DDoS 380096 An attempt similar to DoS but launched from multiple distributed sources.
Infiltration 62072 An inside attack that sends a malicious file via an email to exploit an application and is followed by a backdoor that scans the network for other vulnerabilities.
Web Attacks 4394 A group that includes SQL injections, command injections and unrestricted file uploads.

5. NF-UQ-NIDS

Please click here to download the dataset.

NF-UQ-NIDS is a comprehensive dataset that merges the four datasets above. It demonstrates the benefit of a shared feature set across datasets, which makes it possible to merge multiple smaller datasets. This will eventually lead to a bigger and more universal NIDS dataset containing flows from multiple network setups and different attack settings. An additional label feature identifies the original dataset of each flow. This can be used to compare the same attack scenarios conducted over two or more different test-bed networks. The attack categories have been modified to combine all parent categories. Attacks named DoS attacks-Hulk, DoS attacks-SlowHTTPTest, DoS attacks-GoldenEye and DoS attacks-Slowloris have been renamed to the parent DoS category. Attacks named DDOS attack-LOIC-UDP, DDOS attack-HOIC and DDoS attacks-LOIC-HTTP have been renamed to DDoS. Attacks named FTP-BruteForce, SSH-Bruteforce, Brute Force -Web and Brute Force -XSS have been combined into a single brute-force category. Finally, SQL Injection attacks have been included in the injection attacks category. The NF-UQ-NIDS dataset has a total of 11,994,893 records, of which 9,208,048 (76.77%) are benign flows and 2,786,845 (23.23%) are attacks. The table below lists the distribution of the final attack categories.

Class Count
Benign 9208048
DDoS 763285
Reconnaissance 482946
Injection 468575
DoS 348962
Brute Force 291955
Password 156299
XSS 99944
Infilteration 62072
Exploits 24736
Scanning 21467
Fuzzers 19463
Backdoor 19029
Bot 15683
Generic 5570
Analysis 1995
Theft 1909
Shellcode 1365
MITM 1295
Worms 153
Ransomware 142
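
The renaming rules described for NF-UQ-NIDS amount to a lookup from the original attack names to their parent categories. The Python sketch below is one possible way to express that lookup and is not the authors' actual merging code. The names PARENT_CATEGORY and to_parent_category are hypothetical; the source names are copied from the paragraph above, and the target spellings ("Brute Force", "Injection") follow the table.

```python
# Original attack name -> parent category, as listed in the text above.
PARENT_CATEGORY = {
    "DoS attacks-Hulk": "DoS",
    "DoS attacks-SlowHTTPTest": "DoS",
    "DoS attacks-GoldenEye": "DoS",
    "DoS attacks-Slowloris": "DoS",
    "DDOS attack-LOIC-UDP": "DDoS",
    "DDOS attack-HOIC": "DDoS",
    "DDoS attacks-LOIC-HTTP": "DDoS",
    "FTP-BruteForce": "Brute Force",
    "SSH-Bruteforce": "Brute Force",
    "Brute Force -Web": "Brute Force",
    "Brute Force -XSS": "Brute Force",
    "SQL Injection": "Injection",
}


def to_parent_category(label: str) -> str:
    """Return the parent category; labels without a rule pass through unchanged."""
    cleaned = label.strip()
    return PARENT_CATEGORY.get(cleaned, cleaned)


print(to_parent_category("DoS attacks-Hulk"))
print(to_parent_category("Reconnaissance"))
```

Labels that already are parent categories, such as Reconnaissance or Benign, are returned as they are.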

6. NF-UNSW-NB15-v2

Please click here to download the dataset.

The NetFlow-based format of the UNSW-NB15 dataset, named NF-UNSW-NB15, has been expanded with additional NetFlow features and labelled with its respective attack categories. The total number of data flows is 2,390,275, of which 95,053 (3.98%) are attack samples and 2,295,222 (96.02%) are benign. The attack samples are further classified into nine subcategories. The table below shows the distribution of all flows in the NF-UNSW-NB15-v2 dataset.

Class Count Description
Benign 2295222 Normal unmalicious flows
Fuzzers 22310 An attack in which the attacker sends large amounts of random data that cause a system to crash, and that also aims to discover security vulnerabilities in the system.
Analysis 2299 A group that presents a variety of threats that target web applications through ports, emails and scripts.
Backdoor 2169 A technique that aims to bypass security mechanisms by replying to specific constructed client applications.
DoS 5794 Denial of Service is an attempt to overload a computer system's resources with the aim of preventing access to or availability of its data.
Exploits 31551 Sequences of commands that control the behaviour of a host through a known vulnerability.
Generic 16560 A method that targets cryptography and causes a collision with each block-cipher.
Reconnaissance 12779 A technique for gathering information about a network host, also known as a probe.
Shellcode 1427 Malware that penetrates code to control a victim's host.
Worms 164 Attacks that replicate themselves and spread to other computers.

7. NF-ToN-IoT-v2

Please click here to download the dataset.

The publicly available pcaps of the ToN-IoT dataset are utilised to generate its NetFlow records, leading to a NetFlow-based IoT network dataset called NF-ToN-IoT-v2. The total number of data flows is 16,940,496, of which 10,841,027 (63.99%) are attack samples and 6,099,469 (36.01%) are benign. The table below lists and defines the class distribution of the NF-ToN-IoT-v2 dataset.

Class Count Description
Benign 6099469 Normal unmalicious flows
Backdoor 16809 A technique that aims to attack remote-access computers by replying to specific constructed client applications.
DoS 712609 An attempt to overload a computer system's resources with the aim of preventing access to or availability of its data.
DDoS 2026234 An attempt similar to DoS but launched from multiple distributed sources.
Injection 684465 A variety of attacks that supply untrusted inputs to alter the course of execution, with SQL and code injection being two of the main ones.
MITM 7723 Man In The Middle is a method that places an attacker between a victim and the host with which the victim is trying to communicate, with the aim of intercepting traffic and communications.
Password 1153323 Covers a variety of attacks aimed at retrieving passwords by either brute force or sniffing.
Ransomware 3425 An attack that encrypts the files stored on a host and asks for compensation in exchange for the decryption technique/key.
Scanning 3781419 A group of techniques that aim to discover information about networks and hosts, also known as probing.
XSS 2455020 Cross-site Scripting is a type of injection in which an attacker uses web applications to send malicious scripts to end-users.

8. NF-BoT-IoT-v2

Please click here to download the dataset.

An IoT NetFlow-based dataset was generated by expanding the NF-BoT-IoT dataset. The features were extracted from the publicly available pcap files and the flows were labelled with their respective attack categories. The total number of data flows is 37,763,497, of which 37,628,460 (99.64%) are attack samples and 135,037 (0.36%) are benign. There are four attack categories in the dataset. The table below shows the distribution of all flows in NF-BoT-IoT-v2.

Class Count Description
Benign 135037 Normal unmalicious flows
Reconnaissance 2620999 A technique for gathering information about a network host, also known as a probe.
DDoS 18331847 Distributed Denial of Service is an attempt similar to DoS but launched from multiple distributed sources.
DoS 16673183 An attempt to overload a computer system's resources with the aim of preventing access to or availability of its data.
Theft 2431 A group of attacks that aim to obtain sensitive data, such as data theft and keylogging.
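
With tens of millions of flows, a CSV of this size may not fit comfortably in memory, so a class distribution like the one above is easier to compute by streaming the file row by row. The Python sketch below illustrates the idea using only the standard library. It is an assumption-laden example, not the procedure used to build the table: the function name count_classes, the column name "Attack" and the tiny in-memory sample are all hypothetical, so check the header of the downloaded file before relying on them.

```python
import csv
import io
from collections import Counter


def count_classes(lines, label_column="Attack"):
    """Count rows per class while reading one row at a time.

    `lines` is any iterable of CSV lines (an open file works).
    `label_column` is an assumed column name, not confirmed by this page.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row[label_column]] += 1
    return counts


# Tiny made-up sample standing in for the real file.
sample = io.StringIO(
    "Label,Attack\n"
    "0,Benign\n"
    "1,DoS\n"
    "1,DoS\n"
    "1,Theft\n"
)
sample_counts = count_classes(sample)
print(sample_counts)

# With a downloaded file (hypothetical path):
# with open("NF-BoT-IoT-v2.csv", newline="") as f:
#     print(count_classes(f))
```

Because only one row is held at a time, memory use stays flat regardless of how many flows the file contains.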

9. NF-CSE-CIC-IDS2018-v2

Please click here to download the dataset.

The original pcap files of the CSE-CIC-IDS2018 dataset are utilised to generate a NetFlow-based dataset called NF-CSE-CIC-IDS2018-v2. The total number of flows is 18,893,708, of which 2,258,141 (11.95%) are attack samples and 16,635,567 (88.05%) are benign. The table below shows the dataset's class distribution.

Class Count Description
Benign 16635567 Normal unmalicious flows
BruteForce 120912 A technique that aims to obtain usernames and password credentials by trying a list of predefined possibilities.
Bot 143097 An attack that enables an attacker to remotely control several hijacked computers to perform malicious activities.
DoS 483999 An attempt to overload a computer system's resources with the aim of preventing access to or availability of its data.
DDoS 1390270 An attempt similar to DoS but launched from multiple distributed sources.
Infiltration 116361 An inside attack that sends a malicious file via an email to exploit an application and is followed by a backdoor that scans the network for other vulnerabilities.
Web Attacks 3502 A group that includes SQL injections, command injections and unrestricted file uploads.

10. NF-UQ-NIDS-v2

Please click here to download the dataset.

NF-UQ-NIDS-v2 is a comprehensive dataset that merges the four v2 datasets above. It demonstrates the benefit of a shared feature set across datasets, which makes it possible to merge multiple smaller datasets. This will eventually lead to a bigger and more universal NIDS dataset containing flows from multiple network setups and different attack settings. It includes an additional label feature, identifying the original dataset of each flow. This can be used to compare the same attack scenarios conducted over two or more different testbed networks. The attack categories have been modified to combine all parent categories. Attacks named DoS attacks-Hulk, DoS attacks-SlowHTTPTest, DoS attacks-GoldenEye and DoS attacks-Slowloris have been renamed to the parent DoS category. Attacks named DDoS attack-LOIC-UDP, DDoS attack-HOIC and DDoS attacks-LOIC-HTTP have been renamed to DDoS. Attacks named FTP-BruteForce, SSH-Bruteforce, Brute Force -Web and Brute Force -XSS have been combined into a single brute-force category. Finally, SQL Injection attacks have been included in the injection attacks category. The NF-UQ-NIDS-v2 dataset has a total of 75,987,976 records, of which 25,165,295 (33.12%) are benign flows and 50,822,681 (66.88%) are attacks. The table below lists the distribution of the final attack categories.

Class Count
Benign 25165295
DDoS 21748351
Reconnaissance 2633778
Injection 684897
DoS 17875585
Brute Force 123982
Password 1153323
XSS 2455020
Infilteration 116361
Exploits 31551
Scanning 3781419
Fuzzers 22310
Backdoor 18978
Bot 143097
Generic 16560
Analysis 2299
Theft 2431
Shellcode 1427
MITM 7723
Worms 164
Ransomware 3425
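
Because NF-UQ-NIDS-v2 is a merge, the counts for categories that keep their name across datasets should equal the sum of the corresponding rows in the per-dataset tables of sections 6 to 9. The Python sketch below is an illustrative cross-check for four such categories; all numbers are copied from the tables on this page, and the variable names are hypothetical.

```python
from collections import Counter

# Per-dataset counts copied from the v2 tables in sections 6 to 9.
per_dataset = {
    "NF-UNSW-NB15-v2": Counter(
        {"DoS": 5794, "Reconnaissance": 12779, "Backdoor": 2169}
    ),
    "NF-ToN-IoT-v2": Counter(
        {"DoS": 712609, "DDoS": 2026234, "Backdoor": 16809}
    ),
    "NF-BoT-IoT-v2": Counter(
        {"DoS": 16673183, "DDoS": 18331847, "Reconnaissance": 2620999}
    ),
    "NF-CSE-CIC-IDS2018-v2": Counter({"DoS": 483999, "DDoS": 1390270}),
}

# Adding Counters sums the counts key by key.
merged = sum(per_dataset.values(), Counter())

# Matching rows from the NF-UQ-NIDS-v2 table above.
merged_table = {
    "DoS": 17875585,
    "DDoS": 21748351,
    "Reconnaissance": 2633778,
    "Backdoor": 18978,
}

for category, expected in merged_table.items():
    print(category, merged[category], merged[category] == expected)
```

Categories that were renamed during the merge, such as the brute-force and injection groups, would first need the parent-category mapping described in the text before they can be compared this way.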

11. CIC-ToN-IoT

Please click here to download the dataset.

This dataset was generated by extracting the CICFlowMeter feature set from the pcap files of the ToN-IoT dataset. The CICFlowMeter-v4 tool was utilised to extract 83 features. There are 5,351,760 data samples, of which 2,836,524 (53.00%) are attacks and 2,515,236 (47.00%) are benign.

Class Count Description
Benign 2515236 Normal unmalicious flows
Backdoor 27145 A technique that aims to attack remote-access computers by replying to specific constructed client applications.
DoS 145 An attempt to overload a computer system's resources with the aim of preventing access to or availability of its data.
DDoS 202 An attempt similar to DoS but launched from multiple distributed sources.
Injection 277696 A variety of attacks that supply untrusted inputs to alter the course of execution, with SQL and code injection being two of the main ones.
MITM 517 Man In The Middle is a method that places an attacker between a victim and the host with which the victim is trying to communicate, with the aim of intercepting traffic and communications.
Password 340208 Covers a variety of attacks aimed at retrieving passwords by either brute force or sniffing.
Ransomware 5098 An attack that encrypts the files stored on a host and asks for compensation in exchange for the decryption technique/key.
Scanning 36205 A group of techniques that aim to discover information about networks and hosts, also known as probing.
XSS 2149308 Cross-site Scripting is a type of injection in which an attacker uses web applications to send malicious scripts to end-users.

12. CIC-BoT-IoT

Please click here to download the dataset.

The CICFlowMeter-v4 tool was used to extract 83 features from the BoT-IoT dataset pcap files. The dataset contains 13,428,602 records in total, comprising 13,339,356 (99.34%) attack samples and 89,246 (0.66%) benign samples. The attack samples are made up of four attack scenarios inherited from the parent dataset, i.e., DDoS, DoS, reconnaissance, and theft.

Class Count Description
Benign 89246 Normal unmalicious flows
Reconnaissance 3514330 A technique for gathering information about a network host, also known as a probe.
DDoS 4913920 Distributed Denial of Service is an attempt similar to DoS but launched from multiple distributed sources.
DoS 4909405 An attempt to overload a computer system's resources with the aim of preventing access to or availability of its data.
Theft 1701 A group of attacks that aim to obtain sensitive data, such as data theft and keylogging.