I am looking into generating the KDD Cup 99 attributes from live traffic. I came across some papers saying that they reproduce the same attribute values using Bro-IDS. I am not sure whether all of the values can be gathered from live traffic, so I am asking whether it is possible to calculate the attributes below from live Gbit traffic.
| Num. | Name | Type | Description |
| - | - | - | - |
| 1 | duration | integer | duration of the connection |
| 2 | protocol_type | nominal | protocol type of the connection: TCP, UDP and ICMP |
| 3 | service | nominal | http, ftp, smtp, telnet… and other (if not much used service) |
| 4 | flag | nominal | connection status. The possible statuses are: SF, S0, S1, S2, S3, OTH, REJ, RSTO, RSTOS0, SH, RSTRH, SHR |
| 5 | src_bytes | integer | bytes sent in one connection |
| 6 | dst_bytes | integer | bytes received in one connection |
| 7 | land | binary | 1 if source and destination IP addresses and port numbers are equal, else 0 |
| 8 | wrong_fragment | integer | sum of bad checksum packets in a connection |
| 9 | urgent | integer | sum of urgent packets in a connection. Urgent packets are packets with the urgent bit activated |
I am not sure about wrong_fragment and the urgent packet count here. It would be great if someone could enlighten me.
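Going only by the descriptions in the table, `land` and `urgent` look like they can be computed from packet headers alone. Here is a rough Python sketch of how I read them. The `Packet` record is a hypothetical, simplified structure of my own (not a Bro type), and I assume "urgent bit" means the TCP URG flag (0x20 in the TCP flags byte).

```python
from dataclasses import dataclass
from typing import Iterable

TCP_URG = 0x20  # URG bit in the TCP flags byte


@dataclass
class Packet:
    # Hypothetical, simplified packet record (an assumption, not a Bro type).
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    tcp_flags: int


def land(first: Packet) -> int:
    """Return 1 if source and destination address and port are equal, else 0."""
    return int(first.src_ip == first.dst_ip and first.src_port == first.dst_port)


def urgent(packets: Iterable[Packet]) -> int:
    """Count the packets of one connection that have the URG bit set."""
    return sum(1 for p in packets if p.tcp_flags & TCP_URG)
```

Both functions work on the packets of a single connection, so grouping packets into connections still has to happen somewhere else.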
| Num. | Name | Type | Description |
| - | - | - | - |
| 10 | hot | integer | sum of hot actions in a connection such as: entering a system directory, creating programs and executing programs |
| 11 | num_failed_logins | integer | number of incorrect logins in a connection |
| 12 | logged_in | integer | if the login is correct then 1 else 0 |
| 13 | num_compromised | integer | number of times a “not found” error appears in a connection |
| 14 | root_shell | integer | if the root gets the shell then 1 else 0 |
| 15 | su_attempted | integer | if the su command has been used then 1 else 0 |
| 16 | num_root | integer | sum of operations performed as root in a connection |
| 17 | num_file_creations | integer | sum of file creations in a connection |
| 18 | num_shells | integer | number of logins of normal users |
| 19 | num_access_files | integer | sum of operations in control files in a connection |
| 20 | num_outbound_cmds | integer | sum of outbound commands in an ftp session |
| 21 | is_hot_login | integer | if the user is accessing as root or adm |
| 22 | is_guest_login | integer | if the user is accessing as guest, anonymous or visitor |
It seems these attributes require payload analysis. I am not sure whether Bro can detect some of them with its default rules or whether I will need to write custom ones.
| Num. | Name | Type | Description |
| - | - | - | - |
| 23 | count | integer | sum of connections to the same destination IP address |
| 24 | srv_count | integer | sum of connections to the same destination port number |
| 25 | serror_rate | real | the percentage of connections that have activated the flag (4) S0, S1, S2 or S3, among the connections aggregated in count (23) |
| 26 | srv_serror_rate | real | the percentage of connections that have activated the flag (4) S0, S1, S2 or S3, among the connections aggregated in srv_count (24) |
| 27 | rerror_rate | real | the percentage of connections that have activated the flag (4) REJ, among the connections aggregated in count (23) |
| 28 | srv_rerror_rate | real | the percentage of connections that have activated the flag (4) REJ, among the connections aggregated in srv_count (24) |
| 29 | same_srv_rate | real | the percentage of connections that were to the same service, among the connections aggregated in count (23) |
| 30 | diff_srv_rate | real | the percentage of connections that were to different services, among the connections aggregated in count (23) |
| 31 | srv_diff_host_rate | real | the percentage of connections that were to different destination machines among the connections aggregated in srv_count (24) |
These are totally ambiguous to me. I think I will need an extra step to handle some of the results, but I would rather wait for someone to guide me first.
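For what it is worth, here is my tentative reading of the table, written out as a rough Python sketch so that someone can correct it. The `Conn` record, the `traffic_features` helper and the two-second `window` are my own assumptions (the table does not say over what period connections are aggregated), and I follow the table in matching on destination port for `srv_count`. Rates are returned as fractions between 0 and 1, and the key for attribute 28 uses the KDD Cup 99 name `srv_rerror_rate`.

```python
from dataclasses import dataclass
from typing import Dict, List

S_ERROR_FLAGS = {"S0", "S1", "S2", "S3"}


@dataclass
class Conn:
    # Hypothetical connection summary, e.g. parsed from a connection log.
    ts: float       # start time in seconds
    dst_ip: str
    dst_port: int
    service: str
    flag: str       # attribute 4, e.g. "SF", "S0", "REJ"


def rate(part: int, whole: int) -> float:
    """Fraction part/whole, or 0.0 when there is nothing to aggregate."""
    return part / whole if whole else 0.0


def traffic_features(current: Conn, history: List[Conn],
                     window: float = 2.0) -> Dict[str, float]:
    """Attributes 23-31 for `current`, over earlier connections in the window."""
    # Assumption: only connections started within `window` seconds count.
    recent = [c for c in history if 0 <= current.ts - c.ts <= window]
    same_host = [c for c in recent if c.dst_ip == current.dst_ip]
    same_srv = [c for c in recent if c.dst_port == current.dst_port]
    count, srv_count = len(same_host), len(same_srv)
    return {
        "count": count,
        "srv_count": srv_count,
        "serror_rate": rate(sum(c.flag in S_ERROR_FLAGS for c in same_host), count),
        "srv_serror_rate": rate(sum(c.flag in S_ERROR_FLAGS for c in same_srv), srv_count),
        "rerror_rate": rate(sum(c.flag == "REJ" for c in same_host), count),
        "srv_rerror_rate": rate(sum(c.flag == "REJ" for c in same_srv), srv_count),
        "same_srv_rate": rate(sum(c.service == current.service for c in same_host), count),
        "diff_srv_rate": rate(sum(c.service != current.service for c in same_host), count),
        "srv_diff_host_rate": rate(sum(c.dst_ip != current.dst_ip for c in same_srv), srv_count),
    }
```

Written like this, each record only depends on a short history of finished connections, so the values could be computed in a post-processing step over stored connection summaries rather than inside the packet path.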
So, is Bro-IDS enough to calculate the above attributes from live traffic somehow, perhaps by saving some attributes to a DB and then reprocessing them? Any guidance will be appreciated. What I am trying to do is recreate these attributes for real traffic and test my algorithm on an up-to-date dataset.