another kdd cup question

I have been looking into generating the KDD Cup 99 attributes from live traffic. I came across some papers that claim to reproduce the same attribute values using Bro-IDS. I am not sure whether all of the values can be gathered from live traffic, so I am asking whether it is possible to calculate the attributes below from live Gbit traffic.



| Num. | Name | Type | Description |
| - | - | - | - |
| 1 | duration | integer | duration of the connection |
| 2 | protocol_type | nominal | protocol type of the connection: TCP, UDP and ICMP |
| 3 | service | nominal | http, ftp, smtp, telnet… and other (if not much used service) |
| 4 | flag | nominal | connection status. The possible statuses are: SF, S0, S1, S2, S3, OTH, REJ, RSTO, RSTOS0, SH, RSTRH, SHR |
| 5 | src_bytes | integer | bytes sent in one connection |
| 6 | dst_bytes | integer | bytes received in one connection |
| 7 | land | binary | if source and destination IP addresses and port numbers are equal then this variable takes value 1 else 0 |
| 8 | wrong_fragment | integer | sum of bad checksum packets in a connection |
| 9 | urgent | integer | sum of urgent packets in a connection. Urgent packets are packets with the urgent bit activated |

I am not sure about the wrong_fragment and urgent packet counts here. It would be great if someone could enlighten me.
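To make attributes 1-7 concrete, here is a minimal sketch of how they might be derived from one connection record. It assumes the records come from Bro's conn.log, already parsed into dictionaries keyed by the usual conn.log field names (`id.orig_h`, `proto`, `conn_state`, `orig_bytes`, ...). The helper name `basic_features` and the mapping of an unset service to `other` are my own assumptions, and wrong_fragment and urgent are left out because they need per-packet information.

```python
def basic_features(rec):
    """Map one conn.log-style record (a dict) to KDD attributes 1-7."""

    def num(value):
        # Bro writes "-" for unset fields; treat those as zero (assumption).
        return 0.0 if value in (None, "-", "") else float(value)

    service = rec.get("service")
    if service in (None, "-", ""):
        service = "other"  # assumption: unknown services collapse to "other"

    # land: source and destination address and port are identical.
    land = int(rec["id.orig_h"] == rec["id.resp_h"]
               and rec["id.orig_p"] == rec["id.resp_p"])

    return {
        "duration": int(num(rec.get("duration"))),
        "protocol_type": rec["proto"],
        "service": service,
        "flag": rec["conn_state"],  # e.g. SF, S0, REJ
        "src_bytes": int(num(rec.get("orig_bytes"))),
        "dst_bytes": int(num(rec.get("resp_bytes"))),
        "land": land,
    }


# Hypothetical record, shaped like a parsed conn.log line.
example = {
    "id.orig_h": "10.0.0.5", "id.orig_p": 51234,
    "id.resp_h": "10.0.0.9", "id.resp_p": 80,
    "proto": "tcp", "service": "http", "duration": "2.5",
    "orig_bytes": "310", "resp_bytes": "-", "conn_state": "SF",
}
features = basic_features(example)
```

Truncating the duration to whole seconds follows the integer type given in the table; keep the float if sub-second resolution matters to you.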



| Num. | Name | Type | Description |
| - | - | - | - |
| 10 | hot | integer | sum of hot actions in a connection such as: entering a system directory, creating programs and executing programs |
| 11 | num_failed_logins | integer | number of incorrect logins in a connection |
| 12 | logged_in | integer | if the login is correct then 1 else 0 |
| 13 | num_compromised | integer | sum of times appearance “not found” error in a connection |
| 14 | root_shell | integer | if the root gets the shell then 1 else 0 |
| 15 | su_attempted | integer | if the su command has been used then 1 else 0 |
| 16 | num_root | integer | sum of operations performed as root in a connection |
| 17 | num_file_creations | integer | sum of file creations in a connection |
| 18 | num_shells | integer | number of logins of normal users |
| 19 | num_access_files | integer | sum of operations in control files in a connection |
| 20 | num_outbound_cmds | integer | sum of outbound commands in a ftp session |
| 21 | is_hot_login | integer | if the user is accessing as root or adm |
| 22 | is_guest_login | integer | if the user is accessing as guest, anonymous or visitor |

It seems these attributes require payload analysis. I am not sure whether Bro is able to detect some of them with its default rules or whether I will need to write some custom ones.



| Num. | Name | Type | Description |
| - | - | - | - |
| 23 | count | integer | sum of connections to the same destination IP address |
| 24 | srv_count | integer | sum of connections to the same destination port number |
| 25 | serror_rate | real | the percentage of connections that have activated the flag (4) s0, s1, s2 or s3, among the connections aggregated in count (23) |
| 26 | srv_serror_rate | real | the percentage of connections that have activated the flag (4) s0, s1, s2 or s3, among the connections aggregated in srv_count (24) |
| 27 | rerror_rate | real | the percentage of connections that have activated the flag (4) REJ, among the connections aggregated in count (23) |
| 28 | srv_rerror_rate | real | the percentage of connections that have activated the flag (4) REJ, among the connections aggregated in srv_count (24) |
| 29 | same_srv_rate | real | the percentage of connections that were to the same service, among the connections aggregated in count (23) |
| 30 | diff_srv_rate | real | the percentage of connections that were to different services, among the connections aggregated in count (23) |
| 31 | srv_diff_host_rate | real | the percentage of connections that were to different destination machines among the connections aggregated in srv_count (24) |

These are quite ambiguous to me. I think I will need some extra processing to produce some of these results, but I would like to wait for some guidance first.
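To pin down what I mean by attributes 23-31, here is a minimal sketch of how they might be computed from a time-ordered list of connection records. Two things are assumptions on my part: the 2-second look-back window (taken from the original KDD Cup 99 feature definitions, the table above does not state it), and grouping srv_count by service name rather than by raw port number. The function name `traffic_features` and the record keys `ts`, `dst`, `service`, `flag` are hypothetical. The current connection is counted in its own window, so no denominator is ever zero.

```python
from collections import deque

WINDOW = 2.0  # seconds; assumption based on the original KDD definitions
SYN_ERRORS = {"S0", "S1", "S2", "S3"}


def fraction(items, predicate):
    """Share of items for which predicate holds (items is never empty here)."""
    return sum(1 for item in items if predicate(item)) / len(items)


def traffic_features(conns, window=WINDOW):
    """conns: dicts with ts, dst, service, flag, sorted by ts."""
    recent = deque()
    rows = []
    for cur in conns:
        # Drop connections that fell out of the look-back window.
        while recent and cur["ts"] - recent[0]["ts"] > window:
            recent.popleft()
        recent.append(cur)  # the current connection counts itself

        same_host = [c for c in recent if c["dst"] == cur["dst"]]
        same_srv = [c for c in recent if c["service"] == cur["service"]]

        same_srv_rate = fraction(same_host, lambda c: c["service"] == cur["service"])
        rows.append({
            "count": len(same_host),
            "srv_count": len(same_srv),
            "serror_rate": fraction(same_host, lambda c: c["flag"] in SYN_ERRORS),
            "srv_serror_rate": fraction(same_srv, lambda c: c["flag"] in SYN_ERRORS),
            "rerror_rate": fraction(same_host, lambda c: c["flag"] == "REJ"),
            "srv_rerror_rate": fraction(same_srv, lambda c: c["flag"] == "REJ"),
            "same_srv_rate": same_srv_rate,
            "diff_srv_rate": 1.0 - same_srv_rate,
            "srv_diff_host_rate": fraction(same_srv, lambda c: c["dst"] != cur["dst"]),
        })
    return rows


# Hypothetical connections: three inside one window, one much later.
conns = [
    {"ts": 0.0, "dst": "A", "service": "http", "flag": "SF"},
    {"ts": 0.5, "dst": "A", "service": "http", "flag": "S0"},
    {"ts": 1.0, "dst": "B", "service": "http", "flag": "REJ"},
    {"ts": 5.0, "dst": "A", "service": "ftp", "flag": "SF"},
]
rows = traffic_features(conns)
```

The list comprehensions rescan the whole window for every connection, which is fine for a sketch but would likely need per-host and per-service counters at Gbit rates.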

So, is Bro-IDS enough to calculate the above attributes from live traffic somehow, for example by saving some attributes to a DB and then reprocessing them? Any guidance will be appreciated. What I am trying to do is recreate these attributes for real traffic and test my algorithm against an up-to-date dataset.

Hi:

A number of your items (specifically #10-22) appear to require inspection inside interactive sessions, which, unless the connection is cleartext, are not accessible to a network-level monitor. This opacity to inspection, and the security gained as a result, is a major benefit of modern session tools, of which the standard is ssh.

If you have access to the systems you wish to monitor, you can install Instrumented SSHd, which will send a clear-text stream of the session to a bro monitor for inspection. See: https://code.google.com/p/auditing-sshd/

Some of the information you want might also be logged via syslog, such as authentication events.
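As a rough illustration of the syslog route, here is a minimal sketch that derives a few of the login-related attributes from sshd authentication messages. It assumes the common OpenSSH "Failed password for ..." and "Accepted ... for ..." message wording, and the helper name `login_features` is hypothetical. Tying the syslog lines back to a specific network connection (for example by source address and port) is a separate step that this sketch does not attempt.

```python
import re

# Patterns for the usual OpenSSH auth messages (assumed format).
FAILED = re.compile(r"sshd\[\d+\]: Failed password for (?:invalid user )?(\S+) from (\S+)")
ACCEPTED = re.compile(r"sshd\[\d+\]: Accepted \S+ for (\S+) from (\S+)")

HOT_USERS = {"root", "adm"}
GUEST_USERS = {"guest", "anonymous", "visitor"}


def login_features(lines):
    """Count failed logins and note who, if anyone, logged in."""
    failed = 0
    user = None
    for line in lines:
        if FAILED.search(line):
            failed += 1
            continue
        match = ACCEPTED.search(line)
        if match:
            user = match.group(1)
    return {
        "num_failed_logins": failed,
        "logged_in": int(user is not None),
        "is_hot_login": int(user in HOT_USERS),
        "is_guest_login": int(user in GUEST_USERS),
    }


# Hypothetical syslog lines for one session.
sample = [
    "Jan  1 10:00:01 host sshd[1234]: Failed password for invalid user bob from 10.0.0.5 port 4242 ssh2",
    "Jan  1 10:00:04 host sshd[1234]: Failed password for root from 10.0.0.5 port 4242 ssh2",
    "Jan  1 10:00:09 host sshd[1234]: Accepted password for root from 10.0.0.5 port 4242 ssh2",
]
result = login_features(sample)
```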

Hope this helps.

Jim Mellander
NERSC Cybersecurity

Hi,

Very informative, thank you. How about the sum of bad checksum packets in a
connection and the sum of urgent packets in a connection? Does Bro expose this
packet-level information as well, or should I write some custom handlers for it?
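To show what I mean by these two packet-level counts, here is a minimal sketch that works on raw header bytes, independent of Bro. It assumes the IPv4 and TCP headers of a connection have already been captured as byte strings; the function names are hypothetical. The URG bit is 0x20 in the TCP flags byte (offset 13), and an IPv4 header is valid when the one's-complement sum of its 16-bit words folds to 0xFFFF.

```python
import struct

URG = 0x20  # URG bit within the TCP flags byte


def count_urgent(tcp_headers):
    """Number of TCP headers with the URG flag set."""
    return sum(1 for hdr in tcp_headers if len(hdr) >= 20 and hdr[13] & URG)


def ipv4_checksum_ok(ip_header):
    """True when the one's-complement sum over the header is 0xFFFF."""
    if len(ip_header) % 2:
        ip_header += b"\x00"
    words = struct.unpack("!%dH" % (len(ip_header) // 2), ip_header)
    total = sum(words)
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


def make_tcp_header(flags):
    """Build a bare 20-byte TCP header for testing (made-up ports)."""
    return struct.pack("!HHIIBBHHH", 1234, 80, 0, 0, 5 << 4, flags, 65535, 0, 0)


# PSH|ACK, URG|PSH|ACK, and URG alone: two of the three carry URG.
headers = [make_tcp_header(0x18), make_tcp_header(0x38), make_tcp_header(0x20)]
urgent = count_urgent(headers)

# A valid 20-byte IPv4 header, and the same header with one byte flipped.
good = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
bad = good[:8] + b"\x41" + good[9:]
bad_checksums = sum(1 for hdr in (good, bad) if not ipv4_checksum_ok(hdr))
```

Note that checksums seen at a monitor can look wrong simply because of checksum offloading on the capturing host, so such a count should be interpreted with care.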