Sanity check - Grabbing platform tokens from browser user agents (was p0f)

After being asked if Bro could be used to gather passive intelligence on OS usage, I started investigating data sources that could identify the OS. I was initially looking into p0f, and Seth showed me a way to invoke the existing p0f fingerprinting functionality within Bro, but he also suggested a slew of other data sources to look at. I wasn't terribly excited by the p0f fingerprint output, and while browser user agents may not be the best data source, I decided to start by looking at platform tokens and reporting on those instead of the p0f data. This is my first-ish Bro script and it is by no means complete (it only matches a handful of Windows OS versions). I'm wondering if folks see anything in the attached that would misbehave badly if used on live traffic instead of pcaps?

Regards,

browser-platform.bro (1.84 KB)

I've tried the below script now on a Bro 2.2 (release version) cluster watching about 8Gbps of traffic and it seems to do what I intended which is to create a separate log file that keeps track of IP to OS mappings, for a handful of Windows Desktop OS versions, as pulled from user agent strings sent over HTTP. In the first full day in production I successfully logged around 227,000 unique IP to OS mappings. I have since slightly modified it to only log IPs represented in local nets to reduce log volume as I'm mostly interested in my own networks.

One issue I'm running into is in keeping track of IP to OS mappings and only logging them once per day. I've set an expire timer of 1 day, but in production it seems to only keep track of those IPs for the duration of a log rotation interval, which is set to 20 minutes. I have observed that without the expire timer each mapping logs only once and is never logged again, so the timer appears to be used in some way, but it seems to be tied to log rotation instead of the explicit value in the script. I'm guessing I need to do something different, but I'm not sure what. Thoughts? Script is pasted below.

================= Begin Script ==============
@load base/utils/site

module BrowserPlatform;

export
{
     # The fully resolved name for this log will be BrowserPlatform::LOG
     redef enum Log::ID += { LOG };

     type Info: record {
         ts: time &log &optional;
         uid: string &log &optional;
         host: addr &log &optional;
         platform_token: string &log &optional;
         unparsed_version: string &log &optional;
     };

     # A set of seen IP + OS combinations. Used to prevent logging the same combo repeatedly.
     global seen_browser_platforms: set[string] &create_expire = 1.0 day &synchronized &redef;
}

event bro_init() &priority=5
     {
     Log::create_stream(BrowserPlatform::LOG,[$columns=Info]);
     }

event http_header(c: connection, is_orig: bool, name: string, value: string)
{
     local platform = "Unknown OS";
     if ( is_orig && Site::is_local_addr(c$id$orig_h) )
         {
         # Dots are escaped so they match a literal "." rather than any character.
         if ( name == "USER-AGENT" && /Windows NT 5\.1/ in value )
             {
             platform = "Windows XP";
             }
         else if ( name == "USER-AGENT" && /Windows NT 6\.0/ in value )
             {
             platform = "Windows Vista";
             }
         else if ( name == "USER-AGENT" && /Windows NT 6\.1/ in value )
             {
             platform = "Windows 7";
             }
         else if ( name == "USER-AGENT" && /Windows NT 6\.2/ in value )
             {
             platform = "Windows 8";
             }
         else if ( name == "USER-AGENT" && /Windows NT 6\.3/ in value )
             {
             platform = "Windows 8.1";
             }
         }
     local saw = cat(c$id$orig_h,platform); #There is probably a less ugly way to do this than cat, but it seems to work
     if ( platform != "Unknown OS" && saw !in seen_browser_platforms )
         {
         local rec: BrowserPlatform::Info = [$ts=network_time(), $uid=c$uid, $host=c$id$orig_h, $platform_token=platform, $unparsed_version=value];
         Log::write(BrowserPlatform::LOG, rec);
         add seen_browser_platforms[saw];
         }
}

================ End script ==================
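On the "less ugly way than cat" comment in the script: Bro sets can be indexed by more than one type, so the host and platform can be used as the key directly. The following is only a sketch; `seen_pairs` and `first_sighting` are hypothetical names that are not part of the script above, and it does not by itself change how `&create_expire` behaves on a cluster.

```bro
module BrowserPlatform;

export {
    # Hypothetical replacement for seen_browser_platforms, keyed on
    # (host, platform) instead of a concatenated string.
    global seen_pairs: set[addr, string] &create_expire = 1 day &redef;
}

# Hypothetical helper: returns T the first time a (host, platform) pair is
# seen and records it; returns F while the pair is still in the set.
function first_sighting(host: addr, platform: string): bool
    {
    if ( [host, platform] in seen_pairs )
        return F;

    add seen_pairs[host, platform];
    return T;
    }
```

The handler would then call something like first_sighting(c$id$orig_h, platform) before Log::write instead of building the saw string.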

Thanks,

Gary Faulkner
UW Madison
Office of Campus Information Security

..

Modifying the http_header event handler as follows will increase performance:

event http_header(c: connection, is_orig: bool, name: string, value: string)
{
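    # Bail out early for anything that is not a client User-Agent header.
    if ( !is_orig || name != "USER-AGENT" )
        return;

    local platform = "Unknown OS";
    if ( /Windows NT 5\.1/ in value )
        platform = "Windows XP";
    else if ...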

FWIW, I used to do this kind of thing outside of bro using splunk:

https://github.com/JustinAzoff/splunk-scripts/blob/master/ua2os.py

One thing you may want to do is rather than use the http_header event
use

event Software::log_software(rec: Software::Info)
{
    ...
}

which will be raised every time a new software version is seen. The
software framework is already pulling most of the info out that you
might need, so you can piggy back on the work that it is doing.
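A rough sketch of what that could look like, assuming the Bro 2.2 software framework with protocols/http/software loaded (which is what defines HTTP::BROWSER). The field names are those of Software::Info as I understand them in 2.2, and the single Windows 7 check is just a placeholder for the real matching rules:

```bro
@load base/frameworks/software
@load protocols/http/software

event Software::log_software(rec: Software::Info)
    {
    # Only look at browser entries reported by the HTTP software script.
    if ( rec$software_type != HTTP::BROWSER )
        return;

    # For browsers, unparsed_version is expected to hold the raw
    # User-Agent string; skip the record if it is missing.
    if ( ! rec?$unparsed_version )
        return;

    # Placeholder rule: the full set of platform checks would go here.
    if ( /Windows NT 6\.1/ in rec$unparsed_version )
        print fmt("%s looks like Windows 7", rec$host);
    }
```

Because the event is tied to the software log, any duplicate suppression the framework already does applies before this handler ever runs.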

Thanks for the suggestions, that cleans that bit up quite nicely. I actually started by trying to deconstruct the various software.bro scripts and work my way backwards through the framework to see what was doing what. I'm still trying to navigate my way through that code, but I agree that it would make more sense to leverage it directly than create a derivative just to pull out a specific bit of the data. I'm not currently running Splunk in any production sense, but that is pretty much what I'm trying to do in Bro. Thanks for sharing it!

Regards,
Gary

After running various iterations of the original script against several pcaps of our local traffic (and a couple days of live traffic) I ended up finding a lot of user agents that would match against the desktop/server OS rules, but were not necessarily desktops or servers. I ended up adding to the matching rules in part to parse out these things and also to detect other things we were interested in. Checking for more things seems to incur a performance penalty, so I also made some effort to move some of the more common matches sooner in the if/else statements to avoid having to check all of the less likely items first. The create_expire statement still doesn't behave as I expected, as each match is logged once per log rotation as opposed to once per day, but the matching seems to work with the exception that it doesn't check for every possible user agent case. I may also be missing explicitly including scripts that are already commonly loaded.

======================== Begin Script ========================
@load base/utils/site

module BrowserPlatform;

export
{
     # The fully resolved name for this log will be BrowserPlatform::LOG
     redef enum Log::ID += { LOG };

     type Info: record {
         ts: time &log &optional;
         uid: string &log &optional;
         host: addr &log &optional;
         platform_token: string &log &optional;
         unparsed_version: string &log &optional;
     };

     # A set of seen IP + OS combinations. Used to prevent logging the same combo repeatedly.
     global seen_browser_platforms: set[string] &create_expire = 1.0 day &synchronized &redef;
}

event bro_init() &priority=5
     {
     Log::create_stream(BrowserPlatform::LOG,[$columns=Info]);
     }

event http_header(c: connection, is_orig: bool, name: string, value: string)
{
     local platform = "Unknown OS";
     if (!is_orig || name != "USER-AGENT" || !Site::is_local_addr(c$id$orig_h))
         return;

# Parse out Apple iOS and Android variants first, as some apps will display as compatible with a desktop OS version

     if ( /iPhone/ in value )
         platform = "iPhone";
     else if ( /iPad/ in value )
         platform = "iPad";
     else if ( /iPod/ in value )
         platform = "iPod";
     else if ( /Android/ in value )
         platform = "Android";

# Once we've parsed out mobiles move onto desktop/server OS
# User agents listed in order of expected use or to pre-parse user-agents that might otherwise match multiple rules.

     else if ( /Windows/ in value )
         {
         # Dots in version numbers are escaped so they match a literal "." only.
         if ( /Xbox/ in value ) # often includes a Windows OS version or identifies as a Mobile browser
             platform = "Xbox";
         else if ( /Phone/ in value || /Mobile/ in value ) # often includes Windows OS version
             platform = "Windows Phone";
         else if ( /Windows NT 6\.1/ in value )
             platform = "Windows 7";
         else if ( /Windows NT 5\.1/ in value )
             platform = "Windows XP";
         else if ( /Windows NT 5\.2/ in value && /WOW64/ in value )
             platform = "Windows XP x64";
         else if ( /Windows NT 6\.0/ in value )
             platform = "Windows Vista";
         else if ( /Windows NT 6\.2/ in value )
             platform = "Windows 8";
         else if ( /Windows NT 6\.3/ in value )
             platform = "Windows 8.1";
         else if ( /Windows 95/ in value )
             platform = "Windows 95";
         else if ( /Windows 98/ in value && /4\.90/ !in value )
             platform = "Windows 98";
         else if ( /Win 9x 4\.90/ in value )
             platform = "Windows Me";
         else if ( /Windows NT 4\.0/ in value )
             platform = "Windows NT 4.0";
         else if ( /Windows NT 5\.0/ in value || /Windows 2000/ in value )
             platform = "Windows 2000";
         # Catch-all for identifying less common user-agents. Can be noisy.
         # else
         #     platform = "Windows Other";
         }
     else if ( /Mac OS X/ in value )
         {
         # User agents write the version with either "_" or "." as the separator.
         if ( /Mac OS X 10_9/ in value || /Mac OS X 10\.9/ in value )
             platform = "Mac OS X 10.9";
         else if ( /Mac OS X 10_8/ in value || /Mac OS X 10\.8/ in value )
             platform = "Mac OS X 10.8";
         else if ( /Mac OS X 10_7/ in value || /Mac OS X 10\.7/ in value )
             platform = "Mac OS X 10.7";
         else if ( /Mac OS X 10_6/ in value || /Mac OS X 10\.6/ in value )
             platform = "Mac OS X 10.6";
         else if ( /Mac OS X 10_5/ in value || /Mac OS X 10\.5/ in value )
             platform = "Mac OS X 10.5";
         else if ( /Mac OS X 10_4/ in value || /Mac OS X 10\.4/ in value )
             platform = "Mac OS X 10.4";
         # Catch-all for identifying less common user-agents. Can be noisy.
         # else
         #     platform = "Mac OS X Other";
         }
     else if ( /Linux/ in value )
         platform = "Linux";

# Check to see if IP+OS combo already logged and if not log it and add it to the list of tracked combos.

     local saw = cat(c$id$orig_h,platform); #There is probably a less ugly way to do this than cat, but it seems to work
     if ( platform != "Unknown OS" && saw !in seen_browser_platforms )
         {
         local rec: BrowserPlatform::Info = [$ts=network_time(), $uid=c$uid, $host=c$id$orig_h, $platform_token=platform, $unparsed_version=value];
         Log::write(BrowserPlatform::LOG, rec);
         add seen_browser_platforms[saw];
         }
}

======================== End Script ========================

Gary,

This looks very nice. I’m curious if you had any more updates or improvements for this?

I haven't updated it from this point yet, as I've been struggling with hooking into the existing software logging and with keeping track of state to prevent or reduce duplicate log entries. There were also some performance concerns raised, so I've been hesitant to post any in-progress work that might inadvertently cause someone else grief.

My main observation after running the script continuously for the last month is that it probably needs the ability to exclude specific subnets. An example might be wireless networks, which may have high client IP churn and short DHCP lease times, are more likely to have mobile devices with apps that have ugly user-agents, and are generally likely to provide unreliable data.
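Something along these lines is what I have in mind for the exclusion, as an untested sketch. `ignored_subnets` and `should_track` are hypothetical names that are not in the script posted earlier, and the subnet in the redef comment is only a placeholder:

```bro
@load base/utils/site

module BrowserPlatform;

export {
    # Hypothetical option: subnets whose hosts should never be logged.
    # Empty by default; meant to be extended from local.bro.
    const ignored_subnets: set[subnet] &redef;
}

# Hypothetical helper: T only for local hosts outside the ignored subnets.
# An addr can be tested against a set[subnet] directly, the same way
# Site::is_local_addr checks Site::local_nets.
function should_track(host: addr): bool
    {
    return Site::is_local_addr(host) && host !in ignored_subnets;
    }

# Example use from local.bro (placeholder subnet):
# redef BrowserPlatform::ignored_subnets += { 10.20.0.0/16 };
```

The Site::is_local_addr(c$id$orig_h) test in the http_header handler would then be replaced by should_track(c$id$orig_h).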

Regards,
Gary