Better Handling of User Agents in Software Framework

Vlad_Grigorescu1 · December 14, 2015, 3:51pm

I’m not thrilled with how user agents are being handled right now, and I’m curious to get some thoughts. Take, for example, the Safari user-agent string:

Safari/11601.3.9 CFNetwork/760.2.6 Darwin/15.2.0 (x86_64)

Right now, this gets parsed as:

name=Safari,
version=[
major=11601,
minor=3,
minor2=9,
minor3=,
addl=CFNetwork/760
],
unparsed_version=Safari/11601.3.9 CFNetwork/760.2.6 Darwin/15.2.0 (x86_64)

RFC 7231 says:

“The User-Agent field-value consists of one or more product identifiers, each followed by zero or more comments (Section 3.2 of [RFC7230]), which together identify the user agent software and its significant subproducts.”

What I would like to see is this user-agent generate three separate entries in software.log:

Safari 11601.3.9
CFNetwork 760.2.6
Darwin 15.2.0 (x86_64)

I think this is a better representation of the software that’s actually running on the machine (they’re running this version of Safari, this version of the CFNetwork library, and this version of the Darwin kernel).
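To make the proposed split concrete, here is a rough sketch in Python rather than Zeek script. It is not the existing Software::parse logic; the name `split_products` and the regular expression are hypothetical, and it assumes comments are parenthesised and not nested.

```python
import re

# One product token (name[/version]) followed by zero or more
# parenthesised comments. Assumption: comments do not nest.
PRODUCT_RE = re.compile(
    r"(?P<name>[^\s/()]+)"               # product name
    r"(?:/(?P<version>[^\s()]+))?"       # optional "/version"
    r"(?P<comments>(?:\s*\([^()]*\))*)"  # zero or more "(comment)" groups
)


def split_products(header):
    """Split a User-Agent or Server value into (name, version, comments) tuples."""
    entries = []
    for match in PRODUCT_RE.finditer(header):
        # Pull the text out of each "(...)" group that followed this product.
        comments = re.findall(r"\(([^()]*)\)", match.group("comments"))
        entries.append((match.group("name"), match.group("version"), comments))
    return entries


if __name__ == "__main__":
    ua = "Safari/11601.3.9 CFNetwork/760.2.6 Darwin/15.2.0 (x86_64)"
    for name, version, comments in split_products(ua):
        print(name, version, comments)
```

Each tuple would then become its own software.log entry, with the comment attached to the product it follows.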

Taking this to the server-side, given:

Apache/2.2.25 (Unix) mod_ssl/2.2.25 OpenSSL/0.9.8j-fips mod_auth_kerb/5.4 PHP/5.4.13

I’d like to see:

Apache/2.2.25 (Unix)
mod_ssl/2.2.25
OpenSSL/0.9.8j-fips
mod_auth_kerb/5.4
PHP/5.4.13

All of those are pieces of software running on that system. Keeping them lumped together as a single user-agent string is an HTTP construct, which I don’t feel belongs in software.log. Another warning sign that this area could use some work is the comment above Software::parse:

Don’t even try to understand this now, just make sure the tests are working.

Curious to hear thoughts on this.

–Vlad

Seth_Hall3 · December 14, 2015, 9:24pm

I think your proposal sounds reasonable. I’d go ahead and implement it, then see how it behaves in overload situations, since I can easily see the amount of software being tracked getting out of hand quickly. After it’s implemented, get it running on several networks that are willing to try it and see if it causes problems for them.

This could also be a good time to implement better handling around software tracking, to avoid obvious DoS issues where crafted traffic forces a lot of state to be tracked.
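One way to bound that state is a per-host cap on tracked entries. The sketch below is in Python, not Zeek script, and does not describe how the software framework works today; `SoftwareTracker`, `observe`, and the limit of 50 are all hypothetical.

```python
from collections import defaultdict

MAX_PER_HOST = 50  # hypothetical limit; would need tuning on real traffic


class SoftwareTracker:
    """Track (name, version) pairs per host, refusing new ones past a cap."""

    def __init__(self, max_per_host=MAX_PER_HOST):
        self.max_per_host = max_per_host
        self.seen = defaultdict(set)     # host -> {(name, version), ...}
        self.dropped = defaultdict(int)  # host -> count of refused entries

    def observe(self, host, name, version):
        """Return True if the entry is newly tracked, False otherwise."""
        entries = self.seen[host]
        key = (name, version)
        if key in entries:
            return False  # already known, no new state
        if len(entries) >= self.max_per_host:
            self.dropped[host] += 1  # count it so the overflow can be reported
            return False
        entries.add(key)
        return True
```

Counting the refused entries rather than silently discarding them leaves room to log when a host hits the cap, which is itself a useful signal.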

.Seth

Vlad_Grigorescu1 · December 15, 2015, 3:18pm

The other question I was wondering about is: should this be a BIF? Software::parse is a rather lengthy function with a lot of string manipulation, and it gets called rather frequently. I suspect there’d be some performance improvement from implementing this directly as a BIF.

Seth_Hall3 · December 15, 2015, 3:23pm

Ah, possibly. It probably would make sense to measure that first somehow.

.Seth

robin · December 15, 2015, 4:23pm

Agree. Would be good to keep it in script-land unless it indeed has a
substantial impact (and if so, maybe there are some optimizations to
short-cut common cases or so).

Robin

Seth_Hall3 · December 15, 2015, 4:39pm

Yep, something along these lines was/is implemented in the core, but that just made it difficult to understand and make changes to.

.Seth

Vern · December 16, 2015, 4:14am

... (and if so, maybe there are some optimizations to
short-cut common cases or so)

(... and/or: a few key BiFs to add that don't bite off the whole task
but accelerate some particular processing)
