Better Handling of User Agents in the Software Framework

I’m not thrilled with how user agents are being handled right now, and I’m curious to get some thoughts. Take, for example, the Safari user-agent string:

Safari/11601.3.9 CFNetwork/760.2.6 Darwin/15.2.0 (x86_64)

Right now, this gets parsed as:

name=Safari,
version=[
    major=11601,
    minor=3,
    minor2=9,
    minor3=,
    addl=CFNetwork/760
],
unparsed_version=Safari/11601.3.9 CFNetwork/760.2.6 Darwin/15.2.0 (x86_64)

RFC 7231 says:

“The User-Agent field-value consists of one or more product identifiers, each followed by zero or more comments (Section 3.2 of [RFC7230]), which together identify the user agent software and its significant subproducts.”

What I would like to see is for this user-agent to generate three separate entries in software.log:

Safari 11601.3.9
CFNetwork 760.2.6
Darwin 15.2.0 (x86_64)

I think this is a better representation of the software that’s actually running on the machine (they’re running this version of Safari, this version of the CFNetwork library, and this version of the Darwin kernel).
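The split described above follows the RFC 7231 shape of a User-Agent value: whitespace-separated products of the form `name/version`, each optionally followed by parenthesized comments. Below is a minimal Python sketch of that tokenization, not the framework's Software::parse. The names `Product`, `split_products` and `render` are hypothetical, and attaching each comment to the product immediately before it is an assumption that happens to reproduce the three entries listed above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Product:
    """One product identifier from a User-Agent value (hypothetical type)."""
    name: str
    version: Optional[str] = None
    comments: List[str] = field(default_factory=list)


def _read_comment(s: str, i: int) -> Tuple[str, int]:
    """Read a comment starting at s[i] == '('.

    Handles nested parentheses and backslash quoted-pairs. Returns the text
    inside the outer parentheses and the index just past the closing ')'.
    """
    depth = 0
    start = i + 1
    while i < len(s):
        c = s[i]
        if c == "\\":
            i += 2  # quoted-pair: skip the escaped character
            continue
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth == 0:
                return s[start:i], i + 1
        i += 1
    return s[start:], len(s)  # unterminated comment: take the rest


def split_products(value: str) -> List[Product]:
    """Split a header value into products, attaching each comment to the
    product that precedes it (assumption; a leading comment is dropped)."""
    products: List[Product] = []
    i, n = 0, len(value)
    while i < n:
        if value[i].isspace():
            i += 1
        elif value[i] == "(":
            text, i = _read_comment(value, i)
            if products:
                products[-1].comments.append(text)
        else:
            j = i
            while j < n and not value[j].isspace() and value[j] != "(":
                j += 1
            name, sep, version = value[i:j].partition("/")
            products.append(Product(name, version if sep else None))
            i = j
    return products


def render(p: Product) -> str:
    """Format a product the way the proposed software.log entries read."""
    out = p.name if p.version is None else f"{p.name} {p.version}"
    if p.comments:
        out += " (" + "; ".join(p.comments) + ")"
    return out


ua = "Safari/11601.3.9 CFNetwork/760.2.6 Darwin/15.2.0 (x86_64)"
for p in split_products(ua):
    print(render(p))
# Safari 11601.3.9
# CFNetwork 760.2.6
# Darwin 15.2.0 (x86_64)
```

Because the scanner tracks parenthesis depth, a comment such as `(Macintosh; Intel Mac OS X)` or one containing nested parentheses stays attached to a single product instead of being split on its internal spaces.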

Taking this to the server side, given:

Apache/2.2.25 (Unix) mod_ssl/2.2.25 OpenSSL/0.9.8j-fips mod_auth_kerb/5.4 PHP/5.4.13

I’d like to see:

Apache/2.2.25 (Unix)
mod_ssl/2.2.25
OpenSSL/0.9.8j-fips
mod_auth_kerb/5.4
PHP/5.4.13
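Once the products are separated, each version string still has to be broken into fields. Here is a minimal Python sketch, assuming the version record keeps the fields shown in the current parse output (major, minor, minor2, minor3, addl). The names `Version` and `parse_version` are hypothetical, and treating everything after the last numeric component as addl (so `0.9.8j-fips` yields addl `j-fips`) is an assumption, not a statement of what Software::parse does.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Up to four dot-separated numeric components, then any trailing text.
_VERSION_RE = re.compile(r"^(\d+)(?:\.(\d+))?(?:\.(\d+))?(?:\.(\d+))?(.*)$")


@dataclass
class Version:
    """Hypothetical mirror of the version fields shown in the parse output."""
    major: Optional[int] = None
    minor: Optional[int] = None
    minor2: Optional[int] = None
    minor3: Optional[int] = None
    addl: Optional[str] = None


def parse_version(text: str) -> Version:
    """Parse e.g. '0.9.8j-fips' into numeric fields plus an addl suffix."""
    m = _VERSION_RE.match(text)
    if m is None:
        # No leading digits: keep the whole string as additional info.
        return Version(addl=text or None)
    nums = [int(g) if g is not None else None for g in m.groups()[:4]]
    rest = m.group(5).lstrip(".-_ ")  # drop separators before the suffix
    return Version(*nums, addl=rest or None)


for v in ["2.2.25", "0.9.8j-fips", "5.4"]:
    print(v, "->", parse_version(v))
```

Since each product's version is now parsed on its own, text from a neighboring product (like the `addl=CFNetwork/760` in the current output) has no way to leak into another product's addl field.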

All of those are pieces of software running on that system. Keeping them bundled together as a single user-agent is a construct from HTTP, which I don’t feel belongs in software.log. Another warning sign that this area could use some work is the comment above Software::parse:

Don’t even try to understand this now, just make sure the tests are working.

Curious to hear thoughts on this.

–Vlad

I think your proposal sounds reasonable. I’d go ahead and implement it and see what you think about overload situations, since I can easily see the amount of software being tracked quickly getting out of hand with that. After it’s implemented, get it running on several networks that are willing to run it and see whether it causes problems for them. :)

This could also be a good time to implement better handling around software tracking, to avoid obvious DoS issues where someone generates traffic that causes lots of state to be tracked.

  .Seth
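One way to keep the extra per-product entries from becoming a DoS vector is to put hard limits on tracked state. The Python sketch below (hypothetical class `BoundedSoftwareTracker`, with made-up default limits) caps the number of distinct products remembered per host and evicts the least recently seen host once a global cap is reached. It stands in for whatever state the framework actually keeps, and the limits would need tuning on real networks.

```python
from collections import OrderedDict
from typing import Dict


class BoundedSoftwareTracker:
    """Track name -> version per host, with hard limits on total state."""

    def __init__(self, max_hosts: int = 10000, max_per_host: int = 32) -> None:
        self.max_hosts = max_hosts
        self.max_per_host = max_per_host
        # OrderedDict keeps hosts in least-recently-used order.
        self._hosts: "OrderedDict[str, Dict[str, str]]" = OrderedDict()
        self.dropped = 0  # updates refused because a host hit its cap

    def record(self, host: str, name: str, version: str) -> bool:
        """Record software for a host; return False if the update was dropped."""
        entries = self._hosts.get(host)
        if entries is None:
            if len(self._hosts) >= self.max_hosts:
                self._hosts.popitem(last=False)  # evict least recently used host
            entries = {}
            self._hosts[host] = entries
        else:
            self._hosts.move_to_end(host)  # mark host as recently used
        if name not in entries and len(entries) >= self.max_per_host:
            self.dropped += 1  # refuse new names once the host is full
            return False
        entries[name] = version  # existing names may still update their version
        return True

    def software(self, host: str) -> Dict[str, str]:
        """Return a copy of what is currently tracked for a host."""
        return dict(self._hosts.get(host, {}))


t = BoundedSoftwareTracker(max_hosts=2, max_per_host=2)
t.record("10.0.0.1", "Safari", "11601.3.9")
t.record("10.0.0.1", "CFNetwork", "760.2.6")
t.record("10.0.0.1", "Darwin", "15.2.0")  # refused: per-host cap reached
print(t.software("10.0.0.1"), t.dropped)
```

Refusing new names once a host is full, rather than evicting older ones, means a flood of bogus products cannot push out software already learned for that host; the trade-off is that legitimately new software is missed until the host itself ages out of the table.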

The other question I was wondering about is whether this should be a BIF. Software::parse is a rather lengthy function with a lot of string manipulation, and it gets called rather frequently. I suspect implementing it directly as a BIF would bring some performance improvement.

Ah, possibly. It probably would make sense to measure that first somehow.

  .Seth

Agreed. It would be good to keep it in script-land unless it indeed has a
substantial impact (and if so, maybe there are some optimizations to
short-cut common cases or so).

Robin
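A sketch of one way to short-cut the common case, under the assumption that a small set of header values makes up most of the traffic: parse each distinct string once and reuse the result. `parse_products` here is a hypothetical, deliberately simplified stand-in (it skips comments and does not handle nested parentheses), not Software::parse; the point is the bounded cache around it, built on Python's `functools.lru_cache`.

```python
import functools
import re
from typing import Tuple

# Either a parenthesized comment (skipped) or a name with an optional /version.
_TOKEN_RE = re.compile(r"\([^)]*\)|([^\s/(]+)(?:/([^\s(]+))?")


@functools.lru_cache(maxsize=4096)  # bounded, so unique strings cannot grow it forever
def parse_products(value: str) -> Tuple[Tuple[str, str], ...]:
    """Return (name, version) pairs; tuples keep the cached result immutable."""
    products = []
    for m in _TOKEN_RE.finditer(value):
        if m.group(1):  # group 1 is only set for product tokens, not comments
            products.append((m.group(1), m.group(2) or ""))
    return tuple(products)


ua = "Safari/11601.3.9 CFNetwork/760.2.6 Darwin/15.2.0 (x86_64)"
parse_products(ua)  # first call parses the string
parse_products(ua)  # repeat is served from the cache
print(parse_products.cache_info())
```

Returning tuples rather than lists matters here: every caller shares the cached object, so a mutable result could be altered by one caller and silently corrupt later lookups. The `maxsize` bound also ties back to the state-tracking concern raised earlier in the thread, since a stream of unique strings only churns the cache instead of growing it.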

Yep, something along these lines was/is implemented in the core, but that just made it difficult to understand and make changes to.

  .Seth

... (and if so, maybe there are some optimizations to
short-cut common cases or so)

(... and/or: a few key BiFs to add that don't bite off the whole task
but accelerate some particular processing)