State of p0f support

Looking for some input here.

Zeek has provided support for passive OS fingerprinting for a long
time through p0f. However, we are using a very outdated version
of the p0f engine, and the signature set is likewise stale (last
update from 2011!).

Unfortunately, p0f has changed quite a bit in the meantime, so it's
not easy to upgrade. While we'd certainly be happy to do that if
anybody wanted to work on it, for now we are considering removing the
old engine that currently ships with Zeek, because it doesn't seem
to provide much value anymore.

Please chime in if that would be a problem for you. Is anybody still
relying on the p0f support in Zeek as it is today?

Thanks,

Robin

There is so much data in the various logs, like software.log, http.log, SSL, DNS, known_*, x509, and even conn.log, that recognizing the OS is trivial most of the time. I would rather invest in correlation and build a scoring engine that logs a verdict like “based on A, B and C, I think this is a Windows 10”.
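A minimal sketch of such a scoring step, in Python, assuming each log has already been reduced to simple (source, OS guess, weight) hints. The `Hint` type, `os_verdict` function, and all weights are hypothetical and not part of Zeek; a real engine would have to decide how much each log type is trusted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hint:
    """One piece of OS evidence extracted from a log (hypothetical type)."""
    source: str    # e.g. "software.log" or "http.log"
    os_guess: str  # the OS this hint points to
    weight: float  # hypothetical trust weight for this kind of hint


def os_verdict(hints):
    """Sum weights per OS guess and return a human-readable verdict, or None."""
    scores = defaultdict(float)
    sources = defaultdict(list)
    for h in hints:
        scores[h.os_guess] += h.weight
        sources[h.os_guess].append(h.source)
    if not scores:
        return None
    # Pick the OS with the highest combined score (ties: first seen wins).
    best = max(scores, key=scores.get)
    return (f"based on {', '.join(sources[best])} "
            f"I think this is {best} (score {scores[best]:g})")
```

For example, hints of `Hint("software.log", "Windows 10", 0.75)`, `Hint("http.log", "Windows 10", 0.5)` and `Hint("dns.log", "Linux", 0.25)` yield "based on software.log, http.log I think this is Windows 10 (score 1.25)". Summing weights is the simplest possible combiner; it keeps the verdict explainable because every contributing source is listed.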

(Returning this to the non-digest thread)

I wrote https://docs.zeek.org/en/stable/scripts/policy/frameworks/software/windows-version-detection.bro.html specifically because p0f wasn’t doing a good job of finding XP hosts.

I think the best approach is using application-layer data, as Michal suggested, as well as TCP fingerprinting. If there’s data on the wire that has operational value, we shouldn’t just ignore it. There is a p0f rewrite, p0f v3 (http://lcamtuf.coredump.cx/p0f3/#), which last had a release in 2016. There’s also a tool called PRADS: https://github.com/gamelinux/prads

These tools all rely on low-level TCP semantics, basically the same data as in the SYN_packet record [1]. What I would want is some mechanism to expose that in script-land, so I can do whatever makes sense in my environment: run it past p0f signatures, add a field to conn.log, or raise a notice on some odd combination that only Metasploit uses.
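To illustrate what "run it past signatures" could mean once the SYN data is reachable, here is a Python sketch. The field names mirror the SYN_packet record in [1]; the `SIGNATURES` table and `match_signatures` helper are hypothetical and deliberately simplified, not real p0f signatures or a Zeek API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynPacket:
    """Field names mirror Zeek's SYN_packet record [1]."""
    is_orig: bool
    DF: bool
    ttl: int
    size: int
    win_size: int
    win_scale: int  # -1 if the window scale option is absent
    MSS: int        # 0 if the MSS option is absent
    SACK_OK: bool


# Hypothetical, simplified signatures: field name -> required value.
# Fields not listed act as wildcards. These are NOT real p0f signatures.
SIGNATURES = {
    "example-linux-like": {"DF": True, "MSS": 1460, "win_scale": 7, "SACK_OK": True},
    "example-odd-scanner": {"DF": False, "win_size": 1024, "SACK_OK": False},
}


def match_signatures(syn, signatures=SIGNATURES):
    """Return the names of all signatures whose listed fields match the SYN."""
    return [
        name
        for name, required in signatures.items()
        if all(getattr(syn, field) == value for field, value in required.items())
    ]
```

A SYN with `DF=False`, `win_size=1024` and `SACK_OK=False` matches only `"example-odd-scanner"`, which is the kind of hit that could become a notice or an extra conn.log field.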

This data falls in a weird grey area in Zeek: it gets parsed, but is essentially unavailable in script-land because it can only be accessed through events that we’ve always been told are too expensive to handle in production (and rightly so).

This discussion comes at an opportune time, with the recent SACK vulnerability (https://access.redhat.com/security/vulnerabilities/tcpsack).

Ultimately, I’m not sure what the right model looks like. Adding new events that are generated only once isn’t the right answer either, since the SACK vulnerability requires a sequence of malicious packets. Still, I think there’s a better solution out there than the current behavior.

–Vlad

[1] - <https://docs.zeek.org/en/stable/scripts/base/init-bare.bro.html#type-SYN_packet>

It seems to me that a fairly lightweight approach might be a per-connection event that returns the factors of interest, since according to the p0f v3 README:

For TCP/IP, the tool fingerprints the client-originating SYN packet and the
first SYN+ACK response from the server, paying attention to factors such as the
ordering of TCP options, the relation between maximum segment size and window
size, the progression of TCP timestamps, and the state of about a dozen possible
implementation quirks (e.g. non-zero values in "must be zero" fields).

(from http://lcamtuf.coredump.cx/p0f3/README, which also documents the actual factors that are observed)
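Two of those factors are cheap to derive from a single SYN. The Python sketch below is loosely modeled on the p0f v3 idea of guessing the initial TTL and expressing the window size as a multiple of the MSS. The helper names and the `ittl:mss:wsize,scale` fragment are illustrative assumptions; the real p0f v3 signature format has more fields and rules than this.

```python
# Common initial TTLs; the observed TTL is rounded up to the nearest one.
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def guess_initial_ttl(observed_ttl):
    """Smallest common initial TTL that is >= the observed TTL."""
    for ittl in COMMON_INITIAL_TTLS:
        if observed_ttl <= ittl:
            return ittl
    raise ValueError(f"TTL out of range: {observed_ttl}")


def window_relation(win_size, mss):
    """Express the window as 'mss*N' when it is an exact multiple of the MSS."""
    if mss > 0 and win_size > 0 and win_size % mss == 0:
        return f"mss*{win_size // mss}"
    return str(win_size)


def partial_signature(ttl, mss, win_size, win_scale):
    """Build a simplified 'ittl:mss:wsize,scale' fragment (illustrative only)."""
    return f"{guess_initial_ttl(ttl)}:{mss}:{window_relation(win_size, mss)},{win_scale}"
```

For instance, a SYN seen with TTL 57, MSS 1460, window 29200 and scale 7 gives `"64:1460:mss*20,7"`, while a window of 65535 with the same MSS is not an exact multiple and stays as the raw value `"65535"`.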

As for the SACK vulnerability, the last paragraph of the Red Hat document indicates that the MSS is set to 48 to trigger the vulnerability, so reporting the MSS might give us a leg up on that as well.
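If the MSS were exposed per connection, flagging unusually small values becomes a one-liner. A Python sketch, assuming SYNs arrive as (connection id, MSS) pairs; the 500-byte cutoff and the `low_mss_connections` helper are hypothetical choices for illustration, not a vetted detection threshold.

```python
# Hypothetical cutoff for illustration; the exploit described uses an MSS of 48.
LOW_MSS_THRESHOLD = 500


def low_mss_connections(syns, threshold=LOW_MSS_THRESHOLD):
    """Return the ids of connections whose SYN advertised a tiny MSS.

    An MSS of 0 means the option was absent (as in Zeek's SYN_packet),
    so those connections are skipped rather than flagged.
    """
    return {conn_id for conn_id, mss in syns if 0 < mss < threshold}
```

Given `[("C1", 1460), ("C2", 48), ("C3", 0)]`, only `"C2"` is returned. As noted earlier in the thread, this only catches the setup; the attack itself also depends on a subsequent sequence of packets.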

Hi Robin et al.,

I would like to underline the importance of having a way to identify machines based on traffic flow and connection behavior. It’s clear that Zeek works well at a connection level, so it’s important to have some way to determine what systems are connecting to each other. Otherwise, what’s the point?

However, relying on fingerprints constructed by a library from 2011 that has not been updated since 2014 is not a great strategy, especially since the methodology used to generate these signatures seems to have been submission via email.

If I were in your shoes, I’d remove it.

And while I am commenting, why not consider using Zeek to generate fingerprints from a pcap plus some standard format that defines the hosts in a known, controlled network (like a Configuration Management DB)? That way, at startup, the user could choose to apply a signature DB, which they can modify, that annotates their systems.
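A rough Python sketch of that workflow, assuming fingerprints have already been extracted from the pcap as opaque strings per IP and that the CMDB is a simple IP-to-OS mapping. Both helper names, the fingerprint strings, and the data shapes are hypothetical.

```python
from collections import defaultdict


def build_signature_db(cmdb, observed):
    """Learn fingerprint -> OS labels from hosts whose OS the CMDB already knows.

    cmdb:     dict of ip -> OS label (ground truth for the controlled network)
    observed: iterable of (ip, fingerprint) pairs extracted from a pcap
    """
    db = defaultdict(set)
    for ip, fp in observed:
        label = cmdb.get(ip)
        if label is not None:
            db[fp].add(label)
    return dict(db)


def annotate(db, observed):
    """Label every observed host whose fingerprint appears in the (editable) DB."""
    return {ip: sorted(db[fp]) for ip, fp in observed if fp in db}
```

With `cmdb = {"10.0.0.5": "Ubuntu 18.04"}` and a second host `10.0.0.9` showing the same fingerprint, `annotate` labels both hosts as `["Ubuntu 18.04"]`, while a host with an unseen fingerprint stays unannotated. Keeping labels as sets lets the user spot fingerprints that the CMDB maps to more than one OS.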

To wrap this up: what I think I'm hearing is that there's certainly
an opportunity for a much improved, modern version of such
functionality, but it also sounds like nobody is relying on the old
functionality anymore (not a surprise). So we'll go ahead and remove
the current p0f code from Zeek.

Robin