Bro --> Google Safe Browsing API?

Hello all,

Has anybody developed a script to have Bro query the Google Safe Browsing API?


I was actually looking at this yesterday. However, because of the way Google implements the API, this is non-trivial and not something I'd feel comfortable using the current active HTTP function(s) for. Basically, the API requires that you implement their rate limiting at the client level, so under certain conditions Google could tell you "do not query again for another hour" and you're supposed to play along with their request.
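Just to make the client-side requirement concrete, here's a minimal sketch of what that back-off tracking might look like. This is purely illustrative: the class name, intervals, and exponential-doubling policy are my assumptions, not anything from the Safe Browsing spec.

```python
import time

class SafeBrowsingThrottle:
    """Hypothetical client-side back-off tracker. The server's requested
    wait interval is honored, and repeated errors double the delay."""

    def __init__(self):
        self.next_allowed = 0.0   # earliest time we may query again
        self.errors = 0           # consecutive error/back-off responses

    def can_query(self, now=None):
        now = time.time() if now is None else now
        return now >= self.next_allowed

    def record_success(self):
        self.errors = 0
        self.next_allowed = 0.0

    def record_backoff(self, retry_after_secs, now=None):
        """Server asked us to wait; double the delay on repeat errors,
        capped (arbitrarily, for this sketch) at one hour."""
        now = time.time() if now is None else now
        self.errors += 1
        delay = min(retry_after_secs * (2 ** (self.errors - 1)), 3600)
        self.next_allowed = now + delay
```

The point is that this state has to live somewhere on the client and be consulted before every request, which is exactly what the generic active HTTP functions don't give you.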

Rumor has it that someone is working on the active HTTP module, so I haven't looked into doing any of that myself. I'd love to take on Safe Browsing integration, though. Maybe I'll just look into making Safe Browsing its own full-blown plugin? Querying Safe Browsing for at least the links I parse from emails would be extremely desirable from my perspective.

If you want to talk about it, feel free to ping me on IRC (I'm always logged in during the day anyway), or we can keep the discussion on the mailing list so everyone can chime in.

Hi Stephen,

Does the rate limiting apply to the new "API v3"?

"The Safe Browsing API is an experimental API that enables
applications to download an encrypted table for local, client-side
lookups of URLs that you would like to check. In 2014, we published a
new version (v3) of the Safe Browsing API, which adds features and
efficiency improvements to the previous v2. The Safe Browsing API is
used by several browsers, including Google Chrome and Mozilla Firefox.
You can start using the Safe Browsing API v3 now."

I believe it still does:

For bulk lookups you need to maintain a local copy of the chunks, which are
basically black/white lists of hash prefixes of the canonicalised URL
(similar to a Bloom filter). This is the same data Chrome/Firefox use for Safe Browsing.

There is a reference implementation available which maintains a local
copy. Your script then just needs to hash the URL (or one of a number of
different permutations of it) and check whether the prefix is present in
either list. If it is present in the blacklist, follow up with a query to
Google for the full hash and compare.
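The local-lookup step described above might look something like this sketch. I'm assuming SHA-256 with 4-byte prefixes (as in the v2/v3 data) and skipping both URL canonicalisation and the host/path permutation generation, which the real spec requires before hashing:

```python
import hashlib

def url_hash(canonical_url):
    """SHA-256 of an already-canonicalised URL string."""
    return hashlib.sha256(canonical_url.encode("utf-8")).digest()

def prefix_match(canonical_url, blacklist_prefixes, prefix_len=4):
    """Return True if the URL's hash prefix is in the local blacklist.
    A True result only means 'maybe bad': the next step is a full-hash
    query to Google and a comparison against the returned hashes."""
    return url_hash(canonical_url)[:prefix_len] in blacklist_prefixes
```

A prefix hit is deliberately not a verdict; it just tells you which tiny fraction of URLs are worth the round trip to Google for the full 32-byte hash.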

I wrote some shoddy code a while ago against v2 of this spec to maintain a
local copy of the partial hashes in Postgres.


Yes, see:

Essentially the issue is that there needs to be a piece between Bro and the
API which handles downloading/updating the prefix sets and ensures that
the request-frequency limits are observed. It'd be interesting, but Bro
integration with v3 is a difficult task.
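The core of that intermediary piece could be as small as the step below. Everything here is hypothetical: `fetch_chunks` and `store` are assumed callbacks (the real client would speak the Safe Browsing update protocol and write prefix sets somewhere Bro can load them), and the return value is the server-mandated time before which no further update request may be made:

```python
import time

def update_once(fetch_chunks, store, clock=time.time):
    """One iteration of a hypothetical updater between Bro and the API:
    fetch new chunk data, persist it for Bro to consume, and return the
    timestamp before which we must not contact the server again."""
    chunks, wait_secs = fetch_chunks()   # assumed: (data, server-requested delay)
    store(chunks)
    return clock() + wait_secs
```

Looping on this function (sleeping until the returned deadline) keeps the request frequency entirely outside Bro, which only ever reads the locally stored prefix sets.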


Yeah, I agree. Google has been moving the service toward requiring more frequent contact with their servers to get an accurate picture of matches against their list. That works perfectly fine for a desktop that might see a maximum of 1000 URLs requested per hour or so, but on a Bro cluster there could be thousands per second.

I had an implementation of v1 of that API running with Bro years ago, but even that didn't work well enough that I could ever distribute it.