Error when extracting URLs from email traffic

Hello Bro list,

I am attempting to write a script to extract URLs from SMTP traffic. The script below is my starting point, and it seems to work pretty well except that I occasionally get an error on some of the connections. The end goal (and I am a ways away at the moment) is to feed the URLs into the intel framework and try to alert on potential spearphishing.

Script:
    @load base/frameworks/intel
    @load base/utils/urls
    @load ./where-locations.bro

    event file_over_new_connection(f: fa_file, c: connection, is_orig: bool)
        {
        const mail_servers = { 192.168.50.72, 192.168.50.75 };

        if ( c$id$orig_h !in mail_servers )
            return;
        if ( ! f?$conns )
            return;
        if ( f$source != "SMTP" )
            return;

        if ( ! f?$bof_buffer )
            return;

        for ( cid in f$conns )
            {
            local urls = find_all_urls_without_scheme(f$bof_buffer);
            for ( url in urls )
                {
                print fmt(url);
                }
            }
        }

The error is:
    1404827445.346519 error in ./extract_urls_in_email_v1.bro, line 38: too few arguments for format (fmt(url) and

Does anyone know what might be causing this error?

Best Regards,
Derek

fmt() is like sprintf; you just want

    print url;

I think your error might be a simple one: fmt() should use this
syntax: print fmt("%s", url);

-Josh

This is actually a script that has been written already. Check out policy/frameworks/intel/seen/smtp-url-extraction.bro. You’ll need to modify this script a little, but it has most of what you need.

If you just want to see if certain URLs are in emails, then you could actually already do that with the Intelligence Framework, without having to write your own script.
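A minimal sketch of that approach, assuming a Bro 2.3-era install: the URLs of interest go into a tab-separated intel data file, and loading the "seen" scripts makes Bro report any of them observed in traffic to intel.log. The file path `/opt/bro/intel/spearphish.dat`, the sample indicator, and the source name `manual` are hypothetical. Indicators are written without the `http://` scheme, matching how the seen scripts report URLs.

```bro
# Load the intel framework plus the "seen" scripts, which check URLs
# observed in traffic (including SMTP and HTTP) against loaded intel.
@load base/frameworks/intel
@load policy/frameworks/intel/seen

# Hypothetical path to the intel data file. Its contents would look like
# this (columns separated by single tab characters, URLs without a scheme):
#
#   #fields	indicator	indicator_type	meta.source
#   badsite.example/login.php	Intel::URL	manual
#
redef Intel::read_files += { "/opt/bro/intel/spearphish.dat" };
```

This only catches URLs you already know about; it does not generate intel from the mail itself.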

Actually, never mind. fmt() will accept either version if you are
passing data into it. I copied your script, removed some elements
(the mail_servers const and the logic checks for SMTP and mail_servers), and it
processed correctly.

-Josh

Cool, thanks all!

> If you just want to see if certain URLs are in emails, then you could actually already do that with the Intelligence Framework, without having to write your own script.

That’s essentially what I want to do. I just want to generate the intel “on-the-fly”: take the URLs out of emails, whitelist common legitimate domains seen in our environment, feed the remaining list into the intel framework, and then write a notice or a dedicated log entry for a potential spearphish when one of those URLs shows up in HTTP traffic. Basically, it is an attempt to alert when someone clicks a link in a spearphish before we already know that the domain or URL is bad.
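A rough sketch of that on-the-fly pipeline, assuming Bro 2.3-era APIs (`Intel::insert`, `Intel::URL`, `split1`). The whitelist `url_whitelist`, its example entries, and the source name `smtp-url-extraction` are hypothetical. URLs found in SMTP bodies are checked against the whitelist by host, and the rest are inserted as intel, so that the "seen" HTTP script can log a later request for the same URL to intel.log.

```bro
@load base/frameworks/intel
@load base/utils/urls
@load policy/frameworks/intel/seen

# Hypothetical whitelist of domains that are common and legitimate in this
# environment; extend it with redef in a site script.
const url_whitelist: set[string] = { "example.com", "example.org" } &redef;

event file_over_new_connection(f: fa_file, c: connection, is_orig: bool)
    {
    # Only look at files (message bodies and attachments) carried over SMTP.
    if ( f$source != "SMTP" || ! f?$bof_buffer )
        return;

    for ( url in find_all_urls_without_scheme(f$bof_buffer) )
        {
        # URLs come back without a scheme, so the host is everything up to
        # the first "/". This is an exact match: "www.example.com" would
        # need its own whitelist entry.
        local host = split1(url, /\//)[1];
        if ( host in url_whitelist )
            next;

        # Feed the URL into the intel framework; an HTTP request for the
        # same URL should then be reported in intel.log by the seen scripts.
        Intel::insert([$indicator=url, $indicator_type=Intel::URL,
                       $meta=[$source="smtp-url-extraction"]]);
        }
    }
```

Exact host matching keeps the sketch simple; a suffix match on registered domains would whitelist subdomains too, at the cost of more code.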

It could turn out that the volume of clickers, even after whitelisting, makes this unfeasible to analyze, but I thought it would be a good exercise to go down that road.

Passing the data directly into fmt() works correctly most of the time, but it has the same problem that printf does: any % sequence in the string is treated as a format directive.

    jazoff@air /tmp $ cat f.bro
    event bro_init() {
        local s = "hello %s world";
        print fmt(s);
    }

    jazoff@air /tmp $ bro f.bro
    error in ./f.bro, line 3 and ./f.bro, line 2: too few arguments for format (fmt(s) and hello %s world)
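Passing the string as an argument instead of as the format avoids this, because only the first argument to fmt() is scanned for % directives. URLs pulled out of email can contain percent-encoded characters (such as %20), which likely explains why the error appeared only on some connections. A sketch of the test above with the fix, plus the original handler printing URLs directly. It also drops the outer loop over `f$conns`, which only repeated the same output once per connection:

```bro
@load base/utils/urls

event bro_init()
    {
    local s = "hello %s world";
    # s is now data, not a format string, so its "%s" is printed literally.
    print fmt("%s", s);    # prints: hello %s world
    print s;               # same output, without fmt() at all
    }

const mail_servers = { 192.168.50.72, 192.168.50.75 };

event file_over_new_connection(f: fa_file, c: connection, is_orig: bool)
    {
    if ( c$id$orig_h !in mail_servers )
        return;
    if ( f$source != "SMTP" || ! f?$bof_buffer )
        return;

    # Extract URLs once per file and print them without fmt().
    for ( url in find_all_urls_without_scheme(f$bof_buffer) )
        print url;
    }
```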

Good point, thanks Justin.

-Josh