Error when extracting URLs from email traffic

Derek_Banks · July 8, 2014, 4:43pm

Hello Bro list,

I am attempting to write a script to extract URLs from SMTP. The script below is my starting point and it seems to work pretty well except that I am getting an error occasionally on some of the connections. The end goal (and I am a ways away atm) is to eventually get the URLs fed into the intel framework to attempt to alert on potential spearphishing.

Script:
    @load base/frameworks/intel
    @load base/utils/urls
    @load ./where-locations.bro

    event file_over_new_connection(f: fa_file, c: connection, is_orig: bool)
        {
        const mail_servers = { 192.168.50.72, 192.168.50.75 };

        if ( c$id$orig_h !in mail_servers )
            return;
        if ( ! f?$conns )
            return;
        if ( f$source != "SMTP" )
            return;

        if ( ! f?$bof_buffer )
            return;

        for ( cid in f$conns )
            {
            local urls = find_all_urls_without_scheme(f$bof_buffer);
            for ( url in urls )
                {
                print fmt(url);
                }
            }
        }

The error is:
1404827445.346519 error in ./extract_urls_in_email_v1.bro, line 38: too few arguments for format (fmt(url) and

Does anyone know what might be causing this error?

Best Regards,
Derek

Azoff_Justin · July 8, 2014, 4:51pm

fmt() is like sprintf: its first argument is treated as a format string. You just want

    print url;

Josh_Liburdi · July 8, 2014, 4:52pm

I think your error might be a simple one ... fmt() should use this syntax:

    print fmt("%s", url);

-Josh

Hosom_Stephen_M · July 8, 2014, 4:57pm

This is actually a script that has been written already. Check out policy/frameworks/intel/seen/smtp-url-extraction.bro. You’ll need to modify this script a little, but it has most of what you need.

If you just want to see if certain URLs are in emails, then you could actually already do that with the Intelligence Framework, without having to write your own script.
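
A minimal sketch of what using the stock scripts might look like in a site configuration, assuming a Bro 2.x install. The intel file path is a hypothetical placeholder, not something from this thread.

```bro
# Load the stock "seen" scripts; this directory includes
# smtp-url-extraction.bro alongside the other intel "seen" scripts.
@load frameworks/intel/seen

# Hypothetical path to a tab-separated intel file; adjust for your site.
redef Intel::read_files += { "/path/to/intel/urls.dat" };
```

The intel file would list URL indicators (without the scheme) using the type `Intel::URL`.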

Josh_Liburdi · July 8, 2014, 5:02pm

Actually, never mind. fmt() will accept either version if you are passing data into it. I copied your script and removed some elements (const mail_servers, logic checks for SMTP and mail_servers) and it processed correctly.

-Josh

Derek_Banks · July 8, 2014, 5:24pm

Cool thanks all!

> If you just want to see if certain URLs are in emails, then you could actually already do that with the Intelligence Framework, without having to write your own script.

That’s essentially what I want to do. I just want to generate the intel “on-the-fly”: extract URLs from emails, whitelist common legitimate domains seen in our environment, feed the remaining URLs into the intel framework, and then write a notice or a specific log file of potential spearphish when one of those URLs is found in HTTP traffic. Basically, it is an attempt to alert on a clicker in a spearphish when we are not already aware that the domain/URL is bad.

It could turn out that the volume of clickers even after whitelisting makes it not feasible for analysis but I thought it would be a good exercise to go down the road.
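
The pipeline described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the whitelist pattern, the helper name `add_url_indicator`, and the source label are hypothetical, while `Intel::insert`, `Intel::Item`, `Intel::MetaData`, and `Intel::URL` come from the intel framework in Bro 2.x.

```bro
@load base/frameworks/intel
@load frameworks/intel/seen

# Hypothetical whitelist: example.com and its subdomains, with any path.
const url_whitelist = /([^\/]*\.)?example\.com(\/.*)?/ &redef;

# Hypothetical helper: add one extracted URL (no scheme) as an indicator.
function add_url_indicator(url: string)
    {
    # "==" requires the pattern to match the whole string.
    if ( url_whitelist == url )
        return;

    local meta: Intel::MetaData = [$source="smtp-url-sketch"];
    local item: Intel::Item = [$indicator=url, $indicator_type=Intel::URL, $meta=meta];
    Intel::insert(item);
    }
```

The helper would be called for each URL returned by `find_all_urls_without_scheme()`. Assuming the stock "seen" scripts are loaded, a later intel hit whose location is `HTTP::IN_URL` would then suggest that someone requested a URL that first appeared in an email.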

Azoff_Justin · July 8, 2014, 5:28pm

That works correctly most of the time, but it has the same problem that printf does:

    jazoff@air /tmp $ cat f.bro
    event bro_init() {
        local s = "hello %s world";
        print fmt(s);
    }

    jazoff@air /tmp $ bro f.bro
    error in ./f.bro, line 3 and ./f.bro, line 2: too few arguments for format (fmt(s) and hello %s world)
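
For contrast, a minimal sketch of the safe forms, using the same string as the failing example above. Percent-encoded characters in URLs are a likely reason the original script only failed on some connections, since a `%` in the data is read as the start of a format specifier when the data itself is used as the format string.

```bro
event bro_init()
    {
    local s = "hello %s world";

    # Pass the data as an argument to a fixed format string, so any '%'
    # characters inside s are treated as plain text.
    print fmt("%s", s);

    # Or skip fmt() entirely when no formatting is needed.
    print s;
    }
```

Both statements print `hello %s world` without raising the "too few arguments" error.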

Josh_Liburdi · July 8, 2014, 7:08pm

Good point, thanks Justin.

-Josh
