Keyword matching in documents

Hi,

Is it possible for Bro to perform keyword matching on document files (such as text, OpenOffice, PDF, etc.) and generate notices when the keyword is found?

Regards

Vikram Basu

I have written a sample Bro script based on the ssn-exposure and credit-card-exposure scripts, but I am getting the following error in reporter.log:

{"ts":1505214009.989112,"level":"Reporter::ERROR","message":"string without NUL terminator: \u0022CONFIDENTIAL\u005cx0a\u0022","location":""}

How can I fix this?

Regards

Vikram

Here is the script

# Keyword Matching Basic script

@load base/frameworks/notice

module KeywordMatch;

export {
    ## Keyword Matching Log ID definition
    redef enum Log::ID += { LOG };

    redef enum Notice::Type += {
        Matched
    };

    type Info: record {
        ts:   time    &log;
        uid:  string  &log;
        id:   conn_id &log;
        word: string  &log &optional;
        data: string  &log;
    };

    ## The Keyword that is being matched
    const keyword = "CONFIDENTIAL" &redef;
}

event bro_init() &priority=5
    {
    Log::create_stream(KeywordMatch::LOG, [$columns=Info]);
    }

function check_keyword(c: connection, data: string): bool
    {
    local it_matched = F;

    if ( keyword in data )
        {
        it_matched = T;
        }

    if ( it_matched )
        {
        local log: Info = [$ts=network_time(),
                           $uid=c$uid, $id=c$id,
                           $word=keyword, $data=data];
        Log::write(KeywordMatch::LOG, log);

        NOTICE([$note=Matched, $conn=c,
                $msg=fmt("Keyword Matched %s", keyword),
                $sub=data, $identifier=cat(c$id$orig_h, c$id$resp_h)]);
        return T;
        }

    return F;
    }

event KeywordMatch::stream_data(f: fa_file, data: string)
    {
    local c: connection;

    for ( id in f$conns )
        {
        c = f$conns[id];
        break;
        }

    if ( c$start_time > network_time() - 20secs )
        check_keyword(c, data);
    }

event file_new(f: fa_file)
    {
    if ( f$source == "HTTP" )
        {
        Files::add_analyzer(f, Files::ANALYZER_DATA_EVENT,
                            [$stream_event=KeywordMatch::stream_data]);
        }
    }

Hi Vikram,

It turns out that you found a small bug (or at least a gotcha) in Bro. Bro
has a few functions that do not handle binary data very well, and the "in"
operator happens to be one of them.

I wrote a small patch to Bro that should fix this problem. It is in the
branch topic/johanna/in-binary. If you want to apply it manually, you only
need the single-line change in Expr.cc:
https://github.com/bro/bro/compare/topic/johanna/in-binary

I also created a merge request for this at
https://bro-tracker.atlassian.net/browse/BIT-1845 if you are interested in
tracking this.
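
On a build without that change, one possible stopgap is to avoid the string "in" operator and use the strstr() built-in instead, which returns the 1-based position of the first match, or 0 if there is none. The following is only a sketch: it assumes strstr() copes with payloads containing arbitrary bytes, and contains_keyword is a hypothetical helper name that does not appear in the script above.

```bro
# Hypothetical helper: substring test that avoids the string "in" operator.
# strstr() returns the 1-based offset of the first match, or 0 if none.
function contains_keyword(data: string, kw: string): bool
    {
    return strstr(data, kw) > 0;
    }

event bro_init()
    {
    # A payload ending in a newline byte, like the one in the reported error.
    print contains_keyword("CONFIDENTIAL\x0a", "CONFIDENTIAL");
    print contains_keyword("nothing to see here", "CONFIDENTIAL");
    }
```

In check_keyword, the test "if ( keyword in data )" would then become "if ( contains_keyword(data, keyword) )"; the rest of the script stays the same.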

Johanna