SQL usage in Bro

Hi Brolist & especially Seth:

I've created a Bro policy called 'stomper.bro' which matches http requests
against a blacklist (and acts appropriately, issuing temporary host-pair blocks
to prevent access to forbidden URLs), which is loaded when bro starts up - the
data structure is sufficiently crude that it loads ~ 700k urls in 5 seconds, but
is inefficient in usage, although I've thought about amortizing the conversion
of the simple structure into a more efficient one during the bro run (the first
time a hit is made to a particular domain, convert it to a more efficient
representation on the fly).

However, I've thought about databasizing this, either via a broccoli enabled
'oracle' program, fed URLs and returning bro events signifying actions to take,
or using the database extensions Seth has added to the bro code to access a
persistent database instead.

Does anyone have any information on performance metrics of the postgresql
bindings for bro, both with the sql server on localhost, and being on a remote
box (might be accessed by multiple bros)? I would be interested particularly in
the rate of requests that can be handled and answered, and the latency
(obviously, doing realtime blocking of forbidden domains requires
near-instantaneous response).

Thanks in advance

However, I've thought about databasizing this, either via a broccoli enabled
'oracle' program, fed URLs and returning bro events signifying actions to take,
or using the database extensions Seth has added to the bro code to access a
persistent database instead.

Heh. I *wish* the database extension was finished. :slight_smile: It's close, but it doesn't quite work yet.

Does anyone have any information on performance metrics of the postgresql
bindings for bro, both with the sql server on localhost, and being on a remote
box (might be accessed by multiple bros)?

The way I've been implementing it is that performance of the database wouldn't have much of an impact on anything. It's currently implemented to behave asynchronously where a query is executed and as the data becomes available it is inserted into a hidden internal copy of the variable. Once the query is done returning data, the hidden variable is assigned overtop of the original variable with all of the potentially new data. The timers then continue on and do any other database backed variables that may need to be updated with the same process.

It seems that you may be confused about how it works though. What I'm implementing is just for pulling data into variables on a interval. Here's an example.....

global bad_urls: set[string] &query="SELECT url FROM bad_urls" &query_interval=1hour;

That will place the elements from the single field returned from the query into the string set every hour (replacing the previous data). It's not the end-all solution that people are looking for I think, but it's part of it for sure.

  .Seth

Interesting.. I was thinking about doing something like this just using broccoli..

start with a plain..

    global bad_urls: set[string];

add new events similar to request_id...

    event set_add(tbl: string, key: string);
    event set_remove(tbl: string, key: string);

    event table_add(tbl: string, key: string, val: string);
    event table_remove(tbl: string, key: string);

then you would have code that uses broccoli that selects the rows from the DB and fires off events like

    set_add("bad_urls", "http://example.com/")

This way you could use any database, or even just a flatfile for storing bad
urls.. all the logic for getting the actual records would be implemented in
python(or C or Ruby...), the only changes to bro would be the new set and table
events.

Seth Hall wrote:

However, I've thought about databasizing this, either via a broccoli
enabled
'oracle' program, fed URLs and returning bro events signifying actions
to take,
or using the database extensions Seth has added to the bro code to
access a
persistent database instead.

Heh. I *wish* the database extension was finished. :slight_smile: It's close, but
it doesn't quite work yet.

Does anyone have any information on performance metrics of the postgresql
bindings for bro, both with the sql server on localhost, and being on
a remote
box (might be accessed by multiple bros)?

The way I've been implementing it is that performance of the database
wouldn't have much of an impact on anything. It's currently implemented
to behave asynchronously where a query is executed and as the data
becomes available it is inserted into a hidden internal copy of the
variable. Once the query is done returning data, the hidden variable is
assigned overtop of the original variable with all of the potentially
new data. The timers then continue on and do any other database backed
variables that may need to be updated with the same process.

It seems that you may be confused about how it works though. What I'm
implementing is just for pulling data into variables on a interval.
Here's an example.....

global bad_urls: set[string] &query="SELECT url FROM bad_urls"
&query_interval=1hour;

That will place the elements from the single field returned from the
query into the string set every hour (replacing the previous data).
It's not the end-all solution that people are looking for I think, but
it's part of it for sure.

.Seth

Well, thats cool in a different way than I envisioned - I assumed you could
issue a query and an event would be raised when the results were available.
This is closer the the idea of databased-backed persistent variables, although
on a timed basis. Is there some way that an immediate refresh can be requested
by bro, e.g. when the backing database changes, sending an event to bro which
can then trigger a refresh on the dataset?

I'm thinking the paradigm you are using may work for my application, with a few
tweaks....

Thanks in advance.

I love that this stuff is finally being discussed. :slight_smile:

Is there some way that an immediate refresh can be requested
by bro, e.g. when the backing database changes, sending an event to bro which
can then trigger a refresh on the dataset?

I think this could be accommodated by calling a function which would kick off the update immediately. You could wrap the function inside an event handler and then you'd have something that broctl could call.

I'm thinking the paradigm you are using may work for my application, with a few
tweaks....

The only thing I don't really how to handle the opposite direction. I can't come up with a clean syntax for pushing back into a database. It would be great if you could do...
add bad_urls["http://www.microsoft.com/"];
... and the URL would get pushed into the database. You could use my bro-dblogger project to do it, but you'd have to do the "add" like above in addition to...
event db_log("bad_urls", [$url="http://www.microsoft.com/"];

It's kind of messy, but maybe it's not as bad as I'm thinking.

   .Seth

Thanks Seth:

Seth Hall wrote:

I love that this stuff is finally being discussed. :slight_smile:

Is there some way that an immediate refresh can be requested
by bro, e.g. when the backing database changes, sending an event to
bro which
can then trigger a refresh on the dataset?

I think this could be accommodated by calling a function which would
kick off the update immediately. You could wrap the function inside an
event handler and then you'd have something that broctl could call.

The event handling part is a piece of cake, but I'm unclear on how to 'kick off
the update immediately', which I presume is part of your patch. Do you have
further data on that piece of the puzzle?

I'm thinking the paradigm you are using may work for my application,
with a few
tweaks....

The only thing I don't really how to handle the opposite direction. I
can't come up with a clean syntax for pushing back into a database. It
would be great if you could do...
add bad_urls["http://www.microsoft.com/"];
... and the URL would get pushed into the database. You could use my
bro-dblogger project to do it, but you'd have to do the "add" like above
in addition to...
event db_log("bad_urls", [$url="http://www.microsoft.com/"];

It's kind of messy, but maybe it's not as bad as I'm thinking.

  .Seth

For my application, it isn't necessarily essential to write back to the
database, although it would be nice to have statistics columns that could be
updated as hits occur - could do that via a brocolli enabled external database
helper app.

Off the top of my head, tho', as far as pushing back to the database, why not
the same syntax as you are using, with an update sql command, and interval along
with an invisible 'modified' flag per row, so that only rows which were actually
modified were written back??? Still not a true database backed table, but
closer… (now if bro supported OOP…, aw never mind…)

The event handling part is a piece of cake, but I'm unclear on how to 'kick off
the update immediately', which I presume is part of your patch. Do you have
further data on that piece of the puzzle?

My thought would be that you could do something like...

> broctl db_update bad_urls

That would throw an event named db_update to one or all of the hosts (still haven't decided on this yet) which would be handled like this (theoretically)...

event db_update(var)
  {
  force_db_update(var);
  }

The force_db_update function could be a built-in-function that would lookup the variable named by the value of the string "var" and force it do update from the database.

could do that via a brocolli enabled external database
helper app.

Like bro_dblogger maybe?
   http://github.com/sethhall/bro-dblogger

The syntax I gave in my previous email works for the dblogger project.

Off the top of my head, tho', as far as pushing back to the database, why not
the same syntax as you are using, with an update sql command, and interval along
with an invisible 'modified' flag per row, so that only rows which were actually
modified were written back??? Still not a true database backed table, but
closer… (now if bro supported OOP…, aw never mind…)

Maybe if there was an attribute to attach to tables and sets to indicate that you'd like to throw an event when an item is added? Off the top of my head now...

function new_bad_url(val: string)
  {
  event db_log("bad_urls", [$url=val]);
  }
global bad_urls: set[string] &add_func=new_bad_url;

Alternatively, that could be written as:
global bad_urls: set[string] &add_func=function(val: string) { event db_log("bad_urls", [$url=val]); };

That should work and I don't *think* it would be very difficult to write the &add_func attribute. And it fits right alongside the existing &expire_func attribute. :slight_smile:

   .Seth

Seth Hall wrote:
<snip>

My thought would be that you could do something like...

> broctl db_update bad_urls

That would throw an event named db_update to one or all of the hosts
(still haven't decided on this yet) which would be handled like this
(theoretically)...

event db_update(var)
  {
  force_db_update(var);
  }

The force_db_update function could be a built-in-function that would
lookup the variable named by the value of the string "var" and force
it do update from the database.

<snip>

Ok, I presume the force_db_update() function is a yet-to-be-created function.
The same practical effect would seem to be accrued if there was a way to access
the timer, and force an immediate expiration, or if the syntax of the
declaration was changed, e.g. your example:

global bad_urls: set[string] &query="SELECT url FROM bad_urls"
&query_interval=1hour;

perhaps could be augmented with an event, ala

global bad_urls: set[string] &query="SELECT url FROM bad_urls"
&query_interval=1hour &query_event=update_badurls;

which would then allow script-level access to the updating process.

Perhaps we can work together on this?

the top of my head now...

function new_bad_url(val: string)
  {
  event db_log("bad_urls", [$url=val]);
  }
global bad_urls: set[string] &add_func=new_bad_url;

Alternatively, that could be written as:
global bad_urls: set[string] &add_func=function(val: string) { event
db_log("bad_urls", [$url=val]); };

Yeah, that was just the approach I was thinking of too while catching
up on this thread. (Well, maybe tweaked slightly so that the &add_func
function returns the value to *actually* put in the set, if any.)

    Vern

perhaps could be augmented with an event, ala

global bad_urls: set[string] &query="SELECT url FROM bad_urls"
&query_interval=1hour &query_event=update_badurls;

which would then allow script-level access to the updating process.

In your example, when would the event attached to the &query_event attribute be raised and what arguments would be passed into it?

Perhaps we can work together on this?

That would be great. It sounds like you're working on the sort of stuff I've been doing for a while where you're trying to take external intelligence and use it to it's full extent within Bro. I'm working on an intelligence framework for integrating that sort of intelligence now, would you be interested in reframing our discussion more in that light since it appears what both of our goals are?

   .Seth

Ah, I'm glad you mentioned this. I would really like to see &add_func work more similarly to &expire_func. The function given to &add_func would return a bool to allow or prevent an item from being added to the table/set. It would make it so that a script developer wouldn't have to anticipate all of the situations where someone using their script would want to exclude data from a table or set. The table or set would just have to be declared with &redef so that a user could add their own &add_func.

Is there a better example for returning the value to be put into the set? I can't think of any situations when I'd use that.

   .Seth

Is there a better example for returning the value to be put into the
set? I can't think of any situations when I'd use that.

Me neither. But perhaps your version could be add_func_pred, just
so we preserve the possibility?

    Vern