Autodoc: how to link to another script?

Hi,

What's the best way to link from one script to another? I know I can link to identifiers and types, but linking to a full script would be nice.

Or maybe a more general question about best practices: in my tunnel branch I've added an optional column to conn.log showing tunnel information. In the autodoc for this column it would be nice to link to the redef-able consts that control tunnel decapsulation. My idea was to link to the script that redefs these consts to enable the tunnel decapsulation. But it would also be nice (or better) to be able to link to the consts directly as a "block" instead of linking to them individually. They are in bro.init in a module called Tunnel.

cu
gregor

Where have you done this extension? One of the rules of thumb I've been trying to follow is that frameworks don't depend on protocols.

  .Seth

what's the best way to link from one script to another?

The :doc: role: http://sphinx.pocoo.org/markup/inline.html

That can be embedded in any of the doc-style comments:

e.g. This renders a link (in the summary section) to the documentation for conn's main.bro:

##! :doc:`/scripts/base/protocols/conn/main`

And this would render a link to an index of all the script docs in that base/protocols/conn directory:

##! :doc:`/scripts/base/protocols/conn/index`

- Jon

The connection record type has an optional field tunnel_parent. I actually added that in bro.init, since I forgot that we can now extend records with +=.

from bro.init:

----8<------
module Tunnel;
export {
     ## Records the identity of the parent of a tunneled connection.
     type parent_t: record {
         ## The 4-tuple of the tunnel "connection".
         cid: conn_id;
         ## The type of tunnel.
         tunnel_type: tunneltype_t;
     } &log;
} # end export
module GLOBAL;

type connection: record {
     [the other fields]
     tunnel_parent: Tunnel::parent_t &optional;
};
----8<------

I could actually add this tunnel_parent field in my tunnel.bro script that logs child-conn-id <--> parent_t

However, today I thought it might be handy to at least add the tunnel type to conn.log as an indicator of whether a particular connection was tunneled or not. That's why I added a "tunnel_type" field to the Conn::Info record (in conn.bro).
So, while I directly modified the conn/main.bro script, it doesn't depend on the code in the framework part. Not sure what you mean by
"depend" though.

I guess I could also try to extend Conn::Info in tunnel.bro, right? However, if multiple scripts do this, then the order of columns in conn.log would depend on the order in which these scripts are loaded... Thinking about this some more, I think that the http scripts already do this, so the order of columns isn't well-defined anyway, right? And if somebody writes a script to parse a Bro log file, then one has to check the header, right?
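For what it's worth, extending the record from tunnel.bro would look roughly like this (a sketch; the field name and attributes are illustrative, not the actual branch code):

```
@load base/protocols/conn

module Tunnel;

# Record extension appends fields in @load order, which is exactly
# why the resulting column order in conn.log is not fixed.
redef record Conn::Info += {
    ## Illustrative field; &optional keeps it unset for
    ## connections that were not tunneled.
    tunnel_type: string &log &optional;
};
```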

cu
gregor

The connection record type has an optional field tunnel_parent. I actually added that in bro.init, since I forgot that we can now extend records with +=.

Haha, get that out of there! :P

   type parent_t: record {

You may want to check out: http://www.bro-ids.org/development/script-conventions.html

It certainly makes the code easier to read if everyone names things consistently.

I could actually add this tunnel_parent field in my tunnel.bro script that logs child-conn-id <--> parent_t

I'm actually starting to wonder if tunnel.bro should go in base/protocols/conn/. That actually seems like the appropriate place since it has to do with connections. It's just using extra core support to find and log these tunnels. I would still extend the Conn::Info type in the tunnels.bro script though. What do you think?

However, today I thought it might be handy to at least add the tunnel type to conn.log as an indicator of whether a particular connection was tunneled or not. That's why I added a "tunnel_type" field to the Conn::Info record (in conn.bro).
So, while I directly modified the conn/main.bro script, it doesn't depend on the code in the framework part. Not sure what you mean by
"depend" though.

Never mind. I was sort of still thinking along the lines of a tunnel framework or something. I just don't want to see frameworks @load-ing anything out of protocols/. If integration needs to happen, I think it should happen the other way (the protocol loading the framework and pulling data from it or something). It's irrelevant if you put the tunnels.bro file in base/protocols/conn/.

However, if multiple scripts do this, then the order of columns in conn.log would depend on the order in which these scripts are loaded... Thinking about this some more, I think that the http scripts already do this, so the order of columns isn't well-defined anyway, right? And if somebody writes a script to parse a Bro log file, then one has to check the header, right?

You've got it. I'm hoping to get everyone away from the notion of column numbers even. Once we get binary logging added it will really be inconsequential because you will essentially load a log (or logs) and request specifically named fields from the log since the binary log doesn't have a notion of column ordering anyway. For the ascii logs, looking at the headers certainly works though. In most cases if people just use the default loaded scripts as-is we should maintain pretty steady column ordering for most columns.

As you can tell, I'm tired of file-format parsing ever getting in the way of doing actual analysis. :)

  .Seth

You may want to check out: http://www.bro-ids.org/development/script-conventions.html

ooops. will do.

I could actually add this tunnel_parent field in my tunnel.bro script that logs child-conn-id <--> parent_t

I'm actually starting to wonder if tunnel.bro should go in base/protocols/conn/. That actually seems like the appropriate place since it has to do with connections. It's just using extra core support to find and log these tunnels. I would still extend the Conn::Info type in the tunnels.bro script though. What do you think?

sounds good to me. However, I wouldn't put it in base. I think the default should be to not decapsulate tunnels!

[snip]

You've got it. I'm hoping to get everyone away from the notion of column numbers even. Once we get binary logging added it will really be inconsequential because you will essentially load a log (or logs) and request specifically named fields from the log since the binary log doesn't have a notion of column ordering anyway. For the ascii logs, looking at the headers certainly works though.

Yeah. People using awk will hate that....
(But I'm using mostly python these days anyways)

In most cases if people just use the default loaded scripts as-is we should maintain pretty steady column ordering for most columns.

That's what's worrying me: people assuming a fixed ordering when writing analysis scripts. We should probably just mention this somewhere in the Getting Started guide and the "From 1.5 to 2.0" HowTo.

cu
Gregor

sounds good to me. However, I wouldn't put it in base. I think the default should be to not decapsulate tunnels!

I agree. I think we should have a configuration variable to enable it, but the support for *how* it's actually accomplished and logged seems like something that should be in the base.
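A minimal sketch of such a toggle (the const name is hypothetical, not an existing identifier):

```
module Tunnel;

export {
    ## Hypothetical switch: whether to decapsulate supported tunnel
    ## types.  Off by default; a policy script turns it on via redef.
    const decapsulate_tunnels = F &redef;
}
```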

Yeah. People using awk will hate that....
(But I'm using mostly python these days anyways)

Actually they won't! I'm not sure how it will look in the end, but it's going to be something like this (Gilbert can give more and better detail):

  ds2txt -s $'\t' -f host,uri,referrer http.ds.* | awk -F $'\t' '{if ($1 == "www.google.com") print}'

You get to dynamically create your own column ordering which will stay consistent since you're defining it at search time. The awk use-case is one that I'm trying to make sure is really nice because I use awk for a lot of stuff too. :)

That's what's worrying me: people assuming a fixed ordering when writing analysis scripts. We should probably just mention this somewhere in the Getting Started guide and the "From 1.5 to 2.0" HowTo.

Good point.

  .Seth

Continuing this thought... outside of base/ (in policy/protocols/conn) it might make sense to do things that actually "detect" something. I consider non-obfuscated tunnel decapsulation very similar to normal protocol analysis. The rule of thumb is that the scripts in base/ should only be doing protocol logging and state building, which is exactly what it sounds like your tunnel.bro script is doing. :)

  .Seth

Well, it depends. The script does two or three things:

1) enable tunnel decapsulation by redef'ing the appropriate consts
2) create a tunnel.log file that logs all tunneled connections (c$id,
    c$uid) and the parent connection.
3) provide a single point where the tunnel stuff is documented (what it
    does, how to tune it, its limitations). (I love the new autodoc
    features!!)
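Item (2) might be sketched with the new logging framework roughly like this (the stream and record names are illustrative, not taken from the attached script):

```
module Tunnel;

export {
    # Illustrative log stream id for tunnel.log.
    redef enum Log::ID += { LOG };

    ## One entry per tunneled connection: the child's uid/id plus
    ## the identity of the parent tunnel connection.
    type Info: record {
        uid:    string   &log;
        id:     conn_id  &log;
        parent: parent_t &log;
    };
}

event bro_init()
    {
    Log::create_stream(Tunnel::LOG, [$columns=Info]);
    }
```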

(1) and (3) are kind of related. I always found it very hard to know and understand what all the hundreds of redef-able consts in bro.init did. I think doing it this way is a nice way of putting the documentation together and giving users an easy way to access the functionality (load the tunnel script, look at its documentation for details).

We can probably split it up and put (1) in policy/ and (2) in base/. However, (2) only works if the connection_compressor is disabled (otherwise the identity of the tunnel is lost), so this makes it more problematic to put it in base/ (at least while the connection_compressor remains on by default).

I've attached the current version. Might be easier to just look at it than explaining it via email ;)

cu
gregor

tunnel.bro (2.59 KB)

(1) and (3) are kind of related. I always found it very hard to know and understand what all the hundreds of redef-able consts in bro.init did. I think doing it this way is a nice way of putting the documentation together and giving users an easy way to access the functionality (load the tunnel script, look at its documentation for details).

We can probably split it up and put (1) in policy/ and (2) in base/. However, (2) only works if the connection_compressor is disabled (otherwise the identity of the tunnel is lost), so this makes it more problematic to put it in base/ (at least while the connection_compressor remains on by default).

Ah, ok. That makes sense. That's probably the right way to do it. Make it disabled by default (with the scripts using it loaded in base/) and create a file named policy/protocols/conn/decapsulate-tunnels.bro (or something like that). In the decapsulate-tunnels.bro script, you could document that it relies on the connection compressor being disabled. If people don't read the docs for scripts before @load-ing them, it's hard to blame the script author. :)
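Such a decapsulate-tunnels.bro could be as small as this sketch (Tunnel::decapsulate_tunnels is a hypothetical name standing in for the real toggles in bro.init's Tunnel module, and I'm assuming use_connection_compressor is the compressor switch):

```
##! Enables decapsulation of supported tunnel types.  Note that this
##! relies on the connection compressor being disabled; otherwise the
##! identity of the tunnel connection is lost.

# Hypothetical const; the real redef-able consts live in bro.init's
# Tunnel module.
redef Tunnel::decapsulate_tunnels = T;

# Tunnel decapsulation needs the original tunnel connection intact.
redef use_connection_compressor = F;
```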

I've attached the current version. Might be easier to just look at it than explaining it via email :wink:

Cool, thanks. I'll take a look later.

  .Seth

I was just about to implement it this way when the following occurred to me: if I split it up, then conn.log will always contain a column "tunneltype" (since (2) is in base), even if tunnel decapsulation isn't enabled. This might be counterintuitive for users, since the presence of the column would suggest that something with tunnels is happening (esp. now that we have extendable log files).

(Just for the tunnels this whole discussion is definitely overblown, but IMHO we'll face such questions more often further down the road, so I think we should figure out the best way to do it.)

cu
gregor

I was just about to implement it this way when the following occurred to me: if I split it up, then conn.log will always contain a column "tunneltype" (since (2) is in base), even if tunnel decapsulation isn't enabled. This might be counterintuitive for users, since the presence of the column would suggest that something with tunnels is happening (esp. now that we have extendable log files).

You're right. I tend to go through the same flailing back and forth when making little decisions like this too. :)

(Just for the tunnels this whole discussion is definitely overblown, but IMHO we'll face such questions more often further down the road, so I think we should figure out what the best way to do it is)

Yep, completely agreed.

  .Seth

Actually, it's not really difficult in awk to use the header as an index
(perhaps a bit slower, but I'm not sure how much). And Gilbert is
working on a Python interface to the logs.
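For the record, the header-as-index trick can be sketched like this (the log here is a hypothetical three-column excerpt, not a real conn.log):

```shell
# Build a field-name -> column-number map from the "#fields" header,
# then select columns by name instead of by fixed position.
printf '#fields\tts\tid.orig_h\thost\n1.0\t10.0.0.1\twww.google.com\n' |
awk -F '\t' '
    /^#fields/ { for (i = 2; i <= NF; i++) col[$i] = i - 1; next }
    /^#/       { next }
    { print $col["host"] }
'
```

This stays correct even if scripts reorder or add columns, since the mapping is rebuilt from each file's own header.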

Robin