Script reorg proposal

Seems we are still not quite happy with the layout of the new scripts.
The current directory structure can be a bit confusing to follow, and
it makes it hard for users to find out what to load.

Seth and I were kicking around another idea last week. It's somewhat
radical compared to "Bro tradition" but I think it makes sense.

The proposal is to move more scripts into the set loaded by
default. While currently, Bro essentially does nothing when no further
scripts are specified, we would change things so that by default, Bro
now loads all the basic scripts that do just logging and state
building (but not more extensive/expensive kinds of analysis or
detection).

So when users just run Bro on a link/trace, they'll immediately get a
bunch of high-value log files, without needing to figure out anything
else (and some may just want to stop right there in terms of what to
learn about Bro, which is fine).

In addition, however, as there clearly is value in running Bro with a
minimal bare-bones config (for fine-tuned trace analysis, research, or
debugging), we'd also provide an option (i.e., redefable script
variable and/or environment variable), that brings back the old
behaviour of loading just bro.init. With that, one could still
cherry-pick what to load.

With this scheme, we'd organize the script installation like this,
assuming --prefix=/usr:

    /usr/share/bro/base/

        - All the scripts that are loaded by default (large parts of
        the current protocols and frameworks directories). Most users
        wouldn't need to know much about these scripts (but they'll be
        documented and can be extended).

    /usr/share/bro/policy/

        - All the scripts that the user can load in addition. I.e.,
        much like the 1.5 policy/ directory, but with less stuff in it.

    /usr/share/bro/site/

        - Local site-policies. Depending on file system standards,
        this may go somewhere else as well.

    /usr/share/bro/contrib/

        - At some point, we'll add the set of contributed scripts
        here. Will be externally managed in some way we haven't
        figured out yet.

BROPATH would include all these four directories.

What do you guys think?

Robin

Makes sense and addresses some of the noob complaints. As long as there is a way to run barebones, I think it is fine.

With this scheme, we'd organize the script installation like this,
assuming --prefix=/usr:

   /usr/share/bro/base/

   /usr/share/bro/policy/

   /usr/share/bro/site/

   /usr/share/bro/contrib/

BROPATH would include all these four directories.

If they're all in BROPATH, that means it's possible for scripts/packages in one directory to overshadow equivalently named ones in another directory. For example, this makes it hard to create a "protocols/http" package that exists in all 4 dirs, which seems like a reasonable use to me. Is there a specific path ordering or naming convention that can deal with this?

I think I'd probably like for the default BROPATH to just contain the $prefix/share/bro.
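
To make the shadowing concern concrete, here is a minimal Python sketch. It is only a model, not Bro's actual loader: it assumes that a name is resolved by taking the first BROPATH entry that contains it, and the `resolve` and `shadowed` helpers as well as the installed package set are hypothetical.

```python
import posixpath

# Hypothetical model of the four-directory layout; not Bro's actual loader.
SHARE = "/usr/share/bro"
FOUR_DIR_BROPATH = [posixpath.join(SHARE, d)
                    for d in ("base", "policy", "site", "contrib")]

# Pretend every directory ships its own "protocols/http" package.
INSTALLED = {posixpath.join(d, "protocols/http") for d in FOUR_DIR_BROPATH}


def resolve(name, search_path, installed):
    """Return the first search-path entry that contains `name`, or None."""
    for root in search_path:
        candidate = posixpath.join(root, name)
        if candidate in installed:
            return candidate
    return None


def shadowed(name, search_path, installed):
    """List the copies of `name` that a first-match search can never reach."""
    hits = [posixpath.join(root, name) for root in search_path
            if posixpath.join(root, name) in installed]
    return hits[1:]


print(resolve("protocols/http", FOUR_DIR_BROPATH, INSTALLED))
print(shadowed("protocols/http", FOUR_DIR_BROPATH, INSTALLED))
```

Under that assumption only the copy in base/ is reachable by the bare name "protocols/http"; the other three are hidden behind it.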

- Jon

I think that this is an excellent idea since it addresses the new user
(!RTFM)/first impression problem.
cheers,
scott

Good catch, and actually this is my mistake when writing the summary,
sorry.

While I don't have my notes with me right now, I think what Seth and I
actually discussed is that BROPATH would include only
$prefix/share/bro and $prefix/share/bro/policy. That allows pulling in
the optional stuff from policy/ directly without needing the prefix
(which is the common case); but all the other stuff does require the
base/contrib/site prefixes, and we don't get conflicts with
directories overshadowing each other.
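
As an illustration of the two-entry BROPATH, here is a minimal Python sketch. It assumes --prefix=/usr and a first-match search over BROPATH; the installed package set and the `resolve` helper are hypothetical and only model the idea.

```python
import posixpath

PREFIX = "/usr"  # assumes --prefix=/usr, as in the proposal
SHARE = posixpath.join(PREFIX, "share/bro")
BROPATH = [SHARE, posixpath.join(SHARE, "policy")]

# Hypothetical installed packages, one per top-level directory.
INSTALLED = {
    posixpath.join(SHARE, "base/protocols/http"),
    posixpath.join(SHARE, "policy/protocols/http"),
    posixpath.join(SHARE, "site/protocols/http"),
    posixpath.join(SHARE, "contrib/protocols/http"),
}


def resolve(name):
    """Return the first BROPATH entry that contains `name`, or None."""
    for root in BROPATH:
        candidate = posixpath.join(root, name)
        if candidate in INSTALLED:
            return candidate
    return None


# The bare name reaches policy/; everything else needs its directory prefix.
RESOLVED = {name: resolve(name) for name in (
    "protocols/http",
    "base/protocols/http",
    "site/protocols/http",
    "contrib/protocols/http",
)}

for name, path in RESOLVED.items():
    print(name, "->", path)
```

In this model each of the four names maps to a different directory, so equally named packages no longer hide one another.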

Does that sound better?

Perhaps site/ should be in BROPATH as well though, in particular if
it ends up being moved to somewhere else than share/bro/.

Robin

I like the idea as well. However I'm wondering whether it would make sense to put the stuff that's loaded by default into a specific policy file, e.g., default.bro, and load that file by default. This would make it easier to selectively remove some parts of the analysis instead of having to go from-all-to-nothing. Then we could also just use a single (base) directory for all the scripts instead of bro/base and bro/policy (which kinda irks me). The default.bro file could actually live in the site directory, so it's readily editable for users.

This way users get a nice set of default analyzers and logs and it's easy for slightly advanced users to disable some analyzers. I think that's important since for a bunch of analyzers or framework features I expect it to be a close call whether it should be loaded by default or not.

In any case I think it's important to have nice documentation briefly describing what each of the policy files or packages does. Basically a quick guide to: "if I want X I have to load a,b,c". Don't know whether this readily comes out of the new documentation framework or not. Even if it does it might make sense to manually structure it thematically. In any case the document should link to the generated documentation of the individual scripts/packages. Might also be nice if these brief descriptions also made it into the default.bro file as comments.

cu
Gregor

I like this rather simple scheme.

    /usr/share/bro/site/

        - Local site-policies. Depending on file system standards,
        this may go somewhere else as well.

It always bothered me that the site scripts were in the same place as
the distribution policy scripts. This made it more difficult to keep
local customizations in git or other VCS. What about ~/.bro/site? (see
below)

BROPATH would include all these four directories.

What about a ~/.bro directory in addition to BROPATH? This would allow
users to customize/override default script versions and facilitate script
hacking. For example, say a user wants to replace
PREFIX/bro/base/foo.bro with a custom version. This would simply require
creating ~/.bro/base/foo.bro. We might raise a (suppressable) warning
that the file in the home directory shadows the installed one in this
case. In general, I could imagine that a ~/.bro directory makes it easier for
UNIX-folks to get-it-up-and-running by simply creating policy scripts in
~/.bro.

Thoughts?

    Matthias

Does this make an implicit assumption that only one user is configuring the Bro policy for a site or system? Or does bro run as root and hence this would almost always be in /root/.bro ?

Does this make an implicit assumption that only one user is
configuring the Bro policy for a site or system?

No, I did not mean to imply a single "Bro admin" per system, although
this is probably common practice.

Or does bro run as root and hence this would almost always be in
/root/.bro ?

On many UNIX flavors [1], Bro will probably need to run as root in order to
access the network interfaces. But supporting ~/.bro also has benefits
for users who simply want to do trace analysis (i.e., no root
privileges required) and customize "their" Bro. Another plus is that
rolling Bro updates system-wide or uninstalling Bro is independent of a
user's configuration.

    Matthias

[1] Some BSDs support access control via groups, and IIRC Robin wrote a
    patch for Linux.

While I don't have my notes with me right now, I think what Seth and I
actually discussed is that BROPATH would include only
$prefix/share/bro and $prefix/share/bro/policy.

Sounds better.

Perhaps site/ should be in BROPATH as well though, in particular if
it ends up being moved to somewhere else than share/bro/.

Yes, I think it's ok to have site/ in the default BROPATH (now that I realize $prefix/share/bro will also be there).

- Jon

What about a ~/.bro directory in addition to BROPATH?

I could see that being a good default place to search for custom scripts.

This would allow
users to customize/override default script versions and facilitate script
hacking. For example, say a user wants to replace
PREFIX/bro/base/foo.bro with a custom version. This would simply require
creating ~/.bro/base/foo.bro.

I don't think intentional script overshadowing is a use-case that will work as nicely as you want it to if, after the reorganization, most scripts are still part of a script "package" and referred to internally by a relative path. E.g., the "base/foo/" package might have a "base/foo/bar.bro" script that's loaded internally via its "__load__.bro" that does "@load ./bar". That means you should never be able to overshadow that bar.bro.

- Jon

I don't think intentional script overshadowing is a use-case that will
work as nicely as you want it to if, after the reorganization, most
scripts are still part of a script "package" and referred to
internally by a relative path. e.g. the "base/foo/" package might
have a "base/foo/bar.bro" script that's loaded internally via its
"__load__.bro" that does "@load ./bar". That means you should never
be able to overshadow that bar.bro.

Sorry, I'm not sure if I get it. Are you saying that the "@load ./bar"
directive in __load__.bro is problematic, because, if we search in $HOME
before the current directory, the wrong bar.bro (in the user's home
directory) is picked rather than the one in base/foo?

    Matthias

One of the main motivations for the proposed reorg is actually to
split the two apart. Currently, they are all inside one directory
hierarchy, but that makes it hard to get a good picture of what's
there. With a default.bro that would get even worse because now one
would need to check that file to see what's already loaded.

With the two dirs split, there'll probably be some top-level.bro in
base/ pulling in all the other stuff. To remove only some parts one
could either copy-and-edit that file, or (better where it works)
@unload stuff. But in any case, that's something only "the experts"
would be doing anyway.

Robin

A few thoughts regarding where to put site policies:

    - I think the default for all pre-built packages should be
    wherever the target system's file system standard wants such
    stuff.

    - A user can always pick a different place. With BroCtl, it's
    a single config option; and otherwise one just points BROPATH to
    that new location.

    - That said, we still need a default for the source install of
    course. I'm not sure I like ~/.bro for that, it's not where I'd
    intuitively look for local scripts, in particular when working
    as root.

    Does anybody have a good idea where the different OSs/distros want
    such local scripts files to be located? I'd say let's just pick
    one of those as the default for the src install as well.

    - I don't think we should rely on (or "approve") the overshadowing.
    It will work to some degree (but not always, per Jon's mail), but
    in any case let's not make that the official way of extending Bro. :-)

Robin

I don't think intentional script overshadowing is a use-case that will
work as nicely as you want it to if, after the reorganization, most
scripts are still part of a script "package" and referred to
internally by a relative path. e.g. the "base/foo/" package might
have a "base/foo/bar.bro" script that's loaded internally via its
"__load__.bro" that does "@load ./bar". That means you should never
be able to overshadow that bar.bro.

Sorry, I'm not sure if I get it. Are you saying that the "@load ./bar"
directive in __load__.bro is problematic, because, if we search in $HOME
before the current directory, the wrong bar.bro (in the user's home
directory) is picked rather than the one in base/foo?

No, it's that relative script loading (e.g. @load ./bar) is currently implemented such that it first always searches the directory-of-the-current-script-being-loaded (e.g. the place where __load__.bro lives; $PWD may differ from that).
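
A minimal Python sketch of that lookup order, to show why a relative @load cannot be shadowed from elsewhere. This is a model only: the `resolve_load` helper, the /home/user/.bro location, and the fallback to the search path for relative names are assumptions for illustration, not Bro's actual implementation.

```python
import posixpath


def resolve_load(name, loading_script, search_path, installed):
    """Resolve an @load argument to a .bro file, or None.

    Names starting with "./" or "../" are tried in the directory of the
    script doing the loading before any search-path entry (the fallback to
    the search path is an assumption of this sketch).
    """
    roots = list(search_path)
    if name.startswith(("./", "../")):
        roots.insert(0, posixpath.dirname(loading_script))
    for root in roots:
        candidate = posixpath.normpath(posixpath.join(root, name)) + ".bro"
        if candidate in installed:
            return candidate
    return None


# Hypothetical install plus a would-be override in a user's home directory.
INSTALLED = {
    "/usr/share/bro/base/foo/__load__.bro",
    "/usr/share/bro/base/foo/bar.bro",
    "/home/user/.bro/base/foo/bar.bro",
}
SEARCH_PATH = ["/home/user/.bro", "/usr/share/bro"]
LOADER = "/usr/share/bro/base/foo/__load__.bro"

# "@load ./bar" inside __load__.bro stays within the package directory.
RELATIVE = resolve_load("./bar", LOADER, SEARCH_PATH, INSTALLED)
# Loading by full name goes through the search path and hits the override.
BY_NAME = resolve_load("base/foo/bar", LOADER, SEARCH_PATH, INSTALLED)

print(RELATIVE)
print(BY_NAME)
```

In this model the override in the home directory is picked up only when the script is loaded by its full name, never through the package's own relative load.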

I guess we could change that, but to me it seems like increasing complexity and decreasing intuitiveness for a situation that will be run into more often by accident rather than intention. Instead we should probably recommend extending/modifying script behavior through the public API that it advertises. I think what you're trying to do with script overshadowing is work around the lack of OO in the language, and if we wanted to solve that, it would be better to address it directly instead of hacking around it?

- Jon

    - A user can always pick a different place. With BroCtl, it's
    a single config option; and otherwise one just points BROPATH to
    that new location.

This makes me wonder whether the entries in

    BROPATH="/path/to/foo:/here/is/bar"

are treated as prefixes, i.e., whether standard sub-directories like
site, policy, etc. are also included?

    - That said, we still need a default for the source install of
    course. I'm not sure I like ~/.bro for that, it's not where I'd
    intuitively look for local scripts, in particular when working
    as root.

I agree that *installing* local scripts into ~/.bro is not the best
choice. Rather, I was proposing ~/.bro in addition to the base script
installation.

    Does anybody have a good idea where the different OSs/distros want
    such local scripts files to be located? I'd say let's just pick
    one of those as the default for the src install as well.

/var could be a good choice since the site policies are subject to
modification. Perhaps /var/bro/site?

    - I don't think we should rely on (or "approve") the overshadowing.
    It will work to some degree (but not always, per Jon's mail), but
    in any case let's not make that the official way of extending Bro. :-)

ACK ;-). I thought of overshadowing as a bug rather than a feature, one that
would have to be solved when scanning the script include paths.

    Matthias

No, it's that relative script loading (e.g. @load ./bar) is currently
implemented such that it first always searches the
directory-of-the-current-script-being-loaded (e.g. the place where
__load__.bro lives; $PWD may differ from that).

Got it, thanks for clarifying.

I think what you're trying to do with script overshadowing is
work around the lack of OO in the language, and if we wanted to solve
that, it would be better to address it directly instead of hacking
around it?

Substituting scripts is indeed a hack at best to replace or extend
existing code. Because the language itself used to have few
customization points (data redefs, callback tables for a few
structures), I found myself often replacing entire chunks of the base
code by means of substituting implementations. Maybe this strategy is
not apt anymore after the script reorganization and the new logging
framework?!

    Matthias

I'd actually like the path to have bro/, bro/policy/, and bro/site/. The things that people are typically going to be loading will come from those two directories. I'd like people to understand they're loading stuff from contrib when they do, so I'd like to leave that out of the path.

(This is all with the caveat that I just started reading this thread and this may have been addressed already).

  .Seth

having to go from-all-to-nothing. Then we could also just use a single
(base) directory for all the scripts instead of bro/base and bro/policy
(which kinda irks me).

One of the main motivations for the proposed reorg is actually to
split the two apart. Currently, they are all inside one directory
hierarchy, but that makes it hard to get a good picture of what's
there.

I think that needs to be solved by documentation. Have some (short) doc that lists everything that's @load'able and briefly describes what it does. (1)

With a default.bro that would get even worse because now one
would need to check that file to see what's already loaded.

One option would be to put everything that's @load'able into the default.bro file and comment out everything that we don't load by default. We could also include the descriptions I mentioned above as comments.

With the two dirs split, there'll probably be some top-level.bro in
base/ pulling in all the other stuff. To remove only some parts one
could either copy-and-edit that file, or (better where it works)
@unload stuff. But in any case, that's something only "the experts"
would be doing anyway.

So this top-level.bro is basically the same as my default.bro except it lives in base/, right? So what's the advantage? That it's easy to see what's loaded by default based on the directory where the script is?

If that's indeed the case and you want to do the split into two directories, I would still advocate doing top-level.bro the way I described my default.bro, i.e., put it in site/ and add comments to make it "user serviceable" :-)

Another question would be whether one would split protocol analysis between base and policy. E.g., is there going to be "base/http/" and "policy/http", so that when I load the first as a package I get the basic analysis and when I also load "policy/http" as a package I get the heavier analysis? Or would I also cherry-pick additional features in "policy/http/*"?

cu
Gregor

(1) This list of @load'able things should be sorted thematically, which shouldn't be so hard given that we already have the directory structure. We can probably also get the descriptions from the autodoc feature.

   - That said, we still need a default for the source install of
   course. I'm not sure I like ~/.bro for that, it's not where I'd
   intuitively look for local scripts, in particular when working
   as root.

Yeah, that's not a good place to install anything by default, but my understanding was it would just be an additional place to search by default. So it doesn't hurt anything, but it's also not extremely useful seeing as a BROPATH can just be customized to include it. I don't have a strong opinion either way.

   Does anybody have a good idea where the different OSs/distros want
   such local scripts files to be located? I'd say let's just pick
   one of those as the default for the src install as well.

All the binary packages use an install prefix of "/opt/bro" as per FHS[1]. And except for Mac, variable/run-time related data (BroControl's spool and log dirs) are set to go into "/var/opt/bro" (for Mac it seemed reasonable to think users expect stuff to be installed in a single self-contained place -- the "app" mentality). Currently the site-specific scripts would go in "/opt/bro/share/bro/site".

According to the FHS, "No other package files may exist outside the /opt, /var/opt, and /etc/opt", so that limits the options for where local scripts would go. /etc/opt would be for static, host-specific config files, which doesn't seem entirely suitable. /var/opt is also not an exact match because that tends to be more for data that varies *during* operation. My interpretation is that we're not really violating any rule with the current placement of them, although in a similar case[2], people more familiar with packaging would probably recommend something under "/var/opt".

The standard recommendation for a manual build/install is to put stuff in /usr/local, but the specific place that's appropriate for local scripts is again ambiguous. Or maybe this issue is just outside the scope of FHS, which says "Local placement of local files is a local issue, so FHS does not attempt to usurp system administrators."

Our current approach might be adequate since the admin has an easy way (change default BROPATH) to choose the best place for their local files.

[1] Filesystem Hierarchy Standard
[2] http://ldn.linuxfoundation.org/forum/lsb-general-forum/topic/fhs-type-question-where-put-modifiable-data-shared-all-users