Organizing plugins (Re: [JIRA] (BIT-1222) topic/robin/reader-writer-plugins)

(Taking this to the mailing list for discussion.)

> I think that script and any tests (assuming the plugin test
> infrastructure is in place?) need to move into the plugin.

Agreed in general. But I think there are two broader questions at play
here:

     - Part of the problem is that right now, Bro's standard tree of
       scripts is still unchanged: while the core
       analyzers/readers/writers are now plugins, their corresponding
       scripts remain where they always were (and hence get pulled in
       unconditionally).

       Question is: do we want to change that? I'm reluctant to do
       that right now, as it would be a major structural change, and we
       don't have much experience yet with the plugins' organization.
       I would prefer to leave the standard scripts as they are for
       now.

     - What's our strategy for moving non-standard stuff out of the
       main distribution?

       Generally, I think we should start a separate bro-plugins
       repository where we keep non-standard plugins (both from us,
       and from external folks as long as there's a clear maintainer).

       We could then take the stance that everything depending on
       optional functionality would go there, rather than into Bro
       itself. Right now, I think that would mean support for
       DataSeries and ElasticSearch.

So, in short: what would you guys think about solving the problem by
moving DataSeries and ElasticSearch (including their scripts and
tests) out into a new bro-plugin repository, but otherwise leaving
things as they are right now?

Robin

Yeah, seems like a reasonable first step. I'm wondering if it makes sense to break them up even further into separate repos like "dataseries-bro-plugin" and "elasticsearch-bro-plugin"? Done that way, the maintainer of each plugin may be more obvious, and it also distinguishes the externally contributed ones, since their owners would have their own independent repos. If the aim is to have a central place for people to discover plugins, simply keeping a running list of repo URLs somewhere could also provide that.

- Jon

> Question is: do we want to change that? I'm reluctant to do
> that right now, as it would be a major structural change, and we
> don't have much experience yet with the plugins' organization.
> I would prefer to leave the standard scripts as they are for
> now.

Mmmppphhh... Not sure if I should say "let's do it!" or not. I'm *really* tempted to say that we should make the break. We're early in the 2.4 dev cycle and now's the perfect time to get that plugin organizational experience. At the very least, this would force some better practices on us and the community. Only one way to get the experience. :)

I'm actually starting to wonder now what you mean by "standard scripts"? You're obviously saying that the ES and dataseries scripts should be broken out, but could you give an example where you think we should leave it alone for now? I may be losing track of this conversation.

> Generally, I think we should start a separate bro-plugins
> repository where we keep non-standard plugins (both from us,
> and from external folks as long as there's a clear maintainer).

Agreed. Those should be easy to break out even further into separate repositories once we have an easy system for managing dependencies (i.e. a package manager).

> We could then take the stance that everything depending on
> optional functionality would go there, rather than into Bro
> itself. Right now, I think that would mean support for
> DataSeries and ElasticSearch.

And libgeoip!

> So, in short: what would you guys think about solving the problem by
> moving DataSeries and ElasticSearch (including their scripts and
> tests) out into a new bro-plugin repository, but otherwise leaving
> things as they are right now?

In case it's not obvious, I'm voting for making the larger change, whatever that is. It just feels wrong to leave this code split up half way done.

  .Seth

> Mmmppphhh... Not sure if I should say "let's do it!" or not. I'm
> *really* tempted to say that we should make the break. We're early in
> the 2.4 dev cycle and now's the perfect time to get that plugin
> organizational experience.

Well, my preference would be to first get experience with maintaining
some external plugins before we take the step to reorganize all the
existing stuff. That's quite a bit of work, and if we get it wrong
we'll have to do it again later ...

> the community. Only one way to get the experience. :)  I'm
> actually starting to wonder now what you mean by "standard scripts"?
> You're obviously saying that the ES and dataseries scripts should be
> broken out, but could you give an example where you think we should
> leave it alone for now?

Take the HTTP analyzer for example. The event engine code lives in
src/analyzer/protocol/http/. If this were an external plugin, it would
have its corresponding Bro scripts in src/analyzer/protocol/http/. So
I guess that means we would now move
scripts/{base,policy}/protocols/http in there.

But I'm not sure it's always clear-cut where existing scripts would
move; and what about those that don't have a corresponding src/* part?
I think the answer is that those would move into script-only plugins,
which in principle should already be supported as well; but where do
they live? Maybe we want to move all the plugins out of src/ anyways?
And how does this all play along with the envisioned CBAN (or whatever
we call it these days)?

So, I'm with you that we should figure this all out, but I would prefer
to do that as a separate step that for now leaves the existing
structure in place until we know these answers.

> And libgeoip!

Good point, although this would hurt a bit more: geoip is optional but
still pretty standard functionality that we would now require people
to install separately ... So while I agree that this should move into
its own plugin as well, maybe that's also something for later
(generally, most of the bifs should move to plugins as well; we should
reorg them broadly by functionality).

Robin

Yeah, I'm torn on this. It does make sense for the reasons you give,
but one repository also has its appeal:

    - it's easy to just get them all by cloning, or packaging, the one
      thing.

    - administratively, we just need to manage/mirror one repo.

    - we can add some infrastructure to the repo to easily build and
      test them all at once, including as part of Jenkins.

    - we can market that one repo as a vetted source for plugins,
      including plugins maintained externally that follow certain
      standards, like having a maintainer who fixes problems and makes
      sure it works with the current release (we'd ping that person
      when something breaks and remove the plugin if there's no fix).
      [1]

    - independent of what we do, people can of course still have their
      own repos elsewhere anyways.

Opinions?

Robin

[1] That said, maybe even that is already more effort than we really
want to invest in external code?

Maybe still have one repo that relies on git submodules, one per plugin?

  - Easy to clone everything w/ --recursive.

  - Could hold common packaging and testing infrastructure.

  - Still has administrative overhead of having to create/mirror many repos.

Was there more concern regarding admin overhead other than the initial cost of setting-up/mirroring? Is there a limit to how many repos can be mirrored?

Could also do a hybrid approach where only external plugins are submodules, but internally maintained ones just get committed directly to the main "bro-plugin" repo. That would cut down on the admin overhead.

Another worry: should the way plugins are organized make it easy to be selective about which plugins to build/install? Say that I just want the dataseries plugin, but also happen to satisfy the dependencies of the elasticsearch plugin: would it be more "awkward" for me if all plugins live in the same repo and share build infrastructure? "Awkward" meaning it's going to download/build/install things I don't want.

- Jon

> Was there more concern regarding admin overhead other than the initial
> cost of setting-up/mirroring? Is there a limit to how many repos can
> be mirrored?

No, it's indeed mostly the setup, plus the potential messiness of
having 1000s of individual repositories show up when somebody goes to
github (to be a bit overly optimistic :)

> Could also do a hybrid approach where only external plugins are
> submodules, but internally maintained ones just get committed directly
> to main "bro-plugin" repo. That would cut down on the admin overhead.

I like this model. Keeping our own stuff in a single repo feels more
convenient to me, but I can see pulling in external ones separately.
And we'd still have "vetting power" because we have to move the
submodule forward.
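To make the hybrid layout concrete, here is a minimal Python sketch, assuming the repository keeps one subdirectory per plugin and only externally maintained plugins carry an upstream URL. The `Plugin` class, the `gitmodules` helper, and the `example-plugin` name and URL are hypothetical; the output follows git's standard `.gitmodules` format (a `[submodule]` section with `path` and `url`). Internal plugins like dataseries and elasticsearch get no entry because they are committed directly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Plugin:
    name: str                  # subdirectory name inside the plugin repo
    url: Optional[str] = None  # set only for externally maintained plugins


def gitmodules(plugins: List[Plugin]) -> str:
    """Render .gitmodules text with one [submodule] section per external plugin.

    Internally maintained plugins (url is None) are committed directly
    to the repository and therefore get no entry.
    """
    sections = []
    for p in plugins:
        if p.url is None:
            continue
        sections.append(
            f'[submodule "{p.name}"]\n'
            f"\tpath = {p.name}\n"
            f"\turl = {p.url}\n"
        )
    return "".join(sections)


plugins = [
    Plugin("dataseries"),
    Plugin("elasticsearch"),
    # Hypothetical external contribution living in its own upstream repo.
    Plugin("example-plugin", "https://example.com/someone/example-plugin.git"),
]
print(gitmodules(plugins), end="")
```

In this sketch, moving a plugin between the two models only means adding or removing its URL; the directory layout stays the same.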

> Another worry: should the way plugins are organized make it easy to be
> selective about which plugins to build/install?

Yes, actually I think so. Building/testing all together is convenient,
but I'm not sure it should be the default (*installing* all together
should almost certainly not be the default). So I'm thinking of doing it
selectively, with the option to do them all. Maybe that's just a
top-level Makefile with individual targets
{build,install,test}-dataseries, {build,install}-elasticsearch, and
then an overall {build,install,test}-all. In addition, one can also
always go to the subdirectories individually and build there; they are
all supposed to build standalone as well.
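The proposed target layout can be modeled as a small table. The following is a Python sketch, not the actual Makefile: the `PLUGINS` mapping and the `targets`/`resolve` helpers are hypothetical and only illustrate how each `{build,install,test}-all` target would fan out to the plugins that support that action.

```python
# Hypothetical table: each plugin subdirectory lists the actions it supports,
# mirroring the proposed {build,install,test}-dataseries and
# {build,install}-elasticsearch targets.
PLUGINS = {
    "dataseries": ("build", "install", "test"),
    "elasticsearch": ("build", "install"),
}


def targets(plugins):
    """Return every top-level target name: per-plugin ones, then the -all ones."""
    names = []
    for plugin, actions in plugins.items():
        names += [f"{action}-{plugin}" for action in actions]
    all_actions = sorted({a for actions in plugins.values() for a in actions})
    names += [f"{action}-all" for action in all_actions]
    return names


def resolve(target, plugins):
    """Map a target to the (action, subdirectory) steps it would run."""
    action, _, plugin = target.partition("-")
    if plugin == "all":
        # Aggregate target: run the action in every plugin that supports it.
        return [(action, p) for p, acts in plugins.items() if action in acts]
    if action not in plugins.get(plugin, ()):
        raise ValueError(f"unknown target: {target}")
    return [(action, plugin)]


print(resolve("test-all", PLUGINS))  # [('test', 'dataseries')]
```

With this layout, `test-all` would only run the dataseries tests, since elasticsearch has no test target in the proposal.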

So do we want to go ahead with this model? Then I'd set that up and
move DS and ES over.

One more question: would we then want to pull bro-plugin as a
submodule into bro/aux? Then people would automatically get the stuff
there as well (but it wouldn't be built automatically), and it would
allow for easier testing as it's clear where things are located.

Robin

> So do we want to go ahead with this model?

Sure. I’m thinking if a problem is found, it’s not hard to convert to one of the other models since they should share the same directory structure and only differ in which dirs are chosen to be git submodules.

> One more question: would we then want to pull bro-plugin as a
> submodule into bro/aux?

I think that would be ok unless you’re worried about keeping the size/overhead of cloning bro down.

- Jon

No, that's fine I think. I don't have particular worries, most of my
questions are just fishing for opinions on how we structure this best. :)

Robin

> So, I'm with you that we should figure this all out, but I would prefer
> to do that as a separate step that for now leaves the existing
> structure in place until we know these answers.

I think you gave a good justification. Personally I am still a bit torn on whether we should even bundle the scripts that we put in base/protocols/ with the analyzers. I can actually see real justification for not including them.

>> And libgeoip!

> Good point, although this would hurt a bit more: geoip is optional but
> still pretty standard functionality that we would now require people
> to install separately ... So while I agree that this should move into
> its own plugin as well, maybe that's also something for later

Fair enough. Eventually when this reorganization happens in earnest I think we'll have to use the rule of thumb we discussed yesterday. Always seek to shorten init-bare.bro.

  .Seth


I don't like that stuff might not build automatically. Would it be possible to have the plugins add stuff to bro's configure output, so that plugins that are available and able to find their dependencies are built automatically?

  .Seth

Yeah, that's yet another decision to make ... Undecided. :)

Robin

Hmmm ... Not sure I like that. To me these separately maintained
plugins are optional things that shouldn't be pulled in automatically.
Would you say they should also install automatically? What about those
that don't even have external dependencies? Would they always be
installed/loaded?

Also, if we integrated them into the central configure, we'd probably
also need to provide their options, like --with-dataseries=/path/to/ds ...
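For comparison, Seth's suggestion could look roughly like the following Python sketch of a configure-time check, assuming each plugin's dependency can be detected by a marker path under some prefix. The `MARKERS` paths, the default prefixes, and `select_plugins` are hypothetical; `with_paths` stands in for options like `--with-dataseries=/path/to/ds`. A plugin whose dependency isn't found is simply left disabled instead of failing the whole run.

```python
import os
from typing import Callable, Dict, Optional

# Hypothetical marker paths whose presence under a prefix signals that a
# plugin's external dependency is installed; real checks would differ.
MARKERS = {
    "dataseries": "include/DataSeries",
    "elasticsearch": "include/curl/curl.h",
}
DEFAULT_PREFIXES = ("/usr", "/usr/local")


def select_plugins(
    with_paths: Dict[str, str],
    exists: Callable[[str], bool] = os.path.exists,
) -> Dict[str, Optional[str]]:
    """Map each plugin to the prefix where its dependency was found, or None.

    An explicit --with-<plugin>=PATH (an entry in with_paths) replaces the
    default search; a plugin whose dependency is missing stays disabled
    instead of aborting the whole configure run.
    """
    selected: Dict[str, Optional[str]] = {}
    for plugin, marker in MARKERS.items():
        if plugin in with_paths:
            candidates = (with_paths[plugin],)
        else:
            candidates = DEFAULT_PREFIXES
        selected[plugin] = next(
            (p for p in candidates if exists(os.path.join(p, marker))), None
        )
    return selected


# Example: DataSeries given explicitly, nothing else installed.
print(select_plugins({"dataseries": "/opt/ds"},
                     exists=lambda p: p == "/opt/ds/include/DataSeries"))
```

Even in this toy form, every plugin brings its own option and detection logic into the central configure, which is the cost mentioned above.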

Robin

Yep. I feel more and more like we're going to need the package management system soon after (or maybe in conjunction with) this.

  .Seth

We'll need it eventually, but I wouldn't like to wait for it. Do you
think we'd break much by moving DS and ES into bro-plugins/, without
building them automatically as well?

Robin

I'm not very fond of the ES support being even harder to get working. I'm still very hopeful that we'll get mechanisms and documentation in place that make the ES stuff much easier. Ultimately it's probably a small concern.

  .Seth

Any idea how widely that's used currently? It wouldn't be much harder,
just a bit.

I think the alternatives right now are: (1) holding off completely on
moving readers/writers to plugins, or (2) getting some hack in place
so that we avoid the problem Jon noticed with script-land referencing
readers/writers that aren't available (this could probably be done
with some ugly ifdefs).

I could see (2) if we had to. I would prefer to avoid (1).

Robin

Not very widely. It still has issues we're looking to correct too.

I think we can skip 1 and 2. :)

  .Seth

Are there any plans for packaging plugins and pushing those into various distributions' repositories (e.g. CentOS, Debian, FreeBSD)? 'sudo yum install bro-2.4-elasticsearch-writer' seems like it would be pretty convenient for users, assuming there are plans to support it. On a related note, it seems like individual maintainers could acquire blessed status pretty quickly without getting the bro team involved by pushing their individual plugin upstream somewhere: anything that 'yum search bro-plugin' (or equivalent) yields would probably be assumed to be somewhat stable (or at least stable enough to install without thinking about it too hard first :)

Are there plans to package a bro-2.4-plugin-devel or equivalent to make it possible for the folks who have installed bro via e.g. apt or yum to build plugins without having to also pull down and compile a complete version of bro? I think this could make plugin development quite a bit more accessible for new folks, assuming the overhead of maintaining such a package wasn't unreasonable.

-Gilbert

All good questions. We still have a lot to figure out with all of this;
it would certainly be nice to be able to do "yum install bro-my-plugin".
Right now the only built-in way to distribute plugins as binaries is
"make bdist" with the provided skeleton Makefile: that builds a tarball
with everything that the plugin needs at runtime.

I also realized another problem (maybe) with the current skeleton for
writing plugins: it doesn't keep things nicely inside a single build
directory, as we usually do with all cmake-built stuff (there's a
reason why not, but a few symlinks could probably solve that).

Anyways, I would still like to go ahead with things just as they are
now (assuming we manage to not break anything/much). Nothing's cast in
stone, but I think it will be easier to work this out once things
are in master so that people start using it.

Robin