CBAN design proposal

Here’s an updated design doc for CBAN, a plugin manager for Bro, with some new ideas on how it could work:

https://www.bro.org/development/projects/cban2.html

Eventually it may be good to ask for feedback on bro-users to make sure we’re not missing important features the community would want, but I thought I’d start w/ bro-dev in the interest of wasting fewer people’s time in case the design is way off base.

- Jon

Cool, thanks. I'm going to send some feedback but first I wanted to
bring something up that might be controversial. :-)

As I read through the design doc, I started questioning our plan of
curating CBAN content. I know that's what we've been intending to do,
but is that really the best approach? I don't know of script
repositories for other languages that enforce quality control on
submissions beyond checking technical conventions like certain
metadata being there.

I'm getting concerned that we're creating an unnecessary bottleneck by
putting the Bro Team into the critical path for submissions and
updates. Why not let people control their stuff themselves? They'd
register things with CBAN to make them available to everybody, but
we'd stay out of the loop for anything further. We may still want to
keep a local copy of the code to protect against stuff going offline,
but that could be automated. Or we could even move that burden to the
authors as well and just keep a reference to where the code can be
found, without a local copy; if it disappears, so be it.

In other words, my proposal is to put authors into control of their
code, and make them fully responsible for it too --- not us. We'd just
connect authors with users, with as little friction as possible.

If we want some kind of quality measure, we could introduce stars for
modules that users assign if they like something, or even some facility
to leave comments or other feedback that's visible to everybody. We
could also verify whether a plugin builds and loads correctly, or
whether its tests pass. But we wouldn't block anything if something
fails, just mark it (say, with a banner saying "tests pass", "tests
fail", "no tests").

Thoughts?

Robin

In other words, my proposal is to put authors into control of their
code, and make them fully responsible for it too --- not us. We'd just
connect authors with users, with as little friction as possible.

I support this completely.

If we want some kind of quality measure, we could introduce stars for
modules that users assign if they like something, or even some facility
to leave comments or other feedback that's visible to everybody. We

I think community vetting and feedback (and experience stories) are the right way to go here.

Having the Bro team vet something will be very hard. My personal experience is that scripts can bring surprises weeks and months down the line, so testing isn't merely a matter of running something for a few days and giving an OK.

Aashish

I think there is a broad spectrum from no interaction to vetting and pulling into the main repository. It is about finding the right balance.

I agree with keeping the restrictions that block submissions minimal. There still needs to be some basic quality control and standardization, for example checking that all the right pieces are present.

I do think there is another level of non blocking checks and quality control we can provide. For example, we can do checks for exec calls and give warnings to users. I think Puppet Forge has a nice model here. We won't block a submission, but these checks encourage better development and help new users trust submissions. That said, I think these must be automated. They can't block on a human reviewing them.
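
For illustration, a non-blocking check like that could be as simple as the following sketch; the pattern list and the file layout it assumes are placeholders rather than a vetted policy:

    import re
    from pathlib import Path

    # Patterns worth warning about; purely illustrative, not a vetted list.
    SUSPICIOUS = [r"\bExec::run\b", r"\bsystem\s*\(", r"\bpiped_exec\s*\("]

    def warn_on_exec_calls(plugin_dir):
        """Collect warnings for scripts that shell out; never blocks anything."""
        warnings = []
        for script in Path(plugin_dir).rglob("*.bro"):
            text = script.read_text(errors="replace")
            for pattern in SUSPICIOUS:
                if re.search(pattern, text):
                    warnings.append("%s: matches %s" % (script, pattern))
        return warnings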

Finally, I think we need a way to let the whole community endorse scripts or script authors.

I think there is a broad spectrum from no interaction to vetting and
pulling into the main repository. It is about finding the right
balance.

Yep, and I'm arguing for going very far towards the "no interaction"
side, with just some automatic checks for the most crucial things.
Maybe the initial pull request for integration could be merged
manually to avoid any spamming, but any updates for example should not
require any interaction from us.

I do think there is another level of non blocking checks and quality
control we can provide.

Yes, indeed, I like this. With things already merged in, we can in
parallel, in the background, build up a toolbox of things to check for
and mark a module accordingly.

Robin

I think we’re generally on the same page, but I wanted to elaborate a bit on what I envisioned with the plugin submission process.

When a contributor submits a new script, there would be some mandatory checks that would need to pass for the script to be included:

  • Is the plugin structure valid?
  • Is there a valid metadata file? This file could list things like dependencies, what version of Bro the plugin works on, etc. I think the bare minimum of what needs to be in the metadata is: 1) a version number and 2) license information. It’s possible that we wouldn’t even need the license if the submission process puts a default license on any plugin that doesn’t specify otherwise.
  • If there are dependencies, are those already in the CBAN repository?
  • Is Bro able to load this plugin with --parse-only, or does it generate a parse error?
  • Is there already a plugin with that name and version number? If so, the author would need to increment the version.

I think this is a nice balance: some bare minimum checks designed to ensure that the plugin is actually installable, without putting too much of a burden on contributors.
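
To make that concrete, here is a rough sketch of how the gate could be automated (the cban.meta file name, its fields, and the helper structure are placeholders I made up for illustration, not a settled format):

    import configparser
    import subprocess
    from pathlib import Path

    REQUIRED_FIELDS = ["version", "license"]   # the bare minimum discussed above

    def check_submission(plugin_dir, known_plugins):
        """Return an error string if a mandatory check fails, else None.

        known_plugins is assumed to map plugin name -> set of existing versions.
        """
        plugin_dir = Path(plugin_dir)

        # 1) Basic structure: at least one Bro script to load.
        if not any(plugin_dir.rglob("*.bro")):
            return "no Bro scripts found"

        # 2) Valid metadata file with the required fields.
        meta_file = plugin_dir / "cban.meta"       # hypothetical file name
        if not meta_file.is_file():
            return "missing metadata file"
        meta = configparser.ConfigParser()
        meta.read(meta_file)
        info = meta["plugin"] if meta.has_section("plugin") else {}
        for field in REQUIRED_FIELDS:
            if field not in info:
                return "metadata is missing required field: " + field

        # 3) Dependencies must already exist in the CBAN repository.
        deps = [d.strip() for d in info.get("depends", "").split(",") if d.strip()]
        for dep in deps:
            if dep not in known_plugins:
                return "unknown dependency: " + dep

        # 4) Bro must at least be able to parse the scripts (--parse-only).
        for script in plugin_dir.rglob("*.bro"):
            result = subprocess.run(["bro", "--parse-only", str(script)],
                                    capture_output=True)
            if result.returncode != 0:
                return "parse error in " + str(script)

        # 5) The name/version pair must not already be taken.
        if info["version"] in known_plugins.get(plugin_dir.name, set()):
            return "that name and version already exist; bump the version"

        return None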

Once a plugin has been submitted, if it passes those checks, I think it should be automatically pulled into a repo. I don’t think that we need manual intervention for this. At this point, though, I think we could run some “quality tests” and give the plugin a quality score. This might be things like:

  • Is there documentation? (Both a README and, for example, checking whether external redef-able consts are documented.)
  • Are there btests?
  • Do the tests pass?
  • Does the plugin load in bare mode?

These quality scores would be strictly informative, and wouldn’t prevent or modify the acceptance of the plugin.
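
For illustration, the scoring pass might look something like the sketch below; the check names, the tests/ and scripts/ layout, and the naive way the score is added up are all just assumptions:

    import subprocess
    from pathlib import Path

    def quality_checks(plugin_dir):
        """Run informational checks only; nothing here blocks acceptance."""
        plugin_dir = Path(plugin_dir)
        checks = {}

        # Documentation present?
        checks["has_readme"] = any(plugin_dir.glob("README*"))

        # btest suite present, and does it pass?
        tests_dir = plugin_dir / "tests"              # assumed layout
        checks["has_btests"] = tests_dir.is_dir()
        if checks["has_btests"]:
            result = subprocess.run(["btest"], cwd=tests_dir, capture_output=True)
            checks["tests_pass"] = result.returncode == 0
        else:
            checks["tests_pass"] = False

        # Does the plugin load in bare mode (bro -b)?
        result = subprocess.run(["bro", "-b", str(plugin_dir / "scripts")],
                                capture_output=True)
        checks["loads_bare"] = result.returncode == 0

        return sum(checks.values()), checks           # naive score plus details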

What I’m imagining is something like this: https://forge.puppet.com/vshn/gitlab/scores#module-release-info

–Vlad

To clarify, is the concern w/ the amount of non-automatable tasks or with the model of requiring authors to “ping” some middle-man in order for updates to their plugin to become visible to all CBAN users?

Because most of what I had outlined could be automated (apart from the suggestion that initial submissions have someone from the Bro Team quickly review whether it looks legit).

Though I am on board for trying to draw up a contrasting design where the focus is on directly connecting users w/ plugin authors, without any blocking processes or middle-man bureaucracy. That would make life easier for authors, and that’s maybe even a higher priority than maximizing the quality/consistency of user experience because, without authors, there won’t be much for users to experience in the first place.

- Jon

To clarify, is the concern w/ the amount of non-automatable tasks or
with the model of requiring authors to “ping” some middle-man in order
for updates to their plugin to become visible to all CBAN users?

Both actually. Putting us into the path introduces delay and requires
somebody making time available. This is already not working well for
the current plugin repository, where things stall because we would
like to provide feedback and help guide the author along, but then
don't have the cycles to actually do that. The delay and effort will
shrink the more tasks we can automate, but at the beginning we won't
have much automation in place, I imagine, and even long-term certain
things could never be automated. So I'm wondering if we really need
to be in the path at all; it seems that can only cause friction that
would be better to avoid, in particular as we kick things off.

Because most of what I had outlined could be automated

Yep, understood, my mail was not directly targeting your proposal;
it just got me thinking about this again. My comments were meant
more broadly; we've been discussing different notions of vetting
over time with various subsets of people.

That would make life easier for authors, and that’s maybe even a
higher priority than maximizing the quality/consistency of user
experience because, without authors, there won’t be much for users to
experience in the first place.

Yeah, that's exactly what I'm advocating: making it easy should really
be priority number one, with everything else coming second. If you see
ways to adapt the design to target that specifically, I'm all for it.

Robin

When a contributor submits a new script, there would be some mandatory
checks that would need to pass for the script to be included:

The "mandatory" is where I disagree. I believe there's just too much
involved with any initial vetting, even if it's conceptually simple,
and that will create a bottleneck. Take your first question as an
example: "Is the plugin structure valid?" We wouldn't get very far
with a simple yes/no answer. We'd have to explain what exactly is not
valid, with some of that being just convention, or maybe something
that matters in general for plugins but for some reason not for this
particular one. Or what if the plugin works with some version of Bro,
but not another? Are we going to say it needs to work with the current
release? What if a new release comes out and breaks all existing
plugins? What if an update for a plugin then comes in that doesn't
move to the new version yet?

At this point, though, I think we could run some "quality tests" and
give the plugin a quality score. This might be things like:

Yep, I'm fully on board with this part; that's good information we can
provide to users about the state of something. And that state could be
"totally broken". :-)

Robin

I guess there is a balance here. If we do no mandatory checks and you could submit something that isn’t even a Bro plugin, the repository could become cluttered with junk. Do we really want things that don’t even “compile”? I guess we can wait and see for some of these decisions, meaning start with optional and decide to make them mandatory if it becomes a problem.

However, where we can’t do that is with the metadata we collect. If we don’t require what we think is important metadata in the beginning, then we will have a gap if we decide it was important all along. So there I would err towards overcorrecting in the beginning, and make things optional in the future if it turns out not to be important.

Yeah, I have ideas, but it seems like there might be another day of discussion before I try to formally reframe the design doc. Here’s the direction I'm thinking:

- still have a "bro/cban" github repository, but it now contains git submodules that point directly to other github accounts/repos

- the initial submission process involves doing a pull request on the bro/cban repo where the only change made is the addition of a submodule. These merges probably can be automated, but if a human were to do it, I’d expect it wouldn’t be a time-consuming task — just check if the change is adding a submodule and then click a button to merge (don’t even have to look at the contents of the submodule).

- a person who has submitted something to cban needs no further interaction with it and resumes their typical development workflow; the cban client's “update” command will fetch/pull directly from their git repo.

- all submodules get scanned by an out-of-band nightly process which checks for brokenness and other quality metrics

- submodules that are found to have never been in a working state are auto-removed (or could initially be a task that’s not a big deal for a human to do every so often if metrics of brokenness are consistently available)

- it’s not a big deal for a submodule to temporarily enter a broken state; cban users can always roll back to a previous version or just uninstall it. It’s up to the community to communicate/collaborate directly w/ the author here to get things fixed.

- metadata associated w/ plugins is all optional, but its existence contributes to some arbitrary “quality” rating/metrics

- metadata logically can be categorized in two types, one type is related to discoverability (tags, author, license, etc) and one type is related to interoperability (version number, dependencies)

- discoverability metadata is aggregated during the nightly quality check processes, which automatically commit that information to the “bro/cban” repo. Without doing this, I think cban clients would have an incredibly slow “search” command that goes out to each submodule individually and gathers metadata; a rough sketch of this aggregation follows at the end of this note. (Features related to discoverability might be lower priority in general.)

- interoperability metadata can also be aggregated nightly alongside the discoverability metadata, but when the cban client is actually going to perform specific operations on a particular submodule, it gets this data directly from the cloned submodule(s) to make sure the info is up to date. Version numbering can probably be done via git tags, while dependency info would be stored in a canonically named text file.

I think all those points make things easy on contributors, minimize direct involvement of the Bro Team in sorting out problems related to particular plugins, and provide a useful way for users to discover and maintain Bro plugins. There’s more potential for users to encounter broken/bad plugins, but maybe that also encourages stronger community involvement w/ users more likely to try and help get problems resolved.
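
For illustration, the nightly aggregation of discoverability metadata mentioned above might boil down to something like the sketch below (the cban.meta file name, its fields, and the plugins.json index are assumptions, not a settled format):

    import configparser
    import json
    from pathlib import Path

    def aggregate_discoverability(cban_checkout, index_name="plugins.json"):
        """Walk all submodules and write one index file for the cban client."""
        cban_checkout = Path(cban_checkout)
        index = {}
        for meta_path in cban_checkout.glob("*/cban.meta"):   # one dir per submodule
            meta = configparser.ConfigParser()
            meta.read(meta_path)
            info = meta["plugin"] if meta.has_section("plugin") else {}
            index[meta_path.parent.name] = {
                "author": info.get("author", ""),
                "license": info.get("license", ""),
                "tags": [t.strip() for t in info.get("tags", "").split(",")
                         if t.strip()],
            }
        # The cban client's "search" command would read this single file instead
        # of going out to every submodule individually.
        (cban_checkout / index_name).write_text(json.dumps(index, indent=2))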

- Jon

I don’t feel like we have converged on agreement regarding the balance of mandatory vs. optional checks.

I think we need to define some basic metadata as a requirement for interoperability and discovery. Otherwise, what do we really end up providing above and beyond GitHub?

Other quality checks can be optional, as long as we can change that in the future. I still think we should do some basic checks to avoid completely broken stuff. It might mean more work for us in making sure we have good feedback and documentation.

In general we all want to avoid human interaction becoming a bottleneck to submissions.

I propose that we keep mandatory checks minimal, but not non-existent, and then we reevaluate when we have real data about how well this works. But I would really like more feedback from the community. Maybe I am an outlier here?

I guess there is a balance here. If we do no mandatory checks and you could submit something that isn’t even a Bro plugin, the repository could become cluttered with junk. Do we really want things that don’t even “compile”?

The clutter could still be removed by an out-of-band process. For example, there’s no initial check for whether a submission actually works, but after X days of a nightly process finding it broken, it gets auto-removed.

However, where we can’t do that is with the metadata we collect. If we don’t require what we think is important metadata in the beginning, then we will have a gap if we decide it was important all along. So there I would err towards overcorrecting in the beginning, and make things optional in the future if it turns out not to be important.

I think the most important metadata has to do w/ plugin interoperability (versioning and dependency info), while metadata that improves discoverability and search features is less important. For one, the former has a more objective correctness to it and the latter is more subjective. Having wrong or missing discoverability metadata is also not going to cause things to break the way missing interoperability data would.

But even though I think interoperability metadata is important, I’m also not sure it needs to be collected/aggregated before plugin submissions are accepted — it might be something the client can collect “just in time” directly from a clone of the plugin itself. Even if a plugin doesn’t initially include this, the expected behavior could be for the cban client to use the plugin’s master branch and assume it will work w/ everything. If the user finds that not to be true, then they just uninstall it and ask the author to add proper versioning/dependency info or they might even try to add it themselves and submit the fixes back to the author.
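
As a rough sketch of that “just in time” idea (the git-tag versioning scheme and the DEPENDS file name are assumptions on my part):

    import subprocess
    from pathlib import Path

    def interop_metadata(clone_dir):
        """Read versioning and dependency info straight from a plugin clone."""
        # Versions as git tags; fall back to "use master and hope for the best".
        tags = subprocess.run(["git", "-C", str(clone_dir), "tag", "--list"],
                              capture_output=True, text=True)
        versions = [t for t in tags.stdout.split() if t]

        # Dependencies from a canonically named text file, one name per line.
        deps_file = Path(clone_dir) / "DEPENDS"       # hypothetical file name
        deps = []
        if deps_file.is_file():
            deps = [line.strip() for line in deps_file.read_text().splitlines()
                    if line.strip()]

        return versions or ["master"], deps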

Metadata that helps improve discoverability and search features (topics/tags/keywords, author, license, etc.) I don’t see becoming so important, yet so underused, that you’d wish it had been a requirement for submissions to be accepted.

I don't expect adding metadata to be so much of a burden that people avoid it entirely. Were there other reasons to think people won’t eventually add metadata even if none is initially required?

- Jon

That is a good point. I am more concerned about accumulating clutter.

I think starting w/ either approach could end up evolving/devolving into the other?

If you had no checks in place, but then later instituted mandatory checks, you might be able to have the cban client not remove things a user has already checked out. So you can delist plugins if they fail the new checks, but users would still have the local version they can use (if somehow they’ve got it in a configuration that’s usable to them, but that doesn’t pass the new mandatory quality checks).

I lean toward starting w/ the most streamlined and least complicated approach and seeing what quality control checks need to be layered on top of it, because we might just expend a lot of effort planning for problems that don’t ever actually pop up in practice. But as a person who has to do the development work on cban I might be biased toward doing what seems easier for me, so I’m fine not having a vote.

- Jon

I guess I’m not seeing more vs. less complicated; I see mandatory vs. optional being the difference.

The important design goals here I think are:

1. Not block anything on a human if possible.
2. Be extensible so we can add future checks and change things between optional and mandatory.
3. Require as little information as possible, but no less than we need.

And these goals are really in service of the broader goals of having a useful repository with a low barrier to entry. It is these two goals that are in tension a bit.

I’d prefer not to have anything in there that we know is broken, and I believe Robin is concerned that any blocks are going to require interaction on our part. I don’t think that is the case, but both of us are speculating without data. I think compromises can be made as long as we have the flexibility to change approaches as things progress and we get data back.

- it’s not a big deal for a submodule to temporarily enter a broken state; cban users can always roll back to a previous version or just uninstall it. It’s up to the community to communicate/collaborate directly w/ the author here to get things fixed.

I really like the community-centric approach. Regarding the discussion
about checks and consistency, I think that basically all conventions
that can be enforced automatically will make the archive easier to
work with (for authors and for end-users). But there is another thing
that came to my mind: how are situations handled in which the author
becomes the bottleneck?

Imagine someone published an awesome script, but a new version of Bro
breaks it. Someone else patches the script and sends the patch to the
original author. What happens if the author does not respond?
Personally I don't like repositories which end up with entries like
"awesome-script", "awesome-script v2", "awesome-script by Jan" ...
To avoid this, one might consider supporting the forking of plugins,
or organizing the plugins per user ("jan/awesome-script",
"anna/awesome-script").

Best regards,
Jan

(I will respond to the actual proposal in more depth later.)

That is a good point. I am more concerned about accumulating clutter.

If I understood it correctly, I don't think that the central CBAN
repository will accumulate clutter. The automatic checks will simply
help age out repos that do not comply with the minimal standards. It's
up to the devs to comply if they want to be integrated.

More generally, there will presumably be some functionality to add
"remotes" to one's configuration, allowing plugin writers to point to
experimental code if they wish. Then they can still hack on code and
mix it with existing CBAN plugins, at their own risk.

With a small linter in place, we would also lower the bar for devs to
comply. Homebrew has a nice checker, for example.

On a cosmetic note, will this thing be called CBAN? I find it a very
cryptic name, I often confuse it with BPAN (even though that doesn't
make sense), and I was wondering whether we should converge on some
more pronounceable candidates.

    Matthias

I think that’s fine, but that isn’t what I thought Robin was saying. I thought he did not want minimal standards.

Yeah, I have ideas, but it seems like there might be another day of
discussion before I try to formally reframe the design doc. Here’s the
direction I'm thinking:

I like the process you sketch; that sounds like the right way to go
to me.

A few notes, also trying to address some of Adam's comments:

- the initial submission process involves doing a pull request on the
bro/cban repo where the only change made is the addition of a
submodule. These merges probably can be automated, but if a human
were to do it, I’d expect it wouldn’t be a time-consuming task

Yeah, maybe that initial merge is one task we leave to a human, who
could then actually take a quick 30-second look at the module to see
if it's not totally off the mark. That would address Adam's point
about somebody submitting something that's not even a Bro thing (but
I wouldn't go further; e.g., don't try to compile it, etc. Everything
looking roughly right gets in at this point.)

- submodules that are found to have never been in a working state are
auto-removed (or could initially be a task that’s not a big deal for a
human to do every so often if metrics of brokenness are consistently
available)

Auto-removal sounds dangerous to me; there may be different reasons
why something's not in a good state. I'd leave cleanup to humans too:
if there's a module that's consistently flagged as broken, that's when
we can send a mail to the author and remove it manually if no
improvement is in sight. I'd rather err on the side of having a broken
module than remove something that could actually still be useful.

- metadata logically can be categorized in two types, one type is
related to discoverability (tags, author, license, etc) and one type
is related to interoperability (version number, dependencies)

I wouldn't object to making some metadata mandatory, per Adam's
comments. For example, enforcing an author and a license would be
useful, I think. The author gives us contact information, and the
license is always worth clarifying.

- discoverability metadata is aggregated during the nightly quality
check processes, which automatically commit that information to the
“bro/cban” repo.

Would it be better to maintain this information outside of git in a
state file that clients download? Otherwise the repository will
clutter up quite a bit over time with tons of automatic commits.

Overall, I agree that we can always add more restrictions later if it
turns out to be necessary. It's not as if we'll have thousands of Bro
modules in there within the first two weeks (as long as we prevent
somebody from spamming us).

Robin