Proposal: Operating the Bro package manager offline

At BroCon several people pointed out a need to install Bro packages on
machines that do not have a direct external connection. One idea would
be some kind of proxy scheme where an intermediary git repository
mirrors packages locally; bro-pkg would then pull from there. However,
I don't think any of us really liked that idea much. After a few
rounds of discussions, Seth and I came up with a different idea that
seems easier to manage: extending bro-pkg to bundle packages into
deployment files that can be easily pushed to Bro systems simply by
copying them over.

I’ve tried to flesh this out a bit more, and would be interested to
hear what you all think about this approach. And @Jon: Do you think
this would be doable that way?

Here’s the idea:

1. Generally, one first uses bro-pkg as usual to install packages onto
   a local Bro system that does have external Internet connectivity
   (this could be just a dummy Bro installation). One installs new
   packages there, updates existing ones, etc., until reaching a state
   that one wants to push out to the actual Bro system.

2. We add a new “bundle” command to bro-pkg that serializes the
   current state of packages into a single file on disk, a “package
   bundle”. The bundle contains the complete content of all currently
   installed packages, using some kind of suitable container format
   (could be just a ZIP file, or whatever works; the internal
   representation doesn’t really matter).

3. Users create such a bundle on the local system and then simply copy
   that bundle file over to all target Bro machines that do not have
   external connectivity themselves, using whatever mechanism they
   have available (e.g., just scp; or maybe through some configuration
   management system like Ansible etc.).

4. On the target machine, one runs a corresponding “bro-pkg unbundle”
   command on that bundle file. That command will completely replace
   the system’s current set of packages with the bundle’s content. As
   a result, that machine will now have exactly the same packages
   installed as the original system.
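
As a rough illustration of steps 2 and 4, here is a minimal Python
sketch that assumes the bundle is just a ZIP of the package directory.
The function names and the directory layout are made up for this
example; this is not how bro-pkg itself is implemented.

```python
import shutil
import zipfile
from pathlib import Path


def bundle(package_dir: Path, bundle_file: Path) -> None:
    """Serialize every file under package_dir into a single ZIP file."""
    with zipfile.ZipFile(bundle_file, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(package_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to package_dir so the bundle does
                # not depend on where packages lived on the source system.
                zf.write(path, path.relative_to(package_dir).as_posix())


def unbundle(bundle_file: Path, package_dir: Path) -> None:
    """Completely replace the contents of package_dir with the bundle."""
    if package_dir.exists():
        shutil.rmtree(package_dir)
    package_dir.mkdir(parents=True)
    with zipfile.ZipFile(bundle_file) as zf:
        zf.extractall(package_dir)
```

The bundle file must live outside package_dir, otherwise it would end
up archiving itself.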

This would be the general scheme. A couple of people I talked to at
BroCon confirmed that this would offer a viable solution for them, and
that they would indeed much prefer copying files around over
maintaining local git mirrors.

Some additional thoughts on variations/extensions of this basic scheme:

- I’m not quite sure if the bundle should contain just the packages
  themselves or further bro-pkg state as well, such as which packages
  are currently loaded. Right now I’m leaning towards saying “just
  the packages”; that would basically treat bundles just as a
  transport mechanism to get packages over to another box. The actual
  Bro machines would still keep control over which packages to
  actually load, etc.

- As it is described above, Step 1 would require having a local Bro
  installation into which packages get installed before they can be
  bundled up. It would be nice to have a mode where bro-pkg can
  operate without having a Bro around at all, just downloading
  packages locally somewhere for bundling them up. I could also see
  offering an even simpler mode where one simply lists packages to
  bundle on the command-line: “bro-pkg bundle <pkg1> <pkg2> <pkg3>”.
  That would be particularly useful with configuration management
  systems I think.

- It would be neat if bro-pkg's Python library exposed operations to
  inspect & retrieve the content of a bundle, such as iterating over
  the packages inside a bundle and iterating over the files inside a
  package. That way one could easily build target-side scripts that
  process and validate bundles before going ahead and installing them
  (e.g., imposing custom restrictions on what kind of packages one
  allows to put in place; or ensuring that a bundle always contains a
  set of packages the site deems mandatory, to avoid configuration
  mistakes; or even just logging what gets pushed out).
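
To make the last point more concrete, here is a sketch of what such a
target-side check could look like. It assumes a ZIP bundle in which the
top-level directory of every entry is the package name; both that
layout and the function names are hypothetical, not an existing
bro-pkg API.

```python
import zipfile
from collections import defaultdict


def packages_in_bundle(bundle_file):
    """Map each package name to the list of files it contains.

    Assumes the top-level directory of every archive member is the
    package name (a layout invented for this sketch).
    """
    packages = defaultdict(list)
    with zipfile.ZipFile(bundle_file) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            package, _, rest = name.partition("/")
            packages[package].append(rest)
    return dict(packages)


def missing_mandatory(bundle_file, mandatory):
    """Return the mandatory package names that the bundle lacks."""
    return sorted(set(mandatory) - set(packages_in_bundle(bundle_file)))
```

A site script could refuse to install a bundle whenever
missing_mandatory() returns a non-empty list, or simply log the result
of packages_in_bundle().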

What do you guys think about this?

Robin

One can, depending on the level of paranoia or risk management:

Use a proxy for accessing git repos with packages. For GitHub this presents a problem: GitHub is essentially an arbitrary file upload service, and without terminating SSL one cannot create good enough proxy rules around it.

What we do is create a mirror of GitHub repositories that copies only whitelisted parts. This is beyond the package manager project.

I really like the idea of being able to bundle - cap - unbundle package and I think it should be supported. With possibly a configuration file to tell “bundle these packages” only.

Sounds great.. What you are describing is basically source and binary packages.

The only thing I would do differently is that you wouldn't want bundles (at least not as the only feature) but individual source and binary packages.

For example

bro-pkg install sethhall/domain-tld

would still install the package from git like normal, but one could do

bro-pkg dist sethhall/domain-tld

or just

bro-pkg dist # inside the project

and this would generate a versioned sethhall_domain-tld-1.0.0.tar.gz source package

Then, that file could be copied over to the target machine and

bro-pkg install sethhall_domain-tld-1.0.0.tar.gz

or variations of

bro-pkg install https://packages.internal/bro/sethhall_domain-tld-1.0.0.tar.gz

would install it.

The hard part is how to handle compiled, architecture-specific packages. For something like a myricom plugin,

bro-pkg dist bro-plugins/myricom

would generate a bro-plugins_myricom-1.0.0.tar.gz source package as before

If copied over to the target system

bro-pkg install bro-plugins_myricom-1.0.0.tar.gz

would build and install it as long as build tools were installed. If 'no build tools on servers' is a requirement, then

bro-pkg build bro-plugins/myricom

on a build server would compile and generate something like bro-plugins_myricom-1.0.0-bro2.5-linux-amd64.tar.gz

And with some more glue, one could easily build .deb and .rpm packages.

This is basically how python/pip/wheel works now (after ~10 years of being pretty bad).

You could still have a bundle command as a very small layer on top of packages, but the building block should be an individual package distribution.
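
The file naming used in the examples above could be produced by a small
helper like the following. This is only a sketch of the naming
convention; dist_filename and its parameters are hypothetical and not
part of bro-pkg.

```python
def dist_filename(package, version, bro_version=None, platform=None):
    """Build a distribution file name for a package like 'sethhall/domain-tld'.

    With only a version, the name denotes a source package. With a Bro
    version and a platform as well, it denotes a compiled package.
    """
    stem = "{}-{}".format(package.replace("/", "_"), version)
    if bro_version and platform:
        stem += "-bro{}-{}".format(bro_version, platform)
    return stem + ".tar.gz"
```

Calling dist_filename("sethhall/domain-tld", "1.0.0") gives the source
package name from above, and passing "2.5" and "linux-amd64" for the
myricom plugin gives the binary one.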

Also,

That command will completely replace the system’s current set of packages with the bundle’s content

should probably not be the only behavior. Installing should be the default unless you do something like

bro-pkg install --replace my.bro.bundle

otherwise it would be impossible to compose bundles.

@Jon: Do you think this would be doable that way?

Yeah, looks viable.

- I’m not quite sure if the bundle should contain just the packages
themselves or further bro-pkg state as well, such as which packages
are currently loaded. Right now I’m leaning towards saying “just
the packages”; that would basically treat bundles just as a
transport mechanism to get packages over to another box. The actual
Bro machines would still keep control over which packages to
actually load, etc.

Yeah, I think that’s generally how it would be also. Though, since the default behavior of the “install” command is to automatically do a subsequent “load”, maybe it makes sense to automatically do loads after “unbundle” as well. It would then be up to the user whether they actually “@load packages” in the target machine’s local.bro or just pick and choose, so they still have complete control over loading/unloading.

- As it is described above, Step 1 would require having a local Bro
installation into which packages get installed before they can be
bundled up. It would be nice to have a mode where bro-pkg can
operate without having a Bro around at all, just downloading
packages locally somewhere for bundling them up. I could also see
offering an even simpler mode where one simply lists packages to
bundle on the command-line: “bro-pkg bundle <pkg1> <pkg2> <pkg3>”.
That would be particularly useful with configuration management
systems I think.

I think bro-pkg currently works fine even if you don’t have a local Bro installation?

If you build plugins, you’d need Bro source code, but don’t actually need it installed. Then on the target machine, “unbundle” just unpacks into whatever bro-pkg paths are configured for script_dir/plugin_dir.

- Jon

In concept, I like the idea of simply extending the “install” command to be able to install from source or binary packages, but what would the package update process look like for users?

E.g. with the bundle/unbundle the update process would be:

# On source machine
$ bro-pkg refresh
$ bro-pkg update --all
$ bro-pkg bundle

# Copy bundle to target machine

# On target machine
$ bro-pkg unbundle

With that approach, the user never has to consider individual packages, so updating is still straightforward. But with the approach of installing specific versions of packages from a source/binary distribution, how do you make it simple for a user to update to newer versions without an internet connection?

- Jon

Not sure "binary package" is the right term, at least for the first
part of your description (where nothing gets built). The difference
seems more to be having one thing (the bundle) containing all of the
packages, vs. creating individual package files to distribute. I
would like the bundle approach more because it means one can more
easily recreate the same state on multiple servers; all the package
management happens locally on one system and the final state just gets
pushed out. That's in contrast to manually ensuring that all the
target systems now end up with the right set of packages.

For packages that need to be built I can see that real binary packages
would be useful too (like indeed in the "no build tools on server"
setting), but that sounds like an orthogonal feature (and more complex
to add to the package manager).

Robin

Yeah, I think that’s generally how it would be also. Though, maybe
since the default behavior of the “install” command is to
automatically do a subsequent “load” it makes sense to automatically
do loads after “unbundle” as well.

What would that do for updates? Say I've unloaded a package, but I'm
still updating it? That shouldn't enable it, so would it do that
implicit "load" only on first install?

I think bro-pkg currently works fine even if you don’t have a local Bro installation?

I was thinking that it needs bro-config. Is that only for "autoconfig"
to set up the right paths? If so, then maybe we can add an option to
"autoconfig" to setup (and create) local paths for this
working-without-a-Bro mode?

Robin

Yeah, I think that’s generally how it would be also. Though, maybe
since the default behavior of the “install” command is to
automatically do a subsequent “load” it makes sense to automatically
do loads after “unbundle” as well.

What would that do for updates? Say I've unloaded a package, but I'm
still updating it? That shouldn't enable it, so would it do that
implicit "load" only on first install?

Yeah, I imagine it detecting whether a given package in a bundle is already installed. If it is, then just install/overwrite it without changing its “loaded” status. If not, then install and also load.
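
In code, the rule could look roughly like this. It is only a sketch:
plain dicts and sets stand in for bro-pkg's real state, and
apply_bundle is a hypothetical name.

```python
def apply_bundle(bundle_packages, installed, loaded):
    """Install bundle packages, autoloading only first-time installs.

    bundle_packages: dict of package name -> version found in the bundle
    installed: dict of name -> version on the target (updated in place)
    loaded: set of currently loaded package names (updated in place)
    """
    for name, version in bundle_packages.items():
        first_time = name not in installed
        installed[name] = version  # install, or overwrite the old copy
        if first_time:
            loaded.add(name)
        # Otherwise leave the package's "loaded" status untouched.
```

A package that the user unloaded earlier thus stays unloaded across
updates, while packages new to the machine get loaded right away.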

I think bro-pkg currently works fine even if you don’t have a local Bro installation?

I was thinking that it needs bro-config. Is that only for "autoconfig"
to set up the right paths?

Yes, bro-config is only necessary for the “autoconfig” command.

If so, then maybe we can add an option to
"autoconfig" to setup (and create) local paths for this
working-without-a-Bro mode?

Currently for that use-case, I’d say “just don’t run autoconfig, the default paths are sufficient”. On the source machine, packages can get installed anywhere. On the destination machine, the unbundling process will relocate everything to the correct paths. In other words, the installation paths of a bundle are not hardcoded within it.

But if we want to keep the bro-pkg initial setup procedure more consistent, I can have autoconfig first ask a question: “Will you be using this system to create package bundles?”

If no, proceed with the current autoconfig logic (use bro-config to setup paths).

If yes, still proceed with autoconfig if bro-config is available, else no-op.

- Jon

Yeah, I imagine it detecting whether a given package in a bundle is
already installed. If it is, then just install/overwrite it without
changing its “loaded” status. If not, then install and also load.

Yeah, makes sense.

Currently for that use-case, I’d say “just don’t run autoconfig, the
default paths are sufficient”.

What are the default paths? In other words, where do the downloaded
packages get put if I don't set anything further up?

Robin

They get put within ~/.bro-pkg/

(The Advanced Configuration [1] section has more usage info related to that).

- Jon

[1] http://bro-package-manager.readthedocs.io/en/stable/quickstart.html#advanced-configuration

After discussion with some more folks, and the email thread, some
further tweaks to my original "bundle" proposal:

    - Following Justin's suggestion, unbundling should by default not
      replace everything currently installed; and instead offer a
      "--replace" option if one wants that.

    - People seem to like the "bro-pkg bundle <pkg1> <pkg2> <pkg3>"
      approach quite a bit as well, so that should be a "first class
      citizen" too. So we'd have two modes: either (1) bundle what's
      currently installed locally, or (2) bundle what's given
      explicitly at startup (and ignore the current local state
      completely). For the latter mode, allowing the list of packages
      to come from a configuration file (rather than on the command
      line) would be very useful as well I hear.

    - Per Jon's suggestion, unbundling autoloads a package when it is
      a first-time install.

    - I'm sure this would just work automatically, but for
      completeness: unbundling should install packages just exactly
      the same way as if they had been coming in "normally" from git.
      In particular, if that target system later does get external
      connectivity and switches to using git directly, it should "just
      work".

    - Also for completeness: The "bundle mode" should work without
      requiring a local Bro installation. Per Jon's reply, that would
      be the case automatically already as well.

    - One clarification, as I'm not sure that was clear: "bundle"
      would not build any architecture-specific code locally, that
      would happen at unbundle time on the target server. "bundle" is
      really just a transport mechanism for getting the package source
      content over to offline systems.

    - I'm withdrawing the part on exposing an introspection API for
      bundles through Python. I wouldn't object to having that, but
      it's probably unnecessary given that one can also just look at
      the package directory once things are installed. Should be good
      enough.
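
For the two bundling modes in the second point, the package selection
could work along these lines. The INI-style file with a [bundle]
section listing one package per line is an assumption made for this
sketch, as are the function and parameter names.

```python
import configparser


def packages_to_bundle(cli_packages, config_path, installed):
    """Pick the list of packages that should go into a bundle.

    Mode 2: names given explicitly (on the command line, else in a
    configuration file) are used and the local state is ignored.
    Mode 1: with nothing given, bundle what is currently installed.
    """
    if cli_packages:
        return list(cli_packages)
    if config_path:
        parser = configparser.ConfigParser(allow_no_value=True)
        parser.optionxform = str  # keep package names case-sensitive
        parser.read(config_path)
        # Assumed layout: a [bundle] section with one package per line.
        return list(parser["bundle"])
    return sorted(installed)
```

Note that configparser treats "=" and ":" as key/value separators, so
this simple format only suits plain names like sethhall/domain-tld, not
full URLs.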

Robin

What happens when --replace is not used and a different version of a package that’s in the bundle is already installed? I think asking the user whether it’s ok is the way to go, but how much attention should we draw to it? E.g.

$ bro-pkg unbundle mybundle.zip
The following packages will be INSTALLED:
  foo (1.0.0)
  bar (1.0.0 -> 2.0.0)
  baz (4.0.0 -> 3.0.0)
Proceed? [Y/n]

On answering “yes”, does it just go ahead or does it then ask for each individual package if it’s ok to overwrite it?

- Jon

What happens when --replace is not used and a different version of a
package that’s in the bundle is already installed? I think asking
user whether it’s ok is the way to go,

Agree, plus a "--force" option once more to skip any questions for
batch operation.

$ bro-pkg unbundle mybundle.zip
The following packages will be INSTALLED:
  foo (1.0.0)
  bar (1.0.0 -> 2.0.0)
  baz (4.0.0 -> 3.0.0)
Proceed? [Y/n]

On answering “yes”, does it just go ahead or does it then ask for each individual package if it’s ok to overwrite it?

Just a single "yes" sounds good to me, I would see it more as a
double-check to confirm the bundle is right. If it's not, one can (and
should I argue) always rebuild the bundle with different packages.
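
A sketch of that flow, with a single confirmation and a force switch
for batch use, might look as follows. The function names are
hypothetical, and the output format just mirrors the example above.

```python
def describe_plan(bundle_packages, installed):
    """Summarize what unbundling would install, noting version changes."""
    lines = ["The following packages will be INSTALLED:"]
    for name, new in bundle_packages.items():
        old = installed.get(name)
        if old is None or old == new:
            lines.append("  {} ({})".format(name, new))
        else:
            lines.append("  {} ({} -> {})".format(name, old, new))
    return "\n".join(lines)


def confirm(plan, force=False, ask=input):
    """Ask once for the whole bundle; skip the question when forced."""
    if force:
        return True
    answer = ask(plan + "\nProceed? [Y/n] ").strip().lower()
    return answer in ("", "y", "yes")
```

Passing the prompt function in as ask keeps the logic easy to exercise
without a terminal.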

Robin

‘bundle’ and ‘unbundle’ commands are now available in bro-pkg 0.7 and should work as described earlier in the thread.

- Jon

Great, many thanks! I'll give it a try soon.

Robin