script loading changes

Robin, Seth, and I were talking about doing the following changes/additions
to the script loading functionality in case anyone has input:

1) When an @load or command line argument relies on BROPATH to search for
   the script/package, directory separator characters (i.e. '/') must be
   replaced with dots ('.'). e.g. "@load frameworks.software". ".bro"
   file extensions are still optional.

   This makes loading packaged scripts look more similar to other languages
   and better supports the new @prefixes implementation described below.

2) Allow @load to recognize absolute paths or paths relative to the
   currently-loading script.

   e.g. if foo.bro and bar.bro live in the same directory, bar.bro can do
   "@load ./foo". ".bro" file extension is still optional.

   When a script is loaded relatively, we're still going to have to be able
   to track where it is within BROPATH (if it is at all) in order for change
   (4) described below to work.

   This doesn't really conflict with rule (1) about replacing path separator
   characters with dots when naming scripts-to-load since in this case we're
   not relying on BROPATH, but I wonder if that's not really clear since @load
   is used in both cases, with (2) being implied by arguments that start with
   '.' or '/'. Maybe we should use a new directive other than @load for this
   case?

3) A new "@add x when y ..." directive that's an alternative to @load, and
   can be used to conditionally load script named 'x' only when the script
   named 'y' has already been loaded. The evaluation of that condition is
   postponed until after all scripts specified via @load or as command line
   arguments have been loaded. 'x' and 'y' can be named according to either
   (1) or (2) above.

4) The way the @prefixes directive works is going to change. It currently
   augments future @loads by additionally checking for a prefixed version of
   the argument, meaning @prefixes has to be specified very early on for it to
   work right. The new implementation of @prefixes will mean: after all
   scripts have been loaded, for each script and for each prefix, search
   BROPATH for a prefixed version of the script in the "dot-syntax" specified
   by (1) and load it if it exists (if the script is a __load__.bro the
   __load__ part is ignored in the canonicalized version of the name such that
   it appears as if it's a "package" once again).

   e.g. doing "@load frameworks.software" and "@prefixes local" means a file
   in BROPATH named "local.frameworks.software.bro" will be loaded at the
   end of the script loading process if it exists. The ".bro" extension is
   still optional.
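
To make (4) concrete, here is a minimal Python sketch of how the prefixed names could be derived once all scripts are loaded. It is only an illustration, not Bro's implementation: `canonical_name` and `prefixed_candidates` are hypothetical helpers, the input is assumed to be a list of BROPATH-relative file paths, and only the ".bro" form of each candidate is listed for brevity.

```python
def canonical_name(rel_path):
    """Turn a BROPATH-relative file path into its dotted script name."""
    name = rel_path[:-4] if rel_path.endswith(".bro") else rel_path
    parts = name.split("/")
    # A __load__.bro manifest stands for the package itself, so drop that part.
    if parts[-1] == "__load__":
        parts = parts[:-1]
    return ".".join(parts)


def prefixed_candidates(loaded_paths, prefixes):
    """List the file names to look for in BROPATH after loading has finished."""
    candidates = []
    for rel_path in loaded_paths:
        for prefix in prefixes:
            candidates.append(prefix + "." + canonical_name(rel_path) + ".bro")
    return candidates


print(prefixed_candidates(["frameworks/software/__load__.bro"], ["local"]))
# prints ['local.frameworks.software.bro']
```

Each candidate would then be searched for in BROPATH and loaded only if it exists.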

- Jon

Nice summary, thanks!

Two additional thoughts:

> 1) When an @load or command line argument relies on BROPATH to search for
>    the script/package, directory separator characters (i.e. '/') must be
>    replaced with dots ('.').

Do we want to say "may" instead of "must"? So both versions would be
acceptable. If we @load a.b.c and don't find that file literally, we
try a/b/c. I think that would help with (2) as well.

> 3) A new "@add x when y ..." directive that's an alternative to @load, and
>    can be used to conditionally load script named 'x' only when the script
>    named 'y' has already been loaded. The evaluation of that condition is
>    postponed until after all scripts specified via @load or as command line
>    arguments have been loaded. 'x' and 'y' can be named according to either
>    (1) or (2) above.

One thing we haven't considered yet: what if "x" itself has new @loads
or @adds? Can we make that well-defined?

Robin

Apologies if I'm missing the point / this already exists :) That being said...

> 3) A new "@add x when y ..." directive that's an alternative to @load, and
>    can be used to conditionally load script named 'x' only when the script
>    named 'y' has already been loaded. The evaluation of that condition is
>    postponed until after all scripts specified via @load or as command line
>    arguments have been loaded. 'x' and 'y' can be named according to either
>    (1) or (2) above.

Rather than an "@add x when y", what about "@allows x" defined within y, where @allows was evaluated with the stipulations described above?

Alternatively, @try x, then define a set of @require directives in x that must be met in order for x to be loaded (but that silently fail / throw a warning if they fail to load)?

--Gilbert

> 1) When an @load or command line argument relies on BROPATH to search for
>    the script/package, directory separator characters (i.e. '/') must be
>    replaced with dots ('.'). e.g. "@load frameworks.software". ".bro"
>    file extensions are still optional.

Are dots in directory or script names still allowed (and are they handled correctly?). E.g., what happens if I have a script

    foo.bar/foo/bar.blub.something.bro

And one more question, for debugging: can we have some functionality (e.g., a command line argument) that prints all scripts with absolute paths in the order they are loaded? This might be handy in cases where path / script names are ambiguous or a user is wondering exactly which files are loaded.

cu
Gregor

There is a script named misc/loaded-scripts now that outputs a log file named loaded_scripts.log by default. Determining loaded scripts is all handled in script-land now due to some change Robin did recently. The output still sucks pretty bad, but it's certainly readable.

.Seth

>> 1) When an @load or command line argument relies on BROPATH to
>> search for the script/package, directory separator characters
>> (i.e. '/') must be replaced with dots ('.').
>
> Do we want to say "may" instead of "must"? So both versions would be
> acceptable.

So I take this to mean the search order for "@load a.b.c" would be
to look in BROPATH for "a.b.c", "a.b.c.bro", "a/b/c", then "a/b/c.bro".

And the search order for "@load a/b/c" would be "a/b/c" then "a/b/c.bro"
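
Spelled out as a minimal Python sketch (illustration only; `search_order` is a hypothetical helper, not anything that exists in Bro), the two search orders above could be generated like this:

```python
def search_order(name):
    """Return the BROPATH-relative names to try, in the order described above."""
    candidates = [name, name + ".bro"]
    # Fall back to treating each dot as a directory separator.
    as_path = name.replace(".", "/")
    if as_path != name:
        candidates += [as_path, as_path + ".bro"]
    return candidates


print(search_order("a.b.c"))  # prints ['a.b.c', 'a.b.c.bro', 'a/b/c', 'a/b/c.bro']
print(search_order("a/b/c"))  # prints ['a/b/c', 'a/b/c.bro']
```

Each candidate would be tried against every directory in BROPATH until one exists.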

We could do that, but I think the shipped scripts should use one
form consistently -- probably the dotted form or else we'd just be
adding a feature that never gets used.

And actually I think the search orders listed above should be reversed
to make the common case faster.

>> 3) A new "@add x when y ..." directive
>
> One thing we haven't considered yet: what if "x" itself has new @loads
> or @adds? Can we make that well-defined?

If 'x' contains @loads, they get loaded immediately during the scan of 'x',
and we set some flag to indicate we should go back to iterating from the
beginning of the current pool of @adds-to-evaluate to re-check dependencies.

If 'x' contains @adds, then those are added to the end of the pool of
@adds-to-evaluate (we should be somewhere in the middle of iterating over
the pool at this point).

Another problem I was thinking of was what happens when there's

  @add x when y
  @add y when z

And there's only an "@load z", creating a dependency chain. To resolve
those cases, we can take the same approach I describe above and reset
the iterator to the beginning of the @adds-to-evaluate pool in order to
re-check dependencies whenever one of the elements evaluates successfully.

Dependency cycles will just get dropped on the floor if no element on the
cycle is ever explicitly @load'd.
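
A minimal Python sketch of that evaluation loop, under simplifying assumptions: each @add has exactly one "when" condition, and the case where 'x' itself contains further @loads or @adds is not modeled. The name `resolve_adds` and the (x, y) pair representation are made up for illustration.

```python
def resolve_adds(loaded, adds):
    """Evaluate pending "@add x when y" pairs until nothing more can load.

    loaded: names of scripts loaded via @load or the command line.
    adds:   list of (x, y) pairs, one per "@add x when y".
    Returns the conditionally loaded scripts in the order they were loaded.
    """
    loaded = set(loaded)
    pending = list(adds)
    order = []
    i = 0
    while i < len(pending):
        x, y = pending[i]
        if y in loaded:
            pending.pop(i)
            if x not in loaded:
                loaded.add(x)
                order.append(x)
            i = 0  # something evaluated successfully: re-check from the beginning
        else:
            i += 1
    return order


print(resolve_adds({"z"}, [("x", "y"), ("y", "z")]))  # prints ['y', 'x']
print(resolve_adds({"z"}, [("a", "b"), ("b", "a")]))  # prints []
```

The first call shows the chain resolving once "z" is loaded; the second shows a cycle that is never satisfied and so is silently dropped.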

- Jon

> Apologies if I'm missing the point

Oops, I guess I should have described what the point of @add actually
was. It's mainly so that it's easier for a script package to define
optional components that automatically get loaded when a dependency is
met.

e.g. the protocols/http package has some scripts that depend on
frameworks/software, and those components should be optionally loaded
only when frameworks/software has been explicitly loaded instead of
a load of protocols/http always resulting in a load of frameworks/software.

> Rather than an "@add x when y", what about "@allows x" defined within
> y, where @allows was evaluated with the stipulations described above?

I think the problem with this is that if a user were developing their
own script package that had optional components that depend on e.g.
frameworks/software, they wouldn't be able to express that without
directly editing something in frameworks/software to @allow their
scripts. And that's discouraged.

> Alternatively, @try x, then define a set of @require directives in x
> that must be met in order for x to be loaded (but that silently fail /
> throw a warning if they fail to load)?

We want the attempt to load 'x' to be postponed until after the normal
loading of @load'd scripts to avoid ordering issues.

- Jon

> Are dots in directory or script names still allowed (and are they
> handled correctly?). E.g., what happens if I have a script
>
> foo.bar/foo/bar.blub.something.bro

Yikes, you're right, that's tough to resolve unless we say dots
aren't allowed in package/directory or script names. Or we can
escape them in the script:

    @load foo\.bar.foo.bar\.blub\.something
or
    @load foo\.bar.foo.bar\.blub\.something\.bro
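
For illustration, a minimal Python sketch of how such an escaped name could be turned back into a path. The backslash-escape convention is just the idea floated above, and `dotted_to_path` is a hypothetical helper, not an existing Bro function.

```python
def dotted_to_path(name):
    """Convert a dotted name to a path, keeping backslash-escaped dots literal."""
    out = []
    i = 0
    while i < len(name):
        if name[i] == "\\" and i + 1 < len(name) and name[i + 1] == ".":
            out.append(".")  # escaped dot stays part of the file or directory name
            i += 2
        elif name[i] == ".":
            out.append("/")  # unescaped dot separates path components
            i += 1
        else:
            out.append(name[i])
            i += 1
    return "".join(out)


print(dotted_to_path(r"foo\.bar.foo.bar\.blub\.something"))
# prints foo.bar/foo/bar.blub.something
```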

I think exposing the dotted format to users this way might be more
trouble than it's worth, want to ditch (1) in the original proposal?

- Jon

> We could do that, but I think the shipped scripts should use one
> form consistently

Yes, definitely; and I'd also use the dotted form for that. It just
seems to make things more consistent (in particular with the relative
paths) if we also allow the slashes.

> And actually I think the search orders listed above should be reversed
> to make the common case faster.

Isn't the dotted-case the common one if that's what the default
scripts will be using?

> If 'x' contains @loads, they get loaded immediately during the scan of 'x'
> and set some flag to indicate we should go back to iterating from the
> beginning of the current pool of @adds-to-evaluate to re-check dependencies.
>
> If 'x' contains @adds, then those are added to the end of the pool of
> @adds-to-evaluate (we should be somewhere in the middle of iterating over
> the pool at this point).

Sounds good. I can see some quite hard-to-comprehend dependencies
showing up, like globals defined in one script and tested for with
@ifdef in another. But I think that's fine, the advantages here outweigh
the opportunity for someone to shoot themselves in the foot. :)

> And there's only an "@load z", creating a dependency chain. To resolve
> those cases, we can take the same approach I describe above and reset

Sounds good as well.

Let's just make sure to document this all very well. :)

Robin

What if we always try the literal version first, before trying the
alternatives with dots replaced, .bro added, etc?

There could still be cases where things get mixed up but it looks
unlikely to become an actual problem.

Robin

>> And actually I think the search orders listed above should be
>> reversed to make the common case faster.
>
> Isn't the dotted-case the common one if that's what the default
> scripts will be using?

The dotted format is what's used most often in the @load, but
usually not what the literal filename/path is going to be.

@load frameworks.software.base

would do best to look in BROPATH for "frameworks/software/base.bro"
first instead of the literal "frameworks.software.base" filename.

- Jon

Got it. But actually I don't think it matters much in terms of
performance, let's do whatever gives us the best semantics.

Robin

> What if we always try the literal version first, before trying the
> alternatives with dots replaced, .bro added, etc?

Ok, let me try again. So there's a file in BROPATH with the name

    foo.bar/foo/bar.blub.something.bro

To load that in a script you do

    @load foo.bar/foo/bar.blub.something

And then starting with the literal version it searches BROPATH for

1) foo.bar/foo/bar.blub.something
2) foo.bar/foo/bar.blub.something.bro
3) foo/bar/foo/bar/blub/something
4) foo/bar/foo/bar/blub/something.bro

We never make it past #2 because that's the right name.

But from a new user's perspective, it's not intuitive to try the
literal version because everywhere else, the convention is to reference
scripts in packages in the dotted format, which would be:

   @load foo.bar.foo.bar.blub.something

and that would search BROPATH for

1) foo.bar.foo.bar.blub.something
2) foo.bar.foo.bar.blub.something.bro
3) foo/bar/foo/bar/blub/something
4) foo/bar/foo/bar/blub/something.bro

And none of those are right.

- Jon

>     @load foo.bar/foo/bar.blub.something
>
> 1) foo.bar/foo/bar.blub.something
> 2) foo.bar/foo/bar.blub.something.bro
> 3) foo/bar/foo/bar/blub/something
> 4) foo/bar/foo/bar/blub/something.bro
>
> We never make it past #2 because that's the right name.

Right.

> But from a new user's perspective, it's not intuitive to try the
> literal version because everywhere else, the convention is to reference
> scripts in packages in the dotted format, which would be:
>
>    @load foo.bar.foo.bar.blub.something
>
> and that would search BROPATH for
>
> 1) foo.bar.foo.bar.blub.something
> 2) foo.bar.foo.bar.blub.something.bro
> 3) foo/bar/foo/bar/blub/something
> 4) foo/bar/foo/bar/blub/something.bro
>
> And none of those are right.

Yeah, but I at least wouldn't expect anything else than the literal
path to work here for foo.bar/foo/bar.blub.something anyway; not sure
that's really unintuitive, it's messy to begin with already. :)

How about one more tweak to the rules to make this explicit: if there's
at least one slash in a name, we don't do any dot-to-slash
substitutions at all. I.e., if you use actual paths, you need to
point to literal file names. If you use dot-style, we'll translate it
into paths for you. Does that make sense?
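
As a minimal Python sketch of that rule (illustration only; `candidates_with_slash_rule` is a hypothetical name, and trying the literal name before the translated one is an assumption carried over from the earlier suggestion):

```python
def candidates_with_slash_rule(name):
    """Names to try in BROPATH: no dot-to-slash translation once a slash appears."""
    candidates = [name, name + ".bro"]
    if "/" not in name and "." in name:
        # Pure dot-style name: also try it as a directory path.
        as_path = name.replace(".", "/")
        candidates += [as_path, as_path + ".bro"]
    return candidates


print(candidates_with_slash_rule("foo.bar/foo/bar.blub.something"))
# prints ['foo.bar/foo/bar.blub.something', 'foo.bar/foo/bar.blub.something.bro']
print(candidates_with_slash_rule("frameworks.software.base"))
# prints ['frameworks.software.base', 'frameworks.software.base.bro', 'frameworks/software/base', 'frameworks/software/base.bro']
```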

Robin

Yeah; was thinking that @require could evaluate after all @load's have been completed, very similar to @load x when y.

More relevant, though: I'm kind of worried that @load x when y would break logical encapsulation a little bit (since it seems like 'load x when y' essentially means that something other than x needs to be aware of exactly what x needs to load in order to run).

I guess that's a matter of preference, though.

--Gilbert

> Yeah, but I at least wouldn't expect anything else than the literal
> path to work here for foo.bar/foo/bar.blub.something anyway; not sure
> that's really unintuitive, it's messy to begin with already. :)

Yeah, good point. I think e.g. python doesn't even really handle crazy
stuff like that well, so the answer of "don't try naming packages and
scripts with dots" is probably ok.

> How about one more tweak to the rules to make this explicit: if there's
> at least one slash in a name, we don't do any dot-to-slash
> substitutions at all. I.e., if you use actual paths, you need to
> point to literal file names. If you use dot-style, we'll translate it
> into paths for you. Does that make sense?

Yes, that seems right to me.

- Jon

Dipping into this thread (so maybe missing important context - apologies
if so), this is seeming pretty weird:

> @load frameworks.software.base
>
> would do best to look in BROPATH for "frameworks/software/base.bro"
> first instead of the literal "frameworks.software.base" filename.

Why isn't the answer that this should just be

  @load frameworks/software/base

? What's so cool about using '.'s instead of '/'s that it's worth
conflicting with the user's file-system mental model?

    Vern

> More relevant, though: I'm kind of worried that @load x when y would
> break logical encapsulation a little bit (since it seems like 'load x
> when y' essentially means that something other than x needs to be
> aware of exactly what x needs to load in order to run).

"@add x when y" isn't actually saying that y is required for x to work
(I guess generally it will be), but rather that there's a condition
that must be met before loading 'x'.

But I think you're right that, for how we'd use it, it's weird for
'x' itself to not be maintaining the conditions that are required to
load it. It's going to be hard to maintain "@add x when y" statements
because you have to check whether any of them need changing whenever
you change what 'x' is loading.

Let's step back for a second, I'm not seeing why we're coming up with
such complex solutions anymore.

The problem we're trying to solve is how to best organize the packages
under protocols/ such that there's a way to load them without a possibly
unwanted side-effect of loading an entire package under frameworks/
because one script happens to require it (is that right, Seth?).

e.g. we want "@load protocols/http" to avoid loading frameworks/software.

Why don't we just write the __load__.bro manifest of the package to not
@load scripts that may have unwanted side effects? The __load__.bro that
gets loaded implicitly is already provided only for convenience and may
not actually load the entire contents of the package.

I think a user should have three choices:

1) pick & choose exactly what scripts you want from a package
2) rely on a package's __load__.bro to pick a minimal set of scripts
   from the package that best reflect its core functionality
   e.g. via "@load protocols/http"
3) the "I don't care how much extra stuff gets loaded, give me everything
   in the package" approach via "@load protocols/http/*" to recursively
   load all files that have a .bro extension (skipping __load__.bro)

Only #3 is new functionality. These choices make it quite clear to the
user what to expect, while the @add/when conditional loading stuff is going
to appear a bit magical and hard to troubleshoot.
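
A minimal Python sketch of what choice #3 might expand to, assuming the package is a plain directory tree. `package_scripts` is a hypothetical helper and the sorted visiting order is my own choice; how Bro would actually order the loads is left open here.

```python
import os


def package_scripts(package_dir):
    """List every .bro file under package_dir, skipping __load__.bro manifests."""
    found = []
    for root, dirs, files in os.walk(package_dir):
        dirs.sort()  # visit subdirectories in a stable order
        for fname in sorted(files):
            if fname.endswith(".bro") and fname != "__load__.bro":
                found.append(os.path.join(root, fname))
    return found
```

Calling `package_scripts` on the directory that "protocols/http" resolves to would give the list of files that "@load protocols/http/*" stands for.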

- Jon

> ? What's so cool about using '.'s instead of '/'s that it's worth
> conflicting with the user's file-system mental model?

Familiarity with other languages (e.g. python modules) is the main
reason. I'm on the fence as to whether the benefit of that is
worth it.

- Jon

I'm honestly not too sure how the new scripting stuff works, exactly, so forgive my ignorance. I'm also not a language design guy.

That being said, I disagree that the file-system mental model should be applied to a scripting language that uses packages / modules. The idea is that a package is meant to abstract away certain path details that are inherent to things like #include; instead of worrying about files, you worry about code constructs.

For example, bro.http.headers may not exist as a file on its own; instead, bro.http.headers may exist in "./bro/http/extensions.bro". As such, I see a huge difference between @load a/bro/module and @load a.bro.module: one loads a file, and the other loads a construct. As it happens, if code is written well, there's a lot of overlap there... but there's still a big conceptual difference.

I think I'd tentatively argue that, if '.' and '/' are completely interchangeable (and *must* remain that way), bro scripts haven't yet gotten to the point where they should worry about using '.' to load stuff. If anything, I think folks trying to learn bro script with python / java / etc. experience could find the syntax deceptive: "I don't think that . means what you think it means."

Just my $0.02, though.

--Gilbert