Thoughts on documenting scripts

I've added a page on how we might use Sphinx to auto-generate Bro
script documentation to the projects list, see

    http://bro.icir.org/devel/projects/autodoc.html

This includes a mock-up of how Sphinx input and output could look
like.

What do you guys think?

Robin

What do you guys think?

Very nice, I like the overall setup, it looks like a smooth way to
automate the documentation generation process for Bro scripts! A few
comments:

- How do you define a Public Interface? It seems to me that the sections
  before are also customizable by the user and thus constitute part of
  the "public" part of the script.

- The role of DPD is not entirely clear from the HTML in the sections
  Packet Filter and Port Analysis. It reads as if FTP is only invoked
  for 21/tcp. How is that supposed to be read?

- I am not sure if this would happen already, but it would be great if
  external types, such as the connection record, would also be links in
  the HTML generated by Sphinx.

- I feels a bit too much to repeat the word Param in front of every
  event parameter. What about writing 'Parameters' once and then listing
  all parameters in bullet-point style (or just as is, in bold and then
  the description).

- Have you considered only using the namespace prefix (FTP::slight_smile: for
  handlers that are not part of the module and omitting it for functions
  and events that are inside the module?

- In addition to the included children, it would also be nice to create
  a list of parents, i.e., the scripts that include the script whose
  documentation is currently being read.

   Matthias

- How do you define a Public Interface?

I'm thinking whatever the "export section" provides. The earlier
sections are things the script configures, but they are part of
*another* script's interface.

- The role of DPD is not entirely clear from the HTML in the sections
  Packet Filter and Port Analysis. It reads as if FTP is only invoked
  for 21/tcp. How is that supposed to be read?

This is not for DPD, but for the port-based detection. The
interpretation is: everything on port 21 will always be analyzed as
FTP (while all other ports need a matching DPD pattern first).

But there are probably nicer ways of phrasing this in the generated
documentation that in my example.

- I am not sure if this would happen already, but it would be great if
  external types, such as the connection record, would also be links in
  the HTML generated by Sphinx.

Yes, that's the plan. Everything would be extensively cross-linked,
and in this case the link would go to bro.init's apge (which is a
special case however, which we'll probably need to handle separately
in some form).

- I feels a bit too much to repeat the word Param in front of every
  event parameter.

I totally agree. I had earlier suggested to do this implicitly,
e.g., instead of ":param id: bla bla", write just "id: bla bla" and
do some smart pattern matching to recognize that it's actually a
parameter description.

I'm still in favor of doing that but didn't want to load too many
details onto the proposal.

What about writing 'Parameters' once and then listing all parameters
in bullet-point style (or just as is, in bold and then the
description).

... and that would be another good option of doing it.

- Have you considered only using the namespace prefix (FTP::slight_smile: for
  handlers that are not part of the module and omitting it for functions
  and events that are inside the module?

Yeah, I was unsure what's the best way for giving the namespaces.
This could be indeed a good compromise.

- In addition to the included children, it would also be nice to create
  a list of parents, i.e., the scripts that include the script whose
  documentation is currently being read.

I like that. However, Sphinx doesn't provide that directly so we'd
need to assemble that information somehow (should be possible)

Robin

BTW, how does this all fit with the one language to rule them all goal, i.e. ReST? Or is rest just for the web site, manual and Bro book?

Very nice, I like the overall setup, it looks like a smooth way to
automate the documentation generation process for Bro scripts! A few
comments:

BTW, how does this all fit with the one language to rule them all goal, i.e. ReST? Or is rest just for the web site, manual and Bro book?

Sphinx is based on ReST. It defines additional rest directive to make
writing code documentation easier. But the markup in the docstrings is
ReST.

cu
gregor

- I feels a bit too much to repeat the word Param in front of every
  event parameter.

I totally agree.
[cut]

The 'param' in fron of every parameter also looks strange in the HTML
output.
But I also see, that having to put it in front of every parameter might
be cumbersome, so I also like Robins idea here (note, that I was the one
who said, we should put a directive in front of every parameter, but now
I think that either way is fine).

It might also be nice to repeat the parameters type in the HTML output.
This way I can look at just one place and don't have to look back at the
event/function definition to find the type. But YMMV.

What about writing 'Parameters' once and then listing all parameters
in bullet-point style (or just as is, in bold and then the
description).

... and that would be another good option of doing it.

I think that would be the way to go for HTML output

cu
Gregor

Hi,

I've been thinking about documenting the public interface VS.
documenting everything. It might be helpful to have a cross referenced
documentation of everything if somebody wants to dig deeper.
How this the following sound to you:

* Per default only extract documentation for the public interface
  (i.e., the export section plus some globals)
* However, also document the internal workings (which IMHO is good
  style anyway)
* Have an option to extract and generate the full documentation. (We
  probably won't ship this documentation, but people who want to hack
  around in scripts can generate it for themselves)

* This raises the question of documenting generated events VS. event
  handlers. I guess that bro should be able to distinguish them, since
  the definition of policy-generated event is presumably followed by a
  semicolon and no body while a handler always has a body, right?
  (Or could one define a policy-generated event with a header?)
  If bro can make that distinction, then it should be able to separate
  handlers from event definitions.

cu
gregor

* However, also document the internal workings (which IMHO is good
  style anyway)

It's good style but I'd prefer to leave it informal how to do that.
Enforcing consistent documentation of script internals increases the
burden on the scritp writer quite a bit, without that much benefit
actually (because often one will end up just reading the script code
anyway; there's no better documention then that.)

* This raises the question of documenting generated events VS. event
  handlers. I guess that bro should be able to distinguish them, since
  the definition of policy-generated event is presumably followed by a
  semicolon and no body while a handler always has a body, right?

The easier way would be to look for uses of the "event" statement;
that directly shows what events are raised. Independent of the
question whether to generate docs for internals, I can see
summarizing for which events a script has handlers, and which events
it generates. That's quite interesting information as it introduces
dependencies with other scripts.

Robin

One question I just came across was how to document the values for an enum. My example is below, I think it would be nice to either link to the potential values for that enum or include them inline (I think I prefer inline).

  # Enable and disable the log. If you only want logs for certain transactions,
  # that can be configured with this variable as well.
  const logging: Directions = LocalHosts &redef;

In another file, the Directions enum might be:
  type Directions: enum{Inbound, Outbound, Enabled, Disabled};

It's a slightly convoluted example because the code doesn't exactly match the implementation that I currently have in my logging framework but it doesn't really matter since that will be changing anyway.

  .Seth

I'm not quite sure which alternatives you describe here, but my
thiking would be to document (1) the enum tyoe itself, and then (2)
the various values indvidually, e.g.:

# Defines the alternatives for XYZ.
type Directions: enum {

    ## Description for inbound.
    Inbound,

    ## Description for outbound.
    Outbound

    ...
};

Is that what you mean by "inline"?

Robin

I meant inline in the output documentation as opposed to making you click a link to see the potential values for the enum. Does that make sense?

Maybe in the output documentation, each enum value that is displayed could be a link to the documentation for the specific value, I hadn't even considered that yet.

  .Seth

Ah, I misunderstood, sorry. Yes, inline then sounds fine to me.

Robin