Discussion: the guidance we want to give to package authors on the tags they assign

I want to start a discussion here about the guidance we want to give package authors on the tags they assign in zkg.meta, so that people have a chance to chime in and we start out with the benefit of multiple perspectives in reaching for the best result.

My proposal is just to articulate principles for good tag selection, to rein in the scattershot tagging we've seen so far, by giving authors guidance on what we want to see. I think we need to do this because nearly everyone takes their cue from what the people before them have done. If bad habits arise and are allowed to persist, people will dutifully adopt those bad habits.

I posit that the ideal set of tags will match queries of the form "Has a plugin for X already been coded?" and also some of the relevant queries of the form "What other plugins have been coded for aspect Y?" Find the words by filling in the sentences "I implemented X." and "I implemented an instance of Y." For Y, use the plural (indicators, scanners, scripts) except when only the singular makes sense.

Use the hyphen where punctuation is needed. Never use underscore.

Don't add "analyzer", "protocol", or "plugin" as a suffix.

Don't mention bro or zeek. Everything here is already a Bro/Zeek analyzer or plugin.

The ideal set of tags can also include one that is perhaps unique to this package (but not four or five that are unique to this package). It serves as a moniker, so that saying "go look at fizzamajig" leads, by following the fizzamajig tag, to what you intended the listener to see.

Conversely, avoid banal tags. Even if what you wrote is a piece of software, "a", "piece", "of", and "software" are all bad tags.

Capital letters should be a rarity, e.g. in DoS, because to many eyes dos immediately connotes a pre-Windows Microsoft operating system. att&ck is fine punctuated that way, and PostgreSQL and all the CVE identifiers are reasonable to capitalize. SSL, TLS, TCP, PKI, UPnP, and EternalBlue are stalking horses: test cases to consider, while we reach consensus, for whether we are better off just lowercasing wherever the capitalization is not essential. If in doubt, just use alllowercase. Tags function quite well in alllowercase, and that is what most people have done.

If anyone uses the hyphen-form for a word, then everyone shall use the hyphen-form for consistency. It does often increase readability, and is a small price for the increase of understanding in the portion of our community which it benefits.
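
To make the rules above concrete, here is a minimal Python sketch of a checker that flags tags breaking them. The function name check_tag and the word lists are hypothetical; the lists only mirror the examples given in this message and are not an agreed policy. Capitalized tags are merely flagged for a human to review, since a few (DoS, PostgreSQL) are intentional.

```python
import re

# Assumption: these lists only mirror the examples in the message above;
# they are illustrative, not an agreed-upon policy.
BANNED_SUFFIXES = ("analyzer", "protocol", "plugin")
BANAL_WORDS = {"a", "piece", "of", "software"}
PROJECT_NAMES = {"bro", "zeek"}


def check_tag(tag):
    """Return a list of guideline complaints for one tag (empty if clean)."""
    problems = []
    lowered = tag.lower()

    if "_" in tag:
        problems.append("use a hyphen instead of an underscore")

    # Split on hyphens, underscores, and whitespace to look at whole words.
    words = [w for w in re.split(r"[-_\s]+", lowered) if w]

    if len(words) > 1 and words[-1] in BANNED_SUFFIXES:
        problems.append(f"drop the '{words[-1]}' suffix")

    if any(w in PROJECT_NAMES for w in words):
        problems.append("don't mention bro or zeek")

    if lowered in BANAL_WORDS:
        problems.append("too generic to be a useful tag")

    if tag != lowered:
        problems.append("contains capitals; keep only if essential")

    return problems
```

For example, check_tag("dns_analyzer") reports both the underscore and the suffix, while check_tag("scanners") returns an empty list.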

Anyone who disagrees with any of these details, PLEASE do chime in, as I only seek that we reach for the best result, not that we reach for my idea of what the best result is.

Anyone who has additional heuristics for good tags, please chime in with those as well. After consensus, we'll probably enact the change by sending PRs to a few packages to unify them more. I did a sort of census last evening. Of 273 tags used, I would banish 51 of them, and revise the punctuation or capitalization of 15 others.
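
A census like this could be repeated with a short script. The Python sketch below is not how the numbers above were obtained; it assumes the packages are already checked out under one directory, and that each zkg.meta keeps its comma-separated tags field in the [package] section. The function names read_tags and tag_census are hypothetical.

```python
import configparser
from collections import Counter
from pathlib import Path


def read_tags(meta_path):
    """Return the list of tags from one zkg.meta file (empty if none)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(meta_path, encoding="utf-8")
    # Assumption: tags live in the [package] section, comma-separated.
    raw = parser.get("package", "tags", fallback="")
    return [t.strip() for t in raw.split(",") if t.strip()]


def tag_census(root):
    """Count how many packages under `root` use each tag."""
    counts = Counter()
    for meta in Path(root).rglob("zkg.meta"):
        # Use a set so a tag repeated within one package counts once.
        counts.update(set(read_tags(meta)))
    return counts
```

Calling tag_census on the checkout directory returns a Counter, so most_common() lists the tags from most to least used, and tags that differ only in case or punctuation show up as separate entries.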
      - Duffy O'Craven

Sounds good to me, unifying tags makes sense. The one thing I'd add is
a selection of standardized tags for general categorization, along the
lines of the existing: https://docs.zeek.org/projects/package-manager/en/stable/package.html#suggested-tags

Probably best to start with extending that section with the guidelines
you propose before approaching package authors with individual PRs.
That way, there'll be something to point them to.

Robin

I'm wondering whether we also want to give strong guidance on package name capitalization. For instance, I see in the output of
/usr/local/zeek/bin/zeek -N | grep -i net
Zeek::Login - Telnet/Rsh/Rlogin analyzers (built-in)
Zeek::NetBIOS - NetBIOS analyzer support (built-in)
Zeek::BACNET - Bacnet Protocol analyzer (dynamic, no version information)

The module name is BACNET in ALL-CAPS, while the description has Bacnet in Initial-Cap. Yet BACnet, with precisely that capitalization, is the only usage that is standard within its own industry. https://github.com/amzn/zeek-plugin-bacnet/issues/9
I don't want to guide zeek-plugin-bacnet in a direction opposite to what we are about to recommend universally.

None of the built-in module names output by zeek -N are lowercase, but almost all the tags are (or use their branded case, such as PostgreSQL). The behavior documented in section 3.2.2, the aliases field ("...the way this field operates is that, for each alias, it simply creates a symlink of the same name within the directory associated with the script_dir path"), means that the zkg load argument and the Zeek script @load argument probably can't be case-insensitive, though any desired variants can be specified via aliases.

Short postscript here, because I just learned there is more than one namespace involved, and that config.name in plugin.cc is probably where zeek -N picks up the string that it utilizes:
The code has:

namespace plugin {
    namespace Zeek_BACNET {
        Plugin plugin;
        }
    }
using namespace plugin::Zeek_BACNET;

plugin::Configuration Plugin::Configure() {
    AddComponent(new ::analyzer::Component("BACNET", ::analyzer::bacnet::BACNET_Analyzer::Instantiate));
    plugin::Configuration config;
    config.name = "Zeek::BACNET";
    config.description = "Bacnet Protocol analyzer";
    return config;
    }

../../zeek/bin/zeek -NN | grep -i net
Output follows, showing the existing variation, so that if there are nuances to discuss, we have actual examples to compare and contrast.

    [Event] irc_network_info
Zeek::Login - Telnet/Rsh/Rlogin analyzers (built-in)
    [Analyzer] Telnet (ANALYZER_TELNET, enabled)
Zeek::NetBIOS - NetBIOS analyzer support (built-in)
    [Analyzer] Contents_NetbiosSSN (enabled)
    [Analyzer] NetbiosSSN (ANALYZER_NETBIOSSSN, enabled)
    [Event] netbios_session_message
    [Event] netbios_session_request
    [Event] netbios_session_accepted
    [Event] netbios_session_rejected
    [Event] netbios_session_raw_message
    [Event] netbios_session_ret_arg_resp
    [Event] netbios_session_keepalive
    [Function] decode_netbios_name
    [Function] decode_netbios_name_type
    [Event] rdp_client_network_data
Zeek::BACNET - Bacnet Protocol analyzer (dynamic, no version information)
    [Analyzer] BACNET (ANALYZER_BACNET, enabled)
    [Event] bacnet

This kind of lines up with the templating work that I’ve been working on. First with binpac_quickstart, I just made the name uppercase: https://github.com/grigorescu/binpac_quickstart/blob/master/templates/plugin_cc.jinja2#L8

The new version, CookieCutter (https://github.com/esnet/cookiecutter-zeekpackage), is a bit more intelligent. It uses the pretty name first, then slugifies it when needed.

"project_name": "Evil Bit Checker",
"project_slug": "{{ cookiecutter.project_name.lower().replace(' ', '_').replace('-', '_') }}",

"project_short_description": "Checks for the evil bit being set",
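
The slug expression in the template is plain Python string chaining, so its effect can be sketched outside the template. This sketch assumes the replacement character is an underscore, as in the stock cookiecutter templates; the function name slugify is hypothetical and only for illustration.

```python
def slugify(project_name):
    """Lowercase the pretty name and turn spaces and hyphens into underscores.

    Assumption: underscore is the replacement character, mirroring the
    project_slug expression in the template above.
    """
    return project_name.lower().replace(" ", "_").replace("-", "_")


if __name__ == "__main__":
    print(slugify("Evil Bit Checker"))  # prints evil_bit_checker
```

The developer types only the pretty name, and the slug is derived from it as the default.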

I want to be careful to not add any extra hoops for the developer, but to make reasonable defaults be the easiest path forward, in the hopes that more people will choose those.

–Vlad