The input framework is about to shake up some assumptions we currently make in the logging framework. Right now we assume logs are write-once: we write them and never read them back. The input framework makes it possible to use logs as a persistent storage mechanism, and in my opinion this is much better than the old &persistent attribute because it gives us an interface between the storage mechanism (database, log file, etc.) and the way the data is stored in Bro. It also gets around the problems I've had with iteratively developing Bro scripts and safely maintaining state, which led me to avoid &persistent like the plague.
I also don't like the model of keeping persistent state (my term for files/databases used with both the input framework and the logging framework) in a hidden .state directory under the CWD. With broctl, that ends up putting persistent state in the spool/ directory, which seems very wrong and unsafe to me, since "spool" implies logs being spooled for eventual rotation.
I propose we add another field to logging filters that indicates what "type" of log is being written. The default type could be "LOG", which would do the normal rotation and write to the normal logging location (whatever that means for the writer plugin in use). Alternatively, a filter could use a "PERSISTENT_STATE" type (better names?), which would write through whatever output plugin is configured for the filter, but to a more appropriate location and without the normal file rotation and other log maintenance.
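To make the proposal concrete, here is a minimal Python model (not Bro script, and not the real logging framework API) of how a writer might act on such a field. The `LogType` enum, the `Filter` record, `destination()`, and the directory names are all hypothetical; the point is only that one field decides both where output goes and whether rotation applies.

```python
import posixpath
from dataclasses import dataclass
from enum import Enum


class LogType(Enum):
    LOG = "LOG"  # normal log: default location, rotated as usual
    PERSISTENT_STATE = "PERSISTENT_STATE"  # state store: own location, never rotated


@dataclass
class Filter:
    # Hypothetical stand-in for a logging filter carrying the proposed extra field.
    name: str
    path: str
    log_type: LogType = LogType.LOG


def destination(f: Filter, log_dir: str, state_dir: str) -> tuple[str, bool]:
    """Return (output path, whether normal rotation applies) for a filter."""
    if f.log_type is LogType.PERSISTENT_STATE:
        return posixpath.join(state_dir, f.path), False
    return posixpath.join(log_dir, f.path), True


conn = Filter("default", "conn")
hosts = Filter("known-hosts", "known_hosts", LogType.PERSISTENT_STATE)
print(destination(conn, "/spool/bro", "/var/lib/bro/state"))  # ('/spool/bro/conn', True)
```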
Ultimately, if we can store persistent state through a script-level interface that defines how to write to and read from the store via the logging and input frameworks, we could pull off a lot of things that are currently either difficult or impossible.
Thoughts on changes to the logging framework to fit this model better?
Yeah, I agree we need a different model here. What I'm not sure about
is whether hardcoding use-cases (like LOG and PERSISTENT_STATE) is the
best way to go. How far would it get us if we could just specify
destination paths outside of the spool directory somehow? (We can
already disable rotation etc. with the current filters.)
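For comparison, a minimal sketch of this alternative, again as a hypothetical Python model rather than Bro's actual filter record: the path alone decides whether output lands in the spool directory, and rotation stays a separate per-filter setting (here an assumed `rotation_interval`, with `None` meaning disabled).

```python
import posixpath
from dataclasses import dataclass
from typing import Optional


@dataclass
class Filter:
    # Hypothetical filter model: no "type" field, just a path and a rotation knob.
    path: str
    rotation_interval: Optional[float] = 3600.0  # seconds; None means never rotate


def resolve(f: Filter, spool_dir: str) -> str:
    """Absolute paths escape the spool directory; relative ones stay inside it."""
    if posixpath.isabs(f.path):
        return f.path
    return posixpath.join(spool_dir, f.path)


conn = Filter("conn")  # ordinary log: spooled and rotated
state = Filter("/var/lib/bro/known_hosts", rotation_interval=None)  # persistent state
```

With two orthogonal knobs, a persistent-state filter is just an absolute path plus disabled rotation, so no new use-case type is needed.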
I would suggest that, rather than trying to attach this functionality to the logging framework, we wrap the input / output frameworks into a single unified set of script functions that handles generic K/V data I/O, e.g.:
Unfortunately, logging currently only does create, and input only does read, so we'd still have to work update and delete into the mix somehow. For store operations, I believe those two are fairly important. Since not every log writer is suited to CRUD, it might make sense to support C & R on all formats, and only support U & D on formats meant to be used for store operations (e.g. redis, memcache, whatever).
Anyway, I'd imagine the logging framework would need U & D support added to really make this model work well. We'd also need to ensure that ds_update / ds_delete fail in a sane way on formats that don't support them.
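A minimal sketch of that split, as a hypothetical Python model: every writer supports create and read, only writers that opt in accept update and delete, and `ds_update`/`ds_delete` return `False` without touching data otherwise. The writer classes and `ds_create`/`ds_read` are illustrative names; only `ds_update`/`ds_delete` come from the message above, and their signatures here are assumptions.

```python
from typing import Optional


class Writer:
    """Hypothetical base writer: every format supports create and read."""
    supports_update_delete = False

    def __init__(self) -> None:
        self.rows: dict[str, str] = {}


class AsciiWriter(Writer):
    pass  # plain log file: C & R only


class KVStoreWriter(Writer):
    supports_update_delete = True  # redis/memcache-style store: full CRUD


def ds_create(w: Writer, key: str, value: str) -> bool:
    if key in w.rows:
        return False  # create never overwrites an existing record
    w.rows[key] = value
    return True


def ds_read(w: Writer, key: str) -> Optional[str]:
    return w.rows.get(key)


def ds_update(w: Writer, key: str, value: str) -> bool:
    # Fail sanely: refuse, leaving data untouched, if the format lacks U & D.
    if not w.supports_update_delete or key not in w.rows:
        return False
    w.rows[key] = value
    return True


def ds_delete(w: Writer, key: str) -> bool:
    if not w.supports_update_delete or key not in w.rows:
        return False
    del w.rows[key]
    return True
```

Returning a status rather than raising lets a script check the result and fall back; raising an error would be an equally reasonable choice.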
I've been thinking about a unified K/V I/O layer a lot too, and we may be getting to that point before long.
I'm not really sure that update and delete can even fit into the model in general as it stands. A future I/O framework might end up consuming the input framework, while the logging framework remains standalone, although it would probably be able to reuse code internally from the I/O framework.
Talked with Seth about this yesterday: I don't really want to create
an independent persistence framework in parallel, but I believe we
could create a key/value interface on top of the current input/output
frameworks, potentially even purely in script-land.
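One way such a layer might work without adding U & D to the writers themselves (a sketch under assumptions, not a description of existing code): log every change as an append-only record carrying an operation column, then rebuild the table on startup by replaying that log through the input side, with the last write for each key winning. Python stands in for script-land here; the `SET`/`DEL` record format and the function names are hypothetical, and keys/values are assumed to contain no tabs or newlines.

```python
import io


def kv_set(log: io.StringIO, key: str, value: str) -> None:
    # Output side: an "update" is just another appended record.
    log.write(f"SET\t{key}\t{value}\n")


def kv_delete(log: io.StringIO, key: str) -> None:
    # A delete is a tombstone record; "-" is a placeholder value.
    log.write(f"DEL\t{key}\t-\n")


def kv_load(log_text: str) -> dict[str, str]:
    """Input side: replay records in order; the last write per key wins."""
    table: dict[str, str] = {}
    for line in log_text.splitlines():
        op, key, value = line.split("\t")
        if op == "SET":
            table[key] = value
        elif op == "DEL":
            table.pop(key, None)
    return table


def kv_compact(table: dict[str, str], out: io.StringIO) -> None:
    # Rewrite only the live entries so the log doesn't grow without bound.
    for key, value in sorted(table.items()):
        kv_set(out, key, value)


log = io.StringIO()
kv_set(log, "10.0.0.1", "server")
kv_set(log, "10.0.0.2", "client")
kv_set(log, "10.0.0.1", "router")
kv_delete(log, "10.0.0.2")
print(kv_load(log.getvalue()))  # {'10.0.0.1': 'router'}
```

Because the log only ever grows, periodically rewriting it from the replayed table (`kv_compact`) would stand in for rotation.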