Hey folks:
I'm working on building threading into the logging framework in parallel with the work I'm doing on DataSeries. I believe I have a plan, and was hoping to get some input from other folks on the list.
At the moment, a log message:
*) Is generated deep within Bro, eventually finding its way to LogMgr::Write
*) In LogMgr::Write, the following happens:
> Checks that an appropriate LogMgr::Stream exists for the writer
> Checks that any relevant LogWriter has been properly initialized
> Applies any necessary filters
*) The log message is turned into a LogVal **
> In the case of a remote filter, the LogVal ** is spirited away to serializations unknown
> In the case of a local filter, the LogVal ** is passed along to the appropriate LogWriter::Write for processing.
So, to change this to support threading, I was planning to turn LogMgr::Stream into a self-contained object with two 0mq message-passing sockets attached:
*) One write-only (PUSH) 0mq socket created by the parent LogMgr, that could be used to send messages to the Stream object (for example, something like LOG_WRITE)
*) One read-only (PULL) 0mq socket created by the child Stream object, which would be used to receive the messages the LogMgr sent.
After the LogMgr::Stream is created, the PUSH 0mq socket would be the only means of communicating with it (in order to avoid needing evil things like semaphores / condition variables). This means that the LogMgr would need to generate and pass messages to the LogMgr::Stream object.
Thus, the LogWriter initialization bit and the LogWriter logging bit would both happen within the context of the Stream thread (in response to an appropriate message sent over the LogMgr's PUSH socket and received on the LogMgr::Stream's PULL socket).
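To make the threading model above a bit more concrete, here's a rough sketch of a Stream that owns its own thread and can only be reached by message. Everything in it is hypothetical: Channel is a standard-library stand-in for the 0mq PUSH/PULL pair (it does use a mutex and condition variable internally, but only inside the transport, which is the part 0mq would take over), and the "INIT" / "FINISH" string tags and the rule that messages before INIT are dropped are just assumptions for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a 0mq PUSH/PULL pair. All locking lives in here,
// so neither the manager nor the stream logic touches a lock directly.
class Channel {
public:
    void push(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            q_.push(std::move(msg));
        }
        cv_.notify_one();
    }
    std::string pull() {  // blocks until a message is available
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        std::string msg = std::move(q_.front());
        q_.pop();
        return msg;
    }
private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::string> q_;
};

// Hypothetical Stream: writer state is only ever touched by the worker thread.
class Stream {
public:
    Stream() : worker_([this] { Run(); }) {}
    ~Stream() { Finish(); }
    void Send(std::string msg) { chan_.push(std::move(msg)); }  // the "PUSH" side
    // Ask the thread to stop, wait for it, and return what it wrote.
    std::vector<std::string> Finish() {
        if (worker_.joinable()) {
            Send("FINISH");
            worker_.join();
        }
        return written_;
    }
private:
    void Run() {  // the "PULL" side: init and writes both happen on this thread
        bool initialized = false;
        for (;;) {
            std::string msg = chan_.pull();
            if (msg == "FINISH") break;
            if (msg == "INIT") { initialized = true; continue; }
            if (initialized) written_.push_back(msg);  // assumed: drop if not initialized
        }
    }
    Channel chan_;
    std::vector<std::string> written_;
    std::thread worker_;  // declared last so chan_ exists before the thread starts
};

std::vector<std::string> demo_stream() {
    Stream s;
    s.Send("sent before INIT");
    s.Send("INIT");
    s.Send("conn 10.0.0.1 -> 10.0.0.2");
    s.Send("conn 10.0.0.3 -> 10.0.0.4");
    return s.Finish();
}
```

Calling demo_stream() returns only the two messages sent after INIT, since the stand-in worker ignores anything that arrives before it has been initialized.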
I figure the LogMgr would need to be able to generate (at a minimum) the following types of messages:
*) EnableStream
*) DisableStream
*) StreamInit
*) StreamFinish
*) RotateLog
*) LogMessage
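As a rough illustration of how the Stream side might dispatch on those message types, here's a minimal sketch. The enum values mirror the list above; the Message and StreamState structs, the Handle function, and the exact semantics (for example, that a LogMessage is only written when the stream is both initialized and enabled) are assumptions on my part, not a settled design.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class MsgType {
    EnableStream, DisableStream, StreamInit, StreamFinish, RotateLog, LogMessage
};

// Hypothetical message: a type tag plus an opaque payload.
struct Message {
    MsgType type;
    std::string payload;  // only meaningful for LogMessage in this sketch
};

// Hypothetical per-stream state, touched only by the stream's own thread.
struct StreamState {
    bool initialized = false;
    bool enabled = false;
    bool finished = false;
    int rotations = 0;
    std::vector<std::string> written;
};

// Apply one message to the state; returns false once the stream should stop.
bool Handle(StreamState& s, const Message& m) {
    switch (m.type) {
    case MsgType::StreamInit:    s.initialized = true; break;
    case MsgType::EnableStream:  s.enabled = true; break;
    case MsgType::DisableStream: s.enabled = false; break;
    case MsgType::RotateLog:     if (s.initialized) ++s.rotations; break;
    case MsgType::LogMessage:
        if (s.initialized && s.enabled) s.written.push_back(m.payload);
        break;
    case MsgType::StreamFinish:  s.finished = true; return false;
    }
    return true;
}

StreamState demo_dispatch() {
    StreamState s;
    const std::vector<Message> msgs = {
        {MsgType::StreamInit, ""},
        {MsgType::EnableStream, ""},
        {MsgType::LogMessage, "first"},
        {MsgType::DisableStream, ""},
        {MsgType::LogMessage, "skipped while disabled"},
        {MsgType::EnableStream, ""},
        {MsgType::RotateLog, ""},
        {MsgType::LogMessage, "second"},
        {MsgType::StreamFinish, ""},
        {MsgType::LogMessage, "never processed"},
    };
    for (const Message& m : msgs)
        if (!Handle(s, m)) break;  // stop pulling after StreamFinish
    return s;
}
```

Since each message is handled to completion before the next one is pulled, the stream state never needs a lock of its own in this arrangement.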
(Note: as a shortcut, we could probably build a fast-track LogMessageInProc type that passed a pointer to the data to log, rather than encapsulating everything when passing within a single process... but I figure that's an optimization, and could probably be dealt with later if it proves to be necessary).
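For what it's worth, the fast-track idea might look something like the sketch below: the message carries an owning pointer to the record instead of a serialized copy, so the receiver ends up with the very same object the sender built. LogRecord and its fields are made-up placeholders (the real payload would be the LogVal ** data), and the std::queue is just a stand-in for whatever in-process transport ends up being used.

```cpp
#include <cassert>
#include <memory>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type standing in for the real log data.
struct LogRecord {
    std::vector<std::string> fields;
};

// Fast-track message: owns a pointer to the record rather than a serialized copy.
struct LogMessageInProc {
    std::unique_ptr<LogRecord> record;
};

// Returns true if the receiver got the same object the sender built (no copy made).
bool demo_inproc() {
    std::queue<LogMessageInProc> chan;  // stand-in for an in-process transport

    std::unique_ptr<LogRecord> rec(new LogRecord);
    rec->fields = {"1300475167.096535", "10.0.0.1", "tcp"};
    const LogRecord* sent_addr = rec.get();

    // Sender hands over ownership; it must not touch the record afterwards.
    chan.push(LogMessageInProc{std::move(rec)});

    // Receiver takes ownership and is responsible for freeing the record.
    LogMessageInProc got = std::move(chan.front());
    chan.pop();

    return got.record.get() == sent_addr && got.record->fields.size() == 3;
}
```

The catch is that a pointer only means something inside one process, so this message type could never go over the wire to an external logger; that is presumably why it would sit alongside the regular LogMessage rather than replace it.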
Anyway, I figure that, after this point, we'd be close to having an entirely self-contained logging infrastructure. So long as the message format was standardized, anything that spoke the correct message format could act as a logger for Bro. (I do have a rough draft of a message format, which I'd be happy to send out, assuming the above isn't too confusing and seems technically sound.)
So... thoughts? Does that make sense? What's bad / broken about doing things this way?
Thanks,
Gilbert Clark