Reading through this thread, I'm wondering if we should focus on
improving identification of log lines in terms of where they come from
and when they were generated, while keeping to go through the existing
mechanism of sending messages back to main process for output (so that
we don't need the mutex). If we sent timestamps & thread IDs along
with the Debug() messages, one could later post-process debug.log to,
get things sorted/split as desired.
This wouldn't support the use case of "millions of lines" very well,
but I'm not convinced that's what we should be designing this for. A
mutex becomes potentially problematic at that volume as well, and it
also seems like a rare use case to begin with. In cases where it's
really needed, a local patch to get logs into files directly (as you
have done already) might just do the trick, no?
Robin
We could definitely change DebugLogger to improve the log line identification, and route it through the threading framework’s Debug() call. That will avoid turning debug.log into "word salad”.
However, that would also cause a delay in writing the log lines, and I've run into situations working on Zeek where that kind of delay would make debugging harder.
For example, sometimes I run tail on the log file in a terminal window. Then, when the code hits a breakpoint in a debugger, I can analyze the program state by looking at log lines emitted right before the breakpoint triggers, and compare them to variable contents, the stack trace, etc. That won't work if logging is delayed.
There are multiple, conflicting use cases for logging in Zeek. Sometimes a developer might think:
- Maximized throughput is important, but a delay is okay
- No delay can be tolerated, but slower throughput is okay
- Correct temporal ordering in the log is (or isn’t) important
- fflush() after every write is (or isn’t) important
- Debug logging output should go to the debug.log file, or stdout, or somewhere else
This is a pretty common situation around logging, in my experience.
One way to solve it, as Robin says, is for a developer with a use case Zeek doesn't support to apply a temporary local patch. Unfortunately, that doesn't help other developers who might have the same use case. Also, I personally hate to spend time writing code and getting it to work well, and then throw it away.
On other projects, I've used a different approach that's worked really well: use a single, common logging API, but let it send its output to different output mechanisms that support different use cases. Then a developer could pick the output mechanism that works best for their use case at runtime, using a command line option or environment variable. I think it wouldn’t be very complicated to add that to Zeek.
- Bob