The fact that RemoteSerializer and broker::Manager are calling
different Write() functions seems to be a broader issue: we get
different semantics that way. For RemoteSerializer, log events and log
filters run only on the originating nodes; those nodes make all the
decisions about what exactly gets logged and then send the result on
to the manager, which just writes out the data it receives. With
Broker, however, both events and filters run (also) on the manager, so
it makes its own decisions about what to record. The filters can be
different on the manager, and they will have access to different
state.
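Roughly, the two entry points look like this (a sketch from memory;
signatures abbreviated and possibly off in detail):

    // Approximate signatures, for illustration only.

    // Broker path: takes script-level values; the stream's log events
    // and filters run locally before anything reaches a writer.
    bool logging::Manager::Write(EnumVal* id, RecordVal* columns);

    // RemoteSerializer path: takes already-filtered values and hands
    // them straight to the writer backend, skipping events and filters.
    bool logging::Manager::Write(EnumVal* id, EnumVal* writer,
                                 string path, int num_fields,
                                 threading::Value** vals);
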
I'm not sure which approach is actually better; I think the Broker
semantics can be both helpful and harmful, depending on the use case.
In any case, it's a change in semantics compared to the old
communication system, and I'm not sure we want that.
I'm wondering if there's a reason that in the Broker case things
*have* to be this way. Is there something that prevents the Broker
manager from doing the same as the RemoteSerializer?
> The fact that RemoteSerializer and broker::Manager are calling
> different Write() functions seems to be a broader issue: we get
> different semantics that way. For RemoteSerializer, log events and log
> filters run only on the originating nodes; those nodes make all the
> decisions about what exactly gets logged and then send the result on
> to the manager, which just writes out the data it receives. With
> Broker, however, both events and filters run (also) on the manager, so
> it makes its own decisions about what to record. The filters can be
> different on the manager, and they will have access to different
> state.
>
> I'm not sure which approach is actually better; I think the Broker
> semantics can be both helpful and harmful, depending on the use case.
> In any case, it's a change in semantics compared to the old
> communication system, and I'm not sure we want that.
I think we want the old behavior for 2 reasons:
1. The workers only send the &log fields to the manager, so the events are raised with half the fields missing.
2. Having the logger node be as much of a dumb byte mover as possible is best for performance reasons. Having the log events and log filters run on the workers lets that functionality scale out across the nodes, especially if a filter is used that would remove a large percentage of the entries.
If someone really wanted the log_* events to run on the manager, they could redef Cluster::worker2manager_events, right?
> I'm wondering if there's a reason that in the Broker case things
> *have* to be this way. Is there something that prevents the Broker
> manager from doing the same as the RemoteSerializer?
Jon would know best, but I'd guess one form was more convenient to use than the other and it may have been assumed that they both did the same thing.
> Having the logger node be as much of a dumb byte mover as possible is best for performance reasons. Having the log events and log filters run on the workers lets that functionality scale out across the nodes, especially if a filter is used that would remove a large percentage of the entries.
This.
Especially over time, as we see more and more cores per processor, it’s best to distribute the processing load. If we put the filter in the logger, the logger then needs to enter the interpreter for each log message just to determine whether it should throw away data it just received. That’s expensive and limits scalability on multiple fronts.
> I'm wondering if there's a reason that in the Broker case things
> *have* to be this way. Is there something that prevents the Broker
> manager from doing the same as the RemoteSerializer?
Some background: when Broker sends to a log topic, the message has the
structure of a pair (id, (x, y, z, ...)), where id is an enum with the
log stream name and (x, y, z, ...) is a record of log columns.
Therefore, in broker::Manager::Process(), where messages are parsed and
dispatched, the log messages go into logging::Manager::Write(EnumVal*,
RecordVal*).
Such messages get created via broker::Manager::Log(EnumVal*, RecordVal*,
RecordType*). The only caller of this function is logging::Manager.
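To make that concrete, here's a hypothetical sketch of the message
layout (written from memory against the Broker API, so names and
details are approximate; the stream id and column values are made up):

    #include <string>
    #include <utility>

    #include <broker/data.hh>

    // Hypothetical sketch of the (id, (x, y, z, ...)) pair described
    // above; the real code builds this from script-level Vals.
    broker::data make_log_message()
        {
        broker::vector columns;                   // record of log columns
        columns.emplace_back(std::string("GET")); // e.g., a string column
        columns.emplace_back(broker::count(404)); // e.g., a count column

        broker::vector msg;
        msg.emplace_back(broker::enum_value("HTTP::LOG")); // stream id
        msg.emplace_back(std::move(columns));
        return broker::data(std::move(msg));
        }
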
Purely from an API perspective, could we just move the call from one
Write() function to the other?
I think the answer is yes, but I've looked a bit more at the code and
I think I see where the challenge is: that 2nd Write() method (the one
the RemoteSerializer is using to output already filtered logs) takes
threading::Values, not Vals. That means switching over from one
Write() to the other isn't straight-forward, because we don't have code
that sends threading::Values over the Broker connection. We could
convert the Vals into threading::Values once received, but that'd be
kind of odd; I'm pretty sure the distinction was due to
threading::Values being quite a bit more efficient to send.
It should be pretty straight-forward to add the necessary
threading::Value-to-Broker conversion code (just a bit tedious :).
I'll look into that.
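For illustration, the conversion should be a mechanical switch along
these lines (a rough sketch only; the threading::Value field names are
from memory, and most type cases are omitted):

    #include <string>

    #include <broker/data.hh>
    #include "threading/SerialTypes.h" // threading::Value

    // Rough sketch of the threading::Value-to-Broker conversion; field
    // and type names may not match the source exactly.
    broker::data threading_val_to_broker(const threading::Value* v)
        {
        switch ( v->type ) {
        case TYPE_BOOL:
            return broker::data(v->val.int_val != 0);
        case TYPE_INT:
            return broker::data(broker::integer(v->val.int_val));
        case TYPE_COUNT:
            return broker::data(broker::count(v->val.uint_val));
        case TYPE_DOUBLE:
            return broker::data(v->val.double_val);
        case TYPE_STRING:
            return broker::data(std::string(v->val.string_val.data,
                                            v->val.string_val.length));
        default:
            // addr, subnet, time, interval, containers, ... would be
            // equally mechanical, just tedious.
            return broker::data();
        }
        }

Each field of a log entry would go through a function like this, so the
per-entry cost stays a simple type dispatch.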
I think I was aware of the differences and went ahead with that approach because there's the extra technical work of writing code to convert value types, as Robin mentions, and also because it's conceptually more flexible than the old way.
I understand the argument that the old semantics (manager not running log events/filters) may be more performant, though I'd consider whether the internal comm. framework or the base/user scripts should be the one to decide.
I think the latter is better, so the problem breaks down into (1) does the user have the ability to fully control whether log events/filters run on any given node via scripts, and (2) are the default settings/scripts sane for the common use case?
(1) is likely true, so (2) sounds like it needs to be fixed.
Just a different idea on how to approach solving the issue without having to touch the framework's internals. (It’s been a while; hope it’s not way off base.)
I agree that generally it'd be nice to be able to do it either way.
However, I'm pretty sure at this point that we need the separate
high-performance path that the old communication introduced, for the
reasons discussed in this thread and also for consistency. I'm working
on adding that code, and I think it should be the standard model, just
as it is currently. In addition to that, one can always send log_*
events around and then do custom processing on the receiving side.
That's not quite as transparent as "normal" log messages would be with
their configuration and filters, but that might actually be a good
thing: if we actually had both mechanisms (sender- and receiver-side
filtering) built transparently into the logging framework, it could
end up being quite confusing what's used when.
I propose that for now we make Broker work like the current model, and
then we go from there. If we need more, we can add that later. The
fewer semantic differences we have between old and new communication,
the easier the switch will be.
> I propose that for now we make Broker work like the current model, and
> then we go from there. If we need more, we can add that later. The
> fewer semantic differences we have between old and new communication,
> the easier the switch will be.
Yeah, I agree that it should behave the same. Was just also trying to reframe the problem to offer solutions that give back the old semantics (i.e., via script-level mechanisms). No big deal, since there’s no demand for it.
> In addition to that, one can always send log_*
> events around and then do custom processing on the receiving side.
> That's not quite as transparent as "normal" log messages would be with
> their configuration and filters, but that might actually be a good
> thing: if we actually had both mechanisms (sender- and receiver-side
> filtering) built transparently into the logging framework, it could
> end up being quite confusing what's used when.
It was actually always confusing to me that a remote log entry versus a local log entry would be processed differently regarding the log_* events. The event is a property of the Log::Stream definition, and the logging API and docs don’t distinguish between outgoing and incoming log entries there, or do they? Or is a Stream meant to be thought of only from the perspective of an outgoing sequence of log entries?
Could be I misunderstood the log framework the whole time, and that’s why Broker behaved the way it did.
I know, it's a bit confusing. Some of that is historic and part of
trying to maintain semantics as things were evolving (both the logging
framework and communication; quite similar actually to what we've been
discussing here: what should be done where). It all came out of
"remote printing", where a print statement would just send what it
would normally print into a file over to another node---that means
everything was fully processed already as it was received. The other
part is the performance optimization: special-casing log transmission
for batching and volume, so that it doesn't become a bottleneck.
Thinking about it as just outgoing entries fits it best, I think. On
the receiving side, the entries don't "really" enter the full logging
framework; they just take a fast path directly into the writers. One
thing I'm doing is renaming methods to make that a bit clearer; the
names of the two Write() methods are clearly misleading.
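Conceptually, the receiving side then does something like this (an
illustrative sketch, not the actual code; LookupWriter() is a made-up
helper standing in for however the frontend gets found):

    #include <string>

    #include "Val.h"                    // EnumVal
    #include "logging/WriterFrontend.h" // logging::WriterFrontend
    #include "threading/SerialTypes.h"  // threading::Value

    // Hypothetical helper: find the WriterFrontend instance for
    // (stream, writer, path).
    logging::WriterFrontend* LookupWriter(EnumVal* stream, EnumVal* writer,
                                          const std::string& path);

    // Illustrative receive-side fast path: no log events fire and no
    // filters run; the already-filtered values go straight to the
    // writer frontend.
    void HandleRemoteLogEntry(EnumVal* stream, EnumVal* writer,
                              const std::string& path, int num_fields,
                              threading::Value** vals)
        {
        logging::WriterFrontend* w = LookupWriter(stream, writer, path);
        w->Write(num_fields, vals);
        }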