[Bro-Commits] [git/bro] topic/johanna/file-analysis-x509: Change x509 log - now certificates are only logged once per hour. (0d50b8b)

Can you elaborate? I sense an opportuntity to improve our API. :slight_smile:

Robin

I think he was running into the interplay between script land and the core. Actually, I think that for the SSL/TLS analyzer, this is one of the times we don't need a file id generated in script land. That's probably a better choice in this case.

  .Seth

That might be true, I am not entirely sure.

What I did was to call…

file_mgr->DataIn(reinterpret_cast<const u_char*>(cert.data()), cert.length(),
    bro_analyzer()->GetAnalyzerTag(), bro_analyzer()->Conn(), ${rec.is_orig});
file_mgr->EndOfFile(bro_analyzer()->GetAnalyzerTag(), bro_analyzer()->Conn(), ${rec.is_orig});

in exactly this order (so - directly following each other). Which does not work.

Changing it to...

string fid = file_mgr->DataIn(reinterpret_cast<const u_char*>(cert.data()), cert.length(),
    bro_analyzer()->GetAnalyzerTag(), bro_analyzer()->Conn(), ${rec.is_orig});
file_mgr->EndOfFile(fid);

makes it work perfectly.

Thinking about it, it makes sense that this might be due to the interplay of the script land
file ID generation and the core file ID. However… just looking at the code, it still
seems a bit non-intuitive that the first version does not work, and the second
version does.

I also do not really think this is sufficiently documented in the comments of
Manager.h. This basically is not mentioned at all there…

I have no idea if that is just me that got a wrong impression that this should work
when looking at the Interface. But - I could see other people perhaps making the same mistake.

Johanna

What I did was to call…

file_mgr->DataIn(reinterpret_cast<const u_char*>(cert.data()), cert.length(),
   bro_analyzer()->GetAnalyzerTag(), bro_analyzer()->Conn(), \{rec\.is\_orig\}\); file\_mgr\-&gt;EndOfFile\(bro\_analyzer\(\)\-&gt;GetAnalyzerTag\(\), bro\_analyzer\(\)\-&gt;Conn\(\), {rec.is_orig});

in exactly this order (so - directly following each other). Which does not work.

It think it should work provided that matching file handles are generated at the script-layer for this type of file. (not sure whether they are in this case, didn’t check)

I also do not really think this is sufficiently documented in the comments of
Manager.h. This basically is not mentioned at all there…

Yeah, it should probably at least link to [1] at least once. Do you think it would help to link to that in each method where it matters?

[1] http://www.bro.org/development/howtos/file-analysis-file-id.html

- Jon

file_mgr->DataIn(reinterpret_cast<const u_char*>(cert.data()), cert.length(),
  bro_analyzer()->GetAnalyzerTag(), bro_analyzer()->Conn(), \{rec\.is\_orig\}\); file\_mgr\-&gt;EndOfFile\(bro\_analyzer\(\)\-&gt;GetAnalyzerTag\(\), bro\_analyzer\(\)\-&gt;Conn\(\), {rec.is_orig});

in exactly this order (so - directly following each other). Which does not work.

It think it should work provided that matching file handles are generated at the script-layer for this type of file. (not sure whether they are in this case, didn’t check)

Well, file handles are generated at the script layer. The current code is…

function get_file_handle(c: connection, is_orig: bool): string
  {
  set_session(c);

  local depth: count;

  if ( is_orig )
    {
    depth = c$ssl$client_depth;
    ++c$ssl$client_depth;
    }
  else
    {
    depth = c$ssl$server_depth;
    ++c$ssl$server_depth;
    }

  return cat(Analyzer::ANALYZER_SSL, c$start_time, is_orig, id_string(c$id), depth);
  }

I have no idea if that is “matching” or not - in any case, the above C code does not work in combination with that
file handle generation. Thinking about it, it makes sense - the EndOfFile function probably calls get_file_handle again for
that connection, gets a different file handle back (because the counters are increased), and hence removal will not work.

But….

I also do not really think this is sufficiently documented in the comments of
Manager.h. This basically is not mentioned at all there…

Yeah, it should probably at least link to [1] at least once. Do you think it would help to link to that in each method where it matters?

[1] http://www.bro.org/development/howtos/file-analysis-file-id.html

…even after reading through the how to, I was not quite clear on the fact, that get_file_handle
has to always return the same value for the same file. Which is impossible to accomplish in cases
like this, because, several certificates are sent over a connection, and you do not get any further information
with the get_file_handle call. So - you more or less have to return differing IDs for everything.

Perhaps the EndOfFile functions should get some warning that details in which circumstances you can use
them (probably - static per-connection ID).

Also - it might be nice to throw some kind of reporter warning when EndOfFile is called for a file ID that
does not exists.

If I understood everything right, that should never happen during normal operations, should it?

Johanna

function get_file_handle(c: connection, is_orig: bool): string
  {
  set_session(c);

  local depth: count;

  if ( is_orig )
    {
    depth = c$ssl$client_depth;
    ++c$ssl$client_depth;
    }
  else
    {
    depth = c$ssl$server_depth;
    ++c$ssl$server_depth;
    }

  return cat(Analyzer::ANALYZER_SSL, c$start_time, is_orig, id_string(c$id), depth);
  }

I have no idea if that is “matching” or not - in any case, the above C code does not work in combination with that
file handle generation. Thinking about it, it makes sense - the EndOfFile function probably calls get_file_handle again for
that connection, gets a different file handle back (because the counters are increased), and hence removal will not work.

Yes, that looks like what would happen.

…even after reading through the how to, I was not quite clear on the fact, that get_file_handle
has to always return the same value for the same file. Which is impossible to accomplish in cases
like this, because, several certificates are sent over a connection, and you do not get any further information
with the get_file_handle call. So - you more or less have to return differing IDs for everything.

Instead of incrementing the depth in get_file_handle, it could increment it in a x509_certificate handler? That way the handle remains the same between the DataIn/EndOfFile pairs, but the next DataIn/EndOfFile will get a new file handle due to the incremented depth.

Not suggesting you change it from using the pre-computed file id method you currently have, just challenging the impossibility claim :). And I do agree the whole script-layer generation of file handle string is difficult to understand/use, but I don’t have any ideas at the moment for helping that...

Perhaps the EndOfFile functions should get some warning that details in which circumstances you can use
them (probably - static per-connection ID).

I don’t think there’s any limitation/circumstances unique to EndOfFile that differs from the other FAF APIs.

Also - it might be nice to throw some kind of reporter warning when EndOfFile is called for a file ID that
does not exists.

If I understood everything right, that should never happen during normal operations, should it?

It could be that the file timed out and was already cleaned up before a protocol analyzer called EndOfFile. I think it can also be the case that a protocol analyzer only ever calls EndOfFile because a connection got closed/reset before any file data was seen.

- Jon

…even after reading through the how to, I was not quite clear on the fact, that get_file_handle
has to always return the same value for the same file. Which is impossible to accomplish in cases
like this, because, several certificates are sent over a connection, and you do not get any further information
with the get_file_handle call. So - you more or less have to return differing IDs for everything.

Instead of incrementing the depth in get_file_handle, it could increment it in a x509_certificate handler? That way the handle remains the same between the DataIn/EndOfFile pairs, but the next DataIn/EndOfFile will get a new file handle due to the incremented depth.

Not suggesting you change it from using the pre-computed file id method you currently have, just challenging the impossibility claim :). And I do agree the whole script-layer generation of file handle string is difficult to understand/use, but I don’t have any ideas at the moment for helping that...

That might work, yes… but..

If I do several DataIn -> EndOfFile -> DataIn -> EndOfFile calls in rapid succession (without anything in-between), as I am doing at the moment, is is guaranteed that the events that are raised by the FaF in an EndOfFile call are processed by scriptland before the (directly following) next DataIn call?

As I understood things so far, the event queue cannot / will not be deleted in-between these calls. But I might be wrong.

Johanna

If you are using a pre-computed file-id, then no. If you aren’t, then yes, it flushes the event queue every time it needs to call out to the script-layer in order to generate a file-id.

- Jon